Getting better results from valgrind leak tracking

Started by Tom Lane · almost 5 years ago · 23 messages
#1 Tom Lane
tgl@sss.pgh.pa.us
1 attachment(s)

[ starting a new thread for this ]

Andres Freund <andres@anarazel.de> writes:

I wonder if it'd be worth starting to explicitly annotate all the places
that do allocations and are fine with leaking them. E.g. by introducing
malloc_permanently() or such. Right now it's hard to use valgrind et al
to detect leaks because of all the false positives due to such "ok to
leak" allocations.

Out of curiosity I poked at this for a little while. It doesn't appear
to me that we leak much at all, at least not if you are willing to take
"still reachable" blocks as not-leaked. Most of the problem is
more subtle than that.

I found just a couple of things that really seem like leaks of permanent
data structures to valgrind:

* Where ps_status.c copies the original "environ" array (on
PS_USE_CLOBBER_ARGV platforms), valgrind thinks that's all leaked,
implying that it doesn't count the "environ" global as a valid
reference to leakable data. I was able to shut that up by also saving
the pointer into an otherwise-unused static variable. (This is sort of
a poor man's implementation of your "malloc_permanently" idea; but I
doubt it's worth working harder, given the small number of use-cases.)

* The postmaster's sock_paths and lock_files lists appear to be leaked,
but we're doing that to ourselves by throwing away the pointers to them
without physically freeing the lists. We can just not do that.

What I found out is that we have a lot of issues that seem to devolve
to valgrind not being sure that a block is referenced. I identified
two main causes of that:

(1) We have a pointer, but it doesn't actually point right at the start
of the block. A primary culprit here is lists of thingies that use the
slist and dlist infrastructure. As an experiment, I moved the dlist_node
fields of some popular structs to the beginning, and verified that that
silences associated complaints. I'm not sure that we want to insist on
put-the-link-first as policy (although if we did, it could provide some
notational savings perhaps). However, unless someone knows of a way to
teach valgrind about this situation, there may be no other way to silence
those leakage complaints. A secondary culprit is people randomly applying
CACHELINEALIGN or the like to a palloc'd address, so that the address we
have isn't pointing right at the block start.
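
To illustrate the interior-pointer problem with a minimal, hypothetical
struct (not taken from the tree):

	typedef struct Thingie
	{
		int			id;			/* payload before the link ... */
		dlist_node	node;		/* ... so &t->node != t */
	} Thingie;

	static dlist_head thingies = DLIST_STATIC_INIT(thingies);

	Thingie    *t = palloc(sizeof(Thingie));

	dlist_push_head(&thingies, &t->node);	/* stores only an interior pointer */

The only live reference is several bytes past the palloc'd address, so
valgrind won't count the block as fully reachable; with the dlist_node
first, &t->node equals the block start and the complaint goes away.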

(2) The only pointer to the start of a block is actually somewhere within
the block. This is common in dynahash tables, where we allocate a slab
of entries in a single palloc and then thread them together. Each entry
should have exactly one referencing pointer, but that pointer is more
likely to be elsewhere within the same palloc block than in the external
hash bucket array. AFAICT, all cases except where the slab's first entry
is pointed to by a hash bucket pointer confuse valgrind to some extent.
I was able to hack around this by preventing dynahash from allocating
more than one hash entry per palloc, but I wonder if there's a better way.
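
For reference, the pattern looks roughly like this (simplified from
element_alloc() in dynahash.c):

	/* one palloc provides nelem hash entries */
	firstElement = (HASHELEMENT *) hashp->alloc(nelem * elementSize);

	/* thread them into the freelist, last to first */
	prevElement = NULL;
	tmpElement = firstElement;
	for (i = 0; i < nelem; i++)
	{
		tmpElement->link = prevElement;
		prevElement = tmpElement;
		tmpElement = (HASHELEMENT *)
			(((char *) tmpElement) + elementSize);
	}

Each entry except the first is referenced only by the link field of a
neighboring entry inside the same palloc'd block -- an interior pointer,
which is exactly what valgrind fails to credit.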

Attached is a very crude hack, not meant for commit, that hacks things
up enough to greatly reduce the number of complaints with
"--leak-check=full".

One thing I've failed to silence so far is a bunch of entries like

==00:00:03:56.088 3467702== 1,861 bytes in 67 blocks are definitely lost in loss record 1,290 of 1,418
==00:00:03:56.088 3467702== at 0x950650: MemoryContextAlloc (mcxt.c:827)
==00:00:03:56.088 3467702== by 0x951710: MemoryContextStrdup (mcxt.c:1179)
==00:00:03:56.088 3467702== by 0x91C86E: RelationInitIndexAccessInfo (relcache.c:1444)
==00:00:03:56.088 3467702== by 0x91DA9C: RelationBuildDesc (relcache.c:1200)

which is complaining about the memory context identifiers for system
indexes' rd_indexcxt contexts. Those are surely not being leaked in
any real sense. I suspect that this has something to do with valgrind
not counting the context->ident fields as live pointers, but I don't
have enough valgrind-fu to fix that.

Anyway, the bottom line is that I do not think that we have all that
many uses of the pattern you postulated originally. It's more that
we've designed some valgrind-unfriendly data structures. We need to
improve that situation to make much progress here.

regards, tom lane

Attachments:

valgrind-leak-hacks.patch (text/x-diff; charset=us-ascii)
diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index 4c7b1e7bfd..cd984929a6 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -888,7 +888,6 @@ RemoveSocketFiles(void)
 		(void) unlink(sock_path);
 	}
 	/* Since we're about to exit, no need to reclaim storage */
-	sock_paths = NIL;
 }
 
 
diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index 55c9445898..2abdd44190 100644
--- a/src/backend/utils/cache/catcache.c
+++ b/src/backend/utils/cache/catcache.c
@@ -814,7 +814,7 @@ InitCatCache(int id,
 	 * Note: we rely on zeroing to initialize all the dlist headers correctly
 	 */
 	sz = sizeof(CatCache) + PG_CACHE_LINE_SIZE;
-	cp = (CatCache *) CACHELINEALIGN(palloc0(sz));
+	cp = (CatCache *) palloc0(sz);
 	cp->cc_bucket = palloc0(nbuckets * sizeof(dlist_head));
 
 	/*
diff --git a/src/backend/utils/hash/dynahash.c b/src/backend/utils/hash/dynahash.c
index 6546e3c7c7..df39bc77df 100644
--- a/src/backend/utils/hash/dynahash.c
+++ b/src/backend/utils/hash/dynahash.c
@@ -1713,6 +1713,10 @@ element_alloc(HTAB *hashp, int nelem, int freelist_idx)
 	if (hashp->isfixed)
 		return false;
 
+	/* Force separate allocations to de-confuse valgrind */
+	if (!hashp->isshared)
+		nelem = 1;
+
 	/* Each element has a HASHELEMENT header plus user data. */
 	elementSize = MAXALIGN(sizeof(HASHELEMENT)) + MAXALIGN(hctl->entrysize);
 
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index 8b73850d0d..be37e8b312 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -944,7 +944,6 @@ UnlinkLockFiles(int status, Datum arg)
 		/* Should we complain if the unlink fails? */
 	}
 	/* Since we're about to exit, no need to reclaim storage */
-	lock_files = NIL;
 
 	/*
 	 * Lock file removal should always be the last externally visible action
diff --git a/src/backend/utils/misc/ps_status.c b/src/backend/utils/misc/ps_status.c
index 5819faaf2d..2d0ef37f7f 100644
--- a/src/backend/utils/misc/ps_status.c
+++ b/src/backend/utils/misc/ps_status.c
@@ -113,6 +113,11 @@ static size_t ps_buffer_fixed_size; /* size of the constant prefix */
 static int	save_argc;
 static char **save_argv;
 
+/* We use this to convince Valgrind that replacement environ is referenced */
+#ifdef PS_USE_CLOBBER_ARGV
+static char ** volatile fake_environ;
+#endif
+
 
 /*
  * Call this early in startup to save the original argc/argv values.
@@ -192,6 +197,8 @@ save_ps_display_args(int argc, char **argv)
 		}
 		new_environ[i] = NULL;
 		environ = new_environ;
+		/* Valgrind tends to think this memory is leaked, so fool it */
+		fake_environ = new_environ;
 	}
 #endif							/* PS_USE_CLOBBER_ARGV */
 
diff --git a/src/include/postmaster/bgworker_internals.h b/src/include/postmaster/bgworker_internals.h
index fc7706314b..f32a13f786 100644
--- a/src/include/postmaster/bgworker_internals.h
+++ b/src/include/postmaster/bgworker_internals.h
@@ -32,6 +32,7 @@
  */
 typedef struct RegisteredBgWorker
 {
+	slist_node	rw_lnode;		/* list link (first to placate valgrind) */
 	BackgroundWorker rw_worker; /* its registry entry */
 	struct bkend *rw_backend;	/* its BackendList entry, or NULL */
 	pid_t		rw_pid;			/* 0 if not running */
@@ -39,7 +40,6 @@ typedef struct RegisteredBgWorker
 	TimestampTz rw_crashed_at;	/* if not 0, time it last crashed */
 	int			rw_shmem_slot;
 	bool		rw_terminate;
-	slist_node	rw_lnode;		/* list link */
 } RegisteredBgWorker;
 
 extern slist_head BackgroundWorkerList;
diff --git a/src/include/utils/catcache.h b/src/include/utils/catcache.h
index ddc2762eb3..4f2e631562 100644
--- a/src/include/utils/catcache.h
+++ b/src/include/utils/catcache.h
@@ -85,6 +85,13 @@ typedef struct catcache
 
 typedef struct catctup
 {
+	/*
+	 * Each tuple in a cache is a member of a dlist that stores the elements
+	 * of its hash bucket.  We keep each dlist in LRU order to speed repeated
+	 * lookups.
+	 */
+	dlist_node	cache_elem;		/* list member of per-bucket list */
+
 	int			ct_magic;		/* for identifying CatCTup entries */
 #define CT_MAGIC   0x57261502
 
@@ -96,13 +103,6 @@ typedef struct catctup
 	 */
 	Datum		keys[CATCACHE_MAXKEYS];
 
-	/*
-	 * Each tuple in a cache is a member of a dlist that stores the elements
-	 * of its hash bucket.  We keep each dlist in LRU order to speed repeated
-	 * lookups.
-	 */
-	dlist_node	cache_elem;		/* list member of per-bucket list */
-
 	/*
 	 * A tuple marked "dead" must not be returned by subsequent searches.
 	 * However, it won't be physically deleted from the cache until its
@@ -156,13 +156,13 @@ typedef struct catctup
  */
 typedef struct catclist
 {
+	dlist_node	cache_elem;		/* list member of per-catcache list */
+
 	int			cl_magic;		/* for identifying CatCList entries */
 #define CL_MAGIC   0x52765103
 
 	uint32		hash_value;		/* hash value for lookup keys */
 
-	dlist_node	cache_elem;		/* list member of per-catcache list */
-
 	/*
 	 * Lookup keys for the entry, with the first nkeys elements being valid.
 	 * All by-reference are separately allocated.
#2 Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#1)
Re: Getting better results from valgrind leak tracking

Hi,

David, there's a question about a commit of yours below, hence adding
you.

On 2021-03-16 19:36:10 -0400, Tom Lane wrote:

Out of curiosity I poked at this for a little while.

Cool.

It doesn't appear to me that we leak much at all, at least not if you
are willing to take "still reachable" blocks as not-leaked.

Well, I think for any sort of automated testing - which I think would be
useful - we'd really need *no* leaks. I know that I get a few bleats
whenever I forget to set --leak-check=no. It's also not just postgres
itself, but some of the helper tools...

And it's not just valgrind, also gcc/clang sanitizers...

What I found out is that we have a lot of issues that seem to devolve
to valgrind not being sure that a block is referenced. I identified
two main causes of that:

(1) We have a pointer, but it doesn't actually point right at the start
of the block. A primary culprit here is lists of thingies that use the
slist and dlist infrastructure. As an experiment, I moved the dlist_node
fields of some popular structs to the beginning, and verified that that
silences associated complaints. I'm not sure that we want to insist on
put-the-link-first as policy (although if we did, it could provide some
notational savings perhaps). However, unless someone knows of a way to
teach valgrind about this situation, there may be no other way to silence
those leakage complaints. A secondary culprit is people randomly applying
CACHELINEALIGN or the like to a palloc'd address, so that the address we
have isn't pointing right at the block start.

Hm, do you still have a backtrace / suppression for one of those? I
didn't see any in a quick (*) serial installcheck I just ran. Or I
wasn't able to pinpoint them to this issue.

I think the run might have shown a genuine leak:

==2048803== 16 bytes in 1 blocks are definitely lost in loss record 139 of 906
==2048803== at 0x89D2EA: palloc (mcxt.c:975)
==2048803== by 0x2392D3: heap_beginscan (heapam.c:1198)
==2048803== by 0x264E8F: table_beginscan_strat (tableam.h:918)
==2048803== by 0x265994: systable_beginscan (genam.c:453)
==2048803== by 0x83C2D1: SearchCatCacheMiss (catcache.c:1359)
==2048803== by 0x83C197: SearchCatCacheInternal (catcache.c:1299)
==2048803== by 0x83BE9A: SearchCatCache1 (catcache.c:1167)
==2048803== by 0x85876A: SearchSysCache1 (syscache.c:1134)
==2048803== by 0x84CDB3: RelationInitTableAccessMethod (relcache.c:1795)
==2048803== by 0x84F807: RelationBuildLocalRelation (relcache.c:3554)
==2048803== by 0x303C9D: heap_create (heap.c:395)
==2048803== by 0x305790: heap_create_with_catalog (heap.c:1291)
==2048803== by 0x41A327: DefineRelation (tablecmds.c:885)
==2048803== by 0x6C96B6: ProcessUtilitySlow (utility.c:1131)
==2048803== by 0x6C948A: standard_ProcessUtility (utility.c:1034)
==2048803== by 0x6C865F: ProcessUtility (utility.c:525)
==2048803== by 0x6C7409: PortalRunUtility (pquery.c:1159)
==2048803== by 0x6C7636: PortalRunMulti (pquery.c:1305)
==2048803== by 0x6C6B11: PortalRun (pquery.c:779)
==2048803== by 0x6C05AB: exec_simple_query (postgres.c:1173)
==2048803==
{
<insert_a_suppression_name_here>
Memcheck:Leak
match-leak-kinds: definite
fun:palloc
fun:heap_beginscan
fun:table_beginscan_strat
fun:systable_beginscan
fun:SearchCatCacheMiss
fun:SearchCatCacheInternal
fun:SearchCatCache1
fun:SearchSysCache1
fun:RelationInitTableAccessMethod
fun:RelationBuildLocalRelation
fun:heap_create
fun:heap_create_with_catalog
fun:DefineRelation
fun:ProcessUtilitySlow
fun:standard_ProcessUtility
fun:ProcessUtility
fun:PortalRunUtility
fun:PortalRunMulti
fun:PortalRun
fun:exec_simple_query
}

Since 56788d2156fc, heap_beginscan() allocates
scan->rs_base.rs_private =
palloc(sizeof(ParallelBlockTableScanWorkerData));
but doesn't free it in heap_endscan().

In most of the places heap scans are begun inside transient contexts,
but not always. In the above trace for example
RelationBuildLocalRelation switched to CacheMemoryContext, and nothing
switched to something else.

I'm a bit confused about the precise design of rs_private /
ParallelBlockTableScanWorkerData, specifically why it's been added to
TableScanDesc, instead of just adding it to HeapScanDesc? And why is it
allocated unconditionally, instead of just for parallel scans?

I don't think this is a false positive, even though it theoretically
could be freed by resetting CacheMemoryContext (see below)?
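
A minimal sketch of one way to plug this specific allocation (assuming
rs_private stays in TableScanDesc) would be to release it in
heap_endscan():

	if (scan->rs_base.rs_private)
		pfree(scan->rs_base.rs_private);

though the more interesting question remains whether the field belongs
there at all.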

I saw a lot of false positives from autovacuum workers, because
AutovacMemCxt is never deleted, and because table_toast_map is created
in TopMemoryContext. Adding an explicit
MemoryContextDelete(AutovacMemCxt) and parenting table_toast_map in that
shut that up.

Which brings me to the question why allocations in CacheMemoryContext,
AutovacMemCxt are considered to be "lost", even though they're still
"reachable" via a context reset. I think valgrind ends up treating
memory allocated via memory contexts that still exist at process end as
lost, regardless of being reachable via the memory pool (from
valgrind's view). Which I guess actually makes sense, for things like
TopMemoryContext and CacheContext - anything not reachable by means
other than a context reset is effectively lost for those.

For autovac launcher / worker it seems like a sensible thing to just
delete AutovacMemCxt.

(2) The only pointer to the start of a block is actually somewhere within
the block. This is common in dynahash tables, where we allocate a slab
of entries in a single palloc and then thread them together. Each entry
should have exactly one referencing pointer, but that pointer is more
likely to be elsewhere within the same palloc block than in the external
hash bucket array. AFAICT, all cases except where the slab's first entry
is pointed to by a hash bucket pointer confuse valgrind to some extent.
I was able to hack around this by preventing dynahash from allocating
more than one hash entry per palloc, but I wonder if there's a better way.

Hm. For me the number of leaks seem to stay the same with or without
your changes related to this. Is this a USE_VALGRIND build?

I'm using valgrind-3.16.1

Attached is a very crude hack, not meant for commit, that hacks things
up enough to greatly reduce the number of complaints with
"--leak-check=full".

One thing I've failed to silence so far is a bunch of entries like

==00:00:03:56.088 3467702== 1,861 bytes in 67 blocks are definitely lost in loss record 1,290 of 1,418
==00:00:03:56.088 3467702== at 0x950650: MemoryContextAlloc (mcxt.c:827)
==00:00:03:56.088 3467702== by 0x951710: MemoryContextStrdup (mcxt.c:1179)
==00:00:03:56.088 3467702== by 0x91C86E: RelationInitIndexAccessInfo (relcache.c:1444)
==00:00:03:56.088 3467702== by 0x91DA9C: RelationBuildDesc (relcache.c:1200)

which is complaining about the memory context identifiers for system
indexes' rd_indexcxt contexts. Those are surely not being leaked in
any real sense. I suspect that this has something to do with valgrind
not counting the context->ident fields as live pointers, but I don't
have enough valgrind-fu to fix that.

Yea. I suspect it's related to the fact that we mark the memory as a
valgrind mempool, I'll try to investigate.

I do see a bunch of leak bleats below fun:plpgsql_compile that I don't
yet understand. E.g.

==2054558== 32 bytes in 1 blocks are definitely lost in loss record 284 of 913
==2054558== at 0x89D389: palloc (mcxt.c:975)
==2054558== by 0x518732: new_list (list.c:134)
==2054558== by 0x518C0C: lappend (list.c:341)
==2054558== by 0x83CAE7: SearchCatCacheList (catcache.c:1691)
==2054558== by 0x859A9C: SearchSysCacheList (syscache.c:1447)
==2054558== by 0x313192: FuncnameGetCandidates (namespace.c:975)
==2054558== by 0x313D91: FunctionIsVisible (namespace.c:1450)
==2054558== by 0x7C2891: format_procedure_extended (regproc.c:375)
==2054558== by 0x7C27C3: format_procedure (regproc.c:324)
==2054558== by 0xA7693E1: do_compile (pl_comp.c:348)
==2054558== by 0xA769130: plpgsql_compile (pl_comp.c:224)

and

==2054558== 30 bytes in 4 blocks are definitely lost in loss record 225 of 913
==2054558== at 0x89D389: palloc (mcxt.c:975)
==2054558== by 0x3ADDAE: downcase_identifier (scansup.c:52)
==2054558== by 0x3ADD85: downcase_truncate_identifier (scansup.c:39)
==2054558== by 0x3AB5E4: core_yylex (scan.l:1032)
==2054558== by 0xA789B2D: internal_yylex (pl_scanner.c:321)
==2054558== by 0xA7896E3: plpgsql_yylex (pl_scanner.c:152)
==2054558== by 0xA780015: plpgsql_yyparse (pl_gram.c:1945)
==2054558== by 0xA76A652: do_compile (pl_comp.c:788)
==2054558== by 0xA769130: plpgsql_compile (pl_comp.c:224)
==2054558== by 0xA78948F: plpgsql_validator (pl_handler.c:539)

Based on the quick look I had (dinner is calling) I didn't yet
understand how plpgsql_compile_tmp_cxt error handling works.

Greetings,

Andres Freund

(*) Or not so quick: I had to figure out why valgrind was so slow. It
turned out that I had mistyped shared_buffers=32MB as
shared_buffers=32GB...

#3 Zhihong Yu
zyu@yugabyte.com
In reply to: Andres Freund (#2)
Re: Getting better results from valgrind leak tracking

Hi,
For the second-last trace, involving SearchCatCacheList (catcache.c:1691),
the ctlist's members are stored in the cl->members array, where cl is
returned at the end of SearchCatCacheList.

Maybe this was not accounted for by valgrind?

Cheers

#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#2)
Re: Getting better results from valgrind leak tracking

Andres Freund <andres@anarazel.de> writes:

On 2021-03-16 19:36:10 -0400, Tom Lane wrote:

It doesn't appear to me that we leak much at all, at least not if you
are willing to take "still reachable" blocks as not-leaked.

Well, I think for any sort of automated testing - which I think would be
useful - we'd really need *no* leaks.

That seems both unnecessary and impractical. We have to consider that
everything-still-reachable is an OK final state.

I think the run might have shown a genuine leak:

==2048803== 16 bytes in 1 blocks are definitely lost in loss record 139 of 906
==2048803== at 0x89D2EA: palloc (mcxt.c:975)
==2048803== by 0x2392D3: heap_beginscan (heapam.c:1198)
==2048803== by 0x264E8F: table_beginscan_strat (tableam.h:918)
==2048803== by 0x265994: systable_beginscan (genam.c:453)
==2048803== by 0x83C2D1: SearchCatCacheMiss (catcache.c:1359)
==2048803== by 0x83C197: SearchCatCacheInternal (catcache.c:1299)

I didn't see anything like that after applying the fixes I showed before.
There are a LOT of false positives from the fact that with our HEAD
code, valgrind believes that everything in the catalog caches and
most things in dynahash tables (including the relcache) are unreachable.

I'm not trying to claim there are no leaks anywhere, just that the amount
of noise from those issues swamps all the real problems. I particularly
recommend not believing anything related to catcache or relcache if you
haven't applied that admittedly-hacky patch.

Hm. For me the number of leaks seems to stay the same with or without
your changes related to this. Is this a USE_VALGRIND build?

Yeah ...

I do see a bunch of leaks bleats below fun:plpgsql_compile that I don't
yet understand. E.g.

Those are probably a variant of what you were suggesting above, ie
plpgsql isn't terribly careful not to leak random stuff while building
a long-lived function parse tree. It's supposed to use a temp context
for anything that might leak, but I suspect it's not thorough about it.
We could chase that sort of thing after we clean up the other problems.

regards, tom lane

#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#2)
2 attachment(s)
Re: Getting better results from valgrind leak tracking

Andres Freund <andres@anarazel.de> writes:

Hm. For me the number of leaks seems to stay the same with or without
your changes related to this. Is this a USE_VALGRIND build?

Not sure how you arrived at that answer. I attach two logs of individual
backends running with

--leak-check=full --track-origins=yes --read-var-info=yes --error-exitcode=0

The test scenario in both cases was just start up, run "select 2+2;",
quit. The first one is before I'd made any of the changes shown
before, the second is after.

regards, tom lane

Attachments:

initialrun.log (text/plain; charset=us-ascii)
somefixes.log (text/plain; charset=us-ascii)
#6 Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#4)
Re: Getting better results from valgrind leak tracking

Hi,

On Tue, Mar 16, 2021, at 20:01, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

On 2021-03-16 19:36:10 -0400, Tom Lane wrote:

It doesn't appear to me that we leak much at all, at least not if you
are willing to take "still reachable" blocks as not-leaked.

Well, I think for any sort of automated testing - which I think would be
useful - we'd really need *no* leaks.

That seems both unnecessary and impractical. We have to consider that
everything-still-reachable is an OK final state.

I don't consider "still reachable" a leak, just definitely unreachable memory. And with a few tweaks it seems like we could achieve that?

I think the run might have shown a genuine leak:

==2048803== 16 bytes in 1 blocks are definitely lost in loss record 139 of 906
==2048803== at 0x89D2EA: palloc (mcxt.c:975)
==2048803== by 0x2392D3: heap_beginscan (heapam.c:1198)
==2048803== by 0x264E8F: table_beginscan_strat (tableam.h:918)
==2048803== by 0x265994: systable_beginscan (genam.c:453)
==2048803== by 0x83C2D1: SearchCatCacheMiss (catcache.c:1359)
==2048803== by 0x83C197: SearchCatCacheInternal (catcache.c:1299)

I didn't see anything like that after applying the fixes I showed before.
There are a LOT of false positives from the fact that with our HEAD
code, valgrind believes that everything in the catalog caches and
most things in dynahash tables (including the relcache) are unreachable.

I think it's actually unreachable memory (unless you count resetting the cache context), based on manually tracing the code... I'll try to repro.

I do see a bunch of leak bleats below fun:plpgsql_compile that I don't
yet understand. E.g.

Those are probably a variant of what you were suggesting above, ie
plpgsql isn't terribly careful not to leak random stuff while building
a long-lived function parse tree. It's supposed to use a temp context
for anything that might leak, but I suspect it's not thorough about it.

What I meant was that I didn't understand how there's no leak danger when compilation fails halfway through, given that the context in question is below TopMemoryContext and that I didn't see a relevant TRY block. But probably there is something cleaning it up that I didn't see.

Andres

#7 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#6)
Re: Getting better results from valgrind leak tracking

"Andres Freund" <andres@anarazel.de> writes:

On Tue, Mar 16, 2021, at 20:01, Tom Lane wrote:

That seems both unnecessary and impractical. We have to consider that
everything-still-reachable is an OK final state.

I don't consider "still reachable" a leak. Just definitely unreachable.

OK, we're in violent agreement then -- I must've misread what you wrote.

I think the run might have shown a genuine leak:

I didn't see anything like that after applying the fixes I showed before.

I think it's actually unreachable memory (unless you count resetting the cache context), based on manually tracing the code... I'll try to repro.

I agree that having to reset CacheContext is not something we
should count as "still reachable", and I'm pretty sure that the
existing valgrind infrastructure doesn't count it that way.

As for the particular point about ParallelBlockTableScanWorkerData,
I agree with your question to David about why that's in TableScanDesc
not HeapScanDesc, but I can't get excited about it not being freed in
heap_endscan. That's mainly because I do not believe that anything as
complex as a heap or indexscan should be counted on to be zero-leakage.
The right answer is to not do such operations in long-lived contexts.
So if we're running such a thing while switched into CacheContext,
*that* is the bug, not that heap_endscan didn't free this particular
allocation.

regards, tom lane

#8 Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#6)
Re: Getting better results from valgrind leak tracking

Hi,

On 2021-03-16 20:50:17 -0700, Andres Freund wrote:

What I meant was that I didn't understand how there's no leak
danger when compilation fails halfway through, given that the context
in question is below TopMemoryContext and that I didn't see a relevant
TRY block. But probably there is something cleaning it up that I
didn't see.

Looks like it's an actual leak:

postgres[2058957][1]=# DO $do$BEGIN CREATE OR REPLACE FUNCTION foo() RETURNS VOID LANGUAGE plpgsql AS $$BEGIN frakbar;END;$$;^C
postgres[2058957][1]=# SELECT count(*), SUM(total_bytes) FROM pg_backend_memory_contexts WHERE name = 'PL/pgSQL function';
┌───────┬────────┐
│ count │  sum   │
├───────┼────────┤
│     0 │ (null) │
└───────┴────────┘
(1 row)

Time: 1.666 ms
postgres[2058957][1]=# CREATE OR REPLACE FUNCTION foo() RETURNS VOID LANGUAGE plpgsql AS $$BEGIN frakbar;END;$$;
ERROR: 42601: syntax error at or near "frakbar"
LINE 1: ...ON foo() RETURNS VOID LANGUAGE plpgsql AS $$BEGIN frakbar;EN...
^
LOCATION: scanner_yyerror, scan.l:1176
Time: 5.463 ms
postgres[2058957][1]=# CREATE OR REPLACE FUNCTION foo() RETURNS VOID LANGUAGE plpgsql AS $$BEGIN frakbar;END;$$;
ERROR: 42601: syntax error at or near "frakbar"
LINE 1: ...ON foo() RETURNS VOID LANGUAGE plpgsql AS $$BEGIN frakbar;EN...
^
LOCATION: scanner_yyerror, scan.l:1176
Time: 1.223 ms
postgres[2058957][1]=# CREATE OR REPLACE FUNCTION foo() RETURNS VOID LANGUAGE plpgsql AS $$BEGIN frakbar;END;$$;
ERROR: 42601: syntax error at or near "frakbar"
LINE 1: ...ON foo() RETURNS VOID LANGUAGE plpgsql AS $$BEGIN frakbar;EN...
^
LOCATION: scanner_yyerror, scan.l:1176
Time: 1.194 ms
postgres[2058957][1]=# SELECT count(*), SUM(total_bytes) FROM pg_backend_memory_contexts WHERE name = 'PL/pgSQL function';
┌───────┬───────┐
│ count │  sum  │
├───────┼───────┤
│     3 │ 24576 │
└───────┴───────┘
(1 row)

Something like

DO $do$ BEGIN FOR i IN 1 .. 10000 LOOP BEGIN EXECUTE $cf$CREATE OR REPLACE FUNCTION foo() RETURNS VOID LANGUAGE plpgsql AS $f$BEGIN frakbar; END;$f$;$cf$; EXCEPTION WHEN others THEN END; END LOOP; END;$do$;

will show the leak visible in top too (albeit slowly - a more
complicated statement will leak more quickly I think).

postgres[2059268][1]=# SELECT count(*), SUM(total_bytes) FROM pg_backend_memory_contexts WHERE name = 'PL/pgSQL function';
┌───────┬──────────┐
│ count │   sum    │
├───────┼──────────┤
│ 10000 │ 81920000 │
└───────┴──────────┘
(1 row)

The leak appears not to be new; I see it in 9.6 as well. This seems like
a surprisingly easy leak to trigger...

Looks like there's something else awry. The above DO statement takes
2.2s on a 13 assert build, but 32s on a master assert build. Spending a
lot of time doing dependency lookups:

-   94.62%     0.01%  postgres  postgres            [.] CreateFunction
   - 94.61% CreateFunction
      - 94.56% ProcedureCreate
         - 89.68% deleteDependencyRecordsFor
            - 89.38% systable_getnext
               - 89.33% index_getnext_slot
                  - 56.00% index_fetch_heap
                     + 54.64% table_index_fetch_tuple
                       0.09% heapam_index_fetch_tuple
                  + 28.53% index_getnext_tid
                    2.77% ItemPointerEquals
                    0.10% table_index_fetch_tuple
                    0.09% btgettuple
                 0.03% index_getnext_tid

1000 iterations: 521ms
1000 iterations: 533ms
2000 iterations: 1670ms
3000 iterations: 3457ms
3000 iterations: 3457ms
10000 iterations: 31794ms

The quadratic-seeming nature made me wonder if someone broke killtuples
in this situation. And it seems that someone was me, in 623a9ba. We need
to bump xactCompletionCount in the subxid abort case as well...

Greetings,

Andres Freund

#9 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#8)
Re: Getting better results from valgrind leak tracking

Andres Freund <andres@anarazel.de> writes:

On 2021-03-16 20:50:17 -0700, Andres Freund wrote:

What I meant was that I didn't understand how there's not a leak
danger when compilation fails halfway through, given that the context
in question is below TopMemoryContext and that I didn't see a relevant
TRY block. But that probably there is something cleaning it up that I
didn't see.

Looks like it's an actual leak:

Yeah, I believe that. On the other hand, I'm not sure that such cases
represent any real problem for production usage. I'm inclined to focus
on non-error scenarios first.

(Having said that, we probably have the ability to fix such things
relatively painlessly now, by reparenting an initially-temporary
context once we're done parsing.)
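
Concretely, that reparenting idea could look something like this sketch
(hedged, not an actual patch):

	/* build the parse tree in a context that error cleanup will reap */
	func_cxt = AllocSetContextCreate(CurrentMemoryContext,
									 "PL/pgSQL function",
									 ALLOCSET_DEFAULT_SIZES);

	/* ... parse and compile; an ereport() here loses nothing ... */

	/* success: promote the context so it outlives the transaction */
	MemoryContextSetParent(func_cxt, TopMemoryContext);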

Meanwhile, I'm still trying to understand why valgrind is whining
about the rd_indexcxt identifier strings. AFAICS it shouldn't.

regards, tom lane

#10 Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#9)
Re: Getting better results from valgrind leak tracking

Hi,

On Wed, Mar 17, 2021, at 07:16, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

On 2021-03-16 20:50:17 -0700, Andres Freund wrote:

Meanwhile, I'm still trying to understand why valgrind is whining
about the rd_indexcxt identifier strings. AFAICS it shouldn't.

I found a way around that late last night. Need to mark the context itself as an allocation. But I made a mess on the way to that and need to clean the patch up before sending it (and need to drop my girlfriend off first).

Andres

#11 Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#10)
1 attachment(s)
Re: Getting better results from valgrind leak tracking

Hi,

(really need to fix my mobile phone mail program to keep the CC list...)

On 2021-03-17 08:15:43 -0700, Andres Freund wrote:

On Wed, Mar 17, 2021, at 07:16, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

On 2021-03-16 20:50:17 -0700, Andres Freund wrote:

Meanwhile, I'm still trying to understand why valgrind is whining
about the rd_indexcxt identifier strings. AFAICS it shouldn't.

I found a way around that late last night. Need to mark the context
itself as an allocation. But I made a mess on the way to that and need
to clean the patch up before sending it (and need to drop my
girlfriend off first).

Unfortunately I didn't immediately find a way to do this while keeping
the MEMPOOL_CREATE/DESTROY in mcxt.c. The attached patch moves the pool
creation into the memory context implementations, "allocates" the
context itself as part of that pool, and changes the reset
implementation from MEMPOOL_DESTROY + MEMPOOL_CREATE to instead do
MEMPOOL_TRIM. That leaves the memory context itself valid (and thus
tracking ->ident etc), but marks all the other memory as freed.

This is just a first version, it probably needs more work, and
definitely a few comments...

After this, your changes, and the previously mentioned fixes, I get far
fewer false positives. Also found a crash / memory leak in pgstat.c due
to the new replication slot stats, but I'll start a separate thread.

There are a few leak warnings around guc.c that look like they might be
real, not false positives, and thus a bit concerning. Looks like several
guc check hooks don't bother to free the old *extra before allocating a
new one.

I suspect we might get better results from valgrind, not just for leaks
but also undefined value tracking, if we changed the way we represent
pools to utilize VALGRIND_MEMPOOL_METAPOOL |
VALGRIND_MEMPOOL_AUTO_FREE. E.g. aset.c would associate AllocBlock using
VALGRIND_MEMPOOL_ALLOC and then mcxt.c would use
VALGRIND_MALLOCLIKE_BLOCK for the individual chunk allocation.

https://www.valgrind.org/docs/manual/mc-manual.html#mc-manual.mempools
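
For concreteness, a sketch of that two-level scheme (untested; assumes a
valgrind new enough to have these flags, which 3.16 is):

	/* aset.c: create the pool with the metapool flags */
	VALGRIND_CREATE_MEMPOOL_EXT(set, 0, false,
								VALGRIND_MEMPOOL_METAPOOL |
								VALGRIND_MEMPOOL_AUTO_FREE);

	/* each malloc'd AllocBlock becomes a "meta" allocation in the pool */
	VALGRIND_MEMPOOL_ALLOC(set, block, blksize);

	/* mcxt.c: report each chunk handed to a caller as a malloc-like
	 * block; AUTO_FREE then retires chunks when their block is freed */
	VALGRIND_MALLOCLIKE_BLOCK(ret, size, 0, false);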

I played with naming the allocations underlying aset.c using
VALGRIND_CREATE_BLOCK(block, strlen(context->name), context->name).
That does produce better undefined-value warnings, but it seems that
e.g. the leak detector doesn't have that information around. Nor does it
seem to be usable for use-after-free, the latter likely because
I had to VALGRIND_DISCARD by that point...

Greetings,

Andres Freund

Attachments:

0001-Make-memory-contexts-themselves-more-visible-to-valg.patch (text/x-diff; charset=us-ascii)
From c7e69e4bccfc4da97b0b999399732e3dc806b67e Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Wed, 17 Mar 2021 10:46:39 -0700
Subject: [PATCH] Make memory contexts themselves more visible to valgrind.

---
 src/include/utils/memdebug.h        |  1 +
 src/backend/utils/mmgr/aset.c       | 22 ++++++++++++++++++++++
 src/backend/utils/mmgr/generation.c |  8 ++++++++
 src/backend/utils/mmgr/mcxt.c       |  5 -----
 src/backend/utils/mmgr/slab.c       |  8 ++++++++
 5 files changed, 39 insertions(+), 5 deletions(-)

diff --git a/src/include/utils/memdebug.h b/src/include/utils/memdebug.h
index e88b4c6e8ef..5988bff8839 100644
--- a/src/include/utils/memdebug.h
+++ b/src/include/utils/memdebug.h
@@ -23,6 +23,7 @@
 #define VALGRIND_CHECK_MEM_IS_DEFINED(addr, size)			do {} while (0)
 #define VALGRIND_CREATE_MEMPOOL(context, redzones, zeroed)	do {} while (0)
 #define VALGRIND_DESTROY_MEMPOOL(context)					do {} while (0)
+#define VALGRIND_MEMPOOL_TRIM(context, ptr, size)			do {} while (0)
 #define VALGRIND_MAKE_MEM_DEFINED(addr, size)				do {} while (0)
 #define VALGRIND_MAKE_MEM_NOACCESS(addr, size)				do {} while (0)
 #define VALGRIND_MAKE_MEM_UNDEFINED(addr, size)				do {} while (0)
diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c
index ec6c130d0fb..9793ddf4a3d 100644
--- a/src/backend/utils/mmgr/aset.c
+++ b/src/backend/utils/mmgr/aset.c
@@ -433,6 +433,12 @@ AllocSetContextCreateInternal(MemoryContext parent,
 		{
 			/* Remove entry from freelist */
 			set = freelist->first_free;
+
+			VALGRIND_CREATE_MEMPOOL(set, 0, false);
+			VALGRIND_MEMPOOL_ALLOC(set, set, sizeof(AllocSetContext));
+			/* the contents are still valid, but valgrind can't know that */
+			VALGRIND_MAKE_MEM_DEFINED(set, sizeof(AllocSetContext));
+
 			freelist->first_free = (AllocSet) set->header.nextchild;
 			freelist->num_free--;
 
@@ -477,6 +483,8 @@ AllocSetContextCreateInternal(MemoryContext parent,
 						   name)));
 	}
 
+	VALGRIND_CREATE_MEMPOOL(set, 0, false);
+	VALGRIND_MEMPOOL_ALLOC(set, set, sizeof(AllocSetContext));
 	/*
 	 * Avoid writing code that can fail between here and MemoryContextCreate;
 	 * we'd leak the header/initial block if we ereport in this stretch.
@@ -611,6 +619,8 @@ AllocSetReset(MemoryContext context)
 
 	Assert(context->mem_allocated == keepersize);
 
+	VALGRIND_MEMPOOL_TRIM(set, set, sizeof(AllocSetContext));
+
 	/* Reset block size allocation sequence, too */
 	set->nextBlockSize = set->initBlockSize;
 }
@@ -652,6 +662,16 @@ AllocSetDelete(MemoryContext context)
 		if (!context->isReset)
 			MemoryContextResetOnly(context);
 
+		VALGRIND_DESTROY_MEMPOOL(context);
+
+		/*
+		 * Still need to access the memory marked as invalid due to the
+		 * DESTROY. We leave the memory accessible, as otherwise valgrind will
+		 * complain about having leaked the memory underlying the cached
+		 * sets...
+		 */
+		VALGRIND_MAKE_MEM_DEFINED(set, sizeof(AllocSetContext));
+
 		/*
 		 * If the freelist is full, just discard what's already in it.  See
 		 * comments with context_freelists[].
@@ -699,6 +719,8 @@ AllocSetDelete(MemoryContext context)
 
 	Assert(context->mem_allocated == keepersize);
 
+	VALGRIND_DESTROY_MEMPOOL(context);
+
 	/* Finally, free the context header, including the keeper block */
 	free(set);
 }
diff --git a/src/backend/utils/mmgr/generation.c b/src/backend/utils/mmgr/generation.c
index 2b900347645..55d1f6a6526 100644
--- a/src/backend/utils/mmgr/generation.c
+++ b/src/backend/utils/mmgr/generation.c
@@ -235,6 +235,9 @@ GenerationContextCreate(MemoryContext parent,
 						   name)));
 	}
 
+	VALGRIND_CREATE_MEMPOOL(set, 0, false);
+	VALGRIND_MEMPOOL_ALLOC(set, set, sizeof(GenerationContext));
+
 	/*
 	 * Avoid writing code that can fail between here and MemoryContextCreate;
 	 * we'd leak the header if we ereport in this stretch.
@@ -293,6 +296,8 @@ GenerationReset(MemoryContext context)
 	set->block = NULL;
 
 	Assert(dlist_is_empty(&set->blocks));
+
+	VALGRIND_MEMPOOL_TRIM(set, set, sizeof(GenerationContext));
 }
 
 /*
@@ -304,6 +309,9 @@ GenerationDelete(MemoryContext context)
 {
 	/* Reset to release all the GenerationBlocks */
 	GenerationReset(context);
+
+	VALGRIND_DESTROY_MEMPOOL(context);
+
 	/* And free the context header */
 	free(context);
 }
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index 84472b9158e..27f969872bc 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -174,8 +174,6 @@ MemoryContextResetOnly(MemoryContext context)
 
 		context->methods->reset(context);
 		context->isReset = true;
-		VALGRIND_DESTROY_MEMPOOL(context);
-		VALGRIND_CREATE_MEMPOOL(context, 0, false);
 	}
 }
 
@@ -245,7 +243,6 @@ MemoryContextDelete(MemoryContext context)
 
 	context->methods->delete_context(context);
 
-	VALGRIND_DESTROY_MEMPOOL(context);
 }
 
 /*
@@ -782,8 +779,6 @@ MemoryContextCreate(MemoryContext node,
 		node->nextchild = NULL;
 		node->allowInCritSection = false;
 	}
-
-	VALGRIND_CREATE_MEMPOOL(node, 0, false);
 }
 
 /*
diff --git a/src/backend/utils/mmgr/slab.c b/src/backend/utils/mmgr/slab.c
index 9213be7c956..c51ede1e906 100644
--- a/src/backend/utils/mmgr/slab.c
+++ b/src/backend/utils/mmgr/slab.c
@@ -237,6 +237,9 @@ SlabContextCreate(MemoryContext parent,
 						   name)));
 	}
 
+	VALGRIND_CREATE_MEMPOOL(slab, 0, false);
+	VALGRIND_MEMPOOL_ALLOC(slab, slab, sizeof(SlabContext));
+
 	/*
 	 * Avoid writing code that can fail between here and MemoryContextCreate;
 	 * we'd leak the header if we ereport in this stretch.
@@ -315,6 +318,8 @@ SlabReset(MemoryContext context)
 
 	Assert(slab->nblocks == 0);
 	Assert(context->mem_allocated == 0);
+
+	VALGRIND_MEMPOOL_TRIM(slab, slab, sizeof(SlabContext));
 }
 
 /*
@@ -326,6 +331,9 @@ SlabDelete(MemoryContext context)
 {
 	/* Reset to release all the SlabBlocks */
 	SlabReset(context);
+
+	VALGRIND_DESTROY_MEMPOOL(context);
+
 	/* And free the context header */
 	free(context);
 }
-- 
2.29.2.540.g3cf59784d4

#12 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#11)
Re: Getting better results from valgrind leak tracking

Andres Freund <andres@anarazel.de> writes:

I found a way around that late last night. Need to mark the context
itself as an allocation. But I made a mess on the way to that and need
to clean the patch up before sending it (and need to drop my
girlfriend off first).

Unfortunately I didn't immediately find a way to do this while keeping
the MEMPOOL_CREATE/DESTROY in mcxt.c. The attached patch moves the pool
creation into the memory context implementations, "allocates" the
context itself as part of that pool, and changes the reset
implementation from MEMPOOL_DESTROY + MEMPOOL_CREATE to instead do
MEMPOOL_TRIM. That leaves the memory context itself valid (and thus
tracking ->ident etc), but marks all the other memory as freed.

Huh, interesting. I wonder why that makes the ident problem go away?
I'd supposed that valgrind would see the context headers as ordinary
memory belonging to the global "malloc" pool, so that any pointers
inside them ought to be considered valid.

Anyway, I don't have a problem with rearranging the responsibility
like this. It gives the individual allocators more freedom to do
odd stuff, at the cost of very minor duplication of valgrind calls.

I agree we need more comments -- would you like me to have a go at
writing them?

One thing I was stewing over last night is that a MemoryContextReset
will mess up any context identifier assigned with
MemoryContextCopyAndSetIdentifier. I'd left that as a problem to
fix later, because we don't currently have a need to reset contexts
that use copied identifiers. But that assumption obviously will bite
us someday, so maybe now is a good time to think about it.

The very simplest fix would be to allocate non-constant idents with
malloc; which'd require adding a flag to track whether context->ident
needs to be free()d. We have room for another bool near the top of
struct MemoryContextData (and at some point we could turn those
bool fields into a flags word). The only real cost here is one
more free() while destroying a labeled context, which is probably
negligible.

Other ideas are possible but they seem to require getting the individual
mcxt methods involved, and I doubt it's worth the complexity.
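
For concreteness, the simplest-fix idea above might look like this
sketch (flag name hypothetical):

	bool		identIsMalloced;	/* new MemoryContextData field */

	/* MemoryContextCopyAndSetIdentifier would do roughly: */
	context->ident = strdup(id);	/* survives MemoryContextReset */
	context->identIsMalloced = true;

	/* and destroying the context would add: */
	if (context->identIsMalloced)
		free(unconstify(char *, context->ident));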

There are a few leak warnings around guc.c that look like they might be
real, not false positives, and thus a bit concerning. Looks like several
guc check hooks don't bother to free the old *extra before allocating a
new one.

I'll take a look, but I'm pretty certain that guc.c, not the hooks, is
responsible for freeing those. Might be another case of valgrind not
understanding what's happening.

I suspect we might get better results from valgrind, not just for leaks
but also undefined value tracking, if we changed the way we represent
pools to utilize VALGRIND_MEMPOOL_METAPOOL |
VALGRIND_MEMPOOL_AUTO_FREE.

Don't really see why that'd help? I mean, it could conceivably help
catch bugs in the allocators themselves, but I don't follow the argument
that it'd improve anything else. Defined is defined, as far as I can
tell from the valgrind manual.

regards, tom lane

#13 Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#12)
Re: Getting better results from valgrind leak tracking

Hi,

On 2021-03-17 18:07:36 -0400, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

I found a way around that late last night. Need to mark the context
itself as an allocation. But I made a mess on the way to that and need
to clean the patch up before sending it (and need to drop my
girlfriend off first).

Unfortunately I didn't immediately find a way to do this while keeping
the MEMPOOL_CREATE/DESTROY in mcxt.c. The attached patch moves the pool
creation into the memory context implementations, "allocates" the
context itself as part of that pool, and changes the reset
implementation from MEMPOOL_DESTROY + MEMPOOL_CREATE to instead do
MEMPOOL_TRIM. That leaves the memory context itself valid (and thus
tracking ->ident etc), but marks all the other memory as freed.

Huh, interesting. I wonder why that makes the ident problem go away?
I'd supposed that valgrind would see the context headers as ordinary
memory belonging to the global "malloc" pool, so that any pointers
inside them ought to be considered valid.

I didn't quite understand it either at the time of writing the change. It
just seemed the only explanation for some of the behaviour I was seeing,
so I tried it. Only to initially be rebuffed by errors when accessing the
recycled sets...

I spent a bit of time looking at valgrind, and it looks to be explicit
behaviour:

memcheck/mc_leakcheck.c

static MC_Chunk**
find_active_chunks(Int* pn_chunks)
{
   // Our goal is to construct a set of chunks that includes every
   // mempool chunk, and every malloc region that *doesn't* contain a
   // mempool chunk.
   ...
   // Then we loop over the mempool tables. For each chunk in each
   // pool, we set the entry in the Bool array corresponding to the
   // malloc chunk containing the mempool chunk.
   VG_(HT_ResetIter)(MC_(mempool_list));
   while ( (mp = VG_(HT_Next)(MC_(mempool_list))) ) {
      VG_(HT_ResetIter)(mp->chunks);
      while ( (mc = VG_(HT_Next)(mp->chunks)) ) {

         // We'll need to record this chunk.
         n_chunks++;

         // Possibly invalidate the malloc holding the beginning of this chunk.
         m = find_chunk_for(mc->data, mallocs, n_mallocs);
         if (m != -1 && malloc_chunk_holds_a_pool_chunk[m] == False) {
            tl_assert(n_chunks > 0);
            n_chunks--;
            malloc_chunk_holds_a_pool_chunk[m] = True;
         }

I think that means it explicitly ignores the entire malloc'ed allocation
whenever there's *any* mempool chunk in it, instead considering only the
mempool chunks. So once aset allocates something in the init block, the
context itself is ignored for leak checking purposes. But that wouldn't
be the case if we didn't have the init block...

I guess that makes sense, but it would definitely have been nice to have
that documented...

Anyway, I don't have a problem with rearranging the responsibility
like this. It gives the individual allocators more freedom to do
odd stuff, at the cost of very minor duplication of valgrind calls.

Yea, similar.

I agree we need more comments -- would you like me to have a go at
writing them?

Gladly!

One thing I was stewing over last night is that a MemoryContextReset
will mess up any context identifier assigned with
MemoryContextCopyAndSetIdentifier.

Valgrind should catch such problems. Not having the danger would be
better, of course.

We could also add an assertion at reset time that the identifier isn't
in the current context.

The very simplest fix would be to allocate non-constant idents with
malloc; which'd require adding a flag to track whether context->ident
needs to be free()d. We have room for another bool near the top of
struct MemoryContextData (and at some point we could turn those
bool fields into a flags word). The only real cost here is one
more free() while destroying a labeled context, which is probably
negligible.

Hm. A separate malloc()/free() could conceivably actually show up in
profiles at some point.

What if we instead used that flag to indicate that MemoryContextReset()
needs to save the identifier? Then any cost would only be paid if the
context is actually reset.

There are a few leak warnings around guc.c that look like they might be
real, not false positives, and thus a bit concerning. Looks like several
guc check hooks don't bother to free the old *extra before allocating a
new one.

I'll take a look, but I'm pretty certain that guc.c, not the hooks, is
responsible for freeing those.

Yea, I had misattributed the leak to the callbacks. One of the things I
saw definitely is a leak: if call_string_check_hook() ereport(ERRORs)
the guc_strdup() of newval->stringval is lost.
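
For illustration, one way to close that hole would be a sketch along
these lines (simplified; the real error handling in guc.c is hairier):

	bool		result;

	PG_TRY();
	{
		result = call_string_check_hook(conf, &newval, &newextra,
										source, elevel);
	}
	PG_CATCH();
	{
		free(newval);			/* reclaim the strdup'd candidate value */
		PG_RE_THROW();
	}
	PG_END_TRY();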

There's another set of them that seems to be related to parallelism. But
I've not hunted it down yet.

I think it might be worth adding a VALGRIND_DO_CHANGED_LEAK_CHECK() at
the end of a transaction or such? That way it'd not be quite as hard to
pinpoint the source of a leak to individual statements as it is right
now.
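
Something of roughly this shape (hypothetical placement; the macro itself
is real, from valgrind/memcheck.h):

#ifdef USE_VALGRIND
#include <valgrind/memcheck.h>
#endif

static void
AtEOXact_ReportChangedLeaks(void)
{
#ifdef USE_VALGRIND
    /* Report only leak records that are new or changed since last check. */
    VALGRIND_DO_CHANGED_LEAK_CHECK;
#endif
}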

Might be another case of valgrind not understanding what's happening.

Those allocations seem very straightforward, plain malloc/free that is
referenced by a pointer to the start of the allocation. So that doesn't
seem very likely?

I suspect we might get better results from valgrind, not just for leaks
but also undefined value tracking, if we changed the way we represent
pools to utilize VALGRIND_MEMPOOL_METAPOOL |
VALGRIND_MEMPOOL_AUTO_FREE.

Don't really see why that'd help? I mean, it could conceivably help
catch bugs in the allocators themselves, but I don't follow the argument
that it'd improve anything else. Defined is defined, as far as I can
tell from the valgrind manual.

I think it might improve the attribution of some use-after-free errors
to the memory context. Looks like it also might reduce the cost of
running valgrind a bit.
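
For reference, the registration the manual describes looks roughly like
this; where it would hook into aset.c is the open question, so treat it
as a sketch:

#include <valgrind/memcheck.h>

void
register_context_as_metapool(void *context)
{
    /*
     * METAPOOL: the pool's backing blocks are themselves malloc-like
     * allocations with chunks carved out of them.  AUTO_FREE: destroying
     * the pool marks all its chunks as freed for leak tracking.
     */
    VALGRIND_CREATE_MEMPOOL_EXT(context, 0, 0,
                                VALGRIND_MEMPOOL_METAPOOL |
                                VALGRIND_MEMPOOL_AUTO_FREE);
}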

Greetings,

Andres Freund

#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#13)
Re: Getting better results from valgrind leak tracking

Andres Freund <andres@anarazel.de> writes:

On 2021-03-17 18:07:36 -0400, Tom Lane wrote:

Huh, interesting. I wonder why that makes the ident problem go away?

I spent a bit of time looking at valgrind, and it looks to be explicit
behaviour:

// Our goal is to construct a set of chunks that includes every
// mempool chunk, and every malloc region that *doesn't* contain a
// mempool chunk.

Ugh.

I guess that makes sense, but it would definitely have been nice to have
that documented...

Indeed. So this started happening to us when we merged the aset
header with its first allocation block.

There are a few leak warnings around guc.c that look like they might be
real, not false positives, and thus a bit concerning. Looks like several
guc check hooks don't bother to free the old *extra before allocating a
new one.

I'll take a look, but I'm pretty certain that guc.c, not the hooks, is
responsible for freeing those.

I believe I've just tracked down the cause of that. Those errors
are (as far as I've seen) only happening in parallel workers, and
the reason is this gem in RestoreGUCState:

	/* See comment at can_skip_gucvar(). */
	for (i = 0; i < num_guc_variables; i++)
		if (!can_skip_gucvar(guc_variables[i]))
			InitializeOneGUCOption(guc_variables[i]);

where InitializeOneGUCOption zeroes out that GUC variable, causing
it to lose track of any allocations it was responsible for.
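
The failure mode, reduced to its essence (a standalone illustration, not
the actual guc.c data structures):

#include <stdlib.h>
#include <string.h>

struct opt
{
    char       *value;          /* malloc'd current value */
};

int
main(void)
{
    struct opt  o;

    o.value = strdup("some setting");

    /* "Initialize" rather than "reset": the only pointer is wiped. */
    memset(&o, 0, sizeof(o));

    return 0;                   /* the strdup'd string is definitely lost */
}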

At minimum, somebody didn't understand the difference between "initialize"
and "reset". But I suspect we could just nuke this loop altogether.
I've not yet tried to grok the comment that purports to justify it,
but I fail to see why it'd ever be useful to drop values inherited
from the postmaster.

regards, tom lane

#15Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#14)
Re: Getting better results from valgrind leak tracking

Hi,

On 2021-03-17 21:30:48 -0400, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

On 2021-03-17 18:07:36 -0400, Tom Lane wrote:

Huh, interesting. I wonder why that makes the ident problem go away?

I spent a bit of time looking at valgrind, and it looks to be explicit
behaviour:

// Our goal is to construct a set of chunks that includes every
// mempool chunk, and every malloc region that *doesn't* contain a
// mempool chunk.

Ugh.

I guess that makes sense, but it would definitely have been nice to have
that documented...

Indeed. So this started happening to us when we merged the aset
header with its first allocation block.

Yea.

I have the feeling that valgrind's error detection capability in our
codebase used to be higher too. I wonder if that could be related
somehow. Or maybe it's just a figment of my imagination.

I believe I've just tracked down the cause of that. Those errors
are (as far as I've seen) only happening in parallel workers, and
the reason is this gem in RestoreGUCState:

	/* See comment at can_skip_gucvar(). */
	for (i = 0; i < num_guc_variables; i++)
		if (!can_skip_gucvar(guc_variables[i]))
			InitializeOneGUCOption(guc_variables[i]);

where InitializeOneGUCOption zeroes out that GUC variable, causing
it to lose track of any allocations it was responsible for.

Ouch. At least it's a short-lived issue rather than something permanently
leaking memory on every SIGHUP...

At minimum, somebody didn't understand the difference between "initialize"
and "reset". But I suspect we could just nuke this loop altogether.
I've not yet tried to grok the comment that purports to justify it,
but I fail to see why it'd ever be useful to drop values inherited
from the postmaster.

I can't really make sense of it either. I think it may be trying to
restore the GUC state to what it would have been in postmaster,
disregarding all the settings that were set as part of PostgresInit()
etc?

Greetings,

Andres Freund

#16Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#15)
Re: Getting better results from valgrind leak tracking

Andres Freund <andres@anarazel.de> writes:

On 2021-03-17 21:30:48 -0400, Tom Lane wrote:

I believe I've just tracked down the cause of that. Those errors
are (as far as I've seen) only happening in parallel workers, and
the reason is this gem in RestoreGUCState: ...

Ouch. At least it's a short-lived issue rather than something permanently
leaking memory on every SIGHUP...

Yeah. This leak is really insignificant in practice, although I'm
suspicious that randomly init'ing GUCs like this might have semantic
issues that we've not detected yet.

I've not yet tried to grok the comment that purports to justify it,
but I fail to see why it'd ever be useful to drop values inherited
from the postmaster.

I can't really make sense of it either. I think it may be trying to
restore the GUC state to what it would have been in postmaster,
disregarding all the settings that were set as part of PostgresInit()
etc?

At least in a non-EXEC_BACKEND build, the most reliable way to reproduce
the postmaster's settings is to do nothing whatsoever. And I think the
same is true for EXEC_BACKEND, really, because the guc.c subsystem is
responsible for restoring what would have been the inherited-via-fork
settings. So I'm really not sure what this is on about, and I'm too
tired to try to figure it out tonight.

In other news, I found that there's a genuine leak in
RelationBuildLocalRelation: RelationInitTableAccessMethod
was added in a spot where CurrentMemoryContext is CacheMemoryContext,
which is bad because it does a syscache lookup that can lead to
a table access which can leak some memory. Seems easy to fix though.

The plpgsql situation looks like a mess. As a short-term answer,
I'm inclined to recommend adding an exclusion that will ignore anything
allocated within plpgsql_compile(). Doing better will require a fair
amount of rewriting. (Although I suppose we could also consider adding
an on_proc_exit callback that runs through and frees all the function
cache entries.)

regards, tom lane

#17Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#7)
Re: Getting better results from valgrind leak tracking

Hi,

On 2021-03-17 00:01:55 -0400, Tom Lane wrote:

As for the particular point about ParallelBlockTableScanWorkerData,
I agree with your question to David about why that's in TableScanDesc
not HeapScanDesc, but I can't get excited about it not being freed in
heap_endscan. That's mainly because I do not believe that anything as
complex as a heap or indexscan should be counted on to be zero-leakage.
The right answer is to not do such operations in long-lived contexts.
So if we're running such a thing while switched into CacheContext,
*that* is the bug, not that heap_endscan didn't free this particular
allocation.

I agree that it's a bad idea to do scans in non-transient contexts. It
does however seem like there are a number of places that do...

I added the following hacky definition of "permanent" contexts

/*
 * NB: Only for assertion use.
 *
 * TopMemoryContext itself obviously is permanent. Treat CacheMemoryContext
 * and all its children as permanent too.
 *
 * XXX: Might be worth adding this as an explicit flag on the context?
 */
bool
MemoryContextIsPermanent(MemoryContext c)
{
	if (c == TopMemoryContext)
		return true;

	while (c)
	{
		if (c == CacheMemoryContext)
			return true;
		c = c->parent;
	}

	return false;
}

and checked that the CurrentMemoryContext is not permanent in
SearchCatCacheInternal() and systable_beginscan(). Hit a number of
times.
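
The check itself is then just a one-liner at the top of those functions,
e.g.:

    Assert(!MemoryContextIsPermanent(CurrentMemoryContext));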

The most glaring case is the RelationInitTableAccessMethod() call in
RelationBuildLocalRelation(). Seems like the best fix is to just move
the MemoryContextSwitchTo() to just before the
RelationInitTableAccessMethod(). Although I wonder if we shouldn't go
further, and move it to much earlier, somewhere after the rd_rel
allocation.

There are plenty of other hits, but I think I should get back to working on
making the shared memory stats patch committable. I really wouldn't want
it to slip yet another year.

But I think it might make sense to add a flag indicating contexts that
shouldn't be used for non-transient data. Seems like we fairly regularly
have "bugs" around this?

Greetings,

Andres Freund

#18Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#16)
Re: Getting better results from valgrind leak tracking

On 2021-03-17 22:33:59 -0400, Tom Lane wrote:

I've not yet tried to grok the comment that purports to justify it,
but I fail to see why it'd ever be useful to drop values inherited
from the postmaster.

I can't really make sense of it either. I think it may be trying to
restore the GUC state to what it would have been in postmaster,
disregarding all the settings that were set as part of PostgresInit()
etc?

At least in a non-EXEC_BACKEND build, the most reliable way to reproduce
the postmaster's settings is to do nothing whatsoever. And I think the
same is true for EXEC_BACKEND, really, because the guc.c subsystem is
responsible for restoring what would have been the inherited-via-fork
settings. So I'm really not sure what this is on about, and I'm too
tired to try to figure it out tonight.

The restore thing runs after we've already set and initialized GUCs,
including things like user/database default GUCs. I see things like

==2251779== 4,560 bytes in 1 blocks are definitely lost in loss record 383 of 406
==2251779== at 0x483877F: malloc (vg_replace_malloc.c:307)
==2251779== by 0x714A45: ConvertTimeZoneAbbrevs (datetime.c:4556)
==2251779== by 0x88DE95: load_tzoffsets (tzparser.c:465)
==2251779== by 0x88511D: check_timezone_abbreviations (guc.c:11565)
==2251779== by 0x884633: call_string_check_hook (guc.c:11232)
==2251779== by 0x87CB57: parse_and_validate_value (guc.c:7012)
==2251779== by 0x87DD5F: set_config_option (guc.c:7630)
==2251779== by 0x88397F: ProcessGUCArray (guc.c:10784)
==2251779== by 0x32BCCF: ApplySetting (pg_db_role_setting.c:256)
==2251779== by 0x874CA2: process_settings (postinit.c:1163)
==2251779== by 0x874A0B: InitPostgres (postinit.c:1048)
==2251779== by 0x60129A: BackgroundWorkerInitializeConnectionByOid (postmaster.c:5681)
==2251779== by 0x2BB283: ParallelWorkerMain (parallel.c:1374)
==2251779== by 0x5EBA6A: StartBackgroundWorker (bgworker.c:864)
==2251779== by 0x6014FE: do_start_bgworker (postmaster.c:5802)
==2251779== by 0x6018D2: maybe_start_bgworkers (postmaster.c:6027)
==2251779== by 0x600811: sigusr1_handler (postmaster.c:5190)
==2251779== by 0x4DD513F: ??? (in /usr/lib/x86_64-linux-gnu/libpthread-2.31.so)
==2251779== by 0x556E865: select (select.c:41)
==2251779== by 0x5FC0CB: ServerLoop (postmaster.c:1700)
==2251779== by 0x5FBA68: PostmasterMain (postmaster.c:1408)
==2251779== by 0x4F8BFD: main (main.c:209)

The BackgroundWorkerInitializeConnectionByOid() in that trace is before
the RestoreGUCState() later in ParallelWorkerMain(). But that doesn't
really explain the approach.

In other news, I found that there's a genuine leak in
RelationBuildLocalRelation: RelationInitTableAccessMethod
was added in a spot where CurrentMemoryContext is CacheMemoryContext,
which is bad because it does a syscache lookup that can lead to
a table access which can leak some memory. Seems easy to fix though.

Yea, that's the one I was talking about re
ParallelBlockTableScanWorkerData not being freed. While I think we
should also add that pfree, I think you're right, and we shouldn't do
RelationInitTableAccessMethod() while in CacheMemoryContext.

The plpgsql situation looks like a mess. As a short-term answer,
I'm inclined to recommend adding an exclusion that will ignore anything
allocated within plpgsql_compile(). Doing better will require a fair
amount of rewriting. (Although I suppose we could also consider adding
an on_proc_exit callback that runs through and frees all the function
cache entries.)

The error variant of this one seems like it might actually be a
practically relevant leak? As well as increasing memory usage for
compiled plpgsql functions unnecessarily, of course. The latter would be
good to fix, but the former seems like it might be a practical issue for
poolers and the like?

So I think we should do at least the reparenting thing to address that?

Greetings,

Andres Freund

#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#17)
Re: Getting better results from valgrind leak tracking

Andres Freund <andres@anarazel.de> writes:

The most glaring case is the RelationInitTableAccessMethod() call in
RelationBuildLocalRelation(). Seems like the best fix is to just move
the MemoryContextSwitchTo() to just before the
RelationInitTableAccessMethod(). Although I wonder if we shouldn't go
further, and move it to much earlier, somewhere after the rd_rel
allocation.

Yeah, same thing I did locally. Not sure if it's worth working harder.

There are plenty of other hits, but I think I should get back to working on
making the shared memory stats patch committable. I really wouldn't want
it to slip yet another year.

+1, so far there's little indication that we're finding any serious leaks
here. Might be best to set it all aside till there's more time.

regards, tom lane

#20Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#19)
1 attachment(s)
Re: Getting better results from valgrind leak tracking

I wrote:

Andres Freund <andres@anarazel.de> writes:

There are plenty of other hits, but I think I should get back to working on
making the shared memory stats patch committable. I really wouldn't want
it to slip yet another year.

+1, so far there's little indication that we're finding any serious leaks
here. Might be best to set it all aside till there's more time.

Well, I failed to resist the temptation to keep poking at this issue,
mainly because I felt that it'd be a good idea to make sure we'd gotten
our arms around all of the detectable issues, even if we choose not to
fix them right away. The attached patch, combined with your preceding
memory context patch, eliminates all but a very small number of the leak
complaints in the core regression tests.

The remaining leak warnings that I see are:

1. WaitForReplicationWorkerAttach leaks the BackgroundWorkerHandle it's
passed. I'm not sure which function should clean that up, but in any
case it's only 16 bytes one time in one process, so it's hardly exciting.

2. There's lots of leakage in text search dictionary initialization
functions. This is hard to get around completely, because the API for
those functions has them being called in the dictionary's long-lived
context. In any case, the leaks are not large (modulo comments below),
and they would get cleaned up if the dictionary cache were thrown away.

3. partition_prune.sql repeatably produces this warning:

==00:00:44:39.764 3813570== 32,768 bytes in 1 blocks are possibly lost in loss record 2,084 of 2,096
==00:00:44:39.764 3813570== at 0x4C30F0B: malloc (vg_replace_malloc.c:307)
==00:00:44:39.764 3813570== by 0x94315A: AllocSetAlloc (aset.c:941)
==00:00:44:39.764 3813570== by 0x94B52F: MemoryContextAlloc (mcxt.c:804)
==00:00:44:39.764 3813570== by 0x8023DA: LockAcquireExtended (lock.c:845)
==00:00:44:39.764 3813570== by 0x7FFC7E: LockRelationOid (lmgr.c:116)
==00:00:44:39.764 3813570== by 0x5CB89F: findDependentObjects (dependency.c:962)
==00:00:44:39.764 3813570== by 0x5CBA66: findDependentObjects (dependency.c:1060)
==00:00:44:39.764 3813570== by 0x5CBA66: findDependentObjects (dependency.c:1060)
==00:00:44:39.764 3813570== by 0x5CCB72: performMultipleDeletions (dependency.c:409)
==00:00:44:39.764 3813570== by 0x66F574: RemoveRelations (tablecmds.c:1395)
==00:00:44:39.764 3813570== by 0x81C986: ProcessUtilitySlow.isra.8 (utility.c:1698)
==00:00:44:39.764 3813570== by 0x81BCF2: standard_ProcessUtility (utility.c:1034)

which evidently is warning that some LockMethodLocalHash entry is losing
track of its lockOwners array. But I sure don't see how that could
happen, nor why it'd only happen in this test. Could this be a false
report?

As you can see from the patch's additions to src/tools/valgrind.supp,
I punted on the issues around pl/pgsql's function-compile-time leaks.
We know that's an issue, but again the leaks are fairly small and
they are confined within function cache entries, so I'm not sure
how hard we should work on that.

(An idea for silencing both this and the dictionary leak warnings is
to install an on-proc-exit callback to drop the cached objects'
contexts.)
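
A sketch of what that could look like for the dictionary case, assuming
access to ts_cache.c's static hash table (field names as in ts_cache.c
today); it would be registered once via
on_proc_exit(drop_ts_dictionary_contexts, (Datum) 0):

static void
drop_ts_dictionary_contexts(int code, Datum arg)
{
    HASH_SEQ_STATUS status;
    TSDictionaryCacheEntry *entry;

    if (TSDictionaryCacheHash == NULL)
        return;

    /* Delete each dictionary's private context so valgrind sees it freed. */
    hash_seq_init(&status, TSDictionaryCacheHash);
    while ((entry = (TSDictionaryCacheEntry *) hash_seq_search(&status)) != NULL)
    {
        if (entry->isvalid && entry->dictCtx != NULL)
            MemoryContextDelete(entry->dictCtx);
    }
}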

Anyway, looking through the patch, I found several non-cosmetic issues:

* You were right to suspect that there was residual leakage in guc.c's
handling of error cases. ProcessGUCArray and call_string_check_hook
are both guilty of leaking malloc'd strings if an error in a proposed
GUC setting is detected.

* I still haven't tried to wrap my brain around the question of what
RestoreGUCState really needs to be doing. I have, however, found that
check-world passes just fine with the InitializeOneGUCOption calls
diked out entirely, so that's what's in this patch.

* libpqwalreceiver.c leaks a malloc'd error string when
libpqrcv_check_conninfo detects an erroneous conninfo string.

* spell.c leaks a compiled regular expression if an ispell dictionary
cache entry is dropped. I fixed this by adding a memory context reset
callback to release the regex. This is potentially a rather large
leak, if the regex is complex (though typically it wouldn't be).

* BuildEventTriggerCache leaks working storage into
EventTriggerCacheContext.

* Likewise, load_domaintype_info leaks working storage into
a long-lived cache context.

* RelationDestroyRelation leaks rd_statlist; the patch that added
that failed to emulate the rd_indexlist prototype very well.

* As previously noted, RelationInitTableAccessMethod causes leaks.

* I made some effort to remove easily-removable leakage in
lookup_ts_dictionary_cache itself, although given the leaks in
its callees, this isn't terribly exciting.

I am suspicious that there's something still not quite right in the
memory context changes, because I noticed that the sanity_check.sql
test (and no other ones) complained about row_description_context being
leaked. I realized that the warning was because my compiler optimizes
away the row_description_context static variable altogether, leaving no
pointer to the context behind. I hacked around that in this patch by
marking that variable volatile. However, that explanation just begs
the question of why every run didn't show the same warning. I suppose
that valgrind was considering the context to be referenced by some
child or sibling context pointer, but if that's the explanation then
we should never have seen the warning. I'm forced to the conclusion
that valgrind is counting some but not all child/sibling context
pointers, which sure seems like a bug. Maybe we need the two-level-
mempool mechanism after all to get that to work better.

Anyway, I think we need to commit and even back-patch the portion
of the attached changes that are in
* libpqwalreceiver.c
* spell.h / spell.c
* relcache.c
* guc.c (except the quick hack in RestoreGUCState)
Those are all genuine leaks that are in places where they could be
repeated and thus potentially add up to something significant.

There's a weaker case for applying the changes in evtcache.c,
ts_cache.c, and typcache.c. Those are all basically leaving some cruft
behind in a long-lived cache entry. But the cruft isn't huge and it
would be recovered at cache flush, so I'm not convinced it amounts to a
real-world problem.

The rest of this is either working around valgrind shortcomings or
jumping through a hoop to convince it that some data structure that's
still around at exit is still referenced. Maybe we should commit some
of it under "#ifdef USE_VALGRIND" tests. I really want to find some
other answer than moving the dlist_node fields, though.

Comments?

regards, tom lane

Attachments:

all-valgrind-hacks.patch (text/x-diff)
diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index 4c7b1e7bfd..cd984929a6 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -888,7 +888,6 @@ RemoveSocketFiles(void)
 		(void) unlink(sock_path);
 	}
 	/* Since we're about to exit, no need to reclaim storage */
-	sock_paths = NIL;
 }
 
 
diff --git a/src/backend/libpq/pqmq.c b/src/backend/libpq/pqmq.c
index d1a1f47a78..3cadfecfb0 100644
--- a/src/backend/libpq/pqmq.c
+++ b/src/backend/libpq/pqmq.c
@@ -22,6 +22,7 @@
 #include "utils/builtins.h"
 
 static shm_mq_handle *pq_mq_handle;
+static shm_mq_handle *volatile dummy_mq_handle = NULL;
 static bool pq_mq_busy = false;
 static pid_t pq_mq_parallel_leader_pid = 0;
 static pid_t pq_mq_parallel_leader_backend_id = InvalidBackendId;
@@ -64,6 +65,9 @@ pq_redirect_to_shm_mq(dsm_segment *seg, shm_mq_handle *mqh)
 static void
 pq_cleanup_redirect_to_shm_mq(dsm_segment *seg, Datum arg)
 {
+	/* fake out valgrind */
+	if (pq_mq_handle)
+		dummy_mq_handle = pq_mq_handle;
 	pq_mq_handle = NULL;
 	whereToSendOutput = DestNone;
 }
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 23ef23c13e..c4e2e1d228 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -850,6 +850,15 @@ AutoVacLauncherShutdown(void)
 			(errmsg_internal("autovacuum launcher shutting down")));
 	AutoVacuumShmem->av_launcherpid = 0;
 
+	/*
+	 * If valgrind'ing, free all the cruft in AutovacMemCxt, to avoid leak
+	 * complaints.
+	 */
+#ifdef USE_VALGRIND
+	MemoryContextSwitchTo(TopMemoryContext);
+	MemoryContextDelete(AutovacMemCxt);
+#endif
+
 	proc_exit(0);				/* done */
 }
 
@@ -2045,11 +2054,12 @@ do_autovacuum(void)
 	/* create hash table for toast <-> main relid mapping */
 	ctl.keysize = sizeof(Oid);
 	ctl.entrysize = sizeof(av_relation);
+	ctl.hcxt = AutovacMemCxt;
 
 	table_toast_map = hash_create("TOAST to main relid map",
 								  100,
 								  &ctl,
-								  HASH_ELEM | HASH_BLOBS);
+								  HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
 
 	/*
 	 * Scan pg_class to determine which tables to vacuum.
@@ -2596,11 +2606,6 @@ deleted:
 	}
 	LWLockRelease(AutovacuumLock);
 
-	/*
-	 * We leak table_toast_map here (among other things), but since we're
-	 * going away soon, it's not a problem.
-	 */
-
 	/*
 	 * Update pg_database.datfrozenxid, and truncate pg_xact if possible. We
 	 * only need to do this once, not after each table.
@@ -2626,6 +2631,14 @@ deleted:
 
 	/* Finally close out the last transaction. */
 	CommitTransactionCommand();
+
+	/*
+	 * If valgrind'ing, free all the cruft in AutovacMemCxt, to avoid leak
+	 * complaints.
+	 */
+#ifdef USE_VALGRIND
+	MemoryContextDelete(AutovacMemCxt);
+#endif
 }
 
 /*
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index e8af05c04e..2814ee7609 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -176,13 +176,13 @@
  */
 typedef struct bkend
 {
+	dlist_node	elem;			/* list link in BackendList */
 	pid_t		pid;			/* process id of backend */
 	int32		cancel_key;		/* cancel key for cancels for this backend */
 	int			child_slot;		/* PMChildSlot for this backend, if any */
 	int			bkend_type;		/* child process flavor, see above */
 	bool		dead_end;		/* is it going to send an error and quit? */
 	bool		bgworker_notify;	/* gets bgworker start/stop notifications */
-	dlist_node	elem;			/* list link in BackendList */
 } Backend;
 
 static dlist_head BackendList = DLIST_STATIC_INIT(BackendList);
@@ -2011,6 +2011,7 @@ ProcessStartupPacket(Port *port, bool ssl_done, bool gss_done)
 	if (proto == CANCEL_REQUEST_CODE)
 	{
 		processCancelRequest(port, buf);
+		pfree(buf);
 		/* Not really an error, but we don't want to proceed further */
 		return STATUS_ERROR;
 	}
diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index f74378110a..021c1b36f3 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -246,9 +246,15 @@ libpqrcv_check_conninfo(const char *conninfo)
 
 	opts = PQconninfoParse(conninfo, &err);
 	if (opts == NULL)
+	{
+		/* The error string is malloc'd, so we must free it explicitly */
+		char	   *errcopy = err ? pstrdup(err) : "out of memory";
+
+		PQfreemem(err);
 		ereport(ERROR,
 				(errcode(ERRCODE_SYNTAX_ERROR),
-				 errmsg("invalid connection string syntax: %s", err)));
+				 errmsg("invalid connection string syntax: %s", errcopy)));
+	}
 
 	PQconninfoFree(opts);
 }
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 2b1b68109f..0ceeb8f911 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -176,7 +176,7 @@ static bool RecoveryConflictRetryable = true;
 static ProcSignalReason RecoveryConflictReason;
 
 /* reused buffer to pass to SendRowDescriptionMessage() */
-static MemoryContext row_description_context = NULL;
+static volatile MemoryContext row_description_context = NULL;
 static StringInfoData row_description_buf;
 
 /* ----------------------------------------------------------------
diff --git a/src/backend/tsearch/spell.c b/src/backend/tsearch/spell.c
index 9b9a9afaa8..0b9d8353a9 100644
--- a/src/backend/tsearch/spell.c
+++ b/src/backend/tsearch/spell.c
@@ -654,6 +654,17 @@ FindWord(IspellDict *Conf, const char *word, const char *affixflag, int flag)
 	return 0;
 }
 
+/*
+ * Context reset/delete callback for a regular-expression AFFIX
+ */
+static void
+regex_affix_deletion_callback(void *arg)
+{
+	aff_regex_struct *affregex = (aff_regex_struct *) arg;
+
+	pg_regfree(&(affregex->regex));
+}
+
 /*
  * Adds a new affix rule to the Affix field.
  *
@@ -716,6 +727,7 @@ NIAddAffix(IspellDict *Conf, const char *flag, char flagflags, const char *mask,
 		int			err;
 		pg_wchar   *wmask;
 		char	   *tmask;
+		aff_regex_struct *pregex;
 
 		Affix->issimple = 0;
 		Affix->isregis = 0;
@@ -729,18 +741,32 @@ NIAddAffix(IspellDict *Conf, const char *flag, char flagflags, const char *mask,
 		wmask = (pg_wchar *) tmpalloc((masklen + 1) * sizeof(pg_wchar));
 		wmasklen = pg_mb2wchar_with_len(tmask, wmask, masklen);
 
-		err = pg_regcomp(&(Affix->reg.regex), wmask, wmasklen,
+		/*
+		 * The regex engine stores its stuff using malloc not palloc, so we
+		 * must arrange to explicitly clean up the regex when the dictionary's
+		 * context is cleared.  That means the regex_t has to stay in a fixed
+		 * location within the context; we can't keep it directly in the AFFIX
+		 * struct, since we may sort and resize the array of AFFIXes.
+		 */
+		Affix->reg.pregex = pregex = palloc(sizeof(aff_regex_struct));
+
+		err = pg_regcomp(&(pregex->regex), wmask, wmasklen,
 						 REG_ADVANCED | REG_NOSUB,
 						 DEFAULT_COLLATION_OID);
 		if (err)
 		{
 			char		errstr[100];
 
-			pg_regerror(err, &(Affix->reg.regex), errstr, sizeof(errstr));
+			pg_regerror(err, &(pregex->regex), errstr, sizeof(errstr));
 			ereport(ERROR,
 					(errcode(ERRCODE_INVALID_REGULAR_EXPRESSION),
 					 errmsg("invalid regular expression: %s", errstr)));
 		}
+
+		pregex->mcallback.func = regex_affix_deletion_callback;
+		pregex->mcallback.arg = (void *) pregex;
+		MemoryContextRegisterResetCallback(CurrentMemoryContext,
+										   &pregex->mcallback);
 	}
 
 	Affix->flagflags = flagflags;
@@ -2133,7 +2159,7 @@ CheckAffix(const char *word, size_t len, AFFIX *Affix, int flagflags, char *neww
 		data = (pg_wchar *) palloc((newword_len + 1) * sizeof(pg_wchar));
 		data_len = pg_mb2wchar_with_len(newword, data, newword_len);
 
-		if (pg_regexec(&(Affix->reg.regex), data, data_len,
+		if (pg_regexec(&(Affix->reg.pregex->regex), data, data_len,
 					   0, NULL, 0, NULL, 0) == REG_OKAY)
 		{
 			pfree(data);
diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index 55c9445898..2abdd44190 100644
--- a/src/backend/utils/cache/catcache.c
+++ b/src/backend/utils/cache/catcache.c
@@ -814,7 +814,7 @@ InitCatCache(int id,
 	 * Note: we rely on zeroing to initialize all the dlist headers correctly
 	 */
 	sz = sizeof(CatCache) + PG_CACHE_LINE_SIZE;
-	cp = (CatCache *) CACHELINEALIGN(palloc0(sz));
+	cp = (CatCache *) palloc0(sz);
 	cp->cc_bucket = palloc0(nbuckets * sizeof(dlist_head));
 
 	/*
diff --git a/src/backend/utils/cache/evtcache.c b/src/backend/utils/cache/evtcache.c
index 460b720a65..cf2c26a60f 100644
--- a/src/backend/utils/cache/evtcache.c
+++ b/src/backend/utils/cache/evtcache.c
@@ -111,9 +111,6 @@ BuildEventTriggerCache(void)
 									  (Datum) 0);
 	}
 
-	/* Switch to correct memory context. */
-	oldcontext = MemoryContextSwitchTo(EventTriggerCacheContext);
-
 	/* Prevent the memory context from being nuked while we're rebuilding. */
 	EventTriggerCacheState = ETCS_REBUILD_STARTED;
 
@@ -170,6 +167,9 @@ BuildEventTriggerCache(void)
 		else
 			continue;
 
+		/* Switch to correct memory context. */
+		oldcontext = MemoryContextSwitchTo(EventTriggerCacheContext);
+
 		/* Allocate new cache item. */
 		item = palloc0(sizeof(EventTriggerCacheItem));
 		item->fnoid = form->evtfoid;
@@ -187,6 +187,9 @@ BuildEventTriggerCache(void)
 			entry->triggerlist = lappend(entry->triggerlist, item);
 		else
 			entry->triggerlist = list_make1(item);
+
+		/* Restore previous memory context. */
+		MemoryContextSwitchTo(oldcontext);
 	}
 
 	/* Done with pg_event_trigger scan. */
@@ -194,9 +197,6 @@ BuildEventTriggerCache(void)
 	index_close(irel, AccessShareLock);
 	relation_close(rel, AccessShareLock);
 
-	/* Restore previous memory context. */
-	MemoryContextSwitchTo(oldcontext);
-
 	/* Install new cache. */
 	EventTriggerCache = cache;
 
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 7ef510cd01..d90b2812af 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -2392,6 +2392,7 @@ RelationDestroyRelation(Relation relation, bool remember_tupdesc)
 	FreeTriggerDesc(relation->trigdesc);
 	list_free_deep(relation->rd_fkeylist);
 	list_free(relation->rd_indexlist);
+	list_free(relation->rd_statlist);
 	bms_free(relation->rd_indexattr);
 	bms_free(relation->rd_keyattr);
 	bms_free(relation->rd_pkattr);
@@ -3542,6 +3543,13 @@ RelationBuildLocalRelation(const char *relname,
 
 	rel->rd_rel->relam = accessmtd;
 
+	/*
+	 * RelationInitTableAccessMethod will do syscache lookups, so we
+	 * mustn't run it in CacheMemoryContext.  Fortunately, the remaining
+	 * steps don't require a long-lived current context.
+	 */
+	MemoryContextSwitchTo(oldcxt);
+
 	if (relkind == RELKIND_RELATION ||
 		relkind == RELKIND_SEQUENCE ||
 		relkind == RELKIND_TOASTVALUE ||
@@ -3565,11 +3573,6 @@ RelationBuildLocalRelation(const char *relname,
 	 */
 	EOXactListAdd(rel);
 
-	/*
-	 * done building relcache entry.
-	 */
-	MemoryContextSwitchTo(oldcxt);
-
 	/* It's fully valid */
 	rel->rd_isvalid = true;
 
diff --git a/src/backend/utils/cache/ts_cache.c b/src/backend/utils/cache/ts_cache.c
index 384107b6ba..a983478f69 100644
--- a/src/backend/utils/cache/ts_cache.c
+++ b/src/backend/utils/cache/ts_cache.c
@@ -314,11 +314,18 @@ lookup_ts_dictionary_cache(Oid dictId)
 
 		if (OidIsValid(template->tmplinit))
 		{
+			FmgrInfo	initflinfo;
 			List	   *dictoptions;
 			Datum		opt;
 			bool		isnull;
 			MemoryContext oldcontext;
 
+			/*
+			 * Lookup the init method while using caller's context, else we
+			 * might leak cruft in the private memory context.
+			 */
+			fmgr_info(template->tmplinit, &initflinfo);
+
 			/*
 			 * Init method runs in dictionary's private memory context, and we
 			 * make sure the options are stored there too
@@ -333,9 +340,17 @@ lookup_ts_dictionary_cache(Oid dictId)
 			else
 				dictoptions = deserialize_deflist(opt);
 
+			/*
+			 * We don't currently have any real use for the dictoptions list
+			 * after calling the init method, but the dictionary might
+			 * continue to reference parts of it, so we can't free it.  Keep a
+			 * pointer to it so valgrind doesn't complain that it's leaked.
+			 */
+			entry->dictOptions = dictoptions;
+
 			entry->dictData =
-				DatumGetPointer(OidFunctionCall1(template->tmplinit,
-												 PointerGetDatum(dictoptions)));
+				DatumGetPointer(FunctionCall1(&initflinfo,
+											  PointerGetDatum(dictoptions)));
 
 			MemoryContextSwitchTo(oldcontext);
 		}
diff --git a/src/backend/utils/cache/typcache.c b/src/backend/utils/cache/typcache.c
index 4915ef5934..570f2986a7 100644
--- a/src/backend/utils/cache/typcache.c
+++ b/src/backend/utils/cache/typcache.c
@@ -1091,9 +1091,6 @@ load_domaintype_info(TypeCacheEntry *typentry)
 				dcc->dccRefCount = 0;
 			}
 
-			/* Create node trees in DomainConstraintCache's context */
-			oldcxt = MemoryContextSwitchTo(dcc->dccContext);
-
 			check_expr = (Expr *) stringToNode(constring);
 
 			/*
@@ -1108,10 +1105,13 @@ load_domaintype_info(TypeCacheEntry *typentry)
 			 */
 			check_expr = expression_planner(check_expr);
 
+			/* Create node trees in DomainConstraintCache's context */
+			oldcxt = MemoryContextSwitchTo(dcc->dccContext);
+
 			r = makeNode(DomainConstraintState);
 			r->constrainttype = DOM_CONSTRAINT_CHECK;
 			r->name = pstrdup(NameStr(c->conname));
-			r->check_expr = check_expr;
+			r->check_expr = copyObject(check_expr);
 			r->check_exprstate = NULL;
 
 			MemoryContextSwitchTo(oldcxt);
diff --git a/src/backend/utils/hash/dynahash.c b/src/backend/utils/hash/dynahash.c
index 6546e3c7c7..df39bc77df 100644
--- a/src/backend/utils/hash/dynahash.c
+++ b/src/backend/utils/hash/dynahash.c
@@ -1713,6 +1713,10 @@ element_alloc(HTAB *hashp, int nelem, int freelist_idx)
 	if (hashp->isfixed)
 		return false;
 
+	/* Force separate allocations to de-confuse valgrind */
+	if (!hashp->isshared)
+		nelem = 1;
+
 	/* Each element has a HASHELEMENT header plus user data. */
 	elementSize = MAXALIGN(sizeof(HASHELEMENT)) + MAXALIGN(hctl->entrysize);
 
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index 8b73850d0d..be37e8b312 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -944,7 +944,6 @@ UnlinkLockFiles(int status, Datum arg)
 		/* Should we complain if the unlink fails? */
 	}
 	/* Since we're about to exit, no need to reclaim storage */
-	lock_files = NIL;
 
 	/*
 	 * Lock file removal should always be the last externally visible action
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 855076b1fd..c2e662a7b6 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -10639,10 +10639,12 @@ RestoreGUCState(void *gucstate)
 	int			i;
 	ErrorContextCallback error_context_callback;
 
+#if 0
 	/* See comment at can_skip_gucvar(). */
 	for (i = 0; i < num_guc_variables; i++)
 		if (!can_skip_gucvar(guc_variables[i]))
 			InitializeOneGUCOption(guc_variables[i]);
+#endif
 
 	/* First item is the length of the subsequent data */
 	memcpy(&len, gucstate, sizeof(len));
@@ -10754,6 +10756,8 @@ ProcessGUCArray(ArrayType *array,
 		char	   *s;
 		char	   *name;
 		char	   *value;
+		char	   *namecopy;
+		char	   *valuecopy;
 
 		d = array_ref(array, 1, &i,
 					  -1 /* varlenarray */ ,
@@ -10778,13 +10782,18 @@ ProcessGUCArray(ArrayType *array,
 			continue;
 		}
 
-		(void) set_config_option(name, value,
+		/* free strings immediately to avoid permanent leak if error */
+		namecopy = pstrdup(name);
+		free(name);
+		valuecopy = pstrdup(value);
+		free(value);
+
+		(void) set_config_option(namecopy, valuecopy,
 								 context, source,
 								 action, true, 0, false);
 
-		free(name);
-		if (value)
-			free(value);
+		pfree(namecopy);
+		pfree(valuecopy);
 		pfree(s);
 	}
 }
@@ -11216,34 +11225,50 @@ static bool
 call_string_check_hook(struct config_string *conf, char **newval, void **extra,
 					   GucSource source, int elevel)
 {
+	volatile bool result = true;
+
 	/* Quick success if no hook */
 	if (!conf->check_hook)
 		return true;
 
-	/* Reset variables that might be set by hook */
-	GUC_check_errcode_value = ERRCODE_INVALID_PARAMETER_VALUE;
-	GUC_check_errmsg_string = NULL;
-	GUC_check_errdetail_string = NULL;
-	GUC_check_errhint_string = NULL;
+	/*
+	 * If elevel is ERROR, or if the check_hook itself throws an elog
+	 * (undesirable, but not always avoidable), make sure we don't leak the
+	 * already-malloc'd newval string.
+	 */
+	PG_TRY();
+	{
+		/* Reset variables that might be set by hook */
+		GUC_check_errcode_value = ERRCODE_INVALID_PARAMETER_VALUE;
+		GUC_check_errmsg_string = NULL;
+		GUC_check_errdetail_string = NULL;
+		GUC_check_errhint_string = NULL;
 
-	if (!conf->check_hook(newval, extra, source))
+		if (!conf->check_hook(newval, extra, source))
+		{
+			ereport(elevel,
+					(errcode(GUC_check_errcode_value),
+					 GUC_check_errmsg_string ?
+					 errmsg_internal("%s", GUC_check_errmsg_string) :
+					 errmsg("invalid value for parameter \"%s\": \"%s\"",
+							conf->gen.name, *newval ? *newval : ""),
+					 GUC_check_errdetail_string ?
+					 errdetail_internal("%s", GUC_check_errdetail_string) : 0,
+					 GUC_check_errhint_string ?
+					 errhint("%s", GUC_check_errhint_string) : 0));
+			/* Flush any strings created in ErrorContext */
+			FlushErrorState();
+			result = false;
+		}
+	}
+	PG_CATCH();
 	{
-		ereport(elevel,
-				(errcode(GUC_check_errcode_value),
-				 GUC_check_errmsg_string ?
-				 errmsg_internal("%s", GUC_check_errmsg_string) :
-				 errmsg("invalid value for parameter \"%s\": \"%s\"",
-						conf->gen.name, *newval ? *newval : ""),
-				 GUC_check_errdetail_string ?
-				 errdetail_internal("%s", GUC_check_errdetail_string) : 0,
-				 GUC_check_errhint_string ?
-				 errhint("%s", GUC_check_errhint_string) : 0));
-		/* Flush any strings created in ErrorContext */
-		FlushErrorState();
-		return false;
+		free(*newval);
+		PG_RE_THROW();
 	}
+	PG_END_TRY();
 
-	return true;
+	return result;
 }
 
 static bool
diff --git a/src/backend/utils/misc/ps_status.c b/src/backend/utils/misc/ps_status.c
index 5819faaf2d..2d0ef37f7f 100644
--- a/src/backend/utils/misc/ps_status.c
+++ b/src/backend/utils/misc/ps_status.c
@@ -113,6 +113,11 @@ static size_t ps_buffer_fixed_size; /* size of the constant prefix */
 static int	save_argc;
 static char **save_argv;
 
+/* We use this to convince Valgrind that replacement environ is referenced */
+#ifdef PS_USE_CLOBBER_ARGV
+static char ** volatile fake_environ;
+#endif
+
 
 /*
  * Call this early in startup to save the original argc/argv values.
@@ -192,6 +197,8 @@ save_ps_display_args(int argc, char **argv)
 		}
 		new_environ[i] = NULL;
 		environ = new_environ;
+		/* Valgrind tends to think this memory is leaked, so fool it */
+		fake_environ = new_environ;
 	}
 #endif							/* PS_USE_CLOBBER_ARGV */
 
diff --git a/src/include/postmaster/bgworker_internals.h b/src/include/postmaster/bgworker_internals.h
index fc7706314b..f32a13f786 100644
--- a/src/include/postmaster/bgworker_internals.h
+++ b/src/include/postmaster/bgworker_internals.h
@@ -32,6 +32,7 @@
  */
 typedef struct RegisteredBgWorker
 {
+	slist_node	rw_lnode;		/* list link (first to placate valgrind) */
 	BackgroundWorker rw_worker; /* its registry entry */
 	struct bkend *rw_backend;	/* its BackendList entry, or NULL */
 	pid_t		rw_pid;			/* 0 if not running */
@@ -39,7 +40,6 @@ typedef struct RegisteredBgWorker
 	TimestampTz rw_crashed_at;	/* if not 0, time it last crashed */
 	int			rw_shmem_slot;
 	bool		rw_terminate;
-	slist_node	rw_lnode;		/* list link */
 } RegisteredBgWorker;
 
 extern slist_head BackgroundWorkerList;
diff --git a/src/include/tsearch/dicts/spell.h b/src/include/tsearch/dicts/spell.h
index 9847e30208..e03ed42372 100644
--- a/src/include/tsearch/dicts/spell.h
+++ b/src/include/tsearch/dicts/spell.h
@@ -81,6 +81,17 @@ typedef struct spell_struct
 
 #define SPELLHDRSZ	(offsetof(SPELL, word))
 
+/*
+ * If an affix uses a regex, we have to store that separately in a struct
+ * that won't move around when arrays of affixes are enlarged or sorted.
+ * This is so that it can be found to be cleaned up at context destruction.
+ */
+typedef struct aff_regex_struct
+{
+	regex_t		regex;
+	MemoryContextCallback mcallback;
+} aff_regex_struct;
+
 /*
  * Represents an entry in an affix list.
  */
@@ -97,7 +108,7 @@ typedef struct aff_struct
 	char	   *repl;
 	union
 	{
-		regex_t		regex;
+		aff_regex_struct *pregex;
 		Regis		regis;
 	}			reg;
 } AFFIX;
diff --git a/src/include/tsearch/ts_cache.h b/src/include/tsearch/ts_cache.h
index 888f7028b1..c5617bab5b 100644
--- a/src/include/tsearch/ts_cache.h
+++ b/src/include/tsearch/ts_cache.h
@@ -59,6 +59,7 @@ typedef struct TSDictionaryCacheEntry
 	FmgrInfo	lexize;
 
 	MemoryContext dictCtx;		/* memory context to store private data */
+	List	   *dictOptions;
 	void	   *dictData;
 } TSDictionaryCacheEntry;
 
diff --git a/src/include/utils/catcache.h b/src/include/utils/catcache.h
index ddc2762eb3..a304981467 100644
--- a/src/include/utils/catcache.h
+++ b/src/include/utils/catcache.h
@@ -85,6 +85,13 @@ typedef struct catcache
 
 typedef struct catctup
 {
+	/*
+	 * Each tuple in a cache is a member of a dlist that stores the elements
+	 * of its hash bucket.  We keep each dlist in LRU order to speed repeated
+	 * lookups.
+	 */
+	dlist_node	cache_elem;		/* list member of per-bucket list */
+
 	int			ct_magic;		/* for identifying CatCTup entries */
 #define CT_MAGIC   0x57261502
 
@@ -96,13 +103,6 @@ typedef struct catctup
 	 */
 	Datum		keys[CATCACHE_MAXKEYS];
 
-	/*
-	 * Each tuple in a cache is a member of a dlist that stores the elements
-	 * of its hash bucket.  We keep each dlist in LRU order to speed repeated
-	 * lookups.
-	 */
-	dlist_node	cache_elem;		/* list member of per-bucket list */
-
 	/*
 	 * A tuple marked "dead" must not be returned by subsequent searches.
 	 * However, it won't be physically deleted from the cache until its
@@ -156,13 +156,13 @@ typedef struct catctup
  */
 typedef struct catclist
 {
+	dlist_node	cache_elem;		/* list member of per-catcache list */
+
 	int			cl_magic;		/* for identifying CatCList entries */
 #define CL_MAGIC   0x52765103
 
 	uint32		hash_value;		/* hash value for lookup keys */
 
-	dlist_node	cache_elem;		/* list member of per-catcache list */
-
 	/*
 	 * Lookup keys for the entry, with the first nkeys elements being valid.
 	 * All by-reference are separately allocated.
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index ff09c63a02..afacd94924 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -95,6 +95,9 @@ extern int	plan_cache_mode;
  */
 typedef struct CachedPlanSource
 {
+	/* If CachedPlanSource has been saved, it is a member of a global list */
+	dlist_node	node;			/* list link, if is_saved */
+
 	int			magic;			/* should equal CACHEDPLANSOURCE_MAGIC */
 	struct RawStmt *raw_parse_tree; /* output of raw_parser(), or NULL */
 	const char *query_string;	/* source text of query */
@@ -125,8 +128,6 @@ typedef struct CachedPlanSource
 	bool		is_saved;		/* has CachedPlanSource been "saved"? */
 	bool		is_valid;		/* is the query_list currently valid? */
 	int			generation;		/* increments each time we create a plan */
-	/* If CachedPlanSource has been saved, it is a member of a global list */
-	dlist_node	node;			/* list link, if is_saved */
 	/* State kept to help decide whether to use custom or generic plans: */
 	double		generic_cost;	/* cost of generic plan, or -1 if not known */
 	double		total_custom_cost;	/* total cost of custom plans so far */
@@ -174,6 +175,8 @@ typedef struct CachedPlan
  */
 typedef struct CachedExpression
 {
+	dlist_node	node;			/* link in global list of CachedExpressions */
+
 	int			magic;			/* should equal CACHEDEXPR_MAGIC */
 	Node	   *expr;			/* planned form of expression */
 	bool		is_valid;		/* is the expression still valid? */
@@ -181,7 +184,6 @@ typedef struct CachedExpression
 	List	   *relationOids;	/* OIDs of relations the expr depends on */
 	List	   *invalItems;		/* other dependencies, as PlanInvalItems */
 	MemoryContext context;		/* context containing this CachedExpression */
-	dlist_node	node;			/* link in global list of CachedExpressions */
 } CachedExpression;
 
 
diff --git a/src/tools/valgrind.supp b/src/tools/valgrind.supp
index e3a179d210..c399b787fa 100644
--- a/src/tools/valgrind.supp
+++ b/src/tools/valgrind.supp
@@ -198,3 +198,22 @@
    Memcheck:Cond
    fun:PyObject_Realloc
 }
+
+# Temporary hack to ignore leakage in pl/pgsql compiler
+{
+   ignore-plpgsql-leaks
+   Memcheck:Leak
+   ...
+   fun:do_compile
+}
+
+# Snowball likes to work with a pointer that doesn't point to the start
+# of its palloc chunk; see create_s/lose_s.
+{
+   ignore-snowball-weirdness
+   Memcheck:Leak
+   match-leak-kinds: possible
+   fun:palloc
+   fun:create_s
+   fun:SN_create_env
+}
#21David Rowley
dgrowleyml@gmail.com
In reply to: Andres Freund (#2)
1 attachment(s)
Re: Getting better results from valgrind leak tracking

On Wed, 17 Mar 2021 at 15:31, Andres Freund <andres@anarazel.de> wrote:

I'm a bit confused about the precise design of rs_private /
ParallelBlockTableScanWorkerData, specifically why it's been added to
TableScanDesc, instead of just adding it to HeapScanDesc? And why is it
allocated unconditionally, instead of just for parallel scans?

That's a good point. In hindsight, I didn't spend enough effort
questioning that design in the original patch. I see now that the
rs_private field makes very little sense as we can just store what's
private to heapam in HeapScanDescData.

I've done that in the attached. I added the
ParallelBlockTableScanWorkerData as a pointer field in
HeapScanDescData and change it so we only allocate memory for it for
just parallel scans. The field is left as NULL for non-parallel
scans.

I've also added a pfree in heap_endscan() to free the memory when the
pointer is not NULL. I'm hoping that'll fix the valgrind warning, but
I've not run it to check.

David

Attachments:

parallel_chunk_fix.patch (text/plain)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 90711b2fcd..595310ba1b 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -540,7 +540,7 @@ heapgettup(HeapScanDesc scan,
 				ParallelBlockTableScanDesc pbscan =
 				(ParallelBlockTableScanDesc) scan->rs_base.rs_parallel;
 				ParallelBlockTableScanWorker pbscanwork =
-				(ParallelBlockTableScanWorker) scan->rs_base.rs_private;
+				scan->rs_parallelworkerdata;
 
 				table_block_parallelscan_startblock_init(scan->rs_base.rs_rd,
 														 pbscanwork, pbscan);
@@ -748,7 +748,7 @@ heapgettup(HeapScanDesc scan,
 			ParallelBlockTableScanDesc pbscan =
 			(ParallelBlockTableScanDesc) scan->rs_base.rs_parallel;
 			ParallelBlockTableScanWorker pbscanwork =
-			(ParallelBlockTableScanWorker) scan->rs_base.rs_private;
+			scan->rs_parallelworkerdata;
 
 			page = table_block_parallelscan_nextpage(scan->rs_base.rs_rd,
 													 pbscanwork, pbscan);
@@ -864,7 +864,7 @@ heapgettup_pagemode(HeapScanDesc scan,
 				ParallelBlockTableScanDesc pbscan =
 				(ParallelBlockTableScanDesc) scan->rs_base.rs_parallel;
 				ParallelBlockTableScanWorker pbscanwork =
-				(ParallelBlockTableScanWorker) scan->rs_base.rs_private;
+				scan->rs_parallelworkerdata;
 
 				table_block_parallelscan_startblock_init(scan->rs_base.rs_rd,
 														 pbscanwork, pbscan);
@@ -1057,7 +1057,7 @@ heapgettup_pagemode(HeapScanDesc scan,
 			ParallelBlockTableScanDesc pbscan =
 			(ParallelBlockTableScanDesc) scan->rs_base.rs_parallel;
 			ParallelBlockTableScanWorker pbscanwork =
-			(ParallelBlockTableScanWorker) scan->rs_base.rs_private;
+			scan->rs_parallelworkerdata;
 
 			page = table_block_parallelscan_nextpage(scan->rs_base.rs_rd,
 													 pbscanwork, pbscan);
@@ -1194,8 +1194,6 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 	scan->rs_base.rs_nkeys = nkeys;
 	scan->rs_base.rs_flags = flags;
 	scan->rs_base.rs_parallel = parallel_scan;
-	scan->rs_base.rs_private =
-		palloc(sizeof(ParallelBlockTableScanWorkerData));
 	scan->rs_strategy = NULL;	/* set in initscan */
 
 	/*
@@ -1231,6 +1229,15 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 	/* we only need to set this up once */
 	scan->rs_ctup.t_tableOid = RelationGetRelid(relation);
 
+	/*
+	 * Allocate memory to keep track of page allocation for parallel workers
+	 * when doing a parallel scan.
+	 */
+	if (parallel_scan != NULL)
+		scan->rs_parallelworkerdata = palloc(sizeof(ParallelBlockTableScanWorkerData));
+	else
+		scan->rs_parallelworkerdata = NULL;
+
 	/*
 	 * we do this here instead of in initscan() because heap_rescan also calls
 	 * initscan() and we don't want to allocate memory again
@@ -1306,6 +1313,9 @@ heap_endscan(TableScanDesc sscan)
 	if (scan->rs_strategy != NULL)
 		FreeAccessStrategy(scan->rs_strategy);
 
+	if (scan->rs_parallelworkerdata != NULL)
+		pfree(scan->rs_parallelworkerdata);
+
 	if (scan->rs_base.rs_flags & SO_TEMP_SNAPSHOT)
 		UnregisterSnapshot(scan->rs_base.rs_snapshot);
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index bc0936bc2d..d803f27787 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -65,6 +65,12 @@ typedef struct HeapScanDescData
 
 	HeapTupleData rs_ctup;		/* current tuple in scan, if any */
 
+	/*
+	 * For parallel scans to store page allocation data.  NULL when not
+	 * performing a parallel scan.
+	 */
+	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
+
 	/* these fields only used in page-at-a-time mode and for bitmap scans */
 	int			rs_cindex;		/* current tuple's index in vistuples */
 	int			rs_ntuples;		/* number of visible tuples on page */
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index 0ef6d8edf7..17a161c69a 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -46,7 +46,6 @@ typedef struct TableScanDescData
 	 */
 	uint32		rs_flags;
 
-	void	   *rs_private;		/* per-worker private memory for AM to use */
 	struct ParallelTableScanDescData *rs_parallel;	/* parallel scan
 													 * information */
 } TableScanDescData;
#22Andres Freund
andres@anarazel.de
In reply to: David Rowley (#21)
Re: Getting better results from valgrind leak tracking

Hi,

On 2021-03-29 11:48:47 +1300, David Rowley wrote:

I've done that in the attached. I added the
ParallelBlockTableScanWorkerData as a pointer field in
HeapScanDescData and change it so we only allocate memory for it for
just parallel scans. The field is left as NULL for non-parallel
scans.

LGTM.

I've also added a pfree in heap_endscan() to free the memory when the
pointer is not NULL. I'm hoping that'll fix the valgrind warning, but
I've not run it to check.

Cool. I think that's a good thing to do. The leak itself should already
be fixed, and was more my fault...

commit 415ffdc2205e209b6a73fb42a3fdd6e57e16c7b2
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date: 2021-03-18 20:50:56 -0400

Don't run RelationInitTableAccessMethod in a long-lived context.

Greetings,

Andres Freund

#23David Rowley
dgrowleyml@gmail.com
In reply to: Andres Freund (#22)
Re: Getting better results from valgrind leak tracking

On Tue, 30 Mar 2021 at 06:38, Andres Freund <andres@anarazel.de> wrote:

On 2021-03-29 11:48:47 +1300, David Rowley wrote:

I've done that in the attached. I added the
ParallelBlockTableScanWorkerData as a pointer field in
HeapScanDescData and change it so we only allocate memory for it for
just parallel scans. The field is left as NULL for non-parallel
scans.

LGTM.

Thanks for having a look.

Pushed.

David