Why our Valgrind reports suck

Started by Tom Lane · 10 months ago · 25 messages
#1Tom Lane
tgl@sss.pgh.pa.us

A nearby thread [1] reminded me to wonder why we seem to have
so many false-positive leaks reported by Valgrind these days.
For example, at exit of a backend that's executed a couple of
trivial queries, I see

==00:00:00:25.515 260013== LEAK SUMMARY:
==00:00:00:25.515 260013== definitely lost: 3,038 bytes in 90 blocks
==00:00:00:25.515 260013== indirectly lost: 4,431 bytes in 61 blocks
==00:00:00:25.515 260013== possibly lost: 390,242 bytes in 852 blocks
==00:00:00:25.515 260013== still reachable: 579,139 bytes in 1,457 blocks
==00:00:00:25.515 260013== suppressed: 0 bytes in 0 blocks

so about a thousand "leaked" blocks, all but a couple of which
are false positives --- including nearly all the "definitely"
leaked ones.

Some testing and reading of the Valgrind manual [2] turned up a
number of answers, which mostly boil down to us using very
Valgrind-unfriendly data structures. Per [2],

There are two ways a block can be reached. The first is with a
"start-pointer", i.e. a pointer to the start of the block. The
second is with an "interior-pointer", i.e. a pointer to the middle
of the block.

[ A block is reported as "possibly lost" if ] a chain of one or
more pointers to the block has been found, but at least one of the
pointers is an interior-pointer.

We have a number of places that allocate space in such a way that
blocks can only be reached by "interior pointers", leading to
those blocks being reported as possibly lost:

* MemoryContextAllocAligned does this more or less by definition.

* Dynahash tables often end up looking like this, since the first
element in each block created by element_alloc will be the tail end of
its freelist, and thus will be reachable only via interior pointers
later in the block, except when it's currently allocated.

* Blocks that are reached via slists or dlists are like this
unless the slist_node or dlist_node is at the front, which is not
our typical practice.

(There may be more cases, but those are what I identified in the
leak report quoted above.)

Another odd thing, which I have not found the explanation for, is
that strings made by MemoryContextCopyAndSetIdentifier() show up
as "definitely lost". This is nonsense, because they are surely
referenced by the context's "ident" field; but apparently Valgrind
isn't counting that for some reason.

I'd be okay with using a suppression pattern to hide the
MemoryContextCopyAndSetIdentifier cases, but that doesn't seem
like a very palatable answer for these other cases: too much
risk of hiding actual leaks.
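For reference, a suppression of that shape might look like the following (a sketch only: the entry name is invented, and the exact frame list would need to be matched against the stack traces in a real report):

```
{
   hide_context_ident_strings
   Memcheck:Leak
   match-leak-kinds: definite
   fun:malloc
   ...
   fun:MemoryContextCopyAndSetIdentifier
}
```

The "..." frame wildcard keeps the rule robust against intermediate allocator frames, which is exactly why broad rules like this risk hiding genuine leaks elsewhere.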

I don't see a way to avoid the problem for MemoryContextAllocAligned
with the current context-type-agnostic implementation of that. We
could probably fix it though if we pushed it down a layer, so that the
alignment padding space could be treated as part of the chunk header.
Might be able to waste less space that way, too.

Dynahash tables could be fixed by expending a little extra storage
to chain all the element-pool blocks together explicitly, which
seems probably acceptable to do in USE_VALGRIND builds. (Maybe
while we are at that, we could fix things so that currently-unused
element slots are marked NOACCESS.)

I don't have an answer for slist/dlist usages other than rearranging
all the related structs. Anybody see a better way?

Or we could do nothing, but I think there is value in having
less clutter in Valgrind reports. Thoughts?

regards, tom lane

[1]: /messages/by-id/CAA9OW9e4RbpgQd8NSzpW6BgJQNpKGEFoohWhkbEL8=Z5obvD0Q@mail.gmail.com
[2]: https://valgrind.org/docs/manual/mc-manual.html#mc-manual.leaks

#2Yasir
yasir.hussain.shah@gmail.com
In reply to: Tom Lane (#1)
Re: Why our Valgrind reports suck


Thanks for the detailed analysis, this was very informative. I agree that
reducing noise in Valgrind reports would be valuable, especially for
catching real leaks. Having a clearer signal from Valgrind would definitely
help long term.

Regards,

Yasir
Data Bene

#3Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#1)
Re: Why our Valgrind reports suck

Hi,

On 2025-05-08 22:04:06 -0400, Tom Lane wrote:

A nearby thread [1] reminded me to wonder why we seem to have
so many false-positive leaks reported by Valgrind these days.
For example, at exit of a backend that's executed a couple of
trivial queries, I see

==00:00:00:25.515 260013== LEAK SUMMARY:
==00:00:00:25.515 260013== definitely lost: 3,038 bytes in 90 blocks
==00:00:00:25.515 260013== indirectly lost: 4,431 bytes in 61 blocks
==00:00:00:25.515 260013== possibly lost: 390,242 bytes in 852 blocks
==00:00:00:25.515 260013== still reachable: 579,139 bytes in 1,457 blocks
==00:00:00:25.515 260013== suppressed: 0 bytes in 0 blocks

so about a thousand "leaked" blocks, all but a couple of which
are false positives --- including nearly all the "definitely"
leaked ones.

Some testing and reading of the Valgrind manual [2] turned up a
number of answers, which mostly boil down to us using very
Valgrind-unfriendly data structures. Per [2],

There are two ways a block can be reached. The first is with a
"start-pointer", i.e. a pointer to the start of the block. The
second is with an "interior-pointer", i.e. a pointer to the middle
of the block.

[ A block is reported as "possibly lost" if ] a chain of one or
more pointers to the block has been found, but at least one of the
pointers is an interior-pointer.

Huh. We use the memory pool client requests to inform valgrind about memory
contexts. I seem to recall that that "hid" many leak warnings from valgrind. I
wonder if we somehow broke (or weakened) that.

We currently don't reset TopMemoryContext at exit, which, obviously, does
massively increase the number of leaks. But OTOH, without that there's not a
whole lot of value in the leak check...

Greetings,

Andres Freund

#4Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#3)
Re: Why our Valgrind reports suck

Hi,

On 2025-05-09 11:29:43 -0400, Andres Freund wrote:

We currently don't reset TopMemoryContext at exit, which, obviously, does
massively increase the number of leaks. But OTOH, without that there's not a
whole lot of value in the leak check...

Briefly looking through the leaks indeed quickly found a real seeming leak,
albeit of limited size:
ProcessStartupPacket() does
buf = palloc(len + 1);
in TopMemoryContext() without ever freeing it.

I have wondered if we ought to have some infrastructure to tear down all
relcache, catcache entries (and other similar things) before shutdown if
MEMORY_CONTEXT_CHECKING is enabled. That would make it a lot easier to see
leaks at shutdown. We certainly have had leaks in relcache etc...

Greetings,

Andres Freund

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#4)
Re: Why our Valgrind reports suck

Andres Freund <andres@anarazel.de> writes:

Briefly looking through the leaks indeed quickly found a real seeming leak,
albeit of limited size:
ProcessStartupPacket() does
buf = palloc(len + 1);
in TopMemoryContext() without ever freeing it.

Yeah, I saw that too. Didn't seem worth doing anything about it
unless we make pretty massive cleanups elsewhere.

I have wondered if we ought to have some infrastructure to tear down all
relcache, catcache entries (and other similar things) before shutdown if
MEMORY_CONTEXT_CHECKING is enabled. That would make it a lot easier to see
leaks at shutdown. We certainly have had leaks in relcache etc...

I'd be content if all that stuff was shown as "still reachable".

regards, tom lane

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#3)
Re: Why our Valgrind reports suck

Andres Freund <andres@anarazel.de> writes:

On 2025-05-08 22:04:06 -0400, Tom Lane wrote:

A nearby thread [1] reminded me to wonder why we seem to have
so many false-positive leaks reported by Valgrind these days.

Huh. We use the memory pool client requests to inform valgrind about memory
contexts. I seem to recall that that "hid" many leak warnings from valgrind. I
wonder if we somehow broke (or weakened) that.

The problem with dynahash has been there since day one, so I think
we've just gotten used to ignoring "leak" reports associated with
hash tables --- I have, anyway. But the other triggers I listed
have appeared within the last five-ish years, if memory serves,
and we just didn't notice because of the existing dynahash noise.

We currently don't reset TopMemoryContext at exit, which, obviously, does
massively increase the number of leaks. But OTOH, without that there's not a
whole lot of value in the leak check...

Hmm. Yeah, we could just reset or delete TopMemoryContext, but
surely that would be counterproductive. It would mask any real
leak of palloc'd blocks. I'm a little suspicious of your other
idea of shutting down the caches, for the same reason: I wonder
if it wouldn't hide leaks rather than help find them.

One thing I noticed while reading the Valgrind manual is that
they describe a facility for "two level" tracking of custom
allocators such as ours. Apparently, what you're really supposed
to do is use VALGRIND_MEMPOOL_ALLOC to mark the malloc blocks
that the allocator works in, and VALGRIND_MALLOCLIKE_BLOCK to
mark the sub-blocks handed out by the allocator. I wonder if this
feature postdates our implementation of Valgrind support, and I
wonder even more if using it would improve our results.

I did experiment with marking context headers as accessible with
VALGRIND_MEMPOOL_ALLOC, and that made the complaints about
MemoryContextCopyAndSetIdentifier strings go away, confirming
that Valgrind is simply not considering the context->ident
pointers. Unfortunately it also added a bunch of other failures,
so that evidently is Not The Right Thing. I suspect what is
going on is related to this bit in valgrind.h:

For Memcheck users: if you use VALGRIND_MALLOCLIKE_BLOCK to carve out
custom blocks from within a heap block, B, that has been allocated with
malloc/calloc/new/etc, then block B will be *ignored* during leak-checking
-- the custom blocks will take precedence.

We're not using VALGRIND_MALLOCLIKE_BLOCK (yet, anyway), but I'm
suspecting that Valgrind probably also ignores heap blocks that
match VALGRIND_CREATE_MEMPOOL requests, except for the portions
thereof that are covered by VALGRIND_MEMPOOL_ALLOC requests.

Anyway, I'm now feeling motivated to go try some experiments.
Watch this space ...

regards, tom lane

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#6)
Re: Why our Valgrind reports suck

I wrote:

One thing I noticed while reading the Valgrind manual is that
they describe a facility for "two level" tracking of custom
allocators such as ours.

And, since there's nothing new under the sun around here,
we already had a discussion about that back in 2021:

/messages/by-id/3471359.1615937770@sss.pgh.pa.us

That thread seems to have led to fixing some specific bugs,
but we never committed any of the discussed valgrind infrastructure
improvements. I'll have a go at resurrecting that...

regards, tom lane

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#7)
Re: Why our Valgrind reports suck

I wrote:

And, since there's nothing new under the sun around here,
we already had a discussion about that back in 2021:
/messages/by-id/3471359.1615937770@sss.pgh.pa.us
That thread seems to have led to fixing some specific bugs,
but we never committed any of the discussed valgrind infrastructure
improvements. I'll have a go at resurrecting that...

Okay, here is a patch series that updates the
0001-Make-memory-contexts-themselves-more-visible-to-valg.patch
patch you posted in that thread, and makes various follow-up
fixes that either fix or paper over various leaks. Some of it
is committable I think, but other parts are just WIP. Anyway,
as of the 0010 patch we can run through the core regression tests
and see no more than a couple of kilobytes total reported leakage
in any process, except for two tests that expose leaks in TS
dictionary building. (That's fixable but I ran out of time,
and I wanted to get this posted before Montreal.) There is
work left to do before we can remove the suppressions added in
0002, but this is already huge progress compared to where we were.

A couple of these patches are bug fixes that need to be applied and
even back-patched. In particular, I had not realized that autovacuum
leaks a nontrivial amount of memory per relation processed (cf 0009),
and apparently has done for a few releases now. This is horrid in
databases with many tables, and I'm surprised that we've not gotten
complaints about it.

regards, tom lane

Attachments:

v1-0001-Improve-our-support-for-Valgrind-s-leak-tracking.patch (+168 −10)
v1-0002-Temporarily-suppress-some-Valgrind-complaints.patch (+42 −1)
v1-0003-Silence-complaints-about-leaked-dynahash-storage.patch (+46 −7)
v1-0004-Silence-complaints-about-leaked-catcache-entries.patch (+14 −10)
v1-0005-Silence-complaints-about-leaked-CatCache-structs.patch (+8 −2)
v1-0006-Silence-complaints-about-save_ps_display_args.patch (+16 −1)
v1-0007-Don-t-leak-the-startup-packet-buffer-in-ProcessSt.patch (+6 −1)
v1-0008-Fix-some-extremely-broken-code-from-525392d57.patch (+5 −12)
v1-0009-Avoid-per-relation-leakage-in-autovacuum.patch (+40 −8)
v1-0010-Silence-complaints-about-leaks-in-load_domaintype.patch (+22 −2)
v1-0011-WIP-reduce-leakages-during-PL-pgSQL-function-comp.patch (+29 −19)
#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#8)
Re: Why our Valgrind reports suck

I wrote:

Okay, here is a patch series that updates the
0001-Make-memory-contexts-themselves-more-visible-to-valg.patch
patch you posted in that thread,

I forgot to mention that I did try to implement the "two-level
pool" scheme that the Valgrind documentation talks about, and
could not make it work. There seem to be undocumented interactions
between the outer and inner chunks, and it's not real clear to me
that there's not outright bugs. Anyway, AFAICS that scheme
would bring us no immediate advantages anyway, compared to the
flat approach of just adding mempool chunks for the allocators'
management areas.

regards, tom lane

#10Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#8)
Re: Why our Valgrind reports suck

On 2025-May-11, Tom Lane wrote:

Okay, here is a patch series that updates the
0001-Make-memory-contexts-themselves-more-visible-to-valg.patch
patch you posted in that thread, and makes various follow-up
fixes that either fix or paper over various leaks.

Wow, that's a lot of extra fixes. I did a quick test run with all
patches save the last, then ran the tests "test_setup plpgsql tablespace" to
see if I'd get a leak report about plpgsql (to verify the setup was
correct), and I did. Rerunning after applying that patch silences what
seems to be most of them.

One source of leaks still present is LLVM. It's not large numbers,

140303.log:==00:00:12:42.244 140303== possibly lost: 18,336 bytes in 207 blocks
140442.log:==00:00:13:43.070 140442== possibly lost: 18,456 bytes in 210 blocks
140729.log:==00:00:16:09.802 140729== possibly lost: 64,120 bytes in 840 blocks
140892.log:==00:00:17:39.300 140892== possibly lost: 18,128 bytes in 202 blocks
141001.log:==00:00:18:31.631 141001== possibly lost: 18,128 bytes in 202 blocks
141031.log:==00:00:19:03.528 141031== possibly lost: 64,232 bytes in 717 blocks
141055.log:==00:00:20:11.536 141055== possibly lost: 18,128 bytes in 202 blocks
141836.log:==00:00:29:10.344 141836== definitely lost: 29,666 bytes in 3,175 blocks
141836.log:==00:00:29:10.344 141836== indirectly lost: 13,977 bytes in 1,138 blocks
142061.log:==00:00:32:26.264 142061== possibly lost: 18,128 bytes in 202 blocks

(The installcheck run is still going, but 90 tests in, those are the
only reports of ~10kB or more).

In particular, I had not realized that autovacuum
leaks a nontrivial amount of memory per relation processed (cf 0009),
and apparently has done for a few releases now. This is horrid in
databases with many tables, and I'm surprised that we've not gotten
complaints about it.

In PGConf Germany we got a lightning talk [1] that reported a problem
that might be explained by this: with 10 million relations, the OOM
killer gets invoked on autovacuum workers in the reported case, so
essentially autovacuum doesn't work at all. So clearly there is somebody
who would appreciate this problem being fixed.

He actually blames it on relcache, but who knows how correct that is.
Not much to see on the slides other than they turn to vacuumdb instead,
but they're here:

[1]: https://www.postgresql.eu/events/pgconfde2025/sessions/session/6625/slides/681/06%20-%20LT%20-%20Jonas%20Marasus%20-%20War%20Story%20How%20Big%20Is%20Too%20Big%20For%20a%20Schema.pdf

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"Computing is too important to be left to men." (Karen Spärck Jones)

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#10)
Re: Why our Valgrind reports suck

Álvaro Herrera <alvherre@kurilemu.de> writes:

On 2025-May-11, Tom Lane wrote:

In particular, I had not realized that autovacuum
leaks a nontrivial amount of memory per relation processed (cf 0009),
and apparently has done for a few releases now. This is horrid in
databases with many tables, and I'm surprised that we've not gotten
complaints about it.

In PGConf Germany we got a lightning talk[1] that reported a problem
that might be explained by this: with 10 million of relations, the OOM
killer gets invoked on autovacuum workers on the reported case, so
essentially autovacuum doesn't work at all. So clearly there is somebody
that would appreciate that this problem is fixed.

Interesting.

He actually blames it on relcache, but who knows how correct that is.

I would not be surprised if the relcache is bloated too, but Valgrind
wouldn't think that's a leak. I wonder if it'd be worth setting up
a mechanism for autovacuum to drop the relcache entry for a table
once it's done with it.

regards, tom lane

#12Yasir
yasir.hussain.shah@gmail.com
In reply to: Tom Lane (#8)
Re: Why our Valgrind reports suck


Thanks for sharing the patch series. I've applied the patches on my end and
rerun the tests. Valgrind now reports only 8 bytes of leakage, and the
previously noisy outputs are almost entirely gone.
Here's valgrind output:

==00:00:01:50.385 90463== LEAK SUMMARY:
==00:00:01:50.385 90463== definitely lost: 8 bytes in 1 blocks
==00:00:01:50.385 90463== indirectly lost: 0 bytes in 0 blocks
==00:00:01:50.385 90463== possibly lost: 0 bytes in 0 blocks
==00:00:01:50.385 90463== still reachable: 1,182,132 bytes in 2,989
blocks
==00:00:01:50.385 90463== suppressed: 0 bytes in 0 blocks
==00:00:01:50.385 90463== Rerun with --leak-check=full to see details of
leaked memory
==00:00:01:50.385 90463==
==00:00:01:50.385 90463== For lists of detected and suppressed errors,
rerun with: -s
==00:00:01:50.385 90463== ERROR SUMMARY: 0 errors from 0 contexts
(suppressed: 34 from 3)

Regards,

Yasir Hussain
Data Bene

#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#11)
Re: Why our Valgrind reports suck

Here's a v2 patchset that reaches the goal of zero reported leaks
in the core regression tests, with some caveats:

* Rather than completely fixing the function-cache and
TS-dictionary-cache issues, I just added suppression rules to
hide them. I am not convinced it is worth working harder than that.
The patchset does include some fixes that clean up low-hanging fruit
in that area, but going further seems like a lot of work (and risk of
bugs) for fairly minimal gain. The core regression tests show less
than 10K "suppressed" space in all test sessions but three, and those
three are still under 100K.

* The patch series assumes that the ModifyTable fix discussed at [1]
is already applied.

* I still observe leaks in ProcessGetMemoryContextInterrupt, but
I think the consensus is we should just revert that as not yet ready
for prime time [2].

0001 is the same as before except I did more work on the comments.
I concluded that we were overloading the term "chunk" too much,
so I invented the term "vchunk" for Valgrind's notion of chunks.
(Feel free to suggest other terminology.)

0002 is new work to fix up MemoryContextAllocAligned so it doesn't
cause possible-leak complaints.

The rest are more or less bite-sized fixes of individual problems.
Probably we could squash a lot of them for final commit, but I
thought it'd be easier to review like this. Note that I'm not
expecting 0013 to get applied in this form [3], but without it we
have various gripes about memory leaked from plancache entries.

regards, tom lane

[1]: /messages/by-id/213261.1747611093@sss.pgh.pa.us
[2]: /messages/by-id/594293.1747708165@sss.pgh.pa.us
[3]: /messages/by-id/605328.1747710381@sss.pgh.pa.us

Attachments:

v2-leak-check-fixes.tar.gz
#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#13)
Re: Why our Valgrind reports suck

I wrote:

Here's a v2 patchset that reaches the goal of zero reported leaks
in the core regression tests, with some caveats:

Oh, another caveat is that I ran this with a fairly minimalistic set
of build options. In a more complete build, I observed a small leak
in xml.c, which I posted a fix for in another thread [1].

I also see the leakage Alvaro mentioned with --with-llvm. I'm not
sure how real that is though, because AFAICS all of it is reported
as "possibly lost", not "definitely lost" or "indirectly lost".
In Valgrind-speak that means there are pointers leading to the
chunk, just not pointing right at its start. So this could be the
result of allocation tricks being played by the C++ compiler.
The Valgrind manual talks about some heuristics they use to handle
C++ coding patterns, but they don't seem to help in my environment.
In any case, the allocation points are mostly far enough down into
LLVM functions that if the leaks are real, I'd tend to call them
LLVM's bugs not ours.

I haven't really tried our non-core test suites yet. Out of curiosity
I ran the plperl, plpython, and pltcl suites. All of them show pretty
enormous amounts of "possibly lost" data, which again seems likely to
be an artifact of allocation games within those libraries rather than
real leaks. I wonder if they have "valgrind friendly" build options
that we'd need to use to get sane results?

regards, tom lane

[1]: /messages/by-id/1358967.1747858817@sss.pgh.pa.us

#15Yasir
yasir.hussain.shah@gmail.com
In reply to: Tom Lane (#13)
Re: Why our Valgrind reports suck

On Wed, May 21, 2025 at 10:14 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Here's a v2 patchset that reaches the goal of zero reported leaks
in the core regression tests, with some caveats:

* Rather than completely fixing the function-cache and
TS-dictionary-cache issues, I just added suppression rules to
hide them. I am not convinced it is worth working harder than that.
The patchset does include some fixes that clean up low-hanging fruit
in that area, but going further seems like a lot of work (and risk of
bugs) for fairly minimal gain. The core regression tests show less
than 10K "suppressed" space in all test sessions but three, and those
three are still under 100K.

* The patch series assumes that the ModifyTable fix discussed at [1]
is already applied.

* I still observe leaks in ProcessGetMemoryContextInterrupt, but
I think the consensus is we should just revert that as not yet ready
for prime time [2].

0001 is the same as before except I did more work on the comments.
I concluded that we were overloading the term "chunk" too much,
so I invented the term "vchunk" for Valgrind's notion of chunks.
(Feel free to suggest other terminology.)

0002 is new work to fix up MemoryContextAllocAligned so it doesn't
cause possible-leak complaints.

I tested with the provided v2 patch series, making sure the fix
mentioned at [1] was applied. The regression run generated more than
800 backend valgrind output files, of which 237 contain "suppressed:
> 0" entries, and 5 of those also contain "definitely lost: > 0 bytes"
entries. The maximum leak I found in these valgrind output files is
only 960 bytes.

Meanwhile, the original issue I posted at
/messages/by-id/CAA9OW9e536dJanVKZRd_GKQ4wN_m5rhsMnrL6ZvaWagzLwv3=g@mail.gmail.com
is fixed. There are no leaks.

Regards,

Yasir Hussain
Data Bene


The rest are more or less bite-sized fixes of individual problems.
Probably we could squash a lot of them for final commit, but I
thought it'd be easier to review like this. Note that I'm not
expecting 0013 to get applied in this form [3], but without it we
have various gripes about memory leaked from plancache entries.

regards, tom lane

[1]: /messages/by-id/213261.1747611093@sss.pgh.pa.us
[2]: /messages/by-id/594293.1747708165@sss.pgh.pa.us
[3]: /messages/by-id/605328.1747710381@sss.pgh.pa.us

#16Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#13)
Re: Why our Valgrind reports suck

Hi,

0001:

This is rather wild, I really have only the dimmest memory of that other
thread even though I apparently resorted to reading valgrind's source code...

I think the vchunk/vpool naming, while not necessarily elegant, is helpful.

0002: Makes sense.

0003:
0004:
0005:

Ugh, that's rather gross. Not that I have a better idea. So it's probably
worth doing ...

0006: LGTM

0007:

+	/* Run the rest in xact context to avoid Valgrind leak complaints */
+	MemoryContextSwitchTo(TopTransactionContext);

It seems like it also protects at least somewhat against actual leaks?

0008: LGTM

0009:

Seems like we really ought to do more here. But it's a substantial
improvement, so let's not let perfect be the enemy of good.

0010:
0011:

Not great, but better than not doing anything.

0012:

Hm. Kinda feels like we should just always use the USE_VALGRIND
approach. ISTM that if domain typecache entry building is a bottleneck, there
are way bigger problems than a copyObject()...

0014:

I guess I personally would try to put the freeing code into a helper, but
it's a close call.

0015:

I'd probably put the list_free() after the
LWLockRelease(LogicalRepWorkerLock)?

0016:

Agreed with the concern stated in the commit message, but addressing that
would obviously be a bigger project.

0017:

A tad weird to leave the comments above the removed = NILs in place, even
though it's obviously still correct.

0018: LGTM.

But this is the last step to get to zero reported leaks in a run of the core
regression tests, so let's do it.

I assume that's just about the core tests, not more? I.e. I can't make skink
enable leak checking?

Greetings,

Andres Freund

#17Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#16)
Re: Why our Valgrind reports suck

Andres Freund <andres@anarazel.de> writes:

[ review ]

Thanks for the comments! I'll go through them and post an updated
version tomorrow. The cfbot is already nagging me for a rebase
now that 0013 is moot.

But this is the last step to get to zero reported leaks in a run of the core
regression tests, so let's do it.

I assume that's just about the core tests, not more? I.e. I can't make skink
enable leak checking?

No, we're not there yet. I've identified some other backend issues (in
postgres_fdw in particular), and I've not looked at frontend programs
at all. For most frontend programs, I'm dubious how much we care.

Actually the big problem is I don't know what to do about
plperl/plpython/pltcl. I suppose the big-hammer approach
would be to put in suppression patterns covering those,
at least till such time as someone has a better idea.

I'm envisioning this patch series as v19 work, were you
thinking we should be more aggressive?

regards, tom lane

#18Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#17)
Re: Why our Valgrind reports suck

Hi,

On 2025-05-22 21:48:24 -0400, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

But this is the last step to get to zero reported leaks in a run of the core
regression tests, so let's do it.

I assume that's just about the core tests, not more? I.e. I can't make skink
enable leak checking?

No, we're not there yet. I've identified some other backend issues (in
postgres_fdw in particular), and I've not looked at frontend programs
at all. For most frontend programs, I'm dubious how much we care.

Skink only tests backend stuff anyway, but the other backend issues make it a
no-go for now...

I'm envisioning this patch series as v19 work, were you
thinking we should be more aggressive?

Mostly agreed - but I am wondering if the AV fix should be backpatched?

Greetings,

Andres Freund

#19Peter Geoghegan
pg@bowt.ie
In reply to: Andres Freund (#18)
Re: Why our Valgrind reports suck

On Fri, May 23, 2025 at 11:42 AM Andres Freund <andres@anarazel.de> wrote:

I'm envisioning this patch series as v19 work, were you
thinking we should be more aggressive?

Mostly agreed - but I am wondering if the AV fix should be backpatched?

I think that it probably should be.

--
Peter Geoghegan

#20Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Geoghegan (#19)
Re: Why our Valgrind reports suck

Peter Geoghegan <pg@bowt.ie> writes:

On Fri, May 23, 2025 at 11:42 AM Andres Freund <andres@anarazel.de> wrote:

I'm envisioning this patch series as v19 work, were you
thinking we should be more aggressive?

Mostly agreed - but I am wondering if the AV fix should be backpatched?

I think that it probably should be.

Yeah, I agree, but didn't get to that yet. I've also gone ahead with
committing a couple of clear bugs that I located while doing this.

regards, tom lane

#21Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#16)
#22Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#21)
#23Richard Guo
guofenglinux@gmail.com
In reply to: Tom Lane (#22)
#24Tom Lane
tgl@sss.pgh.pa.us
In reply to: Richard Guo (#23)
#25Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#24)