Report a potential memory leak in setup_config()

Started by Nonameabout 4 years ago15 messagesbugs
Jump to latest
#1Noname
wliang@stu.xidian.edu.cn

Hiall,

IfindapotentialmemoryleakinPostgresSQL14.1, whichisinthefunctionsetup_config (./src/bin/initdb/initdb.c).

Specifically, atline1095, functionpretty_wal_size() iscalled, whichallocatesachunkofmemorybyusing pg_mallocandreturnsit. However, thereturnchunkisdirectlypassedtosnprintfasits3rdparameter. Asaresult, thereisamemoryleak.

1053staticvoid
1054setup_config(void)
1055 {
...
1094 snprintf(repltok, sizeof(repltok), "min_wal_size = %s",
1095 pretty_wal_size(DEFAULT_MIN_WAL_SEGS));
...
1347 }

1036staticchar *
1037pretty_wal_size(intsegment_count)
1038 {
1039 int sz = wal_segment_size_mb * segment_count;
1040 char *result = pg_malloc(14);
1041
1042 if ((sz % 1024) == 0)
1043 snprintf(result, 14, "%dGB", sz / 1024);
1044 else
1045 snprintf(result, 14, "%dMB", sz);
1046
1047 returnresult;
1048 }

Webelievewecanfixthisproblembyaddingavariabletoreceivethereturnofpretty_wal_size() andemployingpg_free() tofreetheleakedmemory.

I'mlookingforwardtoyourconfirmation.

Best,

Wentao

#2Daniel Gustafsson
daniel@yesql.se
In reply to: Noname (#1)
Re: Report a potential memory leak in setup_config()

On 15 Feb 2022, at 02:49, wliang@stu.xidian.edu.cn wrote:

Specifically, at line 1095, function pretty_wal_size() is called, which allocates a chunk of memory by using pg_malloc and returns it. However, the return chunk is directly passed to snprintf as its 3rd parameter. As a result, there is a memory leak.

PostgreSQL isn't all too concerned about small static leaks in short lived
programs, like initdb. Memory will be freed shortly when the program exits.
Complicating the code to save 28 bytes seems hardly worth it, but if you feel
strongly about it I suggest proposing a patch to fix it.

--
Daniel Gustafsson https://vmware.com/

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Daniel Gustafsson (#2)
Re: Report a potential memory leak in setup_config()

Daniel Gustafsson <daniel@yesql.se> writes:

On 15 Feb 2022, at 02:49, wliang@stu.xidian.edu.cn wrote:

Specifically, at line 1095, function pretty_wal_size() is called, which allocates a chunk of memory by using pg_malloc and returns it. However, the return chunk is directly passed to snprintf as its 3rd parameter. As a result, there is a memory leak.

PostgreSQL isn't all too concerned about small static leaks in short lived
programs, like initdb. Memory will be freed shortly when the program exits.
Complicating the code to save 28 bytes seems hardly worth it, but if you feel
strongly about it I suggest proposing a patch to fix it.

Yeah. A quick run of initdb under valgrind reports

==3051337== LEAK SUMMARY:
==3051337== definitely lost: 893,446 bytes in 47 blocks
==3051337== indirectly lost: 4,490 bytes in 51 blocks
==3051337== possibly lost: 0 bytes in 0 blocks
==3051337== still reachable: 2,403 bytes in 22 blocks
==3051337== suppressed: 0 bytes in 0 blocks

It might be worth trying to knock that down a bit, but I wouldn't
start with a one-time leak of 28 bytes. It looks like the biggest
offender is that we don't bother trying to reclaim the lines
malloc'd by readfile() and replace_token(). Fixing that is *maybe*
worth the trouble, but TBH no one has complained about initdb's
memory consumption.

regards, tom lane

#4Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#3)
Re: Report a potential memory leak in setup_config()

Hi,

On 2022-02-15 11:33:26 -0500, Tom Lane wrote:

It might be worth trying to knock that down a bit, but I wouldn't
start with a one-time leak of 28 bytes. It looks like the biggest
offender is that we don't bother trying to reclaim the lines
malloc'd by readfile() and replace_token(). Fixing that is *maybe*
worth the trouble, but TBH no one has complained about initdb's
memory consumption.

This reminds me: I recently noticed that the replace_token() stuff is a good
portion of the userspace time on windows. IIRC in debug builds only, I assume
because of some slow checking in the debug C runtime. It doesn't matter too
much overall, because the really expensive bit is the filesystem operations
being very slow (*).

It's a bit insane that we allocate the lines[] quite so many times, when
processing the same file. ~12k lines in postgres.bki * 8 replace_token() in
bootstrap_template1() starts to add up. The replacement patterns either are
compile time constants which we just should handle in genbki.pl, or have
exactly 1 replacement....

Greetings,

Andres Freund

*it looks like ntfs does *synchronous* journalling on every metadata
operation, which seems odd for an extremely widely used filesystem in
2022. But what do I know.

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#4)
Re: Report a potential memory leak in setup_config()

Andres Freund <andres@anarazel.de> writes:

On 2022-02-15 11:33:26 -0500, Tom Lane wrote:

It might be worth trying to knock that down a bit, but I wouldn't
start with a one-time leak of 28 bytes. It looks like the biggest
offender is that we don't bother trying to reclaim the lines
malloc'd by readfile() and replace_token(). Fixing that is *maybe*
worth the trouble, but TBH no one has complained about initdb's
memory consumption.

It's a bit insane that we allocate the lines[] quite so many times, when
processing the same file.

Yeah, I noticed that --- why don't we reuse the array of pointers?
Not sure it'd save much compared to freeing the strings, but it is
mighty low-hanging fruit.

The replacement patterns either are
compile time constants which we just should handle in genbki.pl, or have
exactly 1 replacement....

Mmm, really? I thought most of them were data that we don't know
until initdb runs. Anything that really is known at build time,
sure, genbki.pl ought to take care of.

regards, tom lane

#6Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#5)
Re: Report a potential memory leak in setup_config()

Hi,

On 2022-02-15 20:45:46 -0500, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

On 2022-02-15 11:33:26 -0500, Tom Lane wrote:

It might be worth trying to knock that down a bit, but I wouldn't
start with a one-time leak of 28 bytes. It looks like the biggest
offender is that we don't bother trying to reclaim the lines
malloc'd by readfile() and replace_token(). Fixing that is *maybe*
worth the trouble, but TBH no one has complained about initdb's
memory consumption.

It's a bit insane that we allocate the lines[] quite so many times, when
processing the same file.

Yeah, I noticed that --- why don't we reuse the array of pointers?
Not sure it'd save much compared to freeing the strings, but it is
mighty low-hanging fruit.

The number of replacements is low enough that the memory for the changed
strings themselves doesn't actually matter much, I think. replace_token()
doesn't allocate memory for unchanged strings...

I think we'd see memory usage of quite different proportions otherwise - my
postgres.bki is 900kB. 9 copies of that would start to add up...

for k in NAMEDATALEN SIZEOF_POINTER ALIGNOF_POINTER FLOAT8PASSBYVAL POSTGRES ENCODING LC_COLLATE LC_CTYPE;do echo $k: $(grep -c $k ./src/backend/catalog/postgres.bki);done
NAMEDATALEN: 5
SIZEOF_POINTER: 2
ALIGNOF_POINTER: 2
FLOAT8PASSBYVAL: 8
POSTGRES: 1
ENCODING: 1
LC_COLLATE: 1
LC_CTYPE: 1

The replacement patterns either are
compile time constants which we just should handle in genbki.pl, or have
exactly 1 replacement....

Mmm, really? I thought most of them were data that we don't know
until initdb runs. Anything that really is known at build time,
sure, genbki.pl ought to take care of.

Only POSTGRES, ENCODING, LC_COLLATE, LC_CTYPE of the above list are runtime
variable, right? And those just affect two rows in total...

I was pondering initdb's design a bunch lately. So I started a -hackers thread:
/messages/by-id/20220216021219.ygzrtb3hd5bn7olz@alap3.anarazel.de

Greetings,

Andres Freund

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#6)
Re: Report a potential memory leak in setup_config()

Andres Freund <andres@anarazel.de> writes:

I was pondering initdb's design a bunch lately. So I started a -hackers thread:
/messages/by-id/20220216021219.ygzrtb3hd5bn7olz@alap3.anarazel.de

Yeah, I read that. I think making the backend read these files directly
makes a ton of sense, so I'm no longer very excited about improving the
performance or memory consumption of the replace_token stuff. I think
we should put this thread on hold until someone does that, and then
maybe take another look to see if initdb still leaks anything worth
worrying about.

regards, tom lane

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#7)
Re: Report a potential memory leak in setup_config()

I wrote:

Yeah, I read that. I think making the backend read these files directly
makes a ton of sense, so I'm no longer very excited about improving the
performance or memory consumption of the replace_token stuff.

Actually, wait a second. I grant your point about postgres.bki, but
a whole lot of the replace_token calls in initdb are messing with
the configuration files, which (a) is something I don't see changing,
and (b) pretty much none of that could be pushed back to build time.
So even though the config files are a lot smaller than postgres.bki,
maybe there's still a point in polishing replace_token a bit.

regards, tom lane

#9Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#8)
Re: Report a potential memory leak in setup_config()

Hi,

On 2022-02-15 21:38:11 -0500, Tom Lane wrote:

I wrote:

Yeah, I read that. I think making the backend read these files directly
makes a ton of sense, so I'm no longer very excited about improving the
performance or memory consumption of the replace_token stuff.

Actually, wait a second. I grant your point about postgres.bki, but
a whole lot of the replace_token calls in initdb are messing with
the configuration files, which (a) is something I don't see changing,
and (b) pretty much none of that could be pushed back to build time.
So even though the config files are a lot smaller than postgres.bki,
maybe there's still a point in polishing replace_token a bit.

Agreed. postgresql.conf.sample isn't that small either and I can't see that
moving to the backend.

When I'd looked at this last I apparently (looking at git stash, I ended up
discarding this) decided that the best way would be to change replace_token's
API to one that just processes one line at a time, with an outer loop that
processes all tokens in a line. I'm not really sure why anymore.

Definitely not finished and contains a lot of timing cruft... But it does
markedly reduce the memory usage, despite only converting bootstrap.bki
replacement:

without REPLACE_FAST:
==2217045== HEAP SUMMARY:
==2217045== in use at exit: 894,012 bytes in 119 blocks
==2217045== total heap usage: 26,284 allocs, 26,165 frees, 3,127,341 bytes allocated
==2217045==
==2217045== LEAK SUMMARY:
==2217045== definitely lost: 887,136 bytes in 46 blocks
==2217045== indirectly lost: 4,453 bytes in 50 blocks
==2217045== possibly lost: 0 bytes in 0 blocks
==2217045== still reachable: 2,423 bytes in 23 blocks
==2217045== suppressed: 0 bytes in 0 blocks
==2217045== Rerun with --leak-check=full to see details of leaked memory

with:
==2215712== HEAP SUMMARY:
==2215712== in use at exit: 120,490 bytes in 90 blocks
==2215712== total heap usage: 26,257 allocs, 26,167 frees, 2,393,822 bytes allocated
==2215712==
==2215712== LEAK SUMMARY:
==2215712== definitely lost: 116,096 bytes in 38 blocks
==2215712== indirectly lost: 1,971 bytes in 29 blocks
==2215712== possibly lost: 0 bytes in 0 blocks
==2215712== still reachable: 2,423 bytes in 23 blocks
==2215712== suppressed: 0 bytes in 0 blocks
==2215712== Rerun with --leak-check=full to see details of leaked memory

I think I was a bit underwhelmed by the wins on the runtime side on linux. A
profile without children looks quite different before / after. But initdb's
own cpu time is just such a small part of the overall runtime...

profile before:

-   46.91%     0.00%  initdb   initdb                [.] main
   - main
      - 42.99% initialize_data_directory
         - 25.45% bootstrap_template1
            + 10.59% replace_token
            + 7.98% readfile
            + 3.47% __GI___libc_free (inlined)
            + 2.63% __GI__IO_fputs (inlined)
            + 0.14% pclose_check
            + 0.14% progress
profile after:
-   42.17%     0.00%  initdb   initdb                [.] main
   - main
      - 37.98% initialize_data_directory
         - 20.32% bootstrap_template1
            + 9.11% readfile
            + 6.37% replace_one_token
            + 3.16% __GI__IO_fputs (inlined)
            + 0.79% __GI___libc_free (inlined)
            + 0.17% check_ok
            + 0.14% progress
            + 0.10% pclose_check

after both the biggest remaining cpu user is the strstr().

Both profiles are with the change included in the patch to avoid fflush()ing
for each line. That's noticeable as it turns out.

Anyway, I think this angle is kind of moot if we move this stuff to the
backend, and likely to don't perform replacement on any of the big files.

Greetings,

Andres Freund

Attachments:

initdb-stuff.difftext/x-diff; charset=us-asciiDownload+181-3
#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#9)
Re: Report a potential memory leak in setup_config()

Andres Freund <andres@anarazel.de> writes:

When I'd looked at this last I apparently (looking at git stash, I ended up
discarding this) decided that the best way would be to change replace_token's
API to one that just processes one line at a time, with an outer loop that
processes all tokens in a line. I'm not really sure why anymore.

Hmm, I did it the other way, as attached. This gets it down to

==3254266== LEAK SUMMARY:
==3254266== definitely lost: 342 bytes in 17 blocks
==3254266== indirectly lost: 152 bytes in 2 blocks
==3254266== possibly lost: 0 bytes in 0 blocks
==3254266== still reachable: 2,403 bytes in 22 blocks
==3254266== suppressed: 0 bytes in 0 blocks

It seems that actually the pointer arrays *are* a big chunk of
the leakage, because the individual strings get freed in the
output loops! That's a bit ugly but I feel no need to change it.

regards, tom lane

Attachments:

alternate-initdb-cleanup.patchtext/x-diff; charset=us-ascii; name=alternate-initdb-cleanup.patchDownload+70-78
#11Noname
wliang@stu.xidian.edu.cn
In reply to: Tom Lane (#10)
Re: Report a potential memory leak in setup_config()

Hi,
I have read all your discussion.
Thank you so much for your steady effort on this problem.
I would like to know the name of the software which is used to calculate the size of leaked memory.
It could be a great help to me to have a powerful tool like that.

Thanks again.

regards,
Wentao

#12Noname
wliang@stu.xidian.edu.cn
In reply to: Noname (#1)
Re: Report a potential memory leak in setup_config()

By the way,

This html files are the bug reports produced by static detection of clang's MallocChecker.

All of those bugs are the memory leak bugs in the initdb.c.

Some of them shares the same reason to the one I reported yesterday, while some of them are not.

Since you guys have already got the patch and fixed the problem, I don't know if it is too late to send this email.

However, if you can confirm those bugs really exist, it will be a great help to my study.

Once again, I want to give you my sincerely thanks.

regards,

Wentao

Attachments:

report-0c40f6.htmltext/html; name=report-0c40f6.htmlDownload
report-4ded84.htmltext/html; name=report-4ded84.htmlDownload
report-55f328.htmltext/html; name=report-55f328.htmlDownload
report-a28ac0.htmltext/html; name=report-a28ac0.htmlDownload
report-da7c7f.htmltext/html; name=report-da7c7f.htmlDownload
report-e80982.htmltext/html; name=report-e80982.htmlDownload
#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Noname (#11)
Re: Report a potential memory leak in setup_config()

wliang@stu.xidian.edu.cn writes:

I would like to know the name of the software which is used to calculate the size of leaked memory.

I was using valgrind, and I think Andres was too:

https://valgrind.org

regards, tom lane

#14Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#10)
Re: Report a potential memory leak in setup_config()

Hi,

On 2022-02-15 23:30:07 -0500, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

When I'd looked at this last I apparently (looking at git stash, I ended up
discarding this) decided that the best way would be to change replace_token's
API to one that just processes one line at a time, with an outer loop that
processes all tokens in a line. I'm not really sure why anymore.

Hmm, I did it the other way, as attached.

My goal when I did this was to improve the performance, rather than reduce the
memory usage (this was a few months back). It's clearly better to perform all
the replacements of a line soon after each other, rather than iterate over all
the lines, once for each replacement. The latter is guaranteed to not have the
data in L2/L1 anymore.

But if we move to not needing replacement for postgres.bki anymore, that's not
an important goal anymore.

Not that it matters anymore: At least for postgres.bki, it doesn't seem to
make sense to use a line based approach in the first place? Seems like it
should really be a processing the input file as a whole, doing all the
replacements, into a resizable output buffer.

I didn't review it in detail, but I think your approach makes sense.

This gets it down to

==3254266== LEAK SUMMARY:
==3254266== definitely lost: 342 bytes in 17 blocks
==3254266== indirectly lost: 152 bytes in 2 blocks
==3254266== possibly lost: 0 bytes in 0 blocks
==3254266== still reachable: 2,403 bytes in 22 blocks
==3254266== suppressed: 0 bytes in 0 blocks

That's presumably not about the approach, but about doing it for all
replacements, rather than just bootstrap, like I did :)

It seems that actually the pointer arrays *are* a big chunk of
the leakage, because the individual strings get freed in the
output loops!

Right, isn't that what I had said?

Greetings,

Andres Freund

#15Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#14)
Re: Report a potential memory leak in setup_config()

Andres Freund <andres@anarazel.de> writes:

On 2022-02-15 23:30:07 -0500, Tom Lane wrote:

Hmm, I did it the other way, as attached.

My goal when I did this was to improve the performance, rather than reduce the
memory usage (this was a few months back). It's clearly better to perform all
the replacements of a line soon after each other, rather than iterate over all
the lines, once for each replacement. The latter is guaranteed to not have the
data in L2/L1 anymore.

I'm skeptical that that effect is large enough to be interesting ...

But if we move to not needing replacement for postgres.bki anymore, that's not
an important goal anymore.

... and as you say, it won't matter if we push this processing to
the backend. So I'd prefer to go with the less-invasive change.

Not that it matters anymore: At least for postgres.bki, it doesn't seem to
make sense to use a line based approach in the first place? Seems like it
should really be a processing the input file as a whole, doing all the
replacements, into a resizable output buffer.

If we push this to the backend then it'd all have to be rethought.
I kind of like Peter's idea that maybe the run-time-variable items
could be handled by doing UPDATEs after the initial bootstrap.
(Very far down the road, maybe that would put us in a position where
the bootstrap phase could be replaced by copying some prefab data files.)

It seems that actually the pointer arrays *are* a big chunk of
the leakage, because the individual strings get freed in the
output loops!

Right, isn't that what I had said?

I hadn't absorbed the existence of the per-string free() calls;
the fact that the reported leakage was close to the size of
postgres.bki misled me into thinking we were leaking the
strings too.

regards, tom lane