Fixing memory leaks in postgres_fdw
Running contrib/postgres_fdw's regression tests under Valgrind shows
two different sources of memory leaks. One is a basically-cosmetic
issue in InitPgFdwOptions, but the other is real and troublesome.
The DirectModify code path relies on PG_TRY blocks to ensure that it
releases the PGresult for the foreign modify command, but that can't
work because (at least in cases with RETURNING data) the PGresult has
to survive across successive calls to postgresIterateDirectModify.
If an error occurs in the query in between those steps, we have no
opportunity to clean up.
I thought of fixing this by using a memory context reset callback
to ensure that the PGresult is cleaned up when the executor's context
goes away, and that seems to work nicely (see 0001 attached).
However, I feel like this is just a POC, because now that we have that
concept we might be able to use it elsewhere in postgres_fdw to
eliminate most or even all of its reliance on PG_TRY. That should be
faster as well as much less bug-prone. But I'm putting it up at this
stage for comments, in case anyone thinks this is not the direction to
head in.
0002 attached deals with the other thing. If you apply these
on top of my valgrind-cleanup patches at [1]/messages/by-id/2884224.1748035274@sss.pgh.pa.us, you'll see that
contrib/postgres_fdw's tests go through leak-free.
regards, tom lane
Attachments:
v1-0001-Fix-memory-leakage-in-postgres_fdw-s-DirectModify.patchtext/x-diff; charset=us-ascii; name*0=v1-0001-Fix-memory-leakage-in-postgres_fdw-s-DirectModify.p; name*1=atchDownload+64-31
v1-0002-Silence-leakage-complaint-about-postgres_fdw-s-In.patchtext/x-diff; charset=us-ascii; name*0=v1-0002-Silence-leakage-complaint-about-postgres_fdw-s-In.p; name*1=atchDownload+12-22
Hi,
On Sat, May 24, 2025 at 10:10 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
The DirectModify code path relies on PG_TRY blocks to ensure that it
releases the PGresult for the foreign modify command, but that can't
work because (at least in cases with RETURNING data) the PGresult has
to survive across successive calls to postgresIterateDirectModify.
If an error occurs in the query in between those steps, we have no
opportunity to clean up.
Ugh.
I thought of fixing this by using a memory context reset callback
to ensure that the PGresult is cleaned up when the executor's context
goes away, and that seems to work nicely (see 0001 attached).
However, I feel like this is just a POC, because now that we have that
concept we might be able to use it elsewhere in postgres_fdw to
eliminate most or even all of its reliance on PG_TRY. That should be
faster as well as much less bug-prone. But I'm putting it up at this
stage for comments, in case anyone thinks this is not the direction to
head in.
I think that that is a good idea; +1 for removing the reliance not
only in DirectModify but in other places. I think that that would be
also useful if extending batch INSERT to cases with RETURNING data in
postgres_fdw.
Thanks for working on this!
Best regards,
Etsuro Fujita
Etsuro Fujita <etsuro.fujita@gmail.com> writes:
On Sat, May 24, 2025 at 10:10 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I thought of fixing this by using a memory context reset callback
to ensure that the PGresult is cleaned up when the executor's context
goes away, and that seems to work nicely (see 0001 attached).
However, I feel like this is just a POC, because now that we have that
concept we might be able to use it elsewhere in postgres_fdw to
eliminate most or even all of its reliance on PG_TRY. That should be
faster as well as much less bug-prone. But I'm putting it up at this
stage for comments, in case anyone thinks this is not the direction to
head in.
I think that that is a good idea; +1 for removing the reliance not
only in DirectModify but in other places. I think that that would be
also useful if extending batch INSERT to cases with RETURNING data in
postgres_fdw.
Here is an attempt at making a bulletproof fix by having all backend
users of libpq go through a wrapper layer that provides the memory
context callback. Perhaps this is more code churn than we want to
accept; I'm not sure. I thought about avoiding most of the niggling
code changes by adding
#define PGresult BEPGresult
#define PQclear BEPQclear
#define PQresultStatus BEPQresultStatus
and so forth at the bottom of the new header file, but I'm afraid
that would create a lot of confusion.
There is a lot yet to do towards getting rid of no-longer-needed
PG_TRYs and other complication, but I decided to stop here pending
comments on the notational decisions I made.
One point that people might find particularly dubious is that
I put the new stuff into a new header file libpq-be-fe.h, rather
than adding it to libpq-be-fe-helpers.h which would seem more
obvious. The reason for that is the code layout in postgres_fdw.
postgres_fdw.h needs to include libpq-fe.h to get the PGresult
typedef, and with these changes it instead needs to get BEPGresult.
But only connection.c currently includes libpq-be-fe-helpers.h,
and I didn't like the idea of making all of postgres_fdw's .c
files include that. Maybe that's not worth worrying about though.
The 0002 patch is the same as before.
regards, tom lane
Attachments:
v2-0001-Fix-memory-leakage-in-postgres_fdw-s-DirectModify.patchtext/x-diff; charset=us-ascii; name*0=v2-0001-Fix-memory-leakage-in-postgres_fdw-s-DirectModify.p; name*1=atchDownload+531-337
v2-0002-Silence-leakage-complaint-about-postgres_fdw-s-In.patchtext/x-diff; charset=us-ascii; name*0=v2-0002-Silence-leakage-complaint-about-postgres_fdw-s-In.p; name*1=atchDownload+12-22
I wrote:
Here is an attempt at making a bulletproof fix by having all backend
users of libpq go through a wrapper layer that provides the memory
context callback. Perhaps this is more code churn than we want to
accept; I'm not sure. I thought about avoiding most of the niggling
code changes by adding
#define PGresult BEPGresult
#define PQclear BEPQclear
#define PQresultStatus BEPQresultStatus
and so forth at the bottom of the new header file, but I'm afraid
that would create a lot of confusion.
I tried that, and it leads to such a less-messy patch that I think
we should probably do it this way, confusion or no. One argument
that can be made in favor is that we don't really want random
notational differences between frontend and backend code that's
doing the same thing.
Also, I'd been struggling with the assumption that we should
palloc the wrapper object before calling PQgetResult; there
doesn't seem to be any nice way to make that transparent to
callers. I realized that we can make it simple as long as
we're willing to assume that allocating with MCXT_ALLOC_NO_OOM
can't throw an error. But we assume that in other usages too.
Hence, v3 attached. The 0002 patch is still the same as before.
regards, tom lane
Attachments:
v3-0001-Fix-memory-leakage-in-postgres_fdw-s-DirectModify.patchtext/x-diff; charset=us-ascii; name*0=v3-0001-Fix-memory-leakage-in-postgres_fdw-s-DirectModify.p; name*1=atchDownload+327-86
v3-0002-Silence-leakage-complaint-about-postgres_fdw-s-In.patchtext/x-diff; charset=us-ascii; name*0=v3-0002-Silence-leakage-complaint-about-postgres_fdw-s-In.p; name*1=atchDownload+12-22
Here's a v4 that is actually more or less feature-complete:
it removes no-longer-needed complexity such as PG_TRY blocks.
I've checked that Valgrind shows no leaks in the postgres_fdw
and dblink tests after applying this on top of my other
patch series.
0001 is like the previous version except that I took out some
inessential simplifications to get to the minimum possible
patch. Then 0002 does all the simplifications. Removal of
PG_TRY blocks implies reindenting a lot of code, but I made
that a separate patch 0003 for ease of review. (0003 would
be a candidate for adding to .git-blame-ignore-revs, perhaps.)
0004 is the old 0002 (still unmodified) and then 0005 cleans
up one remaining leakage observed by Valgrind.
regards, tom lane
Attachments:
v4-0001-Fix-memory-leakage-in-postgres_fdw-s-DirectModify.patchtext/x-diff; charset=us-ascii; name*0=v4-0001-Fix-memory-leakage-in-postgres_fdw-s-DirectModify.p; name*1=atchDownload+307-16
v4-0002-Reap-the-benefits-of-not-having-to-avoid-leaking-.patchtext/x-diff; charset=us-ascii; name*0=v4-0002-Reap-the-benefits-of-not-having-to-avoid-leaking-.p; name*1=atchDownload+83-291
v4-0003-Run-pgindent-on-the-changes-of-the-previous-patch.patchtext/x-diff; charset=us-ascii; name*0=v4-0003-Run-pgindent-on-the-changes-of-the-previous-patch.p; name*1=atchDownload+562-566
v4-0004-Silence-leakage-complaint-about-postgres_fdw-s-In.patchtext/x-diff; charset=us-ascii; name*0=v4-0004-Silence-leakage-complaint-about-postgres_fdw-s-In.p; name*1=atchDownload+12-22
v4-0005-Avoid-leak-when-dblink_connstr_check-fails.patchtext/x-diff; charset=us-ascii; name=v4-0005-Avoid-leak-when-dblink_connstr_check-fails.patchDownload+10-10
Hi,
On 26/05/25 16:36, Tom Lane wrote:
Here's a v4 that is actually more or less feature-complete:
it removes no-longer-needed complexity such as PG_TRY blocks.
I've checked that Valgrind shows no leaks in the postgres_fdw
and dblink tests after applying this on top of my other
patch series.0001 is like the previous version except that I took out some
inessential simplifications to get to the minimum possible
patch. Then 0002 does all the simplifications. Removal of
PG_TRY blocks implies reindenting a lot of code, but I made
that a separate patch 0003 for ease of review. (0003 would
be a candidate for adding to .git-blame-ignore-revs, perhaps.)
0004 is the old 0002 (still unmodified) and then 0005 cleans
up one remaining leakage observed by Valgrind.regards, tom lane
The v4-0001-Fix-memory-leakage-in-postgres_fdw-s-DirectModify.patch
looks good to me.
Just some thoughts on v4-0005-Avoid-leak-when-dblink_connstr_check-fails.patch:
I think that we can delay the allocation a bit more. The
dblink_security_check just use the rconn to pfree in case of a failure,
so I think that we can remove this parameter and move the rconn
allocation to the next if (connname) block. See attached as an example.
--
Matheus Alcantara
Attachments:
delay-rconn-allocation.diffapplication/octet-stream; name=delay-rconn-allocation.diffDownload+11-18
Matheus Alcantara <matheusssilv97@gmail.com> writes:
I think that we can delay the allocation a bit more. The
dblink_security_check just use the rconn to pfree in case of a failure,
so I think that we can remove this parameter and move the rconn
allocation to the next if (connname) block. See attached as an example.
Yeah, I thought of that too. But I think the point of the current
setup is to ensure we have the rconn block before we create the PGconn
object, because if we were to hit OOM after creating the connection,
we'd leak the PGconn, which would be quite bad.
Having said that, the idea that this sequence is OOM-safe is pretty
silly anyway, considering that createNewConnection does a pstrdup,
and creates a new hashtable entry which might require enlarging the
hashtable, and for that matter might even create the hashtable.
So maybe rather than continuing to adhere to a half-baked coding
rule, we need to think of some other way to do that. Maybe it'd be
reasonable to create a hashtable entry with NULL rconn, and then
open the connection? This'd require rejiggering the lookup
code to treat a hashtable entry with NULL rconn as not really
being there. But there's not too many routines touching that
hashtable, so maybe it's not hard.
regards, tom lane
I wrote:
Having said that, the idea that this sequence is OOM-safe is pretty
silly anyway, considering that createNewConnection does a pstrdup,
and creates a new hashtable entry which might require enlarging the
hashtable, and for that matter might even create the hashtable.
So maybe rather than continuing to adhere to a half-baked coding
rule, we need to think of some other way to do that.
Here's an attempt at fixing this properly. I'm posting it as a
standalone patch because I now think this part might be worth
back-patching. The odds of an OOM at just the wrong time aren't
high, but losing track of an open connection seems pretty bad.
regards, tom lane
Attachments:
v5-0001-Avoid-resource-leaks-when-a-dblink-connection-fai.patchtext/x-diff; charset=us-ascii; name*0=v5-0001-Avoid-resource-leaks-when-a-dblink-connection-fai.p; name*1=atchDownload+42-37
On 28/05/25 17:56, Tom Lane wrote:
I wrote:
Having said that, the idea that this sequence is OOM-safe is pretty
silly anyway, considering that createNewConnection does a pstrdup,
and creates a new hashtable entry which might require enlarging the
hashtable, and for that matter might even create the hashtable.
So maybe rather than continuing to adhere to a half-baked coding
rule, we need to think of some other way to do that.Here's an attempt at fixing this properly. I'm posting it as a
standalone patch because I now think this part might be worth
back-patching. The odds of an OOM at just the wrong time aren't
high, but losing track of an open connection seems pretty bad.
The v5-0001 makes sense to me. I think that now it follows a similar
approach with postgres_fdw that first create the hash entry and them set
the connection on the hash entry.
I've also reviewed the remaining patches, v4-0002, v4-0003 and v4-0004
and it all looks reasonable to me. +1 for going forward with these
patches.
The only point is that I've just tried to apply the v5-0001 on top of
the previous v4-000X patches and is raising an error:
❯❯❯ git am v5-0001-Avoid-resource-leaks-when-a-dblink-connection-fai.patch
Applying: Avoid resource leaks when a dblink connection fails.
error: patch failed: contrib/dblink/dblink.c:105
error: contrib/dblink/dblink.c: patch does not apply
Patch failed at 0001 Avoid resource leaks when a dblink connection fails.
--
Matheus Alcantara
Matheus Alcantara <matheusssilv97@gmail.com> writes:
The only point is that I've just tried to apply the v5-0001 on top of
the previous v4-000X patches and is raising an error:
git am v5-0001-Avoid-resource-leaks-when-a-dblink-connection-fai.patch
Applying: Avoid resource leaks when a dblink connection fails.
error: patch failed: contrib/dblink/dblink.c:105
error: contrib/dblink/dblink.c: patch does not apply
Patch failed at 0001 Avoid resource leaks when a dblink connection fails.
Yeah, it's not intended to be done in that order: the v5-0001 patch is
an independent thing. I anticipate I'll have to rebase the other
patches after I push v5-0001.
Thanks for looking at it!
regards, tom lane
I wrote:
Yeah, it's not intended to be done in that order: the v5-0001 patch is
an independent thing. I anticipate I'll have to rebase the other
patches after I push v5-0001.
Pushed v5-0001, and here are rebased versions of the other four
patches, mostly so that the cfbot knows what is the patch-of-record.
(The rebasing is completely trivial; I'm surprised that "git am"
fails to cope.)
regards, tom lane
Attachments:
v6-0001-Fix-memory-leakage-in-postgres_fdw-s-DirectModify.patchtext/x-diff; charset=us-ascii; name*0=v6-0001-Fix-memory-leakage-in-postgres_fdw-s-DirectModify.p; name*1=atchDownload+307-16
v6-0002-Reap-the-benefits-of-not-having-to-avoid-leaking-.patchtext/x-diff; charset=us-ascii; name*0=v6-0002-Reap-the-benefits-of-not-having-to-avoid-leaking-.p; name*1=atchDownload+83-291
v6-0003-Run-pgindent-on-the-changes-of-the-previous-patch.patchtext/x-diff; charset=us-ascii; name*0=v6-0003-Run-pgindent-on-the-changes-of-the-previous-patch.p; name*1=atchDownload+562-566
v6-0004-Silence-leakage-complaint-about-postgres_fdw-s-In.patchtext/x-diff; charset=us-ascii; name*0=v6-0004-Silence-leakage-complaint-about-postgres_fdw-s-In.p; name*1=atchDownload+12-22
I wrote:
Pushed v5-0001, and here are rebased versions of the other four
patches, mostly so that the cfbot knows what is the patch-of-record.
Finally, here's a minimalistic version of the original v1-0001
patch that I think we could safely apply to fix the DirectModify
problem in the back branches. I rejiggered it to not depend on
inventing MemoryContextUnregisterResetCallback, so that there's
not hazards of minor-version skew between postgres_fdw and the
main backend. This will of course not fix any other PGresult-leakage
cases that may exist, but I'm content to fix the known problem
in back branches.
(Patch is labeled .txt so that cfbot doesn't think it's the
patch-of-record.)
regards, tom lane
Attachments:
v7-backpatchable-fix.patch.txttext/x-diff; charset=us-ascii; name=v7-backpatchable-fix.patch.txtDownload+35-27
On Thu, May 29, 2025 at 2:02 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I wrote:
Pushed v5-0001, and here are rebased versions of the other four
patches, mostly so that the cfbot knows what is the patch-of-record.Finally, here's a minimalistic version of the original v1-0001
patch that I think we could safely apply to fix the DirectModify
problem in the back branches. I rejiggered it to not depend on
inventing MemoryContextUnregisterResetCallback, so that there's
not hazards of minor-version skew between postgres_fdw and the
main backend. This will of course not fix any other PGresult-leakage
cases that may exist, but I'm content to fix the known problem
in back branches.(Patch is labeled .txt so that cfbot doesn't think it's the
patch-of-record.)
Sounds reasonable to me. +1 for going forward with these patches.
--
Matheus Alcantara
Matheus Alcantara <matheusssilv97@gmail.com> writes:
Sounds reasonable to me. +1 for going forward with these patches.
I got cold feet about applying the full patchset to v18 --- it's
kind of a large change and it's not fixing any known bug that the
minimal patch doesn't, so it feels like something not to do after
beta1. So I pushed the minimal patch in all branches. Here is
a rebased-on-top-of-that version of the full patchset, which
I plan to push once v19 development opens.
regards, tom lane
Attachments:
v8-0001-Create-infrastructure-to-reliably-prevent-leakage.patchtext/x-diff; charset=us-ascii; name*0=v8-0001-Create-infrastructure-to-reliably-prevent-leakage.p; name*1=atchDownload+309-43
v8-0002-Reap-the-benefits-of-not-having-to-avoid-leaking-.patchtext/x-diff; charset=us-ascii; name*0=v8-0002-Reap-the-benefits-of-not-having-to-avoid-leaking-.p; name*1=atchDownload+83-275
v8-0003-Run-pgindent-on-the-changes-of-the-previous-patch.patchtext/x-diff; charset=us-ascii; name*0=v8-0003-Run-pgindent-on-the-changes-of-the-previous-patch.p; name*1=atchDownload+553-557
v8-0004-Silence-leakage-complaint-about-postgres_fdw-s-In.patchtext/x-diff; charset=us-ascii; name*0=v8-0004-Silence-leakage-complaint-about-postgres_fdw-s-In.p; name*1=atchDownload+12-22
On Fri, May 30, 2025 at 2:02 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Finally, here's a minimalistic version of the original v1-0001
patch that I think we could safely apply to fix the DirectModify
problem in the back branches. I rejiggered it to not depend on
inventing MemoryContextUnregisterResetCallback, so that there's
not hazards of minor-version skew between postgres_fdw and the
main backend.
Seems reasonable.
Thanks for updating the patch (and pushing it in all supported versions)!
Best regards,
Etsuro Fujita
Etsuro Fujita <etsuro.fujita@gmail.com> writes:
On Fri, May 30, 2025 at 2:02 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Finally, here's a minimalistic version of the original v1-0001
patch that I think we could safely apply to fix the DirectModify
problem in the back branches. I rejiggered it to not depend on
inventing MemoryContextUnregisterResetCallback, so that there's
not hazards of minor-version skew between postgres_fdw and the
main backend.
Seems reasonable.
Pushed the larger patchset now. I had to do a little more work
to get it to play with 112faf137, but it wasn't hard.
regards, tom lane