Unlimited memory consumption with long-lived connection
If I execute the attached Python script against a PostgreSQL 15.2 (Ubuntu
15.2-1.pgdg22.10+1) server with the default configuration (e.g. shared_buffers =
128MB), then the server memory usage goes up and up, apparently endlessly. After
about 10 minutes (on my laptop) pg_top shows RES memory usage for the back-end
handling the connection as greater than 1 gigabyte, which seems far too high
given the server configuration. The script just performs the same SELECT
endlessly in a loop. The memory is released when the script is killed.
Platform: Ubuntu 22.10; Linux version 5.19.0-31-generic; x86-64.
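To put numbers on the growth rather than eyeballing pg_top, the backend's resident set can be polled from /proc on Linux. A minimal sketch; the function name is mine, not from the attached script, and the backend PID would in practice come from pg_backend_pid() in the leaking session:

```python
import os

def rss_kb(pid: int) -> int:
    """Return the VmRSS of a process in kB, read from /proc/<pid>/status."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                # Line looks like: "VmRSS:     123456 kB"
                return int(line.split()[1])
    raise ValueError(f"no VmRSS line for pid {pid}")

# Example: watch our own process; for the bug, substitute the backend PID
# and sample in a loop to see whether RES keeps climbing.
print(rss_kb(os.getpid()))
```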
PS: The testcase was reduced from a script that kept a connection open for a
long time in order to LISTEN, and would execute a query using the same
connection every time there was a notification on the channel. It consumed ever
more memory to the point of crashing the postgresql server. Changing the script
to perform the query using a new short-lived connection was an effective workaround.
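The workaround in the PS (a fresh short-lived connection per query instead of reusing the LISTENing one) can be sketched as follows. Here `connect` stands for any DB-API connection factory such as psycopg2.connect, and all names are illustrative, not taken from the attached script:

```python
def run_query_fresh(connect, sql, params=None):
    """Execute one query on a throwaway connection.

    Closing the connection ends the backend process, so any memory the
    backend has accumulated (e.g. from JIT inlining) is returned to the OS.
    """
    conn = connect()
    try:
        cur = conn.cursor()
        cur.execute(sql, params or ())
        return cur.fetchall()
    finally:
        conn.close()
```

The notification loop then keeps its own long-lived connection only for LISTEN, and calls a helper like this for each query triggered by a notification.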
Attachments:
mem.py (text/x-python; charset=UTF-8)
On 21/02/2023 14:57, Duncan Sands wrote:
> If I execute the attached python script against a postgresql 15.2 server
> [...] the server memory usage goes up and up, apparently endlessly. After
> about 10 minutes pg_top shows RES memory usage for the back-end handling
> the connection as greater than 1 gigabyte. [...]
> PS: The testcase was reduced from a script that kept a connection open for
> a long time in order to LISTEN [...] Changing the script to perform the
> query using a new short-lived connection was an effective workaround.
I can reproduce this on my laptop. When I set 'jit=off', or disable JIT
inlining with 'jit_inline_above_cost = -1', the leak stops, or at least
becomes small enough that I no longer readily see it with 'top'. So it
seems to be related to LLVM inlining. I'm using LLVM and clang 14.
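For anyone needing to mitigate this before a fix lands, the two settings mentioned above can be applied per session or server-wide; a config sketch:

```
-- Per session (or via ALTER SYSTEM / postgresql.conf):
SET jit = off;                    -- disable JIT entirely
-- Or keep JIT but skip only the leak-prone inlining step:
SET jit_inline_above_cost = -1;
```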
To track down the leak, I started postgres server with Valgrind, and let
the script run for 40 iterations with Valgrind. It did report some leaks
from LLVM inline functions (attached), but I'm not very familiar with
this code so I'm not sure which ones might be genuine leaks or how to
fix them.
- Heikki
Attachments:
valgrind-inline-leak.txt (text/plain; charset=UTF-8)
On Tue, Feb 21, 2023 at 04:32:02PM +0200, Heikki Linnakangas wrote:
> On 21/02/2023 14:57, Duncan Sands wrote:
> > If I execute the attached python script against a postgresql 15.2 server
> > [...] the server memory usage goes up and up, apparently endlessly. [...]
> > Changing the script to perform the query using a new short-lived
> > connection was an effective workaround.
It sounds like the same issue as the one here:
https://wiki.postgresql.org/wiki/PostgreSQL_13_Open_Items
There are patches proposed here, which fixed the issue for me.
But I've always been suspicious that there may be a 2nd, undiagnosed
issue lurking behind this one...
/messages/by-id/20210417021602.7dilihkdc7oblrf7@alap3.anarazel.de
Hi Justin,
> > PS: The testcase was reduced from a script that kept a connection open
> > for a long time in order to LISTEN [...] Changing the script to perform
> > the query using a new short-lived connection was an effective workaround.
>
> It sounds like the same issue as the one here:
> https://wiki.postgresql.org/wiki/PostgreSQL_13_Open_Items
I agree that it is likely the same issue.
> There are patches proposed here, which fixed the issue for me.
> But I've always been suspicious that there may be a 2nd, undiagnosed
> issue lurking behind this one...
> /messages/by-id/20210417021602.7dilihkdc7oblrf7@alap3.anarazel.de
Thanks. Since those patches are just a workaround, and I already worked around
it, I don't plan to give them a whirl. On the other hand, if Andres ever gets
round to implementing approach (1) from
/messages/by-id/20210417021602.7dilihkdc7oblrf7@alap3.anarazel.de
then I'd be happy to give it a whirl. I'm rusty now but I used to be quite
expert in LLVM, so I might even be able to do some patch review.
Best wishes, Duncan.
Hi,
On 2023-02-22 08:53:41 -0600, Justin Pryzby wrote:
> On Tue, Feb 21, 2023 at 04:32:02PM +0200, Heikki Linnakangas wrote:
> > On 21/02/2023 14:57, Duncan Sands wrote:
> > > If I execute the attached python script against a postgresql 15.2 server
> > > [...] the server memory usage goes up and up, apparently endlessly. [...]
> > > Changing the script to perform the query using a new short-lived
> > > connection was an effective workaround.
>
> It sounds like the same issue as the one here:
> https://wiki.postgresql.org/wiki/PostgreSQL_13_Open_Items
>
> There are patches proposed here, which fixed the issue for me.
> But I've always been suspicious that there may be a 2nd, undiagnosed
> issue lurking behind this one...
Any more details?
> /messages/by-id/20210417021602.7dilihkdc7oblrf7@alap3.anarazel.de
A slightly edited / rebased version is at /messages/by-id/20221101055132.pjjsvlkeo4stbjkq@awork3.anarazel.de
I'd feel a lot better applying the patch if there'd be a bit more review. If
none is forthcoming, I'm somewhat inclined to just apply it to HEAD and later
decide whether we'd want to backpatch.
Greetings,
Andres Freund
On Wed, Feb 22, 2023 at 10:28:11AM -0800, Andres Freund wrote:
> On 2023-02-22 08:53:41 -0600, Justin Pryzby wrote:
> > There are patches proposed here, which fixed the issue for me.
> > But I've always been suspicious that there may be a 2nd, undiagnosed
> > issue lurking behind this one...
>
> Any more details?
No. I suspect this because the process is killed by OOM roughly every
24h, and the leak didn't seem to be big enough to cause that.
Maybe the leak is nonlinear, or not completely deterministic, or something
sometimes exacerbates the original issue; or maybe there's only one
issue that's 90% but not 100% understood; or maybe there's a 2nd issue
that stays latent once the fix for the known issue is applied.
There's also this report:
/messages/by-id/1605804350439-0.post@n3.nabble.com
...which offers no details or response, and could easily be a leak in
something other than JIT or postgres, or not a leak at all.
--
Justin
On Wed, Feb 22, 2023 at 10:28:11AM -0800, Andres Freund wrote:
> I'd feel a lot better applying the patch if there'd be a bit more review. If
> none is forthcoming, I'm somewhat inclined to just apply it to HEAD and later
> decide whether we'd want to backpatch.
Previously I ran the patches in production. Not sure what else I could
do, but it seems important to do something.
--
Justin