Understanding Postgres Memory Usage
I have an application that uses Postgres 9.3 as the primary datastore.
Like any real-life application, it's not all roses: there are many ugly,
convoluted, and inefficient queries.
Some of these queries use quite a bit of memory. I've observed a
"high-water mark" behavior in memory usage: running a query increases the
worker memory by many MBs (beyond shared buffers), but the memory is not
released until the connection is closed. For example, here's the memory
usage on my test server when running a query once and leaving the
connection open.
$ free -h  # Before the query
             total       used       free     shared    buffers     cached
Mem:          7.8G       5.2G       2.6G       212M        90M       4.9G
-/+ buffers/cache:        248M       7.6G
Swap:           0B         0B         0B
$ free -h  # After the query
             total       used       free     shared    buffers     cached
Mem:          7.8G       5.3G       2.5G       212M        90M       4.9G
-/+ buffers/cache:        312M       7.5G
Swap:           0B         0B         0B
$ sudo -u postgres smem -trs uss
  PID User     Command                          Swap      USS      PSS      RSS
 8263 postgres postgres: postgres souschef         0   200204   203977   209540
 8133 postgres /usr/lib/postgresql/9.3/bin         0    50456    61090    74596
 8266 postgres /usr/bin/python /usr/bin/sm         0     5840     6261     7460
 8138 postgres postgres: autovacuum launch         0      776     1146     2968
 8139 postgres postgres: stats collector p         0      300      470     1872
 8135 postgres postgres: checkpointer proc         0      148      342     1880
 8137 postgres postgres: wal writer proces         0      140      322     1812
 8136 postgres postgres: writer process            0      132     6814    15140
-------------------------------------------------------------------------------
    8 1                                            0   257996   280422   315268
This is proving to be very troublesome on my production server because I
use connection pooling (so connections remain open indefinitely) and the
connection memory seems to rise without end, to the point where 25 open
connections OOM'd a 4GB server.
So I have a couple of questions: Is this high-water mark memory behavior
expected? If so, how can I identify the queries that are using lots of
memory and driving the high-water mark upwards?
I understand that this post is rather vague; I didn't want to talk your ear
off with specifics in case this is pretty basic, well-understood
behavior. If necessary, I can follow up with an email diving into the
specifics of what I'm observing.
— Theron
Are you talking about the buffers/cache increase? AFAIK that memory is used by the kernel as a cache in front of block devices (an HDD, for example).
Postgres does not use this memory directly; it simply reads data from the block device, and the kernel caches it. A process can't be OOM-killed because of it.
I am sure you should configure your Postgres to NEVER exceed available RAM. You can use a tool like pgtune (http://pgtune.leopard.in.ua/) or calculate it manually.
I don't remember the exact formula, but it should be something like "work_mem * max_connections + shared_buffers", and it should come to around 80% of your machine's RAM (minus the RAM used by other processes and the kernel).
That will save you from OOM.
If you hit a performance bottleneck after that, you can track it down with tools like log_min_duration_statement, track_io_timing, and system-provided tools.
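[Editor's note: Ilya's back-of-the-envelope formula can be sanity-checked with plain shell arithmetic. The values below (4MB work_mem, 100 connections, 512MB shared_buffers) are the ones Theron reports later in the thread; as noted downthread, one query can use several multiples of work_mem, so this is only a rough floor, not a hard ceiling.]

```shell
# Rough worst-case memory estimate per Ilya's formula:
#   work_mem * max_connections + shared_buffers
work_mem_mb=4          # work_mem, in MB
max_connections=100
shared_buffers_mb=512  # shared_buffers, in MB

ceiling_mb=$(( work_mem_mb * max_connections + shared_buffers_mb ))
echo "${ceiling_mb} MB"   # -> 912 MB
```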
Ilya Kazakevich
JetBrains
http://www.jetbrains.com
The Drive to Develop
Theron Luhn <theron@luhn.com> writes:
I have an application that uses Postgres 9.3 as the primary datastore.
9.3.which? We do fix memory leaks from time to time ...
Some of these queries use quite a bit of memory. I've observed a
"high-water mark" behavior in memory usage: running a query increases the
worker memory by many MBs (beyond shared buffers), but the memory is not
released until the connection is closed.
Hm. I'm not familiar with smem, but assuming that that USS column
really is process-private space, that definitely looks bad.
If it's not an outright leak, it's probably consumption of cache space.
We cache stuff that we've read from system catalogs, so sessions that
touch lots of tables (like thousands) can grow due to that. Another
possible source of large cache consumption is calling lots-and-lots of
plpgsql functions.
If the same query, repeated over and over, causes memory to continue
to grow, I'd call it a leak (ie bug). If repeat executions consume
no additional memory then it's probably intentional caching behavior.
regards, tom lane
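[Editor's note: Tom's repeat-the-query test can be scripted. This is only a sketch: the database name and backend PID are placeholders taken from the smem output above, and the psql loop is commented out because it assumes a live server; the rss_kb helper itself is just ps.]

```shell
# Sample the resident set size (in kB) of a process:
rss_kb() { ps -o rss= -p "$1" | tr -d ' '; }

# Demo on the current shell itself:
rss_kb $$

# Against a live backend (PID and query are placeholders), rerun the
# same query and watch whether RSS keeps climbing (leak) or plateaus
# (caching):
#   for i in $(seq 1 10); do
#     psql -d souschef -c 'SELECT ...your query...' >/dev/null
#     rss_kb 8263
#   done
```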
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Hi Ilya,
Are you talking about buffers/cache increased? AFAIK this memory is used
by kernel as buffer before any block device (HDD for example).
If I'm reading the output correctly, buffers/cached do not increase. I'm
looking at the 248MB -> 312MB under the "used" column in the "-/+
buffers/cache" row. This number excludes the buffer/cached, so that can't
explain the ~60MB increase. "Shared" also remains the same (212MB), so the
shared buffers filling can't explain the increase either.
I do not remember exact formula, but it should be something like
“work_mem*max_connections + shared_buffers” and it should be around 80% of
your machine RAM (minus RAM used by other processes and kernel). It will
save you from OOM.
My Postgres is configured with *very* conservative values. work_mem (4MB)
* max_connections (100) + shared buffers (512MB) = ~1GB, yet Postgres
managed to fill up a 4GB server. I'm seeing workers consuming hundreds of
MBs of memory (and not releasing any of it until the connection closes),
despite work_mem being 4MB.
— Theron
On Thu, Aug 25, 2016 at 8:57 AM, Ilya Kazakevich <
Ilya.Kazakevich@jetbrains.com> wrote:
9.3.which? We do fix memory leaks from time to time ...
9.3.14
If it's not an outright leak, it's probably consumption of cache space.
We cache stuff that we've read from system catalogs, so sessions that
touch lots of tables (like thousands) can grow due to that. Another
possible source of large cache consumption is calling lots-and-lots of
plpgsql functions.
I have a reasonable number of tables (around 50) and very few plpgsql
functions.
If the same query, repeated over and over, causes memory to continue
to grow, I'd call it a leak (ie bug). If repeat executions consume
no additional memory then it's probably intentional caching behavior.
Here are the results of that:
https://gist.github.com/luhn/e09522d524354d96d297b153d1479c13
So kind of a combination of the two: Memory usage increases up to a
certain point but then plateaus. So... cache? It's ~100MB increase,
though, which seems an excessive amount. What could be taking up that much
cache?
— Theron
On Thu, Aug 25, 2016 at 9:27 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Theron Luhn <theron@luhn.com> writes:
If it's not an outright leak, it's probably consumption of cache space.
We cache stuff that we've read from system catalogs, so sessions that
touch lots of tables (like thousands) can grow due to that. Another
possible source of large cache consumption is calling lots-and-lots of
plpgsql functions.
I have a reasonable number of tables (around 50) and very few plpgsql
functions.
Doesn't sound like a lot ...
If the same query, repeated over and over, causes memory to continue
to grow, I'd call it a leak (ie bug). If repeat executions consume
no additional memory then it's probably intentional caching behavior.
Here's the results of that:
https://gist.github.com/luhn/e09522d524354d96d297b153d1479c13
So kind of a combination of the two: Memory usage increases up to a
certain point but then plateaus. So... cache? It's ~100MB increase,
though, which seems an excessive amount. What could be taking up that much
cache?
Hmm. I find it mighty suspicious that the USS, PSS, and RSS numbers are
all increasing to pretty much the same tune, ie from very little to circa
100MB. I think there is a decent chance that smem is not doing what it
says on the tin, and in fact is including shared memory consumption in
"USS". In which case the apparent leak just corresponds to the process
gradually touching more and more of the shared buffer arena. (If your
shared_buffers setting is not somewhere near 100MB, then this theory
breaks down.)
It would be worth using plain old top to watch this process. We have
enough experience with that to be pretty sure how to interpret its
numbers: "RES minus SHR" is the value to be worried about.
regards, tom lane
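[Editor's note: on Linux, Tom's "RES minus SHR" figure can also be read straight from /proc, without a full-screen tool. A sketch; the RssFile/RssShmem fields need kernel 4.5+, and where they're absent this degrades to plain VmRSS.]

```shell
# Print a process's resident memory minus its shared portion, in kB.
# top's RES ~= VmRSS and SHR ~= RssFile + RssShmem, so the difference
# approximates the process-private (anonymous) resident memory.
private_rss_kb() {
  awk '/^VmRSS:/    {rss  = $2}
       /^RssFile:/  {file = $2}
       /^RssShmem:/ {shm  = $2}
       END {print rss - file - shm}' "/proc/$1/status"
}

private_rss_kb $$   # demo on the current shell
```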
On Thu, Aug 25, 2016 at 1:25 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Hmm. I find it mighty suspicious that the USS, PSS, and RSS numbers are
all increasing to pretty much the same tune, ie from very little to circa
100MB. I think there is a decent chance that smem is not doing what it
says on the tin, and in fact is including shared memory consumption in
"USS". In which case the apparent leak just corresponds to the process
gradually touching more and more of the shared buffer arena. (If your
shared_buffers settings is not somewhere near 100MB, then this theory
breaks down.)
I can't speak to every implementation of smem, but I have used it
quite a bit under SLES and Ubuntu, and it always seemed to do what
it says -- USS is unshared (process-local) memory and PSS is that
plus the process's portion of shared memory. (The sum of
differences between PSS and USS == total shared memory.) RSS has
the usual meaning.
--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
It would be worth using plain old top to watch this process. We have
enough experience with that to be pretty sure how to interpret its
numbers: "RES minus SHR" is the value to be worried about.
Sure thing. https://gist.github.com/luhn/e09522d524354d96d297b153d1479c13#file-top-txt
RES - SHR is showing a similar increase to what smem is reporting.
— Theron
On Thu, Aug 25, 2016 at 11:25 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Theron Luhn <theron@luhn.com> writes:
It would be worth using plain old top to watch this process. We have
enough experience with that to be pretty sure how to interpret its
numbers: "RES minus SHR" is the value to be worried about.
Sure thing. https://gist.github.com/luhn/e09522d524354d96d297b153d1479c13#file-top-txt
RES - SHR is showing a similar increase to what smem is reporting.
Hm, yeah, and the VIRT column agrees --- so 100MB of non-shared
memory went somewhere. Seems like a lot.
If you have debug symbols installed for this build, you could try
doing
gdb /path/to/postgres processID
gdb> call MemoryContextStats(TopMemoryContext)
gdb> quit
(when the process has reached an idle but bloated state) and seeing what
gets printed to the process's stderr. (You need to have launched the
postmaster with its stderr directed to a file, not to /dev/null.)
That would provide a better clue about what's eating space.
regards, tom lane
Okay, here's the output:
https://gist.github.com/luhn/a39db625ba5eed90946dd4a196d12220
— Theron
On Thu, Aug 25, 2016 at 12:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
On 8/25/2016 9:58 AM, Theron Luhn wrote:
I do not remember exact formula, but it should be something like
“work_mem*max_connections + shared_buffers” and it should be around
80% of your machine RAM (minus RAM used by other processes and
kernel). It will save you from OOM.
a single query can use multiple work_mem allocations if it's got
subqueries, joins, etc.
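[Editor's note: John's point is worth quantifying, since it explains how a 4MB work_mem can balloon. Each sort and hash node in a plan can claim its own work_mem; the plan shape below (two sorts, three hashes) is hypothetical.]

```shell
work_mem_mb=4   # work_mem is a per-sort/hash-node budget, not per query
sort_nodes=2    # hypothetical plan shape
hash_nodes=3

per_query_mb=$(( (sort_nodes + hash_nodes) * work_mem_mb ))
echo "${per_query_mb} MB"   # worst case for this one query -> 20 MB
```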
My Postgres is configured with *very* conservative values. work_mem
(4MB) * max_connections (100) + shared buffers (512MB) = ~1GB, yet
Postgres managed to fill up a 4GB server. I'm seeing workers
consuming hundreds of MBs of memory (and not releasing any of it until
the connection closes), despite work_mem being 4MB.
are you doing queries that return large data sets?
--
john r pierce, recycling bits in santa cruz
Theron Luhn <theron@luhn.com> writes:
Okay, here's the output:
https://gist.github.com/luhn/a39db625ba5eed90946dd4a196d12220
Hm, well the only thing there that looks even slightly out of the
ordinary is the amount of free space in TopMemoryContext itself:
TopMemoryContext: 3525712 total in 432 blocks; 3444272 free (12654 chunks); 81440 used
Normally, TopMemoryContext doesn't get to more than a few hundred K,
and in the cases I've seen where it does, it's usually been because of
leaky coding that was allocating stuff there and never cleaning it up.
But you've got no more than the typical amount of space still allocated
there, which seems to kill the "leak in TopMemoryContext" theory.
And in any case there is nowhere near 100MB accounted for by the whole
dump.
Are you using any other PLs besides plpgsql? We've seen cases where
bloat occurred within plpython or plperl, and wasn't visible in this
dump because those languages don't use PG's memory management code.
Or maybe some nonstandard extension?
If not that, then I'd have to speculate that the query you're running is
triggering some bug or otherwise pathological behavior. Can you put
together a self-contained test case?
regards, tom lane
Okay, I got a semi-reproducible test case:
https://gist.github.com/luhn/2b35a9b31255e3a6a2e6a06d1213dfc9
The one caveat is that the memory rise only happens when using a
HashAggregate query plan (included in the gist), which I can't find a way
to get Postgres to reliably use.
If you need it, I could probably find another test case.
— Theron
On Thu, Aug 25, 2016 at 5:27 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Theron Luhn <theron@luhn.com> writes:
Okay, I got a semi-reproducible test case:
https://gist.github.com/luhn/2b35a9b31255e3a6a2e6a06d1213dfc9
The one caveat is that the memory rise only happens when using a
HashAggregate query plan (included in the gist), which I can't find a way
to get Postgres to reliably use.
OK, I can reproduce some memory bloat in 9.3, but not in 9.5 and up.
I believe this was fixed by commit b419865a8, which reduced the overhead
of running a lot of instances of array_agg() concurrently in a HashAgg
plan. I think your options are to live with it or upgrade. Or I guess
you could turn off enable_hashagg when using array_agg() plus GROUP BY,
though you'd want to remember to undo that whenever you do upgrade.
regards, tom lane
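[Editor's note: Tom's interim workaround can be scripted so it is easy to back out after the upgrade. The statements below are only printed; the database name and query are placeholders, and the commented line shows how they would be piped into psql.]

```shell
# Disable HashAggregate for just this session's query, then restore it.
sql=$(cat <<'EOF'
SET enable_hashagg = off;
-- ... your array_agg() + GROUP BY query here ...
RESET enable_hashagg;
EOF
)
printf '%s\n' "$sql"
# printf '%s\n' "$sql" | psql -d souschef
```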
I've done the upgrade to 9.5. Memory bloat has reduced to a more
manageable level. Most workers have an overhead of <20MB, with one outlier
consuming 60MB.
— Theron
On Fri, Aug 26, 2016 at 5:41 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: