possible memory leak in VACUUM ANALYZE
Hi
Just a small note - I executed VACUUM ANALYZE on one customer's database,
and I had to cancel it after a few hours, because it had more than 20GB RAM
(almost all physical RAM). The memory leak is probably not too big. This
database is a little bit unusual. This one database has more than 1 800
000 tables. and the same number of indexes.
Regards
Pavel
Hi,
On 2023-02-10 21:09:06 +0100, Pavel Stehule wrote:
Just a small note - I executed VACUUM ANALYZE on one customer's database,
and I had to cancel it after a few hours, because it had more than 20GB RAM
(almost all physical RAM).
Just to make sure: You're certain this was an actual memory leak, not just
vacuum ending up having referenced all of shared_buffers? Unless you use huge
pages, RSS increases over time, as a process touched more and more pages in
shared memory. Of course that couldn't explain rising above shared_buffers +
overhead.
The memory leak is probably not too big. This database is a little bit
unusual. This one database has more than 1 800 000 tables. and the same
number of indexes.
If you have 1.8 million tables in a single database, what you saw might just
have been the size of the relation and catalog caches.
Greetings,
Andres Freund
pá 10. 2. 2023 v 21:18 odesílatel Andres Freund <andres@anarazel.de> napsal:
Hi,
On 2023-02-10 21:09:06 +0100, Pavel Stehule wrote:
Just a small note - I executed VACUUM ANALYZE on one customer's database,
and I had to cancel it after a few hours, because it had more than 20GBRAM
(almost all physical RAM).
Just to make sure: You're certain this was an actual memory leak, not just
vacuum ending up having referenced all of shared_buffers? Unless you use
huge
pages, RSS increases over time, as a process touched more and more pages in
shared memory. Of course that couldn't explain rising above
shared_buffers +
overhead.The memory leak is probably not too big. This database is a little bit
unusual. This one database has more than 1 800 000 tables. and the same
number of indexes.If you have 1.8 million tables in a single database, what you saw might
just
have been the size of the relation and catalog caches.
can be
Regards
Pavel
Show quoted text
Greetings,
Andres Freund
On Fri, Feb 10, 2023 at 09:23:11PM +0100, Pavel Stehule wrote:
p� 10. 2. 2023 v 21:18 odes�latel Andres Freund <andres@anarazel.de> napsal:
On 2023-02-10 21:09:06 +0100, Pavel Stehule wrote:
Just a small note - I executed VACUUM ANALYZE on one customer's database,
and I had to cancel it after a few hours, because it had more than 20GB RAM
(almost all physical RAM).Just to make sure: You're certain this was an actual memory leak, not just
vacuum ending up having referenced all of shared_buffers? Unless you use huge
pages, RSS increases over time, as a process touched more and more pages in
shared memory. Of course that couldn't explain rising above
shared_buffers + overhead.The memory leak is probably not too big. This database is a little bit
unusual. This one database has more than 1 800 000 tables. and the same
number of indexes.If you have 1.8 million tables in a single database, what you saw might just
have been the size of the relation and catalog caches.can be
Well, how big was shared_buffers on that instance ?
--
Justin
pá 10. 2. 2023 v 23:01 odesílatel Justin Pryzby <pryzby@telsasoft.com>
napsal:
On Fri, Feb 10, 2023 at 09:23:11PM +0100, Pavel Stehule wrote:
pá 10. 2. 2023 v 21:18 odesílatel Andres Freund <andres@anarazel.de>
napsal:
On 2023-02-10 21:09:06 +0100, Pavel Stehule wrote:
Just a small note - I executed VACUUM ANALYZE on one customer's
database,
and I had to cancel it after a few hours, because it had more than
20GB RAM
(almost all physical RAM).
Just to make sure: You're certain this was an actual memory leak, not
just
vacuum ending up having referenced all of shared_buffers? Unless you
use huge
pages, RSS increases over time, as a process touched more and more
pages in
shared memory. Of course that couldn't explain rising above
shared_buffers + overhead.The memory leak is probably not too big. This database is a little
bit
unusual. This one database has more than 1 800 000 tables. and the
same
number of indexes.
If you have 1.8 million tables in a single database, what you saw
might just
have been the size of the relation and catalog caches.
can be
Well, how big was shared_buffers on that instance ?
20GB RAM
20GB swap
2GB shared buffers
Show quoted text
--
Justin
On Sat, Feb 11, 2023 at 07:06:45AM +0100, Pavel Stehule wrote:
p� 10. 2. 2023 v 23:01 odes�latel Justin Pryzby <pryzby@telsasoft.com> napsal:
On Fri, Feb 10, 2023 at 09:23:11PM +0100, Pavel Stehule wrote:
p� 10. 2. 2023 v 21:18 odes�latel Andres Freund <andres@anarazel.de> napsal:
On 2023-02-10 21:09:06 +0100, Pavel Stehule wrote:
Just a small note - I executed VACUUM ANALYZE on one customer's database,
and I had to cancel it after a few hours, because it had more than 20GB RAM
(almost all physical RAM).Just to make sure: You're certain this was an actual memory leak, not just
vacuum ending up having referenced all of shared_buffers? Unless you use huge
pages, RSS increases over time, as a process touched more and more pages in
shared memory. Of course that couldn't explain rising above
shared_buffers + overhead.The memory leak is probably not too big. This database is a little bit
unusual. This one database has more than 1 800 000 tables. and the same
number of indexes.If you have 1.8 million tables in a single database, what you saw might just
have been the size of the relation and catalog caches.can be
Well, how big was shared_buffers on that instance ?
20GB RAM
20GB swap
2GB shared buffers
Thanks; so that can't explain using more than 2GB + a bit of overhead.
Can you reproduce the problem and figure out which relation was being
processed, or if the memory use is growing across relations?
pg_stat_progress_analyze/vacuum would be one thing to check.
Does VACUUM alone trigger the issue ? What about ANALYZE ?
Was parallel vacuum happening (are there more than one index per table) ?
Do you have any extended stats objects or non-default stats targets ?
What server version is it? What OS? Extensions? Non-btree indexes?
BTW I'm interested about this because I have an VM instance running v15
which has been killed more than a couple times in the last 6 months, and
I haven't been able to diagnose why. But autovacuum/analyze could
explain it. On this one particular instance, we don't have many
relations, though...
--
Justin
Hi,
On 2023-02-11 00:53:48 -0600, Justin Pryzby wrote:
On Sat, Feb 11, 2023 at 07:06:45AM +0100, Pavel Stehule wrote:
p� 10. 2. 2023 v 23:01 odes�latel Justin Pryzby <pryzby@telsasoft.com> napsal:
On Fri, Feb 10, 2023 at 09:23:11PM +0100, Pavel Stehule wrote:
p� 10. 2. 2023 v 21:18 odes�latel Andres Freund <andres@anarazel.de> napsal:
On 2023-02-10 21:09:06 +0100, Pavel Stehule wrote:
Just a small note - I executed VACUUM ANALYZE on one customer's database,
and I had to cancel it after a few hours, because it had more than 20GB RAM
(almost all physical RAM).Just to make sure: You're certain this was an actual memory leak, not just
vacuum ending up having referenced all of shared_buffers? Unless you use huge
pages, RSS increases over time, as a process touched more and more pages in
shared memory. Of course that couldn't explain rising above
shared_buffers + overhead.The memory leak is probably not too big. This database is a little bit
unusual. This one database has more than 1 800 000 tables. and the same
number of indexes.If you have 1.8 million tables in a single database, what you saw might just
have been the size of the relation and catalog caches.can be
Well, how big was shared_buffers on that instance ?
20GB RAM
20GB swap
2GB shared buffersThanks; so that can't explain using more than 2GB + a bit of overhead.
I think my theory of 1.8 million relcache / catcache entries is pretty good...
I'd do the vacuum analyze again, interrupt once memory usage is high, and
check
SELECT * FROM pg_backend_memory_contexts ORDER BY total_bytes DESC
BTW I'm interested about this because I have an VM instance running v15
which has been killed more than a couple times in the last 6 months, and
I haven't been able to diagnose why. But autovacuum/analyze could
explain it. On this one particular instance, we don't have many
relations, though...
Killed in what way? OOM?
If you'd set up strict overcommit you'd get a nice memory dump in the log...
Greetings,
Andres Freund
so 11. 2. 2023 v 7:53 odesílatel Justin Pryzby <pryzby@telsasoft.com>
napsal:
On Sat, Feb 11, 2023 at 07:06:45AM +0100, Pavel Stehule wrote:
pá 10. 2. 2023 v 23:01 odesílatel Justin Pryzby <pryzby@telsasoft.com>
napsal:
On Fri, Feb 10, 2023 at 09:23:11PM +0100, Pavel Stehule wrote:
pá 10. 2. 2023 v 21:18 odesílatel Andres Freund <andres@anarazel.de>
napsal:
On 2023-02-10 21:09:06 +0100, Pavel Stehule wrote:
Just a small note - I executed VACUUM ANALYZE on one customer's
database,
and I had to cancel it after a few hours, because it had more
than 20GB RAM
(almost all physical RAM).
Just to make sure: You're certain this was an actual memory leak,
not just
vacuum ending up having referenced all of shared_buffers? Unless
you use huge
pages, RSS increases over time, as a process touched more and more
pages in
shared memory. Of course that couldn't explain rising above
shared_buffers + overhead.The memory leak is probably not too big. This database is a
little bit
unusual. This one database has more than 1 800 000 tables. and
the same
number of indexes.
If you have 1.8 million tables in a single database, what you saw
might just
have been the size of the relation and catalog caches.
can be
Well, how big was shared_buffers on that instance ?
20GB RAM
20GB swap
2GB shared buffersThanks; so that can't explain using more than 2GB + a bit of overhead.
Can you reproduce the problem and figure out which relation was being
processed, or if the memory use is growing across relations?
pg_stat_progress_analyze/vacuum would be one thing to check.
Does VACUUM alone trigger the issue ? What about ANALYZE ?
I executed VACUUM and ANALYZE separately, and memory grew up in both cases
with similar speed.
almost all tables has less than 120 pages, and less than thousand tuples
and almost all tables has twenty fields + one geometry field + gist index
the size of pg_attribute has 6GB and pg_class has about 2GB
Unfortunately, that is customer's production server, and I have not full
access there so I cannot to do deeper investigation
Was parallel vacuum happening (are there more than one index per table) ?
probably not - almost all tables are small
Do you have any extended stats objects or non-default stats targets ?
What server version is it? What OS? Extensions? Non-btree indexes?
PostgreSQL 14 on linux with installed PostGIS
Show quoted text
BTW I'm interested about this because I have an VM instance running v15
which has been killed more than a couple times in the last 6 months, and
I haven't been able to diagnose why. But autovacuum/analyze could
explain it. On this one particular instance, we don't have many
relations, though...--
Justin