possible memory leak in VACUUM ANALYZE

Started by Pavel Stehuleover 3 years ago8 messageshackers

pavel.stehule@gmail.com

over 3 years ago

Just a small note - I executed VACUUM ANALYZE on one customer's database,
and I had to cancel it after a few hours, because it had more than 20GB RAM
(almost all physical RAM). The memory leak is probably not too big. This
database is a little bit unusual. This one database has more than 1 800
000 tables. and the same number of indexes.

Regards

Pavel

Andres Freund

andres@anarazel.de

over 3 years ago

In reply to: Pavel Stehule (#1)

Re: possible memory leak in VACUUM ANALYZE

Hi,

On 2023-02-10 21:09:06 +0100, Pavel Stehule wrote:

Just a small note - I executed VACUUM ANALYZE on one customer's database,
and I had to cancel it after a few hours, because it had more than 20GB RAM
(almost all physical RAM).

Just to make sure: You're certain this was an actual memory leak, not just
vacuum ending up having referenced all of shared_buffers? Unless you use huge
pages, RSS increases over time, as a process touched more and more pages in
shared memory. Of course that couldn't explain rising above shared_buffers +
overhead.

The memory leak is probably not too big. This database is a little bit
unusual. This one database has more than 1 800 000 tables. and the same
number of indexes.

If you have 1.8 million tables in a single database, what you saw might just
have been the size of the relation and catalog caches.

Greetings,

Andres Freund

Pavel Stehule

pavel.stehule@gmail.com

over 3 years ago

In reply to: Andres Freund (#2)

Re: possible memory leak in VACUUM ANALYZE

pá 10. 2. 2023 v 21:18 odesílatel Andres Freund <andres@anarazel.de> napsal:

Hi,

On 2023-02-10 21:09:06 +0100, Pavel Stehule wrote:

Just a small note - I executed VACUUM ANALYZE on one customer's database,
and I had to cancel it after a few hours, because it had more than 20GB

RAM

(almost all physical RAM).

Just to make sure: You're certain this was an actual memory leak, not just
vacuum ending up having referenced all of shared_buffers? Unless you use
huge
pages, RSS increases over time, as a process touched more and more pages in
shared memory. Of course that couldn't explain rising above
shared_buffers +
overhead.

The memory leak is probably not too big. This database is a little bit
unusual. This one database has more than 1 800 000 tables. and the same
number of indexes.

If you have 1.8 million tables in a single database, what you saw might
just
have been the size of the relation and catalog caches.

can be

Regards

Pavel

Show quoted text

Greetings,

Andres Freund

Justin Pryzby

pryzby@telsasoft.com

over 3 years ago

In reply to: Pavel Stehule (#3)

Re: possible memory leak in VACUUM ANALYZE

On Fri, Feb 10, 2023 at 09:23:11PM +0100, Pavel Stehule wrote:

pï¿½ 10. 2. 2023 v 21:18 odesï¿½latel Andres Freund <andres@anarazel.de> napsal:

On 2023-02-10 21:09:06 +0100, Pavel Stehule wrote:

Just a small note - I executed VACUUM ANALYZE on one customer's database,
and I had to cancel it after a few hours, because it had more than 20GB RAM
(almost all physical RAM).

Just to make sure: You're certain this was an actual memory leak, not just
vacuum ending up having referenced all of shared_buffers? Unless you use huge
pages, RSS increases over time, as a process touched more and more pages in
shared memory. Of course that couldn't explain rising above
shared_buffers + overhead.

The memory leak is probably not too big. This database is a little bit
unusual. This one database has more than 1 800 000 tables. and the same
number of indexes.

If you have 1.8 million tables in a single database, what you saw might just
have been the size of the relation and catalog caches.

can be

Well, how big was shared_buffers on that instance ?

--
Justin

Pavel Stehule

pavel.stehule@gmail.com

over 3 years ago

In reply to: Justin Pryzby (#4)

Re: possible memory leak in VACUUM ANALYZE

pá 10. 2. 2023 v 23:01 odesílatel Justin Pryzby <pryzby@telsasoft.com>
napsal:

On Fri, Feb 10, 2023 at 09:23:11PM +0100, Pavel Stehule wrote:

pá 10. 2. 2023 v 21:18 odesílatel Andres Freund <andres@anarazel.de>

napsal:

On 2023-02-10 21:09:06 +0100, Pavel Stehule wrote:

Just a small note - I executed VACUUM ANALYZE on one customer's

database,

and I had to cancel it after a few hours, because it had more than

20GB RAM

(almost all physical RAM).

Just to make sure: You're certain this was an actual memory leak, not

just

vacuum ending up having referenced all of shared_buffers? Unless you

use huge

pages, RSS increases over time, as a process touched more and more

pages in

shared memory. Of course that couldn't explain rising above
shared_buffers + overhead.

The memory leak is probably not too big. This database is a little

bit

unusual. This one database has more than 1 800 000 tables. and the

same

number of indexes.

If you have 1.8 million tables in a single database, what you saw

might just

have been the size of the relation and catalog caches.

can be

Well, how big was shared_buffers on that instance ?

20GB RAM
20GB swap
2GB shared buffers

Show quoted text

--
Justin

Justin Pryzby

pryzby@telsasoft.com

over 3 years ago

In reply to: Pavel Stehule (#5)

Re: possible memory leak in VACUUM ANALYZE

On Sat, Feb 11, 2023 at 07:06:45AM +0100, Pavel Stehule wrote:

pï¿½ 10. 2. 2023 v 23:01 odesï¿½latel Justin Pryzby <pryzby@telsasoft.com> napsal:

On Fri, Feb 10, 2023 at 09:23:11PM +0100, Pavel Stehule wrote:

pï¿½ 10. 2. 2023 v 21:18 odesï¿½latel Andres Freund <andres@anarazel.de> napsal:

On 2023-02-10 21:09:06 +0100, Pavel Stehule wrote:

Just a small note - I executed VACUUM ANALYZE on one customer's database,
and I had to cancel it after a few hours, because it had more than 20GB RAM
(almost all physical RAM).

Just to make sure: You're certain this was an actual memory leak, not just
vacuum ending up having referenced all of shared_buffers? Unless you use huge
pages, RSS increases over time, as a process touched more and more pages in
shared memory. Of course that couldn't explain rising above
shared_buffers + overhead.

The memory leak is probably not too big. This database is a little bit
unusual. This one database has more than 1 800 000 tables. and the same
number of indexes.

If you have 1.8 million tables in a single database, what you saw might just
have been the size of the relation and catalog caches.

can be

Well, how big was shared_buffers on that instance ?

20GB RAM
20GB swap
2GB shared buffers

Thanks; so that can't explain using more than 2GB + a bit of overhead.

Can you reproduce the problem and figure out which relation was being
processed, or if the memory use is growing across relations?
pg_stat_progress_analyze/vacuum would be one thing to check.
Does VACUUM alone trigger the issue ? What about ANALYZE ?

Was parallel vacuum happening (are there more than one index per table) ?

Do you have any extended stats objects or non-default stats targets ?
What server version is it? What OS? Extensions? Non-btree indexes?

BTW I'm interested about this because I have an VM instance running v15
which has been killed more than a couple times in the last 6 months, and
I haven't been able to diagnose why. But autovacuum/analyze could
explain it. On this one particular instance, we don't have many
relations, though...

--
Justin

Andres Freund

andres@anarazel.de

over 3 years ago

In reply to: Justin Pryzby (#6)

Re: possible memory leak in VACUUM ANALYZE

Hi,

On 2023-02-11 00:53:48 -0600, Justin Pryzby wrote:

On Sat, Feb 11, 2023 at 07:06:45AM +0100, Pavel Stehule wrote:

pï¿½ 10. 2. 2023 v 23:01 odesï¿½latel Justin Pryzby <pryzby@telsasoft.com> napsal:

On Fri, Feb 10, 2023 at 09:23:11PM +0100, Pavel Stehule wrote:

pï¿½ 10. 2. 2023 v 21:18 odesï¿½latel Andres Freund <andres@anarazel.de> napsal:

On 2023-02-10 21:09:06 +0100, Pavel Stehule wrote:

Just a small note - I executed VACUUM ANALYZE on one customer's database,
and I had to cancel it after a few hours, because it had more than 20GB RAM
(almost all physical RAM).

Just to make sure: You're certain this was an actual memory leak, not just
vacuum ending up having referenced all of shared_buffers? Unless you use huge
pages, RSS increases over time, as a process touched more and more pages in
shared memory. Of course that couldn't explain rising above
shared_buffers + overhead.

The memory leak is probably not too big. This database is a little bit
unusual. This one database has more than 1 800 000 tables. and the same
number of indexes.

If you have 1.8 million tables in a single database, what you saw might just
have been the size of the relation and catalog caches.

can be

Well, how big was shared_buffers on that instance ?

20GB RAM
20GB swap
2GB shared buffers

Thanks; so that can't explain using more than 2GB + a bit of overhead.

I think my theory of 1.8 million relcache / catcache entries is pretty good...

I'd do the vacuum analyze again, interrupt once memory usage is high, and
check
SELECT * FROM pg_backend_memory_contexts ORDER BY total_bytes DESC

BTW I'm interested about this because I have an VM instance running v15
which has been killed more than a couple times in the last 6 months, and
I haven't been able to diagnose why. But autovacuum/analyze could
explain it. On this one particular instance, we don't have many
relations, though...

Killed in what way? OOM?

If you'd set up strict overcommit you'd get a nice memory dump in the log...

Greetings,

Andres Freund

Pavel Stehule

pavel.stehule@gmail.com

over 3 years ago

In reply to: Justin Pryzby (#6)

Re: possible memory leak in VACUUM ANALYZE

so 11. 2. 2023 v 7:53 odesílatel Justin Pryzby <pryzby@telsasoft.com>
napsal:

On Sat, Feb 11, 2023 at 07:06:45AM +0100, Pavel Stehule wrote:

pá 10. 2. 2023 v 23:01 odesílatel Justin Pryzby <pryzby@telsasoft.com>

napsal:

On Fri, Feb 10, 2023 at 09:23:11PM +0100, Pavel Stehule wrote:

pá 10. 2. 2023 v 21:18 odesílatel Andres Freund <andres@anarazel.de>

napsal:

On 2023-02-10 21:09:06 +0100, Pavel Stehule wrote:

Just a small note - I executed VACUUM ANALYZE on one customer's

database,

and I had to cancel it after a few hours, because it had more

than 20GB RAM

(almost all physical RAM).

Just to make sure: You're certain this was an actual memory leak,

not just

vacuum ending up having referenced all of shared_buffers? Unless

you use huge

pages, RSS increases over time, as a process touched more and more

pages in

shared memory. Of course that couldn't explain rising above
shared_buffers + overhead.

The memory leak is probably not too big. This database is a

little bit

unusual. This one database has more than 1 800 000 tables. and

the same

number of indexes.

If you have 1.8 million tables in a single database, what you saw

might just

have been the size of the relation and catalog caches.

can be

Well, how big was shared_buffers on that instance ?

20GB RAM
20GB swap
2GB shared buffers

Thanks; so that can't explain using more than 2GB + a bit of overhead.

Can you reproduce the problem and figure out which relation was being
processed, or if the memory use is growing across relations?
pg_stat_progress_analyze/vacuum would be one thing to check.
Does VACUUM alone trigger the issue ? What about ANALYZE ?

I executed VACUUM and ANALYZE separately, and memory grew up in both cases
with similar speed.

almost all tables has less than 120 pages, and less than thousand tuples

and almost all tables has twenty fields + one geometry field + gist index

the size of pg_attribute has 6GB and pg_class has about 2GB

Unfortunately, that is customer's production server, and I have not full
access there so I cannot to do deeper investigation

Was parallel vacuum happening (are there more than one index per table) ?

probably not - almost all tables are small

Do you have any extended stats objects or non-default stats targets ?
What server version is it? What OS? Extensions? Non-btree indexes?

PostgreSQL 14 on linux with installed PostGIS

Show quoted text

BTW I'm interested about this because I have an VM instance running v15
which has been killed more than a couple times in the last 6 months, and
I haven't been able to diagnose why. But autovacuum/analyze could
explain it. On this one particular instance, we don't have many
relations, though...

--
Justin