replication slot stats memory bug

Started by Andres Freundalmost 5 years ago5 messages
#1Andres Freund
andres@anarazel.de

Hi,

in the course of /messages/by-id/3471359.1615937770@sss.pgh.pa.us
I saw a leak in pgstat_read_statsfiles(), more precisely:
/* Allocate the space for replication slot statistics */
replSlotStats = palloc0(max_replication_slots * sizeof(PgStat_ReplSlotStats));

the issue is that the current memory context is not set by
pgstat_read_statsfiles().

In some cases CurrentMemoryContext is going to be a long-lived context,
accumulating those allocations over time. In other contexts it will be a
too short lived context, e.g. an ExprContext from the pg_stat_*
invocation in the query. A reproducer for the latter:

postgres[2252294][1]=# SELECT pg_create_logical_replication_slot('test', 'test_decoding');
┌────────────────────────────────────┐
│ pg_create_logical_replication_slot │
├────────────────────────────────────┤
│ (test,0/456C1878) │
└────────────────────────────────────┘
(1 row)

postgres[2252294][1]=# BEGIN ;
BEGIN

postgres[2252294][1]*=# SELECT * FROM pg_stat_replication_slots ;
┌───────────┬────────────┬─────────────┬─────────────┬─────────────┬──────────────┬──────────────┬─────────────┐
│ slot_name │ spill_txns │ spill_count │ spill_bytes │ stream_txns │ stream_count │ stream_bytes │ stats_reset │
├───────────┼────────────┼─────────────┼─────────────┼─────────────┼──────────────┼──────────────┼─────────────┤
│ test │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ (null) │
└───────────┴────────────┴─────────────┴─────────────┴─────────────┴──────────────┴──────────────┴─────────────┘
(1 row)

postgres[2252294][1]*=# SELECT * FROM pg_stat_replication_slots ;
┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────>
│ >
├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────>
│ \x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F>
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────>
(1 row)

I'll push the minimal fix of forcing the allocation to happen in
pgStatLocalContext and setting it to NULL in pgstat_clear_snapshot().

But it seems like we just shouldn't allocate it dynamically at all?
max_replication_slots doesn't change during postmaster lifetime, so it
seems like it should just be allocated once?

Greetings,

Andres Freund

#2Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#1)
Re: replication slot stats memory bug

Hi,

On 2021-03-17 16:04:47 -0700, Andres Freund wrote:

I'll push the minimal fix of forcing the allocation to happen in
pgStatLocalContext and setting it to NULL in pgstat_clear_snapshot().

Done: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=5f79580ad69f6e696365bdc63bc265f45bd77211

Greetings,

Andres Freund

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#1)
Re: replication slot stats memory bug

Andres Freund <andres@anarazel.de> writes:

I saw a leak in pgstat_read_statsfiles(), more precisely:
/* Allocate the space for replication slot statistics */
replSlotStats = palloc0(max_replication_slots * sizeof(PgStat_ReplSlotStats));

Yeah, I just found that myself. I think your fix is good.

But it seems like we just shouldn't allocate it dynamically at all?
max_replication_slots doesn't change during postmaster lifetime, so it
seems like it should just be allocated once?

Meh. I don't see a need to wire in such an assumption here.

regards, tom lane

#4Amit Kapila
amit.kapila16@gmail.com
In reply to: Andres Freund (#2)
Re: replication slot stats memory bug

On Thu, Mar 18, 2021 at 4:55 AM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2021-03-17 16:04:47 -0700, Andres Freund wrote:

I'll push the minimal fix of forcing the allocation to happen in
pgStatLocalContext and setting it to NULL in pgstat_clear_snapshot().

Done: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=5f79580ad69f6e696365bdc63bc265f45bd77211

Thank you!

--
With Regards,
Amit Kapila.

#5Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#3)
Re: replication slot stats memory bug

Hi,

On 2021-03-17 19:36:46 -0400, Tom Lane wrote:

But it seems like we just shouldn't allocate it dynamically at all?
max_replication_slots doesn't change during postmaster lifetime, so it
seems like it should just be allocated once?

Meh. I don't see a need to wire in such an assumption here.

It does make it easier for the shared memory stats patch, because if
there's a fixed number + location, the relevant stats reporting doesn't
need to go through a hashtable with the associated locking. I guess
that may have colored my perception that it's better to just have a
statically sized memory allocation for this. Noteworthy that SLRU stats
are done in a fixed size allocation as well...

Greetings,

Andres Freund