wal recycling problem

Started by Fabrice Chapuis over 2 years ago (7 messages)
#1Fabrice Chapuis
fabrice636861@gmail.com

Hello,

I have a question about the automatic removal of unused WAL files. When
loading data with pg_restore (200 GB) we noticed that a lot of WAL files
are generated and they are neither purged nor recycled automatically,
despite frequent checkpoints; the pg_wal folder (150 GB) then fills up and
runs out of space.
We have a cluster of 2 members (1 primary and 1 standby) with Postgres
version 14.9 and 2 barman server, slots are only configured for barman,
barman is version 3.7.
The archive command is deactivated (archive_command = ':').
I use pg_archivecleanup (passing the WAL file of the last checkpoint as a
parameter) to remove files manually before the 150 GB limit is reached, so
that the restore can complete.
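For reference, the WAL file of the last checkpoint can be obtained from the
server itself (a sketch; pg_control_checkpoint() and pg_walfile_name() are
standard functions, but note that cleaning up by hand bypasses any WAL
retention required by replication slots):

```sql
-- WAL file containing the redo location of the last checkpoint;
-- pg_archivecleanup removes everything older than the file you name.
SELECT pg_walfile_name(redo_lsn) FROM pg_control_checkpoint();
```

The result can then be passed on the command line as
`pg_archivecleanup <pg_wal-directory> <oldest-wal-to-keep>`.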

Why does Postgres not do this cleanup automatically, and which part of the
code is responsible for removing or recycling the WALs?

Thanks for your help

Fabrice

#2Christoph Moench-Tegeder
cmt@burggraben.net
In reply to: Fabrice Chapuis (#1)
Re: wal recycling problem

## Fabrice Chapuis (fabrice636861@gmail.com):

We have a cluster of 2 members (1 primary and 1 standby) with Postgres
version 14.9 and 2 barman server, slots are only configured for barman,
barman is version 3.7.

The obvious question here is: can both of those barmans keep up with
your database, or are you seeing WAL retention due to exactly these
replication slots? (Check pg_replication_slots).
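For example, something along these lines (a sketch; pg_wal_lsn_diff() and
the pg_replication_slots columns shown exist on PostgreSQL 14):

```sql
-- How much WAL is each slot holding back?
SELECT slot_name, slot_type, active, restart_lsn, wal_status,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn))
           AS retained_wal
FROM pg_replication_slots;
```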

Regards,
Christoph

--
Spare Space

#3Fabrice Chapuis
fabrice636861@gmail.com
In reply to: Christoph Moench-Tegeder (#2)
Re: wal recycling problem

Yes, barman replication can keep up with the primary; the retained WAL
segments stay under max_wal_size (24 GB in our configuration).

Here is pg_replication_slots view:

barman_ge  physical  f  t  39409  1EE2/49000000  reserved  f
barman_be  physical  f  t  39434  1EE2/3D000000  reserved  f

On the other hand, there are two logical replication slots whose status
shows "extended". I don't understand why, given that the
confirmed_flush_lsn field is up to date. Why does the restart_lsn remain
frozen?

pgoutput │ logical │ 2667915 │ db019a00 │ f │ t │ 1880162 │ │ 68512101 │ 1ECA/37C3F1B8 │ 1EE2/4D6BDCF8 │ extended │ │ f │
pgoutput │ logical │ 2668584 │ db038a00 │ f │ t │  363230 │ │ 68512101 │ 1ECA/37C3F1B8 │ 1EE2/4D6BDCF8 │ extended │ │ f │

Regards
Fabrice


#4Christoph Moench-Tegeder
cmt@burggraben.net
In reply to: Fabrice Chapuis (#3)
Re: wal recycling problem

Hi,

## Fabrice Chapuis (fabrice636861@gmail.com):

On the other hand, there are two logical replication slots whose status
shows "extended". I don't understand why, given that the
confirmed_flush_lsn field is up to date. Why does the restart_lsn remain
frozen?

There you have it: "extended" means "holding WAL". And as long as the
restart_lsn does not advance, the checkpointer cannot free any WAL beyond
that LSN. My first idea would be some long-running (or huge) transaction
which is still in process (active or still being streamed). I'd recommend
looking into what the clients on these slots are doing.
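To see what those clients are, joining the slots to their backends is one
way (a sketch; active_pid is NULL for inactive slots, hence the join):

```sql
-- Which backends serve the logical slots, and what are they doing?
SELECT s.slot_name, a.pid, a.state, a.wait_event, a.backend_start
FROM pg_replication_slots s
JOIN pg_stat_activity a ON a.pid = s.active_pid
WHERE s.slot_type = 'logical';
```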

Regards,
Christoph

--
Spare Space

#5Fabrice Chapuis
fabrice636861@gmail.com
In reply to: Christoph Moench-Tegeder (#4)
Re: wal recycling problem

Thanks Christoph for your message.
Now I understand why the WALs are retained when logical replication is
configured and enabled. The problem is that when a large volume of data is
loaded into a database, for example during a pg_restore, the walsender
process associated with the logical replication slot has to decode all of
the WAL generated during this operation, which takes a long time, and the
restart_lsn is not advanced in the meantime.
From a conceptual point of view, I think subscription-specific WAL should
be used, stored in the pg_replslot folder, in order to avoid working
directly on the WAL of the instance.

What do you think about this proposal?

Regards

Fabrice


#6Christoph Moench-Tegeder
cmt@burggraben.net
In reply to: Fabrice Chapuis (#5)
Re: wal recycling problem

## Fabrice Chapuis (fabrice636861@gmail.com):

From a conceptual point of view, I think subscription-specific WAL should
be used, stored in the pg_replslot folder, in order to avoid working
directly on the WAL of the instance.
What do you think about this proposal?

I think that would open a wholly new can of worms.
The most obvious point here is that WAL is primarily generated for the
operation of the database itself: it is our kind of transaction log, or
"redo log" in other systems' lingo. Replication (be it physical or
logical) is a secondary purpose (an obvious and important one, but still
secondary).
How would you know which part of the WAL is needed for any specific
replication slot? You'd have to decode and filter it, and already you're
back at square one. How would you handle multiple replications of the same
table (in the same publication, or even over multiple (overlapping)
publications) - do you multiply the WAL?

For now, the rule is: any replication using replication slots, be it
logical or physical, retains WAL up to max_slot_wal_keep_size (or without
limit if that is not set, and always on PostgreSQL 12 and before), and you
need to monitor the state of your replication slots. That is a totally
usable rule, I think.
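Monitoring that state can be as simple as this (a sketch; wal_status and
safe_wal_size exist from PostgreSQL 13 on, and safe_wal_size is NULL when
max_slot_wal_keep_size is not set):

```sql
-- Headroom before a slot is invalidated under max_slot_wal_keep_size.
SELECT slot_name, wal_status, pg_size_pretty(safe_wal_size) AS headroom
FROM pg_replication_slots;
```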

Regards,
Christoph

--
Spare Space

#7Fabrice Chapuis
fabrice636861@gmail.com
In reply to: Christoph Moench-Tegeder (#6)
Re: wal recycling problem

Thanks for your feedback

> How would you know which part of WAL is needed for any specific
> replication slot?

Changes are captured for each published table and written twice: once in
the current WAL and once in the slot-specific WAL.

> How would you handle multiple replications for the same table?

Added information about which publication a table belongs to is entered
in the slot's WAL.

> be it logical or physical replication, retains WAL up to
> max_slot_wal_keep_size

OK, but if max_slot_wal_keep_size is exceeded the changes are lost and all
of the replicated tables must be resynchronized.
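A slot that has crossed the limit shows up as "lost", which can at least
be detected (a sketch; the wal_status column exists from PostgreSQL 13 on):

```sql
-- Slots invalidated after exceeding max_slot_wal_keep_size;
-- subscribers using them must be re-initialized.
SELECT slot_name FROM pg_replication_slots WHERE wal_status = 'lost';
```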

Regards

Fabrice
