pg_wal folder high disk usage

Started by Paul Brindusa, over 1 year ago, 8 messages, general

paulbrindusa88@gmail.com

over 1 year ago

Good morning,

On one of our Postgres instances the pg_wal/data folder has grown to
196GB, filling up a 200GB disk.
This stopped postgresql.service this morning, causing two
applications to crash.
Unfortunately our database admin is on leave today, and we are trying to
figure out how to get the disk usage down.
Any ideas or suggestions are more than welcome.

Thank you in advance.

--
Kind Regards,
Paul Brindusa
paulbrindusa88@gmail.com

Ron

ronljohnsonjr@gmail.com

over 1 year ago

In reply to: Paul Brindusa (#1)

Re: pg_wal folder high disk usage

On Thu, Oct 31, 2024 at 6:36 AM Paul Brindusa <paulbrindusa88@gmail.com>
wrote:

Good morning,

On one of our Postgres instances the pg_wal/data folder has grown to
196GB, filling up a 200GB disk.
This stopped postgresql.service this morning, causing two
applications to crash.
Unfortunately our database admin is on leave today, and we are trying to
figure out how to get the disk usage down.
Any ideas or suggestions are more than welcome.

Is data supposed to be replicated to a second (remote) database? Such
growth would be explained if that network link is broken.

--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> crustacean!

Priancka Chatz

pc9926@gmail.com

over 1 year ago

In reply to: Paul Brindusa (#1)

Re: pg_wal folder high disk usage

Hi,

You might want to check whether archive backups (if enabled) are happening
and, if you have replicas, whether replication is lagging or broken.
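As a rough sketch of the archiving check (not a definitive procedure): PostgreSQL reports archiver statistics in the pg_stat_archiver view. The query string below reads that view; running it is left to psql or whichever client you use. The helper name archiver_looks_stuck is hypothetical and only illustrates one way to read the result: if the latest failure is newer than the latest success, archiving is probably failing and WAL will pile up.

```python
from datetime import datetime
from typing import Optional

# Query against the built-in pg_stat_archiver view.
# Run it with psql or any client; this sketch does not connect to a database.
ARCHIVER_QUERY = """
SELECT archived_count, last_archived_wal, last_archived_time,
       failed_count, last_failed_wal, last_failed_time
FROM pg_stat_archiver;
"""


def archiver_looks_stuck(last_archived_time: Optional[datetime],
                         last_failed_time: Optional[datetime]) -> bool:
    """Hypothetical helper: True when the most recent attempt was a failure.

    Both arguments are the timestamps returned by ARCHIVER_QUERY
    (None when the column is NULL).
    """
    if last_failed_time is None:
        return False  # no failure has been recorded
    if last_archived_time is None:
        return True   # failures recorded, but never a success
    return last_failed_time > last_archived_time
```

If this returns True, the PostgreSQL server log should show why the archive command is failing.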


Laurenz Albe

laurenz.albe@cybertec.at

over 1 year ago

In reply to: Paul Brindusa (#1)

Re: pg_wal folder high disk usage

On Thu, 2024-10-31 at 10:36 +0000, Paul Brindusa wrote:

On one of our Postgres instances the pg_wal/data folder has grown to 196GB, filling up a 200GB disk.
This stopped postgresql.service this morning, causing two applications to crash.
Unfortunately our database admin is on leave today, and we are trying to figure out how to get the disk usage down.
Any ideas or suggestions are more than welcome.

Check why pg_wal is growing:
https://www.cybertec-postgresql.com/en/why-does-my-pg_wal-keep-growing/

Yours,
Laurenz Albe

Muhammad Usman Khan

usman.k@bitnine.net

over 1 year ago

In reply to: Paul Brindusa (#1)

Re: pg_wal folder high disk usage

First of all, check whether Postgres is failing to archive or delete old
WAL files. For immediate space, move older files from pg_wal to other
storage, but don't delete them.
Restart Postgres in recovery mode, and if archiving is not working, try
disabling it temporarily so that PostgreSQL can automatically clear older
WAL files:
archive_mode = off


Greg Sabino Mullane

greg@turnstep.com

over 1 year ago

In reply to: Muhammad Usman Khan (#5)

Re: pg_wal folder high disk usage

On Fri, Nov 1, 2024 at 2:40 AM Muhammad Usman Khan <usman.k@bitnine.net>
wrote:

For immediate space, move older files from pg_wal to other storage, but
don't delete them.

No, do not do this! Figure out why WAL is not getting removed by Postgres
and let it do its job once that is fixed. Please recall that the original
poster is trying to figure out what to do because they are not the database
admin, so having them work out which WAL files are "older" and safe to move
is not good advice.

Resizing the disk is a better option. You could also check whether there
are other large files on that volume that can be removed or moved
elsewhere, especially large log files.
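To find such files, something along these lines might help. It is only a sketch: largest_files is a made-up helper name, the path in the usage note is an assumption, and the script only reads file sizes and never deletes anything. Note that it walks the whole directory tree, so it can cross into other mounted filesystems below the starting point.

```python
import os
from typing import List, Tuple


def largest_files(root: str, limit: int = 10) -> List[Tuple[int, str]]:
    """Return up to `limit` (size_in_bytes, path) pairs, biggest first."""
    sizes = []
    # os.walk does not follow symlinked directories by default.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat reports the size of a symlink itself, not its target.
                sizes.append((os.lstat(path).st_size, path))
            except OSError:
                continue  # file vanished or is unreadable; skip it
    sizes.sort(reverse=True)
    return sizes[:limit]
```

For example, `largest_files("/var/log", limit=5)` would list the five biggest files under /var/log, assuming that directory sits on the full volume. Anything inside pg_wal itself should be left alone, for the reasons given above.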

Hopefully all of this is moot because their DBA is back from leave. :)

Cheers,
Greg

Koen De Groote

kdg.dev@gmail.com

over 1 year ago

In reply to: Greg Sabino Mullane (#6)

Re: pg_wal folder high disk usage

A possible reason for pg_wal buildup is that some form of replication
(logical or physical) is set up and the receiving side has stopped for
some reason.

This means: a different server has a connection to your server and is
expecting to receive data, and your server in turn expects to have to
send data (this is the important bit). There could be several of these
connections.

If even one of these receiving servers is down, or the network is out, or
there is some other reason that it is no longer requesting data from your
server, your server will notice that it isn't getting confirmation from
the other side that the data has been received. As a result, your Postgres
server will keep this data locally, expecting the situation to be resolved
in the future, at which point it will send all the data the other side
hasn't received yet.

This is one possible cause. As long as your server is configured to expect
that other server to be there and to be receiving, the buildup will
continue. Taking the other server offline won't help; in fact, it is likely
the cause of the issue. The official documentation explains how to get rid
of replication slots; ideally your DBA should handle this.
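To see whether a replication slot is what is holding WAL back, a query like the one below can be run on the primary (the function names are those of PostgreSQL 10 and later; that version is an assumption about your setup). The two Python helpers are hypothetical names, included only to show what the LSN arithmetic means: an LSN such as 16/B374D848 is a 64-bit byte position written as two hexadecimal halves, and the gap between the current position and a slot's restart_lsn is roughly the amount of WAL the slot forces the server to keep.

```python
# Query to run on the primary with psql or any client (PostgreSQL 10+ assumed).
SLOT_QUERY = """
SELECT slot_name, slot_type, active,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn))
           AS retained_wal
FROM pg_replication_slots
ORDER BY restart_lsn;
"""


def lsn_to_bytes(lsn: str) -> int:
    """Convert a pg_lsn string such as '16/B374D848' to a byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def retained_wal_bytes(current_lsn: str, restart_lsn: str) -> int:
    """Bytes of WAL between a slot's restart_lsn and the current position."""
    return lsn_to_bytes(current_lsn) - lsn_to_bytes(restart_lsn)
```

A slot with active = false and a very large retained_wal value is the usual suspect. Dropping it is done with pg_drop_replication_slot(), but that breaks the replica or subscriber that depends on it, so it is a decision for the DBA.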

Laurenz's blog post lays out all the options. For instance, it can also
happen that your system is generating data so fast that the writing of the
WAL files cannot keep up, or that your setup also does WAL archiving and
the compression on that is slow.

The post offers some ways to verify things; I suggest checking them out.

And of course, if your DBA is back, have them look at it too.

Regards,
Koen De Groote


Paul Brindusa

paulbrindusa88@gmail.com

over 1 year ago

In reply to: Koen De Groote (#7)

Re: pg_wal folder high disk usage

Good morning Koen,

I highly appreciate your response on this.

It has clarified the WAL files a bit for me, and your insights made the
whole thing a little clearer.

Kind Regards,

Paul B.
