How to prevent master server crash if hot standby stops

Started by Andrusabout 6 years ago10 messagesgeneral
Jump to latest
#1Andrus
kobruleht2@hot.ee

Hi!

Streaming asynchronous binary replication is used with hot standby slave.

If slave stops responing, master server will create files in pg_wal directory.
If disk becomes full, master server crashes also.

How to avoid this ?

If disk is nearly full, master should stop additional files creation (and maybe stop or disable replication slot).
Postgres 12 in Debian 10 is used.

Andrus.

#2Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Andrus (#1)
Re: How to prevent master server crash if hot standby stops

On Mon, 2020-04-06 at 11:03 +0300, Andrus wrote:

Streaming asynchronous binary replication is used with hot standby slave.

If slave stops responing, master server will create files in pg_wal directory.
If disk becomes full, master server crashes also.

How to avoid this ?

If disk is nearly full, master should stop additional files creation (and maybe stop or disable replication slot).
Postgres 12 in Debian 10 is used.

That's why you use monitoring.

Never use replication slots without monitoring replication (or at least the
disk space on the primary).

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com

#3Michael Paquier
michael@paquier.xyz
In reply to: Andrus (#1)
Re: How to prevent master server crash if hot standby stops

On Mon, Apr 06, 2020 at 11:03:20AM +0300, Andrus wrote:

If slave stops responing, master server will create files in pg_wal directory.
If disk becomes full, master server crashes also.

How to avoid this ?

If disk is nearly full, master should stop additional files creation
(and maybe stop or disable replication slot).
Postgres 12 in Debian 10 is used.

When you use replication slots, it is very important to put in place a
monitoring solution to check if too much WAL is retained, and note
that there is nothing able to do that natively in core Postgres.
There are however multiple ways to solve this problem, like a
background worker (for the slot monitoring as well as optionally
killing and/or dropping), a simple cron job or even check_postgres.
--
Michael

#4Andrus
kobruleht2@hot.ee
In reply to: Laurenz Albe (#2)
Re: How to prevent master server crash if hot standby stops

Hi!

Thank you.

That's why you use monitoring.
Never use replication slots without monitoring replication (or at least the
disk space on the primary).

How to implement this automatically, without human interaction required ?

"superuser_reserved_connections" setting exists.
How about implementing "data_reserved_space" setting ?

How to create procedure in server or maybe cron scipt which stops replication if disk becomes nearly full ?

How to force to stop replication slot in master if it is in use ?

Andrus.

#5Andrus
kobruleht2@hot.ee
In reply to: Michael Paquier (#3)
Re: How to prevent master server crash if hot standby stops

Hi!

Thank you.

When you use replication slots, it is very important to put in place a
monitoring solution to check if too much WAL is retained, and note
that there is nothing able to do that natively in core Postgres.
There are however multiple ways to solve this problem, like a
background worker (for the slot monitoring as well as optionally
killing and/or dropping), a simple cron job or even check_postgres.

Where to find some sample how to implement this ?

I read from docs that slot cannot dropped if it is in use.
How to stop replication in this case.

Andrus.

#6Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Andrus (#4)
Re: How to prevent master server crash if hot standby stops

On Mon, 2020-04-06 at 11:47 +0300, Andrus wrote:

That's why you use monitoring.
Never use replication slots without monitoring replication (or at least the
disk space on the primary).

How to implement this automatically, without human interaction required ?

"superuser_reserved_connections" setting exists.
How about implementing "data_reserved_space" setting ?

How to create procedure in server or maybe cron scipt which stops replication if disk becomes nearly full ?

How to force to stop replication slot in master if it is in use ?

You don't.

You reserve enough disk space that WAL can grow for a while before
filling the disk. Any you need to trigger human intervention with
monitoring.

If you prefer replication to fail silently, don't use replication
slots. Use "wal_keep_segments" instead.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com

#7Andrus
kobruleht2@hot.ee
In reply to: Laurenz Albe (#6)
Re: How to prevent master server crash if hot standby stops

Hi!

If you prefer replication to fail silently, don't use replication
slots. Use "wal_keep_segments" instead.

I desided to give 1 GB to wal. So I added

wal_keep_segments=60

After some time Postgres created 80 files with total size 1.3GB.

How to fix this so that no more than 1 GB of disk space is used ?
How to get information how may wal files are yet not processed by slave ?
How to delete processed wal files so that 1 GB of disk space can used for some other purposes ?

/var/lib/postgresql/12/main/pg_wal# ls
00000001000002A200000072 00000001000002A200000083 00000001000002A200000094 00000001000002A2000000A5 00000001000002A2000000B6
00000001000002A200000073 00000001000002A200000084 00000001000002A200000095 00000001000002A2000000A6 00000001000002A2000000B7
00000001000002A200000074 00000001000002A200000085 00000001000002A200000096 00000001000002A2000000A7 00000001000002A2000000B8
00000001000002A200000075 00000001000002A200000086 00000001000002A200000097 00000001000002A2000000A8 00000001000002A2000000B9
00000001000002A200000076 00000001000002A200000087 00000001000002A200000098 00000001000002A2000000A9 00000001000002A2000000BA
00000001000002A200000077 00000001000002A200000088 00000001000002A200000099 00000001000002A2000000AA 00000001000002A2000000BB
00000001000002A200000078 00000001000002A200000089 00000001000002A20000009A 00000001000002A2000000AB 00000001000002A2000000BC
00000001000002A200000079 00000001000002A20000008A 00000001000002A20000009B 00000001000002A2000000AC 00000001000002A2000000BD
00000001000002A20000007A 00000001000002A20000008B 00000001000002A20000009C 00000001000002A2000000AD 00000001000002A2000000BE
00000001000002A20000007B 00000001000002A20000008C 00000001000002A20000009D 00000001000002A2000000AE 00000001000002A2000000BF
00000001000002A20000007C 00000001000002A20000008D 00000001000002A20000009E 00000001000002A2000000AF 00000001000002A2000000C0
00000001000002A20000007D 00000001000002A20000008E 00000001000002A20000009F 00000001000002A2000000B0 00000001000002A2000000C1
00000001000002A20000007E 00000001000002A20000008F 00000001000002A2000000A0 00000001000002A2000000B1 archive_status
00000001000002A20000007F 00000001000002A200000090 00000001000002A2000000A1 00000001000002A2000000B2
00000001000002A200000080 00000001000002A200000091 00000001000002A2000000A2 00000001000002A2000000B3
00000001000002A200000081 00000001000002A200000092 00000001000002A2000000A3 00000001000002A2000000B4
00000001000002A200000082 00000001000002A200000093 00000001000002A2000000A4 00000001000002A2000000B5

Andrus.

#8Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Andrus (#7)
Re: How to prevent master server crash if hot standby stops

On Tue, 2020-04-07 at 00:50 +0300, Andrus wrote:

If you prefer replication to fail silently, don't use replication
slots. Use "wal_keep_segments" instead.

I desided to give 1 GB to wal. So I added

wal_keep_segments=60

After some time Postgres created 80 files with total size 1.3GB.

How to fix this so that no more than 1 GB of disk space is used ?
How to get information how may wal files are yet not processed by slave ?
How to delete processed wal files so that 1 GB of disk space can used for some other purposes ?

"wal_keep_segments" is the number of old WAL segments the server keeps around
for the standby. But there are also some segments that are created for future
use (the minimum is governed by "min_wal_size").

All these limits are not hard limits, the server will try to keep them
more or less. Consider that WAL segments are created as needed, but only
removed during checkpoints.

So, about your first question, you cannot.
Always make sure that there is more disk space available.

About your second question, you also cannot do that.
The primary server has no idea which standby server needs which WAL information.
All you can tell is where the currently connected standby servers are:

SELECT pg_walfile_name(flush_lsn) FROM pg_stat_replication;

About your third question, you *never* manually mess with the files in pg_wal.
The server does that.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com

#9Andrus
kobruleht2@hot.ee
In reply to: Laurenz Albe (#8)
Re: How to prevent master server crash if hot standby stops

Hi!

About your third question, you *never* manually mess with the files in pg_wal.
The server does that.

Is it OK to stop server, delete all files in pg_wal directory and re-start server ?

Or should default value put back and wait until server frees 1 GB disk space ?

Andrus.

#10Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Andrus (#9)
Re: How to prevent master server crash if hot standby stops

On Tue, 2020-04-07 at 12:05 +0300, Andrus wrote:

About your third question, you *never* manually mess with the files in pg_wal.
The server does that.

Is it OK to stop server, delete all files in pg_wal directory and re-start server ?

No.

Or should default value put back and wait until server frees 1 GB disk space ?

You'd have to set "wal_keep_segments" and "max_wal_size" lower, then have
normal database activity and wait until a couple of checkpoints have passed.

But frankly, 1 GB is ridiculously little. If you have to worry about that
much disk space, you're must be running PostgreSQL on a toaster.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com