sql query for postgres replication check
We would like to check the Postgres SYNC streaming replication status with Nagios using the same query on all servers (master + standby) and versions (9.6, 10, 12) for simplicity.
I came up with the following query which should return any apply lag in seconds.
select coalesce(replay_delay, 0) replication_delay_in_sec
from (
    select datname,
           (
               select case
                          when received_lsn = latest_end_lsn then 0
                          else extract(epoch from now() - latest_end_time)
                      end
               from pg_stat_wal_receiver
           ) replay_delay
    from pg_database
    where datname = current_database()
) xview;
I would expect delays >0 in case SYNC or ASYNC replication is somehow behind. We will do a warning at 120 secs and critical at 300 secs.
Would this do the job or am I missing something here?
Thanks, Markus
On Fri, Nov 22, 2019 at 01:20:59PM +0000, Zwettler Markus (OIZ) wrote:
I came up with the following query which should return any apply lag in seconds.
select coalesce(replay_delay, 0) replication_delay_in_sec
from (
select datname,
(
select case
when received_lsn = latest_end_lsn then 0
else extract(epoch
from now() - latest_end_time)
end
from pg_stat_wal_receiver
) replay_delay
from pg_database
where datname = current_database()
) xview;
I would expect delays >0 in case SYNC or ASYNC replication is somehow behind. We will do a warning at 120 secs and critical at 300 secs.
pg_stat_wal_receiver only has data on the receiver, i.e. the
standby, so it would not really be helpful on a primary. On top of
that, streaming replication is cluster-wide, so there is no point
in looking at individual databases either.
Would this do the job or am I missing something here?
Here is a suggestion for Nagios: the hot_standby_delay action of
check_postgres, see
https://github.com/bucardo/check_postgres/blob/master/check_postgres.pl
--
Michael
I don't want to use check_hot_standby_delay, as I would have to configure every streaming replication setup separately in Nagios.
I want a generic routine which I can load on any Postgres server, regardless of streaming replication or database role.
The query would return >0 if streaming replication falls behind and 0 in all other cases (whether or not replication is configured).
Checking streaming replication per database doesn't make any sense to me.
Markus
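One way to get the behaviour described above (0 on a primary or a standalone server, a lag in seconds on a standby) is to branch on pg_is_in_recovery() and measure against pg_last_xact_replay_timestamp(). This is only a sketch under assumptions, not something proposed in the thread: both functions exist in 9.6, 10 and 12, and the column alias is simply reused from the original query. Note that the value also keeps growing while the primary is idle, because no new transactions are being replayed, so the 120/300 second thresholds may raise false alarms on quiet systems.

```sql
-- Sketch only: role-aware lag check without pg_database or pg_stat_wal_receiver.
-- Returns 0 on a primary / standalone server, seconds since the last replayed
-- transaction on a standby.
select case
           -- not in recovery: primary or standalone, nothing to lag behind
           when not pg_is_in_recovery() then 0
           -- standby that has not replayed any transaction yet
           when pg_last_xact_replay_timestamp() is null then 0
           -- standby: age of the last replayed commit/abort record
           else extract(epoch from now() - pg_last_xact_replay_timestamp())
       end as replication_delay_in_sec;
```

Comparing the received and replayed WAL positions would avoid the idle-primary problem, but the function names for that differ between 9.6 and 10+, so it would not fit a single query for all three versions.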