Monitoring Replication

Started by Brandon Phelps, over 14 years ago, 3 messages, general
#1 Brandon Phelps
bphelps@gls.com

Hello all,

I use Nagios to monitor various things on a few servers and have recently set up a hot-standby server and would obviously like to include the state of streaming replication in my monitoring.

I know about the pg_stat_replication view on the master and the pg_last_xlog_receive_location() system function on the standby, and I know that while there is no traffic the sent_location column of the master's view should match the value returned by pg_last_xlog_receive_location() on the standby. I also assume that if streaming replication fails completely, pg_stat_replication on the master will simply return no rows, so that should be easy to detect.

The confusion I have is how exactly can I determine just how far behind the replication is during loads? Currently with no traffic (servers not in production yet) sent_location on the master is "A/10018560" and pg_last_xlog_receive_location() on the standby also returns "A/10018560"... How far apart can these be for me to start worrying? I could make a bit more sense of all this if they were simple timestamps or something, but the hex values returned boggle my mind.

Any advice on these issues or other tips on monitoring the replication would be greatly appreciated.

Thanks,
Brandon

#2 Mahlon E. Smith
mahlon@martini.nu
In reply to: Brandon Phelps (#1)
Re: Monitoring Replication

On Wed, Oct 12, 2011, Brandon Phelps wrote:

> I use Nagios to monitor various things on a few servers and have
> recently set up a hot-standby server and would obviously like to
> include the state of streaming replication in my monitoring.
>
> [...]
>
> The confusion I have is how exactly can I determine just how far
> behind the replication is during loads? Currently with no traffic
> (servers not in production yet) sent_location on the master is
> "A/10018560" and pg_last_xlog_receive_location() on the standby also
> returns "A/10018560"... How far apart can these be for me to start
> worrying? I could make a bit more sense of all this if they were
> simple timestamps or something, but the hex values returned boggle my
> mind.
>
> Any advice on these issues or other tips on monitoring the replication
> would be greatly appreciated.

Brandon: I'm using this script for Mon, you should be able to adapt it
to whatever language and monitoring system you please.

http://www.martini.nu/misc/db_replication.monitor.txt
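
For readers without access to the linked script, here is a minimal sketch of the arithmetic behind a byte-lag check. It is not taken from that script. A location such as "A/10018560" is two hex numbers: a log id and an offset within that log, so it can be flattened into a single byte position. The function names and the bytes_per_logid parameter are made up for illustration. The default of 0xFF000000 assumes a server from before PostgreSQL 9.3, where the last 16 MB segment of each log id was skipped; on later releases 2**32 would be the value to pass.

```python
def xlog_to_bytes(location, bytes_per_logid=0xFF000000):
    """Flatten an 'X/Y' xlog location into one byte position.

    bytes_per_logid is an assumption about the server version:
    0xFF000000 for releases before 9.3, 2**32 for 9.3 and later.
    """
    logid, offset = location.split("/")
    return int(logid, 16) * bytes_per_logid + int(offset, 16)


def lag_bytes(sent, received, bytes_per_logid=0xFF000000):
    """Bytes the master has sent that the standby has not yet received."""
    return (xlog_to_bytes(sent, bytes_per_logid)
            - xlog_to_bytes(received, bytes_per_logid))


if __name__ == "__main__":
    # The idle values from the original post: both sides at the same position.
    print(lag_bytes("A/10018560", "A/10018560"))
```

Here sent would come from sent_location in pg_stat_replication on the master and received from pg_last_xlog_receive_location() on the standby.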

--
Mahlon E. Smith
http://www.martini.nu/contact.html

#3 Mark Keisler
qa4437@motorola.com
In reply to: Mahlon E. Smith (#2)
Re: Monitoring Replication

There is also http://bucardo.org/wiki/Check_postgres, but I haven't been able
to get it to work for monitoring replication. I am using a custom script
similar to Mahlon's, but written in Perl. Looking at Mahlon's code has shown
me an error in how I have been thinking about calculating the replication
lag. Thanks :)
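
Once the lag is available as a byte count, reporting it to Nagios is mostly a matter of choosing thresholds. The sketch below maps a lag value onto the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The function name and the 16 MB and 64 MB thresholds are hypothetical placeholders, not recommendations from this thread; sensible values depend on write volume and how much delay is tolerable. A lag of None stands for the case raised in the first post, where pg_stat_replication returns no rows.

```python
import sys

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

MB = 1024 * 1024


def replication_state(lag_bytes, warn_bytes=16 * MB, crit_bytes=64 * MB):
    """Return (exit_code, message) for a replication lag given in bytes.

    lag_bytes is None when the master lists no standby at all.
    The default thresholds are placeholders for illustration only.
    """
    if lag_bytes is None:
        return CRITICAL, "CRITICAL: no standby listed in pg_stat_replication"
    if lag_bytes < 0:
        # Standby apparently ahead of master: the inputs cannot be trusted.
        return UNKNOWN, "UNKNOWN: negative lag (%d bytes)" % lag_bytes
    if lag_bytes >= crit_bytes:
        return CRITICAL, "CRITICAL: standby is %d bytes behind" % lag_bytes
    if lag_bytes >= warn_bytes:
        return WARNING, "WARNING: standby is %d bytes behind" % lag_bytes
    return OK, "OK: standby is %d bytes behind" % lag_bytes


if __name__ == "__main__":
    code, message = replication_state(0)
    print(message)
    sys.exit(code)
```

Nagios reads the first line of output as the status text and the exit code as the state, so a check written this way can be dropped into a command definition directly.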

On Wed, Oct 12, 2011 at 3:28 PM, Mahlon E. Smith <mahlon@martini.nu> wrote:

> Brandon: I'm using this script for Mon, you should be able to adapt it
> to whatever language and monitoring system you please.
>
> http://www.martini.nu/misc/db_replication.monitor.txt