Negative replication lag?
I'm using this script to check my replication lag on my streaming
replication pairs with Nagios:
https://gist.github.com/jacobian/743942
It generally works fine, but it occasionally returns a negative lag value
(-37 kB, for example), which of course causes it to throw an alarm but is
total nonsense. I've been working on the assumption that it is some sort of
bug in the script, but a quick look at it turned up nothing obvious.
Is there something in Postgres itself that could cause this to happen once
in a while? Is it something to be concerned about? Is there a better way to
monitor this state?
Thanks!
QH
On 2013-04-22 16:36:38 -0600, Quentin Hartman wrote:
> I'm using this script to check my replication lag on my streaming
> replication pairs with Nagios:
> https://gist.github.com/jacobian/743942
> It generally works fine, but it occasionally returns a negative lag value
> (-37 kB, for example), which of course causes it to throw an alarm but is
> total nonsense. I've been working on the assumption that it is some sort of
> bug in the script, but a quick look at it turned up nothing obvious.
> Is there something in Postgres itself that could cause this to happen once
> in a while? Is it something to be concerned about? Is there a better way to
> monitor this state?
Well, some time passes between running pg_current_xlog_location() on the
primary and pg_last_xlog_replay_location() on the standby, so
it's not all that unlikely that WAL has been generated, streamed *and*
applied in that interval. Because the window is short, it only happens every
now and then.
Did you check the pg_stat_replication view on the primary?
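
To illustrate the race described above, here is a minimal Python sketch of the
arithmetic a lag check performs. The function names are hypothetical and are
not taken from the linked script. It assumes the "high/low" hex location
format and treats the high part as counting 4 GiB units, which is an
assumption that may not hold for every server release. If the standby value
is sampled later than the primary value, the raw difference can come out
negative, so the sketch clamps it at zero.

```python
def lsn_to_bytes(lsn: str) -> int:
    """Convert an xlog location such as '16/B374D848' to a byte position.

    Assumption: the part before the slash counts 4 GiB units, so the
    location is read as one contiguous 64-bit number.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) + int(low, 16)


def replication_lag_bytes(primary_lsn: str, standby_replay_lsn: str) -> int:
    """Return how far the standby is behind the primary, in bytes.

    The two locations are sampled at different moments, so the standby
    may already have replayed past the primary's sampled position.
    A negative raw difference is reported as zero lag.
    """
    raw = lsn_to_bytes(primary_lsn) - lsn_to_bytes(standby_replay_lsn)
    return max(raw, 0)


if __name__ == "__main__":
    # Standby sampled after the primary and already ahead of that sample.
    print(replication_lag_bytes("16/B374D800", "16/B374D848"))
```

Clamping hides the negative value rather than explaining it, so it is only
appropriate when the cause is the sampling gap described above.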
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Ah, that makes sense. I think I'll add some logic to the script that has it
get new data points if it comes up with a negative value.
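
A rough sketch of what that retry logic might look like in Python follows.
The helper name, the number of attempts, and the delay are all assumptions
for illustration, and `sample_lag` stands in for whatever function queries
both servers and returns the signed difference in bytes.

```python
import time
from typing import Callable


def lag_with_retry(sample_lag: Callable[[], int],
                   attempts: int = 3,
                   delay_seconds: float = 0.5) -> int:
    """Re-sample the lag while the measurement comes back negative.

    Returns the first non-negative sample. If every attempt is negative,
    the standby kept overtaking the primary's sampled position, so the
    lag is reported as zero.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        lag = sample_lag()
        if lag >= 0:
            return lag
        # Wait briefly before sampling again, except after the last try.
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return 0


if __name__ == "__main__":
    # Simulated measurements: two negative readings, then a valid one.
    readings = iter([-37000, -512, 1200])
    print(lag_with_retry(lambda: next(readings), delay_seconds=0))
```

Bounding the number of attempts keeps the check from hanging inside Nagios
if the readings stay negative.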
Thanks for the insight.
QH
On Mon, Apr 22, 2013 at 5:11 PM, Andres Freund <andres@2ndquadrant.com> wrote: