Negative replication lag?

Started by Quentin Hartman, almost 13 years ago, 3 messages, general
#1 Quentin Hartman
qhartman@direwolfdigital.com

I'm using this script to check my replication lag on my streaming
replication pairs with Nagios:

https://gist.github.com/jacobian/743942

It generally works fine, but it will occasionally return a negative lag value
(-37kb, for example), which of course causes it to throw an alarm, but is
total nonsense. I've been working on the assumption that it is some sort of
bug in the script, but in taking a quick look at it nothing jumps out at me.

Is there something in Postgres itself that could cause this to happen once
in a while? Is it something to be concerned about? Is there a better way to
monitor this state?

Thanks!

QH

#2 Andres Freund
andres@anarazel.de
In reply to: Quentin Hartman (#1)
Re: Negative replication lag?

On 2013-04-22 16:36:38 -0600, Quentin Hartman wrote:

> I'm using this script to check my replication lag on my streaming
> replication pairs with Nagios:
>
> https://gist.github.com/jacobian/743942
>
> It generally works fine, but will occasionally return a negative lag value
> (-37kb for example) which of course causes it to throw an alarm, but is
> total nonsense. I've been working on the assumption that it is some sort of
> bug in the script, but in taking a quick look at it nothing jumps out at me.
>
> Is there something in Postgres itself that could cause this to happen once
> in a while? Is it something to be concerned about? Is there a better way to
> monitor this state?

Well, some time passes between the moment pg_current_xlog_location() is
run on the primary and the moment pg_last_xlog_replay_location() is run
on the standby, so it's not all that unlikely that WAL has been generated,
streamed *and* applied in that window. Because the window is short, it
only happens every now and then.
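
To make the arithmetic concrete, here is a minimal Python sketch of how a two-sample check can come out negative. The function names and the sample locations are made up for illustration, and this is not the code from the gist. It assumes the location strings have the usual "hi/lo" hexadecimal form and treats them as a plain 64-bit position; as far as I know, servers older than 9.3 counted slightly fewer bytes per logical xlog file, which is why the multiplier is left as a parameter.

```python
def xlog_location_to_bytes(location, bytes_per_logid=0x100000000):
    """Convert an xlog location such as '0/3000000' to a byte position.

    bytes_per_logid is an assumption: 2**32 matches the plain 64-bit
    interpretation; adjust it if your server version counts differently.
    """
    logid, offset = location.split("/")
    return int(logid, 16) * bytes_per_logid + int(offset, 16)


def replication_lag_bytes(primary_location, standby_location):
    """Lag as 'primary position minus standby replay position'."""
    return (xlog_location_to_bytes(primary_location)
            - xlog_location_to_bytes(standby_location))


# Hypothetical samples: the primary is queried first, the standby a moment
# later. In between, more WAL was generated, streamed and replayed, so the
# standby's answer is already ahead of the (now stale) primary sample.
primary_sample = "0/3000000"
standby_sample = "0/3009400"

lag = replication_lag_bytes(primary_sample, standby_sample)
# lag is -37888 bytes, i.e. -37 * 1024
```

The negative number says nothing about the standby being "ahead" of the primary; it only reflects that the two samples were taken at different times.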

Did you check the pg_stat_replication view on the primary?
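
One way to use that view, sketched in Python under some assumptions: reading both positions on the primary in a single query avoids comparing samples taken on two hosts at two different times. The query relies on pg_xlog_location_diff(), which needs 9.2 or later, and on the 9.2-era names (pg_current_xlog_location(), replay_location); PostgreSQL 10 renamed these. The thresholds and the classify_lag helper are hypothetical, and running the query is left to whatever database driver the check already uses.

```python
# Query to run on the primary (names as of PostgreSQL 9.2).
LAG_QUERY = """
SELECT client_addr,
       pg_xlog_location_diff(pg_current_xlog_location(),
                             replay_location) AS replay_lag_bytes
FROM pg_stat_replication
"""

# Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def classify_lag(replay_lag_bytes, warning_bytes, critical_bytes):
    """Map a lag in bytes to a Nagios status (thresholds are hypothetical)."""
    if replay_lag_bytes >= critical_bytes:
        return CRITICAL
    if replay_lag_bytes >= warning_bytes:
        return WARNING
    return OK


# Example with made-up thresholds of 16 MiB (warning) and 64 MiB (critical).
status = classify_lag(4096, 16 * 1024 * 1024, 64 * 1024 * 1024)
```

Since replay_location is the position the standby last reported to the primary, it should not exceed the primary's current position when both are read in the same query.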

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#3 Quentin Hartman
qhartman@direwolfdigital.com
In reply to: Andres Freund (#2)
Re: Negative replication lag?

Ah, that makes sense. I think I'll add some logic to the script that has it
get new data points if it comes up with a negative value.
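
Roughly what I have in mind, as a Python sketch rather than the actual change to the script: take both samples again when the difference is negative, and after a few attempts treat the standby as caught up. The function names, the callables that fetch the locations, the retry count and the final fallback to zero are all assumptions made for illustration.

```python
import time


def xlog_location_to_bytes(location):
    """Convert 'hi/lo' hex xlog location to a byte position (64-bit view)."""
    logid, offset = location.split("/")
    return int(logid, 16) * 0x100000000 + int(offset, 16)


def measure_lag_bytes(get_primary_location, get_standby_location,
                      max_attempts=3, pause_seconds=0.0):
    """Sample both servers; resample if the result is negative.

    get_primary_location / get_standby_location are hypothetical callables
    that return the current location string from each server.
    """
    for attempt in range(max_attempts):
        primary = xlog_location_to_bytes(get_primary_location())
        standby = xlog_location_to_bytes(get_standby_location())
        lag = primary - standby
        if lag >= 0:
            return lag
        if attempt < max_attempts - 1:
            time.sleep(pause_seconds)
    # Still negative after every attempt: the standby had replayed past each
    # stale primary sample, so report it as caught up (an assumption).
    return 0


# Usage with canned samples standing in for real queries.
primary_samples = iter(["0/3000000", "0/3010000"])
standby_samples = iter(["0/3009400", "0/300F000"])
lag = measure_lag_bytes(lambda: next(primary_samples),
                        lambda: next(standby_samples))
# First pair is negative, second pair gives 0x1000 = 4096 bytes.
```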

Thanks for the insight.

QH

On Mon, Apr 22, 2013 at 5:11 PM, Andres Freund <andres@2ndquadrant.com> wrote:
