replay doesn't catch up with receive on standby

Started by Steven Parkes about 15 years ago, 5 messages, general
#1 Steven Parkes
smparkes@smparkes.net

This is on 9.0.3: I've got two dbs running as standby to a main db. They start up fine and seem to think they're all caught up (according to the logs under /var/log), but

SELECT pg_last_xlog_receive_location() AS receive, pg_last_xlog_replay_location() AS replay;

reports replay behind receive and it doesn't change. This is on both dbs.

Notably the main db isn't (wasn't) doing anything, so no new commits were causing things to move forward. I did a write to it and both slaves moved both their receive and replay locations forward.

Is there a valid situation where an idle master/standby setup could remain with replay behind receive indefinitely? My nagios monitor isn't very happy with that at present, and before changing it I'd like to understand better what's going on.

#2 Fujii Masao
masao.fujii@gmail.com
In reply to: Steven Parkes (#1)
Re: replay doesn't catch up with receive on standby

On Tue, Apr 19, 2011 at 9:00 AM, Steven Parkes <smparkes@smparkes.net> wrote:

> This is on 9.0.3: I've got two dbs running as standby to a main db. They start up fine and seem to think they're all caught up (according to the logs under /var/log), but
>
> SELECT pg_last_xlog_receive_location() AS receive, pg_last_xlog_replay_location() AS replay;
>
> reports replay behind receive and it doesn't change. This is on both dbs.
>
> Notably the main db isn't (wasn't) doing anything, so no new commits were causing things to move forward. I did a write to it and both slaves moved both their receive and replay locations forward.
>
> Is there a valid situation where an idle master/standby setup could remain with replay behind receive indefinitely? My nagios monitor isn't very happy with that at present, and before changing it I'd like to understand better what's going on.

Did you run a query on the standby? If so, I guess that a query conflict prevented
the replay location from advancing.
http://www.postgresql.org/docs/9.0/static/hot-standby.html#HOT-STANDBY-CONFLICT

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#3 Steven Parkes
smparkes@smparkes.net
In reply to: Fujii Masao (#2)
Re: replay doesn't catch up with receive on standby

> Did you run a query on the standby?

Yup. Both standbys. They both responded the same way.

> If so, I guess that a query conflict prevented
> the replay location from advancing.
> http://www.postgresql.org/docs/9.0/static/hot-standby.html#HOT-STANDBY-CONFLICT

The standbys were idle and this was a persistent state. I restarted the standbys and they stayed in this state. Am I missing something? I thought these conflicts were related to queries against the standbys, but there shouldn't have been any that I'm aware of. Certainly none should survive a restart ...

Am I missing something about the conflict?

It also seems notable that a new commit on the master cleared the issue ... Does that seem like the hot standby conflict case?

#4 Fujii Masao
masao.fujii@gmail.com
In reply to: Steven Parkes (#3)
Re: replay doesn't catch up with receive on standby

On Tue, Apr 19, 2011 at 10:28 AM, Steven Parkes <smparkes@smparkes.net> wrote:

>> Did you run a query on the standby?
>
> Yup. Both standbys. They both responded the same way.
>
>> If so, I guess that a query conflict prevented
>> the replay location from advancing.
>> http://www.postgresql.org/docs/9.0/static/hot-standby.html#HOT-STANDBY-CONFLICT
>
> The standbys were idle and this was a persistent state. I restarted the standbys and they stayed in this state. Am I missing something? I thought these conflicts were related to queries against the standbys, but there shouldn't have been any that I'm aware of. Certainly none should survive a restart ...
>
> Am I missing something about the conflict?
>
> It also seems notable that a new commit on the master cleared the issue ... Does that seem like the hot standby conflict case?

Probably not.

Was there an idle-in-transaction session on the master when the problem happened?
If so, this can happen. In that case, only part of a WAL record may have been
written to disk by the walwriter and sent to the standby by the walsender. The
rest will be written and sent once you finish the transaction. The receive
location then points to the end of the received portion of that WAL record.
OTOH, since an incomplete WAL record cannot be replayed, the replay location
cannot advance and still points to the end of the previous WAL record.

If you issue a new commit, the whole WAL record is flushed and sent to the
standby. The record can then be replayed and the replay location advances. I
guess that is the situation you observed.
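
To make "replay behind receive" concrete for a monitoring check, here is a minimal Python sketch that compares the two values returned by the query at the top of the thread. It assumes the locations arrive as text of the form hi/lo in hexadecimal, which is how 9.0 prints them. The function names and the sample values are made up for illustration and are not part of PostgreSQL.

```python
def parse_xlog_location(text):
    """Parse a location such as '0/3000020' into a (hi, lo) tuple of ints.

    Both halves are hexadecimal. Comparing the tuples gives the ordering of
    two locations without converting them to byte offsets.
    """
    hi, lo = text.strip().split("/")
    return int(hi, 16), int(lo, 16)


def replay_is_behind(receive, replay):
    """Return True when the replay location is strictly behind receive."""
    return parse_xlog_location(replay) < parse_xlog_location(receive)


# Hypothetical sample values, not taken from the thread.
print(replay_is_behind("0/3000158", "0/3000020"))
print(replay_is_behind("0/3000158", "0/3000158"))
```

A check built on this might hold off alerting while the receive location itself is not moving, since that matches the idle case described above. Whether that is acceptable depends on the monitoring policy.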

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#5 Steven Parkes
smparkes@smparkes.net
In reply to: Fujii Masao (#4)
Re: replay doesn't catch up with receive on standby

> Was there an idle-in-transaction session on the master when the problem happened?

Shouldn't have been, but that's what I was wondering, too. I didn't check. Not sure I know how to check.
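
For anyone wanting to check, one possible approach is to look at pg_stat_activity on the master. The sketch below assumes a 9.0-era server, where that view has the columns procpid and current_query and an open but idle transaction shows up as '<IDLE> in transaction'. It also assumes track_activities is on and that the querying role is allowed to see other sessions' queries. The constant and function names are hypothetical, and running the query (with psql or a driver) is left out.

```python
# Query text for PostgreSQL 9.0/9.1. Later releases (9.2 and up) renamed
# these columns to pid and query and added a separate state column.
IDLE_IN_XACT_SQL = """
SELECT procpid, xact_start, current_query
FROM pg_stat_activity
WHERE current_query = '<IDLE> in transaction'
ORDER BY xact_start;
"""


def idle_in_transaction_pids(rows):
    """Return the procpids of idle-in-transaction backends.

    rows is an iterable of (procpid, xact_start, current_query) tuples,
    for example the result of an unfiltered SELECT on pg_stat_activity.
    """
    return [pid for pid, _xact_start, query in rows
            if query == "<IDLE> in transaction"]
```

Any procpid returned here is a session holding a transaction open on the master, which is the precondition for the partial-record situation suggested in the previous reply.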

That was my guess and I mostly wanted to confirm that it could happen. It does seem like an edge case. I don't expect uncommitted transactions to be hanging around in general, or even long periods without some kind of write.

Thanks for the help.