pg_last_xact_replay_timestamp lies

Started by Antonalmost 11 years ago5 messagesgeneral
Jump to latest
#1Anton
djeday84@gmail.com

Hello, dear guru =) help me to fix situation when no change made on
primary and standby show lag more than 15 minutes:
master :
postgres=# select * from pg_stat_replication
;
-[ RECORD 1 ]----+------------------------------
pid | 18553
usesysid | 117942
usename | replication
application_name | walreceiver
client_addr | 10.62.43.60
client_hostname |
client_port | 45281
backend_start | 2015-06-15 00:02:00.095195+03
state | streaming
*sent_location | 62/BC000000**
**write_location | 62/BC000000**
**flush_location | 62/BC000000**
**replay_location | 62/BC000000*
sync_priority | 0
sync_state | async

standby:
postgres=# SELECT (extract(epoch from now() -
pg_last_xact_replay_timestamp() ))::int AS slave_lag;
slave_lag
-----------
* 1030*

on master :
postgres=# select name,setting from pg_settings where lower (category)
like '%write%log%';
name | setting
------------------------------+-------------------------------------------------------------------
archive_command | cp -i %p
/var/lib/postgresql/9.3/main/pg_archivelog/%f </dev/null
archive_mode | on
archive_timeout | 600
checkpoint_completion_target | 0.7
checkpoint_segments | 32
checkpoint_timeout | 300
checkpoint_warning | 30
commit_delay | 0
commit_siblings | 5
fsync | on
full_page_writes | on
synchronous_commit | on
wal_buffers | 2048
wal_level | hot_standby
wal_sync_method | fdatasync
wal_writer_delay | 200
(16 rows)

#2Michael Paquier
michael@paquier.xyz
In reply to: Anton (#1)
Re: pg_last_xact_replay_timestamp lies

On Mon, Jun 15, 2015 at 8:30 AM, Anton Bushmelev <djeday84@gmail.com> wrote:

Hello, dear guru =) help me to fix situation when no change made on primary
and standby show lag more than 15 minutes:
master :
postgres=# select * from pg_stat_replication
;
-[ RECORD 1 ]----+------------------------------
pid | 18553
usesysid | 117942
usename | replication
application_name | walreceiver
client_addr | 10.62.43.60
client_hostname |
client_port | 45281
backend_start | 2015-06-15 00:02:00.095195+03
state | streaming
sent_location | 62/BC000000
write_location | 62/BC000000
flush_location | 62/BC000000
replay_location | 62/BC000000
sync_priority | 0
sync_state | async

standby:
postgres=# SELECT (extract(epoch from now() -
pg_last_xact_replay_timestamp() ))::int AS slave_lag;
slave_lag
-----------
1030

on master :
postgres=# select name,setting from pg_settings where lower (category) like
'%write%log%';
name | setting
------------------------------+-------------------------------------------------------------------
archive_command | cp -i %p
/var/lib/postgresql/9.3/main/pg_archivelog/%f </dev/null
archive_mode | on
archive_timeout | 600
checkpoint_completion_target | 0.7
checkpoint_segments | 32
checkpoint_timeout | 300
checkpoint_warning | 30
commit_delay | 0
commit_siblings | 5
fsync | on
full_page_writes | on
synchronous_commit | on
wal_buffers | 2048
wal_level | hot_standby
wal_sync_method | fdatasync
wal_writer_delay | 200
(16 rows)

Isn't your mistake the fact that you rely on the assumption that
replication lag measured in terms of timestamp is a good thing while
it should be estimated in terms of byte difference by comparing WAL
positions between the master and its standbys?
--
Michael

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3Anton
djeday84@gmail.com
In reply to: Michael Paquier (#2)
Re: pg_last_xact_replay_timestamp lies

Hello, thank t for response, measure in bytes may bemore correct, but to
bring it to the customer? :) I think it is easier to say that the
standby database lags behind master no more than 15 minutes, than the
fact that it differs for 1 megabyte.
ps: sorry for my English

On 06/15/2015 02:57 AM, Michael Paquier wrote:

Isn't your mistake the fact that you rely on the assumption that
replication lag measured in terms of timestamp is a good thing while
it should be estimated in terms of byte difference by comparing WAL
positions between the master and its standbys?

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#4Michael Paquier
michael@paquier.xyz
In reply to: Anton (#3)
Re: pg_last_xact_replay_timestamp lies

On Mon, Jun 15, 2015 at 9:04 AM, Anton Bushmelev <djeday84@gmail.com> wrote:

Hello, thank t for response, measure in bytes may bemore correct, but to
bring it to the customer? :) I think it is easier to say that the standby
database lags behind master no more than 15 minutes, than the fact that it
differs for 1 megabyte.
ps: sorry for my English

On 06/15/2015 02:57 AM, Michael Paquier wrote:

Isn't your mistake the fact that you rely on the assumption that
replication lag measured in terms of timestamp is a good thing while
it should be estimated in terms of byte difference by comparing WAL
positions between the master and its standbys?

Comparing pg_last_xact_replay_timestamp() with now() to measure
replication lag makes little sense: this function shows the timestamp
of the *last transaction replayed* during recovery (see here:
http://www.postgresql.org/docs/devel/static/functions-admin.html#FUNCTIONS-RECOVERY-CONTROL
). Hence if your master server has no activity for a certain amount of
time, meaning that no transactions could be replayed on the standby,
this will continuously increase.
--
Michael

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#5Anton
djeday84@gmail.com
In reply to: Michael Paquier (#4)
Re: pg_last_xact_replay_timestamp lies

use this query:
SELECT
CASE
WHEN pg_last_xlog_receive_location() = pg_last_xlog_replay_location() THEN 0
ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())::INTEGER
END
AS replication_lag;

found at
/messages/by-id/CADKbJJWz9M0swPT3oqe8f9+tfD4-F54uE6Xtkh4nERpVsQnjnw@mail.gmail.com

On 06/15/2015 04:24 AM, Michael Paquier wrote:

On Mon, Jun 15, 2015 at 9:04 AM, Anton Bushmelev <djeday84@gmail.com> wrote:

Hello, thank t for response, measure in bytes may bemore correct, but to
bring it to the customer? :) I think it is easier to say that the standby
database lags behind master no more than 15 minutes, than the fact that it
differs for 1 megabyte.
ps: sorry for my English

On 06/15/2015 02:57 AM, Michael Paquier wrote:

Isn't your mistake the fact that you rely on the assumption that
replication lag measured in terms of timestamp is a good thing while
it should be estimated in terms of byte difference by comparing WAL
positions between the master and its standbys?

Comparing pg_last_xact_replay_timestamp() with now() to measure
replication lag makes little sense: this function shows the timestamp
of the *last transaction replayed* during recovery (see here:
http://www.postgresql.org/docs/devel/static/functions-admin.html#FUNCTIONS-RECOVERY-CONTROL
). Hence if your master server has no activity for a certain amount of
time, meaning that no transactions could be replayed on the standby,
this will continuously increase.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general