Non-null values of recovery functions after promote or crash of primary
Hi,
Yesterday we (that's me and my colleague Ricardo Gomez) were working on
an issue where a monitoring script was returning increasing lag
information on a primary instead of a NULL value.
The query involved the following functions (the function has since been
amended to work around the issue I'm reporting here):
pg_last_wal_receive_lsn()
pg_last_wal_replay_lsn()
pg_last_xact_replay_timestamp()
Under normal circumstances we would expect to receive NULLs from all
three functions on a primary node, and the code comments back this up.
The problem is, what if the node is a standby which was promoted without
restarting, or that had to perform crash recovery?
While it's recovering, the values in `XLogCtl` are updated with recovery
information, and once recovery finishes, either because crash recovery
reached a consistent state or because a standby was promoted, those
values are not reset to their startup defaults.
That's when you start seeing non-NULL values returned by
`pg_last_wal_replay_lsn()` and `pg_last_xact_replay_timestamp()`.
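To make the behaviour concrete, here is a toy model in Python. It is not PostgreSQL source code: `RecoveryState`, its field, and the `reset` flag are hypothetical names, and it assumes the SQL functions return NULL only while the stored value is still at its startup default.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RecoveryState:
    """Hypothetical stand-in for the recovery fields kept in XLogCtl."""

    last_replayed_lsn: int = 0  # 0 plays the role of the startup default
    in_recovery: bool = False

    def replay(self, lsn: int) -> None:
        # During crash or archive recovery the field tracks replay progress.
        self.in_recovery = True
        self.last_replayed_lsn = lsn

    def finish_recovery(self, reset: bool = False) -> None:
        # reset=False models the behaviour reported here;
        # reset=True models the fix of clearing the value afterwards.
        self.in_recovery = False
        if reset:
            self.last_replayed_lsn = 0

    def last_replay_lsn(self) -> Optional[int]:
        # None stands in for SQL NULL.
        return self.last_replayed_lsn or None


state = RecoveryState()
state.replay(0x3000060)
state.finish_recovery()            # promoted, or crash recovery done
stale = state.last_replay_lsn()    # still non-NULL on what is now a primary
```

Calling `finish_recovery(reset=True)` instead makes `last_replay_lsn()` return `None` again, which is what one would expect on a primary.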
Now, I don't know if we should call this a bug, or an undocumented
anomaly. We could fix it by resetting the values in `XLogCtl` after
recovery finishes, or document that we might see non-NULL values
in certain cases.
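Until one of those happens, a monitoring script can avoid the problem by checking `pg_is_in_recovery()` instead of relying on NULLs. The sketch below is only an illustration of that idea, not the actual amended function: `LAG_QUERY` and `replay_delay_seconds` are hypothetical names, and it assumes the lag is computed as the time elapsed since `pg_last_xact_replay_timestamp()`.

```python
from datetime import datetime, timezone
from typing import Optional

# Hypothetical SQL form of the guard: a CASE without ELSE yields NULL
# whenever the node is not in recovery, whatever the stale value says.
LAG_QUERY = """
SELECT CASE WHEN pg_is_in_recovery()
            THEN EXTRACT(EPOCH FROM now() - pg_last_xact_replay_timestamp())
       END AS replay_delay_seconds
"""


def replay_delay_seconds(
    in_recovery: bool,
    last_replay: Optional[datetime],
    now: datetime,
) -> Optional[float]:
    """Same guard applied client-side; None stands in for SQL NULL."""
    if not in_recovery or last_replay is None:
        # On a primary, ignore any leftover replay timestamp.
        return None
    return (now - last_replay).total_seconds()


# A promoted node with a stale timestamp reports no lag at all.
stale_ts = datetime(2020, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
check_time = datetime(2020, 1, 1, 12, 5, 0, tzinfo=timezone.utc)
on_primary = replay_delay_seconds(False, stale_ts, check_time)
on_standby = replay_delay_seconds(True, stale_ts, check_time)
```

With this guard the reported lag no longer grows on a promoted or crash-recovered primary, while a genuine standby still reports the elapsed time since its last replayed transaction.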
Regards,
--
Martín Marqués http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Greetings,
* Martín Marqués (martin@2ndquadrant.com) wrote:
> pg_last_wal_receive_lsn()
> pg_last_wal_replay_lsn()
> pg_last_xact_replay_timestamp()
>
> Under normal circumstances we would expect to receive NULLs from all
> three functions on a primary node, and code comments back up my thoughts.
Agreed.
> The problem is, what if the node is a standby which was promoted without
> restarting, or that had to perform crash recovery?
>
> So during the time it's recovering the values in `XLogCtl` are updated
> with recovery information, and once the recovery finishes, due to crash
> recovery reaching a consistent state, or a promotion of a standby
> happening, those values are not reset to startup defaults.
>
> That's when you start seeing non-null values returned by
> `pg_last_wal_replay_lsn()` and `pg_last_xact_replay_timestamp()`.
>
> Now, I don't know if we should call this a bug, or an undocumented
> anomaly. We could fix the bug by resetting the values from `XLogCtl`
> after finishing recovery, or document that we might see non-NULL values
> in certain cases.
IMV, and not unlike other similar cases I've talked about on another
thread, these should be cleared when the system is promoted as they're
otherwise confusing and nonsensical.
Thanks,
Stephen
Hi,
> IMV, and not unlike other similar cases I've talked about on another
> thread, these should be cleared when the system is promoted as they're
> otherwise confusing and nonsensical.
Keep in mind that this also happens when the server crashes and has to
perform crash recovery. In that case the server was always a primary.
--
Martín Marqués http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services