Monitoring of a hot standby with a largely idle master

Started by Jeff Janesover 8 years ago11 messagesgeneral
Jump to latest
#1Jeff Janes
jeff.janes@gmail.com

I think that none of the recovery information functions (
https://www.postgresql.org/docs/9.6/static/functions-admin.html#FUNCTIONS-RECOVERY-INFO-TABLE)
can distinguish a hot standby which is connected to an idle master, versus
one which is disconnected. For example, because the master has crashed, or
someone has changed the firewall rules.

Is there a way to monitor from SQL the last time the standby was able to
contact the master and initiate streaming with it? Other than trying to
write a function that parses it out of pg_log?

Cheers,

Jeff

#2Michael Paquier
michael@paquier.xyz
In reply to: Jeff Janes (#1)
Re: Monitoring of a hot standby with a largely idle master

On Thu, Jul 13, 2017 at 5:26 AM, Jeff Janes <jeff.janes@gmail.com> wrote:

I think that none of the recovery information functions
(https://www.postgresql.org/docs/9.6/static/functions-admin.html#FUNCTIONS-RECOVERY-INFO-TABLE)
can distinguish a hot standby which is connected to an idle master, versus
one which is disconnected. For example, because the master has crashed, or
someone has changed the firewall rules.

Is there a way to monitor from SQL the last time the standby was able to
contact the master and initiate streaming with it? Other than trying to
write a function that parses it out of pg_log?

Not directly I am afraid. One way I can think about is to poll
periodically the state of pg_stat_replication on the primary or
pg_stat_wal_receiver on the standby and save it in a custom table. The
past information is not persistent as any replication-related data in
catalogs is based on the shared memory state of the WAL senders and
the WAL receiver, and those are wiped out at reconnection.
--
Michael

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3Jeff Janes
jeff.janes@gmail.com
In reply to: Michael Paquier (#2)
Re: Monitoring of a hot standby with a largely idle master

On Thu, Jul 13, 2017 at 1:15 AM, Michael Paquier <michael.paquier@gmail.com>
wrote:

On Thu, Jul 13, 2017 at 5:26 AM, Jeff Janes <jeff.janes@gmail.com> wrote:

I think that none of the recovery information functions
(https://www.postgresql.org/docs/9.6/static/functions-admin.

html#FUNCTIONS-RECOVERY-INFO-TABLE)

can distinguish a hot standby which is connected to an idle master,

versus

one which is disconnected. For example, because the master has crashed,

or

someone has changed the firewall rules.

Is there a way to monitor from SQL the last time the standby was able to
contact the master and initiate streaming with it? Other than trying to
write a function that parses it out of pg_log?

Not directly I am afraid. One way I can think about is to poll
periodically the state of pg_stat_replication on the primary or
pg_stat_wal_receiver on the standby and save it in a custom table. The
past information is not persistent as any replication-related data in
catalogs is based on the shared memory state of the WAL senders and
the WAL receiver, and those are wiped out at reconnection.

Thanks, that looks like what I want (or will be, once I get the other side
to upgrade to 9.6).

I think that pg_stat_wal_receiver should be crossreferenced in
https://www.postgresql.org/docs/9.6/static/hot-standby.html, near the same
place which it crossreferences table 9-79. That would make it more
discoverable.

Cheers,

Jeff

#4Michael Paquier
michael@paquier.xyz
In reply to: Jeff Janes (#3)
Re: Monitoring of a hot standby with a largely idle master

On Thu, Jul 13, 2017 at 7:23 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

On Thu, Jul 13, 2017 at 1:15 AM, Michael Paquier <michael.paquier@gmail.com>
wrote:

On Thu, Jul 13, 2017 at 5:26 AM, Jeff Janes <jeff.janes@gmail.com> wrote:

I think that none of the recovery information functions

(https://www.postgresql.org/docs/9.6/static/functions-admin.html#FUNCTIONS-RECOVERY-INFO-TABLE)
can distinguish a hot standby which is connected to an idle master,
versus
one which is disconnected. For example, because the master has crashed,
or
someone has changed the firewall rules.

Is there a way to monitor from SQL the last time the standby was able to
contact the master and initiate streaming with it? Other than trying to
write a function that parses it out of pg_log?

Not directly I am afraid. One way I can think about is to poll
periodically the state of pg_stat_replication on the primary or
pg_stat_wal_receiver on the standby and save it in a custom table. The
past information is not persistent as any replication-related data in
catalogs is based on the shared memory state of the WAL senders and
the WAL receiver, and those are wiped out at reconnection.

Thanks, that looks like what I want (or will be, once I get the other side
to upgrade to 9.6).

I think that pg_stat_wal_receiver should be crossreferenced in
https://www.postgresql.org/docs/9.6/static/hot-standby.html, near the same
place which it crossreferences table 9-79. That would make it more
discoverable.

Hm. Hot standby may not involve streaming replication. What about a
paragraph here instead?
https://www.postgresql.org/docs/devel/static/warm-standby.html#streaming-replication

In the monitoring subsection, we could tell that on a standby the WAL
receiver status can be retrieved from this view when changes are
streamed. What do you think?
--
Michael

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#5Jeff Janes
jeff.janes@gmail.com
In reply to: Michael Paquier (#4)
Re: Monitoring of a hot standby with a largely idle master

On Thu, Jul 13, 2017 at 10:38 AM, Michael Paquier <michael.paquier@gmail.com

wrote:

On Thu, Jul 13, 2017 at 7:23 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

I think that pg_stat_wal_receiver should be crossreferenced in
https://www.postgresql.org/docs/9.6/static/hot-standby.html, near the

same

place which it crossreferences table 9-79. That would make it more
discoverable.

Hm. Hot standby may not involve streaming replication. What about a
paragraph here instead?
https://www.postgresql.org/docs/devel/static/warm-standby.html#streaming-
replication

In the monitoring subsection, we could tell that on a standby the WAL
receiver status can be retrieved from this view when changes are
streamed. What do you think?

That works for me.

Cheers,

Jeff

#6drum.lucas@gmail.com
drum.lucas@gmail.com
In reply to: Michael Paquier (#2)
Re: Monitoring of a hot standby with a largely idle master

2017-07-13 20:15 GMT+12:00 Michael Paquier <michael.paquier@gmail.com>:

On Thu, Jul 13, 2017 at 5:26 AM, Jeff Janes <jeff.janes@gmail.com> wrote:

I think that none of the recovery information functions
(https://www.postgresql.org/docs/9.6/static/functions-

admin.html#FUNCTIONS-RECOVERY-INFO-TABLE)

can distinguish a hot standby which is connected to an idle master,

versus

one which is disconnected. For example, because the master has crashed,

or

someone has changed the firewall rules.

Is there a way to monitor from SQL the last time the standby was able to
contact the master and initiate streaming with it? Other than trying to
write a function that parses it out of pg_log?

Not directly I am afraid. One way I can think about is to poll
periodically the state of pg_stat_replication on the primary or
pg_stat_wal_receiver on the standby and save it in a custom table. The
past information is not persistent as any replication-related data in
catalogs is based on the shared memory state of the WAL senders and
the WAL receiver, and those are wiped out at reconnection.
--
Michael

That works for me too! I do this way... cron job runs that every X minutes
and if the replication lag is higher than 1 second it sends me an email..

It works pretty well and I used bash.

Lucas

#7Michael Paquier
michael@paquier.xyz
In reply to: Jeff Janes (#5)
Re: Monitoring of a hot standby with a largely idle master

On Fri, Jul 14, 2017 at 9:11 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

On Thu, Jul 13, 2017 at 10:38 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Thu, Jul 13, 2017 at 7:23 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

I think that pg_stat_wal_receiver should be crossreferenced in
https://www.postgresql.org/docs/9.6/static/hot-standby.html, near the
same
place which it crossreferences table 9-79. That would make it more
discoverable.

Hm. Hot standby may not involve streaming replication. What about a
paragraph here instead?

https://www.postgresql.org/docs/devel/static/warm-standby.html#streaming-replication

In the monitoring subsection, we could tell that on a standby the WAL
receiver status can be retrieved from this view when changes are
streamed. What do you think?

That works for me.

What do you think about the patch attached?

     <para>
      You can retrieve a list of WAL sender processes via the
-     <link linkend="monitoring-stats-views-table">
+     <link linkend="monitoring-stats-dynamic-views-table">
      <literal>pg_stat_replication</></link> view. Large differences between
In the previous paragraph I have noticed that the link reference is
incorrect. pg_stat_replication is listed under
monitoring-stats-dynamic-views-table.
-- 
Michael

Attachments:

walreceiver-doc.patchapplication/octet-stream; name=walreceiver-doc.patchDownload+9-1
#8Jeff Janes
jeff.janes@gmail.com
In reply to: Michael Paquier (#7)
Re: Monitoring of a hot standby with a largely idle master

On Sun, Jul 16, 2017 at 8:47 AM, Michael Paquier <michael.paquier@gmail.com>
wrote:

On Fri, Jul 14, 2017 at 9:11 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

On Thu, Jul 13, 2017 at 10:38 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Thu, Jul 13, 2017 at 7:23 PM, Jeff Janes <jeff.janes@gmail.com>

wrote:

I think that pg_stat_wal_receiver should be crossreferenced in
https://www.postgresql.org/docs/9.6/static/hot-standby.html, near the
same
place which it crossreferences table 9-79. That would make it more
discoverable.

Hm. Hot standby may not involve streaming replication. What about a
paragraph here instead?

https://www.postgresql.org/docs/devel/static/warm-

standby.html#streaming-replication

In the monitoring subsection, we could tell that on a standby the WAL
receiver status can be retrieved from this view when changes are
streamed. What do you think?

That works for me.

What do you think about the patch attached?

Looks OK. Should it mention specifically "On a hot standby" rather than
"On a standby"? Otherwise people might be left confused on how they are
supposed to do this on a generic standby. It is the kind of thing which is
obvious once you know it, but confusing the first time you encounter it.

<para>
You can retrieve a list of WAL sender processes via the
-     <link linkend="monitoring-stats-views-table">
+     <link linkend="monitoring-stats-dynamic-views-table">
<literal>pg_stat_replication</></link> view. Large differences
between
In the previous paragraph I have noticed that the link reference is
incorrect. pg_stat_replication is listed under
monitoring-stats-dynamic-views-table.

Yes, that is clearly wrong. But why not link directly to the description
of the view itself, pg-stat-replication-view, rather than the correct table
which mentions the view? Is that the accepted docs style to link to the
more generic place? (Same thing applies to your patch, it could link
directly to pg-stat-wal-receiver-view.

Sorry for the delay, it took me awhile to get the new doc build system to
work (solution seems to be, "Don't use CentOS6 anymore")

Cheers,

Jeff

#9Michael Paquier
michael@paquier.xyz
In reply to: Jeff Janes (#8)
Re: Monitoring of a hot standby with a largely idle master

On Mon, Jul 24, 2017 at 9:08 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

On Sun, Jul 16, 2017 at 8:47 AM, Michael Paquier <michael.paquier@gmail.com>
wrote:

What do you think about the patch attached?

Looks OK. Should it mention specifically "On a hot standby" rather than "On
a standby"? Otherwise people might be left confused on how they are
supposed to do this on a generic standby. It is the kind of thing which is
obvious once you know it, but confusing the first time you encounter it.

Yes, right. Let's update as you suggest.

<para>
You can retrieve a list of WAL sender processes via the
-     <link linkend="monitoring-stats-views-table">
+     <link linkend="monitoring-stats-dynamic-views-table">
<literal>pg_stat_replication</></link> view. Large differences
between
In the previous paragraph I have noticed that the link reference is
incorrect. pg_stat_replication is listed under
monitoring-stats-dynamic-views-table.

Yes, that is clearly wrong. But why not link directly to the description of
the view itself, pg-stat-replication-view, rather than the correct table
which mentions the view? Is that the accepted docs style to link to the
more generic place? (Same thing applies to your patch, it could link
directly to pg-stat-wal-receiver-view.

Yes, that's even better.

Sorry for the delay, it took me awhile to get the new doc build system to
work (solution seems to be, "Don't use CentOS6 anymore")

:)
Thanks for the review.
--
Michael

Attachments:

walreceiver-doc-v2.patchapplication/octet-stream; name=walreceiver-doc-v2.patchDownload+9-1
#10Jeff Janes
jeff.janes@gmail.com
In reply to: Michael Paquier (#9)
Re: Monitoring of a hot standby with a largely idle master

On Wed, Jul 26, 2017 at 1:44 AM, Michael Paquier <michael.paquier@gmail.com>
wrote:

On Mon, Jul 24, 2017 at 9:08 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

On Sun, Jul 16, 2017 at 8:47 AM, Michael Paquier <

michael.paquier@gmail.com>

wrote:

What do you think about the patch attached?

Looks OK. Should it mention specifically "On a hot standby" rather than

"On

a standby"? Otherwise people might be left confused on how they are
supposed to do this on a generic standby. It is the kind of thing which

is

obvious once you know it, but confusing the first time you encounter it.

Yes, right. Let's update as you suggest.

new version looks good.

Thanks,

Jeff

#11Peter Eisentraut
peter_e@gmx.net
In reply to: Jeff Janes (#10)
Re: Monitoring of a hot standby with a largely idle master

On 7/27/17 16:14, Jeff Janes wrote:

On Wed, Jul 26, 2017 at 1:44 AM, Michael Paquier
<michael.paquier@gmail.com <mailto:michael.paquier@gmail.com>> wrote:

On Mon, Jul 24, 2017 at 9:08 PM, Jeff Janes <jeff.janes@gmail.com
<mailto:jeff.janes@gmail.com>> wrote:

On Sun, Jul 16, 2017 at 8:47 AM, Michael Paquier <michael.paquier@gmail.com <mailto:michael.paquier@gmail.com>>
wrote:

What do you think about the patch attached?

Looks OK.  Should it mention specifically "On a hot standby" rather than "On
a standby"?  Otherwise people might be left confused on how they are
supposed to do this on a generic standby.  It is the kind of thing which is
obvious once you know it, but confusing the first time you encounter it.

Yes, right. Let's update as you suggest.

new version looks good.

committed

I changed to links to xrefs, which automatically generated the correct
target texts.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general