Standby node using replication slot not visible in pg_stat_replication while catching up

Started by Michael Paquierabout 12 years ago3 messageshackers

michael@paquier.xyz

about 12 years ago

Hi all,

I have been playing a bit with the replication slots, and I noticed a
weird behavior in such a scenario:
1) Create a master/slave cluster, and have slave use a replication slot
2) Stop the master
3) Create a certain amount of WAL, during my tests I played with 4~5GB of WAL
4) Restart the slave, it catches up with the WALs that master has
retained in pg_xlog.
I noticed that while the standby using the replication slot catches
up, it is not visible in pg_stat_replication on master. This makes
monitoring of the replication lag difficult to follow, particularly in
the case where the standby disconnects from the master. Once the
standby has caught up, it reappears once again in pg_stat_replication.
I didn't have a look at the code to see what is happening, but is this
behavior expected?
Regards,
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Andres Freund

andres@anarazel.de

about 12 years ago

In reply to: Michael Paquier (#1)

Re: Standby node using replication slot not visible in pg_stat_replication while catching up

Hi,

On 2014-03-10 21:06:53 +0900, Michael Paquier wrote:

I have been playing a bit with the replication slots, and I noticed a
weird behavior in such a scenario:
1) Create a master/slave cluster, and have slave use a replication slot
2) Stop the master
3) Create a certain amount of WAL, during my tests I played with 4~5GB of WAL
4) Restart the slave, it catches up with the WALs that master has
retained in pg_xlog.
I noticed that while the standby using the replication slot catches
up, it is not visible in pg_stat_replication on master. This makes
monitoring of the replication lag difficult to follow, particularly in
the case where the standby disconnects from the master. Once the
standby has caught up, it reappears once again in pg_stat_replication.
I didn't have a look at the code to see what is happening, but is this
behavior expected?

Does the use of replication slots actually alter the behaviour? I don't
see how the slot code could influence things to that degree here. Could
it be that it's just restoring code from the standby's pg_xlog or using
restore_command?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Michael Paquier

michael@paquier.xyz

about 12 years ago

In reply to: Andres Freund (#2)

Re: Standby node using replication slot not visible in pg_stat_replication while catching up

On Mon, Mar 10, 2014 at 9:24 PM, Andres Freund <andres@2ndquadrant.com> wrote:

Hi,

On 2014-03-10 21:06:53 +0900, Michael Paquier wrote:

I have been playing a bit with the replication slots, and I noticed a
weird behavior in such a scenario:
1) Create a master/slave cluster, and have slave use a replication slot
2) Stop the master
3) Create a certain amount of WAL, during my tests I played with 4~5GB of WAL
4) Restart the slave, it catches up with the WALs that master has
retained in pg_xlog.
I noticed that while the standby using the replication slot catches
up, it is not visible in pg_stat_replication on master. This makes
monitoring of the replication lag difficult to follow, particularly in
the case where the standby disconnects from the master. Once the
standby has caught up, it reappears once again in pg_stat_replication.
I didn't have a look at the code to see what is happening, but is this
behavior expected?

Does the use of replication slots actually alter the behaviour? I don't
see how the slot code could influence things to that degree here. Could
it be that it's just restoring code from the standby's pg_xlog or using
restore_command?

Sorry for the noise, I'm feeling stupid. Yes the standby was using a
restore_command so it recovered the WAL from archives before reporting
activity back to master.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers