Fast Primary shutdown only after wal_sender_timeout

Started by Michael Banck over 9 years ago | 2 messages | general
#1 Michael Banck
michael.banck@credativ.de

Hi,

I'm doing some failover tests on a 2-node streaming replication cluster,
and shutting down the primary with 'pg_ctl -m fast' results in a delay
of 50-60 seconds; pg_ctl returns only after the second of these messages:

<71804----2016-10-28 10:01:37.833 CEST-5808e5a4.1187c-transid:0>LOG: database system is shut down
<62866-replicator-[unbekannt]-10.1.181.30(39609)-2016-10-28 10:02:27.963 CEST-581305b9.f592-transid:0>LOG: terminating walsender process due to replication timeout
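The size of the delay can be read directly from the two log lines. Below is a small Python sketch that does this. The regex is tailored to the prefix format these particular lines happen to use; it is not a general PostgreSQL log parser. The time zone is ignored because both lines are CEST.

```python
import re
from datetime import datetime

# The two log lines from above; the prefix layout is specific to this cluster.
LOG_LINES = [
    "<71804----2016-10-28 10:01:37.833 CEST-5808e5a4.1187c-transid:0>LOG: "
    "database system is shut down",
    "<62866-replicator-[unbekannt]-10.1.181.30(39609)-2016-10-28 10:02:27.963 "
    "CEST-581305b9.f592-transid:0>LOG: terminating walsender process due to "
    "replication timeout",
]

# Matches "YYYY-MM-DD HH:MM:SS.mmm" anywhere in the line.
TS_RE = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})")


def timestamp(line: str) -> datetime:
    """Extract the timestamp of a log line (time zone is not parsed)."""
    match = TS_RE.search(line)
    if match is None:
        raise ValueError(f"no timestamp in: {line!r}")
    return datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")


def shutdown_gap_seconds(lines: list[str]) -> float:
    """Seconds between 'database system is shut down' and the walsender exit."""
    shut_down = next(ln for ln in lines if "database system is shut down" in ln)
    walsender = next(ln for ln in lines if "terminating walsender process" in ln)
    return (timestamp(walsender) - timestamp(shut_down)).total_seconds()


if __name__ == "__main__":
    print(f"{shutdown_gap_seconds(LOG_LINES):.3f} s")
```

For these two lines the gap is about 50.1 seconds, which falls within the 50-60 second range described above.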

If I set wal_sender_timeout (which has been commented out so far, i.e.
left at its default of 60 seconds) to something smaller like 10 seconds,
I get a 10-second delay instead. There are no users logged into either
the primary or the standby, and there is no other activity. The
hot_standby_feedback parameter is set to 'on'.
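One plausible reading of these numbers is offered here as a hypothesis, not a diagnosis. The walsender terminates a replication connection that has been silent for wal_sender_timeout. A standby normally reports back every wal_receiver_status_interval, which defaults to 10 seconds.

Suppose the last report arrived 0-10 seconds before the primary shut down and no report arrived afterwards. The timeout would then fire between (timeout - 10 s) and (timeout) after shutdown.

A minimal Python sketch of that arithmetic follows. The function name and the "no replies after shutdown" assumption are illustrative, not taken from PostgreSQL.

```python
def expected_walsender_delay(wal_sender_timeout: float,
                             status_interval: float = 10.0) -> tuple[float, float]:
    """Return (min, max) seconds between shutdown and walsender termination.

    Assumption (hypothetical): the standby's last reply arrived at most
    `status_interval` seconds before shutdown, and none arrived afterwards,
    so the timeout fires once `wal_sender_timeout` has elapsed since it.
    """
    earliest = max(float(wal_sender_timeout) - status_interval, 0.0)
    return (earliest, float(wal_sender_timeout))


if __name__ == "__main__":
    print(expected_walsender_delay(60))  # default timeout
    print(expected_walsender_delay(10))  # lowered timeout
```

With the default 60-second timeout, the model gives 50-60 seconds, which matches the observed delay. With a 10-second timeout, it gives a delay of at most 10 seconds.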

I would have assumed that the replication connection is shut down along
with the backends, but this does not seem to be the case. Is this expected?

This is on 9.5.4, self-compiled.

Michael

--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

In reply to: Michael Banck (#1)
Re: Fast Primary shutdown only after wal_sender_timeout

On 28 October 2016 12:40:24 GMT+02:00, Michael Banck <michael.banck@credativ.de> wrote:

> I would assume that the replication connection is shut down along with
> the backends, but this seems to be not the case, is this expected?

Yes, in normal situations. However, the master ensures that everything has been replicated to the connected standbys before shutting down the connections.

If it hits wal_sender_timeout, maybe you have an improperly disconnected standby that the master has not detected? Perhaps a secondary IP address was moved away from the master before it shut down?
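The explanation above can be illustrated with a deliberately simplified Python model. This is not PostgreSQL's walsender code. `Standby`, `walsender_exit`, `ack_delay` and `last_reply_age` are hypothetical names.

The model works like this:

- During shutdown, the walsender waits for the standby to confirm the WAL it was sent.
- A standby that answers lets the walsender exit almost at once.
- A standby that never answers holds the walsender until wal_sender_timeout has passed since the last reply.

```python
from dataclasses import dataclass


@dataclass
class Standby:
    """Hypothetical view of a standby from one walsender's perspective."""
    reachable: bool          # False, e.g., if its IP address moved away
    ack_delay: float = 0.5   # seconds a reachable standby needs to confirm WAL


def walsender_exit(standby: Standby, wal_sender_timeout: float,
                   last_reply_age: float) -> tuple[str, float]:
    """Return (reason, seconds after shutdown) at which the walsender exits.

    Simplified model: all remaining WAL has been sent; the walsender exits
    once the standby confirms it, or gives up when no reply has arrived for
    wal_sender_timeout seconds (counted from the last reply).
    """
    time_left = max(float(wal_sender_timeout) - last_reply_age, 0.0)
    if standby.reachable and standby.ack_delay < time_left:
        return ("caught up", standby.ack_delay)
    return ("replication timeout", time_left)


if __name__ == "__main__":
    print(walsender_exit(Standby(reachable=True), 60, 5))
    print(walsender_exit(Standby(reachable=False), 60, 10))
```

In this model, only the unreachable standby produces a delay that is bounded by wal_sender_timeout, as in the log above.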

/ioguix
