repmgr won't update witness after failover
Hey,
I have set up three PostgreSQL 9.4 nodes with repmgr as follows:
1. master - node1
2. standby - node2
3. witness - node3
I set up the replication and the witness as described here:
https://github.com/2ndQuadrant/repmgr/blob/master/FAILOVER.rst
When I run 'kill -9 $(pidof postmaster)' on the master, the witness detects that
something went wrong and repmgr fails over from node1 to node2.
But when I then set up replication in the other direction, from node2 to node1,
and kill the PostgreSQL process again, no failover happens and the repmgrd log
shows the following message:
unable to determine a valid master server; waiting 10 seconds to retry...
It seems that the witness doesn't know about the new standby server.
Does anyone have an idea what I am doing wrong here?
Best regards,
Aviel Buskila
Hi Aviel,
You can use the 'cluster show' command to see the repmgr state before you
do the second failover. Make sure node1 is indeed marked as a replica.
After a failover the old master doesn't automatically attach to the new
master; you need to point it at the new master as a standby ('standby
follow', if possible...).
Did you start repmgrd on node1 after making it a replica of the new
master? (It needs two daemons to decide what to promote.)
Regards,
- Jony
---------- Forwarded message ----------
From: Aviel Buskila <aviel33@gmail.com>
Date: 2015-08-13 15:43 GMT+03:00
Subject: Re: [GENERAL] repmgr won't update witness after failover
To: Jony Cohen <jony.cohenjo@gmail.com>
Hey,
I have just tried starting repmgrd on the new standby after setting it up
as a standby, and the result is still the same.
From the message in the repmgrd log on the witness server, it seems that
it is unable to elect a new master because it can't see any other node.
I have checked the repl_nodes table on the witness and it shows:
witness node3
master node2
master node1
Is there a way to update the witness after the first failover?
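The listing above already shows the inconsistency: two nodes are recorded as master and none as standby. A minimal Python sketch of that sanity check follows, assuming the repl_nodes rows have already been fetched as (type, name) pairs. The function name and the rules it applies are hypothetical and for illustration only; they are not part of repmgr.

```python
from collections import Counter


def check_repl_nodes(rows):
    """Return a list of problems found in (type, name) rows from repl_nodes.

    Hypothetical helper for illustration; repmgr does not ship this check.
    """
    problems = []
    counts = Counter(node_type for node_type, _name in rows)
    if counts["master"] > 1:
        masters = [name for node_type, name in rows if node_type == "master"]
        problems.append("multiple masters registered: " + ", ".join(masters))
    if counts["standby"] == 0:
        problems.append("no standby registered, nothing to promote")
    return problems


# Rows as reported from the witness after the first failover.
rows = [("witness", "node3"), ("master", "node2"), ("master", "node1")]
for problem in check_repl_nodes(rows):
    print(problem)
```

With the rows from this message, both checks fire, which matches repmgrd being unable to find a node to promote.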
Hey,
Yes, I did, and it still won't fail back.
2015-08-13 16:23 GMT+03:00 Jony Vesterman Cohen <jony.cohenjo@gmail.com>:
Hi, did you make the old master follow the new one using repmgr?
It doesn't update itself automatically...
From the looks of it repmgr thinks you have 2 masters - the old one
offline and the new one online.
Regards,
Jony
On 14/08/15 at 04:14, Aviel Buskila wrote:
Hey,
yes I did .. and still it won't fail back..
Can you send over the output of "repmgr cluster show" before and after
the failover process?
Please also send the output of SELECT * FROM repmgr_schema.repl_nodes; after
the failover (replace repmgr_schema with the schema name you have configured).
Also, which version of repmgr are you running?
Regards,
--
Martín Marqués http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Hey,
I have set up the configuration all over again. The contents of
'repl_nodes' before the failover are now:
 id |  type   | upstream_node_id |   cluster    | name  |                    conninfo                    | priority | active
----+---------+------------------+--------------+-------+------------------------------------------------+----------+--------
  1 | master  |                  | cluster_name | node1 | host=node1 dbname=repmgr port=5432 user=repmgr |      100 | t
  2 | standby |                1 | cluster_name | node2 | host=node2 dbname=repmgr port=5432 user=repmgr |      100 | t
  3 | witness |                  | cluster_name | node3 | host=node3 dbname=repmgr port=5499 user=repmgr |      100 | t
repmgrd is started on node2 and node3 (standby and witness). When I kill the
postmaster process, I can see the following messages in the repmgrd log:
[WARNING] connection to master has been lost, trying to recover... 60 seconds before failover decision
[WARNING] connection to master has been lost, trying to recover... 50 seconds before failover decision
[WARNING] connection to master has been lost, trying to recover... 40 seconds before failover decision
[WARNING] connection to master has been lost, trying to recover... 30 seconds before failover decision
[WARNING] connection to master has been lost, trying to recover... 20 seconds before failover decision
[WARNING] connection to master has been lost, trying to recover... 10 seconds before failover decision
Then, when it tries to elect node2 for promotion, it shows the following
messages:
[DEBUG] connecting to: 'host=node2 user=repmgr dbname=repmgr fallback_application_name='repmgr''
[WARNING] unable to determine a valid master server; waiting 10 seconds to retry...
[ERROR] unable to determine a valid master node, terminating...
[INFO] repmgrd terminating..
What am I doing wrong?
Hey,
I think I know what the problem is.
After the first failover, when I clone the old master as a standby with the
'repmgr standby clone' command, nothing seems to update the repl_nodes
table with the new standby, so on the next failover repmgrd fails to find
a standby to promote.
I confirmed this by manually updating the repl_nodes table after the clone
so that the old master is now recorded as a standby.
Now my question is:
Which step is supposed to update repl_nodes with the new standby server
after I issue 'repmgr standby clone'?
Best regards,
Aviel Buskila
Hi,
The clone command only clones the data from node2 to node1. You also need
to register the node with the `force` option to overwrite the old record
(as if you were building a new replica node).
See:
https://github.com/2ndQuadrant/repmgr#converting-a-failed-master-to-a-standby
Regards,
- Jony
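For reference, the clone-then-register sequence described above can be laid out as a small dry-run script that only prints the commands. This is a sketch under assumptions: the config path, data directory, database and user names are placeholders, and the flag spelling follows the repmgr 3.x README as I recall it, so check `repmgr --help` for your version before running anything.

```python
import shlex

# Assumed values for illustration; adjust to your installation.
REPMGR_CONF = "/etc/repmgr/repmgr.conf"   # hypothetical config path
DATA_DIR = "/path/to/node1/data"          # hypothetical data directory
NEW_MASTER = "node2"                      # current master after the failover


def rejoin_commands(conf=REPMGR_CONF, data_dir=DATA_DIR, upstream=NEW_MASTER):
    """Build the commands to run on the failed master (node1), in order."""
    return [
        # 1. Re-clone the data directory from the new master.
        ["repmgr", "-D", data_dir, "-d", "repmgr", "-U", "repmgr",
         "--force", "standby", "clone", upstream],
        # 2. Start PostgreSQL on node1 here (pg_ctl or your init system).
        # 3. Overwrite the stale 'master' record in repl_nodes.
        ["repmgr", "-f", conf, "--force", "standby", "register"],
    ]


# Dry run: print the commands instead of executing them.
for argv in rejoin_commands():
    print(" ".join(shlex.quote(arg) for arg in argv))
```

The second command is the step that was missing in this thread: without it, repl_nodes keeps listing node1 as a master.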
Hey,
Thanks for the reply, this helped me very much.
Kind Regards,
Aviel Buskila.