Mult-standby streaming replication master failover

Started by Vladislav Tchernevabout 11 years ago2 messagesgeneral
Jump to latest
#1Vladislav Tchernev
vtchernev@gmail.com

Hello All,
I am trying to figure out a clear way to preform a post-disaster (where the
master hasn't had the chance to shut down properly) master failover in a
multi-standby streaming replication setup. I found a couple of old posts
suggesting that the choice of the new master is very important and if not
done correctly might lead to database corruption.The posts suggest that the
first thing to do is to find the most caught-up replica, by compare WAL
replay locations among all standby servers. This makes perfect scene but
also makes the failover process a little bit more complicated and difficult
to automate. I guess my question is, does my understanding is right and if
there are any changes in this regard in the newer Postgresql 9.3/9.4
releases.

Best,
Vlad

#2Michael Paquier
michael@paquier.xyz
In reply to: Vladislav Tchernev (#1)
Re: Mult-standby streaming replication master failover

On Tue, Feb 10, 2015 at 1:26 AM, Vladislav Tchernev <vtchernev@gmail.com> wrote:

I am trying to figure out a clear way to preform a post-disaster (where the
master hasn't had the chance to shut down properly) master failover in a
multi-standby streaming replication setup. I found a couple of old posts
suggesting that the choice of the new master is very important and if not
done correctly might lead to database corruption.The posts suggest that the
first thing to do is to find the most caught-up replica, by compare WAL
replay locations among all standby servers. This makes perfect scene but
also makes the failover process a little bit more complicated and difficult
to automate. I guess my question is, does my understanding is right and if
there are any changes in this regard in the newer Postgresql 9.3/9.4
releases.

You are right to take this approach. One of the modifications that 9.3
brought is that a standby is able to follow a timeline switch, so by
promoting the most-updated standby all the other existing standbys
will be able to follow it just by redirecting their primary_conninfo
flow to the newly-promoted node. For the old master, you may as well
rewind it with pg_rewind if WAL forked (assuming that page checksum is
enabled for 9.3~ or that wal_log_hints is enabled in 9.4~).
--
Michael

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general