Failover, Wal Logging, and Multiple Spares

Started by Bryan Murphy, over 16 years ago, 4 messages, general
#1 Bryan Murphy
bmurphy1976@gmail.com

Assuming we are running a Postgres instance that is shipping log files to 2
or more warm spares, is there a way I can fail over to one of the spares,
and have the second spare start receiving updates from the new master
without missing a beat? I can live with losing the old master, and at least
at the moment it would be a controlled failover, but I would like to know
if it's possible during an uncontrolled failover as well (catastrophic
hardware failure).

Right now, we have just that setup, but every time I've failed over to the
new master, we've had to rebuild our spares from scratch, and unfortunately
this is a multi-hour process. We can't afford the risk of not having a
warm spare for that length of time. We're planning to move entirely to a
slony cluster, but I'd like to fail over to a more powerful machine before
we begin the slony migration as the current server is already overloaded.

Thanks,
Bryan

#2 Bryan Murphy
bmurphy1976@gmail.com
In reply to: Bryan Murphy (#1)
Re: Failover, Wal Logging, and Multiple Spares

Ok, I've asked this a few times, but nobody ever responded. I think I
finally got it though, could somebody confirm my logic? Basically, you
set up a chain of servers, and when one fails you replicate to the next
link in the chain, like so:

Master (A) --> Warm Standby (B) --> Warm Standby (C) --> etc.

Master Fails, now becomes:

Old Master (A) xxxxx> New Master (B) --> Warm Standby (C)

And, of course, you might have an additional replication chain from Master
(A) just in case you goof something up in the failover process, but that's
the basic idea.
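
To make the bookkeeping concrete, here is a minimal Python sketch of the chain. The names (`fail_over`, the node labels) are made up for illustration, and it only models who follows whom; it does not touch PostgreSQL and says nothing about whether one warm standby can actually feed another.

```python
def fail_over(chain):
    """Return the chain after the current master (the first entry) is lost.

    The first standby becomes the new master; every remaining standby
    keeps its place behind it. Purely a model of the ordering.
    """
    if len(chain) < 2:
        raise ValueError("no standby left to promote")
    return chain[1:]


chain = ["A", "B", "C"]      # A is the master, B and C are warm standbys
chain = fail_over(chain)     # A is lost, B takes over
print("master:", chain[0], "standbys:", chain[1:])
```

An additional chain hanging off Master (A) would simply be a second list that starts with the same node.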

Thanks,
Bryan

#3 Yaroslav Tykhiy
yar@barnet.com.au
In reply to: Bryan Murphy (#2)
Re: Failover, Wal Logging, and Multiple Spares

On 18/08/2009, at 9:36 AM, Bryan Murphy wrote:

Ok, I've asked this a few times, but nobody ever responded. I think
I finally got it though, could somebody confirm my logic?
Basically, you set up a chain of servers, and when one fails you
replicate to the next link in the chain, like so:

Master (A) --> Warm Standby (B) --> Warm Standby (C) --> etc.

Master Fails, now becomes:

Old Master (A) xxxxx> New Master (B) --> Warm Standby (C)

And, of course, you might have an additional replication chain from
Master (A) just in case you goof something up in the failover
process, but that's the basic idea.

Excuse me, but I fail to see how you are going to replicate from one
warm standby to another warm standby. I don't think PostgreSQL can do
that. That said, the idea of just partially degrading a warm standby
cluster by electing a new master node looked very attractive to me, too.

On Sun, Aug 16, 2009 at 9:35 PM, Bryan Murphy
<bmurphy1976@gmail.com> wrote:
Assuming we are running a Postgres instance that is shipping log
files to 2 or more warm spares, is there a way I can fail over to
one of the spares, and have the second spare start receiving updates
from the new master without missing a beat? I can live with losing
the old master, and at least at the moment it would be a controlled
failover, but I would like to know if it's possible during an
uncontrolled failover as well (catastrophic hardware failure).

Right now, we have just that setup, but every time I've failed over
to the new master, we've had to rebuild our spares from scratch and
unfortunately this is a multi-hour long process. We can't afford
the risk of not having a warm spare for that length of time. We're
planning to move entirely to a slony cluster, but I'd like to fail
over to a more powerful machine before we begin the slony migration
as the current server is already overloaded.

Encouraged by Bruce Momjian, I tried and had some success in this
area. It was a controlled failover but it worked like a charm. An
obvious condition was that the warm standbys be in perfect sync; you
can't do the trick if some of them received the last WAL segment while
the others didn't.
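
As a rough illustration of that precondition, the check below compares the last WAL segment each standby has restored. It is only a sketch: the function name, the standby labels, and the segment file names are invented for the example, and how you collect the last restored segment from each machine (for instance from the standby's log) is left open.

```python
def in_perfect_sync(last_segment_by_standby):
    """True when every standby reports the same last restored WAL segment.

    An empty mapping yields False: with no standbys there is nothing
    to fail over to.
    """
    return len(set(last_segment_by_standby.values())) == 1


# Hypothetical segment names, 24 hex digits as WAL segment files are named.
standbys = {
    "B": "000000010000000A000000F3",
    "C": "000000010000000A000000F3",
}
print(in_perfect_sync(standbys))
```

If one standby reports an older segment than the others, the check fails and the trick described here should not be attempted as-is.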

Please see http://archives.postgresql.org/pgsql-general/2009-07/msg00215.php
for my report. Of course, questions and comments are welcome.

Cheers,
Yar

#4 Bruce Momjian
bruce@momjian.us
In reply to: Yaroslav Tykhiy (#3)
Re: Failover, Wal Logging, and Multiple Spares

On Tue, Aug 18, 2009 at 1:25 AM, Yaroslav Tykhiy<yar@barnet.com.au> wrote:

Encouraged by Bruce Momjian, I tried and had some success in this area.  It
was a controlled failover but it worked like a charm.  An obvious condition
was that the warm standbys be in perfect sync; you can't do the trick if
some of them received the last WAL segment while the others didn't.

It seems like it should be possible to weaken this constraint, as long
as you're careful to fail over to the slave which is the furthest
ahead in replaying WAL. All the other slaves must then switch to
replaying logs from the new master before the point where it took over.
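
The selection step could look something like the Python sketch below. It assumes you already have, by whatever means, each slave's replay position as a WAL location string of the form "hi/lo" in hexadecimal; the function names, slave labels, and positions are all hypothetical. The locations are parsed into integer pairs because comparing the raw strings would order them incorrectly (for example "10/0" would sort before "9/0").

```python
def parse_wal_location(text):
    """Turn a WAL location such as 'A/F3000020' into a comparable tuple."""
    hi, lo = text.split("/")
    return (int(hi, 16), int(lo, 16))


def pick_new_master(replay_positions):
    """Return the name of the slave that has replayed the most WAL."""
    return max(replay_positions,
               key=lambda name: parse_wal_location(replay_positions[name]))


def plan_failover(replay_positions):
    """Return (new_master, slaves that must switch to following it)."""
    new_master = pick_new_master(replay_positions)
    followers = sorted(n for n in replay_positions if n != new_master)
    return new_master, followers


# Hypothetical replay positions for three slaves.
positions = {"B": "A/F3000020", "C": "A/F2FFFFE0", "D": "9/FF000000"}
print(plan_failover(positions))
```

This covers only the choice of the new master; getting the remaining slaves to pick up the new master's logs from the right point is the part that still needs exploring.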

This does seem like a very useful area to explore.

--
greg
http://mit.edu/~gsstark/resume.pdf