Failover architecture

Started by Reuven M. Lerner · over 14 years ago · 5 messages · general
#1 Reuven M. Lerner
reuven@lerner.co.il

<html style="direction: ltr;">
<head>

<meta http-equiv="content-type" content="text/html; charset=UTF-8">
</head>
<body bgcolor="#FFFFFF" text="#000000">
Hi, everyone.  I'm working on a project that is already using
PostgreSQL 9.0, including streaming replication.  I'm trying to help
them figure out a good architecture for ensuring stability and
failover under a variety of conditions, and wanted to ask the
community for suggestions and help.<br>
<br>
Basically, they have a mission-critical application that talks to
PostgreSQL, and which works quite well.  Because of the
mission-critical nature of the application, it has been implemented
twice, once at each data center.  The two data centers are connected
via a network connection; one PostgreSQL server acts as the master,
and the other acts as a (read-only) slave.  We're using pgpool in
the second data center (i.e., the one with the PostgreSQL
replication slave) to send all writes to the first data center
(i.e., the one with the PostgreSQL replication master), but to
balance reads across the two servers.<br>
<br>
This all works really well.  The automatic failover also works well,
such that when the master goes down, the slave is promoted to the
master, a bit of IP-address switching happens behind the scenes, and
things continue to hum along.<br>
<br>
So far, so good.  But we have a few questions:<br>
<ul>
<li>Once the slave has been promoted to master, we have a single
server, and a single point of failure.  Is there any simple way
to get the former master to become a slave?  I assume that it
would need to start the whole becoming-a-slave process from
scratch, invoking pg_start_backup(), copying files with rsync,
and then pg_stop_backup(), followed by connecting to the new
master.  But perhaps there's a shorter, easier way for a "fallen
master" to become a slave?  <br>
</li>
<li>Is there any easy, straightforward way for a "fallen master"
to re-take its position, demoting the promoted slave back to its
original position of slave?  (With little or no downtime, of
course.)  I assume not, but I just wanted to check; my guess is
that you have to just make it a slave, and then start to follow
the newly promoted master.<br>
</li>
<li>If the network connection between the two data centers goes
down, but if the computers are still up, we worry that we'll end
up with two masters -- the original master, as well as the
slave, which will (falsely) believe the master to be down, and
will thus promote itself to master.  Given that PostgreSQL
doesn't allow master-master synchronization, we're thinking of
using a heartbeat to check if the other computer is available,
in both directions -- and that if the master cannot detect the
slave, then it goes into a read-only mode of some sort.  Then,
when it detects the slave again, and can restart streaming, it
goes back into read-write mode.  Is there any way (other than
Bucardo, which doesn't seem to fit the bill for this project)
for us to merge whatever diffs might be on the two servers, and
then reconnect them in master-slave streaming mode when
communication is re-established?<br>
</li>
<li>Of course, is there any easy way to do that?  If so, then what
happens when pgpool tries to forward an INSERT to the master while
it's in read-only mode?  (For the record, I'm pretty sure that
there isn't any easy or obvious way to make a database
read-only, and that we can simulate read-only mode by adding
INSERT/UPDATE triggers on each of the four -- yes, only four --
tables in the database, silently ignoring data that's posted.  I
floated this with the project managers, and they were OK with
this idea -- but I wanted to double-check whether this is a
viable solution, or if there's an obvious pitfall I'm missing
and/or a better way to go about this.)</li>
<li>If we use master-slave replication, and communication is cut
off, does the slave reconnect automatically?  I believe that the
answer is "yes," and that the replication will continue so long
as we're in the defined window for replication delays.</li>
</ul>
Thanks for any suggestions and answers that you can provide.  And of
course, if I've missed something obvious in the documentation, then
a pointer to the appropriate resource would be more than welcome.<br>
<br>
Reuven<br>
<pre class="moz-signature" cols="72">--
Reuven M. Lerner -- Web development, consulting, and training
Mobile: +972-54-496-8405 * US phone: 847-230-9795
Skype/AIM: reuvenlerner
</pre>
</body>
</html>

#2 John R Pierce
pierce@hogranch.com
In reply to: Reuven M. Lerner (#1)
Re: Failover architecture

On 08/17/11 6:25 AM, Reuven M. Lerner wrote:

* Once the slave has been promoted to master, we have a single
server, and a single point of failure. Is there any simple way to
get the former master to become a slave? I assume that it would
need to start the whole becoming-a-slave process from scratch,
invoking pg_start_backup(), copying files with rsync, and then
pg_stop_backup(), followed by connecting to the new master. But
perhaps there's a shorter, easier way for a "fallen master" to
become a slave?

nope, that's pretty much what you have to do.  if you use rsync, and
the files haven't changed too much, the resync should be relatively
fast.
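
For concreteness, the 9.0-era rebuild being described might look
roughly like the following sketch.  Hostnames, paths, and the
replication user are placeholders, and the exact recovery.conf
contents depend on your setup:

```shell
# On the surviving (new) master: begin a base backup checkpoint.
psql -h newmaster -U postgres -c "SELECT pg_start_backup('rebuild', true);"

# On the fallen master: resync the data directory from the new master.
# rsync only transfers changed files, so this is quick if little changed.
rsync -av --delete \
      --exclude=pg_xlog/ --exclude=postmaster.pid \
      newmaster:/var/lib/pgsql/data/ /var/lib/pgsql/data/

# On the new master: finish the backup.
psql -h newmaster -U postgres -c "SELECT pg_stop_backup();"

# On the fallen master: create recovery.conf so it comes up as a
# streaming standby following the new master.
cat > /var/lib/pgsql/data/recovery.conf <<'EOF'
standby_mode = 'on'
primary_conninfo = 'host=newmaster port=5432 user=replicator'
EOF

pg_ctl -D /var/lib/pgsql/data start
```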

* Is there any easy, straightforward way for a "fallen master" to
re-take its position, demoting the promoted slave back to its
original position of slave? (With little or no downtime, of
course.) I assume not, but I just wanted to check; my guess is
that you have to just make it a slave, and then start to follow
the newly promoted master.

what you said.

* If the network connection between the two data centers goes down,
but if the computers are still up, we worry that we'll end up with
two masters -- the original master, as well as the slave, which
will (falsely) believe the master to be down, and will thus
promote itself to master. Given that PostgreSQL doesn't allow
master-master synchronization, we're thinking of using a heartbeat
to check if the other computer is available, in both directions --
and that if the master cannot detect the slave, then it goes into
a read-only mode of some sort. Then, when it detects the slave
again, and can restart streaming, it goes back into read-write
mode.  Is there any way (other than Bucardo, which doesn't seem to
fit the bill for this project) for us to merge whatever diffs
might be on the two servers, and then reconnect them in
master-slave streaming mode when communication is
re-established?

problematic in any sort of cluster system: you end up with two versions
of 'the truth' and you have to figure out how to reconcile them.  It
absolutely won't work at all with streaming replication, which requires
the two servers to be the same block by block.  If you have to deal
with this sort of thing, you may want to do your OWN replication at the
application level, perhaps using some sort of messaging environment
where you can queue up the pending "change requests".

* Of course, is there any easy way to do that? If so, then what
happens when pgpool tries to forward an INSERT to the master while
it's in read-only mode? (For the record, I'm pretty sure that
there isn't any easy or obvious way to make a database read-only,
and that we can simulate read-only mode by adding INSERT/UPDATE
triggers on each of the four -- yes, only four -- tables in the
database, silently ignoring data that's posted. I floated this
with the project managers, and they were OK with this idea -- but
I wanted to double-check whether this is a viable solution, or if
there's an obvious pitfall I'm missing and/or a better way to go
about this.)

that sounds messy.
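
For reference, the trigger approach under discussion would be
sketched something like this (the table and function names are made
up).  Returning NULL from a BEFORE row trigger silently discards the
row, which is exactly the "silently ignoring" behavior described --
and also exactly why it's messy, since clients get no error:

```sql
-- Hypothetical sketch: discard all writes to a table while in
-- "read-only" mode.  Repeat the CREATE TRIGGER for each of the
-- four tables.
CREATE OR REPLACE FUNCTION ignore_writes() RETURNS trigger AS $$
BEGIN
    RETURN NULL;  -- NULL from a BEFORE row trigger skips the operation
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER block_writes
    BEFORE INSERT OR UPDATE ON mytable
    FOR EACH ROW EXECUTE PROCEDURE ignore_writes();
```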

* If we use master-slave replication, and communication is cut off,
does the slave reconnect automatically? I believe that the answer
is "yes," and that the replication will continue so long as we're
in the defined window for replication delays.

--
john r pierce N 37, W 122
santa cruz ca mid-left coast

#3 Craig Ringer
craig@2ndquadrant.com
In reply to: Reuven M. Lerner (#1)
Re: Failover architecture

I can't help so much with the Pg replication specific parts, but this I
can answer:

On 17/08/2011 9:25 PM, Reuven M. Lerner wrote:

restart streaming, it goes back into read-write mode.  Is there
any way (other than Bucardo, which doesn't seem to fit the bill
for this project) for us to merge whatever diffs might be on the
two servers, and then reconnect them in master-slave streaming
mode when communication is re-established?

Nope.

Merging diffs between two "forked" database timelines is not possible
with PostgreSQL's built-in replication. Pg does replication at the block
level, so there's no meaningful way to merge the changes.

Even if replication were done at the tuple level, how would you merge
changes where both forks INSERTed into a table with a sequence-generated
primary key? Or used an aggregate like sum(...) when generating content
for a new record?

Statement-level replication has similar issues. An app may calculate a
value that it includes in a query based on the result of a prior query
or might issue a different query depending on prior queries. This makes
it impossible to interleave and replay recorded statements when contact
is resumed and still get consistent, correct results.

It's a lot like the SERIALIZABLE transaction problem on a larger scale.
Often you can run two transactions in parallel and have them produce the
same results as they would've done when run serially. It's not possible
to guarantee this (without predicate locking and communication between
the transactions) though, which is why apps must be prepared for
serializable transactions to fail. Same deal when merging timelines,
except that you're dealing with long-committed transactions the app
_trusts_ the database to have successfully recorded.

The only way to do this sort of thing seems to be at the application
level. You can insert new keys with UUIDs to work around sequence
issues, etc, but you'll still have to handle delete collisions and
numerous other issues yourself. No-SQL folks may chime in with "<my-db>
magically fixes this" here, but all the cases I've seen so far just push
the problem back to the application to deal with rather than finding a
true solution for seamlessly merging forked timelines.

I suspect the only sane way to cope with these issues _reliably_ will be
to have your app _always_ run with the assumption that the other server
is unreachable, and always be synchronizing with the other server as it
goes. Otherwise you'll find that everything works great until your link
goes down, then it'll turn out that your clever merge-and-sync logic has
bugs that eat your data. Of course, you'll probably find that your DB
access logic becomes cumbersome and painful...

I can't help thinking that there must be some easy solution to this, but
I've never seen anyone solve the DB change merging problem properly.
Everyone who claims to turns out to have a "solution" with numerous
caveats and limitations - or numerous obvious flaws. Once you fork a
timeline where events may depend on the outcome of prior events, you
cannot guarantee that you can seamlessly merge them into a single
timeline where every event happens (or doesn't happen) in the same order
as it would've without the fork.

--
Craig Ringer

#4 Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Reuven M. Lerner (#1)
Re: Failover architecture

Of course, is there any easy way to do that?  If so, then what
happens when pgpool tries to forward an INSERT to the master while
it's in read-only mode?

Assuming "read-only mode" is a database running in read-only
transaction mode(like standby), you will get errors something like
this:

ERROR: cannot execute INSERT in a read-only transaction
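
One way to get that behavior without per-table triggers is the
default_transaction_read_only setting, which makes new sessions
read-only by default while still letting a session SET it off for
maintenance.  A sketch (the database and table names are
placeholders):

```sql
-- Make every new session in this database read-only by default.
ALTER DATABASE mydb SET default_transaction_read_only = on;

-- New sessions then reject writes:
INSERT INTO mytable VALUES (1);
-- ERROR:  cannot execute INSERT in a read-only transaction
```

Note that this errors out rather than silently discarding data, so
pgpool and the application will see the failure.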

(For the record, I'm pretty sure that
there isn't any easy or obvious way to make a database
read-only, and that we can simulate read-only mode by adding
INSERT/UPDATE triggers on each of the four -- yes, only four --
tables in the database, silently ignoring data that's posted.  I
floated this with the project managers, and they were OK with
this idea -- but I wanted to double-check whether this is a
viable solution, or if there's an obvious pitfall I'm missing
and/or a better way to go about this.)

If we use master-slave replication, and communication is cut
off, does the slave reconnect automatically?  I believe that the
answer is "yes," and that the replication will continue so long

Yes, as long as you turn on archive logging *and* keep enough archive
log segments.
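
In 9.0 that corresponds to settings like these in postgresql.conf on
the master (the values and archive path are illustrative, not
recommendations):

```
# Keep extra WAL segments so a reconnecting standby can catch up
# over streaming replication after an outage.
wal_keep_segments = 128

# Archiving covers outages longer than wal_keep_segments can absorb,
# provided the standby can fetch from the archive (restore_command).
archive_mode = on
archive_command = 'cp %p /archive/%f'
```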
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

#5 Reuven M. Lerner
reuven@lerner.co.il
In reply to: Tatsuo Ishii (#4)
Re: Failover architecture

<html style="direction: ltr;">
<head>
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
<style>body p { margin-bottom: 0pt; margin-top: 0pt; } </style>
</head>
<body style="direction: ltr;"
bidimailui-detected-decoding-type="UTF-8" bgcolor="#FFFFFF"
text="#000000">
Thanks, Tatsuo, and others who commented so helpfully.  It's the
best of all worlds when I get confirmation that my feelings were
right, *and* I learn a lot of new things that I had never
considered, thanks to the generosity of this great community.<br>
<br>
Reuven<br>
</body>
</html>