Tutorials on high availability Postgresql setup?

Started by Andyover 15 years ago10 messagesgeneral
Jump to latest
#1Andy
angelflow@yahoo.com

Is there any tutorials or detailed instructions on how to set up HA postgresql & failover? The documentation (http://www.postgresql.org/docs/9.0/interactive/warm-standby-failover.html) on this topics is pretty scarce.

The scenario I'm most interested in is this:

2 servers - a master and a hot standby. All writes are sent to master, reads are split between master and hot standby.

1) If the hot standby goes down, how do I redirect reads to the master?
2) If the master fails
-how do I automatically promote the standby to master and send all reads/writes to the new master?
-what happens when the old master comes back up? Do I need to so anything to make it catches up to the new master?

Thanks.

#2Scott Marlowe
scott.marlowe@gmail.com
In reply to: Andy (#1)
Re: Tutorials on high availability Postgresql setup?

On Thu, Oct 7, 2010 at 12:27 AM, Andy <angelflow@yahoo.com> wrote:

Is there any tutorials or detailed instructions on how to set up HA postgresql & failover? The documentation (http://www.postgresql.org/docs/9.0/interactive/warm-standby-failover.html) on this topics is pretty scarce.

The scenario I'm most interested in is this:

2 servers - a master and a hot standby. All writes are sent to master, reads are split between master and hot standby.

To have true redundancy, you need 3 servers. Just saying. Otherwise
when one goes down, no more redundancy.

1) If the hot standby goes down, how do I redirect reads to the master?

Have a config file for your app that tells it where to go for reads
and writes. Change the config file to point reads at a different db
if a read slave fails. What constitutes a failed read slave is kind
of a business decision, so you'll likely have to write your own code
to decide what being down means.

2) If the master fails
   -how do I automatically promote the standby to master and send all reads/writes to the new master?

First you need to decide if you actually want automated failovers.
I've seen automated failovers cause as many problems as they were
supposed to fix, but it can be done. Keep in mind that on a two db
system, failing over means you lose redundancy. If your cluster fails
over on a lot of false positives, that's a lot of time with no
redundancy. If your script isn't written with having only one node in
mind, it might try to failover a second time with no read slave to
promote to master.

Also, you're going to have to come up with what constitutes a failed
master. 30 seconds non-responsive? 5 minutes? An hour? If the
problem is that the write master is simply overloaded, then failing
over isn't gonna solve anything, as the now newly promoted master is
going to collapse as well under even heavier load. It might have been
better to adjust the load factors used to determine where read queries
go to take load off of the master, or to change a setting in your app
that reduces load on the master. With an overloaded write master,
then failover, then overloaded even worse new write master you've got
a site down, no redundancy, and you need to rebuild your old master as
a read slave to handle the load.

To start with I do not recommend doing automatic failovers. Have a
system in place where your DBA / SA can promote a slave to master in
one or two easy steps, and if / when the master truly fails, then run
that script. A human can make that decision with far more care than a
piece of code.

   -what happens when the old master comes back up? Do I need to so anything to make it catches up to the new master?

You can't let the old master come back up as thinking it's the master
as well. You have to re-establish replication to it as a slave.
Again, this is usually not automated, at least not at first. The old
master needs to be "shot in the head" so to speak before it comes back
up, or your app may start writing to it instead of or as well as the
new master, and now you've got split-brain problems.

In short automated failover is complicated to get right, and if you
get it wrong the cost of the consequences can far worse than the 5 or
10 minutes of downtime required for a manual switch-over. First write
scripts that automate most of the task for your application and db
farm. Test those scripts as much as you can on a test farm. Then run
them when needed by hand when things go wrong. If or when you're
certain you've got all the bugs worked out and all the possible
failure scenarios worked out, you can start testing automated
failover.

#3Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Andy (#1)
Re: Tutorials on high availability Postgresql setup?

Is there any tutorials or detailed instructions on how to set up HA postgresql & failover? The documentation (http://www.postgresql.org/docs/9.0/interactive/warm-standby-failover.html) on this topics is pretty scarce.

The scenario I'm most interested in is this:

2 servers - a master and a hot standby. All writes are sent to master, reads are split between master and hot standby.

1) If the hot standby goes down, how do I redirect reads to the master?

pgpool-II 3.0 will take care of this.

2) If the master fails
-how do I automatically promote the standby to master and send all reads/writes to the new master?

This is covered by pgpool-II 3.0 as well.

-what happens when the old master comes back up? Do I need to so anything to make it catches up to the new master?

I recommend to use it a standby. Such a configuration is possible by
using pgpool-II 3.0.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

#4Andy
angelflow@yahoo.com
In reply to: Tatsuo Ishii (#3)
Re: Tutorials on high availability Postgresql setup?
--- On Thu, 10/7/10, Tatsuo Ishii <ishii@postgresql.org> wrote:

The scenario I'm most interested in is this:

2 servers - a master and a hot standby. All writes are

sent to master, reads are split between master and hot
standby.

1) If the hot standby goes down, how do I redirect

reads to the master?

pgpool-II 3.0 will take care of this.

2) If the master fails
     -how do I automatically

promote the standby to master and send all reads/writes to
the new master?

This is covered by pgpool-II 3.0 as well.

     -what happens when the old

master comes back up? Do I need to so anything to make it
catches up to the new master?

I recommend to use it a standby. Such a configuration is
possible by
using pgpool-II 3.0.
--

Oh so I'd still need a proxy such as pgpool-II for HA setup?

I was thinking that with the new built-in replication in 9.0 there would be no need to use pgpool-II.

If pgpool is still necessary why not also use it for replication? What would be the advantages of using the 9.0's built-in replication as opposed to pgpool's replication?

#5John R Pierce
pierce@hogranch.com
In reply to: Andy (#4)
Re: Tutorials on high availability Postgresql setup?

On 10/07/10 12:53 AM, Andy wrote:

If pgpool is still necessary why not also use it for replication? What would be the advantages of using the 9.0's built-in replication as opposed to pgpool's replication?

pgpool's replication works quite differently than the built in. pgpool
replicates queries at the front end, while the built in replication
replicates block writes at the back end.

#6Andy
angelflow@yahoo.com
In reply to: Scott Marlowe (#2)
Re: Tutorials on high availability Postgresql setup?

Ah thanks for the explanation. I was hoping for an automated setup without the need to get paged 24/7.

So HA is still as hard as I thought it would be. I was hoping that with 9.0 things would be easier.

--- On Thu, 10/7/10, Scott Marlowe <scott.marlowe@gmail.com> wrote:
Show quoted text

From: Scott Marlowe <scott.marlowe@gmail.com>
Subject: Re: [GENERAL] Tutorials on high availability Postgresql setup?
To: "Andy" <angelflow@yahoo.com>
Cc: pgsql-general@postgresql.org
Date: Thursday, October 7, 2010, 3:24 AM
On Thu, Oct 7, 2010 at 12:27 AM, Andy
<angelflow@yahoo.com>
wrote:

Is there any tutorials or detailed instructions on how

to set up HA postgresql & failover? The documentation
(http://www.postgresql.org/docs/9.0/interactive/warm-standby-failover.html)
on this topics is pretty scarce.

The scenario I'm most interested in is this:

2 servers - a master and a hot standby. All writes are

sent to master, reads are split between master and hot
standby.

To have true redundancy, you need 3 servers.  Just
saying.  Otherwise
when one goes down, no more redundancy.

1) If the hot standby goes down, how do I redirect

reads to the master?

Have a config file for your app that tells it where to go
for reads
and writes.  Change the config file to point reads at
a different db
if a read slave fails.  What constitutes a failed read
slave is kind
of a business decision, so you'll likely have to write your
own code
to decide what being down means.

2) If the master fails
   -how do I automatically promote the standby to

master and send all reads/writes to the new master?

First you need to decide if you actually want automated
failovers.
I've seen automated failovers cause as many problems as
they were
supposed to fix, but it can be done.  Keep in mind
that on a two db
system, failing over means you lose redundancy.  If
your cluster fails
over on a lot of false positives, that's a lot of time with
no
redundancy.  If your script isn't written with having
only one node in
mind, it might try to failover a second time with no read
slave to
promote to master.

Also, you're going to have to come up with what constitutes
a failed
master.  30 seconds non-responsive?  5
minutes?  An hour?    If the
problem is that the write master is simply overloaded, then
failing
over isn't gonna solve anything, as the now newly promoted
master is
going to collapse as well under even heavier load.  It
might have been
better to adjust the load factors used to determine where
read queries
go to take load off of the master, or to change a setting
in your app
that reduces load on the master.  With an overloaded
write master,
then failover, then overloaded even worse new write master
you've got
a site down, no redundancy, and you need to rebuild your
old master as
a read slave to handle the load.

To start with I do not recommend doing automatic
failovers.  Have a
system in place where your DBA / SA can promote a slave to
master in
one or two easy steps, and if / when the master truly
fails, then run
that script.  A human can make that decision with far
more care than a
piece of code.

   -what happens when the old master comes back up?

Do I need to so anything to make it catches up to the new
master?

You can't let the old master come back up as thinking it's
the master
as well.  You have to re-establish replication to it
as a slave.
Again, this is usually not automated, at least not at
first.  The old
master needs to be "shot in the head" so to speak before it
comes back
up, or your app may start writing to it instead of or as
well as the
new master, and now you've got split-brain problems.

In short automated failover is complicated to get right,
and if you
get it wrong the cost of the consequences can far worse
than the 5 or
10 minutes of downtime required for a manual
switch-over.  First write
scripts that automate most of the task for your application
and db
farm.  Test those scripts as much as you can on a test
farm.  Then run
them when needed by hand when things go wrong.  If or
when you're
certain you've got all the bugs worked out and all the
possible
failure scenarios worked out, you can start testing
automated
failover.

#7Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Andy (#4)
Re: Tutorials on high availability Postgresql setup?

[Message body could not be decoded - encoding: 7bit, error: code converter not found (ISO-2022-JP-2 to UTF-8)]

#8Madison Kelly
linux@alteeve.com
In reply to: Andy (#1)
Re: Tutorials on high availability Postgresql setup?

On 10-10-07 02:27 AM, Andy wrote:

Is there any tutorials or detailed instructions on how to set up HA postgresql & failover? The documentation (http://www.postgresql.org/docs/9.0/interactive/warm-standby-failover.html) on this topics is pretty scarce.

The scenario I'm most interested in is this:

2 servers - a master and a hot standby. All writes are sent to master, reads are split between master and hot standby.

1) If the hot standby goes down, how do I redirect reads to the master?
2) If the master fails
-how do I automatically promote the standby to master and send all reads/writes to the new master?
-what happens when the old master comes back up? Do I need to so anything to make it catches up to the new master?

Thanks.

One option would be to create a simple 2-node cluster and run your PgSQL
server in a migrateable VM backed by a SAN or, if your budget is more
modest, a simple DRBD device.

Personally, I like to use RHCS (Red Hat Cluster Services) with a DRBD
array becking clustered LVM with Xen VMs (domU) on dedicated LVs. This
doesn't dictate the OS version hosting postgres though, so if you've got
a particular set of requirement, you can meet them.

Then in the case of planned outage, you can hot-migrate your VM to the
other node. In the case of catastrophic failure, the surviving node can
boot the PgSQL server and it would come back as if it had simply lost
power. In either case, you don't need to worry about IPs, special
configs or what have you.

--
Digimer
E-Mail: linux@alteeve.com
AN!Whitepapers: http://alteeve.com
Node Assassin: http://nodeassassin.org

#9Leonardo Francalanci
m_lists@yahoo.it
In reply to: Madison Kelly (#8)
Re: Tutorials on high availability Postgresql setup?

One option would be to create a simple 2-node cluster and run your PgSQL
server in a migrateable VM backed by a SAN or, if your budget is more
modest, a simple DRBD device.

Personally, I like to use RHCS (Red Hat Cluster Services) with a DRBD
array becking clustered LVM with Xen VMs (domU) on dedicated LVs. This
doesn't dictate the OS version hosting postgres though, so if you've got
a particular set of requirement, you can meet them.

Then in the case of planned outage, you can hot-migrate your VM to the
other node. In the case of catastrophic failure, the surviving node can
boot the PgSQL server and it would come back as if it had simply lost
power.

I'm interested in the subject... can you explain a little more the setup?

Thank you

#10Steve Crawford
scrawford@pinpointresearch.com
In reply to: Andy (#6)
Re: Tutorials on high availability Postgresql setup?

On 10/07/2010 12:59 AM, Andy wrote:

Ah thanks for the explanation. I was hoping for an automated setup without the need to get paged 24/7.

So HA is still as hard as I thought it would be. I was hoping that with 9.0 things would be easier.

My 0.02.

Whether you need 3 servers (or 2 or 5 or even just 1) is a business
decision informed by technological constraints. HA/redundancy is
basically just like insurance - how much you should spend depends on the
frequency of failure (MTBF), how long it takes to repair a failure
(MTTR), and the cost to the business of downtime and/or data-loss. For
many businesses, a single server is fine but for others 3 isn't close to
enough. Based on many years of experience running PostgreSQL, I would
say that using a single-server option would have given us well over

99.9% availability. (I'm referring to failure-related downtime which

we just never see. Scheduled downtime for updates - especially with the
older releases - are a different story and a second server is very handy
for that.)

99.9% gives you over 8-hours/year downtime - generally pretty easy.
Trimming that downtime to a guaranteed 5-minute maximum (5-nines)
becomes exponentially more costly. (And if your primary worry is lost
sleep due to the 24x7 pager, consider that PostgreSQL properly set up on
quality hardware is pretty friggin reliable. I've lost sleep many times
- it just hasn't been PostgreSQL that caused it.)

Replication is one piece of HA and 9.0 does make that much easier (and
lets you to run read queries on the backup machine so it can earn its
keep). But how to fail from one machine to another is still up to you.

Cheers,
Steve