Automatic Client Failover

Started by Simon Riggs · over 17 years ago · 32 messages · hackers
#1 Simon Riggs
simon@2ndQuadrant.com

When primary server fails, it would be good if the clients connected to
the primary knew to reconnect to the standby servers automatically.

We might want to specify that centrally and then send the redirection
address to the client when it connects. Sounds like lots of work though.

Seems fairly straightforward to specify a standby connection service at
client level: .pgreconnect, or pgreconnect.conf
No config, then option not used.

Would work with various forms of replication.

Implementation would be to make PQreset() try the secondary connection if
the primary one fails to reset. Of course you can program this manually,
but the feature is that you wouldn't need to, nor would you need to
request changes to 27 different interfaces.
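A rough sketch of what that retry behavior could look like on the client side (illustrative Python, not libpq; the connect callable and the target list stand in for whatever a .pgreconnect file would supply, and all names here are invented):

```python
def connect_with_failover(connect, targets):
    """Try each connection target in order and return the first
    connection that succeeds; raise if every target fails.

    `connect` is any callable that opens a connection from a
    conninfo string (e.g. a DB-API driver's connect function); it
    is passed in so the fallback logic stays library-agnostic.
    `targets` would list the primary first, then the standbys.
    """
    last_error = None
    for conninfo in targets:
        try:
            return connect(conninfo)
        except Exception as exc:  # real code would catch the driver's OperationalError
            last_error = exc
    raise ConnectionError("all connection targets failed") from last_error
```

A caller would use it like `conn = connect_with_failover(driver.connect, ["host=primary", "host=standby1"])`; the point of the proposal is that this loop would live inside the library rather than in every application.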

Good? Bad? Ugly?

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

#2 Jonah H. Harris
jonah.harris@gmail.com
In reply to: Simon Riggs (#1)
Re: Automatic Client Failover

On Mon, Aug 4, 2008 at 5:08 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

When primary server fails, it would be good if the clients connected to
the primary knew to reconnect to the standby servers automatically.

This would be a nice feature which many people I've talked to have
asked for. In Oracle-land, it's called Transparent Application
Failover (TAF) and it gives you a lot of options, including the
ability to write your own callbacks when a failover is detected.

+1

--
Jonah H. Harris, Senior DBA
myYearbook.com

#3 Josh Berkus
josh@agliodbs.com
In reply to: Simon Riggs (#1)
Re: Automatic Client Failover

On Monday 04 August 2008 14:08, Simon Riggs wrote:

When primary server fails, it would be good if the clients connected to
the primary knew to reconnect to the standby servers automatically.

We might want to specify that centrally and then send the redirection
address to the client when it connects. Sounds like lots of work though.

Seems fairly straightforward to specify a standby connection service at
client level: .pgreconnect, or pgreconnect.conf
No config, then option not used.

Well, it's less simple, but you can already do this with pgPool on the
client machine.

--
--Josh

Josh Berkus
PostgreSQL
San Francisco

#4 Jonah H. Harris
jonah.harris@gmail.com
In reply to: Josh Berkus (#3)
Re: Automatic Client Failover

On Mon, Aug 4, 2008 at 5:39 PM, Josh Berkus <josh@agliodbs.com> wrote:

Well, it's less simple, but you can already do this with pgPool on the
client machine.

Yeah, but if you have tens or hundreds of clients, you wouldn't want
to be installing/managing a pgpool on each. Similarly, I think an
application should have the option of being notified of a connection
change; I know that wasn't in Simon's proposal, but I've found it
necessary in several applications which rely on things such as
temporary tables. You don't want the app just blowing up because a
table doesn't exist; you want to be able to handle it gracefully.

--
Jonah H. Harris, Senior DBA
myYearbook.com

#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jonah H. Harris (#4)
Re: Automatic Client Failover

"Jonah H. Harris" <jonah.harris@gmail.com> writes:

On Mon, Aug 4, 2008 at 5:39 PM, Josh Berkus <josh@agliodbs.com> wrote:

Well, it's less simple, but you can already do this with pgPool on the
client machine.

Yeah, but if you have tens or hundreds of clients, you wouldn't want
to be installing/managing a pgpool on each.

Huh? The pgpool is on the server, not on the client side.

There is one really bad consequence of the oversimplified failover
design that Simon proposes, which is that clients might try to fail over
for reasons other than a primary server failure. (Think network
partition.) You really want any such behavior to be managed centrally,
IMHO.

regards, tom lane

#6 Hannu Krosing
hannu@tm.ee
In reply to: Simon Riggs (#1)
Re: Automatic Client Failover

On Mon, 2008-08-04 at 22:08 +0100, Simon Riggs wrote:

When primary server fails, it would be good if the clients connected to
the primary knew to reconnect to the standby servers automatically.

We might want to specify that centrally and then send the redirection
address to the client when it connects. Sounds like lots of work though.

One way to do it is _outside_ of the client, by having a separately
managed subnet for logical DB addresses. So when a failover occurs, you
move that logical DB address to the new host, flush ARP caches and just
reconnect.

This also solves the case of inadvertent failover in case of unrelated
network failure.


Seems fairly straightforward to specify a standby connection service at
client level: .pgreconnect, or pgreconnect.conf
No config, then option not used.

Would work with various forms of replication.

Implementation would be to make PQreset() try the secondary connection if
the primary one fails to reset. Of course you can program this manually,
but the feature is that you wouldn't need to, nor would you need to
request changes to 27 different interfaces.

Good? Bad? Ugly?

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

#7 Dimitri Fontaine
dimitri@2ndQuadrant.fr
In reply to: Tom Lane (#5)
Re: Automatic Client Failover


Hi,

On 5 Aug 08 at 01:13, Tom Lane wrote:

There is one really bad consequence of the oversimplified failover
design that Simon proposes, which is that clients might try to fail over
for reasons other than a primary server failure. (Think network
partition.) You really want any such behavior to be managed centrally,
IMHO.

Then, what about having pgbouncer capability in -core? This would
probably mean, AFAIUI, that the listen()ing process would no longer be
the postmaster but a specialized one, with the portable poll()/
select()/... loop that is now known as pgbouncer.

Existing pgbouncer would have to be expanded to:
- provide a backward compatible mode
(session pooling, releasing the server session when the client closes)
- allow configuring several backend servers and trying the next one
under certain conditions
- add hooks for clients to know when certain events happen
(failure of the current master, automatic switchover, etc.)

Existing pgbouncer hooks and future ones could be managed with catalog
tables, much as we have a special options table for autovacuum: e.g. a
pg_connection_pool table could contain arbitrary SQL to run on backend
fork, backend close, failover, switchover, etc.; and maybe the client
hooks would be NOTIFY messages sent from the backend on its own
initiative.

Would we then have the centrally managed behavior Tom is mentioning?
I understand this in two ways:
- this extension would be able to distinguish failure cases where an
automatic failover is possible from "hard" crashes (those impacting
the listener)
- when we have read-only slave(s), pgbouncer would be able to redirect
read-only statements to them.

Maybe it would even be useful to look at Markus' work in Postgres-R
and its inter-backend communication system, which allows the executor
to enlist more than one backend working on a single query. The
pgbouncer-inherited system would then be a pre-forked backend pooling
manager too...

Once more, I hope that giving (not so) random ideas here as a (not
yet) pgsql hacker is helping the project more than it's disturbing
real work...

Regards,
- --
dim


#8 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dimitri Fontaine (#7)
Re: Automatic Client Failover

Dimitri Fontaine <dfontaine@hi-media.com> writes:

On 5 Aug 08 at 01:13, Tom Lane wrote:

There is one really bad consequence of the oversimplified failover
design that Simon proposes, which is that clients might try to fail
over for reasons other than a primary server failure. (Think network
partition.) You really want any such behavior to be managed
centrally, IMHO.

Then, what about having pgbouncer capability in -core? This would
probably mean, AFAIUI, that the listen()ing process would no longer
be the postmaster but a specialized one,

Huh? The problem case is that the primary server goes down, which would
certainly mean that a pgbouncer instance on the same machine goes with
it. So it seems to me that integrating pgbouncer is 100% backwards.

Failover that actually works is not something we can provide with
trivial changes to Postgres. It's really a major project in its
own right: you need heartbeat detection, STONITH capability,
IP address redirection, etc. I think we should be recommending
external failover-management project(s) instead of offering a
half-baked home-grown solution. Searching freshmeat for "failover"
finds plenty of potential candidates, but not having used any of
them I'm not sure which are worth closer investigation.

regards, tom lane

#9 Josh Berkus
josh@agliodbs.com
In reply to: Tom Lane (#8)
Re: Automatic Client Failover

Tom,

Failover that actually works is not something we can provide with
trivial changes to Postgres.

I think the proposal was for an extremely simple "works 75% of the time"
failover solution. While I can see the attraction of that, the
consequences of having failover *not* work are pretty severe.

On the other hand, we will need to deal with this for the built-in
replication project.

--
--Josh

Josh Berkus
PostgreSQL
San Francisco

#10 daveg
daveg@sonic.net
In reply to: Jonah H. Harris (#2)
Re: Automatic Client Failover

On Mon, Aug 04, 2008 at 05:17:59PM -0400, Jonah H. Harris wrote:

On Mon, Aug 4, 2008 at 5:08 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

When primary server fails, it would be good if the clients connected to
the primary knew to reconnect to the standby servers automatically.

This would be a nice feature which many people I've talked to have
asked for. In Oracle-land, it's called Transparent Application
Failover (TAF) and it gives you a lot of options, including the
ability to write your own callbacks when a failover is detected.

This might be better done as part of a proxy server, e.g. pgbouncer or
pgpool, than as part of postgresql or libpq. I like the concept, but the
logic to determine when a failover has occurred is complex, and a client
will often not have access to enough information to make this
determination accurately.

postgresql could have hooks to support this though, i.e. to determine
when a standby thinks it has become the master.

-dg

--
David Gould daveg@sonic.net 510 536 1443 510 282 0869
If simplicity worked, the world would be overrun with insects.

#11 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Josh Berkus (#9)
Re: Automatic Client Failover

Josh Berkus <josh@agliodbs.com> writes:

I think the proposal was for an extremely simple "works 75% of the time"
failover solution. While I can see the attraction of that, the
consequences of having failover *not* work are pretty severe.

Exactly. The point of failover (or any other HA feature) is to get
several nines worth of reliability. "It usually works" is simply
not playing in the right league.

On the other hand, we will need to deal with this for the built-in
replication project.

Nope, that's orthogonal. A failover solution depends on having a master
and a slave database, but it has nothing directly to do with how those
DBs are synchronized.

regards, tom lane

#12 Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#11)
Re: Automatic Client Failover

On Mon, 2008-08-04 at 22:56 -0400, Tom Lane wrote:

Josh Berkus <josh@agliodbs.com> writes:

I think the proposal was for an extremely simple "works 75% of the time"
failover solution. While I can see the attraction of that, the
consequences of having failover *not* work are pretty severe.

Exactly. The point of failover (or any other HA feature) is to get
several nines worth of reliability. "It usually works" is simply
not playing in the right league.

Why would you all presume that I haven't thought about the things you
mention? Where did I say "...and this would be the only feature required
for full and correct HA failover." The post is specifically about Client
Failover, as the title clearly states.

Your comments were illogical anyway, since if it was so bad a technique
then it would not work for pgpool either, since it is also a client. If
pgpool can do this, why can't another client? Why can't *all* clients?

With correctly configured other components the primary will shut down if
it is no longer the boss. The client will then be disconnected. If it
switches to its secondary connection, we can have an option to read
session_replication_role to ensure that this is set to origin. This
covers the case where the client has lost connection with primary,
though it is still up, yet can reach the standby which has not changed
state.
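That post-failover check could be sketched roughly as follows (illustrative Python over a DB-API-style connection; session_replication_role is the real PostgreSQL setting, but the function and the policy of rejecting anything but 'origin' are just this proposal sketched out):

```python
def verify_is_primary(conn):
    """After switching to the secondary connection, confirm the
    server we reached is acting as the origin (primary) rather
    than a standby that has not changed state.

    session_replication_role reads 'origin' on a server applying
    its own writes and 'replica' on one applying replicated changes.
    """
    cur = conn.cursor()
    cur.execute("SHOW session_replication_role")
    (role,) = cur.fetchone()
    if role != "origin":
        # covers the partition case Simon describes: we reached a
        # reachable standby, but it has not been promoted
        raise RuntimeError("refusing failover: server role is %r" % role)
    return conn
```

The client would run this immediately after reconnecting and fall back to retrying the primary if the check fails.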

DB2, SQLServer and Oracle all provide this feature, BTW. We don't need
to follow, but we should do that consciously. I'm comfortable with us
deciding not to do it, if that is our considered judgement.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

#13 Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#5)
Re: Automatic Client Failover

Greg

On 5-Aug-08, at 12:15 AM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

There is one really bad consequence of the oversimplified failover
design that Simon proposes, which is that clients might try to fail over
for reasons other than a primary server failure. (Think network
partition.) You really want any such behavior to be managed centrally,
IMHO.

The alternative to a cwnrallu managed failover system is one based
on a quorum system. At first glance it seems to me that would fit our
use case better. But the point remains that we would be better off
adopting a complete system than trying to reinvent one.

#14 Dimitri Fontaine
dimitri@2ndQuadrant.fr
In reply to: Tom Lane (#8)
Re: Automatic Client Failover

On Tuesday 05 August 2008, Tom Lane wrote:

Huh? The problem case is that the primary server goes down, which would
certainly mean that a pgbouncer instance on the same machine goes with
it. So it seems to me that integrating pgbouncer is 100% backwards.

With all due respect, it seems to me you're missing an important piece of the
scheme here: I certainly failed to explain it correctly. Of course, I'm not
sure (by and large) that detailing what I have in mind will answer your
concerns, but still...

What I have in mind is having the pgbouncer listening process at both the
master and slave sites. So your clients can already connect to the slave
for normal operations, and the listener process simply connects them to
the master, transparently.
When we later provide read-only slaves, some queries could be processed
locally instead of being sent to the master.
The point being that the client does not have to care whether it's
connecting to a master or a slave; -core knows what it can handle for the
client and handles it (proxying the connection).

Now, that does not solve the client side automatic failover per-se, it's
another way to think about it:
- both master & slave accept connection in any mode
- master & slave are able to "speak" to each other (life link)
- when master knows it's crashing (elog(FATAL)), it can say so to the slave
- when said so, slave can switch to master
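The steps above can be modeled with a toy sketch (purely illustrative; the class and method names are invented, and a real implementation would live at the protocol level in the listener process):

```python
class Node:
    """Toy model of the scheme: every node accepts clients and
    transparently forwards work to whichever node it currently
    believes is the master."""

    def __init__(self, name):
        self.name = name
        self.master = self  # a fresh node considers itself master

    def set_master(self, node):
        # stands in for the master/slave "life link" agreeing on roles;
        # on failover the slave calls node.set_master(node) on itself
        self.master = node

    def execute_write(self, stmt):
        if self.master is self:
            return "%s executed: %s" % (self.name, stmt)
        # a slave transparently proxies the write to the master
        return self.master.execute_write(stmt)
```

With this model a client connected to the slave never has to care about topology: its writes land on the master until the slave is told to switch roles, which is exactly the property that makes the client-side failover story simpler.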

It obviously only catches some errors on the master, the ones we're able
to log. So on its own it does nothing to allow HA in case of a master
crash. But...

Failover that actually works is not something we can provide with
trivial changes to Postgres. It's really a major project in its
own right: you need heartbeat detection, STONITH capability,
IP address redirection, etc. I think we should be recommending
external failover-management project(s) instead of offering a
half-baked home-grown solution. Searching freshmeat for "failover"
finds plenty of potential candidates, but not having used any of
them I'm not sure which are worth closer investigation.

We have worked here with heartbeat, and automating failover is hard. Not
only for technical reasons, but also because:
- current PostgreSQL offers no sync replication, so switching means
trading away or losing the D in ACID,
- you do not want to lose any committed data.

If 8.4 resolves this, implementing failover will be a lot easier.

Where I see my proposal fitting is in handling part of the smartness in
-core directly, so the hard parts of STONITH/failover/switchback could be
implemented in cooperation with -core, not by playing tricks against it.

For example, switching back when the master comes back online would only
mean having the master tell the slave to redirect queries back to it as
soon as it's ready (the hard part is still syncing the data back).

Having clients able to blindly connect to the master or any slave, and
having the current cluster topology smartness in -core, would certainly
help here, even if it does not fulfil all HA goals.

Of course, in the case of a hard master crash, we still have to make sure
it won't restart on its own, and we need an external way to make a chosen
slave become the master.

I'm even envisioning that -core could help STONITH projects by having
something like the recovery.conf file for the master to restart in
not-up-to-date slave mode. Whether we implement resyncing to the new
master in -core or from external scripts is another concern, but -core
could certainly help here (even if not in 8.4, of course).

I'm still thinking that this proposal has a place in the scheme of an
integrated HA solution and offers interesting bits.

Regards,
--
dim

#15 Hannu Krosing
hannu@tm.ee
In reply to: Simon Riggs (#12)
Re: Automatic Client Failover

On Tue, 2008-08-05 at 07:52 +0100, Simon Riggs wrote:

On Mon, 2008-08-04 at 22:56 -0400, Tom Lane wrote:

Josh Berkus <josh@agliodbs.com> writes:

I think the proposal was for an extremely simple "works 75% of the time"
failover solution. While I can see the attraction of that, the
consequences of having failover *not* work are pretty severe.

Exactly. The point of failover (or any other HA feature) is to get
several nines worth of reliability. "It usually works" is simply
not playing in the right league.

Why would you all presume that I haven't thought about the things you
mention? Where did I say "...and this would be the only feature required
for full and correct HA failover." The post is specifically about Client
Failover, as the title clearly states.

I guess the title "Automatic Client Failover" suggests to most readers
that you are trying to solve the client side separately from the server.

Your comments were illogical anyway, since if it was so bad a technique
then it would not work for pgpool either, since it is also a client. If
pgpool can do this, why can't another client? Why can't *all* clients?

IIRC pgpool was itself a poor man's replication solution, so doing
failover _is_ its point.

With correctly configured other components the primary will shut down if
it is no longer the boss. The client will then be disconnected. If it
switches to its secondary connection, we can have an option to read
session_replication_role to ensure that this is set to origin.

Probably this should not be an option, but a must.

maybe session_replication_role should be a DBA-defined function, so that
the same client failover mechanism can be applied to different
replication solutions, both server-built-in and external.

create function session_replication_role()
returns enum('master','ro-slave','please-wait-coming-online','...')
$$
...

This
covers the case where the client has lost connection with primary,
though it is still up, yet can reach the standby which has not changed
state.

DB2, SQLServer and Oracle all provide this feature, BTW. We don't need
to follow, but we should do that consciously. I'm comfortable with us
deciding not to do it, if that is our considered judgement.

The main argument seemed to be that it can't be "Automatic Client-ONLY
Failover."

--------------
Hannu

#16 Simon Riggs
simon@2ndQuadrant.com
In reply to: Hannu Krosing (#15)
Re: Automatic Client Failover

On Tue, 2008-08-05 at 11:50 +0300, Hannu Krosing wrote:

On Tue, 2008-08-05 at 07:52 +0100, Simon Riggs wrote:

On Mon, 2008-08-04 at 22:56 -0400, Tom Lane wrote:

Josh Berkus <josh@agliodbs.com> writes:

I think the proposal was for an extremely simple "works 75% of the time"
failover solution. While I can see the attraction of that, the
consequences of having failover *not* work are pretty severe.

Exactly. The point of failover (or any other HA feature) is to get
several nines worth of reliability. "It usually works" is simply
not playing in the right league.

Why would you all presume that I haven't thought about the things you
mention? Where did I say "...and this would be the only feature required
for full and correct HA failover." The post is specifically about Client
Failover, as the title clearly states.

I guess the title "Automatic Client Failover" suggests to most readers
that you are trying to solve the client side separately from the server.

Yes, that's right: separately. Why would anybody presume I meant "and by
the way you can turn off all other HA measures not mentioned here"? Not
mentioning a topic means no change and no impact in that area, at least
by the convention of all other hackers threads.

Your comments were illogical anyway, since if it was so bad a technique
then it would not work for pgpool either, since it is also a client. If
pgpool can do this, why can't another client? Why can't *all* clients?

IIRC pgpool was itself a poor man's replication solution, so doing
failover _is_ its point.

Agreed.

With correctly configured other components the primary will shut down if
it is no longer the boss. The client will then be disconnected. If it
switches to its secondary connection, we can have an option to read
session_replication_role to ensure that this is set to origin.

Probably this should not be an option, but a must.

Perhaps, but some people doing read-only queries don't really care which
one they are connected to.

maybe session_replication_role should be a DBA-defined function, so that
the same client failover mechanism can be applied to different
replication solutions, both server-built-in and external.

create function session_replication_role()
returns enum('master','ro-slave','please-wait-coming-online','...')
$$
...

Maybe; the trouble is that "please wait, coming online" is the message a
Hot Standby would give as well. Happy to list out all the states so we
can make this work for everyone.

This
covers the case where the client has lost connection with primary,
though it is still up, yet can reach the standby which has not changed
state.

DB2, SQLServer and Oracle all provide this feature, BTW. We don't need
to follow, but we should do that consciously. I'm comfortable with us
deciding not to do it, if that is our considered judgement.

The main argument seemed to be that it can't be "Automatic Client-ONLY
Failover."

No argument. Never was. It can't be.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

#17 Markus Wanner
markus@bluegap.ch
In reply to: Tom Lane (#5)
Re: Automatic Client Failover

Hi,

Tom Lane wrote:

Huh? The pgpool is on the server, not on the client side.

Not necessarily. Having pgpool on the client side works just as well.

There is one really bad consequence of the oversimplified failover
design that Simon proposes, which is that clients might try to fail over
for reasons other than a primary server failure.

Why is that? It's just fine for a client to (re)connect to another
server due to a fluky connection to the current server. I had something
pretty similar in mind for Postgres-R. (Except that we should definitely
allow specifying more than just a primary and a secondary server.)

(Think network partition.)

Uh... well, yeah, of course the servers themselves need to exchange
their state and make sure they only accept clients if they are up and
running (as seen by the cluster). That's what the 'view' of a GCS is all
about. Or STONITH, for that matter.

You really want any such behavior to be managed centrally,
IMHO.

Controlling that client behavior reliably would involve using multiple
(at least N+1) connections to different servers, so you can control the
client even if N of the servers fail. That's certainly more complex than
what Simon proposed.

Speaking in terms of orthogonality, client failover is orthogonal to the
(cluster-wide) server state management. Which in turn is orthogonal to
how the nodes replicate data. (Modulo some side effects like nodes
lagging behind for async replication...)

Regards

Markus Wanner

#18 Markus Wanner
markus@bluegap.ch
In reply to: Bruce Momjian (#13)
Re: Automatic Client Failover

Hi,

Greg Stark wrote:

a cwnrallu

What is that?

Regards

Markus Wanner

#19 Markus Wanner
markus@bluegap.ch
In reply to: Simon Riggs (#16)
Re: Automatic Client Failover

Hi,

Simon Riggs wrote:

On Tue, 2008-08-05 at 11:50 +0300, Hannu Krosing wrote:

I guess having the title "Automatic Client Failover" suggest to most
readers, that you are trying to solve the client side separately from
server.

Yes, that's right: separately. Why would anybody presume I meant "and by
the way you can turn off all other HA measures not mentioned here"? Not
mentioning a topic means no change or no impact in that area, at least
on all other hackers threads.

I think the pgbouncer-in-core idea caused some confusion here.

IMO the client failover method is very similar to what DNS round-robin
setups do for webservers: even if clients might fail over
'automatically', you still have to maintain the server states (which
servers do you list in the DNS?) and care about 'replication' of your
site to the webservers.

Regards

Markus Wanner

#20 Dimitri Fontaine
dimitri@2ndQuadrant.fr
In reply to: Markus Wanner (#17)
Re: Automatic Client Failover

On Tuesday 05 August 2008, Markus Wanner wrote:

 > (Think network partition.)

Uh... well, yeah, of course the servers themselves need to exchange
their state and make sure they only accept clients if they are up and
running (as seen by the cluster). That's what the 'view' of a GCS is all
about. Or STONITH, for that matter.

That's where I'm thinking that some -core smartness would make this part
simpler, hence the confusion (sorry about that) on the thread.

If slave nodes were able to accept connections and redirect them to the
master, the client wouldn't need to care about connecting to a master or
a slave, just about connecting to a live node.

So the proposal for Automatic Client Failover becomes much simpler.
--
dim
--
dim

#21 Markus Wanner
markus@bluegap.ch
In reply to: Dimitri Fontaine (#20)
#22 Dimitri Fontaine
dimitri@2ndQuadrant.fr
In reply to: Markus Wanner (#21)
#23 Markus Wanner
markus@bluegap.ch
In reply to: Dimitri Fontaine (#22)
#24 Dimitri Fontaine
dimitri@2ndQuadrant.fr
In reply to: Markus Wanner (#23)
#25 Markus Wanner
markus@bluegap.ch
In reply to: Dimitri Fontaine (#24)
#26 Markus Wanner
markus@bluegap.ch
In reply to: Dimitri Fontaine (#24)
#27 Dimitri Fontaine
dimitri@2ndQuadrant.fr
In reply to: Markus Wanner (#26)
#28 Markus Wanner
markus@bluegap.ch
In reply to: Dimitri Fontaine (#27)
#29 Bruce Momjian
bruce@momjian.us
In reply to: Simon Riggs (#1)
#30 Simon Riggs
simon@2ndQuadrant.com
In reply to: Bruce Momjian (#29)
#31 Bruce Momjian
bruce@momjian.us
In reply to: Simon Riggs (#30)
#32 Simon Riggs
simon@2ndQuadrant.com
In reply to: Bruce Momjian (#31)