9.0 replication -- multiple hot_standby servers

Started by Dean Gibson (DB Administrator)over 15 years ago4 messagesgeneral
Jump to latest
#1Dean Gibson (DB Administrator)
postgresql@ultimeth.com

Oops; previously sent from the wrong eMail address, so I don't know if
this actually got sent:

Two days ago I upgraded five DB boxes (for load balancing) from 8.3.0 to
9.0.1 in order to use replication. The replication configuration went
reasonably well, and now all the four "hot_standby" servers are
(streaming) replicating just fine from the primary DB server. If the
primary fails and I "touch" the trigger file on one of the standby
boxes, that goes into primary mode just as it should. Of course, I have
to externally redirect updates to the new server.

My question is, how do I configure the other three (still) hot_standby
boxes to now use the new primary? Clearly I can change the
"recovery.conf" file on each standby box, but that seems like an
unnecessary nuisance.

What I would like to do is put multiple "primary_conninfo" lines into
the "recovery.conf" file, and have each standby server dynamically pick
the one that is the primary. Further, when the primary goes down, the
standby should re-choose the new primary using the multiple
"primary_conninfo" lines.

Such a feature (if it does not already exist) would really be useful !!!

Question: While the documentation described how to switch a standby
server from hot_standby to primary, I didn't see how to switch it back
to hot_standby. Is the following the best (or only) method ???

1. Optionally, bring the previous primary back up into hot_standby mode.
2. STOP the current primary server.
3. Using the trigger file on another hot-standby box, bring it into
primary mode.
4. Externally, start redirecting updates to the new primary server.
5. On the stopped server, delete the trigger file, and rename the
"recovery.done" file back to "recovery.conf".
6. Restart the stopped server; it should come back up into
hot_standby mode.

Ideally, this could be done seamlessly without the (small) possibility
of lost updates, but I don't see how to do that. Fortunately, it's not
a big issue in my case. Here's what I think would be *ideal* from an
update integrity standpoint:

1. If a hot_standby box gets a non-read-only SQL transaction, if
checks to see if it is still talking to a primary server:
* If it is, it does whatever is done now (I presume it reports
an error).
* If not, it "touches" the trigger file and switches into
primary mode. If there are other hot_standby servers,
ideally the new primary signals them that it is the new
primary (in order to keep them from accessing the old
primary, which may still be visible to them).
2. This way, an external application feeding SQL update-type
transactions, upon finding that the old primary is down, just
switches to any hot_standby server as its new target and retries
the update; this will automatically cause the desired hot_standby
server to switch to primary.
3. Since we don't know what the issue was with the the previous
primary, it needs to be recovered manually (and isolated from
other hot_standby servers in the meantime).

Sincerely, Dean

--
Mail to my list address MUST be sent via the mailing list.
All other mail to my list address will bounce.

#2Alan Hodgson
ahodgson@simkin.ca
In reply to: Dean Gibson (DB Administrator) (#1)
Re: 9.0 replication -- multiple hot_standby servers

On October 29, 2010, "Dean Gibson (DB Administrator)"
<postgresql@ultimeth.com> wrote:

My question is, how do I configure the other three (still) hot_standby
boxes to now use the new primary? Clearly I can change the
"recovery.conf" file on each standby box, but that seems like an
unnecessary nuisance.

I'm curious about this too. It seems that currently I'd have to rebuild any
additional slaves basically from scratch to use the new master.

--
A hybrid Escalade is missing the point much in the same way that having a
diet soda with your extra large pepperoni pizza is missing the point.

#3Dean Gibson (DB Administrator)
postgresql@ultimeth.com
In reply to: Alan Hodgson (#2)
Re: 9.0 replication -- multiple hot_standby servers

On 2010-10-29 11:17, Alan Hodgson wrote:

I'm curious about this too. It seems that currently I'd have to
rebuild any additional slaves basically from scratch to use the new
master.

I think so long as you "pointed" (via primary_conninfo) the additional
slaves to the new (pending) master, before you "touch"ed the pending
master's trigger file, you should be OK, as all the DBs should be in
sync at that point.

When the primary DB server goes down, updates are no longer accepted.
In such a situation, the human goal is to get the updates accepted
again, and human beings in a hurry are apt to forget the exact sequence
of steps to follow. If one forgets, and enables the new master without
"repointing" the other slaves, then you have a situation where you have
to backup/restore the new primary to each of the slaves, in order to
recover the slaves.

The failover shouldn't be that brittle.

A similar situation exists when having to reboot all the DB servers
(say, for maintenance), or just reboot one in a period where you can
afford to suspend updates: As near as I can tell (and I will find out
over the weekend), the correct procedure is to stop the primary FIRST,
and then reboot whatever slaves you need, and then reboot (if needed) or
restart the primary.

I wonder if this thread should be in the "ADMIN" group (and if so,
should I repost the original message there) ???

--
Mail to my list address MUST be sent via the mailing list.
All other mail to my list address will bounce.

#4Alan Hodgson
ahodgson@simkin.ca
In reply to: Dean Gibson (DB Administrator) (#3)
Re: 9.0 replication -- multiple hot_standby servers

On October 29, 2010, "Dean Gibson (DB Administrator)"
<postgresql@ultimeth.com> wrote:

On 2010-10-29 11:17, Alan Hodgson wrote:

I'm curious about this too. It seems that currently I'd have to
rebuild any additional slaves basically from scratch to use the new
master.

I think so long as you "pointed" (via primary_conninfo) the additional
slaves to the new (pending) master, before you "touch"ed the pending
master's trigger file, you should be OK, as all the DBs should be in
sync at that point.

Yeah they're in sync data-wise, but do they think they're the same WAL
stream for continuity? Would be nice.

--
A hybrid Escalade is missing the point much in the same way that having a
diet soda with your extra large pepperoni pizza is missing the point.