Multiple Slave Failover with PITR

Started by Ken Brushalmost 14 years ago11 messages
#1Ken Brush
kbrush@gmail.com

Hello everyone,

I notice that the documentation at:
http://wiki.postgresql.org/wiki/Binary_Replication_Tutorial

Doesn't contain steps in a Multiple Slave setup for re-establishing
them after a slave has become the new master.

Based on the documentation, here are the most fail-proof steps I came up with:

1. Master dies :(
2. Touch the trigger file on the most caught up slave.
3. Slave is now the new master :)
4. use pg_basebackup or other binary replication trick (rsync, tar
over ssh, etc...) to bring the other slaves up to speed with the new
master.
5. start the other slaves pointing to the new master.

But, that can take time (about 1-2 hours) with my medium sized DB
(580GB currently).

After testing a few different ideas that I gleaned from posts on the
mail list, I came up with this alternative method:

1. Master dies :(
2. Touch the trigger file on the most caught up slave
3. Slave is now the new master.
4. On the other slaves do the following:
5. Shutdown postgres on the slave
6. Delete every file in /data/pgsql/data/pg_xlog
7. Modify the recovery.conf file to point to the new master and
include the line "recovery_target_timeline='latest'"
8. Copy the history file from the new master to the slave (it's the
most recent #.history file in the xlog directory)
9. Startup postgres on the slave and watch it sync up to the new
master (about 1-5 minutes usually)

My question is this. Is the alternative method adequate? I tested it a
bit and couldn't find any problems with data loss or inconsistency.

I still use the fail-proof method above to re-incorporate the old
master as a new slave.

Sincerely,
-Ken

#2Albe Laurenz
laurenz.albe@wien.gv.at
In reply to: Ken Brush (#1)
Re: Multiple Slave Failover with PITR

Ken Brush wrote:

I notice that the documentation at:
http://wiki.postgresql.org/wiki/Binary_Replication_Tutorial

Doesn't contain steps in a Multiple Slave setup for re-establishing
them after a slave has become the new master.

Based on the documentation, here are the most fail-proof steps I came

up with:

1. Master dies :(
2. Touch the trigger file on the most caught up slave.
3. Slave is now the new master :)
4. use pg_basebackup or other binary replication trick (rsync, tar
over ssh, etc...) to bring the other slaves up to speed with the new
master.
5. start the other slaves pointing to the new master.

But, that can take time (about 1-2 hours) with my medium sized DB
(580GB currently).

After testing a few different ideas that I gleaned from posts on the
mail list, I came up with this alternative method:

1. Master dies :(
2. Touch the trigger file on the most caught up slave
3. Slave is now the new master.
4. On the other slaves do the following:
5. Shutdown postgres on the slave
6. Delete every file in /data/pgsql/data/pg_xlog
7. Modify the recovery.conf file to point to the new master and
include the line "recovery_target_timeline='latest'"
8. Copy the history file from the new master to the slave (it's the
most recent #.history file in the xlog directory)
9. Startup postgres on the slave and watch it sync up to the new
master (about 1-5 minutes usually)

My question is this. Is the alternative method adequate? I tested it a
bit and couldn't find any problems with data loss or inconsistency.

That sounds like it should work fine.

Yours,
Laurenz Albe

#3Sergey Konoplev
gray.ru@gmail.com
In reply to: Albe Laurenz (#2)
Re: Multiple Slave Failover with PITR

On Wed, Mar 28, 2012 at 11:35 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:

1. Master dies :(
2. Touch the trigger file on the most caught up slave

If the master was stopped properly will the slaves be in sync to each other?

3. Slave is now the new master.
4. On the other slaves do the following:
5. Shutdown postgres on the slave
6. Delete every file in /data/pgsql/data/pg_xlog
7. Modify the recovery.conf file to point to the new master and
include the line "recovery_target_timeline='latest'"
8. Copy the history file from the new master to the slave (it's the
most recent #.history file in the xlog directory)

It will work in the case of archive_command presence only and I will
need to sync the whole pg_xlog content if do not have archive_command
in recovery.conf, correct?

9. Startup postgres on the slave and watch it sync up to the new
master (about 1-5 minutes usually)

^^^

--
Sergey Konoplev

Blog: http://gray-hemp.blogspot.com
LinkedIn: http://ru.linkedin.com/in/grayhemp
JID/GTalk: gray.ru@gmail.com Skype: gray-hemp

#4Ken Brush
kbrush@gmail.com
In reply to: Sergey Konoplev (#3)
Re: Multiple Slave Failover with PITR

On Wed, Apr 11, 2012 at 9:03 AM, Sergey Konoplev <gray.ru@gmail.com> wrote:

On Wed, Mar 28, 2012 at 11:35 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:

1. Master dies :(
2. Touch the trigger file on the most caught up slave

If the master was stopped properly will the slaves be in sync to each other?

I don't think you can guarantee that. Hence why you pick the most
caught up slave, it will catch the other slaves up to it's state, once
it becomes the master.

3. Slave is now the new master.
4. On the other slaves do the following:
5. Shutdown postgres on the slave
6. Delete every file in /data/pgsql/data/pg_xlog
7. Modify the recovery.conf file to point to the new master and
include the line "recovery_target_timeline='latest'"
8. Copy the history file from the new master to the slave (it's the
most recent #.history file in the xlog directory)

It will work in the case of archive_command presence only and I will
need to sync the whole pg_xlog content if do not have archive_command
in recovery.conf, correct?

The new master will sync out the WAL logs from pg_xlog that the slaves
need. The wal sender/receiver system is what I rely on for this.

Sincerely,
-Ken

#5Sergey Konoplev
gray.ru@gmail.com
In reply to: Ken Brush (#4)
Re: Multiple Slave Failover with PITR

On Wed, Apr 11, 2012 at 8:12 PM, Ken Brush <kbrush@gmail.com> wrote:

8. Copy the history file from the new master to the slave (it's the
most recent #.history file in the xlog directory)

It will work in the case of archive_command presence only and I will
need to sync the whole pg_xlog content if do not have archive_command
in recovery.conf, correct?

The new master will sync out the WAL logs from pg_xlog that the slaves
need. The wal sender/receiver system is what I rely on for this.

So you do not have archive_command in recovery.conf, do you?

--
Sergey Konoplev

Blog: http://gray-hemp.blogspot.com
LinkedIn: http://ru.linkedin.com/in/grayhemp
JID/GTalk: gray.ru@gmail.com Skype: gray-hemp

#6Ken Brush
kbrush@gmail.com
In reply to: Sergey Konoplev (#5)
Re: Multiple Slave Failover with PITR

On Wed, Apr 11, 2012 at 9:50 AM, Sergey Konoplev <gray.ru@gmail.com> wrote:

On Wed, Apr 11, 2012 at 8:12 PM, Ken Brush <kbrush@gmail.com> wrote:

8. Copy the history file from the new master to the slave (it's the
most recent #.history file in the xlog directory)

It will work in the case of archive_command presence only and I will
need to sync the whole pg_xlog content if do not have archive_command
in recovery.conf, correct?

The new master will sync out the WAL logs from pg_xlog that the slaves
need. The wal sender/receiver system is what I rely on for this.

So you do not have archive_command in recovery.conf, do you?

Correct, I do not.

-Ken

#7Sergey Konoplev
gray.ru@gmail.com
In reply to: Ken Brush (#6)
Re: Multiple Slave Failover with PITR

On Wed, Apr 11, 2012 at 9:08 PM, Ken Brush <kbrush@gmail.com> wrote:

On Wed, Apr 11, 2012 at 9:50 AM, Sergey Konoplev <gray.ru@gmail.com> wrote:

On Wed, Apr 11, 2012 at 8:12 PM, Ken Brush <kbrush@gmail.com> wrote:

8. Copy the history file from the new master to the slave (it's the
most recent #.history file in the xlog directory)

It will work in the case of archive_command presence only and I will
need to sync the whole pg_xlog content if do not have archive_command
in recovery.conf, correct?

The new master will sync out the WAL logs from pg_xlog that the slaves
need. The wal sender/receiver system is what I rely on for this.

So you do not have archive_command in recovery.conf, do you?

Correct, I do not.

Okay, thank you. Now things are much more clean.

-Ken

--
Sergey Konoplev

Blog: http://gray-hemp.blogspot.com
LinkedIn: http://ru.linkedin.com/in/grayhemp
JID/GTalk: gray.ru@gmail.com Skype: gray-hemp

#8Bruce Momjian
bruce@momjian.us
In reply to: Ken Brush (#1)
Re: [GENERAL] Multiple Slave Failover with PITR

Do we ever want to document a way to connect slaves to a new master,
rather than recreating the slave?

---------------------------------------------------------------------------

On Tue, Mar 27, 2012 at 10:47:48AM -0700, Ken Brush wrote:

Hello everyone,

I notice that the documentation at:
http://wiki.postgresql.org/wiki/Binary_Replication_Tutorial

Doesn't contain steps in a Multiple Slave setup for re-establishing
them after a slave has become the new master.

Based on the documentation, here are the most fail-proof steps I came up with:

1. Master dies :(
2. Touch the trigger file on the most caught up slave.
3. Slave is now the new master :)
4. use pg_basebackup or other binary replication trick (rsync, tar
over ssh, etc...) to bring the other slaves up to speed with the new
master.
5. start the other slaves pointing to the new master.

But, that can take time (about 1-2 hours) with my medium sized DB
(580GB currently).

After testing a few different ideas that I gleaned from posts on the
mail list, I came up with this alternative method:

1. Master dies :(
2. Touch the trigger file on the most caught up slave
3. Slave is now the new master.
4. On the other slaves do the following:
5. Shutdown postgres on the slave
6. Delete every file in /data/pgsql/data/pg_xlog
7. Modify the recovery.conf file to point to the new master and
include the line "recovery_target_timeline='latest'"
8. Copy the history file from the new master to the slave (it's the
most recent #.history file in the xlog directory)
9. Startup postgres on the slave and watch it sync up to the new
master (about 1-5 minutes usually)

My question is this. Is the alternative method adequate? I tested it a
bit and couldn't find any problems with data loss or inconsistency.

I still use the fail-proof method above to re-incorporate the old
master as a new slave.

Sincerely,
-Ken

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#9Daniel Farina
daniel@heroku.com
In reply to: Bruce Momjian (#8)
Re: [GENERAL] Multiple Slave Failover with PITR

On Sun, Sep 2, 2012 at 5:12 AM, Bruce Momjian <bruce@momjian.us> wrote:

Do we ever want to document a way to connect slaves to a new master,
rather than recreating the slave?

Please, please please do so. And hopefully it'll be less tricky
sooner than later.

--
fdr

#10Sergey Konoplev
gray.ru@gmail.com
In reply to: Bruce Momjian (#8)
1 attachment(s)
Re: [GENERAL] Multiple Slave Failover with PITR

On Sun, Sep 2, 2012 at 4:12 PM, Bruce Momjian <bruce@momjian.us> wrote:

Do we ever want to document a way to connect slaves to a new master,
rather than recreating the slave?

I have written an instruction for myself and I am planning to publish
it on http://code.google.com/p/pgcookbook/. See the attachment.

Hope you will find it useful. If anybody would like to provide any
criticism I will highly appreciate it.

---------------------------------------------------------------------------

On Tue, Mar 27, 2012 at 10:47:48AM -0700, Ken Brush wrote:

Hello everyone,

I notice that the documentation at:
http://wiki.postgresql.org/wiki/Binary_Replication_Tutorial

Doesn't contain steps in a Multiple Slave setup for re-establishing
them after a slave has become the new master.

Based on the documentation, here are the most fail-proof steps I came up with:

1. Master dies :(
2. Touch the trigger file on the most caught up slave.
3. Slave is now the new master :)
4. use pg_basebackup or other binary replication trick (rsync, tar
over ssh, etc...) to bring the other slaves up to speed with the new
master.
5. start the other slaves pointing to the new master.

But, that can take time (about 1-2 hours) with my medium sized DB
(580GB currently).

After testing a few different ideas that I gleaned from posts on the
mail list, I came up with this alternative method:

1. Master dies :(
2. Touch the trigger file on the most caught up slave
3. Slave is now the new master.
4. On the other slaves do the following:
5. Shutdown postgres on the slave
6. Delete every file in /data/pgsql/data/pg_xlog
7. Modify the recovery.conf file to point to the new master and
include the line "recovery_target_timeline='latest'"
8. Copy the history file from the new master to the slave (it's the
most recent #.history file in the xlog directory)
9. Startup postgres on the slave and watch it sync up to the new
master (about 1-5 minutes usually)

My question is this. Is the alternative method adequate? I tested it a
bit and couldn't find any problems with data loss or inconsistency.

I still use the fail-proof method above to re-incorporate the old
master as a new slave.

Sincerely,
-Ken

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

--
Sergey Konoplev

a database and software architect
http://www.linkedin.com/in/grayhemp

Jabber: gray.ru@gmail.com Skype: gray-hemp Phone: +79160686204

Attachments:

instruction.txttext/plain; charset=US-ASCII; name=instruction.txtDownload
#11Robert Haas
robertmhaas@gmail.com
In reply to: Daniel Farina (#9)
Re: [GENERAL] Multiple Slave Failover with PITR

On Mon, Sep 3, 2012 at 2:58 AM, Daniel Farina <daniel@heroku.com> wrote:

On Sun, Sep 2, 2012 at 5:12 AM, Bruce Momjian <bruce@momjian.us> wrote:

Do we ever want to document a way to connect slaves to a new master,
rather than recreating the slave?

Please, please please do so. And hopefully it'll be less tricky
sooner than later.

Agreed. But let's make sure that whatever we document is, in fact, correct.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company