how to switch old replication Master to new Standby after promoting old Standby

Started by John Lumbyabout 10 years ago10 messagesgeneral
Jump to latest
#1John Lumby
johnlumby@hotmail.com

From: johnlumby@hotmail.com
To: pgsql-general-owner@postgresql.org
Subject: how to switch old replication Master to new Standby after promoting old Standby
Date: Mon, 14 Mar 2016 13:23:29 -0400

In a scenario involving replication where no failure occurs but I want to interchange Master <->Standby,
*and* want to avoid making another full base backup and rsyncing it across,
I have found it easy to promote old Standby to new Master with pg_ctl promote,
but very difficult to restart the old Primary as a new Standby *without* performing new base backup.

Assume current 9.5.1 and using streaming replication with a named replication slot if relevant

Second assumption - I am able to temporarily prevent any relational updates to the database before I start the switchover
e.g. the
default_transaction_read_only = on
setting together with being able to control what transactions do if anything

So I reach a point where both systems have postgresql running without any replication,
both have identical content in all databases (that I can control),
and I am willing to tolerate short restarts if need be and also to scp/rsync the contents of pg_xlog
and other small files but *not* the entire cluster directory or any database base directories.

What do I do next?

Here is what I have found seems to work but I am not sure it is robust:

1. shut down both new Master and intended-to-be-new-Standby
2. on intended-to-be-new-Standby, remove the entire content of pg_xlog and the global/pg_control
3. from new Master , tar + scp the entire content of pg_xlog and the global/pg_control to intended-to-be-new-Standby
4. create intended-to-be-new-Standby's recovery.conf,
specifying recovery_target_timeline = 'latest'
(but I think it works with this setting omitted)
5. start new Master
6. start new Standby and up it comes
in its postgres log(**) I see
LOG: 00000: database system was shut down at 2016-03-14 16:41:24 GMT
LOCATION: StartupXLOG, xlog.c:5936
LOG: 00000: entering standby mode
LOCATION: StartupXLOG, xlog.c:6016
LOG: 00000: consistent recovery state reached at C/480000D0
LOCATION: CheckRecoveryConsistency, xlog.c:7493
LOG: 00000: invalid record length at C/480000D0
LOCATION: ReadRecord, xlog.c:3960
LOG: 00000: started streaming WAL from primary at C/48000000 on timeline 3
LOCATION: WalReceiverMain, walreceiver.c:358
LOG: 00000: redo starts at C/480000D0
LOCATION: StartupXLOG, xlog.c:6700

Is the invalid record length msg anything to worry about?

But this method is purely empirical. Is it robust? Anyone have any better recommendations?

Cheers, John Lumby

(**Note this log was from a pre-release 9.5, 9.5alpha2
I don't have 9.5.1 to hand at present

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#2Shulgin, Oleksandr
oleksandr.shulgin@zalando.de
In reply to: John Lumby (#1)
Re: how to switch old replication Master to new Standby after promoting old Standby

On Mon, Mar 14, 2016 at 6:28 PM, John Lumby <johnlumby@hotmail.com> wrote:

From: johnlumby@hotmail.com
To: pgsql-general-owner@postgresql.org
Subject: how to switch old replication Master to new Standby after
promoting old Standby
Date: Mon, 14 Mar 2016 13:23:29 -0400

In a scenario involving replication where no failure occurs but I want to
interchange Master <->Standby,
*and* want to avoid making another full base backup and rsyncing it across,
I have found it easy to promote old Standby to new Master with pg_ctl
promote,
but very difficult to restart the old Primary as a new Standby *without*
performing new base backup.

Assume current 9.5.1 and using streaming replication with a named
replication slot if relevant

Second assumption - I am able to temporarily prevent any relational
updates to the database before I start the switchover
e.g. the
default_transaction_read_only = on
setting together with being able to control what transactions do if
anything

So I reach a point where both systems have postgresql running without any
replication,
both have identical content in all databases (that I can control),
and I am willing to tolerate short restarts if need be and also to
scp/rsync the contents of pg_xlog
and other small files but *not* the entire cluster directory or any
database base directories.

What do I do next?

Here is what I have found seems to work but I am not sure it is robust:

1. shut down both new Master and intended-to-be-new-Standby
2. on intended-to-be-new-Standby, remove the entire content of pg_xlog and
the global/pg_control
3. from new Master , tar + scp the entire content of pg_xlog and the
global/pg_control to intended-to-be-new-Standby

That does seem like a very risky strategy to me. Have you taken a look at
pg_rewind (which is now part of the distribution)?

--
Alex

#3John Lumby
johnlumby@hotmail.com
In reply to: Shulgin, Oleksandr (#2)
Re: how to switch old replication Master to new Standby after promoting old Standby

Many thanks Alex,
I had not noticed pg_rewind before.   The name pg_rewind is not exactly very descriptive
of what this utility does.    But it is exactly what I need,  thank-you.

I tried it an eventually got it to work but did hit one strange problem -
I ran it like this
pg_rewind -D /mnt/bluebild/pgrepl95 --source-server=host=10.19.0.21 port=5432 user=postgres password=postgres -P --debug

after which my postgresql server would not start and I discovered that the rewind had actually copied the source's postgresql.conf,
replacing the one on the target (i.e. the old primary).  

And indeed in its debug I found

received chunk for file "postgresql.conf", offset 0, size 16482
received chunk for file "postgresql.conf.20160314114055", offset 0, size 16464

And I now see in its description in the Doc that it intends to do this.
But why would it do that?

Maybe a note about it should be added to the wiki
https://wiki.postgresql.org/wiki/Streaming_Replication
    (not sure if I can)

Cheers,  and thanks,    John
________________________________

From: oleksandr.shulgin@zalando.de
Date: Mon, 14 Mar 2016 18:54:11 +0100
Subject: Re: [GENERAL] how to switch old replication Master to new
Standby after promoting old Standby
To: johnlumby@hotmail.com
CC: pgsql-general@postgresql.org

On Mon, Mar 14, 2016 at 6:28 PM, John Lumby
<johnlumby@hotmail.com<mailto:johnlumby@hotmail.com>> wrote:

From: johnlumby@hotmail.com<mailto:johnlumby@hotmail.com>
To:
pgsql-general-owner@postgresql.org<mailto:pgsql-general-owner@postgresql.org>
Subject: how to switch old replication Master to new Standby after
promoting old Standby
Date: Mon, 14 Mar 2016 13:23:29 -0400

In a scenario involving replication where no failure occurs but I want
to interchange Master <->Standby,
*and* want to avoid making another full base backup and rsyncing it across,
I have found it easy to promote old Standby to new Master with pg_ctl
promote,
but very difficult to restart the old Primary as a new Standby
*without* performing new base backup.

Assume current 9.5.1 and using streaming replication with a named
replication slot if relevant

Second assumption - I am able to temporarily prevent any relational
updates to the database before I start the switchover
e.g. the
default_transaction_read_only = on
setting together with being able to control what transactions do if anything

So I reach a point where both systems have postgresql running without
any replication,
both have identical content in all databases (that I can control),
and I am willing to tolerate short restarts if need be and also to
scp/rsync the contents of pg_xlog
and other small files but *not* the entire cluster directory or any
database base directories.

What do I do next?

Here is what I have found seems to work but I am not sure it is robust:

1. shut down both new Master and intended-to-be-new-Standby
2. on intended-to-be-new-Standby, remove the entire content of pg_xlog
and the global/pg_control
3. from new Master , tar + scp the entire content of pg_xlog and the
global/pg_control to intended-to-be-new-Standby

That does seem like a very risky strategy to me. Have you taken a look
at pg_rewind (which is now part of the distribution)?

--
Alex

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#4Michael Paquier
michael@paquier.xyz
In reply to: Shulgin, Oleksandr (#2)
Re: how to switch old replication Master to new Standby after promoting old Standby

On Mon, Mar 14, 2016 at 6:54 PM, Shulgin, Oleksandr
<oleksandr.shulgin@zalando.de> wrote:

On Mon, Mar 14, 2016 at 6:28 PM, John Lumby <johnlumby@hotmail.com> wrote:

1. shut down both new Master and intended-to-be-new-Standby
2. on intended-to-be-new-Standby, remove the entire content of pg_xlog and
the global/pg_control
3. from new Master , tar + scp the entire content of pg_xlog and the
global/pg_control to intended-to-be-new-Standby

This is not robust and will corrupt your standby. Just take the case
of a relation data block modified on the to-be-new standby, and not
replayed since the last checkpoint before WAL forked: data will be
corrupted. Inconsistent pg_clog will likely break things.

That does seem like a very risky strategy to me. Have you taken a look at
pg_rewind (which is now part of the distribution)?

pg_rewind has been designed for that, and ensures that the
soon-to-be-standby has a minimum recovery target sufficient. You had
better use it.
--
Michael

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#5Michael Paquier
michael@paquier.xyz
In reply to: John Lumby (#3)
Re: Re: how to switch old replication Master to new Standby after promoting old Standby

On Mon, Mar 14, 2016 at 11:08 PM, John Lumby <johnlumby@hotmail.com> wrote:

And indeed in its debug I found
received chunk for file "postgresql.conf", offset 0, size 16482
received chunk for file "postgresql.conf.20160314114055", offset 0, size 16464

And I now see in its description in the Doc that it intends to do this.
But why would it do that?

To make its code more simple. This way there is no need to apply any
kind of file-based filters to decide if some files should be copied or
not, and it is not that much a big deal to copy the configuration
files of the target node before performing the rewind.

Maybe a note about it should be added to the wiki
https://wiki.postgresql.org/wiki/Streaming_Replication
(not sure if I can)

With a community account you could edit this page.
--
Michael

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#6John Lumby
johnlumby@hotmail.com
In reply to: Michael Paquier (#5)
Re: how to switch old replication Master to new Standby after promoting old Standby

Thank you both for the advice.
pg_rewind is a nice utility and not only more robust than what I came up with
but also easier to use and avoids need to shut down new Primary.

Re editing the wiki,  I do have a community account but it seems I need more than that :

           ==>  Editing this wiki now requires "editor" privileges.

If anyone who has such privileges would like to edit the page,
here is what I would add to it  --  feel free to edit/rewrite

after the bullet

 . How to restart streaming replication after failover

and before the sub-bullet

    Repeat the operations from 6th;

add this:

    Starting with the old Standby now running as Unreplicated and the old Primary shut down but servicable,
    with its databases intact,  the task is to put the old Primary into Standby mode
    as rapidly and unintrusively as possible.
    This implies not requiring to shut new Primary down and not requiring to make another full base backup.
    A utility named pg_rewind makes this much simpler and more robust,   and it is included in standard
    postgresql distribution since 9.5.   -   it is documented under PostgreSQL Server Applications.

    To use pg_rewind :
     First and most important,   it is essential to have *previously* set the configuration parameter
          wal_log_hints = on
       in both the old Primary and old Standby,  *before* the failover.
       An alternative is described in the documentation but setting this parameter is simpler.
       If you did not set this or the alternative,   then ,  set it for future,
       and don't use pg_rewind this time.  See next.
     Secondly ,  note that pg_rewind will potentially update *every* file in the old Primary cluster,
       including configuration files.  It is likely that configuration files may not match exactly on the two systems,
       so make a copy of postgresql.conf and postgresql.auto.conf for later restore.
     Thirdly,  double-check that old Primary is shut down.
     Now run pg_rewind on old Primary using the form

          pg_rewind -D ${pg_cluster_dir} \
                    --source-server="host=${source_server_ip} port=${source_server_port} user=${replication_user} password=${replication_password}" \
                    -P

     You can add --debug if you want a blow-by-blow account of every change it makes.

     Now restore your configuration files,  first perhaps comparing what differences there were.
     Finally,  create the recovery.conf for the new Standby

     You can now start the new Standby.

    There are some limitations with pg_rewind described in documentation.
    If you could not use it or it failed,  then treat your old Primary as an empty cluster
    and commission it from the start as described next

Cheers,   John Lumby
----------------------------------------

Date: Mon, 14 Mar 2016 23:46:28 +0100
Subject: Re: [GENERAL] Re: how to switch old replication Master to new Standby after promoting old Standby
From: michael.paquier@gmail.com
To: johnlumby@hotmail.com
CC: oleksandr.shulgin@zalando.de; pgsql-general@postgresql.org

On Mon, Mar 14, 2016 at 11:08 PM, John Lumby <johnlumby@hotmail.com> wrote:

And indeed in its debug I found
received chunk for file "postgresql.conf", offset 0, size 16482
received chunk for file "postgresql.conf.20160314114055", offset 0, size 16464

And I now see in its description in the Doc that it intends to do this.
But why would it do that?

To make its code more simple. This way there is no need to apply any
kind of file-based filters to decide if some files should be copied or
not, and it is not that much a big deal to copy the configuration
files of the target node before performing the rewind.

Maybe a note about it should be added to the wiki
https://wiki.postgresql.org/wiki/Streaming_Replication
(not sure if I can)

With a community account you could edit this page.
--
Michael

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#7Adrian Klaver
adrian.klaver@aklaver.com
In reply to: John Lumby (#6)
Re: Re: how to switch old replication Master to new Standby after promoting old Standby

On 03/15/2016 06:54 AM, John Lumby wrote:

Thank you both for the advice.
pg_rewind is a nice utility and not only more robust than what I came up with
but also easier to use and avoids need to shut down new Primary.

Re editing the wiki, I do have a community account but it seems I need more than that :

==> Editing this wiki now requires "editor" privileges.

https://wiki.postgresql.org/wiki/WikiEditing
"Editing this wiki now requires "editor" privileges.

If you just created a new community account or if your current account
used to have "editor" privileges, you can ask on either the PostgreSQL
-www Mailinglist or the PostgreSQL IRC Channel for "editor" privileges.
Please include your community account name in those requests.

This is due, in large part, to recent spam activity. "

If anyone who has such privileges would like to edit the page,
here is what I would add to it -- feel free to edit/rewrite

after the bullet

. How to restart streaming replication after failover

and before the sub-bullet

Repeat the operations from 6th;

add this:

Starting with the old Standby now running as Unreplicated and the old Primary shut down but servicable,
with its databases intact, the task is to put the old Primary into Standby mode
as rapidly and unintrusively as possible.
This implies not requiring to shut new Primary down and not requiring to make another full base backup.
A utility named pg_rewind makes this much simpler and more robust, and it is included in standard
postgresql distribution since 9.5. - it is documented under PostgreSQL Server Applications.

To use pg_rewind :
First and most important, it is essential to have *previously* set the configuration parameter
wal_log_hints = on
in both the old Primary and old Standby, *before* the failover.
An alternative is described in the documentation but setting this parameter is simpler.
If you did not set this or the alternative, then , set it for future,
and don't use pg_rewind this time. See next.
Secondly , note that pg_rewind will potentially update *every* file in the old Primary cluster,
including configuration files. It is likely that configuration files may not match exactly on the two systems,
so make a copy of postgresql.conf and postgresql.auto.conf for later restore.
Thirdly, double-check that old Primary is shut down.
Now run pg_rewind on old Primary using the form

pg_rewind -D ${pg_cluster_dir} \
--source-server="host=${source_server_ip} port=${source_server_port} user=${replication_user} password=${replication_password}" \
-P

You can add --debug if you want a blow-by-blow account of every change it makes.

Now restore your configuration files, first perhaps comparing what differences there were.
Finally, create the recovery.conf for the new Standby

You can now start the new Standby.

There are some limitations with pg_rewind described in documentation.
If you could not use it or it failed, then treat your old Primary as an empty cluster
and commission it from the start as described next

Cheers, John Lumby
----------------------------------------

Date: Mon, 14 Mar 2016 23:46:28 +0100
Subject: Re: [GENERAL] Re: how to switch old replication Master to new Standby after promoting old Standby
From: michael.paquier@gmail.com
To: johnlumby@hotmail.com
CC: oleksandr.shulgin@zalando.de; pgsql-general@postgresql.org

On Mon, Mar 14, 2016 at 11:08 PM, John Lumby <johnlumby@hotmail.com> wrote:

And indeed in its debug I found
received chunk for file "postgresql.conf", offset 0, size 16482
received chunk for file "postgresql.conf.20160314114055", offset 0, size 16464

And I now see in its description in the Doc that it intends to do this.
But why would it do that?

To make its code more simple. This way there is no need to apply any
kind of file-based filters to decide if some files should be copied or
not, and it is not that much a big deal to copy the configuration
files of the target node before performing the rewind.

Maybe a note about it should be added to the wiki
https://wiki.postgresql.org/wiki/Streaming_Replication
(not sure if I can)

With a community account you could edit this page.
--
Michael

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#8John Lumby
johnlumby@hotmail.com
In reply to: Adrian Klaver (#7)
Re: how to switch old replication Master to new Standby after promoting old Standby - pg_rewind log file missing

I have already hit a different problem with pg_rewind and would like to check my understanding.

The problem is,  on 9.5.1 ,  pg_rewind fails to find a log file that it thinks it needs :

pg_rewind -D ${pg_cluster_dir} --source-server='host=10.19.0.1 port=5432 xxx' -P --debug

_____________________________________________________
connected to server
fetched file "global/pg_control", length 8192
fetched file "pg_xlog/00000003.history", length 85
servers diverged at WAL position 0/2B000060 on timeline 2
could not open file "/mnt/septcomn/sysbuild/pgrepl95/pg_xlog/00000002000000000000002A": No such file or directory

could not find previous WAL record at 0/2A0034C8
Failure, exiting
______________________________________________________

Well,  my postgresql servers have archive off (I don't need archiving,  only replication)
and that particular file,    00000002000000000000002A  ,  is no longer present on either new Primary or this old Primary.
so it has gone for good.

But my question is,  given that the divergence point was 2B000060,
why is it looking for a file earlier than that?

Ah well ...

By the way,   this scenario was on a system which started off life as Standby (running ok),
was promoted to new Primary (ok) while the original Primary was recycled as new Standby (ok),
so one complete failover completed ok,  and just now I tried to flip them back to original roles -
current Standby promoted  (ok) but now hit failure trying to recycle current Primary back to Standby.
I don't know if this is significant    -  i.e. my second question is,
is it more likely that a needed(?) log file goes missing when performing the second failover in a sequence of flips?

pg_controldata output if relevant :

from current just-promoted Primary

pg_control version number:            942
Catalog version number:               201510051
Database system identifier:           6261999204763226337
Database cluster state:               in production
pg_control last modified:             Tue 15 Mar 2016 11:43:26 AM EDT
Latest checkpoint location:           0/33000098
Prior checkpoint location:            0/32000098
Latest checkpoint's REDO location:    0/33000060
Latest checkpoint's REDO WAL file:    000000030000000000000033
Latest checkpoint's TimeLineID:       3
Latest checkpoint's PrevTimeLineID:   3
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID:          0/1624
Latest checkpoint's NextOID:          17820
Latest checkpoint's NextMultiXactId:  1
Latest checkpoint's NextMultiOffset:  0
Latest checkpoint's oldestXID:        616
Latest checkpoint's oldestXID's DB:   1
Latest checkpoint's oldestActiveXID:  1624
Latest checkpoint's oldestMultiXid:   1
Latest checkpoint's oldestMulti's DB: 1
Latest checkpoint's oldestCommitTsXid:0
Latest checkpoint's newestCommitTsXid:0
Time of latest checkpoint:            Tue 15 Mar 2016 11:43:26 AM EDT
Fake LSN counter for unlogged rels:   0/1
Minimum recovery ending location:     0/0
Min recovery ending loc's timeline:   0
Backup start location:                0/0
Backup end location:                  0/0
End-of-backup record required:        no
wal_level setting:                    hot_standby
wal_log_hints setting:                on
max_connections setting:              100
max_worker_processes setting:         8
max_prepared_xacts setting:           0
max_locks_per_xact setting:           64
track_commit_timestamp setting:       off
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       8192
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Maximum size of a TOAST chunk:        1996
Size of a large-object chunk:         2048
Date/time type storage:               64-bit integers
Float4 argument passing:              by value
Float8 argument passing:              by value
Data page checksum version:           0

from previous Primary I am trying and failing to rewind :

pg_control version number:            942
Catalog version number:               201510051
Database system identifier:           6261999204763226337
Database cluster state:               shut down
pg_control last modified:             Tue 15 Mar 2016 11:34:36 AM EDT
Latest checkpoint location:           0/32000108
Prior checkpoint location:            0/32000098
Latest checkpoint's REDO location:    0/32000108
Latest checkpoint's REDO WAL file:    000000020000000000000032
Latest checkpoint's TimeLineID:       2
Latest checkpoint's PrevTimeLineID:   2
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID:          0/1624
Latest checkpoint's NextOID:          17820
Latest checkpoint's NextMultiXactId:  1
Latest checkpoint's NextMultiOffset:  0
Latest checkpoint's oldestXID:        616
Latest checkpoint's oldestXID's DB:   1
Latest checkpoint's oldestActiveXID:  0
Latest checkpoint's oldestMultiXid:   1
Latest checkpoint's oldestMulti's DB: 1
Latest checkpoint's oldestCommitTsXid:0
Latest checkpoint's newestCommitTsXid:0
Time of latest checkpoint:            Tue 15 Mar 2016 11:34:36 AM EDT
Fake LSN counter for unlogged rels:   0/1
Minimum recovery ending location:     0/0
Min recovery ending loc's timeline:   0
Backup start location:                0/0
Backup end location:                  0/0
End-of-backup record required:        no
wal_level setting:                    hot_standby
wal_log_hints setting:                on
max_connections setting:              100
max_worker_processes setting:         8
max_prepared_xacts setting:           0
max_locks_per_xact setting:           64
track_commit_timestamp setting:       off
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       8192
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Maximum size of a TOAST chunk:        1996
Size of a large-object chunk:         2048
Date/time type storage:               64-bit integers
Float4 argument passing:              by value
Float8 argument passing:              by value
Data page checksum version:           0

Cheers,  John

----------------------------------------

Subject: Re: [GENERAL] Re: how to switch old replication Master to new Standby after promoting old Standby
To: johnlumby@hotmail.com; michael.paquier@gmail.com
CC: oleksandr.shulgin@zalando.de; pgsql-general@postgresql.org
From: adrian.klaver@aklaver.com
Date: Tue, 15 Mar 2016 07:05:43 -0700

On 03/15/2016 06:54 AM, John Lumby wrote:

If anyone who has such privileges would like to edit the page,
here is what I would add to it -- feel free to edit/rewrite

after the bullet

. How to restart streaming replication after failover

and before the sub-bullet

Repeat the operations from 6th;

add this:

Starting with the old Standby now running as Unreplicated and the old Primary shut down but servicable,
with its databases intact, the task is to put the old Primary into Standby mode
as rapidly and unintrusively as possible.
This implies not requiring to shut new Primary down and not requiring to make another full base backup.
A utility named pg_rewind makes this much simpler and more robust, and it is included in standard
postgresql distribution since 9.5. - it is documented under PostgreSQL Server Applications.

To use pg_rewind :
First and most important, it is essential to have *previously* set the configuration parameter
wal_log_hints = on
in both the old Primary and old Standby, *before* the failover.
An alternative is described in the documentation but setting this parameter is simpler.
If you did not set this or the alternative, then , set it for future,
and don't use pg_rewind this time. See next.
Secondly , note that pg_rewind will potentially update *every* file in the old Primary cluster,
including configuration files. It is likely that configuration files may not match exactly on the two systems,
so make a copy of postgresql.conf and postgresql.auto.conf for later restore.
Thirdly, double-check that old Primary is shut down.
Now run pg_rewind on old Primary using the form

pg_rewind -D ${pg_cluster_dir} \
--source-server="host=${source_server_ip} port=${source_server_port} user=${replication_user} password=${replication_password}" \
-P

You can add --debug if you want a blow-by-blow account of every change it makes.

Now restore your configuration files, first perhaps comparing what differences there were.
Finally, create the recovery.conf for the new Standby

You can now start the new Standby.

There are some limitations with pg_rewind described in documentation.
If you could not use it or it failed, then treat your old Primary as an empty cluster
and commission it from the start as described next

Cheers, John Lumby

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#9Michael Paquier
michael@paquier.xyz
In reply to: John Lumby (#8)
Re: how to switch old replication Master to new Standby after promoting old Standby - pg_rewind log file missing

On Tue, Mar 15, 2016 at 5:14 PM, John Lumby <johnlumby@hotmail.com> wrote:

But my question is, given that the divergence point was 2B000060,
why is it looking for a file earlier than that?

(please do not top-post, this is annoying as it breaks the logic of the thread)
pg_rewind begins scanning WAL records not at the point where WAL
forked, but from the previous checkpoint record before WAL forked.
--
Michael

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#10John Lumby
johnlumby@hotmail.com
In reply to: Michael Paquier (#9)
Re: how to switch old replication Master to new Standby after promoting old Standby - pg_rewind log file missing

Date: Tue, 15 Mar 2016 17:21:50 +0100
Subject: Re: how to switch old replication Master to new Standby after promoting old Standby - pg_rewind log file missing
From: michael.paquier@gmail.com
To: johnlumby@hotmail.com
CC: pgsql-general@postgresql.org; oleksandr.shulgin@zalando.de; adrian.klaver@aklaver.com

On Tue, Mar 15, 2016 at 5:14 PM, John Lumby <johnlumby@hotmail.com> wrote:

But my question is, given that the divergence point was 2B000060,
why is it looking for a file earlier than that?

(please do not top-post, this is annoying as it breaks the logic of the thread)

sorry, ok

pg_rewind begins scanning WAL records not at the point where WAL
forked, but from the previous checkpoint record before WAL forked.

In that case,  I wonder if it would be useful to extend the replication_slots concept
to be able to specify to retain all WALs back to
 "previous checkpoint before most recent timeline switch"
in order to guarantee that pg_rewind shall always be able to find the ones it needs
in source without requiring archiving and archive-retrieval?

--
Michael

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general