PostgreSQL Failback without rebuild

Started by James Sewellabout 12 years ago9 messageshackers

james.sewell@lisasoft.com

about 12 years ago

Hello All,

I have been reading through some of the recent discussions about failback
when in a streaming replication setup. I define failback as:

1. Node A is master, Node B is slave
2. Node A crashes || Node A is stopped || nothing happens
3. Promote Node B to Master
4. Attach Node A as slave

My understanding is currently to achieve step three you need to take a base
backup of Node B and deploy it to Node A before starting streaming
replication (or use rsync etc...).

This is very undesirable for many users, especially if they have a very
large database.

From the discussions I can see that the problem is to do with Node A
writing changes to disk that Node B are not streamed before Node A crashes.

Has there been any consensus on this issue? Are there any solutions which
might make it into 9.4 or 9.5? I've seen some proposals and a tool
(pg_rewind), but all seem to have draw backs.

I've been looking mainly at these threads:

/messages/by-id/CAF8Q-Gy7xa60HwXc0MKajjkWFEbFDWTG=gGyu1KmT+s2xcQ-bw@mail.gmail.com

/messages/by-id/CAF8Q-Gxg3PQTf71NVECe-6OzRaew5pWhk7yQtbJgWrFu513s+Q@mail.gmail.com

/messages/by-id/519DF910.4020609@vmware.com

Cheers,

James Sewell,
PostgreSQL Team Lead / Solutions Architect
______________________________________

Level 2, 50 Queen St, Melbourne VIC 3000

*P *(+61) 3 8370 8000 *W* www.lisasoft.com *F *(+61) 3 8370 8099

------------------------------
The contents of this email are confidential and may be subject to legal or
professional privilege and copyright. No representation is made that this
email is free of viruses or other defects. If you have received this
communication in error, you may not copy or distribute any part of it or
otherwise disclose its contents to anyone. Please advise the sender of your
incorrect receipt of this correspondence.

Amit Kapila

amit.kapila16@gmail.com

about 12 years ago

In reply to: James Sewell (#1)

Re: PostgreSQL Failback without rebuild

On Wed, Feb 5, 2014 at 10:30 AM, James Sewell <james.sewell@lisasoft.com>
wrote:

Hello All,

I have been reading through some of the recent discussions about failback

when in a streaming replication setup. I define failback as:

Node A is master, Node B is slave
Node A crashes || Node A is stopped || nothing happens
Promote Node B to Master
Attach Node A as slave

My understanding is currently to achieve step three you need to take a

base backup of Node B and deploy it to Node A before starting streaming
replication (or use rsync etc...).

I think in above sentence you mean to say "to achieve step *four* .."

This is very undesirable for many users, especially if they have a very

large database.

From the discussions I can see that the problem is to do with Node A

writing changes to disk that Node B are not streamed before Node A crashes.

Yes, this is right.

Has there been any consensus on this issue? Are there any solutions which

might make it into 9.4 or 9.5?

As far as I know, there is still no solution provided in 9.4, can't say
anything
for 9.5 with any certainity. However in 9.4, there is a new parameter
wal_log_hints which can be useful to overcome drawback of pg_rewind.

I've seen some proposals and a tool (pg_rewind), but all seem to have

draw backs.

As far as I remember, one of the main drawbacks for pg_rewind was related to
hint bits which can be avoided by wal_log_hints. pg_rewind is not part of
core
PostgreSQL code, however if you wish, you can try that tool to see if can it
solve your purpose.

Note - James, in previous reply, I missed to cc to hackers, so sending it
again.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Michael Paquier

michael@paquier.xyz

about 12 years ago

In reply to: Amit Kapila (#2)

Re: PostgreSQL Failback without rebuild

On Wed, Feb 5, 2014 at 3:14 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Feb 5, 2014 at 10:30 AM, James Sewell <james.sewell@lisasoft.com>

I've seen some proposals and a tool (pg_rewind), but all seem to have draw
backs.

As far as I remember, one of the main drawbacks for pg_rewind was related to
hint bits which can be avoided by wal_log_hints. pg_rewind is not part of
core
PostgreSQL code, however if you wish, you can try that tool to see if can it
solve your purpose.

For 9.3, pg_rewind is only safe with page checksums enabled. For 9.4,
yes wal_log_hints or checksums is mandatory. The code contains as well
some safety checks as well to ensure that a node not using those
parameters cannot be rewinded.
Regards,
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

James Sewell

james.sewell@lisasoft.com

about 12 years ago

In reply to: Michael Paquier (#3)

Re: PostgreSQL Failback without rebuild

I've just noticed that on PostgreSQL 9.3 I can do the following with a
master node A and a slave node B (as long as I have set
recovery_target_timeline = 'latest'):

1. Stop Node A
2. Promote Node B
3. Attach Node A as slave

This is sufficient for my needs (I know it doesn't cover a crash), can
anyone see any potential problems with this approach?

Cheers,

James Sewell,
PostgreSQL Team Lead / Solutions Architect
______________________________________

Level 2, 50 Queen St, Melbourne VIC 3000

*P *(+61) 3 8370 8000 *W* www.lisasoft.com *F *(+61) 3 8370 8099

On Wed, Feb 5, 2014 at 6:03 PM, Michael Paquier
<michael.paquier@gmail.com>wrote:

On Wed, Feb 5, 2014 at 3:14 PM, Amit Kapila <amit.kapila16@gmail.com>
wrote:

On Wed, Feb 5, 2014 at 10:30 AM, James Sewell <james.sewell@lisasoft.com

I've seen some proposals and a tool (pg_rewind), but all seem to have

draw

backs.

As far as I remember, one of the main drawbacks for pg_rewind was

related to

hint bits which can be avoided by wal_log_hints. pg_rewind is not part of
core
PostgreSQL code, however if you wish, you can try that tool to see if

can it

solve your purpose.

For 9.3, pg_rewind is only safe with page checksums enabled. For 9.4,
yes wal_log_hints or checksums is mandatory. The code contains as well
some safety checks as well to ensure that a node not using those
parameters cannot be rewinded.
Regards,
--
Michael

Michael Paquier

michael@paquier.xyz

about 12 years ago

In reply to: James Sewell (#4)

Re: PostgreSQL Failback without rebuild

On Fri, Feb 7, 2014 at 1:57 PM, James Sewell <james.sewell@lisasoft.com>wrote:

I've just noticed that on PostgreSQL 9.3 I can do the following with a
master node A and a slave node B (as long as I have set
recovery_target_timeline = 'latest'):

1. Stop Node A
2. Promote Node B
3. Attach Node A as slave

This is sufficient for my needs (I know it doesn't cover a crash), can
anyone see any potential problems with this approach?

Yes, node A could get ahead of the point where WAL forked when promoting B.
In this case you cannot reconnect A to B, and need to actually recreate a
node from a fresh base backup, or rewind it. pg_rewind targets the latter,
postgres core is able to to the former, and depending on things like your
environment and/or the size of your server, you might prefer one or the
other.
Regards,
--
Michael

James Sewell

james.sewell@lisasoft.com

about 12 years ago

In reply to: Michael Paquier (#5)

Re: PostgreSQL Failback without rebuild

Node A could get ahead even if it has been shut down cleanly BEFORE the
promotion?

I'd always assumed if I shut down the master the slave would be at the same
point after shutdown - is this incorrect?

Cheers,

James Sewell,
PostgreSQL Team Lead / Solutions Architect
______________________________________

Level 2, 50 Queen St, Melbourne VIC 3000

*P *(+61) 3 8370 8000 *W* www.lisasoft.com *F *(+61) 3 8370 8099

On Fri, Feb 7, 2014 at 4:58 PM, Michael Paquier
<michael.paquier@gmail.com>wrote:

On Fri, Feb 7, 2014 at 1:57 PM, James Sewell <james.sewell@lisasoft.com>wrote:

I've just noticed that on PostgreSQL 9.3 I can do the following with a
master node A and a slave node B (as long as I have set
recovery_target_timeline = 'latest'):

1. Stop Node A
2. Promote Node B
3. Attach Node A as slave

This is sufficient for my needs (I know it doesn't cover a crash), can
anyone see any potential problems with this approach?

Yes, node A could get ahead of the point where WAL forked when promoting
B. In this case you cannot reconnect A to B, and need to actually recreate
a node from a fresh base backup, or rewind it. pg_rewind targets the
latter, postgres core is able to to the former, and depending on things
like your environment and/or the size of your server, you might prefer one
or the other.
Regards,
--
Michael

Michael Paquier

michael@paquier.xyz

about 12 years ago

In reply to: James Sewell (#6)

Re: PostgreSQL Failback without rebuild

On Fri, Feb 7, 2014 at 3:02 PM, James Sewell <james.sewell@lisasoft.com>wrote:

Node A could get ahead even if it has been shut down cleanly BEFORE the
promotion?
I'd always assumed if I shut down the master the slave would be at the
same point after shutdown - is this incorrect?

Yes and no. A node will wait at shutdown until all the *connected* slaves
have flushed necessary WALs to disk. But if the slave is not connected at
the moment of shutdown, it might not be the case.
--
Michael

Laurenz Albe

laurenz.albe@cybertec.at

about 12 years ago

In reply to: Michael Paquier (#7)

Re: PostgreSQL Failback without rebuild

Michael Paquier wrote:

On Fri, Feb 7, 2014 at 3:02 PM, James Sewell <james.sewell@lisasoft.com> wrote:

Node A could get ahead even if it has been shut down cleanly BEFORE the promotion?
I'd always assumed if I shut down the master the slave would be at the same point after shutdown
- is this incorrect?

Yes and no. A node will wait at shutdown until all the *connected* slaves have flushed necessary WALs
to disk. But if the slave is not connected at the moment of shutdown, it might not be the case.

Even if the slave is connected, there was a bug:
/messages/by-id/E1UrWwA-0004LR-D2@gemulon.postgresql.org

So you need at least 9.3.0, 9.2.5 or 9.1.10 for this to work properly.

Yours,
Laurenz Albe

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Bruce Momjian

bruce@momjian.us

about 12 years ago

In reply to: James Sewell (#4)

Re: PostgreSQL Failback without rebuild

On Fri, Feb 7, 2014 at 03:57:31PM +1100, James Sewell wrote:

I've just noticed that on PostgreSQL 9.3 I can do the following with a master
node A and a slave node B (as long as I have set recovery_target_timeline =
'latest'):

1. Stop Node A
2. Promote Node B
3. Attach Node A as slave

This is sufficient for my needs (I know it doesn't cover a crash), can anyone
see any potential problems with this approach?

I know some people have used rsync to get A to match B after a crash of
A and promotion of B.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers