Speed up Switchover

Started by TJover 12 years ago6 messagesgeneral
Jump to latest
#1TJ
tj@wallago.co.uk

Hi guys,
I am looking for a way of speeding up the process of switching over of
severs.
At the moment we are switching over via the trigger file, reconfiguring
our applications, patching or updating the old primary and rsyncing the
data over to the old primary.

I was wondering if there was an easier way to get the primary setup as a
secondary without having to rsync the data as it can take up to 10 hours.

The command i am using is:
rsync -azve "ssh -c blowfish" --inplace --delete /usr/local/pgsql/data/
pgsql@old-primary:/usr/local/pgsql/data/

We are using PostgreSQL 9.2,FreeBSD 9.1, Streaming replication and WAL
log archiving.
Thanks

TJ

#2Sergey Konoplev
gray.ru@gmail.com
In reply to: TJ (#1)
Re: Speed up Switchover

On Thu, Jul 25, 2013 at 1:03 AM, TJ <tj@wallago.co.uk> wrote:

I am looking for a way of speeding up the process of switching over of
severs.
At the moment we are switching over via the trigger file, reconfiguring our
applications, patching or updating the old primary and rsyncing the data
over to the old primary.

I was wondering if there was an easier way to get the primary setup as a
secondary without having to rsync the data as it can take up to 10 hours.

pg_rewind (https://github.com/vmware/pg_rewind) is what you need.

From its docs:

pg_rewind is a tool for synchronizing a PostgreSQL data directory with another
PostgreSQL data directory that was forked from the first one. The result is
equivalent to rsyncing the first data directory (referred to as the old cluster
from now on) with the second one (the new cluster). The advantage of pg_rewind
over rsync is that pg_rewind uses the WAL to determine changed data blocks,
and does not require reading through all files in the cluster. That makes it
a lot faster when the database is large and only a small portion of it differs
between the clusters.

--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

Profile: http://www.linkedin.com/in/grayhemp
Phone: USA +1 (415) 867-9984, Russia +7 (901) 903-0499, +7 (988) 888-1979
Skype: gray-hemp
Jabber: gray.ru@gmail.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3Samrat Revagade
revagade.samrat@gmail.com
In reply to: Sergey Konoplev (#2)
Re: Speed up Switchover

secondary without having to rsync the data as it can take up to 10 hours.

pg_rewind (https://github.com/vmware/pg_rewind) is what you need.

But I think it has a problem regarding the hint bits which Robert Hass
pointed out.
You can still solve hint bit problem by enabling new checksum feature, But
you have to face consequences such as performance overhead.

You can find the discussion about that on following link:
/messages/by-id/CA+TgmoY4j+p7JY69ry8GpOSMMdZNYqU6dtiONPrcxaVG+SPByg@mail.gmail.com

-----
Greetings,
Samrat Revagade,
NTT-DATA-OSS Center (Pune)
--
View this message in context: http://postgresql.1045698.n5.nabble.com/Speed-up-Switchover-tp5765160p5765235.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#4Michael Paquier
michael@paquier.xyz
In reply to: Samrat Revagade (#3)
Re: Speed up Switchover

On Fri, Jul 26, 2013 at 3:00 PM, Samrat Revagade
<revagade.samrat@gmail.com>wrote:

secondary without having to rsync the data as it can take up to 10 hours.

pg_rewind (https://github.com/vmware/pg_rewind) is what you need.

But I think it has a problem regarding the hint bits which Robert Hass
pointed out.
You can still solve hint bit problem by enabling new checksum feature, But
you have to face consequences such as performance overhead.

You can find the discussion about that on following link:

/messages/by-id/CA+TgmoY4j+p7JY69ry8GpOSMMdZNYqU6dtiONPrcxaVG+SPByg@mail.gmail.com

Exactly, using checksums is mandatory or you could finish with corrupted
data after rewinding a data directory. Any help in testing or proposing
patches for pg_rewind is welcome. It is still in early development.

Thanks,
--
Michael

#5Sergey Konoplev
gray.ru@gmail.com
In reply to: Samrat Revagade (#3)
Re: Speed up Switchover

On Thu, Jul 25, 2013 at 11:00 PM, Samrat Revagade
<revagade.samrat@gmail.com> wrote:

secondary without having to rsync the data as it can take up to 10 hours.

pg_rewind (https://github.com/vmware/pg_rewind) is what you need.

But I think it has a problem regarding the hint bits which Robert Hass
pointed out.
You can still solve hint bit problem by enabling new checksum feature, But
you have to face consequences such as performance overhead.

Or, if I understand it correct, you can just make sure that there is
no activity on the old master. Right?

You can find the discussion about that on following link:
/messages/by-id/CA+TgmoY4j+p7JY69ry8GpOSMMdZNYqU6dtiONPrcxaVG+SPByg@mail.gmail.com

--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

Profile: http://www.linkedin.com/in/grayhemp
Phone: USA +1 (415) 867-9984, Russia +7 (901) 903-0499, +7 (988) 888-1979
Skype: gray-hemp
Jabber: gray.ru@gmail.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#6Andres Freund
andres@anarazel.de
In reply to: Sergey Konoplev (#2)
Re: Speed up Switchover

On 2013-07-25 22:00:23 -0700, Sergey Konoplev wrote:

On Thu, Jul 25, 2013 at 1:03 AM, TJ <tj@wallago.co.uk> wrote:

I am looking for a way of speeding up the process of switching over of
severs.
At the moment we are switching over via the trigger file, reconfiguring our
applications, patching or updating the old primary and rsyncing the data
over to the old primary.

I was wondering if there was an easier way to get the primary setup as a
secondary without having to rsync the data as it can take up to 10 hours.

pg_rewind (https://github.com/vmware/pg_rewind) is what you need.

Beside the issue pointed out by Robert (via Samrat), it's also for 9.3
onwards only...

If it's a planned failover you can do better by shutting down the
servers manually...

a) shutdown master gracefully (i.e. -m fast, not immediate)
b) use pg_controldata to check for the last wal location
c) verify that the standby has received WAL up to that point, otherwise
restart from a)
d) shutdown standby
e) remove recovery.conf on the standby, thus "silently" promoting it
f) start previous standby, now as master
g) create recovery.conf on the previous master
h) start previous master, now a standby

Note that you're deliberately circumventing security measures that way,
you need be rather careful to understand the reasoning behind those
steps and follow them carefully. But it allows to gracefully failover &
follow in large clusters with only very short outages.
From 9.3 onwards you don't even need d) and e) anymore and you can
replace it with a regular promotion which will be noticeably faster.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general