pg_basebackup on slave running for a long time

Started by Subhankar Chattopadhyayover 9 years ago13 messagesgeneral
Jump to latest
#1Subhankar Chattopadhyay
subho.atg@gmail.com

Hi Team,

We have setup PostgreSQL master-slave topology with Streaming
replication setup.
One of the steps for setting up streaming replication is to do
pg_basebackup on slave from master.

For subsequent update of this database, this step is repeated every
time, deleting the existing data copy of slave and running
pg_basebackup again.

For a huge data size of over 500GB, it takes a lot of time to copy
the data from master to slave.
We were looking for some optimization technique so that it doesnt have
to copy the whole data in every update of the system.

Is there a way to do that? Can somebody throw some light on this?

Subhankar Chattopadhyay
Bangalore, India

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#2John R Pierce
pierce@hogranch.com
In reply to: Subhankar Chattopadhyay (#1)
Re: pg_basebackup on slave running for a long time

On 11/20/2016 9:29 PM, Subhankar Chattopadhyay wrote:

We have setup PostgreSQL master-slave topology with Streaming
replication setup.
One of the steps for setting up streaming replication is to do
pg_basebackup on slave from master.

For subsequent update of this database, this step is repeated every
time, deleting the existing data copy of slave and running
pg_basebackup again.

For a huge data size of over 500GB, it takes a lot of time to copy
the data from master to slave.
We were looking for some optimization technique so that it doesnt have
to copy the whole data in every update of the system.

Is there a way to do that? Can somebody throw some light on this?

if you have streaming replication, why do you delete it and start over ??!?

the streaming replication should replicate all updates of the master in
near realtime to the slave(s).

--
john r pierce, recycling bits in santa cruz

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3Subhankar Chattopadhyay
subho.atg@gmail.com
In reply to: John R Pierce (#2)
Re: pg_basebackup on slave running for a long time

Hi John,

Thanks for reply. In situations where slave is behind master, if I
don't start over, will it catch up automatically?
I am using 9.4 version.

On Mon, Nov 21, 2016 at 11:22 AM, John R Pierce <pierce@hogranch.com> wrote:

On 11/20/2016 9:29 PM, Subhankar Chattopadhyay wrote:

We have setup PostgreSQL master-slave topology with Streaming
replication setup.
One of the steps for setting up streaming replication is to do
pg_basebackup on slave from master.

For subsequent update of this database, this step is repeated every
time, deleting the existing data copy of slave and running
pg_basebackup again.

For a huge data size of over 500GB, it takes a lot of time to copy
the data from master to slave.
We were looking for some optimization technique so that it doesnt have
to copy the whole data in every update of the system.

Is there a way to do that? Can somebody throw some light on this?

if you have streaming replication, why do you delete it and start over ??!?

the streaming replication should replicate all updates of the master in near
realtime to the slave(s).

--
john r pierce, recycling bits in santa cruz

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

--

Subhankar Chattopadhyay
Bangalore, India

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#4John R Pierce
pierce@hogranch.com
In reply to: Subhankar Chattopadhyay (#3)
Re: pg_basebackup on slave running for a long time

On 11/20/2016 10:00 PM, Subhankar Chattopadhyay wrote:

Thanks for reply. In situations where slave is behind master, if I
don't start over, will it catch up automatically?
I am using 9.4 version.

it should stay within a few seconds under normal conditions. why is it
falling behind, is your write workload too high for the speed of the
connection between the hosts? or is the slave shut down for some
period of time?

If the slave is intermittently offline, and if you're using replication
slots (that was a new feature in 9.4), then the master will hold a queue
of pending data as long as is needed until the slave catches up again.

Alternately, you can increase wal_keep_segments for a long enough
interval to cover the worst case time the slave is offline, or you can
implement a wal archive that the slave can recover from when resuming
streaming replication.

See https://www.postgresql.org/docs/9.4/static/warm-standby.html for
discussions of these various options.

--
john r pierce, recycling bits in santa cruz

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#5Subhankar Chattopadhyay
subho.atg@gmail.com
In reply to: John R Pierce (#4)
Re: pg_basebackup on slave running for a long time

Hi John,

We are using the database in a cloud setup and the nodes are running
in VMs. The slave can fall behind for various reasons as you have
stated, like slave is shut down or high write workload.

We don't use replication slot but the wal_keep_segments is high enough
for us, 5000 to be exact.

In that case, we can do pg_basebackup only when necessary, like only
when slave is behind. We can check that from the query "SELECT state
FROM pg_stat_replication WHERE pid <> pg_backend_pid();" Will that be
correct way to do it?

On Mon, Nov 21, 2016 at 11:42 AM, John R Pierce <pierce@hogranch.com> wrote:

On 11/20/2016 10:00 PM, Subhankar Chattopadhyay wrote:

Thanks for reply. In situations where slave is behind master, if I
don't start over, will it catch up automatically?
I am using 9.4 version.

it should stay within a few seconds under normal conditions. why is it
falling behind, is your write workload too high for the speed of the
connection between the hosts? or is the slave shut down for some period of
time?

If the slave is intermittently offline, and if you're using replication
slots (that was a new feature in 9.4), then the master will hold a queue of
pending data as long as is needed until the slave catches up again.

Alternately, you can increase wal_keep_segments for a long enough interval
to cover the worst case time the slave is offline, or you can implement a
wal archive that the slave can recover from when resuming streaming
replication.

See https://www.postgresql.org/docs/9.4/static/warm-standby.html for
discussions of these various options.

--
john r pierce, recycling bits in santa cruz

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

--

Subhankar Chattopadhyay
Bangalore, India

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#6John R Pierce
pierce@hogranch.com
In reply to: Subhankar Chattopadhyay (#5)
Re: pg_basebackup on slave running for a long time

On 11/20/2016 10:37 PM, Subhankar Chattopadhyay wrote:

We are using the database in a cloud setup and the nodes are running
in VMs. The slave can fall behind for various reasons as you have
stated, like slave is shut down or high write workload.

We don't use replication slot but the wal_keep_segments is high enough
for us, 5000 to be exact.

In that case, we can do pg_basebackup only when necessary, like only
when slave is behind. We can check that from the query "SELECT state
FROM pg_stat_replication WHERE pid <> pg_backend_pid();" Will that be
correct way to do it?

maintaining a wal archive, and configuring the slaves so they can
recover from it when they are restarted will likely take care of things
if they are getting so far behind that 5000 wal segments is insufficient..

I'm not that familiar with pg_stat_replication, I'm not sure what that
query is telling you. others can probably chime in with more info on this.

--
john r pierce, recycling bits in santa cruz

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#7Subhankar Chattopadhyay
subho.atg@gmail.com
In reply to: John R Pierce (#6)
Re: pg_basebackup on slave running for a long time

Yes so if the slave is behind I need to start over pgbasebackup. I saw
according to the documentation this query gives us the replication state.
Can somebody tell me if this would be sufficient to know if I need to start
over the backup ?

On 21 Nov 2016 12:18, "John R Pierce" <pierce@hogranch.com> wrote:

Show quoted text

On 11/20/2016 10:37 PM, Subhankar Chattopadhyay wrote:

We are using the database in a cloud setup and the nodes are running
in VMs. The slave can fall behind for various reasons as you have
stated, like slave is shut down or high write workload.

We don't use replication slot but the wal_keep_segments is high enough
for us, 5000 to be exact.

In that case, we can do pg_basebackup only when necessary, like only
when slave is behind. We can check that from the query "SELECT state
FROM pg_stat_replication WHERE pid <> pg_backend_pid();" Will that be
correct way to do it?

maintaining a wal archive, and configuring the slaves so they can recover
from it when they are restarted will likely take care of things if they are
getting so far behind that 5000 wal segments is insufficient..

I'm not that familiar with pg_stat_replication, I'm not sure what that
query is telling you. others can probably chime in with more info on this.

--
john r pierce, recycling bits in santa cruz

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#8John R Pierce
pierce@hogranch.com
In reply to: Subhankar Chattopadhyay (#7)
Re: pg_basebackup on slave running for a long time

On 11/20/2016 11:00 PM, Subhankar Chattopadhyay wrote:

Yes so if the slave is behind I need to start over pgbasebackup. I saw
according to the documentation this query gives us the replication
state. Can somebody tell me if this would be sufficient to know if I
need to start over the backup ?

if the slave is behind but is catching up, no, restarting replication
would be overkill. only if the slave gets so far behind that it can't
catch up, and in that case, a wal archive would be a better choice than
a new base backup.

I've never run into these problems as I run on dedicated hardware
servers, which don't have all these reliability and performance
problems. a complete server failure requiring a full rebuild is
something that would happen less than annually.

--
john r pierce, recycling bits in santa cruz

#9Subhankar Chattopadhyay
subho.atg@gmail.com
In reply to: John R Pierce (#8)
Re: pg_basebackup on slave running for a long time

So, the question here is while I apply update on Slave, how do I know
if if it will be able to catch up or I need Wal archive? Is there a
way I can determine this? In my case, while applying update on slave,
the db process will be stopped, so the query, even if it gives correct
value, won't help. Can anybody help in here?

On Mon, Nov 21, 2016 at 12:40 PM, John R Pierce <pierce@hogranch.com> wrote:

On 11/20/2016 11:00 PM, Subhankar Chattopadhyay wrote:

Yes so if the slave is behind I need to start over pgbasebackup. I saw
according to the documentation this query gives us the replication state.
Can somebody tell me if this would be sufficient to know if I need to start
over the backup ?

if the slave is behind but is catching up, no, restarting replication would
be overkill. only if the slave gets so far behind that it can't catch up,
and in that case, a wal archive would be a better choice than a new base
backup.

I've never run into these problems as I run on dedicated hardware servers,
which don't have all these reliability and performance problems. a
complete server failure requiring a full rebuild is something that would
happen less than annually.

--
john r pierce, recycling bits in santa cruz

--

Subhankar Chattopadhyay
Bangalore, India

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#10Subhankar Chattopadhyay
subho.atg@gmail.com
In reply to: Subhankar Chattopadhyay (#9)
Re: pg_basebackup on slave running for a long time

John,

Can you explain the Wal Archive procedure, how it can be setup so that
the slave never goes out of sync, even if master deletes the WAL
files?

On Tue, Nov 22, 2016 at 4:04 PM, Subhankar Chattopadhyay
<subho.atg@gmail.com> wrote:

So, the question here is while I apply update on Slave, how do I know
if if it will be able to catch up or I need Wal archive? Is there a
way I can determine this? In my case, while applying update on slave,
the db process will be stopped, so the query, even if it gives correct
value, won't help. Can anybody help in here?

On Mon, Nov 21, 2016 at 12:40 PM, John R Pierce <pierce@hogranch.com> wrote:

On 11/20/2016 11:00 PM, Subhankar Chattopadhyay wrote:

Yes so if the slave is behind I need to start over pgbasebackup. I saw
according to the documentation this query gives us the replication state.
Can somebody tell me if this would be sufficient to know if I need to start
over the backup ?

if the slave is behind but is catching up, no, restarting replication would
be overkill. only if the slave gets so far behind that it can't catch up,
and in that case, a wal archive would be a better choice than a new base
backup.

I've never run into these problems as I run on dedicated hardware servers,
which don't have all these reliability and performance problems. a
complete server failure requiring a full rebuild is something that would
happen less than annually.

--
john r pierce, recycling bits in santa cruz

--

Subhankar Chattopadhyay
Bangalore, India

--

Subhankar Chattopadhyay
Bangalore, India

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#11John R Pierce
pierce@hogranch.com
In reply to: Subhankar Chattopadhyay (#9)
Re: pg_basebackup on slave running for a long time

On 11/22/2016 2:34 AM, Subhankar Chattopadhyay wrote:

So, the question here is while I apply update on Slave, how do I know
if if it will be able to catch up or I need Wal archive? Is there a
way I can determine this? In my case, while applying update on slave,
the db process will be stopped, so the query, even if it gives correct
value, won't help. Can anybody help in here?

if the slave it setup with the proper recovery commands to fetch from
the WAL archive, then when the slave is woken up after a slumber it will
attempt to recover as many WAL's as it can from the archive before it
resumes streaming. This will happen automatically.

--
john r pierce, recycling bits in santa cruz

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#12John R Pierce
pierce@hogranch.com
In reply to: Subhankar Chattopadhyay (#10)
Re: pg_basebackup on slave running for a long time

On 11/22/2016 3:41 AM, Subhankar Chattopadhyay wrote:

John,

Can you explain the Wal Archive procedure, how it can be setup so that
the slave never goes out of sync, even if master deletes the WAL
files?

The WAL archive will typically be a separate file server that both the
master and slave can reach... it could be accessed via NFS or via scp
or whatever is appropriate for your environment. The master is
configured with an archive command (cp in the case of nfs, or scp for
ssh/scp, or whatever) to copy WAL segments to the archive. The slave
is setup with an recovery command (cp, scp, etc) to fetch from this same
archive.

The archive will continue grow without limit if you don't do some
cleanup on it. one strategy is to periodically (weekly? monthly?) do
a base backup of the master (possibly by using rsync or another file
copy method, rather than pg_basebackup), and keep 2 of these full
backups, and all wal archives since the beginning of the oldest one.
with this backup + archive, you can initialize a new slave without
bothering the master (rsync or scp or cp the latest backup, then let the
slave recover from the wal archive).

this backup+archive will also let you do point-in-time-recovery (aka
PITR). say something catastrophic happens and the data in the master
is bad after some point in time (maybe a jr admin accidentally clobbers
key data, but the app kept running). you can restore the last good
base backup, and recover up to but not including the point in time of
the transaction that clobbered your data.

--
john r pierce, recycling bits in santa cruz

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#13Subhankar Chattopadhyay
subho.atg@gmail.com
In reply to: Subhankar Chattopadhyay (#1)
Re: pg_basebackup on slave running for a long time

Thanks John, Well that clarifies about archive a lot!

On 22 Nov 2016 22:22, "John R Pierce" <pierce@hogranch.com> wrote:

On 11/22/2016 3:41 AM, Subhankar Chattopadhyay wrote:

John,

Can you explain the Wal Archive procedure, how it can be setup so that
the slave never goes out of sync, even if master deletes the WAL
files?

The WAL archive will typically be a separate file server that both the
master and slave can reach... it could be accessed via NFS or via scp or
whatever is appropriate for your environment. The master is configured
with an archive command (cp in the case of nfs, or scp for ssh/scp, or
whatever) to copy WAL segments to the archive. The slave is setup with an
recovery command (cp, scp, etc) to fetch from this same archive.

The archive will continue grow without limit if you don't do some cleanup
on it. one strategy is to periodically (weekly? monthly?) do a base
backup of the master (possibly by using rsync or another file copy method,
rather than pg_basebackup), and keep 2 of these full backups, and all wal
archives since the beginning of the oldest one. with this backup +
archive, you can initialize a new slave without bothering the master (rsync
or scp or cp the latest backup, then let the slave recover from the wal
archive).

this backup+archive will also let you do point-in-time-recovery (aka
PITR). say something catastrophic happens and the data in the master is
bad after some point in time (maybe a jr admin accidentally clobbers key
data, but the app kept running). you can restore the last good base
backup, and recover up to but not including the point in time of the
transaction that clobbered your data.

--
john r pierce, recycling bits in santa cruz

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general