Pausing log shipping for streaming replication
Hello,
I have a master multi slave streaming replication setup: one master and two
slaves. I need to do some maintenance on one of the slaves, as one of the
drives died, but there are some other weird things going on in that array
that I need to investigate. So I am expecting the machine to be down at
least two hours.
I remember reading that if the master cannot connect to a slave it will
hold the log files rather than shipping them. Is there any other way to hold
the files until the slave comes back online? Would it affect both slaves,
leaving neither getting its files shipped over?
The good thing is that the slave in question is not serving any connections.
From what I remember, emptying out the archive_command would pause log
shipping. Can the same be done by issuing a pg_stop_backup()?
Thanks,
-Joseph Kregloh
On Mon, Dec 15, 2014 at 9:12 AM, Joseph Kregloh <jkregloh@sproutloud.com> wrote:
I think you will need to change your archive_command so it saves the
WALs to a location reachable by both slaves and the master, and have
both slaves pull from the same location. I don't think
pg_stop_backup() is useful in this situation.
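A minimal sketch of that layout, under assumptions not in the thread: the function name, the shared mount at /mnt/shared/wal_archive, and the helper path are all made up for illustration.

```shell
#!/bin/sh
# archive_one.sh -- hypothetical archive_command helper for the layout above.
# The master would call it via:
#   archive_command = '/usr/local/bin/archive_one.sh "%p" "%f"'
# and both slaves would pull from the same directory with:
#   restore_command = 'cp /mnt/shared/wal_archive/%f %p'
# A non-zero return makes the master keep the segment and retry later.

archive_one() {
    wal_path=$1                           # %p: path to the WAL segment
    wal_name=$2                           # %f: bare segment file name
    archive=${3:-/mnt/shared/wal_archive} # shared dir reachable by all hosts

    # archive_command must never overwrite an already-archived segment.
    if [ -f "$archive/$wal_name" ]; then
        return 0
    fi

    # Copy under a temporary name, then rename, so a slave's restore_command
    # never picks up a half-written file.
    cp "$wal_path" "$archive/.$wal_name.tmp" && \
        mv "$archive/.$wal_name.tmp" "$archive/$wal_name"
}
```

The key property is that any failure propagates out as a non-zero status, so the master batches WAL instead of losing it.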
The master will hold the logs as long as archive_command fails [1]. To
the extent that archive_command involves connecting to the slave, then
yes, Postgres will hold the WAL archives while the slave is down.
There are (at least) two reasons that saving the archives to some
other location is useful:
1) You don't risk running out of disk on the master due to batched up
WALs if a slave goes down.
2) The backup of logs can be used to aid in point-in-time recovery.
[1]: http://www.postgresql.org/docs/9.1/static/continuous-archiving.html
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On Mon, Dec 15, 2014 at 12:59 PM, Patrick Krecker <patrick@judicata.com>
wrote:
Currently my archive_command calls a shell script which internally does an
rsync. It actually rsyncs to both slaves and then to a Barman location. If I
fail the archive_command, I'll have a problem, because my primary slave
serves read-only queries, so it might start serving out stale data.
What I was thinking is shipping the log files that would go to the second
slave to another machine, or to a location on the master. Then once I am done
with the maintenance I'll move those files over to the incoming folder. That
should hopefully contain all the WAL files for the slave to catch up. Any
thoughts against this?
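That diversion could look something like this. This is a hypothetical sketch, not the script from the thread: the function name and the holding directory are made up, and the existing rsync lines to the primary slave and Barman would stay unchanged.

```shell
# hold_wal -- park the copy destined for the slave under maintenance in a
# local holding directory instead of rsyncing it over.
hold_wal() {
    wal_path=$1    # %p from archive_command
    wal_name=$2    # %f from archive_command
    hold_dir=$3    # temporary holding directory on the master
    # Keep the bare segment name so the files can later be dropped straight
    # into the slave's incoming directory.
    cp "$wal_path" "$hold_dir/$wal_name"
}
```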
On Mon, Dec 15, 2014 at 10:29 AM, Joseph Kregloh
<jkregloh@sproutloud.com> wrote:
Seems OK, as long as you have the disk space to support the
accumulation of WALs (allowing for the situation where the downtime
is much longer than anticipated).
When you say "i'll move those files over to the incoming folder," what
do you mean? I think that restore_command should be used on the slave
to retrieve the WALs from the archive location. Once the secondary has
caught up, you can change the configuration back to the old setup and
remove the accumulated WALs from the temporary location.
On 12/15/2014 11:12 AM, Joseph Kregloh wrote:
I kinda turn mine around, so to speak.
My master (web1) PG has this:
archive_command = '/usr/local/bin/pg_arch.sh "%p" "%f"'
/usr/local/bin/pg_arch.sh:
---------------
#!/bin/bash
# pg_arch.sh "%p" "%f"
archive='/pub/pgarchive'
set -e
if [ ! -f $archive/$2 ]
then
/usr/bin/cp $1 $archive/webserv/$2
/usr/bin/ln $archive/webserv/$2 $archive/web2/$2
fi
exit 0
---------------
I have one master (web1) and two slaves (web2 and webserv)
This always copies, and always returns 0. (Note the use of ln so extra
disk space isn't wasted.)
At this point I only collect WAL; this script never removes it. One
slave is very close, so it gets updated quickly. The other is very far,
and only updates at night when I can copy for less $$.
It doesn't really matter how I get the two slaves updated (the close one
actually uses streaming, the far one rsync, but that's beside the point).
The clean up happens in reverse. I have a perl cron job that runs every
half hour. It connects to master (web1) and each slave, and runs
something like:
$db is a slave, $master is the master.
$q = $db->prepare("SELECT pg_last_xlog_replay_location()");
$q->execute();
my ($webrp) = $q->fetchrow_array();
$q = undef;
$db->disconnect();
$q = $master->prepare("select file_name from pg_xlogfile_name_offset( '$webrp' )");
$q->execute();
my ($web) = $q->fetchrow_array();
$q = undef;
system("/usr/bin/sudo -u postgres /usr/local/pgsql/bin/pg_archivecleanup /pub/pgarchive/web2 $web");
What we do is have the master query the slave's replay location and then run
pg_archivecleanup. This way, if we lose communication I won't clean up
files, and I won't clean up WAL until the slave has actually applied it.
And each slave is independent, so I can take one down and the master
will just keep collecting. As soon as I bring the slave back up, we get
a response and start cleaning again.
-Andy
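A hypothetical shell rendering of the same cron logic. The function name, connection strings, and paths are placeholders; pg_last_xlog_replay_location() and pg_xlogfile_name_offset() are the pre-9.6 function names used in this thread.

```shell
# cleanup_slave_archive -- ask the slave how far it has replayed, translate
# that position into a WAL file name on the master, then prune the archive
# up to (but not including) that segment. If either query fails -- e.g. the
# slave is down -- nothing is cleaned up, which is the desired behavior.
cleanup_slave_archive() {
    slave_conninfo=$1
    master_conninfo=$2
    archive_dir=$3

    replayed=$(psql "$slave_conninfo" -Atc \
        "SELECT pg_last_xlog_replay_location()") || return 1
    oldest_needed=$(psql "$master_conninfo" -Atc \
        "SELECT file_name FROM pg_xlogfile_name_offset('$replayed')") || return 1

    pg_archivecleanup "$archive_dir" "$oldest_needed"
}
```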
On Mon, Dec 15, 2014 at 2:18 PM, Patrick Krecker <patrick@judicata.com>
wrote:
Seems OK, as long as you have the disk space to support the
accumulation of WALs (allowing for the situation where the downtime
is much longer than anticipated).
Plenty of disk space to accumulate WAL files.
When you say "i'll move those files over to the incoming folder," what
do you mean? I think that restore_command should be used on the slave
to retrieve the WALs from the archive location. Once the secondary has
caught up, you can change the configuration back to the old setup and
remove the accumulated WALs from the temporary location.
I will disable Postgres from starting up. Once the machine is up and
running I will move all of the "saved" WAL files into the folder I have
designated in the restore command in my recovery.conf. In my case the
following line:
restore_command = 'cp -f /usr/local/pgsql/archive/%f %p < /dev/null'
Once all of the WAL files are there and the master is shipping WAL files I
will start Postgres on the machine and it will begin processing the files
until it catches up.
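That catch-up step can be sketched as follows (the function name and directories are hypothetical). WAL segment file names sort lexicographically in apply order, so a plain sorted move preserves the sequence restore_command will request.

```shell
# release_held -- move parked WAL segments into the directory named in the
# standby's restore_command, oldest segment first, before starting Postgres
# on the slave.
release_held() {
    hold_dir=$1      # where the segments were parked during maintenance
    incoming_dir=$2  # directory the slave's restore_command copies from
    # Lexicographic order == replay order for WAL segment names.
    for f in $(ls "$hold_dir" | sort); do
        mv "$hold_dir/$f" "$incoming_dir/$f"
    done
}
```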
On 16 Dec 2014 01:13, "Joseph Kregloh" <jkregloh@sproutloud.com> wrote:
Are you using streaming replication or log shipping? I believe the two are
different. Can you share your archive_command and recovery.conf content?
On Mon, Dec 15, 2014 at 7:12 PM, Sameer Kumar <sameer.kumar@ashnik.com>
wrote:
Are you using streaming replication or log shipping? I believe the two are
different. Can you share your archive_command and recovery.conf content?

Streaming replication.
archive_command:
archive_command = '/usr/local/pgsql/data/log_shipper.sh "%p" "%f"'
log_shipper.sh
#!/usr/local/bin/bash
rsync -a $1 pgprod@prod-db-slave:archive/$2 < /dev/null;
rsync -a $1 pgprod@prod-db-slave:p3_wal_files/$2 < /dev/null; # Temp storage for WAL files
recovery.conf
standby_mode = 'on'
primary_conninfo = 'host=prod-db port=5434 user=USER password=PW'
trigger_file = '/tmp/pgsql_prod_db.trigger'
restore_command = 'cp -f /usr/local/pgsql/archive/%f %p < /dev/null'
archive_cleanup_command = 'pg_archivecleanup /usr/local/pgsql/archive/ %r'
Thanks,
-Joseph Kregloh
On Tue, Dec 16, 2014 at 11:36 PM, Joseph Kregloh <jkregloh@sproutloud.com>
wrote:
archive_command:
archive_command = '/usr/local/pgsql/data/log_shipper.sh "%p" "%f"'
log_shipper.sh
#!/usr/local/bin/bash
rsync -a $1 pgprod@prod-db-slave:archive/$2 < /dev/null;
rsync -a $1 pgprod@prod-db-slave:p3_wal_files/$2 < /dev/null; # Temp storage for WAL files
Sorry, I read your other posts just now. This seems sane to me.
recovery.conf
standby_mode = 'on'
primary_conninfo = 'host=prod-db port=5434 user=USER password=PW'
trigger_file = '/tmp/pgsql_prod_db.trigger'
restore_command = 'cp -f /usr/local/pgsql/archive/%f %p < /dev/null'
Since you are specifying a restore_command and (as per my understanding of
your situation) the archive is full, shouldn't you start polling the
secondary WAL archive that you have specified in log_shipper.sh?
archive_cleanup_command = 'pg_archivecleanup /usr/local/pgsql/archive/ %r'
I generally don't prefer this, since archives are helpful for PITR as well;
I prefer cleaning them up manually.
Best Regards,
*Sameer Kumar | Database Consultant*
*ASHNIK PTE. LTD.*
101 Cecil Street, #11-11 Tong Eng Building, Singapore 069533
M: *+65 8110 0350* T: +65 6438 3504 | www.ashnik.com