Pausing log shipping for streaming replication

Started by Joseph Kregloh · over 11 years ago · 9 messages · general
#1Joseph Kregloh
jkregloh@sproutloud.com

Hello,

I have a master/multi-slave streaming replication setup: one master and two
slaves. I need to do some maintenance on one of the slaves, as one of the
drives died, and there are some other weird things going on in that array
that I need to investigate. So I am expecting the machine to be down for at
least two hours.

I remember reading that if the master cannot connect to the slave it will
hold the log files instead of shipping them. Is there any other way to hold
the files until the slave comes back online? Or would it affect both slaves,
with neither getting their files shipped over?

The good thing is that the slave in question is not serving any connections.

From what I remember emptying out the archive_command would pause log
shipping. Can the same be done by issuing a pg_stop_backup()?

Thanks,
-Joseph Kregloh

#2Patrick Krecker
patrick@judicata.com
In reply to: Joseph Kregloh (#1)
Re: Pausing log shipping for streaming replication

On Mon, Dec 15, 2014 at 9:12 AM, Joseph Kregloh <jkregloh@sproutloud.com> wrote:

Show quoted text

I think you will need to change your archive_command so it saves the
WALs to a location reachable by both slaves and the master, and have
both slaves pull from the same location. I don't think
pg_stop_backup() is useful in this situation.

The master will hold the logs as long as archive_command fails [1]. To
the extent that archive_command involves connecting to the slave, then
yes, Postgres will hold the WAL archives while the slave is down.
There are (at least) two reasons that saving the archives to some
other location is useful:

1) You don't risk running out of disk on the master due to batched up
WALs if a slave goes down.
2) The backup of logs can be used to aid in point-in-time recovery.

[1]: http://www.postgresql.org/docs/9.1/static/continuous-archiving.html
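Patrick's suggested setup — an archive_command that writes each segment to a location reachable by both slaves, and fails (returns nonzero) so the master retries when that location is unreachable — could be sketched roughly like this. The function name and $ARCHIVE_DIR are illustrative, not from the thread:

```shell
# Hypothetical archive_command helper: copy one WAL segment into a shared
# archive directory. Postgres would invoke it as something like
#   archive_command = 'archive_wal.sh "%p" "%f"'
# and retries the same segment later whenever the exit status is nonzero.
archive_wal() {
    src=$1      # %p: path to the segment, relative to the data directory
    name=$2     # %f: bare file name of the segment
    dest="$ARCHIVE_DIR/$name"
    [ -f "$dest" ] && return 0                        # already archived: succeed
    cp "$src" "$dest.tmp" && mv "$dest.tmp" "$dest"   # copy, then atomic rename
}
```

Copying to a temporary name and renaming keeps readers (e.g. a slave's restore_command) from ever seeing a half-written segment.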

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3Joseph Kregloh
jkregloh@sproutloud.com
In reply to: Patrick Krecker (#2)
Re: Pausing log shipping for streaming replication

On Mon, Dec 15, 2014 at 12:59 PM, Patrick Krecker <patrick@judicata.com>
wrote:

On Mon, Dec 15, 2014 at 9:12 AM, Joseph Kregloh <jkregloh@sproutloud.com> wrote:

Show quoted text

I think you will need to change your archive_command so it saves the
WALs to a location reachable by both slaves and the master, and have
both slaves pull from the same location. I don't think
pg_stop_backup() is useful in this situation.

Currently my archive_command calls a shell script which internally does an
rsync. It actually rsyncs to both slaves and then to a Barman location. If I
fail the archive_command, then I'll have a problem because my primary slave
serves read-only queries, so it might start serving out stale data.

What I was thinking is shipping the log files that would go to the second
slave to another machine or location on the master. Then once I am done
with the maintenance I'll move those files over to the incoming folder.
That should hopefully contain all the WAL files for the slave to catch up.
Any thoughts against this?
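That diversion could be as small as a branch in the existing archive script: while the second slave is down, park its copy of each segment locally instead of rsyncing it over. A minimal sketch — the function name and $HOLDING_DIR are hypothetical, not from the thread:

```shell
# Hypothetical variant of the archive script: WAL still goes to the primary
# slave and Barman as before, but the second slave's copy is parked in a
# local holding directory until the maintenance is over.
park_wal() {
    src=$1   # %p: path to the WAL segment
    name=$2  # %f: bare file name
    mkdir -p "$HOLDING_DIR"
    cp "$src" "$HOLDING_DIR/$name"
}
# After maintenance, something like
#   rsync -a "$HOLDING_DIR"/ pgprod@slave2:archive/
# would move the parked segments into the slave's incoming folder.
```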

Show quoted text


#4Patrick Krecker
patrick@judicata.com
In reply to: Joseph Kregloh (#3)
Re: Pausing log shipping for streaming replication

On Mon, Dec 15, 2014 at 10:29 AM, Joseph Kregloh <jkregloh@sproutloud.com> wrote:

Show quoted text

Currently my archive_command calls a shell script which internally does an
rsync. It actually rsyncs to both slaves and then to a Barman location. If I
fail the archive_command, then I'll have a problem because my primary slave
serves read-only queries, so it might start serving out stale data.

What I was thinking is shipping the log files that would go to the second
slave to another machine or location on the master. Then once I am done with
the maintenance I'll move those files over to the incoming folder. That
should hopefully contain all the WAL files for the slave to catch up.
Any thoughts against this?

Seems OK as long as you have the disk space to support the
accumulation of WALs (allowing for the situation where the downtime
is much longer than anticipated).

When you say "i'll move those files over to the incoming folder," what
do you mean? I think that restore_command should be used on the slave
to retrieve the WALs from the archive location. Once the secondary has
caught up, you can change the configuration back to the old setup and
remove the accumulated WALs from the temporary location.
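Concretely, the pull-based arrangement described above would look roughly like this on the standby — a recovery.conf sketch with illustrative paths, not the poster's actual config:

```
# recovery.conf sketch: fetch WALs from the shared archive instead of
# relying on the master to push them to this host.
standby_mode = 'on'
restore_command = 'cp /mnt/shared/wal_archive/%f "%p"'
archive_cleanup_command = 'pg_archivecleanup /mnt/shared/wal_archive %r'
```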

Show quoted text


#5Andy Colson
andy@squeakycode.net
In reply to: Joseph Kregloh (#1)
Re: Pausing log shipping for streaming replication

On 12/15/2014 11:12 AM, Joseph Kregloh wrote:

Show quoted text

I kinda turn mine around, so to speak.
My master (web1) PG has this:

archive_command = '/usr/local/bin/pg_arch.sh "%p" "%f"'

/usr/local/bin/pg_arch.sh:
---------------
#!/bin/bash
# pg_arch.sh "%p" "%f"

archive='/pub/pgarchive'
set -e
if [ ! -f "$archive/$2" ]
then
    /usr/bin/cp "$1" "$archive/webserv/$2"
    /usr/bin/ln "$archive/webserv/$2" "$archive/web2/$2"
fi
exit 0
---------------

I have one master (web1) and two slaves (web2 and webserv).

This always copies, and always returns 0. (Note the use of ln so extra
disk space isn't wasted.)

At this point I only collect WAL; this script never removes it. One
slave is very close, so it gets updated quickly. The other is very far
away, and only updates at night when I can copy for less $$.

It doesn't really matter how I get the two slaves updated (the close one
actually uses streaming, the far one rsync, but that's beside the point).

The cleanup happens in reverse. I have a Perl cron job that runs every
half hour. It connects to the master (web1) and each slave, and runs
something like this ($db is a slave, $master is the master):

$q = $db->prepare("SELECT pg_last_xlog_replay_location()");
$q->execute();
my ($webrp) = $q->fetchrow_array();
$q = undef;
$db->disconnect();

$q = $master->prepare("select file_name from pg_xlogfile_name_offset( '$webrp' )");
$q->execute();
my ($web) = $q->fetchrow_array();
$q = undef;

system("/usr/bin/sudo -u postgres /usr/local/pgsql/bin/pg_archivecleanup /pub/pgarchive/web2 $web");

What we do is have the master query the slave's replay location and then
run pg_archivecleanup. This way, if we lose communication I won't clean up
files, and I won't clean up WAL until the slave has actually applied it.

And each slave is independent, so I can take one down and the master
will just keep collecting. As soon as I bring the slave back up, we get
a response and start cleaning again.

-Andy


#6Joseph Kregloh
jkregloh@sproutloud.com
In reply to: Patrick Krecker (#4)
Re: Pausing log shipping for streaming replication

On Mon, Dec 15, 2014 at 2:18 PM, Patrick Krecker <patrick@judicata.com>
wrote:

On Mon, Dec 15, 2014 at 10:29 AM, Joseph Kregloh <jkregloh@sproutloud.com> wrote:

Show quoted text

Seems OK as long as you have the disk space to support the
accumulation of WALs (considering for the situation where the downtime
is much longer than anticipated).

Plenty of disk space to accumulate WAL files.

When you say "i'll move those files over to the incoming folder," what
do you mean? I think that restore_command should be used on the slave
to retrieve the WALs from the archive location. Once the secondary has
caught up, you can change the configuration back to the old setup and
remove the accumulated WALs from the temporary location.

I will disable Postgres from starting up. Once the machine is up and
running I will move all of the "saved" WAL files into the folder designated
by the restore_command in my recovery.conf, which in my case is the
following line:

restore_command = 'cp -f /usr/local/pgsql/archive/%f %p < /dev/null'

Once all of the WAL files are there and the master is shipping WAL files
again, I will start Postgres on the machine and it will begin processing
the files until it catches up.
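That move-then-start step could be sketched as below; the helper name is hypothetical, and the two directories stand in for the parked-WAL location and the directory named in restore_command:

```shell
# Hypothetical catch-up helper: with Postgres still stopped on the slave,
# move every parked segment into the directory that restore_command reads,
# so the standby replays them on startup before reconnecting for streaming.
replay_parked_wal() {
    holding=$1    # where the segments were parked during maintenance
    archive=$2    # directory named in restore_command
    mkdir -p "$archive"
    mv "$holding"/* "$archive"/ 2>/dev/null || true   # no-op when nothing parked
}
# then, for example: pg_ctl -D /usr/local/pgsql/data start
```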

Show quoted text


#7Sameer Kumar
sameer.kumar@ashnik.com
In reply to: Joseph Kregloh (#1)
Re: Pausing log shipping for streaming replication

On 16 Dec 2014 01:13, "Joseph Kregloh" <jkregloh@sproutloud.com> wrote:

Show quoted text

Are you using streaming replication or log shipping? I believe the two are
different. Can you share your archive_command and recovery.conf content?

Show quoted text


#8Joseph Kregloh
jkregloh@sproutloud.com
In reply to: Sameer Kumar (#7)
Re: Pausing log shipping for streaming replication

On Mon, Dec 15, 2014 at 7:12 PM, Sameer Kumar <sameer.kumar@ashnik.com>
wrote:

On 16 Dec 2014 01:13, "Joseph Kregloh" <jkregloh@sproutloud.com> wrote:

Show quoted text

Are you using streaming replication or log shipping? I believe both are
different. Can you share you archive_command and recovery.conf content?

Streaming replication.

archive_command:
archive_command = '/usr/local/pgsql/data/log_shipper.sh "%p" "%f"'

log_shipper.sh
#!/usr/local/bin/bash
rsync -a $1 pgprod@prod-db-slave:archive/$2 < /dev/null;
rsync -a $1 pgprod@prod-db-slave:p3_wal_files/$2 < /dev/null; # Temp storage for WAL files

recovery.conf
standby_mode = 'on'
primary_conninfo = 'host=prod-db port=5434 user=USER password=PW'
trigger_file = '/tmp/pgsql_prod_db.trigger'
restore_command = 'cp -f /usr/local/pgsql/archive/%f %p < /dev/null'
archive_cleanup_command = 'pg_archivecleanup /usr/local/pgsql/archive/ %r'

Show quoted text


#9Sameer Kumar
sameer.kumar@ashnik.com
In reply to: Joseph Kregloh (#8)
Re: Pausing log shipping for streaming replication

On Tue, Dec 16, 2014 at 11:36 PM, Joseph Kregloh <jkregloh@sproutloud.com>
wrote:

archive_command:
archive_command = '/usr/local/pgsql/data/log_shipper.sh "%p" "%f"'

log_shipper.sh
#!/usr/local/bin/bash
rsync -a $1 pgprod@prod-db-slave:archive/$2 < /dev/null;
rsync -a $1 pgprod@prod-db-slave:p3_wal_files/$2 < /dev/null; # Temp storage for WAL files

Sorry, I read your other posts just now. This seems sane to me.

recovery.conf
standby_mode = 'on'
primary_conninfo = 'host=prod-db port=5434 user=USER password=PW'
trigger_file = '/tmp/pgsql_prod_db.trigger'
restore_command = 'cp -f /usr/local/pgsql/archive/%f %p < /dev/null'

Since you are specifying the restore_command and (as per my understanding
of your situation) the archive is full, shouldn't you start polling the
secondary WAL archive that you have specified in archive.sh?

archive_cleanup_command = 'pg_archivecleanup /usr/local/pgsql/archive/ %r'

I generally don't prefer this, since archives are helpful for PITR as well,
so I prefer cleaning them up manually.

Best Regards,

Sameer Kumar | Database Consultant
ASHNIK PTE. LTD.
101 Cecil Street, #11-11 Tong Eng Building, Singapore 069533
M: +65 8110 0350 | T: +65 6438 3504 | www.ashnik.com

This email may contain confidential, privileged or copyright material and
is solely for the use of the intended recipient(s).