Backup strategy using 'wal_keep_segments'
Hi,
Version 9.4...
Per the PG docs, to facilitate continuous WAL archiving and PITR recovery...
"To enable WAL archiving, set the wal_level configuration parameter to archive (or hot_standby), archive_mode to on, and specify the shell command to use in the archive_command configuration parameter."
This instruction is fine and I have a solid understanding of the implementation of these parameters, what they do and the why behind them, i.e. in order to maintain a continuous chain of database changes in the WAL stream.
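For reference, that setup looks something like this in postgresql.conf (the destination path is a placeholder along the lines of the documentation's example):

```
# postgresql.conf (9.4) -- standard continuous-archiving setup
wal_level = archive            # or hot_standby
archive_mode = on
# copy each completed segment out, refusing to overwrite an existing file;
# /mnt/server/archivedir is a hypothetical destination
archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'
```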
My question however is this...
A colleague recently suggested that instead of implementing an 'archive_command' to push archivable WALs to a secondary location (for further backup to tape for example), we could instead persist the WAL files in their current location by setting the "wal_keep_segments" parameter to an extreme value e.g. 1000 and have the 'archive_command' do nothing.
So, something like...
wal_keep_segments=1000
archive_command='cd .'
And then periodically copy the archived WAL files from the pg_xlog directory out to tape, removing them as we go?
Is this approach feasible? Assuming obviously, we have sufficient disk space to facilitate 1000 WAL files etc.
But from a point-in-time recovery, and backup perspective - are we missing anything if we were to adopt this non-standard approach?
Regards
On Mon, Oct 23, 2017 at 5:57 AM, Rhhh Lin <ruanlinehan@hotmail.com> wrote:
Is this approach feasible? Assuming obviously, we have sufficient disk space
to facilitate 1000 WAL files etc.
You expose yourself to race conditions with such a method if a
checkpoint happens to recycle segments that your logic is still
copying. So I would advise against doing that. Instead of using
archive_command, you should also consider using pg_receivexlog
combined with a replication slot. This gives you far more control
over error handling.
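A sketch of that setup on 9.4 (slot name, directory, host, and user are all hypothetical):

```shell
# create a physical replication slot so the server retains WAL
# until pg_receivexlog has consumed it
psql -c "SELECT pg_create_physical_replication_slot('wal_archive_slot');"

# stream completed WAL segments into a local archive directory;
# pg_receivexlog exits non-zero on error, so run it under a supervisor
pg_receivexlog -D /mnt/server/walarchive -S wal_archive_slot \
    -h db-primary -U replication_user
```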
--
Michael
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Thanks very much for your reply Michael.
I note that it looks like pgbarman employs pg_receivexlog; I will check it out.
Regards,
Ruan
________________________________
From: Michael Paquier <michael.paquier@gmail.com>
Sent: 22 October 2017 22:17:01
To: Rhhh Lin
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Backup strategy using 'wal_keep_segments'
Greetings,
* Rhhh Lin (ruanlinehan@hotmail.com) wrote:
A colleague recently suggested that instead of implementing an 'archive_command' to push archivable WALs to a secondary location (for further backup to tape for example), we could instead persist the WAL files in their current location by setting the "wal_keep_segments" parameter to an extreme value e.g. 1000 and have the 'archive_command' do nothing.
Michael's points are good and I wouldn't recommend using this archive
command either, but what isn't clear to me is what you're actually
trying to solve by using such a method. You haven't said anywhere
what's wrong with archive_command (I know that there certainly are
some things wrong with it, of course, but there are solutions to a
number of those issues that aren't a hack like this...).
Thanks!
Stephen
Hi Stephen,
Thanks for replying.
I would actually be an advocate for using a proper archive_command in order to facilitate a proper (Per the documentation) PITR and backup strategy.
However, a colleague had suggested such a creative approach (Possibly, less administrative overhead, ease of maintenance? I'm not sure) and I struggled to find any evidence online in blogs/white-papers/documentation that this was a feasible approach. That said, I couldn't find any info rejecting it as a method either, which led me to post here. Essentially, this was a difference of opinion on approach, and I was looking to gather information in order to make an informed opposing argument.
My only thought now is how I could determine, within the sequence chain of WAL files, up to which file the "archival" has progressed, i.e. which files are no longer susceptible to being called upon again for restartpoints/checkpoints. That is, where is my absolute point (or file) of archival, using something along the lines of 'pg_current_xlog_location'?
Regards,
Ruan
________________________________
From: Stephen Frost <sfrost@snowman.net>
Sent: 30 October 2017 16:41:11
To: Rhhh Lin
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Backup strategy using 'wal_keep_segments'
On Tue, Oct 31, 2017 at 9:53 AM, Rhhh Lin <ruanlinehan@hotmail.com> wrote:
I would actually be an advocate for using a proper archive_command in order
to facilitate a proper (Per the documentation) PITR and backup strategy.
You should avoid rolling your own fancy archive command. There are
tools like WAL-E for this purpose, and even if you do write your own,
make sure that the segment data is flushed to disk before returning 0
as the exit status.
However, a colleague had suggested such a creative approach (Possibly, less
administrative overhead, ease of maintenance? I'm not sure) and I struggled
to find any evidence online in blogs/white-papers/documentation that this
was a feasible approach. That said, I couldn't find any info rejecting it as
a method either, which led me to post here. Essentially, this was a
difference of opinion on approach, and I was looking to gather information
in order to make an informed opposing argument.
Backup methods should not be fancy, just reliable.
My only thought now is how I could determine, within the sequence chain
of WAL files, up to which file the "archival" has progressed, i.e. which
files are no longer susceptible to being called upon again for
restartpoints/checkpoints. That is, where is my absolute point (or file)
of archival, using something along the lines of
'pg_current_xlog_location'?
There is the system view called pg_stat_archiver for that. If you are
using something like pg_receivexlog, pg_stat_replication is the way to
go to monitor the archiving progress.
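For example, queries along these lines (column names as of 9.4):

```sql
-- last segment successfully handed off by the archiver,
-- plus failure counters worth alerting on
SELECT last_archived_wal, last_archived_time,
       failed_count, last_failed_wal
FROM pg_stat_archiver;

-- if streaming with pg_receivexlog instead, compare what the server
-- has sent with what the client has flushed to disk
SELECT application_name, sent_location, flush_location
FROM pg_stat_replication;
```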
--
Michael
Greetings,
* Rhhh Lin (ruanlinehan@hotmail.com) wrote:
I would actually be an advocate for using a proper archive_command in order to facilitate a proper (Per the documentation) PITR and backup strategy.
Glad to hear it.
However, a colleague had suggested such a creative approach (Possibly, less administrative overhead, ease of maintenance? I'm not sure) and I struggled to find any evidence online in blogs/white-papers/documentation that this was a feasible approach. That said, I couldn't find any info rejecting it as a method either, which led me to post here. Essentially, this was a difference of opinion on approach, and I was looking to gather information in order to make an informed opposing argument.
I'd strongly suggest considering tried-and-true approaches instead of
coming up with something novel, particularly when it comes to backups
and recovery.
Thanks!
Stephen