Backup "Best Practices"

Started by Israel Brewster · over 9 years ago · 18 messages · general
#1Israel Brewster
israel@ravnalaska.net

I was wondering if someone could inform me about, or point me to an online article about best practices for backing up a postgresql database cluster? At the moment, I have two servers running with streaming replication for failover purposes, and then I do nightly database dumps for recovery/backup purposes. However, I feel this approach is lacking in a couple areas I can think of:

- If a total failure occurs towards the end of the day, we could potentially lose a whole day's worth of data. A similar argument applies to user error - there is no way to "undo" a catastrophic user data error without going all the way back to the previous day.
- Less-than-ideal recovery under some scenarios. Since each database in the cluster is dumped individually, this works well when only *one* database needs to be restored, but could get tedious should the entire cluster need to be restored.

To mitigate these issues, I am thinking of supplementing the individual dumps with a full base backup and WAL archiving to an NFS share. This should enable (relatively) quick/easy recovery from backup, plus the ability to do point-in-time recovery (PITR). I do have a few questions with this approach, however:

- How do I figure out how often I should take a full base backup? I know this will depend on the amount of traffic my database is doing, and how long I am willing to wait for WAL files to be replayed - the more WAL files needing to be replayed, the longer recovery will take - but is there some rule of thumb that I can use to calculate how often I need a new base backup? Perhaps based on the number of WAL files?
- What is the "best" (or just a good) method of keeping the WAL archives under control? Obviously when I do a new basebackup I can "cleanup" any old files that said backup doesn't need, but how do I know what those are?
- Should I be looking at any other backup methods in addition to/instead of the basebackup/WAL archive scheme?
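[Editor's note: for concreteness, a minimal sketch of the setup being described. All paths, host names, and user names below are placeholders, not details from this thread.]

```shell
# On the primary, in postgresql.conf (per the PostgreSQL docs;
# the archive path is illustrative):
#   wal_level = replica        # "hot_standby" on pre-9.6 releases
#   archive_mode = on
#   archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'

# Take a compressed, tar-format base backup (placeholder host/user):
pg_basebackup -h primary.example.com -U replicator \
    -D /backups/base-$(date +%Y%m%d) -Ft -z -P
```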

Thanks for any information!
-----------------------------------------------
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
-----------------------------------------------

#2John R Pierce
pierce@hogranch.com
In reply to: Israel Brewster (#1)
Re: Backup "Best Practices"

On 11/23/2016 10:16 AM, Israel Brewster wrote:

To mitigate these issues, I am thinking of supplementing the
individual dumps with a full base backup and WAL archiving to a NFS
share. This should enable (relatively) quick/easy recovery from
backup, plus the ability to do PIT Recovery. I do have a few questions
with this approach, however:

How often you do the base backup (daily? weekly? monthly?) kinda
depends on your workload, data volume, and available space on your WAL
archive server.

- How do I figure out how often I should take a full base backup? I
know this will depend on the amount of traffic my database is doing,
and how long I am willing to wait for WAL files to be replayed - the
more WAL files needing to be replayed, the longer recovery will take -
but is there some rule of thumb that I can use to calculate how often
I need a new base backup? Perhaps based on the number of WAL files?

frequency of doing this is entirely dependent on your data volume, how
long the backups take, and your tolerance for restore times. wal archive
recoveries are generally quite fast, but of course if there's millions
of wal files to recover, that would take a long time. note, too,
9.6 has some major performance enhancements in how checkpoints are
written, which should speed things up overall.

- What is the "best" (or just a good) method of keeping the WAL
archives under control? Obviously when I do a new basebackup I can
"cleanup" any old files that said backup doesn't need, but how do I
know what those are?
- Should I be looking at any other backup methods in addition
to/instead of the basebackup/WAL archive scheme?

I would keep at least 2 base backups and all wal files since the start
of the oldest base backup. when you create a new base backup, delete
the oldest, and all wal archive files up to the start of the following
one. the reason for keeping at least two is so you always have a
recovery point if something catastrophic happens during the backup process.

note that having base backups plus wal archives allows PiTR too, point
in time recovery. say you discover a bad script updated a ton of bad
data into your database last tuesday at 3pm. you can restore the
preceding base backup, then recover up to tuesday just before this event.
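[Editor's note: the retention rule above can be sketched in shell, relying on the fact that WAL segment names sort lexicographically. The directory and file names below are made up for the demonstration; in practice pg_archivecleanup, mentioned later in the thread, does this job for you.]

```shell
#!/bin/bash
# Demonstrate "keep 2 base backups + their WAL" against a throwaway
# directory full of made-up archive file names.
ARCHIVE=$(mktemp -d)
touch "$ARCHIVE"/000000010000000000000001 \
      "$ARCHIVE"/000000010000000000000002 \
      "$ARCHIVE"/000000010000000000000002.00000028.backup \
      "$ARCHIVE"/000000010000000000000003 \
      "$ARCHIVE"/000000010000000000000005.00000028.backup \
      "$ARCHIVE"/000000010000000000000006

# Cutoff = the *older* of the two newest .backup markers; anything
# sorting strictly before its segment name predates the oldest base
# backup we still keep.
cutoff=$(basename "$(ls "$ARCHIVE"/*.backup | sort | tail -n 2 | head -n 1)")
cutoff_segment=${cutoff%%.*}

for f in "$ARCHIVE"/*; do
    name=$(basename "$f")
    if [[ "$name" < "$cutoff_segment" ]]; then
        rm -- "$f"    # older than anything a restore could need
    fi
done

ls "$ARCHIVE"    # segment ...0001 is gone; everything else remains
```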

--
john r pierce, recycling bits in santa cruz

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3Israel Brewster
israel@ravnalaska.net
In reply to: John R Pierce (#2)
Re: Backup "Best Practices"

-----------------------------------------------
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
-----------------------------------------------

On Nov 23, 2016, at 9:42 AM, John R Pierce <pierce@hogranch.com> wrote:

On 11/23/2016 10:16 AM, Israel Brewster wrote:

To mitigate these issues, I am thinking of supplementing the individual dumps with a full base backup and WAL archiving to a NFS share. This should enable (relatively) quick/easy recovery from backup, plus the ability to do PIT Recovery. I do have a few questions with this approach, however:

if you do the base backup periodically (daily? weekly? monthly? kinda depends on your workload, data volume, and available space on your WAL archive server)

- How do I figure out how often I should take a full base backup? I know this will depend on the amount of traffic my database is doing, and how long I am willing to wait for WAL files to be replayed - the more WAL files needing to be replayed, the longer recovery will take - but is there some rule of thumb that I can use to calculate how often I need a new base backup? Perhaps based on the number of WAL files?

frequency of doing this is entirely dependent on your data volume, how long the backups take, and your tolerance for restore times. wal archive recoveries are generally quite fast, but of course if there's millions of wal files to recover, that would take a long time. note, too, 9.6 has some major performance enhancements in how checkpoints are written, which should speed things up overall.

Backups (using pg_basebackup to a tar file, gzipped) take about 45 minutes. Part of the goal here is to minimize restore time in a disaster recovery scenario, so there is a *fairly* low tolerance for restore times (say, after getting the base backup in place, another 5 minutes or less to get it up and running). My difficulty is that I don't know if that translates into two WAL files or two thousand. But that was really my question - is there some way to know how many WAL files are too many, and by extension how often I need to do a new base backup? Obviously how quickly I get to that point of "too many" WAL files could vary widely, but if I could get some idea of where that line is, or at least how to figure it out, that would really help with planning.

From what you said, it sounds like I can go for a while, and allow quite a few WAL files to pile up (especially with 9.6, which I will be moving to shortly) between base backups without creating too large a recovery delay. So I'm thinking I may try weekly or even monthly base backups and see how that goes.
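[Editor's note: if weekly does turn out to be the cadence, one hypothetical way to schedule it is a cron entry like the one below. Host, user, and paths are invented for illustration; note that % must be escaped in crontab entries.]

```shell
# crontab fragment: weekly gzipped tar-format base backup, Sundays at 02:00
0 2 * * 0  pg_basebackup -h primary.example.com -U replicator -D /backups/base-$(date +\%Y\%m\%d) -Ft -z
```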

- What is the "best" (or just a good) method of keeping the WAL archives under control? Obviously when I do a new basebackup I can "cleanup" any old files that said backup doesn't need, but how do I know what those are?
- Should I be looking at any other backup methods in addition to/instead of the basebackup/WAL archive scheme?

I would keep at least 2 base backups and all wal files since the start of the oldest base backup.

Good thinking. I'll look into implementing that.

when you create a new base backup, delete the oldest, and all wal archive files up to the start of the following one.

How do I determine which those are? Just based on the timestamp of the WAL file, such that I could do something like take the timestamp of the last basebackup and delete all WAL files older than that? Or is there a better way?

the reason for keeping at least two is so you always have a recovery point if something catastrophic happens during the backup process.

note that having base backups plus wal archives allows PiTR too, point in time recovery. say you discover a bad script updated a ton of bad data into your database last tuesday at 3pm. you can restore the preceding base backup, then recover up to tuesday just before this event.

Exactly - that's one of the primary reasons I'm taking this approach.

Thanks again for the info!


--
john r pierce, recycling bits in santa cruz


#4John R Pierce
pierce@hogranch.com
In reply to: Israel Brewster (#3)
Re: Backup "Best Practices"

On 11/23/2016 11:20 AM, Israel Brewster wrote:

How do I determine which those are? Just based on the timestamp of the
WAL file, such that I could do something like take the timestamp of
the last basebackup and delete all WAL files older than that? Or is
there a better way?

there's a tool, barman - I've never used it, but it's supposed to
greatly simplify this whole process...

http://www.pgbarman.org/

--
john r pierce, recycling bits in santa cruz


#5Israel Brewster
israel@ravnalaska.net
In reply to: John R Pierce (#4)
Re: Backup "Best Practices"

On Nov 23, 2016, at 10:31 AM, John R Pierce <pierce@hogranch.com> wrote:

On 11/23/2016 11:20 AM, Israel Brewster wrote:

How do I determine which those are? Just based on the timestamp of the WAL file, such that I could do something like take the timestamp of the last basebackup and delete all WAL files older than that? Or is there a better way?

there's a tool, barman - I've never used it, but it's supposed to greatly simplify this whole process...

http://www.pgbarman.org/

Definitely looks like something to try. Thanks!

-----------------------------------------------
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
-----------------------------------------------


--
john r pierce, recycling bits in santa cruz


#6Joe Conway
mail@joeconway.com
In reply to: Israel Brewster (#5)
Re: Backup "Best Practices"

On 11/23/2016 02:50 PM, Israel Brewster wrote:

On Nov 23, 2016, at 10:31 AM, John R Pierce <pierce@hogranch.com

there's a tool, barman, I've never used but its supposed to greatly
simplify this whole process...

http://www.pgbarman.org/

Definitely looks like something to try. Thanks!

I'd recommend you also look at pgbackrest:
https://github.com/pgbackrest/pgbackrest

HTH,

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development

#7Andreas Kretschmer
andreas@a-kretschmer.de
In reply to: John R Pierce (#4)
Re: Backup "Best Practices"

On 23 November 2016 at 20:31:21 CET, John R Pierce <pierce@hogranch.com> wrote:

On 11/23/2016 11:20 AM, Israel Brewster wrote:

How do I determine which those are? Just based on the timestamp of the
WAL file, such that I could do something like take the timestamp of
the last basebackup and delete all WAL files older than that? Or is
there a better way?

there's a tool, barman - I've never used it, but it's supposed to
greatly simplify this whole process...

http://www.pgbarman.org/

Barman is a really great tool.

--
This message was sent from my Android mobile phone with K-9 Mail.


#8Vick Khera
vivek@khera.org
In reply to: Israel Brewster (#1)
Re: Backup "Best Practices"

On Wed, Nov 23, 2016 at 1:16 PM, Israel Brewster <israel@ravnalaska.net>
wrote:

I was wondering if someone could inform me about, or point me to an online
article about best practices for backing up a postgresql database cluster?
At the moment, I have two servers running with streaming replication for
failover purposes, and then I do nightly database dumps for recovery/backup
purposes. However, I feel this approach is lacking in a couple areas I can
think of:

The first thing you have to do is list all of the problems you want to
solve by having backups. You did list some of them here, but you should
think long and hard about it. I did not see anything about off-site backups
for example. Those are necessary if you want to protect yourself against
something like a fire destroying everything at your primary location.
Consider your business costs for losing X hours of data vs the cost of
storing that data. The backup and recovery you come up with will have to be
tailored to your needs.

For my use case I do the following:

- 1 live backup using Slony for immediate server failover.
- Nightly pg_dump snapshots, kept for 2 weeks, for customer "pilot error"
recovery.
- The nightly pg_dump files from the first of every month are kept for 2
years, also to help recover from pilot error. So far the longest I've
needed to go back for a restore has been about 6 months, but disk space
is cheap for me. These files are kept on a third server that is not one
of the two primary DB servers.

The pg_dump files are rsync'd from my data center to a filer at my main
office about 40 miles away, for my off-site disaster recovery.

If I could afford a faster and more reliable network connection at the
office, I'd set up live streaming backup offsite as well. However, the
physical location makes that not likely to happen any time soon :(

Also consider the extra burden that doing your backups puts on your
servers. Do they have enough I/O capacity to handle it, especially at peak
demand times?

#9Jeff Janes
jeff.janes@gmail.com
In reply to: Israel Brewster (#1)
Re: Backup "Best Practices"

On Wed, Nov 23, 2016 at 10:16 AM, Israel Brewster <israel@ravnalaska.net>
wrote:

I was wondering if someone could inform me about, or point me to an online
article about best practices for backing up a postgresql database cluster?
At the moment, I have two servers running with streaming replication for
failover purposes, and then I do nightly database dumps for recovery/backup
purposes. However, I feel this approach is lacking in a couple areas I can
think of:

- If a total failure occurs towards the end of the day, we could
potentially lose a whole day's worth of data.

Why wouldn't the streaming replica salvage that? Are they expected to fail
together? Is the NFS share onto which you want to store your basebackup
and WAL also expected to fail together with them?

Similar argument for user error - there is no way to "undo" a catastrophic
user data error without going all the way back to the previous day
- Less-than-ideal recovery under some scenarios. Since each database in
the cluster is dumped individually, this works well when only *one*
database needs to be restored, but could get tedious should the entire
cluster need to be restored.

To mitigate these issues, I am thinking of supplementing the individual
dumps with a full base backup and WAL archiving to a NFS share. This should
enable (relatively) quick/easy recovery from backup, plus the ability to do
PIT Recovery. I do have a few questions with this approach, however:

- How do I figure out how often I should take a full base backup? I know
this will depend on the amount of traffic my database is doing, and how
long I am willing to wait for WAL files to be replayed - the more WAL
files needing to be replayed, the longer recovery will take - but is
there some rule of thumb that I can use to calculate how often I need a
new base backup? Perhaps based on the number of WAL files?

You have to try it and see. Different types of wal records will take
different amounts of time to re-play, so there is no rule of thumb. It
would depend on the type of traffic you have in your database. And it
could be limited by a single CPU, or by IO. If the restore_command needs
to restore the WAL from a remote server, it is very likely to be limited by
the latency of doing that. In fact, this is often the bottleneck even if
it is restoring from the local server, at least if archival is often driven
by archive_timeout.

When I need to re-clone production to get a fresh server to use for dev or
testing, I do so using almost exactly the same method I would use for
restoring production from a disaster (restore from most recent basebackup,
then recovery from WAL archive). So I know how long it takes for the
recovery to happen based on true experience, and I take a new basebackup
when that length of time starts to annoy me.

- What is the "best" (or just a good) method of keeping the WAL archives
under control? Obviously when I do a new basebackup I can "cleanup" any old
files that said backup doesn't need,

You have said you might be interested in doing PITR. So you want to delay
the cleanup so as to not compromise that ability. You need to develop a
policy on how far back you want to be able to do a PITR.

but how do I know what those are?

pg_archivecleanup -n /mnt/server/archiverdir 000000010000000000000010.00000020.backup
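[Editor's note: to illustrate that command, with the same example archive path and marker file name as above: with -n, pg_archivecleanup only lists what it would remove; without -n it actually deletes, and -d adds debug output.]

```shell
# dry run: print the removable files, delete nothing
pg_archivecleanup -n /mnt/server/archiverdir 000000010000000000000010.00000020.backup

# real cleanup, with debug output
pg_archivecleanup -d /mnt/server/archiverdir 000000010000000000000010.00000020.backup
```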

- Should I be looking at any other backup methods in addition to/instead
of the basebackup/WAL archive scheme?

You may want to consider pg_receivexlog to maintain your WAL archive,
rather than archive_command. That way you don't have to worry about the
trade-offs caused by setting archive_timeout. But unless you use it with
a replication slot, it is not very safe, as pg_receivexlog could stop
working and your database would happily run along without protection.
Also, it is hard to be sure you are reliably issuing fsyncs over NFS, so
with archive_command over NFS there is always the risk your WAL data is
not actually reaching disk in a timely fashion. So if you can run
pg_receivexlog on the NFS-host machine pointed at local storage, not
looping back over NFS, that is safer.
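[Editor's note: a sketch of that arrangement, run on the archive host. The host, user, slot name, and directory are placeholders; on PostgreSQL 10 and later the tool is named pg_receivewal.]

```shell
# one-time: create a replication slot so the server retains WAL
# if the receiver falls behind
pg_receivexlog -h primary.example.com -U replicator --slot=archive_slot --create-slot

# then stream WAL continuously into local (non-NFS) storage
pg_receivexlog -h primary.example.com -U replicator --slot=archive_slot \
    --directory=/var/lib/pgarchive
```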

Cheers,

Jeff

#10Israel Brewster
israel@ravnalaska.net
In reply to: Jeff Janes (#9)
Re: Backup "Best Practices"

On Nov 25, 2016, at 1:00 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

On Wed, Nov 23, 2016 at 10:16 AM, Israel Brewster <israel@ravnalaska.net <mailto:israel@ravnalaska.net>> wrote:
I was wondering if someone could inform me about, or point me to an online article about best practices for backing up a postgresql database cluster? At the moment, I have two servers running with streaming replication for failover purposes, and then I do nightly database dumps for recovery/backup purposes. However, I feel this approach is lacking in a couple areas I can think of:

- If a total failure occurs towards the end of the day, we could potentially lose a whole day's worth of data.

Why wouldn't the streaming replica salvage that? Are they expected to fail together? Is the NFS share onto which you want to store your basebackup and WAL also expected to fail together with them?

That's why I specified *total* failure. If only the primary dies, that's a simple cutover to the secondary, and not a total failure - no need to go to backups of any sort under that scenario :-) I'm thinking more along the lines of server room burns down, natural disaster, etc - something that causes a total failure of the db system, necessitating recovery from an offsite backup.

Similar argument for user error - there is no way to "undo" a catastrophic user data error without going all the way back to the previous day
- Less-than-ideal recovery under some scenarios. Since each database in the cluster is dumped individually, this works well when only *one* database needs to be restored, but could get tedious should the entire cluster need to be restored.

To mitigate these issues, I am thinking of supplementing the individual dumps with a full base backup and WAL archiving to a NFS share. This should enable (relatively) quick/easy recovery from backup, plus the ability to do PIT Recovery. I do have a few questions with this approach, however:

- How do I figure out how often I should take a full base backup? I know this will depend on the amount of traffic my database is doing, and how long I am willing to wait for WAL files to be replayed - the more WAL files needing to be replayed, the longer recovery will take - but is there some rule of thumb that I can use to calculate how often I need a new base backup? Perhaps based on the number of WAL files?

You have to try it and see. Different types of wal records will take different amounts of time to re-play, so there is no rule of thumb. It would depend on the type of traffic you have in your database. And it could be limited by a single CPU, or by IO. If the restore_command needs to restore the WAL from a remote server, it is very likely to be limited by the latency of doing that. In fact, this is often the bottleneck even if it is restoring from the local server, at least if archival is often driven by archive_timeout.

When I need to re-clone production to get a fresh server to use for dev or testing, I do so using almost exactly the same method I would use for restoring production from a disaster (restore from most recent basebackup, then recovery from WAL archive). So I know how long it takes for the recovery to happen based on true experience, and I take a new basebackup when that length of time starts to annoy me.

Gotcha. Guess I'll have to start running some tests :-)

- What is the "best" (or just a good) method of keeping the WAL archives under control? Obviously when I do a new basebackup I can "cleanup" any old files that said backup doesn't need,

You have said you might be interested in doing PITR. So you want to delay the cleanup so as to not compromise that ability. You need to develop a policy on how far back you want to be able to do a PITR.

but how do I know what those are?

pg_archivecleanup -n /mnt/server/archiverdir 000000010000000000000010.00000020.backup

Ok, but where does that "000000010000000000000010.00000020.backup" come from? I mean, I can tell it's a WAL segment file name (plus a backup label), but I don't have anything like that in my WAL archives, even though I've run pg_basebackup a couple of times. Do I have to call something to create that file? Some flag to pg_basebackup? At the moment I am running pg_basebackup such that it generates gzipped tar files, if that makes a difference.

- Should I be looking at any other backup methods in addition to/instead of the basebackup/WAL archive scheme?

You may want to consider pg_receivexlog to maintain your WAL archive, rather than archive_command. That way you don't have to worry about the trade-offs caused by setting archive_timeout. But unless you use it with a replication slot, it is not very safe, as pg_receivexlog could stop working and your database would happily run along without protection. Also, it is hard to be sure you are reliably issuing fsyncs over NFS, so with archive_command over NFS there is always the risk your WAL data is not actually reaching disk in a timely fashion. So if you can run pg_receivexlog on the NFS-host machine pointed at local storage, not looping back over NFS, that is safer.

Thanks, I'll look into it, along with the other suggested tools. Perhaps they will solve all my confusion :)


Cheers,

Jeff

#11John R Pierce
pierce@hogranch.com
In reply to: Israel Brewster (#10)
Re: Backup "Best Practices"

On 11/28/2016 2:50 PM, Israel Brewster wrote:

Why wouldn't the streaming replica salvage that? Are they expected
to fail together? Is the NFS share onto which you want to store your
basebackup and WAL also expected to fail together with them?

That's why I specified *total* failure. If only the primary dies,
that's a simple cutover to the secondary, and not a total failure - no
need to go to backups of any sort under that scenario :-) I'm thinking
more along the lines of server room burns down, natural disaster, etc
- something that causes a total failure of the db system,
necessitating recovery from an offsite backup.

your base backups and WAL archive should be replicated offsite, then.

--
john r pierce, recycling bits in santa cruz

#12Israel Brewster
israel@ravnalaska.net
In reply to: John R Pierce (#11)
Re: Backup "Best Practices"

On Nov 28, 2016, at 2:13 PM, John R Pierce <pierce@hogranch.com> wrote:

On 11/28/2016 2:50 PM, Israel Brewster wrote:

Why wouldn't the streaming replica salvage that? Are they expected to fail together? Is the NFS share onto which you want to store your basebackup and WAL also expected to fail together with them?

That's why I specified *total* failure. If only the primary dies, that's a simple cutover to the secondary, and not a total failure - no need to go to backups of any sort under that scenario :-) I'm thinking more along the lines of server room burns down, natural disaster, etc - something that causes a total failure of the db system, necessitating recovery from an offsite backup.

your base backups and WAL archive should be replicated offsite, then.

Obviously :-)


--
john r pierce, recycling bits in santa cruz

#13Jeff Janes
jeff.janes@gmail.com
In reply to: Israel Brewster (#10)
Re: Backup "Best Practices"

On Mon, Nov 28, 2016 at 2:50 PM, Israel Brewster <israel@ravnalaska.net>
wrote:

- What is the "best" (or just a good) method of keeping the WAL archives

under control? Obviously when I do a new basebackup I can "cleanup" any old
files that said backup doesn't need,

You have said you might be interested in doing PITR. So you want to delay
the cleanup so as to not compromise that ability. You need to develop a
policy on how far back you want to be able to do a PITR.

but how do I know what those are?

pg_archivecleanup -n /mnt/server/archiverdir 000000010000000000000010.00000020.backup

Ok, but where does that "000000010000000000000010.00000020.backup" come
from? I mean, I can tell it's a WAL segment file name (plus a backup
label), but I don't have anything like that in my WAL archives, even though
I've run pg_basebackup a couple of times.

I get one file like that for every pg_basebackup I run. Could your
archive_command be doing something to specifically short-circuit the
writing of those files? Like testing the length of %p or %f?

Do I have to call something to create that file? Some flag to
pg_basebackup? At the moment I am running pg_basebackup such that it
generates gzipped tar files, if that makes a difference.

That is how I run it as well. I don't think there is a flag to
pg_basebackup which even allows you to bypass the creation of those files.
You are looking in the WAL archive itself, correct? Not somewhere in a
listing of the base.tar.gz file?

Cheers,

Jeff

#14Thomas Kellerer
spam_eater@gmx.net
In reply to: Israel Brewster (#10)
Re: Backup "Best Practices"

Israel Brewster wrote on 28.11.2016 at 23:50:

pg_archivecleanup -n /mnt/server/archiverdir 000000010000000000000010.00000020.backup

Ok, but where does that "000000010000000000000010.00000020.backup"
come from? I mean, I can tell it's a WAL segment file name (plus a
backup label), but I don't have anything like that in my WAL
archives, even though I've run pg_basebackup a couple of times. Do I
have to call something to create that file? Some flag to
pg_basebackup? At the moment I am running pg_basebackup such that it
generates gzipped tar files, if that makes a difference.

The .backup file will be inside the tar file, if I'm not mistaken.


#15Israel Brewster
israel@ravnalaska.net
In reply to: Jeff Janes (#13)
Re: Backup "Best Practices"

On Nov 28, 2016, at 10:04 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

On Mon, Nov 28, 2016 at 2:50 PM, Israel Brewster <israel@ravnalaska.net <mailto:israel@ravnalaska.net>> wrote:

- What is the "best" (or just a good) method of keeping the WAL archives under control? Obviously when I do a new basebackup I can "cleanup" any old files that said backup doesn't need,

You have said you might be interested in doing PITR. So you want to delay the cleanup so as to not compromise that ability. You need to develop a policy on how far back you want to be able to do a PITR.

but how do I know what those are?

pg_archivecleanup -n /mnt/server/archiverdir 000000010000000000000010.00000020.backup

Ok, but where does that "000000010000000000000010.00000020.backup" come from? I mean, I can tell it's a WAL segment file name (plus a backup label), but I don't have anything like that in my WAL archives, even though I've run pg_basebackup a couple of times.

I get one file like that for every pg_basebackup I run. Could your archive_command be doing something to specifically short-circuit the writing of those files? Like testing the length of %p or %f?

My archive command is simply a copy - straight out of the examples given in the documentation, actually. The only test I do is to make sure the file doesn't exist before running the copy.
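[Editor's note: for reference, the archive_command example from the PostgreSQL documentation that this presumably mirrors; the archive path is the documentation's placeholder.]

```shell
# postgresql.conf: copy the completed segment into the archive,
# skipping the copy if the file already exists there
archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'
```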

Do I have to call something to create that file? Some flag to pg_basebackup? At the moment I am running pg_basebackup such that it generates gzipped tar files, if that makes a difference.

That is how I run it as well. I don't think there is a flag to pg_basebackup which even allows you to bypass the creation of those files. You are looking in the WAL archive itself, correct? Not somewhere in a listing of the base.tar.gz file?

I am looking at the WAL archive itself. One thing that just occurred to me: in my testing, I've been running the base backup from the secondary slave server. Perhaps that makes a difference? I know the slave itself doesn't archive WAL files, but I would have expected the master to get the message a backup was being run and do any needed archiving itself.


Cheers,

Jeff

#16Israel Brewster
israel@ravnalaska.net
In reply to: Thomas Kellerer (#14)
Re: Backup "Best Practices"

On Nov 28, 2016, at 10:20 PM, Thomas Kellerer <spam_eater@gmx.net> wrote:

Israel Brewster wrote on 28.11.2016 at 23:50:

pg_archivecleanup -n /mnt/server/archiverdir 000000010000000000000010.00000020.backup

Ok, but where does that "000000010000000000000010.00000020.backup"
come from? I mean, I can tell it's a WAL segment file name (plus a
backup label), but I don't have anything like that in my WAL
archives, even though I've run pg_basebackup a couple of times. Do I
have to call something to create that file? Some flag to
pg_basebackup? At the moment I am running pg_basebackup such that it
generates gzipped tar files, if that makes a difference.

The .backup file will be inside the tar file, if I'm not mistaken.

Oh, ok - thanks. I'll check on that. Although since I am generating gzipped tar files, that's not the easiest thing in the world - I'll have to decompress the file first, then extract the .backup file. Still, worth looking into, I suppose :-)

-----------------------------------------------
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
-----------------------------------------------

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#17Israel Brewster
israel@ravnalaska.net
In reply to: Israel Brewster (#15)
Re: Backup "Best Practices"

On Nov 29, 2016, at 8:12 AM, Israel Brewster <israel@ravnalaska.net> wrote:

On Nov 28, 2016, at 10:04 PM, Jeff Janes <jeff.janes@gmail.com <mailto:jeff.janes@gmail.com>> wrote:

On Mon, Nov 28, 2016 at 2:50 PM, Israel Brewster <israel@ravnalaska.net <mailto:israel@ravnalaska.net>> wrote:

- What is the "best" (or just a good) method of keeping the WAL archives under control? Obviously when I do a new basebackup I can "cleanup" any old files that said backup doesn't need,

You have said you might be interested in doing PITR. So you want to delay the cleanup so as to not compromise that ability. You need to develop a policy on how far back you want to be able to do a PITR.

but how do I know what those are?

pg_archivecleanup -n /mnt/server/archiverdir 000000010000000000000010.00000020.backup

Ok, but where does that "000000010000000000000010.00000020.backup" come from? I mean, I can tell it's a WAL segment file name (plus a backup label), but I don't have anything like that in my WAL archives, even though I've run pg_basebackup a couple of times.

I get one file like that for every pg_basebackup I run. Could your archive_command be doing something to specifically short-circuit the writing of those files? Like testing the length of %p or %f?

My archive command is simply a copy - straight out of the examples given in the documentation, actually. The only test I do is to make sure the file doesn't exist before running the copy.

Do I have to call something to create that file? Some flag to pg_basebackup? At the moment I am running pg_basebackup such that it generates gzipped tar files, if that makes a difference.

That is how I run it as well. I don't think there is a flag to pg_basebackup which even allows you to bypass the creation of those files. You are looking in the WAL archive itself, correct? Not somewhere in a listing of the base.tar.gz file?

I am looking at the WAL archive itself. One thing that just occurred to me: in my testing, I've been running the base backup from the secondary slave server. Perhaps that makes a difference? I know the slave itself doesn't archive WAL files, but I would have expected the master to get the message a backup was being run and do any needed archiving itself.

So to test, I ran a base backup from my primary server rather than the secondary - and the .backup file WAS indeed created in the WAL archive directory. So I guess that means I have to run base backups from the primary server. Are there any performance implications to doing this that I should be aware of? Something that would imply I need to make sure to run the backup during lull periods?
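On the performance question, pg_basebackup itself has options that bound its impact on the primary: `--checkpoint=spread` avoids forcing an immediate checkpoint at backup start, and `--max-rate` (available since 9.4) caps the transfer rate so the copy doesn't saturate the primary's I/O. A sketch of a throttled run against the primary - host name, user, rate, and target directory below are placeholders, not values from this thread:

```shell
# Hypothetical invocation; adjust host, user, rate, and paths to taste.
# -Ft -z matches the gzipped-tar format discussed above;
# --checkpoint=spread requests a spread (non-immediate) checkpoint;
# --max-rate limits transfer bandwidth taken from the primary.
pg_basebackup -h primary.example.com -U replication \
    -D /backups/base-$(date +%Y%m%d) \
    -Ft -z \
    --checkpoint=spread \
    --max-rate=32M
```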

-----------------------------------------------
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
-----------------------------------------------

Cheers,

Jeff

#18Jeff Janes
jeff.janes@gmail.com
In reply to: Thomas Kellerer (#14)
Re: Backup "Best Practices"

On Mon, Nov 28, 2016 at 11:20 PM, Thomas Kellerer <spam_eater@gmx.net>
wrote:

Israel Brewster schrieb am 28.11.2016 um 23:50:

pg_archivecleanup -n /mnt/server/archiverdir
000000010000000000000010.00000020.backup

Ok, but where does that "000000010000000000000010.00000020.backup"
come from? I mean, I can tell it's a WAL segment file name (plus a
backup label), but I don't have anything like that in my WAL
archives, even though I've run pg_basebackup a couple of times. Do I
have to call something to create that file? Some flag to
pg_basebackup? At the moment I am running pg_basebackup such that it
generates gzipped tar files, if that makes a difference.

The .backup file will be inside the tar file if I'm not mistaken

I don't think it will be, but there will be a backup_label file in there,
which contains much of the same contents as the .backup file does. But in
this case, the contents of the file are not important, only the name is.
pg_archivecleanup doesn't attempt to open the file-name given as the
second argument, it just looks at the name itself. So you could pull
backup_label out of the tar file, parse the contents and use them to
construct the command to give to pg_archivecleanup.
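A sketch of that parse-and-construct step, relying on two things the thread establishes: backup_label's first line has the form `START WAL LOCATION: ... (file <walfile>)`, and pg_archivecleanup only looks at the name of its second argument, so a plain WAL segment name works there too. The archive path below is the placeholder from earlier in the thread; the sed pattern is one way to do the parse, not the only one:

```shell
# Against a real backup this would be:
#   wal_start=$(tar -xzOf base.tar.gz backup_label | \
#       sed -n 's/^START WAL LOCATION: .* (file \(.*\))$/\1/p')
#   pg_archivecleanup /mnt/server/archiverdir "$wal_start"
#
# Demonstrated on a sample backup_label line:
label='START WAL LOCATION: 0/28000028 (file 000000010000000000000028)'

# Capture the WAL file name from the "(file ...)" suffix:
wal_start=$(printf '%s\n' "$label" | \
    sed -n 's/^START WAL LOCATION: .* (file \(.*\))$/\1/p')

# The command we would hand to pg_archivecleanup:
echo "pg_archivecleanup /mnt/server/archiverdir $wal_start"
```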

I think it would really be nice if pg_basebackup -D backup_dir -Ft would
create the backup_label file not only in the tarball, but also (as the
final step) write it as a loose file in the backup_dir.

Cheers,

Jeff