Seeking datacenter PITR backup suggestions

Started by Joey K. · over 18 years ago · 10 messages · general
#1 Joey K.
pguser@gmail.com

Greetings,

We have several web applications with Pg 8.2.x running on isolated servers
(~25). The database size on each machine (du -h pgdata) is ~2 GB. We have
been using a nightly filesystem backup (stop pg, tar backup to ftp, start pg)
and it has worked well.

We would like to move to PITR backups, since the database size will increase
moving forward and our current backup method would increase server
downtime.

We have a central ftp backup server (yes, ftp :-) which we would like to use
for weekly full and daily incremental PITR backups.

After reading the docs, PITR is still fuzzy to us. Our plan for backups is (do
not worry about the syntax):

** START **

tmpwal = "/localhost/tmp" # tmp space on server 1 for storing wal files
before ftp
Configure $pgdata/postgresql.conf archive_command = "cp %p $tmpwal/%f"

Day 1:
% psql pg_start_backup(); tar pgdata.tar --exclude pg_xlog/ pgdata
% psql pg_stop_backup()
% ftp put pgdata.tar ftpserver:/server1/day1/pgdata
% ftp put $tmpwal/* ftpserver:/server1/day1/wal
% rm -f $tmpwal/* pgdata.tar

Day 2:
% ftp put $tmpwal/* ftpserver:/server1/day2/wal
% rm -f $tmpwal/*

Day 3:
...
...

Day 7:
% rm -f $tmpwal/*
Start over

Recovery on server1 (skeleton commands),
% rm -f $tmpwal/*
% mv pgdata pgdata.hosed
% ftp get ftpbackup:/server1/day1/pgdata.tar .
% tar -xvf pgdata.tar
% ftp get ftpbackup:/server1/day1/wal/* $tmpwal
% ftp get ftpbackup:/server1/day2/wal/* $tmpwal
.....
.....
% cp -r pgdata.hosed/pg_xlog pgdata/
% echo "cp $tmpwal/%f %p" > pgdata/recovery.conf
% start pg (recovery begins)

** END **
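As a rough illustration of the archive step in the sketch above, the archive_command could be wrapped in a small script that verifies each copy before reporting success. All names here (the archive_wal function, the TMPWAL variable) are hypothetical, not something from this thread:

```shell
#!/bin/sh
# Hypothetical wrapper for archive_command, called by Postgres as:
#   archive_command = '/usr/local/bin/archive_wal.sh %p %f'   (path is made up)
# It copies the WAL segment into the local staging area and compares the
# copy byte-for-byte before reporting success, so Postgres never recycles
# a segment that was not safely staged.
archive_wal() {
    src=$1                      # full path of the WAL segment (%p)
    dest="$TMPWAL/$2"           # staging path built from the file name (%f)

    # Refuse to clobber an already-archived segment with different contents.
    if [ -e "$dest" ] && ! cmp -s "$src" "$dest"; then
        echo "archive_wal: $dest exists and differs" >&2
        return 1
    fi

    cp "$src" "$dest" || return 1
    # Verify the copy; a nonzero exit tells Postgres to retry later.
    cmp -s "$src" "$dest"
}
```

A nonzero exit status is the important part: Postgres keeps the segment and retries the command until it succeeds.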

Assumptions:
a. After pg_stop_backup(), Pg immediately recycles its log files, so the WAL
files can be copied to the backup server. This is a clean starting point.
b. New WAL files created since (a) are the incremental backups.

We are not sure whether WAL filenames are unique, or whether older WAL files
could be overwritten during recovery.

I'm seeking suggestions from others with experience performing PostgreSQL
PITR backups from multiple servers to a central backup server.

Thanks in advance,
Joey Krane

#2 Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Joey K. (#1)
Re: Seeking datacenter PITR backup suggestions

On Aug 17, 2007, at 5:48 PM, Joey K. wrote:

We have several web applications with Pg 8.2.x running on isolated
servers (~25). The database size on each machine (du -h pgdata) is
~2 GB. We have been using a nightly filesystem backup (stop pg, tar
backup to ftp, start pg) and it has worked well.

We would like to move to PITR backups since the database size will
increase moving forward and our current backup method might
increase server downtimes.

We have a central ftp backup server (yes, ftp :-) which we would
like to use for weekly full and daily incremental PITR backups.

After reading the docs, PITR is still fuzzy. Our ideas for backup
are (do not worry about the syntax),

** START **

tmpwal = "/localhost/tmp" # tmp space on server 1 for storing wal
files before ftp
Configure $pgdata/postgresql.conf archive_command = "cp %p $tmpwal/%f"

Why not just FTP WAL files directly?

Day 1:
% psql pg_start_backup(); tar pgdata.tar --exclude pg_xlog/ pgdata
% psql pg_stop_backup()
% ftp put pgdata.tar ftpserver:/server1/day1/pgdata
% ftp put $tmpwal/* ftpserver:/server1/day1/wal
% rm -f $tmpwal/* pgdata.tar

The last 2 are a race condition... you could easily lose a WAL file
that way.

Keep in mind that pgdata.tar is 100% useless unless you also
have the WAL files that were created during the backup. I generally
recommend to folks that they keep two base copies around for that
reason.

Day 2:
% ftp put $tmpwal/* ftpserver:/server1/day2/wal
% rm -f $tmpwal/*

Day 3:
...
...

Day 7:
% rm -f $tmpwal/*
Start over

Recovery on server1 (skeleton commands),
% rm -f $tmpwal/*
% mv pgdata pgdata.hosed
% ftp get ftpbackup:/server1/day1/pgdata.tar .
% tar -xvf pgdata.tar
% ftp get ftpbackup:/server1/day1/wal/* $tmpwal
% ftp get ftpbackup:/server1/day2/wal/* $tmpwal
.....
.....
% cp -r pgdata.hosed/pg_xlog pgdata/
% echo "cp $tmpwal/%f %p" > pgdata/recovery.conf
% start pg (recovery begins)

** END **

Assumptions:
a. After pg_stop_backup(), Pg immediately recycles log files and
hence wal logs can be copied to backup. This is a clean start.
b. New wal files since (a) are incremental backups

We are not sure if WAL log filenames are unique and possibly
overwrite older wal files during recovery.

I'm seeking suggestions from others with experience performing
PostgreSQL PITR backups from multiple servers to a central backup
server.

In general, your handling of WAL files seems fragile and error-prone.
I think it would make far more sense to just FTP them directly, and
not try to get fancy with different directories for different days.
*When* a WAL file was generated is meaningless until you compare it
to a base backup to see whether that WAL file is required for the base
backup, useful (but not required) for it, or useless for it.
--
Decibel!, aka Jim Nasby decibel@decibel.org
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)

#3 Steve Crawford
scrawford@pinpointresearch.com
In reply to: Jim Nasby (#2)
Re: Seeking datacenter PITR backup suggestions

In general, your handling of WAL files seems fragile and error-prone....

Indeed. I would recommend simply using rsync to push the files. I see
several advantages:

1. Distributed load - you aren't copying a full day of files all at once.

2. Very easy to set up - you can use it directly as your archive_command
if you wish.

3. Atomic. Rsync copies new data to a temporary location and only moves
it into place when the transfer is complete. The destination
server will never see a partial file. Depending on the FTP client/server
combo, you would likely end up with a partial file in the event of a
communication failure.

4. Much more up-to-the-minute recovery data.
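Used directly as the archive_command per item 2, the rsync approach might look roughly like this in postgresql.conf (the hostname and paths are made up for illustration):

```
# postgresql.conf on each application server (hostname and paths are
# hypothetical). rsync writes to a temporary name on the receiver and
# renames it into place only once the transfer completes, so the backup
# server never sees a partial WAL segment; a nonzero exit status makes
# Postgres retry the same segment later.
archive_command = 'rsync -a %p backupserver:/backups/server1/wal/%f'
```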

In your scenario, what about using "cp -l" (or "ln") instead? Since a
hard link only creates a new pointer, it will be very fast and will
save a bunch of disk IO on your server, and it doesn't appear that the
tempdir is for much other than organizing purposes anyway.

I'm setting up some test machines to learn more about PITR and warm
backups, and am considering a two-stage process: using "cp -l" to add the
file to a list needing transfer, and a regular rsync to actually move the
files to the destination machine. (The destination machine will be over
a WAN link, so I'd like to avoid having PG tied up waiting for each rsync
to complete.)

Cheers,
Steve

#4 Gregory Stark
In reply to: Steve Crawford (#3)
Re: Seeking datacenter PITR backup suggestions

"Steve Crawford" <scrawford@pinpointresearch.com> writes:

4. Much more up-to-the-minute recovery data.

In your scenario, what about using "cp -l" (or "ln") instead? Since a
hard link only creates a new pointer, it will be very fast and will
save a bunch of disk IO on your server, and it doesn't appear that the
tempdir is for much other than organizing purposes anyway.

Postgres tries to reuse WAL files. Once the archive_command completes it
believes it is safe to reuse the old file without deleting it. That will do
nasty things if you've used ln as your archive command.

I'm setting up some test machines to learn more about PITR and warm
backups and am considering a two-stage process using "cp -l" to add the
file to the list needing transfer and regular rsync to actually move the
files to the destination machine. (The destination machine will be over
a WAN link so I'd like to avoid having PG tied up waiting for each rsync
to complete.)

In theory the only thing the archive command blocks is the reuse of WAL
log files. So while the command is running, if Postgres has used up all
its existing WAL files, it will have to create new ones, which does have
some performance cost. But otherwise it's unaffected.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

#5 Steve Crawford
scrawford@pinpointresearch.com
In reply to: Gregory Stark (#4)
Re: Seeking datacenter PITR backup suggestions

Gregory Stark wrote:

In your scenario, what about using "cp -l" (or "ln") instead?....

Postgres tries to reuse WAL files. Once the archive_command completes it
believes it is safe to reuse the old file without deleting it. That will do
nasty things if you've used ln as your archive command.

Um, OK. I won't try that. Thanks. That really could be nasty, as I
imagine it would cause either corruption or failures on the standby
server that might be very hard to track down.

In theory the only thing that the archive command blocks is the reuse of WAL
log files....

What happens if the PG server crashes during a WAL transfer? Does it still
know the file needs to be transferred? Or is a fast copy to a local
destination (the same machine, or a machine on the local network), from
which files are then transferred over the slow link, the safer process?

Also, in the (rare) case that PG needs to be restarted, will the restart
block while waiting for all current log transfers to complete?

Cheers,
Steve

#6 Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Steve Crawford (#3)
Re: Seeking datacenter PITR backup suggestions

On Tue, Aug 28, 2007 at 11:05:52AM -0700, Steve Crawford wrote:

In your scenario, what about using "cp -l" (or "ln") instead? Since a
hard link only creates a new pointer, it will be very fast and will
save a bunch of disk IO on your server, and it doesn't appear that the
tempdir is for much other than organizing purposes anyway.

Note that this will only work if you're creating the link on the same
filesystem, and having /tmp and your data on the same filesystem isn't
such a hot idea.
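A quick pre-flight check for this caveat could compare device numbers before choosing "cp -l". This is only a sketch: same_filesystem is a made-up helper, and the -c flag assumes GNU stat as found on Linux:

```shell
#!/bin/sh
# Succeeds only when both paths live on the same filesystem, i.e. when
# a hard link between them could work at all. Uses GNU stat's
# device-number field (%d); BSD stat spells this differently.
same_filesystem() {
    [ "$(stat -c '%d' "$1")" = "$(stat -c '%d' "$2")" ]
}

# Example use: fall back to a plain copy when linking cannot work.
# same_filesystem "$PGDATA/pg_xlog" "$tmpwal" || echo "use cp, not ln"
```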
--
Decibel!, aka Jim Nasby decibel@decibel.org
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)

#7 Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Gregory Stark (#4)
Re: Seeking datacenter PITR backup suggestions

On Tue, Aug 28, 2007 at 08:31:10PM +0100, Gregory Stark wrote:

"Steve Crawford" <scrawford@pinpointresearch.com> writes:

4. Much more up-to-the-minute recovery data.

In your scenario, what about using "cp -l" (or "ln") instead? Since a
hard link only creates a new pointer, it will be very fast and will
save a bunch of disk IO on your server, and it doesn't appear that the
tempdir is for much other than organizing purposes anyway.

Postgres tries to reuse WAL files. Once the archive_command completes it
believes it is safe to reuse the old file without deleting it. That will do
nasty things if you've used ln as your archive command.

I thought that was specifically disabled when PITR was enabled? Or do we
just do a rename rather than an unlink when creating a new file?
--
Decibel!, aka Jim Nasby decibel@decibel.org
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)

#8 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jim Nasby (#7)
Re: Seeking datacenter PITR backup suggestions

Decibel! <decibel@decibel.org> writes:

On Tue, Aug 28, 2007 at 08:31:10PM +0100, Gregory Stark wrote:

Postgres tries to reuse WAL files. Once the archive_command completes it
believes it is safe to reuse the old file without deleting it. That will do
nasty things if you've used ln as your archive command.

I thought that was specifically disabled when PITR was enabled? Or do we
just do a rename rather than an unlink when creating a new file?

No. The only difference is we don't recycle the file until the
archive_command says it's done with it.

The archive_command must actually physically copy the data someplace
else, and must not return success until it's sure the copy is good.
Perhaps the docs are not sufficiently clear on the point?
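That requirement, physically copy the data and return success only once the copy is known good, could be sketched as a script like the following. All names here are illustrative, not a blessed implementation:

```shell
#!/bin/sh
# Hypothetical archive_command target, called as: safe_archive %p %f
# Copies under a temporary name, verifies the bytes, and only then
# renames the file into place, so a crash mid-copy can never leave a
# truncated file under the final name.
ARCHIVE_DIR=${ARCHIVE_DIR:-/var/lib/pgarchive}   # made-up default path

safe_archive() {
    src=$1
    dest="$ARCHIVE_DIR/$2"

    # Stage under a temporary name first.
    cp "$src" "$dest.tmp" || return 1

    # Compare byte-for-byte before declaring success.
    cmp -s "$src" "$dest.tmp" || { rm -f "$dest.tmp"; return 1; }

    # Atomic rename: the file appears under its final name fully written.
    mv "$dest.tmp" "$dest"
}
```

Any nonzero exit tells Postgres the segment is not yet archived, so it keeps the WAL file and retries rather than recycling it.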

regards, tom lane

#9 Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Tom Lane (#8)
Re: Seeking datacenter PITR backup suggestions

On Tue, Aug 28, 2007 at 09:54:23PM -0400, Tom Lane wrote:

Decibel! <decibel@decibel.org> writes:

On Tue, Aug 28, 2007 at 08:31:10PM +0100, Gregory Stark wrote:

Postgres tries to reuse WAL files. Once the archive_command completes it
believes it is safe to reuse the old file without deleting it. That will do
nasty things if you've used ln as your archive command.

I thought that was specifically disabled when PITR was enabled? Or do we
just do a rename rather than an unlink when creating a new file?

No. The only difference is we don't recycle the file until the
archive_command says it's done with it.

The archive_command must actually physically copy the data someplace
else, and must not return success until it's sure the copy is good.
Perhaps the docs are not sufficiently clear on the point?

Yeah... I think that's a big gotcha waiting to smack someone. I'd
actually make the mention <strong> so that hopefully no one can miss
it... or do we have an official method for putting warnings in the docs?

"Because WAL segment files are renamed and not re-created from scratch,
it is critical that the archive command actually copy files, not move
or hard-link them."
--
Decibel!, aka Jim Nasby decibel@decibel.org
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)

#10 Simon Riggs
simon@2ndQuadrant.com
In reply to: Jim Nasby (#9)
Re: Seeking datacenter PITR backup suggestions

On Tue, 2007-08-28 at 21:04 -0500, Decibel! wrote:

On Tue, Aug 28, 2007 at 09:54:23PM -0400, Tom Lane wrote:

Perhaps the docs are not sufficiently clear on the point?

Yeah... I think that's a big gotcha waiting to smack someone. I'd
actually make the mention <strong> so that hopefully no one can miss
it... or do we have an official method for putting warnings in the docs?

"Because WAL segment files are renamed and not re-created from scratch,
it is critical that the archive command actually copy files, not move
or hard-link them."

I'll shortly be writing a doc patch to clarify a few points and to
explain new possibilities, such as Koichi Suzuki's work.

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com