Are file system level differential/incremental backups possible?

Started by Bob Hatfieldover 14 years ago12 messagesgeneral
Jump to latest
#1Bob Hatfield
bobhatfield@gmail.com

Is it possible to do a full file system level backup of the data
directory, say once a week, and differentials or incrementals daily?

I'm wondering if there are files that would normally be removed that a
restore: Full then diff/inc would not remove and perhaps
corrupt/confuse things.

Process:
Saturday: Full backup (reset archive bits) of data dir with database shutdown
Sunday: Differential (don't reset archive bits) of data dir with
database shutdown
Monday: Differential (don't reset archive bits) of data dir with
database shutdown
Wednesday: Restore to test server using Saturday's Full and Monday's
Differential.

Obviously this works for regular files/file systems; however, I'm not
sure this is a good method with postgresql as the resulting data dir
*may* (?) contain extra files (or other issues)?

Note: our database is 850GB (Windows 2008 R2 pg version 8.3.12)

#2Julien Rouhaud
rjuju123@gmail.com
In reply to: Bob Hatfield (#1)
Re: Are file system level differential/incremental backups possible?

As there's one file for each object, a single update on each would make you
to copy the all the file again. I heard there was tool to make differentiel
copy of a part of a file but I don't know if it's really efficient.

Anyway, a better way for you would be to do a regular backup (with
pg_start_backup, copy and pg_stop_backup) and then use wal archive_command
to keep the xlogs between 2 full backups.

On Wed, Oct 12, 2011 at 11:30 PM, Bob Hatfield <bobhatfield@gmail.com>wrote:

Show quoted text

Is it possible to do a full file system level backup of the data
directory, say once a week, and differentials or incrementals daily?

I'm wondering if there are files that would normally be removed that a
restore: Full then diff/inc would not remove and perhaps
corrupt/confuse things.

Process:
Saturday: Full backup (reset archive bits) of data dir with database
shutdown
Sunday: Differential (don't reset archive bits) of data dir with
database shutdown
Monday: Differential (don't reset archive bits) of data dir with
database shutdown
Wednesday: Restore to test server using Saturday's Full and Monday's
Differential.

Obviously this works for regular files/file systems; however, I'm not
sure this is a good method with postgresql as the resulting data dir
*may* (?) contain extra files (or other issues)?

Note: our database is 850GB (Windows 2008 R2 pg version 8.3.12)

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3Bob Hatfield
bobhatfield@gmail.com
In reply to: Julien Rouhaud (#2)
Re: Are file system level differential/incremental backups possible?

Anyway, a better way for you would be to do a regular backup (with pg_start_backup, copy and pg_stop_backup) and then use wal archive_command to keep the xlogs between 2 full backups.

Thanks Julien. Can pg_start/stop_backup() be used for regular full
file system backups? All of the documentation I've read only refers
to using those for warm standby/wal shipping methods.

#4Julien Rouhaud
rjuju123@gmail.com
In reply to: Bob Hatfield (#3)
Re: Are file system level differential/incremental backups possible?

On Thu, Oct 13, 2011 at 12:04 AM, Bob Hatfield <bobhatfield@gmail.com>wrote:

Anyway, a better way for you would be to do a regular backup (with

pg_start_backup, copy and pg_stop_backup) and then use wal archive_command
to keep the xlogs between 2 full backups.

Thanks Julien. Can pg_start/stop_backup() be used for regular full
file system backups? All of the documentation I've read only refers
to using those for warm standby/wal shipping methods.

Yes, and it's the only way to do a file backup without stopping the server.
Careful, the command can last a while as it forces a checkpoint (see the doc
for more details).

It's used for warm standby to create a copy of the server, before the wals
that'll be generated can continue to restore it.

#5Alan Hodgson
ahodgson@simkin.ca
In reply to: Bob Hatfield (#3)
Re: Are file system level differential/incremental backups possible?

On October 12, 2011 03:04:30 PM Bob Hatfield wrote:

Anyway, a better way for you would be to do a regular backup (with
pg_start_backup, copy and pg_stop_backup) and then use wal
archive_command to keep the xlogs between 2 full backups.

Thanks Julien. Can pg_start/stop_backup() be used for regular full
file system backups? All of the documentation I've read only refers
to using those for warm standby/wal shipping methods.

The base backup necessary to initialize a warm standby server is a full file
system backup of the database, which can also be used for restores to any
point in time after the base backup is completed, assuming you also have all
the archived WAL files.

#6Bob Hatfield
bobhatfield@gmail.com
In reply to: Alan Hodgson (#5)
Re: Are file system level differential/incremental backups possible?

The base backup necessary to initialize a warm standby server is a full file system backup of the database, which can also be used for restores to any point in time after the base backup is completed, assuming you also have all the archived WAL files.

Thanks to both of you. I currently use the base backup technique for
use with a warm standby server but was not aware I could use that for
full file system level backups as well.

In fact, I currently run a warm standby server (created from a base
backup every several weeks) as well as stop/start the pg process to do
a full file system level copy each night. If I think about this a
bit, I'm sure there's a more efficient way of doing this. (As well as
may solve another problem I'm having (another post) with the standby
server's database getting corrupt after stopping/starting the
primary's pg process.)

#7Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Bob Hatfield (#1)
Re: Are file system level differential/incremental backups possible?

Bob Hatfield wrote:

Is it possible to do a full file system level backup of the data
directory, say once a week, and differentials or incrementals daily?

I'm wondering if there are files that would normally be removed that a
restore: Full then diff/inc would not remove and perhaps
corrupt/confuse things.

If you drop or truncate a table between the full and the incremental
backup, will that file be "resurrected"?

Such resurrected files will not disturb PostgreSQL, but if you keep
them around, you might end up with a lot of dead files if you have to
restore a couple of times.

Yours,
Laurenz Albe

#8Bob Hatfield
bobhatfield@gmail.com
In reply to: Laurenz Albe (#7)
Re: Are file system level differential/incremental backups possible?

If you drop or truncate a table between the full and the incremental backup, will that file be "resurrected"?

Such resurrected files will not disturb PostgreSQL, but if you keep them around, you might end up with a lot of dead files if you have to restore a couple of times.

That makes sense, thank you. At one point, I was doing delta copies
of the database: full one night, then only new/changed files the
following nights -- I am pretty sure that didn't work and thought it
was because the destination had extra files that pg had removed on the
primary. I now use a robocopy /mir which removes files on the
destination that aren't on the primary. This seems to work fine --
but it made me think that PostreSQL doesn't work with the former
process.

#9Craig Ringer
craig@2ndquadrant.com
In reply to: Bob Hatfield (#1)
Re: Are file system level differential/incremental backups possible?

On 10/13/2011 05:30 AM, Bob Hatfield wrote:

Is it possible to do a full file system level backup of the data
directory, say once a week, and differentials or incrementals daily?

I'd love to be able to do this, but you can't do it usefully at a
file-system level. There's too much churn in the data files for even a
binary diff to be much use - and even if it were, the performance of it
would be miserable.

You *could* do a differential so long as it's a proper one that tracks
file removals as well as additions/changes. It'd be pretty pointless
though as you wouldn't save much if any storage.

I've looked into database-level diffs, but from what I can find out it
seems that PostgreSQL's MVCC system doesn't store enough information to
produce a differential dump, either. There's no way to detect and record
tuples that were deleted then vacuumed away since the last backup was
taken, so a trigger-based or WAL-based system is necessary.

I'd love a way to "collapse" or merge a set of WAL segments into a
minimal diff that only contained just enough information to get from the
start to end state of the series of WAL segments, rather than all the
churn in-between. This would be great for storing longer PITR histories
(but more coarsely) and being able to do faster restores. Unfortunately
given how scattered writes are I doubt it'd actually be possible or any
faster if it was.

SQL-level differentials would be great, though.

--
Craig Ringer

#10Bob Hatfield
bobhatfield@gmail.com
In reply to: Craig Ringer (#9)
Re: Are file system level differential/incremental backups possible?

Is it possible to do a full file system level backup of the data
directory, say once a week, and differentials or incrementals daily?

I'd love to be able to do this, but you can't do it usefully at a
file-system level. There's too much churn in the data files for even a
binary diff to be much use - and even if it were, the performance of it
would be miserable.

Actually, for us, a full backup is currently about 950GB which takes
about 24 hours and a diff several days later is only 150GB and takes
two hours. (It takes significantly less time not only becuase it's
less data but also because the diff job doesn't have to compete with
other backup jobs (since it's faster due to less data).)

I currently stop pg, robocopy (rsync) the changes to our DRS server,
restart pg -- then backup the data on the DRS server later. This
allows us to have our database back up within 30 minutes.

I've tested the database after doing a restore: full + diff and it
seems fine (both viewing with the application as well as running a
reindex on the entire db). I think we'll stick with this process with
the understanding that if a disaster struck, and we needed to do a
full + diff restore, that we may have a few files that are there that
that aren't needed.

#11Craig Ringer
craig@2ndquadrant.com
In reply to: Bob Hatfield (#10)
Re: Are file system level differential/incremental backups possible?

On 10/20/2011 02:00 AM, Bob Hatfield wrote:

Is it possible to do a full file system level backup of the data
directory, say once a week, and differentials or incrementals daily?

I'd love to be able to do this, but you can't do it usefully at a
file-system level. There's too much churn in the data files for even a
binary diff to be much use - and even if it were, the performance of it
would be miserable.

Actually, for us, a full backup is currently about 950GB which takes
about 24 hours and a diff several days later is only 150GB and takes
two hours. (It takes significantly less time not only becuase it's
less data but also because the diff job doesn't have to compete with
other backup jobs (since it's faster due to less data).)

Wow. I'm very surprised by that, but it's good to hear. I would've
expected a much poorer result, and have never seen anything like that
good a result when I've tested with my (much smaller) data sets.

Perhaps a binary diff based backup mechanism that's aware of Pg's
structure and can skip holes, etc is worth looking into after all.

--
Craig Ringer

#12Greg Sabino Mullane
greg@turnstep.com
In reply to: Craig Ringer (#9)
Re: Are file system level differential/incremental backups possible?

-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160

Is it possible to do a full file system level backup of the data
directory, say once a week, and differentials or incrementals daily?

...

taken, so a trigger-based or WAL-based system is necessary.

...

SQL-level differentials would be great, though.

FWIW, Bucardo 5 has a 'flatfile' mode that will output the deletion
and insert statements to get a table from an old state to a new one.
Not quite what is being asked for here, but could be useful for
similar things as well as for auditing.

- --
Greg Sabino Mullane greg@turnstep.com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201110221339
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-----BEGIN PGP SIGNATURE-----

iEYEAREDAAYFAk6i/9oACgkQvJuQZxSWSsjbkgCfRIIstVLsw6OeMUJZ2iL2if4Z
4hQAoIpOZXK/OseM45FAUmi8vcESpRif
=ed0H
-----END PGP SIGNATURE-----