Differential Backups

Started by Ian Harding | over 24 years ago | 13 messages | general

ianh@tpchd.org

over 24 years ago

I have been thinking about backups. I currently do one a day. However, I thought it might be nice to get differential backups through the day. I should be able to generate dumps throughout the day, generate a diff from my baseline dump, and just keep the diff, right? Then to do a restore I would just patch for the point in time I wanted to restore to? Seems like it would work, but whether it would save any hard drive space would depend on how much activity the database saw. Anyone doing this now?

Ian A. Harding
Programmer/Analyst II
Tacoma-Pierce County Health Department
(253) 798-3549
mailto: ianh@tpchd.org
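
A minimal sketch of the workflow proposed above, using the standard diff and patch tools. The file names and the two tiny stand-in dumps are assumptions made up for illustration; in practice both files would be pg_dump output.

```shell
#!/bin/sh
# Hypothetical stand-ins for the baseline dump and a dump taken later in the day.
work=$(mktemp -d)
printf 'INSERT INTO t VALUES (1);\nINSERT INTO t VALUES (2);\n' > "$work/baseline.sql"
printf 'INSERT INTO t VALUES (1);\nINSERT INTO t VALUES (2);\nINSERT INTO t VALUES (3);\n' > "$work/dump_1400.sql"

# Keep only the difference against the baseline.
# diff exits with status 1 when the files differ, which is the expected case.
diff "$work/baseline.sql" "$work/dump_1400.sql" > "$work/dump_1400.diff" || [ $? -eq 1 ]

# At this point the full 14:00 dump could be deleted; it is kept here
# only so the restored copy can be compared against it.

# Restore: rebuild the 14:00 dump from the baseline plus the stored diff.
patch -o "$work/restored_1400.sql" "$work/baseline.sql" "$work/dump_1400.diff" > /dev/null
cat "$work/restored_1400.sql"
```

Whether this saves space depends, as noted, on how much of the dump changes between runs: the diff grows with the number of changed lines.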

Doug McNaught

doug@wireboard.com

over 24 years ago

In reply to: Ian Harding (#1)

Re: Differential Backups

"Ian Harding" <ianh@tpchd.org> writes:

> I have been thinking about backups. I currently do one a day.
> However, I thought it might be nice to get differential backups
> through the day. I should be able to generate dumps throughout the
> day, generate a diff from my baseline dump, and just keep the diff,
> right? Then to do a restore I would just patch for the point in
> time I wanted to restore to? Seems like it would work, but whether
> it would save any hard drive space would depend on how much activity
> the database saw. Anyone doing this now?

Interesting idea. The one thing I might worry about is that 'diff'
might (I'm not familiar with its algorithm) eat a great deal of memory
if the dumps you're comparing are very large and significantly
different.

I'd say give it a try and see how you like it.

-Doug
--
Let us cross over the river, and rest under the shade of the trees.
--T. J. Jackson, 1863

Timothy H. Keitt

tklistaddr@keittlab.bio.sunysb.edu

over 24 years ago

In reply to: Ian Harding (#1)

Re: Differential Backups

Tried it. GNU diff chokes on very large files. It would be so nice if
incremental dumps were native to pgsql.

Tim

--
Timothy H. Keitt
Department of Ecology and Evolution
State University of New York at Stony Brook
Stony Brook, New York 11794 USA
Phone: 631-632-1101, FAX: 631-632-7626
http://life.bio.sunysb.edu/ee/keitt/

Alvaro Herrera

alvherre@atentus.com

over 24 years ago

In reply to: Doug McNaught (#2)

Re: Differential Backups

On 29 Oct 2001, Doug McNaught wrote:

> "Ian Harding" <ianh@tpchd.org> writes:
>
> > I have been thinking about backups. I currently do one a day.
> > However, I thought it might be nice to get differential backups
> > through the day.
>
> Interesting idea. The one thing I might worry about is that 'diff'
> might (I'm not familiar with its algorithm) eat a great deal of memory
> if the dumps you're comparing are very large and significantly
> different.

GNU diff reads both files into memory. You need a lot of it to compare
even medium-sized databases, and I don't think this method will work on
big ones.

I think this has to be implemented inside the database; maybe there's a
way of extracting the data from WAL logs (committed transactions?). Then
you need to go to the tables and see what each transaction did...

Another way to do it could be to store a timestamp on each tuple, and
check that for the diff backup. Sounds like you're going to enlarge your
data a lot by just having the timestamps...

--
Alvaro Herrera (<alvherre[@]atentus.com>)
"Coge la flor que hoy nace alegre, ufana. ¿Quién sabe si nacerá otra mañana?"

Tod McQuillin

devin@spamcop.net

over 24 years ago

In reply to: Ian Harding (#1)

Re: Differential Backups

On Mon, 29 Oct 2001, Ian Harding wrote:

> I have been thinking about backups. I currently do one a day.
> However, I thought it might be nice to get differential backups
> through the day. I should be able to generate dumps throughout the
> day, generate a diff from my baseline dump, and just keep the diff,
> right? Then to do a restore I would just patch for the point in time
> I wanted to restore to?

This is exactly what rcs (http://www.cs.purdue.edu/homes/trinkle/RCS/) and
cvs (http://www.cvshome.org/) do.

If you check each new pg_dump output into an rcs file, rcs saves only the
diffs from the prior revision.

I'm not sure if this would meet your needs or not, but it's worth a look.
--
Tod McQuillin
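
A rough sketch of what checking successive dumps into RCS could look like. It assumes the RCS tools (ci, co) are installed and skips itself otherwise; the file contents are hypothetical stand-ins for pg_dump output.

```shell
#!/bin/sh
# Sketch only: needs the RCS tools; dump contents are made up for illustration.
work=$(mktemp -d)
if command -v ci >/dev/null 2>&1 && command -v co >/dev/null 2>&1; then
    (
        cd "$work" || exit 1
        # Day 1: the first dump becomes revision 1.1.
        # -l keeps a locked working copy so the file can be overwritten tomorrow.
        printf 'INSERT INTO t VALUES (1);\n' > dump.sql
        ci -q -l -t-'nightly pg_dump output' -m'day 1' dump.sql
        # Day 2: overwrite with the new dump and check in as revision 1.2.
        printf 'INSERT INTO t VALUES (1);\nINSERT INTO t VALUES (2);\n' > dump.sql
        ci -q -l -m'day 2' dump.sql
        # Restore day 1 by printing revision 1.1 into a separate file.
        co -q -p -r1.1 dump.sql > day1.sql
    )
else
    echo "RCS not installed; skipping" >&2
fi
```

All revisions end up in a single file, dump.sql,v. On the trunk, RCS keeps the newest revision in full and stores older ones as reverse deltas, so the latest dump is the cheapest to check out.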

Paul Tomblin

ptomblin@xcski.com

over 24 years ago

In reply to: Alvaro Herrera (#4)

Re: Differential Backups

Quoting Alvaro Herrera (alvherre@atentus.com):

> > Interesting idea. The one thing I might worry about is that 'diff'
> > might (I'm not familiar with its algorithm) eat a great deal of memory
> > if the dumps you're comparing are very large and significantly
> > different.
>
> GNU diff reads in memory both files. You sure need lots to compare
> medium sized databases, and I don't think this method will work on big
> ones.

Doesn't GNU diff have the "-h" option?

--
Paul Tomblin <ptomblin@xcski.com>, not speaking for anybody
Never underestimate the bandwidth of a station wagon full of
tapes hurtling down the highway.
-- Andrew Tanenbaum

Chris Dircks

chrisd@rasstar.ca

over 24 years ago

In reply to: Paul Tomblin (#6)

Re: Differential Backups

quote from gnu diff man page:

-h This option currently has no effect; it is present for Unix
compatibility.

Alvaro Herrera

alvherre@atentus.com

over 24 years ago

In reply to: Paul Tomblin (#6)

Re: Differential Backups

On Mon, 29 Oct 2001, Paul Tomblin wrote:

> Quoting Alvaro Herrera (alvherre@atentus.com):
>
> > > Interesting idea. The one thing I might worry about is that 'diff'
> > > might (I'm not familiar with its algorithm) eat a great deal of memory
> > > if the dumps you're comparing are very large and significantly
> > > different.
> >
> > GNU diff reads in memory both files. You sure need lots to compare
> > medium sized databases, and I don't think this method will work on big
> > ones.
>
> Doesn't GNU diff have the "-h" option?

No, at least in my version of it (2.7, which appears to be the latest in
my local mirror of GNU). What's that supposed to do? In fact, the help
text says

-h This option currently has no effect; it is present
for Unix compatibility.

--
Alvaro Herrera (<alvherre[@]atentus.com>)
"Hay quien adquiere la mala costumbre de ser infeliz" (M. A. Evans)

Paul Tomblin

ptomblin@xcski.com

over 24 years ago

In reply to: Alvaro Herrera (#8)

Re: Differential Backups

Quoting Alvaro Herrera (alvherre@atentus.com):

> On Mon, 29 Oct 2001, Paul Tomblin wrote:
>
> > Quoting Alvaro Herrera (alvherre@atentus.com):
> >
> > > > Interesting idea. The one thing I might worry about is that 'diff'
> > > > might (I'm not familiar with its algorithm) eat a great deal of memory
> > > > if the dumps you're comparing are very large and significantly
> > > > different.
> > >
> > > GNU diff reads in memory both files. You sure need lots to compare
> > > medium sized databases, and I don't think this method will work on big
> > > ones.
> >
> > Doesn't GNU diff have the "-h" option?
>
> No, at least in my version of it (2.7, which appears to be the latest in
> my local mirror of GNU). What's that supposed to do? In fact, the help
> text says
>
> -h This option currently has no effect; it is present
> for Unix compatibility.

The option I'm thinking of might be "-H". The old man pages used to say
it stood for "half hearted", optimized for large files with few
differences.

--
Paul Tomblin <ptomblin@xcski.com>, not speaking for anybody
God does not play dice with the Universe. -- Albert Einstein.

#10

Nicholas Piper

nick@nickpiper.co.uk

over 24 years ago

In reply to: Alvaro Herrera (#8)

Re: Differential Backups

On Tue, 30 Oct 2001, Alvaro Herrera wrote:

> On Mon, 29 Oct 2001, Paul Tomblin wrote:
>
> > Quoting Alvaro Herrera (alvherre@atentus.com):
> >
> > > GNU diff reads in memory both files. You sure need lots to compare
> > > medium sized databases, and I don't think this method will work on big
> > > ones.
> >
> > Doesn't GNU diff have the "-h" option?
>
> No, at least in my version of it (2.7, which appears to be the latest in
> my local mirror of GNU). What's that supposed to do? In fact, the help

Maybe the -H option was meant:

-H Use heuristics to speed handling of large files
that have numerous scattered small changes.

In 2.7 also.

--
Part 3 MEng Cybernetics; Reading, UK http://www.nickpiper.co.uk/
Change PGP actions of mailer or fetch key see website 1024D/3ED8B27F
Choose life. Be Vegan :-) Please reduce needless cruelty + suffering !

#11

hubert depesz lubaczewski

depesz@depesz.pl

over 24 years ago

In reply to: Ian Harding (#1)

Re: Differential Backups

On Mon, 29 Oct 2001 12:22:44 -0800
"Ian Harding" <ianh@tpchd.org> wrote:

> I have been thinking about backups. I currently do one a day. However, I
> thought it might be nice to get differential backups through the day. I
> should be able to generate dumps throughout the day, generate a diff from my
> baseline dump, and just keep the diff, right? Then to do a restore I would
> just patch for the point in time I wanted to restore to? Seems like it would
> work, but whether it would save any hard drive space would depend on how much
> activity the database saw. Anyone doing this now?

idea is good, but don't use the suggested diff program. go for xdelta. its
algorithm is much better - faster and definitely less memory-eating.

depesz

--
hubert depesz lubaczewski http://www.depesz.pl/
------------------------------------------------------------------------
... vows are spoken to be broken ... [enjoy the silence]
... words are meaningless and forgettable ... [depeche mode]

#12

Jeff Lu

jklcom@mindspring.com

over 24 years ago

In reply to: hubert depesz lubaczewski (#11)

Re: Differential Backups

Can you show me an example on doing a backup using xdelta?

Thanks
-Jeff

#13

hubert depesz lubaczewski

depesz@depesz.pl

over 24 years ago

In reply to: Jeff Lu (#12)

Re: Differential Backups

On Tue, 30 Oct 2001 11:05:48 -0800
"Jeff Lu" <jklcom@mindspring.com> wrote:

> Can you show me an example on doing a backup using xdelta?

sure.
what i will show assumes that you usually want the newest backup to be
available fastest. older backups can take some time to regenerate.

first make your standard pg_dump to some file. let's call it dump.sql
$ pg_dump -f dump.sql .........
o.k.
now the next day (and every following day too) you do:
$ pg_dump -f new.dump ........
$ xdelta delta new.dump dump.sql patch_file_name
$ mv -f new.dump dump.sql

now dump.sql always holds the newest dump, while the patch file contains
the information needed to get the older dump from the newer one.
how to patch?

$ xdelta patch patch_file_name dump.sql old.dump.sql

all you have to do is store these patch files forever, or just occasionally
(once a month) make a full backup instead of a differential one.

depesz
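
A sketch of the same rotation run over several days, to show how an older dump is reached by walking the patch files backwards from the newest dump. As an assumption for portability, diff and patch stand in for xdelta here; the rotate helper, the file names and the dump contents are all hypothetical.

```shell
#!/bin/sh
# Sketch: diff/patch stand in for xdelta; names and contents are made up.
work=$(mktemp -d)

# rotate NEW_DUMP DAY: store a reverse patch (new dump -> previous dump),
# then promote NEW_DUMP to be the current full dump.
rotate() {
    # diff exits 1 when the files differ, which is the normal case here.
    diff "$1" "$work/dump.sql" > "$work/back_to_day$2.diff" || [ $? -eq 1 ]
    mv -f "$1" "$work/dump.sql"
}

printf 'row 1\n' > "$work/dump.sql"                 # day 1: full dump
printf 'row 1\nrow 2\n' > "$work/new.dump"          # day 2 dump arrives
rotate "$work/new.dump" 1                           # patch leads back to day 1
printf 'row 1\nrow 2\nrow 3\n' > "$work/new.dump"   # day 3 dump arrives
rotate "$work/new.dump" 2                           # patch leads back to day 2

# dump.sql now holds day 3. Restoring day 1 means applying each
# reverse patch in turn, newest first.
patch -o "$work/day2.sql" "$work/dump.sql" "$work/back_to_day2.diff" > /dev/null
patch -o "$work/day1.sql" "$work/day2.sql" "$work/back_to_day1.diff" > /dev/null
cat "$work/day1.sql"
```

Each patch file needs its own name, otherwise the next day's run overwrites it. The further back the restore point, the more patches have to be applied, which is one reason to take an occasional full backup.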