Incremental backup with RSYNC or something?

Started by Phoenix Kiula · over 14 years ago · 19 messages · general
#1Phoenix Kiula
phoenix.kiula@gmail.com

Hi.

I currently have a cronjob to do a full pgdump of the database every
day. And then gzip it for saving to my backup drive.

However, my db is now 60GB in size, so this daily operation is making
less and less sense. (Some of you may think this is foolish to begin
with).

Question: what can I do to rsync only the new additions in every table
starting 00:00:01 until 23:59:59 for each day?

Searching Google leads to complex things like "incremental WAL" and
whatnot, or talk of stuff like pgcluster. I'm hoping there's a more
straightforward core solution without additional software or a PhD
degree.

Many thanks for any ideas!
PK

#2Robins Tharakan
robins.tharakan@comodo.com
In reply to: Phoenix Kiula (#1)
Re: Incremental backup with RSYNC or something?

Hi,

Well, the 'complex' stuff is only there for larger or high-traffic
DBs. Besides, at 60GB yours is a largish DB in itself, and you should
begin to try out a few other backup methods nonetheless. All the more
so since you are taking entire DB backups every day; you would save
considerably on (backup) storage.

Anyway, as for pg_dump, we have a DB 20x bigger than you mention (1.3TB)
and it takes only half a day to do a pg_dump+gzip (both). One thing that
comes to mind: how are you compressing? I hope you are doing this in one
operation (or at least piping pg_dump to gzip before writing to disk)?

--
Robins Tharakan


On 11/13/2011 05:08 PM, Phoenix Kiula wrote:

Hi.

I currently have a cronjob to do a full pgdump of the database every
day. And then gzip it for saving to my backup drive.

However, my db is now 60GB in size, so this daily operation is making
less and less sense. (Some of you may think this is foolish to begin
with).

Question: what can I do to rsync only the new additions in every table
starting 00:00:01 until 23:59:59 for each day?

Searching google leads to complex things like "incremental WAL" and
whatnot, or talks of stuff like pgcluster. I'm hoping there's a more
straightforward core solution without additional software or PHD
degrees.

Many thanks for any ideas!
PK

#3Phoenix Kiula
phoenix.kiula@gmail.com
In reply to: Robins Tharakan (#2)
Re: Incremental backup with RSYNC or something?

On Sun, Nov 13, 2011 at 8:42 PM, Robins Tharakan
<robins.tharakan@comodo.com> wrote:

Hi,

Well, the 'complex' stuff is only as there for larger or high-traffic DBs.
Besides at 60GB that is a largish DB in itself and you should begin to try
out a few other backup methods nonetheless. That is moreso, if you are
taking entire DB backups everyday, you would save a considerable lot on
(backup) storage.

Thanks. I usually keep only the last 6 days of it. And monthly backups
as of Day 1. So it's not piling up or anything.

What "other methods" do you recommend? That was in fact my question.
Do I need to install some modules?

Anyway, as for pgdump, we have a DB 20x bigger than you mention (1.3TB) and
it takes only half a day to do a pgdump+gzip (both). One thing that comes to
mind, how are you compressing? I hope you are doing this in one operation
(or at least piping pgdump to gzip before writing to disk)?

I'm gzipping with this command (this is my backup.sh)--

BKPFILE=/backup/pg/dbback-${DATA}.sql
pg_dump MYDB -U MYDB_MYDB -f ${BKPFILE}
gzip --fast ${BKPFILE}

Is this good enough? Sadly, this takes up over 97% of the CPU when it's running!
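For reference, a one-step variant of the script above that pipes pg_dump straight into gzip, so the large uncompressed SQL file is never written to (and re-read from) disk. The date variable, database name, and paths here are placeholders, not taken from the original script:

```shell
# Hypothetical one-step variant of backup.sh: dump and compress in one
# pipeline instead of writing the .sql file first and gzipping it after.
DATE=$(date +%Y%m%d)
BKPFILE="/backup/pg/dbback-${DATE}.sql.gz"
pg_dump MYDB -U MYDB_MYDB | gzip --fast > "${BKPFILE}"
```

This doesn't reduce the CPU cost of gzip itself, but it roughly halves the disk I/O of the backup step.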

#4Robins Tharakan
robins.tharakan@comodo.com
In reply to: Phoenix Kiula (#3)
Re: Incremental backup with RSYNC or something?

What "other methods" do you recommend? That was in fact my question.
Do I need to install some modules?

Well, depending on your PG version you could read up on the various
backup methods. I believe you'll be interested in section 24.3 there
("Continuous Archiving and Point-in-Time Recovery") when you ask about
WAL archiving. The good thing is, it's useful for DBs much bigger, and
especially for those that can't go down for even a minute, but yes, it
has its trade-offs. (It's not that bad actually, but it's a call you
need to take.)

http://www.postgresql.org/docs/8.4/static/backup.html

I'm gzipping with this command (this is my backup.sh)--

BKPFILE=/backup/pg/dbback-${DATA}.sql
pg_dump MYDB -U MYDB_MYDB -f ${BKPFILE}
gzip --fast ${BKPFILE}

You could combine the pg_dump / gzip into one step, thereby avoiding
extra writes to disk. The URL below should help you on that (pg_dump
dbname | gzip > file.gz):

http://www.postgresql.org/docs/8.4/static/backup-dump.html#BACKUP-DUMP-LARGE

You could also do a

pg_dump -Fc | gzip -1 -c > dumpfile.gz

at the cost of a slightly larger (but faster) backup.

--
Robins Tharakan

#5Robins Tharakan
robins.tharakan@comodo.com
In reply to: Robins Tharakan (#4)
Re: Incremental backup with RSYNC or something?

You could also do a

pg_dump -Fc | gzip -1 -c > dumpfile.gz

at the cost of a slightly larger (but faster backup).

Actually if you're going this route, you could skip even the pg_dump
compression as well...

pg_dump db | gzip -1 -c > dumpfile.gz

--
Robins Tharakan

#6Grzegorz Jaśkiewicz
gryzman@gmail.com
In reply to: Robins Tharakan (#5)
Re: Incremental backup with RSYNC or something?

pg_dump -Fc already compresses, no need to pipe through gzip
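Combining the two suggestions: if the custom format is wanted for pg_restore's selective-restore features while still compressing in a separate process, pg_dump's internal compression can be disabled explicitly. A sketch, with a placeholder database name:

```shell
# Sketch: custom-format dump with internal compression turned off
# (-Z 0), compressed externally so gzip can run as a second process
# (and potentially on a second core).
pg_dump -Fc -Z 0 MYDB | gzip -1 > dumpfile.dump.gz
```

Without `-Z 0`, piping `-Fc` output through gzip compresses already-compressed data, which mostly wastes CPU.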

#7Craig Ringer
craig@2ndquadrant.com
In reply to: Phoenix Kiula (#1)
Re: Incremental backup with RSYNC or something?

On Nov 13, 2011 7:39 PM, "Phoenix Kiula"

Question: what can I do to rsync only the new additions in every table
starting 00:00:01 until 23:59:59 for each day?

You can't really. You can rsync the whole thing and it can be faster, but
you can't really just copy the last changes as a diff.

That's because Pg writes all over the data files, it doesn't just append.
There isn't any 'last changed timestamp' on records, and even if there were
Pg would have no way to know which records to delete in the replication
target.

If you want differential backups you'll need to use a row based replication
system like slony or bucardo.

It'd be nice if Pg offered easier differential backups, but at this point
there isn't really anything.

Searching google leads to complex things like "incremental WAL" and
whatnot, or talks of stuff like pgcluster. I'm hoping there's a more
straightforward core solution without additional software or PHD
degrees.

Nothing really basic. You'll need to use PITR (WAL shipping), streaming
replication or a row level replication solution.
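As a sketch of what the PITR/WAL-shipping approach looks like in practice (all paths and settings here are illustrative, not prescriptive):

```shell
# Sketch of PITR / WAL shipping.
# postgresql.conf (see the "Continuous Archiving" docs chapter):
#   archive_mode    = on
#   archive_command = 'test ! -f /backup/wal/%f && cp %p /backup/wal/%f'
# One-time base backup; afterwards Postgres hands each 16MB WAL segment
# to archive_command as it fills, which is the "incremental" part:
psql -U postgres -c "SELECT pg_start_backup('base');"
rsync -a --exclude pg_xlog /var/lib/pgsql/data/ /backup/base/
psql -U postgres -c "SELECT pg_stop_backup();"
```

Restoring means copying the base backup back and replaying the archived WAL, so the daily shipped data is only the WAL volume, not the whole 60GB.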


Many thanks for any ideas!
PK


#8Andy Colson
andy@squeakycode.net
In reply to: Grzegorz Jaśkiewicz (#6)
Re: Incremental backup with RSYNC or something?

On 11/13/2011 07:51 AM, Gregg Jaskiewicz wrote:

pg_dump -Fc already compresses, no need to pipe through gzip

I don't think that'll use two cores if you have 'em. The pipe method will use two cores, so it should be faster (assuming you are not IO bound).

-Andy

#9Phoenix Kiula
phoenix.kiula@gmail.com
In reply to: Andy Colson (#8)
Re: Incremental backup with RSYNC or something?

On Sun, Nov 13, 2011 at 10:45 PM, Andy Colson <andy@squeakycode.net> wrote:

On 11/13/2011 07:51 AM, Gregg Jaskiewicz wrote:

pg_dump -Fc already compresses, no need to pipe through gzip

I dont think that'll use two core's if you have 'em.  The pipe method will
use two cores, so it should be faster.  (assuming you are not IO bound).

I am likely IO bound. Anyway, what's the right code for the pipe
method? I think the earlier recommendation had a problem as "-Fc"
already does compression.

Is this the right code for the FASTEST possible backup if I don't care
about the size of the dump, all I want is that it's not CPU-intensive
(with the tables I wish excluded) --

BKPFILE=/backup/pg/dbback-${DATA}.sql
pg_dump MYDB -T excludetable1 -T excludetable2 -U MYDB_MYDB | gzip --fast > ${BKPFILE}

Thanks!

#10Venkat Balaji
venkat.balaji@verse.in
In reply to: Phoenix Kiula (#1)
Re: Incremental backup with RSYNC or something?

Question: what can I do to rsync only the new additions in every table
starting 00:00:01 until 23:59:59 for each day?

A table level replication (like Slony) should help here.

Or

A trigger-based approach with dblink would be another (but a bit
complex) option.

Thanks
VB

#11Phoenix Kiula
phoenix.kiula@gmail.com
In reply to: Venkat Balaji (#10)
Re: Incremental backup with RSYNC or something?

On Mon, Nov 14, 2011 at 1:45 PM, Venkat Balaji <venkat.balaji@verse.in> wrote:

Question: what can I do to rsync only the new additions in every table
starting 00:00:01 until 23:59:59 for each day?

A table level replication (like Slony) should help here.

Slony needs more than one physical server, right?

#12Scott Marlowe
scott.marlowe@gmail.com
In reply to: Craig Ringer (#7)
Re: Incremental backup with RSYNC or something?

On Sun, Nov 13, 2011 at 7:01 AM, Craig Ringer <ringerc@ringerc.id.au> wrote:

On Nov 13, 2011 7:39 PM, "Phoenix Kiula"

Searching google leads to complex things like "incremental WAL" and
whatnot, or talks of stuff like pgcluster. I'm hoping there's a more
straightforward core solution without additional software or PHD
degrees.

Nothing really basic. You'll need to use PITR (WAL shipping), streaming
replication or a row level replication solution.

One method they could use would be to partition the data into old and
new, back up the new, then merge the partitions and start a new one,
or something like that. But that's a huge pain. I would suggest PITR
backups here as the likely best match.
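If full partitioning is too much, a cruder application-level variant of the same idea is to export only the newest rows each night. This sketch assumes the table has a timestamp column; the table and column names are hypothetical, and (as Craig noted above) it captures inserts only, not updates or deletes:

```shell
# Sketch: dump only yesterday's rows from one table to CSV.
# Assumes a hypothetical "created_at" timestamp column; GNU date.
DAY=$(date -d yesterday +%F)
psql MYDB -c "\copy (SELECT * FROM mytable WHERE created_at::date = '${DAY}') TO '/backup/pg/mytable-${DAY}.csv' CSV"
```

This only makes sense for append-mostly tables, which is exactly the limitation that pushes people toward PITR or replication instead.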

#13Venkat Balaji
venkat.balaji@verse.in
In reply to: Phoenix Kiula (#11)
Re: Incremental backup with RSYNC or something?

On Fri, Nov 18, 2011 at 6:08 AM, Phoenix Kiula <phoenix.kiula@gmail.com>wrote:

On Mon, Nov 14, 2011 at 1:45 PM, Venkat Balaji <venkat.balaji@verse.in>
wrote:

Question: what can I do to rsync only the new additions in every table
starting 00:00:01 until 23:59:59 for each day?

A table level replication (like Slony) should help here.

Slony needs more than one physical server, right?

Not necessarily; you can replicate within the same host as well. It all
depends on which tables you want to replicate, not on the number of
hosts.

Thanks
VB

#14Robert Treat
xzilla@users.sourceforge.net
In reply to: Venkat Balaji (#10)
Re: Incremental backup with RSYNC or something?

On Mon, Nov 14, 2011 at 12:45 AM, Venkat Balaji <venkat.balaji@verse.in> wrote:

Question: what can I do to rsync only the new additions in every table
starting 00:00:01 until 23:59:59 for each day?

A table level replication (like Slony) should help here.
Or
A trigger based approach with dblink would be an-other (but, a bit complex)
option.

If you don't actually care about the rows of data specifically, and
just want an incremental data diff, you might look at what options your
filesystem gives you. We often use incremental snapshots on ZFS to
give us smaller copies that can be shipped off to the backup server
and used to reconstruct the server if needed.
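A sketch of that ZFS approach (pool, dataset, and host names are all hypothetical). Note a bare snapshot is only crash-consistent; Postgres will replay WAL on restore as after a crash unless the snapshot is bracketed by pg_start_backup()/pg_stop_backup():

```shell
# Sketch: daily ZFS snapshot, shipping only the delta since yesterday.
TODAY=$(date +%Y%m%d)
YESTERDAY=$(date -d yesterday +%Y%m%d)
zfs snapshot tank/pgdata@${TODAY}
# 'zfs send -i' emits only blocks changed between the two snapshots:
zfs send -i tank/pgdata@${YESTERDAY} tank/pgdata@${TODAY} | \
    ssh backuphost zfs receive backuppool/pgdata
```

This is block-level incremental backup, so it sidesteps the "Pg writes all over the data files" problem entirely: changed blocks are what get shipped, whatever files they live in.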

Robert Treat
conjecture: xzilla.net
consulting: omniti.com

#15Merlin Moncure
mmoncure@gmail.com
In reply to: Phoenix Kiula (#1)
Re: Incremental backup with RSYNC or something?

On Sun, Nov 13, 2011 at 5:38 AM, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:

Hi.

I currently have a cronjob to do a full pgdump of the database every
day. And then gzip it for saving to my backup drive.

However, my db is now 60GB in size, so this daily operation is making
less and less sense. (Some of you may think this is foolish to begin
with).

Question: what can I do to rsync only the new additions in every table
starting 00:00:01 until 23:59:59 for each day?

Searching google leads to complex things like "incremental WAL" and
whatnot, or talks of stuff like pgcluster. I'm hoping there's a more
straightforward core solution without additional software or PHD
degrees.

My standard advice for this problem is to do a HS/SR (hot standby /
streaming replication) setup, which solves a number of problems
simultaneously. It still makes sense to take a full snapshot once in a
while though -- but you can take it from the standby.

merlin
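A rough sketch of such a HS/SR setup on 9.1 (the hostname, role name, and paths are placeholders; on 9.0 the base backup has to be taken manually with pg_start_backup() and rsync, since pg_basebackup first appeared in 9.1):

```shell
# Sketch: seed a streaming-replication standby with pg_basebackup.
# On the primary, postgresql.conf needs roughly:
#   wal_level = hot_standby
#   max_wal_senders = 3
# plus a 'replication' entry in pg_hba.conf. Then, on the standby host:
PGDATA=/var/lib/pgsql/9.1/data     # placeholder path
pg_basebackup -h primary.example.com -U repluser -D "$PGDATA" -P
# Full pg_dump snapshots can then be taken from the standby,
# keeping the dump's CPU and I/O load off the primary.
```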

#16Alex Thurlow
alex-reports@blastro.com
In reply to: Merlin Moncure (#15)
Re: Incremental backup with RSYNC or something?

On 11/22/2011 3:28 PM, Merlin Moncure wrote:

On Sun, Nov 13, 2011 at 5:38 AM, Phoenix Kiula<phoenix.kiula@gmail.com> wrote:

Hi.

I currently have a cronjob to do a full pgdump of the database every
day. And then gzip it for saving to my backup drive.

However, my db is now 60GB in size, so this daily operation is making
less and less sense. (Some of you may think this is foolish to begin
with).

Question: what can I do to rsync only the new additions in every table
starting 00:00:01 until 23:59:59 for each day?

Searching google leads to complex things like "incremental WAL" and
whatnot, or talks of stuff like pgcluster. I'm hoping there's a more
straightforward core solution without additional software or PHD
degrees.

My standard advice to this problem is to do a HS/SR setup which solves
a number of problems simultaneously. It still makes sence to take a
full snapshot once in a while though -- but you can take it from the
standby.

merlin

How long is this backup taking? I have a ~100GB database that I back up
with pg_dump (which compresses as it dumps if you want it to) and that
only takes 35 minutes. Granted, I have it on some fast SCSI drives in
RAID 1, but even a single SATA drive should still finish in a decent
amount of time.

#17Phoenix Kiula
phoenix.kiula@gmail.com
In reply to: Alex Thurlow (#16)
Re: Incremental backup with RSYNC or something?

On Wed, Nov 23, 2011 at 6:13 AM, Alex Thurlow <alex-reports@blastro.com> wrote:

On 11/22/2011 3:28 PM, Merlin Moncure wrote:

......

How long is this backup taking?  I have a ~100GB database that I back up
with pg_dump (which compresses as it dumps if you want it to) and that only
takes 35 minutes.  Granted, I have it on some fast SCSI drives in RAID 1,
but even a single SATA drive should still finish in a decent amount of time.

Hi Alex, could you share what exact command you use? Mine are SCSI
too, in RAID 10, but the dump takes over 2-3 hours (60 GB database)
and the CPU consumption during this time is huge.

Thanks!

#18Benjamin Henrion
bh@udev.org
In reply to: Phoenix Kiula (#17)
Re: Incremental backup with RSYNC or something?

On Thu, Nov 24, 2011 at 4:49 PM, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:

On Wed, Nov 23, 2011 at 6:13 AM, Alex Thurlow <alex-reports@blastro.com> wrote:

On 11/22/2011 3:28 PM, Merlin Moncure wrote:

......

How long is this backup taking?  I have a ~100GB database that I back up
with pg_dump (which compresses as it dumps if you want it to) and that only
takes 35 minutes.  Granted, I have it on some fast SCSI drives in RAID 1,
but even a single SATA drive should still finish in a decent amount of time.

Hi Alex, could you share what exact command you use? Mine are SCSI
too, in RAID 10, but the dump takes over 2-3 hours (60 GB database)
and the CPU consumption during this time is huge.

I wrote a bunch of shell script tools to back up postgres 9.1 with
rsync/ccollect (another hardlink tool); I might publish them on github
once I find the time.

--
Benjamin Henrion <bhenrion at ffii.org>
FFII Brussels - +32-484-566109 - +32-2-4148403
"In July 2005, after several failed attempts to legalise software
patents in Europe, the patent establishment changed its strategy.
Instead of explicitly seeking to sanction the patentability of
software, they are now seeking to create a central European patent
court, which would establish and enforce patentability rules in their
favor, without any possibility of correction by competing courts or
democratically elected legislators."

#19Phoenix Kiula
phoenix.kiula@gmail.com
In reply to: Benjamin Henrion (#18)
Re: Incremental backup with RSYNC or something?

On Thu, Nov 24, 2011 at 11:53 PM, Benjamin Henrion <bh@udev.org> wrote:

On Thu, Nov 24, 2011 at 4:49 PM, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:

On Wed, Nov 23, 2011 at 6:13 AM, Alex Thurlow <alex-reports@blastro.com> wrote:

On 11/22/2011 3:28 PM, Merlin Moncure wrote:

......

How long is this backup taking?  I have a ~100GB database that I back up
with pg_dump (which compresses as it dumps if you want it to) and that only
takes 35 minutes.  Granted, I have it on some fast SCSI drives in RAID 1,
but even a single SATA drive should still finish in a decent amount of time.

Hi Alex, could you share what exact command you use? Mine are SCSI
too, in RAID 10, but the dump takes over 2-3 hours (60 GB database)
and the CPU consumption during this time is huge.

I wrote a bunch of shell scripts tools to backup postgres 9.1 with
rsync/ccollect (another hardlink tool), I might find the time to
publish it on github once I find the time.

Thanks Ben. Look forward to it.

Will the script be different for version 9.0.5? Would love to have
rsync working. Even without a script, just the commands will help.
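In the meantime, a sketch of the kind of rsync-with-hardlinks commands such tools are built around. All paths are hypothetical, the commands are the same on 9.0.x, and pg_start_backup() requires WAL archiving to be enabled:

```shell
# Sketch: daily rsync of the data directory into a dated directory,
# hard-linking unchanged files against the previous day's copy.
TODAY=$(date +%Y%m%d)
psql -U postgres -c "SELECT pg_start_backup('rsync');"
rsync -a --delete --link-dest=/backup/pg/latest \
    /var/lib/pgsql/data/ /backup/pg/${TODAY}/
psql -U postgres -c "SELECT pg_stop_backup();"
# Flip the 'latest' symlink so tomorrow links against today:
ln -sfn /backup/pg/${TODAY} /backup/pg/latest
```

Each dated directory looks like a full copy, but files unchanged since the previous run consume no extra space, which is as close as plain rsync gets to the incremental backup asked about at the start of this thread.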