Database backup mechanism
Hi Folks,
We have a requirement to deal with large databases, on the order of terabytes,
when we go into production. What is the best database backup mechanism, and
what are the possible issues?
pg_dump can back up a database, but the dump file is limited by the OS
file-size limit. What about the option of compressing the dump file? How much
time does it generally take for large databases? I heard that it could take
way too long (even one or two days). I haven't tried it out, though.
What about taking a zipped backup of the database directory? We tried this
out, but the WAL files in the pg_xlog directory are also being backed up.
Since these logs keep increasing from day one of database creation, the
backup size is increasing drastically.
Can we back up certain subdirectories without loss of information or
consistency?
Any quick comments/suggestions in this regard would be very helpful.
Thanks in advance,
Ravi Kumar Mandala
RaviKumar.Mandala@versata.com wrote:
We have a requirement to deal with large databases, on the order of terabytes,
when we go into production. What is the best database backup mechanism, and
what are the possible issues?
It depends.
Make sure you read Chapter 23. Backup and Restore of the user manual:
http://www.postgresql.org/docs/8.2/interactive/backup.html
It discusses pg_dump and restore, as well as file system level backup.
You'll probably want to set up continuous archiving, which allows you to
take a file-system level backup without shutting down the database.
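In outline (8.2-era syntax; the archive path here is just a placeholder), the
setup looks roughly like this:

    # postgresql.conf -- ship each completed 16 MB WAL segment to an archive area
    archive_command = 'cp %p /mnt/server/archivedir/%f'

    -- in psql, bracket the file-system copy of the data directory:
    SELECT pg_start_backup('nightly_base');
    -- ... tar/rsync/snapshot $PGDATA (the pg_xlog contents need not be included) ...
    SELECT pg_stop_backup();

The archived WAL segments plus that base copy are then enough to restore and
roll forward to a later point in time.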
What about taking a zipped backup of the database directory? We tried this
out, but the WAL files in the pg_xlog directory are also being backed up.
Since these logs keep increasing from day one of database creation, the
backup size is increasing drastically.
The number of WAL files in the pg_xlog directory is controlled by the
checkpoint_segments configuration parameter.
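As a rough illustration (the value here is invented), with 16 MB segments
pg_xlog normally settles at about 2 * checkpoint_segments + 1 files:

    # postgresql.conf
    checkpoint_segments = 10    # pg_xlog stays around (2*10 + 1) * 16 MB, roughly 336 MB

so under normal operation it does not keep growing without bound.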
Can we back up certain subdirectories without loss of information or
consistency?
No.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
I would also be interested in any "creative" ways to reduce the size of and
time to back up databases/clusters. We were just having a conversation
about this yesterday. We were mulling over things like using rsync to
back up only the files in the database directory tree that have actually
changed, or maybe doing a selective backup of files based on modification
times, etc., but we were unsure whether this would be a safe, reliable way
to back up a reduced set of data.
Doug Knight
WSI Inc.
Andover, MA
RaviKumar.Mandala@versata.com wrote:
Hi Folks,
We have a requirement to deal with large databases, on the order of terabytes,
when we go into production. What is the best database backup mechanism, and
what are the possible issues?
pg_dump can back up a database, but the dump file is limited by the OS
file-size limit. What about the option of compressing the dump file? How much
time does it generally take for large databases? I heard that it could take
way too long (even one or two days). I haven't tried it out, though.
What about taking a zipped backup of the database directory? We tried this
out, but the WAL files in the pg_xlog directory are also being backed up.
Since these logs keep increasing from day one of database creation, the
backup size is increasing drastically.
Can we back up certain subdirectories without loss of information or
consistency?
Any quick comments/suggestions in this regard would be very helpful.
Please ask in the correct forum, either pgsql-general or pgsql-admin.
This list is strictly for discussion of development of postgres, not
usage questions.
(If all you need is a pg_dump backup, maybe you could just pipe its
output to something like 'split -a 5 -b 1000m - mybackup')
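For example (untested sketch; the database name and chunk size are made up),
adding compression at the same time:

    pg_dump mydb | gzip | split -a 5 -b 1000m - mybackup.gz.
    # restore later with:
    cat mybackup.gz.* | gunzip | psql mydb

The chunks sort lexically, so a plain cat reassembles them in the right order.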
cheers
andrew
Doug Knight wrote:
I would also be interested in any "creative" ways to reduce the size of and
time to back up databases/clusters. We were just having a conversation
about this yesterday. We were mulling over things like using rsync to
back up only the files in the database directory tree that have actually
changed, or maybe doing a selective backup of files based on modification
times, etc., but we were unsure whether this would be a safe, reliable way
to back up a reduced set of data.
Most of your virtual-layer filesystems should offer a snapshot facility
that lets the database think it's writing to the files while you see a
static version. You could rsync that against an older file-level copy as
the base copy in a PITR backup.
Note - even with a snapshot facility you need to use PITR or stop the
database to get a guaranteed working copy.
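With LVM, for instance (volume and mount names are invented), the
snapshot-and-rsync step might look something like:

    # take a point-in-time view of the volume holding $PGDATA
    lvcreate --snapshot --size 10G --name pgsnap /dev/vg0/pgdata
    mount -o ro /dev/vg0/pgsnap /mnt/pgsnap
    # only changed files get transferred against the previous base copy
    rsync -a --delete /mnt/pgsnap/ /backup/base/
    umount /mnt/pgsnap
    lvremove -f /dev/vg0/pgsnap

As noted above, that base copy is only guaranteed usable together with WAL
archiving (PITR) or if the database was stopped when the snapshot was taken.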
--
Richard Huxton
Archonet Ltd
On Friday 09 February 2007 08:16, Doug Knight wrote:
I would also be interested in any "creative" ways to reduce the size of and
time to back up databases/clusters. We were just having a conversation
about this yesterday. We were mulling over things like using rsync to
back up only the files in the database directory tree that have actually
changed, or maybe doing a selective backup of files based on modification
times, etc., but we were unsure whether this would be a safe, reliable way
to back up a reduced set of data.
The biggest factor with backing up TB databases is the amount of activity you
have going on. If you have a fairly static data set, you can try using rsync
or custom pg_dump scripts piping specific tables to something like split as
needed. Both of these methods will require some finagling, so make sure to
test them before using them. You could probably also use something like Slony;
the initial data copy will prove painful, but again, on a fairly static set of
data it could work.
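As a sketch of the per-table approach (table and database names invented):

    # dump only the tables that actually change, one compressed file each
    for t in orders order_items audit_log; do
        pg_dump -t "$t" mydb | gzip > "/backup/$t.$(date +%Y%m%d).sql.gz"
    done

Anything not covered by the loop would of course need its own, less frequent,
full dump.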
If you're doing large amounts of transactional activity, these methods are
going to become unworkable. We push about 2 GB of WAL an hour on one of our
systems, and the only method that seems workable is using PITR with weekly
filesystem snapshots as the base and then copying the xlogs offline for
replay. It's still tricky to get right, but it seems to work.
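For what it's worth, one way to take the sting out of that WAL volume is to
compress segments as they are archived (paths here are invented); something
along these lines:

    # postgresql.conf on the primary
    archive_command = 'gzip < %p > /backup/wal/%f.gz'

    # recovery.conf when replaying
    restore_command = 'gunzip < /backup/wal/%f.gz > %p'

WAL tends to compress fairly well, so this can noticeably cut both the storage
and the copy-offline time, at the cost of some CPU.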
--
Robert Treat
Database Architect
OmniTI Computer Consulting, Inc.