tape backups

Started by Ben over 19 years ago, 5 messages, general
#1 Ben
bench@silentmedia.com

Hi everybody,

I'm trying to find a good way to make backups to tape, where I
define "good" as:

- easy to use, like pg_dumpall, BUT
- not in a single file, so I don't back up my entire database cluster
with every differential backup

As I understand my backup program (Bacula), if a file changes at all
between differential backups, it gets backed up again in its
entirety. That seems pretty reasonable. So now I'm trying to figure
out how to get my postgres dump to end up in files in such a way that
a small change in data means few changed files. But if there's no
native tool to do that (and it seems like there isn't), then setting
up something like that sounds like it might be a pain, as would
restoring from it.

Am I going about this the wrong way? Would it just be easier to do a
full pg_dumpall for my full backups and then build up a list of WAL
files with each differential? How do other people do it?
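
A minimal sketch of the "few file changes" idea, in Python. It assumes the dump has already been split into one blob per table (for example by running pg_dump -t once per table, not shown here) and that the backup program picks up a file only when it has been rewritten. The names write_if_changed and store_dumps are hypothetical. Note that separate per-table dumps are not one consistent snapshot of the database, which is part of why this approach gets painful.

```python
import os
import tempfile
from typing import Dict, List


def write_if_changed(path: str, data: bytes) -> bool:
    """Write data to path only if the content differs from what is there.

    Returns True if the file was (re)written, False if it was left alone.
    """
    try:
        with open(path, "rb") as existing:
            if existing.read() == data:
                return False  # identical: keep the old file and its mtime
    except FileNotFoundError:
        pass  # first dump of this table

    # Write next to the target, then rename, so a crash never leaves a
    # half-written dump under the final name.
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "wb") as tmp:
        tmp.write(data)
    os.replace(tmp_path, path)
    return True


def store_dumps(dump_dir: str, dumps: Dict[str, bytes]) -> List[str]:
    """Store one file per table and return the tables whose file changed."""
    os.makedirs(dump_dir, exist_ok=True)
    changed = []
    for table, data in sorted(dumps.items()):
        if write_if_changed(os.path.join(dump_dir, table + ".sql"), data):
            changed.append(table)
    return changed
```

Only the tables returned by store_dumps would have a fresh modification time, so only those files should be candidates for the next differential.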

#2 Shoaib Mir
shoaibmir@gmail.com
In reply to: Ben (#1)
Re: tape backups

I think what you want is incremental backups, and as you mentioned, the
better approach for that is archiving WAL files. For details, see:
http://www.postgresql.org/docs/current/static/continuous-archiving.html

--------------------
Shoaib Mir
EnterpriseDB (www.enterprisedb.com)
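
For illustration, here is a rough Python sketch of the kind of script that the archive_command setting described on that page could call for each finished WAL segment. The script path, the archive directory and the function names are assumptions. The two rules it follows come from the documentation: report success only when the segment is safely archived, and never overwrite a file that is already in the archive. A production script would also want to fsync the copy.

```python
import os
import shutil
from typing import List


def archive_wal(src_path: str, archive_dir: str, file_name: str) -> bool:
    """Copy one WAL segment into archive_dir without ever overwriting.

    Returns True on success, False if that segment name already exists.
    """
    dest = os.path.join(archive_dir, file_name)
    if os.path.exists(dest):
        return False
    tmp = dest + ".tmp"
    shutil.copyfile(src_path, tmp)
    # Rename last, so a half-copied segment never appears under its final name.
    os.replace(tmp, dest)
    return True


def main(argv: List[str]) -> int:
    """Return a process exit code: 0 for archived, 1 for refused."""
    src_path, archive_dir, file_name = argv
    return 0 if archive_wal(src_path, archive_dir, file_name) else 1


# Hypothetical wiring in postgresql.conf (%p = path of the segment,
# %f = its file name), with a small wrapper that calls
# sys.exit(main(sys.argv[1:])):
#   archive_command = 'python3 /path/to/archive_wal.py %p /mnt/archive %f'
```

The archive directory then only ever gains new files, which suits a differential backup to tape.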

On 12/23/06, Ben <bench@silentmedia.com> wrote:

[snip: full quote of message #1]

#3 Ben
bench@silentmedia.com
In reply to: Shoaib Mir (#2)
Re: tape backups

Thanks for the pointer. This does look like what I want, because in
retrospect I don't know how I would tell which WAL files to start
replaying after a given pg_dumpall to bring myself up to the present
after a recovery.

But this page confuses me when it talks about pg_start_backup and
pg_stop_backup. What do these functions do? It seems like they do
nothing more than let me know which WAL files were in use over the
duration of the backup, which is certainly useful. But they do NOT
seem to freeze the actual data files, and it seems to me that, because
the data files won't be archived atomically and may be changing during
the copy, I might end up with corrupted data files that a replay
of WAL files wouldn't correct. Is my fear groundless?

On Dec 23, 2006, at 10:20 AM, Shoaib Mir wrote:

[snip: full quote of message #2 and of the original post]

#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ben (#3)
Re: tape backups

Ben <bench@silentmedia.com> writes:

But this page confuses me when it talks about pg_start_backup and
pg_stop_backup. What do these functions do? It seems like they do
nothing more than let me know which WAL files were in use over the
duration of the backup, which is certainly useful. But they do NOT
seem to freeze the actual data files, and it seems to me that, because
the data files won't be archived atomically and may be changing during
the copy, I might end up with corrupted data files that a replay
of WAL files wouldn't correct. Is my fear groundless?

Yes. The reason we don't have to freeze the data files during a backup
is that any page that changes within that interval will be rewritten
anyway when the WAL log is replayed during recovery. This is why the
WAL sequence has to start before the pg_start_backup rather than at some
later point --- that overlap is exactly what makes it safe to not freeze
the data files.

regards, tom lane
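
The argument above can be tried out on a toy model. The Python sketch below is not how PostgreSQL stores anything: pages are dictionary entries, and each WAL record is assumed to carry the whole new contents of one page, which is what makes replaying it over an already-updated page harmless. All names are made up for the illustration.

```python
def simulate():
    live = {0: "a0", 1: "b0", 2: "c0", 3: "d0"}  # page number -> contents
    wal = []                                     # append-only (page, contents)

    def write(page, contents):
        wal.append((page, contents))  # log the change first ...
        live[page] = contents         # ... then modify the page

    write(0, "a1")       # activity before the backup
    start = len(wal)     # WAL position recorded when the backup starts

    # Copy the pages one at a time while writes keep happening.
    copy = {}
    copy[0] = live[0]
    copy[1] = live[1]
    write(1, "b1")       # page 1 changes after it was copied
    write(3, "d1")       # page 3 changes before it is copied
    copy[2] = live[2]
    write(2, "c1")
    copy[3] = live[3]
    write(3, "d2")

    return live, copy, wal, start


def replay(copy, records):
    """Apply WAL records, in order, on top of a restored copy."""
    restored = dict(copy)
    for page, contents in records:
        restored[page] = contents
    return restored


live, copy, wal, start = simulate()
print(replay(copy, wal[start:]) == live)      # replay from the backup start
print(replay(copy, wal[start + 1:]) == live)  # replay that starts too late
```

The copy on its own is a mix of page versions that never existed together. Replaying every record from the recorded start position brings it in line with the live data, so the first line prints True. Skipping even one record from the overlap leaves page 1 stale, so the second prints False.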

#5 Ben
bench@silentmedia.com
In reply to: Tom Lane (#4)
Re: tape backups

Ah, got it. Thanks!

On Dec 23, 2006, at 5:59 PM, Tom Lane wrote:

[snip: full quote of message #4]