a few questions on backup

Started by Marco Colombo · almost 19 years ago · 10 messages · general
#1 Marco Colombo
pgsql@esiway.net

Hello,
I have a few questions on backing up a PostgreSQL server (let's say
anything 8.x.x). I've read "Continuous Archiving and Point-In-Time
Recovery (PITR)" in the manual, but I'm still missing something... well,
actually I think I'm not, but I've been debating this with a friend
for a while, and there's something we don't seem to agree on, so I need
some bits of clarification. :)

Ok, let's say what I really need is "poor man's" PITR. That is, I
perform only one daily full backup, and I need to be able to restore the
database contents as they were at backup time. I don't even need to know
the exact time (yes, that means I don't really care about which
transactions turn out to be committed and which don't). Mmm, ok that
can't be rightfully called PITR at all, maybe.

Am I right in assuming that the following procedure is ok?

1) issue pg_start_backup();
2) copy (or tar or cpio) the data dir, w/o pg_xlog/
3) issue pg_stop_backup();
4) copy (or tar or cpio) pg_xlog/ contents.

That's all. Please note that I'm doing NO WAL archiving.
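In script form, the idea would be something like this (just a sketch of
what I mean, not something I claim is safe; the paths, the backup label
and the psql invocation are placeholders):

```shell
# Sketch of steps 1)-4) as a shell function.  PGDATA and the backup
# directory are passed in; the psql command can be overridden via $PSQL.
nightly_backup() {
    pgdata="$1"; backup_dir="$2"
    psql_cmd="${PSQL:-psql -U postgres}"
    $psql_cmd -c "SELECT pg_start_backup('nightly');" || return 1   # 1)
    tar -cf "$backup_dir/base.tar" --exclude=pg_xlog -C "$pgdata" . # 2)
    $psql_cmd -c "SELECT pg_stop_backup();" || return 1             # 3)
    tar -cf "$backup_dir/xlog.tar" -C "$pgdata" pg_xlog             # 4)
}
```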

Whether this is going to work or not is based on what exactly
pg_start_backup() does. According to him (my friend), it may simply
prevent PostgreSQL from writing to data files until pg_stop_backup() is
issued, thus kind of enforcing a snapshot on the data files themselves.
Changes go to WAL only, and they are "played" on data files only after
the pg_stop_backup(). Hmm, I don't really like that. But if so, I don't
even need step 4), since the tar backup would be consistent. I'm able to
restore from that only, and data are those at the time of pg_start_backup().

Or, (that's me) pg_start_backup() only ensures that full_page_writes is
enabled during the backup phase, so that any change to data files can be
undone/redone based on the contents of WAL segments. The tar archive
contains potentially inconsistent data; that's why I need the WAL segments
too. At restore time, PG eventually performs a crash recovery, and data
are those at the time of step 4) (which is an interval really, some time
between the start and the end of the copy, but I don't need to be able
to tell the exact time anyway).

BTW, I see a vulnerability in the above procedure... if there's enough
write activity so that PostgreSQL recycles WAL segments between 1) and
4), some changes may be lost. At step 4) I need to save _all_ WAL
segments that have been produced during the backup time. I assume that
activity at backup time is low enough that the event is "extremely
unlikely". Another doubt I have is about WAL checkpointing... is it
possible that a checkpoint happens during step 2), and the tar archive
gets data files both from before and from after the checkpoint (which I
think is bad) or does pg_start_backup() prevent WAL checkpointing, too?

Finally, if I'm missing something and the above is wrong, I think that
the only way to perform a full backup on a live database at filesystem
level, is to enable WAL archiving as the first step of the backup
procedure (assuming it's not usually on, of course), and later save all
the WAL segments that were _archived_ during that time, including the
one made at pg_stop_backup(), on the same backup medium.

If I understand the documentation right, nothing breaks if the
archive_command just returns OK w/o saving anything during regular
operation, and starts saving segments only during the backup time... I
mean, that prevents me from doing arbitrary PITR after the backup, but
all I want to do is performing a full backup, w/o being able to do any
partial backup after that. If so, I may write a script that does nothing
most of the time, and archives WAL segments only to be included in the
full backup.
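Such a script might look like this (a sketch under my assumptions; the
trigger-file path and archive directory are made up, and the helper would
be invoked from archive_command with %p and %f as its two arguments):

```shell
# Hypothetical archive_command helper: do nothing during normal
# operation, copy segments only while a "backup in progress" trigger
# file exists.  Wired up in postgresql.conf as e.g.:
#   archive_command = '/usr/local/bin/wal_archive.sh %p %f'
TRIGGER="${TRIGGER:-/var/lib/pgsql/backup_in_progress}"
ARCHIVE_DIR="${ARCHIVE_DIR:-/u1/pg_wal_archive}"

archive_wal() {
    # $1 = full path to the segment (%p), $2 = its file name (%f)
    if [ -e "$TRIGGER" ]; then
        cp "$1" "$ARCHIVE_DIR/$2" || return 1  # tell PG archiving failed
    fi
    return 0  # outside backups, claim success without saving anything
}
```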

TIA, (and sorry if the above sounds a bit messy)
.TM.

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Marco Colombo (#1)
Re: a few questions on backup

Marco Colombo <pgsql@esiway.net> writes:

Am I right in assuming that the following procedure is ok?

1) issue pg_start_backup();
2) copy (or tar or cpio) the data dir, w/o pg_xlog/
3) issue pg_stop_backup();
4) copy (or tar or cpio) pg_xlog/ contents.

No. You have to have an actual archive_command script copying the WAL
segments somewhere else when told to. An asynchronous copy of the xlog
directory will be nothing but garbage, because we recycle WAL segments
as fast as we can (ie, as soon as the archive_command claims to have
saved the data).

1) 2) and 3) are OK, but you need to use archive_command to collect the
xlog segments.
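For reference, wiring in an archive_command is a one-line change to
postgresql.conf, along the lines of the manual's example (the destination
directory here is only an illustration):

```
# postgresql.conf fragment -- copy each filled segment aside
archive_command = 'cp -i %p /u1/pg_wal_archive/%f </dev/null'
```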

Actually ... given your low requirements, I wonder why you don't just
stop the postmaster, tar the datadir, start the postmaster.

regards, tom lane

#3 Hannes Dorbath
light@theendofthetunnel.de
In reply to: Marco Colombo (#1)
Re: a few questions on backup

On 14.05.2007 16:54, Marco Colombo wrote:

Ok, let's say what I really need is "poor man's" PITR. That is, I
perform only one daily full backup, and I need to be able to restore the
database contents as they were at backup time. I don't even need to know
the exact time (yes, that means I don't really care about which
transactions turn out to be committed and which don't). [...]

Are file system snapshots not the best approach for your requirements?

lvcreate -s -L5G -nbackup /dev/foo/postgresql
mount /dev/foo/backup /mnt/backup-snap
tar jcpvf pg-backup-<time_stamp>.bz2 /mnt/backup-snap

You can't do much wrong with that, it's fast and easy to use.

Just my 2 cents..

--
Regards,
Hannes Dorbath

#4 Marco Colombo
pgsql@esiway.net
In reply to: Tom Lane (#2)
Re: a few questions on backup

Tom Lane wrote:

No. You have to have an actual archive_command script copying the WAL
segments somewhere else when told to. An asynchronous copy of the xlog
directory will be nothing but garbage, because we recycle WAL segments
as fast as we can (ie, as soon as the archive_command claims to have
saved the data).

Mmm, sorry, I'm not sure I'm following here. Maybe I should provide some
background. In my pg_xlog directory I see five files, WAL segments, I
suppose. Only one (as I expected) is currently being used; the others
are old (one a couple of days old).

When PG performs a switch from one segment to another one (I assume it
recycles the oldest available), does it archive the recycled one (before
starting to use it, of course) or the just-filled one? If it's the one
being recycled, it means that in my setup it would take two days to
archive a segment since it stopped being used. Am I missing something?

1) 2) and 3) are OK, but you need to use archive_command to collect the
xlog segments.

Actually ... given your low requirements, I wonder why you don't just
stop the postmaster, tar the datadir, start the postmaster.

Well, currently we do a pg_dump. The database mainly supports dynamic
websites. It's very unlikely they get updated at the time the backup
runs, and overall there is little updating even during the day, but I
don't like stopping the postmaster because, even if the write load is
negligible, the read one might be not. It's still small enough that a
tar (to disk) might take only a minute or two to complete, but yet it's
a minute of downtime for the web sites. If I can avoid that, why not?

I'm not unsatisfied with pg_dump, and I agree that with my requirements
the whole issue is academic. I just wanted to learn how it works
exactly; such knowledge could prove useful for doing the Right Thing
in case of trouble. Maybe it's the right time for me to have a look at
the source...

Hannes Dorbath wrote:

lvcreate -s -L5G -nbackup /dev/foo/postgresql
mount /dev/foo/backup /mnt/backup-snap
tar jcpvf pg-backup-<time_stamp>.bz2 /mnt/backup-snap

You can't do much wrong with that, it's fast and easy to use.

Been there, done that. In my environment (Fedora Core 6) it's fast and
easy, but not reliable, unfortunately. Sometimes the snapshot won't get
created, sometimes it won't get removed after the backup is done.

.TM.

#5 Richard Huxton
dev@archonet.com
In reply to: Marco Colombo (#4)
Re: a few questions on backup

Marco Colombo wrote:

Mmm, sorry, I'm not sure I'm following here. Maybe I should provide some
background. In my pg_xlog directory I see five files, WAL segments, I
suppose. Only one (as I expected) is currently being used; the others
are old (one a couple of days old).

When PG performs a switch from one segment to another one (I assume it
recycles the oldest available), does it archive the recycled one (before
starting to use it, of course) or the just-filled one? If it's the one
being recycled, it means that in my setup it would take two days to
archive a segment since it stopped being used. Am I missing something?

It calls archive_command on the just-filled one.

1) 2) and 3) are OK, but you need to use archive_command to collect the
xlog segments.

Actually ... given your low requirements, I wonder why you don't just
stop the postmaster, tar the datadir, start the postmaster.

Well, currently we do a pg_dump. The database mainly supports dynamic
websites. It's very unlikely they get updated at the time the backup
runs,

You do know that pg_dump gives you a guaranteed accurate snapshot of the
database? It doesn't matter if it's in use.

--
Richard Huxton
Archonet Ltd

#6 Prashant Ranjalkar
prashant.ranjalkar@gmail.com
In reply to: Marco Colombo (#1)
Re: a few questions on backup

Hi,

The procedure you followed is for online backups. The backups are useless
unless you set archive_command in your postgresql.conf file. This command
copies each filled transaction log to the directory you specify in
archive_command. PG won't recycle a transaction log segment until a copy
of it has been taken. These copied transaction logs are used during the
recovery process for rolling forward.

If you don't set archive_command then you have to go for cold backups,
where the database is stopped and the data directory is backed up.

Hope this helps.

Regards,
Prashant Ranjalkar
EnterpriseDB Corporation

On 5/14/07, Marco Colombo <pgsql@esiway.net> wrote:

Am I right in assuming that the following procedure is ok?

1) issue pg_start_backup();
2) copy (or tar or cpio) the data dir, w/o pg_xlog/
3) issue pg_stop_backup();
4) copy (or tar or cpio) pg_xlog/ contents.

That's all. Please note that I'm doing NO WAL archiving. [...]

#7 Marco Colombo
pgsql@esiway.net
In reply to: Richard Huxton (#5)
Re: a few questions on backup

Richard Huxton wrote:

It calls archive_command on the just-filled one.

Good to know, thanks. I think I'll experiment a bit with
archive_command. My point was that since I know (or better assume) that
old segments are going to stay in my pg_xlog for *days* before getting
recycled, just copying them all after the call to pg_stop_backup() should
be enough, in my case. It's more than I need, even.

You do know that pg_dump gives you a guaranteed accurate snapshot of the
database? It doesn't matter if it's in use.

Yes, I know, that's why I'm using it. A dump is also useful in that you
can more easily restore it on a different system. A text dump allows you
to make changes before restoring, even with sed. I'm comfortable with it.

The reasons for playing with WAL segments for backups are:
1) learning how it works;
2) everything else is backed up with cpio;
3) at restore time I need to have PG running already, and there may be
something different at database initialization.

Reason 1) is the driving one, by far. I can handle it for the rest (i.e.
the compressed dump is saved as part of the cpio archive).

.TM.

#8 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Marco Colombo (#7)
Re: a few questions on backup

Marco Colombo <pgsql@esiway.net> writes:

Good to know, thanks. I think I'll experiment a bit with
archive_command. My point was that since I know (or better assume) that
old segments are going to stay in my pg_xlog for *days* before getting
recycled,

On what do you base that assumption? Once the system thinks they're not
needed anymore, they'll be recycled immediately.

regards, tom lane

#9 Marco Colombo
pgsql@esiway.net
In reply to: Tom Lane (#8)
Re: a few questions on backup

Tom Lane wrote:

Marco Colombo <pgsql@esiway.net> writes:

Good to know, thanks. I think I'll experiment a bit with
archive_command. My point was that since I know (or better assume) that
old segments are going to stay in my pg_xlog for *days* before getting
recycled,

On what do you base that assumption? Once the system thinks they're not
needed anymore, they'll be recycled immediately.

regards, tom lane

Well now that you make me think of it, I do make some assumptions. One
is that only one file in pg_xlog is the active segment. Two is that I
can trust modification times (so that a file inside pg_xlog that looks
old is actually old... and since postgresql does not run as root, it
couldn't cheat on that even if it tried to).

The best thing I can do is to configure archiving, and see what gets
archived exactly. I'm making assumptions there too. I expect for each
file in pg_xlog to find a copy in the archive directory (say archiving
is done with cp), with one exception: the segment currently being
written to. There will be a file with the same name but different
contents (and an older modification time).
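A quick way to run that comparison (an illustration only; the directory
paths are whatever your setup uses):

```shell
# Compare every archived segment against the same-named file in
# pg_xlog; only the segment currently being written should differ.
compare_wal() {
    xlog_dir="$1"; archive_dir="$2"
    for f in "$archive_dir"/*; do
        name=$(basename "$f")
        if cmp -s "$f" "$xlog_dir/$name"; then
            echo "$name: identical"
        else
            echo "$name: differs (probably the active segment)"
        fi
    done
}
```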

I'll try that out. Maybe my ideas are so far from the truth that I'm
having a hard time explaining them to people who actually know how
things work. I'll be back with results. Meanwhile, thanks for your time.

.TM.

#10 Marco Colombo
marco@esi.it
In reply to: Marco Colombo (#9)
Re: a few questions on backup

Marco Colombo wrote:

I'll try that out. Maybe my ideas are so far from the truth that I'm
having a hard time explaining them to people who actually know how
things work. I'll be back with results. Meanwhile, thanks for your time.

I think I finally got it.

Segment 34 in my pg_xlog got archived and recycled. It became segment
39, but PG is still working on segment 35, after some hours. Now pg_xlog
contains 5 segments, named from 35 to 39, 35 being the most
recently modified. 39 won't be used for about a couple of days yet.

Now I see what you mean for "recycled": I thought it meant "marked free
for later use", but it means "renamed for future use". My mistake was
assuming that the rename part happens lazily when PG starts using the
file. Instead, it happens right after (the eventual) archiving.

That makes the strategy in my original post somewhat unfeasible. Still, I
was not completely wrong:

# cmp /var/lib/pgsql/data/pg_xlog/000000010000000100000039 \
      /u1/pg_wal_archive/000000010000000100000034 && echo "Yes"
Yes

They do contain the same data, that of segment 34, and the *39 file will
stay there, untouched, for quite a while after the backup. So the WAL
segment I need *is* there, just with a different name.

The only problem is figuring out what segment that data actually belongs
to. I know only because I can compare it with the archived one. Now, I
could still make some educated guesses, by looking at modification
times, but definitely a guessing game is not something you want to play
when restoring your precious data. :) Archiving the WAL segments and
letting the recovery procedure handle them at restore time is easier anyway.

Again, thanks a lot.

.TM.
--
____/ ____/ /
/ / / Marco Colombo
___/ ___ / / Technical Manager
/ / / ESI s.r.l.
_____/ _____/ _/ Colombo@ESI.it