"Resurrected" data files - problem?

Started by Laurenz Albeover 18 years ago15 messagesgeneral
Jump to latest
#1Laurenz Albe
laurenz.albe@cybertec.at

We use a tape backup software that does "incremental backups"
as follows:

- In a full backup, all files are backed up.
- In an incremental backup, only the files with modification
date after the last backup are backed up.

Now when such a backup is restored, you first have to restore
the full backup, and then the incremental backup.

The problem is that files which were deleted between the full
and the incremental backup will get "resurrected" after such a
restore.

So if we perform our database backups with incremental
backups as described above, we could end up with additional
files after the restore, because PostgreSQL files can get
deleted (e.g. during DROP TABLE or TRUNCATE TABLE).

My question is:

Could such "resurrected" files (data files, files in
pg_xlog, pg_clog or elsewhere) cause a problem for the database
(other than the obvious one that there may be unnecessary files
about that consume disk space)?

Yours,
Laurenz Albe

#2Peter Childs
peterachilds@gmail.com
In reply to: Laurenz Albe (#1)
Re: "Resurrected" data files - problem?

On 08/11/2007, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:

We use a tape backup software that does "incremental backups"
as follows:

- In a full backup, all files are backed up.
- In an incremental backup, only the files with modification
date after the last backup are backed up.

Now when such a backup is restored, you first have to restore
the full backup, and then the incremental backup.

The problem is that files which were deleted between the full
and the incremental backup will get "resurrected" after such a
restore.

So if we perform our database backups with incremental
backups as described above, we could end up with additional
files after the restore, because PostgreSQL files can get
deleted (e.g. during DROP TABLE or TRUNCATE TABLE).

My question is:

Could such "resurrected" files (data files, files in
pg_xlog, pg_clog or elsewhere) cause a problem for the database
(other than the obvious one that there may be unnecessary files
about that consume disk space)?

This will not work at all.

Try re-reading the instructions on backup in the manual.

oh and always, always, always test your backup works before you actually
need it!

Peter Childs

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Childs (#2)
Re: "Resurrected" data files - problem?

"Peter Childs" <peterachilds@gmail.com> writes:

On 08/11/2007, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:

So if we perform our database backups with incremental
backups as described above, we could end up with additional
files after the restore, because PostgreSQL files can get
deleted (e.g. during DROP TABLE or TRUNCATE TABLE).

Could such "resurrected" files (data files, files in
pg_xlog, pg_clog or elsewhere) cause a problem for the database
(other than the obvious one that there may be unnecessary files
about that consume disk space)?

This will not work at all.

To be more specific: the resurrected files aren't the problem;
offhand I see no reason they'd create any issue beyond wasted
disk space. The problem is version skew between files that were
backed up at slightly different times, leading to inconsistency.

You could make this work if you shut down Postgres whenever you
are taking a backup, but as a means for backing up a live database
it indeed won't work at all.

regards, tom lane

#4Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Peter Childs (#2)
Re: "Resurrected" data files - problem?

Peter Childs wrote:

We use a tape backup software that does "incremental backups"

[...]

So if we perform our database backups with incremental
backups as described above, we could end up with additional
files after the restore, because PostgreSQL files can get
deleted (e.g. during DROP TABLE or TRUNCATE TABLE).

My question is:

Could such "resurrected" files (data files, files in
pg_xlog, pg_clog or elsewhere) cause a problem for the database
(other than the obvious one that there may be unnecessary files
about that consume disk space)?

This will not work at all.

Can you give me a good reason why?

Try re-reading the instructions on backup in the manual.

I know them well. That is why I ask if this questionable procedure
could lead to damage.

Yours,
Laurenz Albe

#5Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Tom Lane (#3)
Re: "Resurrected" data files - problem?

Tom Lane wrote:

So if we perform our database backups with incremental
backups as described above, we could end up with additional
files after the restore, because PostgreSQL files can get
deleted (e.g. during DROP TABLE or TRUNCATE TABLE).

Could such "resurrected" files (data files, files in
pg_xlog, pg_clog or elsewhere) cause a problem for the database
(other than the obvious one that there may be unnecessary files
about that consume disk space)?

This will not work at all.

To be more specific: the resurrected files aren't the problem;
offhand I see no reason they'd create any issue beyond wasted
disk space. The problem is version skew between files that were
backed up at slightly different times, leading to inconsistency.

I should have mentioned that before the (incremental) backup
there would be a pg_start_backup() and a pg_stop_backup()
afterwards, and we would use PITR.

So there could only be three kinds of files:
- Files that did not change since the full backup, restored
from there. They should therefore look exactly as if the
online backup were performed in the normal way.
- Files that have changed or are new, restored from the
incremental backup. These will also be ok, because
they were backed up between pg_start_backup() and
pg_stop_backup().
- Files that have been deleted between full and incremental
backup and have been resurrected.

This third group is the only one which might be problematic,
as far as I can see, because PostgreSQL will no expect them to
be there.

The version skew between files backed up at slightly different
times should be taken care of by PITR, shouldn't it?

Yours,
Laurenz Albe

#6Scott Marlowe
scott.marlowe@gmail.com
In reply to: Tom Lane (#3)
Re: "Resurrected" data files - problem?

On Nov 8, 2007 9:39 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

"Peter Childs" <peterachilds@gmail.com> writes:

On 08/11/2007, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:

So if we perform our database backups with incremental
backups as described above, we could end up with additional
files after the restore, because PostgreSQL files can get
deleted (e.g. during DROP TABLE or TRUNCATE TABLE).

Could such "resurrected" files (data files, files in
pg_xlog, pg_clog or elsewhere) cause a problem for the database
(other than the obvious one that there may be unnecessary files
about that consume disk space)?

This will not work at all.

To be more specific: the resurrected files aren't the problem;
offhand I see no reason they'd create any issue beyond wasted
disk space. The problem is version skew between files that were
backed up at slightly different times, leading to inconsistency.

You could make this work if you shut down Postgres whenever you
are taking a backup, but as a means for backing up a live database
it indeed won't work at all.

I think if you had real snapshotting file systems you could use the
snapshots to create your backups. But this seems like a lot of work
to avoid implementing PITR to me.

#7Alan Hodgson
ahodgson@simkin.ca
In reply to: Laurenz Albe (#4)
Re: "Resurrected" data files - problem?

On Thursday 08 November 2007, "Albe Laurenz" <laurenz.albe@wien.gv.at>
wrote:

Can you give me a good reason why?

Try re-reading the instructions on backup in the manual.

I know them well. That is why I ask if this questionable procedure
could lead to damage.

You cannot backup a live database with a filesystem backup and expect it to
work afterwards, unless you take all the steps required to produce a PITR
base backup.

It's not even "questionable". It's a fundamental misunderstanding of what is
required to backup a database.

--
Peak Oil is now accepted as inevitable, and the debate only becomes as
to when - James Schlesinger, former US Secretary of Energy

#8Simon Riggs
simon@2ndQuadrant.com
In reply to: Laurenz Albe (#5)
Re: "Resurrected" data files - problem?

On Thu, 2007-11-08 at 17:11 +0100, Albe Laurenz wrote:

Tom Lane wrote:

So if we perform our database backups with incremental
backups as described above, we could end up with additional
files after the restore, because PostgreSQL files can get
deleted (e.g. during DROP TABLE or TRUNCATE TABLE).

Could such "resurrected" files (data files, files in
pg_xlog, pg_clog or elsewhere) cause a problem for the database
(other than the obvious one that there may be unnecessary files
about that consume disk space)?

This will not work at all.

To be more specific: the resurrected files aren't the problem;
offhand I see no reason they'd create any issue beyond wasted
disk space. The problem is version skew between files that were
backed up at slightly different times, leading to inconsistency.

I should have mentioned that before the (incremental) backup
there would be a pg_start_backup() and a pg_stop_backup()
afterwards, and we would use PITR.

So there could only be three kinds of files:
- Files that did not change since the full backup, restored
from there. They should therefore look exactly as if the
online backup were performed in the normal way.
- Files that have changed or are new, restored from the
incremental backup. These will also be ok, because
they were backed up between pg_start_backup() and
pg_stop_backup().
- Files that have been deleted between full and incremental
backup and have been resurrected.

This third group is the only one which might be problematic,
as far as I can see, because PostgreSQL will no expect them to
be there.

The version skew between files backed up at slightly different
times should be taken care of by PITR, shouldn't it?

The backup is not instantaneous, so there is no single time at which the
hot backup takes place. So deciding whether a file has changed based
upon a comparison of two file timestamps cannot work. You would need to
take timestamps for the file both before the pg_start_backup() and after
the pg_stop_backup() of the file for both full and incremental backups.
If all four timestamps are equivalent, then you are safe.

The relfilenode ids are potentially reused after a period of time, so
that could cause errors if not catered for on the incremental restore.

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com

#9Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Simon Riggs (#8)
Re: "Resurrected" data files - problem?

So if we perform our database backups with incremental
backups as described above, we could end up with additional
files after the restore, because PostgreSQL files can get
deleted (e.g. during DROP TABLE or TRUNCATE TABLE).

Could such "resurrected" files (data files, files in
pg_xlog, pg_clog or elsewhere) cause a problem for the database
(other than the obvious one that there may be unnecessary files
about that consume disk space)?

This will not work at all.

To be more specific: the resurrected files aren't the problem;
offhand I see no reason they'd create any issue beyond wasted
disk space. The problem is version skew between files that were
backed up at slightly different times, leading to inconsistency.

I should have mentioned that before the (incremental) backup
there would be a pg_start_backup() and a pg_stop_backup()
afterwards, and we would use PITR.

So there could only be three kinds of files:
- Files that did not change since the full backup, restored
from there. They should therefore look exactly as if the
online backup were performed in the normal way.
- Files that have changed or are new, restored from the
incremental backup. These will also be ok, because
they were backed up between pg_start_backup() and
pg_stop_backup().
- Files that have been deleted between full and incremental
backup and have been resurrected.

This third group is the only one which might be problematic,
as far as I can see, because PostgreSQL will no expect them to
be there.

The version skew between files backed up at slightly different
times should be taken care of by PITR, shouldn't it?

The backup is not instantaneous, so there is no single time
at which the hot backup takes place. So deciding whether
a file has changed based upon a comparison of two file timestamps
cannot work. You would need to take timestamps for the file both
before the pg_start_backup() and after the pg_stop_backup()
of the file for both full and incremental backups.
If all four timestamps are equivalent, then you are safe.

I am afraid that there is still a misunderstanding. The
procedure would be as follows:

The backup:

- pg_start_backup()
- full backup of the PostgreSQL files
- pg_stop_backup()

Next day:

- pg_start_backup()
- backup of the files that have changed since the last backup
- pg_stop_backup()

The recovery:

- restore files from the full backup
- restore files from the incremental backup
- create recovery.conf and start the server

With a normal online backup, the backup also does not take
place at a single time. Why is that no problem there, but is
a problem here?

The relfilenode ids are potentially reused after a period of time, so
that could cause errors if not catered for on the incremental restore.

That was my original fear.

What happens if PostgreSQL tries to reuse a relfilenode and the file
is already present? Will the database crash or report an error?

Yours,
Laurenz Albe

#10Simon Riggs
simon@2ndQuadrant.com
In reply to: Laurenz Albe (#9)
Re: "Resurrected" data files - problem?

On Fri, 2007-11-09 at 09:24 +0100, Albe Laurenz wrote:

So if we perform our database backups with incremental
backups as described above, we could end up with additional
files after the restore, because PostgreSQL files can get
deleted (e.g. during DROP TABLE or TRUNCATE TABLE).

Could such "resurrected" files (data files, files in
pg_xlog, pg_clog or elsewhere) cause a problem for the database
(other than the obvious one that there may be unnecessary files
about that consume disk space)?

This will not work at all.

To be more specific: the resurrected files aren't the problem;
offhand I see no reason they'd create any issue beyond wasted
disk space. The problem is version skew between files that were
backed up at slightly different times, leading to inconsistency.

I should have mentioned that before the (incremental) backup
there would be a pg_start_backup() and a pg_stop_backup()
afterwards, and we would use PITR.

So there could only be three kinds of files:
- Files that did not change since the full backup, restored
from there. They should therefore look exactly as if the
online backup were performed in the normal way.
- Files that have changed or are new, restored from the
incremental backup. These will also be ok, because
they were backed up between pg_start_backup() and
pg_stop_backup().
- Files that have been deleted between full and incremental
backup and have been resurrected.

This third group is the only one which might be problematic,
as far as I can see, because PostgreSQL will no expect them to
be there.

The version skew between files backed up at slightly different
times should be taken care of by PITR, shouldn't it?

The backup is not instantaneous, so there is no single time
at which the hot backup takes place. So deciding whether
a file has changed based upon a comparison of two file timestamps
cannot work. You would need to take timestamps for the file both
before the pg_start_backup() and after the pg_stop_backup()
of the file for both full and incremental backups.
If all four timestamps are equivalent, then you are safe.

I am afraid that there is still a misunderstanding. The
procedure would be as follows:

The backup:

time1

- pg_start_backup()
- full backup of the PostgreSQL files
- pg_stop_backup()

time2

Next day:

time3

- pg_start_backup()
- backup of the files that have changed since the last backup
- pg_stop_backup()

time4

The recovery:

- restore files from the full backup
- restore files from the incremental backup
- create recovery.conf and start the server

Yes, I understood that.

With a normal online backup, the backup also does not take
place at a single time.

;-)

Why is that no problem there, but is
a problem here?

I'm not sure where "there" and "here" are, but it seems you have
misunderstood.

There *is* a problem with normal online backup, as many people have now
explained, though I have shown you a way to do this. Please reread.

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com

#11Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Simon Riggs (#10)
Re: "Resurrected" data files - problem?

Simon Riggs wrote:

So if we perform our database backups with incremental
backups as described above, we could end up with additional
files after the restore, because PostgreSQL files can get
deleted (e.g. during DROP TABLE or TRUNCATE TABLE).

Could such "resurrected" files (data files, files in
pg_xlog, pg_clog or elsewhere) cause a problem for
the database?

The backup is not instantaneous, so there is no single time
at which the hot backup takes place. So deciding whether
a file has changed based upon a comparison of two file timestamps
cannot work. You would need to take timestamps for the file both
before the pg_start_backup() and after the pg_stop_backup()
of the file for both full and incremental backups.
If all four timestamps are equivalent, then you are safe.

I think that understanding is finally dawning here.

The problem you see is that the backup software might decide
that the file has not been changed, skip it and go on backing
up other files, but the file can still be modified before
pg_stop_backup(), correct?

Sorry for being such a blockhead, I can see now why this
is a bad idea.

Thanks for the patience,
Laurenz Albe

#12Simon Riggs
simon@2ndQuadrant.com
In reply to: Laurenz Albe (#11)
Re: "Resurrected" data files - problem?

On Fri, 2007-11-09 at 10:28 +0100, Albe Laurenz wrote:

Simon Riggs wrote:

So if we perform our database backups with incremental
backups as described above, we could end up with additional
files after the restore, because PostgreSQL files can get
deleted (e.g. during DROP TABLE or TRUNCATE TABLE).

Could such "resurrected" files (data files, files in
pg_xlog, pg_clog or elsewhere) cause a problem for
the database?

The backup is not instantaneous, so there is no single time
at which the hot backup takes place. So deciding whether
a file has changed based upon a comparison of two file timestamps
cannot work. You would need to take timestamps for the file both
before the pg_start_backup() and after the pg_stop_backup()
of the file for both full and incremental backups.
If all four timestamps are equivalent, then you are safe.

I think that understanding is finally dawning here.

The problem you see is that the backup software might decide
that the file has not been changed, skip it and go on backing
up other files, but the file can still be modified before
pg_stop_backup(), correct?

Correct.

I doubt whether the backup software performs these checks, but if you
have time please bug your vendor for details. Note that the check must
be performed *after* the backup, to see whether the backup itself is
valid, which is a little strange. That only makes sense when using
frozen snapshot techniques.

Any information about specific vendors interface is always welcome.
Speaking to them directly hasn't been very effective, but when there is
a paying customer they are much more responsive.

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com

#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#12)
Re: "Resurrected" data files - problem?

Simon Riggs <simon@2ndquadrant.com> writes:

On Fri, 2007-11-09 at 10:28 +0100, Albe Laurenz wrote:

I think that understanding is finally dawning here.

The problem you see is that the backup software might decide
that the file has not been changed, skip it and go on backing
up other files, but the file can still be modified before
pg_stop_backup(), correct?

Correct.

Surely that's nonsense --- otherwise a time-extended base backup
could not work either.

What is required of the filesystem backup process is that each 8K page
of each file be restored to a state that it had at some time between
pg_start_backup and pg_stop_backup. The exact time can be different for
different pages. I don't see a reason to think that a base+incremental
backup method can't meet that requirement.

regards, tom lane

#14Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Tom Lane (#13)
Re: "Resurrected" data files - problem?

Tom Lane wrote:

The problem you see is that the backup software might decide
that the file has not been changed, skip it and go on backing
up other files, but the file can still be modified before
pg_stop_backup(), correct?

Correct.

Surely that's nonsense --- otherwise a time-extended base backup
could not work either.

What is required of the filesystem backup process is that each 8K page
of each file be restored to a state that it had at some time between
pg_start_backup and pg_stop_backup. The exact time can be
different for different pages. I don't see a reason to think that a
base+incremental backup method can't meet that requirement.

Ho hum, you're right.
In the scenario on top, the file that was not backed up during
incremental backup will - after restore of full and incremental
backup - be in a state that was valid at some time between
pg_start_backup and pg_stop_backup.

So that should work.

What remains are the resurrected files. You said that they shouldn't
be a problem, right? What happens if PostgreSQL wants to create a
new data file and that file already exists? Could it lead to
problems in other directories like pg_clog?

Thank you all for your replies, BTW.

Yours,
Laurenz Albe

#15Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#13)
Re: "Resurrected" data files - problem?

On Fri, 2007-11-09 at 10:59 -0500, Tom Lane wrote:

Simon Riggs <simon@2ndquadrant.com> writes:

On Fri, 2007-11-09 at 10:28 +0100, Albe Laurenz wrote:

I think that understanding is finally dawning here.

The problem you see is that the backup software might decide
that the file has not been changed, skip it and go on backing
up other files, but the file can still be modified before
pg_stop_backup(), correct?

Correct.

Surely that's nonsense --- otherwise a time-extended base backup
could not work either.

What is required of the filesystem backup process is that each 8K page
of each file be restored to a state that it had at some time between
pg_start_backup and pg_stop_backup. The exact time can be different for
different pages. I don't see a reason to think that a base+incremental
backup method can't meet that requirement.

Hmm, OK, I do think we can improve on what I said before. What I said
was safe though, not nonsense, plus I think the difference between the
two is a hair's breadth in practice. Most of the time we're taking about
excluding historical data partitions, designed specifically to minimise
backup windows and data maintenance.

The question is which timestamps do you compare in order to arrive at
the maximal set of files that don't need to be backed up twice?

I'm assuming we're talking about starting recovery at the second
pg_start_backup(). Any file that changes during the first backup cannot
be a candidate, so timestamp before first backup must equal timestamp
before second backup. So we can effectively reduce things to just 2
timestamps, not 4 as I had originally said.

So we have:

pg_start_backup()
timestamp1
full backup
pg_stop_backup()

...

pg_start_backup() --- WAL chain starts here
timestamp2
incremental backup
pg_stop_backup()

Any file for which ts1 == ts2 can be skipped during the incremental
backup.

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com