backup_label and server start

Started by Albe Laurenz about 18 years ago, 13 messages
#1Albe Laurenz
laurenz.albe@wien.gv.at

If the postmaster is stopped with 'pg_ctl stop' while an
online backup is in progress, the 'backup_label' file will remain
in the data directory.

There is no recovery.conf file present.

When the server is started again, it attempts to recover from
the checkpoint marked in the backup_label file even if the
shutdown was clean.

If the WAL file mentioned in backup_label is not in pg_xlog
(it has already been archived and removed because there was
enough database activity since pg_start_backup()), the startup
process will fail with a message like this:

LOG: could not open file "pg_xlog/000000020000000000000084" (log file 0, segment 132): No such file or directory
LOG: invalid checkpoint record
PANIC: could not locate required checkpoint record
HINT: If you are not restoring from a backup, try removing the file "/POSTGRES/data/PG820/backup_label".
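
For reference, the file name in the first LOG line can be decoded by hand. What follows is a minimal sketch, assuming the usual 24-hex-digit WAL segment naming (8 digits each for timeline, log file and segment). The function name `parse_wal_segment_name` is made up for this illustration and is not part of PostgreSQL.

```python
import re

# Assumption: a WAL segment file name is exactly 24 hex digits,
# laid out as timeline (8) + log file (8) + segment (8).
_SEGMENT_NAME_RE = re.compile(r"[0-9A-Fa-f]{24}")


def parse_wal_segment_name(name: str) -> dict:
    """Split a WAL segment file name into timeline, log and segment numbers."""
    if not _SEGMENT_NAME_RE.fullmatch(name):
        raise ValueError(f"not a 24-hex-digit WAL segment name: {name!r}")
    return {
        "timeline": int(name[0:8], 16),
        "log": int(name[8:16], 16),
        "segment": int(name[16:24], 16),
    }


if __name__ == "__main__":
    # The file named in the LOG line above.
    print(parse_wal_segment_name("000000020000000000000084"))
    # prints {'timeline': 2, 'log': 0, 'segment': 132}
```

Hex 84 is decimal 132, which agrees with "log file 0, segment 132" in the message; the leading 00000002 is the timeline.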

My question:
Is it safe to just delete the file as the hint suggests?

I see the following comment in src/backend/access/transam/xlog.c:

/*
 * read_backup_label: check to see if a backup_label file is present
 *
 * If we see a backup_label during recovery, we assume that we are recovering
 * from a backup dump file, and we therefore roll forward from the checkpoint
 * identified by the label file, NOT what pg_control says. This avoids the
 * problem that pg_control might have been archived one or more checkpoints
 * later than the start of the dump, and so if we rely on it as the start
 * point, we will fail to restore a consistent database state.

"We will fail to restore a consistent database state"
sounds rather intimidating.

*If* - on the other hand - it is safe to follow the hint
and remove the backup_label, wouldn't it be a good thing
for the startup process to ignore (and rename) the backup_label
file if no recovery.conf is present?

Or, alternatively, the backup_label file could be removed by a
clean shutdown.

Thanks,
Laurenz Albe

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Albe Laurenz (#1)
Re: backup_label and server start

"Albe Laurenz" <laurenz.albe@wien.gv.at> writes:

wouldn't it be a good thing
for the startup process to ignore (and rename) the backup_label
file if no recovery.conf is present?

No, it certainly wouldn't.

I don't see why we should simplify the bizarre case you're talking about
at the price of putting land mines under the feet of people who are
actually trying to do a restore. It hasn't lost any data for you,
and it gave you a correct HINT, so I don't have a problem with the
current behavior.

regards, tom lane

#3Simon Riggs
simon@2ndquadrant.com
In reply to: Albe Laurenz (#1)
Re: backup_label and server start

On Tue, 2007-11-20 at 15:19 +0100, Albe Laurenz wrote:

"We will fail to restore a consistent database state"
sounds rather intimidating.

Well, how else should a warning of data loss sound? :-)

It's vaguely possible that the database state could be consistent, if
the server were quiet when you stopped it. But that is unlikely *and*
there is no way of knowing for certain, that is why we introduced
pg_stop_backup() in the first place.

*If* - on the other hand - it is safe to follow the hint
and remove the backup_label, wouldn't it be a good thing
for the startup process to ignore (and rename) the backup_label
file if no recovery.conf is present?

The hint is telling you how to restart the original server, not a crafty
way of cheating the process to allow you to use it for backup.

What are you trying to do?

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com

#4Albe Laurenz
laurenz.albe@wien.gv.at
In reply to: Tom Lane (#2)
Re: backup_label and server start

If the postmaster is stopped with 'pg_ctl stop' while an
online backup is in progress, the 'backup_label' file will remain
in the data directory.

[...]

the startup process will fail with a message like this:

[...]

PANIC: could not locate required checkpoint record
HINT: If you are not restoring from a backup, try removing the file "/POSTGRES/data/PG820/backup_label".

wouldn't it be a good thing
for the startup process to ignore (and rename) the backup_label
file if no recovery.conf is present?

Tom Lane replied:

No, it certainly wouldn't.

Point taken. When backup_label is present and recovery.conf isn't,
there is the risk that the data directory has been restored from
an online backup, in which case using the latest available
checkpoint would be detrimental.
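
A minimal sketch of how an administrator might collect these facts before deciding anything. It assumes that backup_label carries a line of the form `START WAL LOCATION: <location> (file <segment name>)` and that WAL lives in pg_xlog under the data directory. The function name `inspect_data_dir` and the report keys are hypothetical. The sketch only reports and never deletes anything, because the files alone cannot tell an interrupted backup on the original server apart from a restored online backup.

```python
import re
import tempfile
from pathlib import Path

# Assumed line format inside backup_label, e.g.
#   START WAL LOCATION: 0/84000020 (file 000000020000000000000084)
START_WAL_RE = re.compile(
    r"^START WAL LOCATION: \S+ \(file ([0-9A-Fa-f]{24})\)", re.MULTILINE
)


def inspect_data_dir(data_dir, wal_subdir="pg_xlog"):
    """Report which recovery-related files exist; take no action."""
    data_dir = Path(data_dir)
    label = data_dir / "backup_label"
    report = {
        "backup_label_present": label.is_file(),
        "recovery_conf_present": (data_dir / "recovery.conf").is_file(),
        "start_wal_file": None,          # segment named in backup_label
        "start_wal_file_present": None,  # is that segment still in pg_xlog?
    }
    if report["backup_label_present"]:
        match = START_WAL_RE.search(label.read_text())
        if match:
            segment = match.group(1)
            report["start_wal_file"] = segment
            report["start_wal_file_present"] = (
                data_dir / wal_subdir / segment
            ).is_file()
    return report


if __name__ == "__main__":
    # Demo on a throwaway directory with made-up sample content.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "pg_xlog").mkdir()
        (root / "backup_label").write_text(
            "START WAL LOCATION: 0/84000020 "
            "(file 000000020000000000000084)\n"
        )
        for key, value in inspect_data_dir(root).items():
            print(f"{key}: {value}")
```

A report with backup_label present, recovery.conf absent and the start segment missing from pg_xlog corresponds to the failing restart described at the top of this thread. Whether removing the label is then the right move still depends on knowing where the data directory came from.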

I don't see why we should simplify the bizarre case you're
talking about

Well, it's not a bizarre case, it has happened twice here.

If somebody stops the postmaster while an online backup is
in progress, there is no warning or nothing. Only the server
will fail to restart.

One of our databases is running in a RedHat cluster, which
in this case cannot failover to another node.
And this can also happen during an online backup.

Simon Riggs replied:

The hint is telling you how to restart the original server, not a crafty
way of cheating the process to allow you to use it for backup.

What are you trying to do?

You misunderstood me, I'm not trying to cheat anything, nor do
I want to restore a backup that way.

All I want to do is restart a server after a clean shutdown.

How about my second suggestion:

Remove backup_label when the server shuts down cleanly.
In that case an online backup in progress will not be useful
anyway, and there is no need to recover on server restart.

What do you think?

Yours,
Laurenz Albe

#5Simon Riggs
simon@2ndquadrant.com
In reply to: Albe Laurenz (#4)
Re: backup_label and server start

On Wed, 2007-11-21 at 09:04 +0100, Albe Laurenz wrote:

If somebody stops the postmaster while an online backup is
in progress, there is no warning or nothing. Only the server
will fail to restart.

Well, it seems best not to do this. There is always a need for a careful
procedure to manually shutdown a live server, interlocking with other
applications. ISTM like a manual procedure will resolve this for you.

If we remove the file in the place you suggest then an Archive Recovery
will succeed when it should fail, with no possibility of a hint, which
seems a worse error.

All I want to do is restart a server after a clean shutdown.

How about my second suggestion:

Remove backup_label when the server shuts down cleanly.
In that case an online backup in progress will not be useful
anyway, and there is no need to recover on server restart.

That will make PITRs fail:

1. pg_start_backup()
2. backup
3. shutdown, removes backup_label
4. pg_stop_backup()

step 4 will now fail because of a missing backup_label file.

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com

#6Peter Childs
peterachilds@gmail.com
In reply to: Simon Riggs (#5)
Re: backup_label and server start

On 21/11/2007, Simon Riggs <simon@2ndquadrant.com> wrote:

On Wed, 2007-11-21 at 09:04 +0100, Albe Laurenz wrote:

If somebody stops the postmaster while an online backup is
in progress, there is no warning or nothing. Only the server
will fail to restart.

Well, it seems best not to do this. There is always a need for a careful
procedure to manually shutdown a live server, interlocking with other
applications. ISTM like a manual procedure will resolve this for you.

If we remove the file in the place you suggest then an Archive Recovery
will succeed when it should fail, with no possibility of a hint, which
seems a worse error.

All I want to do is restart a server after a clean shutdown.

How about my second suggestion:

Remove backup_label when the server shuts down cleanly.
In that case an online backup in progress will not be useful
anyway, and there is no need to recover on server restart.

That will make PITRs fail:

1. pg_start_backup()
2. backup
3. shutdown, removes backup_label
4. pg_stop_backup()

step 4 will now fail because of a missing backup_label file.

How about this, emit a warning on shutdown and fail to shutdown until the
backup has finished.

Seems to me that either way you're sunk if you shut down a server while a
backup is in progress. Your only way out is to work out whether to use the
previous PITR backups plus logs or remove the label. Doing it automatically
would be very, very dangerous.

Peter.

#7Simon Riggs
simon@2ndquadrant.com
In reply to: Peter Childs (#6)
Re: backup_label and server start

On Wed, 2007-11-21 at 09:47 +0000, Peter Childs wrote:

How about this, emit a warning on shutdown and fail to shutdown until
the backup has finished.

That would be reasonable for -m smart shutdown.

We would then be treating the backup as a connection.

...but not for a fast shutdown.

Any comments against?

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com

#8Albe Laurenz
laurenz.albe@wien.gv.at
In reply to: Simon Riggs (#5)
Re: backup_label and server start

Simon Riggs wrote:

If somebody stops the postmaster while an online backup is
in progress, there is no warning or nothing. Only the server
will fail to restart.

Well, it seems best not to do this. There is always a need
for a careful
procedure to manually shutdown a live server, interlocking with other
applications. ISTM like a manual procedure will resolve this for you.

You're arguing that there *should* be a manual intervention
if a server was shutdown while a backup was active.

If we remove the file in the place you suggest then an Archive Recovery
will succeed when it should fail, with no possibility of a hint, which
seems a worse error.

How about my second suggestion:

Remove backup_label when the server shuts down cleanly.
In that case an online backup in progress will not be useful
anyway, and there is no need to recover on server restart.

That will make PITRs fail:

1. pg_start_backup()
2. backup
3. shutdown, removes backup_label
4. pg_stop_backup()

step 4 will now fail because of a missing backup_label file.

Using the same kind of argument as you did above I would
say that pg_stop_backup() *should* fail if the server
restarted (and recovered!) in between - there was certainly something
fishy going on during the online backup.

In your list, you left out step 3.5: restart the server.
This step may fail if you do *not* remove the backup_label.

What is worse:
- Have pg_stop_backup() fail if the server was shut down
during the backup
or
- Prevent the server from restarting at all without manual
intervention.

I would say the latter.

Yours,
Laurenz Albe

#9Simon Riggs
simon@2ndquadrant.com
In reply to: Albe Laurenz (#8)
Re: backup_label and server start

On Wed, 2007-11-21 at 15:04 +0100, Albe Laurenz wrote:

Simon Riggs wrote:

If somebody stops the postmaster while an online backup is
in progress, there is no warning or nothing. Only the server
will fail to restart.

Well, it seems best not to do this. There is always a need
for a careful
procedure to manually shutdown a live server, interlocking with other
applications. ISTM like a manual procedure will resolve this for you.

You're arguing that there *should* be a manual intervention
if a server was shutdown while a backup was active.

Shutting down the server was a manual action, so what is wrong in a
manual action to recover from that mistake?

If the shutdown was automatic, then it needs to be properly scheduled so
automatic actions do not conflict with one another.

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com

#10Albe Laurenz
laurenz.albe@wien.gv.at
In reply to: Simon Riggs (#5)
Re: backup_label and server start

Simon Riggs wrote:

That will make PITRs fail:

1. pg_start_backup()
2. backup
3. shutdown, removes backup_label
4. pg_stop_backup()

step 4 will now fail because of a missing backup_label file.

Wait a minute:
pg_stop_backup() will also fail in the current setup,
because after recovery backup_label gets renamed
to backup_label.old.

So what do we lose if we remove (or rename) backup_label
on a clean server shutdown?

Yours,
Laurenz Albe

#11Albe Laurenz
laurenz.albe@wien.gv.at
In reply to: Simon Riggs (#7)
Re: backup_label and server start

Simon Riggs wrote:

On Wed, 2007-11-21 at 09:47 +0000, Peter Childs wrote:

How about this, emit a warning on shutdown and fail to shutdown until
the backup has finished.

That would be reasonable for -m smart shutdown.

We would then be treating the backup as a connection.

...but not for a fast shutdown.

Any comments against?

No, that would be ok with me.

Anything that gets us out of the trap that you can shutdown
a server without any warning and then cannot restart it without
manual intervention.

What about: refuse shutdown for "smart" if a backup is in progress,
but shutdown with a loud warning for "fast".
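
The proposed rule can be written down as a small decision function. This is only a sketch of the policy discussed in this thread, not of any existing server behaviour; the function name `shutdown_policy` and the return values "proceed", "refuse" and "warn" are invented for illustration. Only the "smart" and "fast" modes are handled, because those are the only ones the proposal covers.

```python
def shutdown_policy(mode: str, backup_in_progress: bool) -> str:
    """Return what a shutdown request should do under the proposed rule.

    "proceed": shut down normally
    "refuse":  do not shut down while the backup is still running
    "warn":    shut down, but emit a loud warning first
    """
    if mode not in ("smart", "fast"):
        # Other shutdown modes were not part of the proposal.
        raise ValueError(f"mode not covered by this sketch: {mode!r}")
    if not backup_in_progress:
        return "proceed"
    if mode == "smart":
        # Treat the running backup like an open connection.
        return "refuse"
    return "warn"


if __name__ == "__main__":
    for mode in ("smart", "fast"):
        for backup in (False, True):
            print(mode, backup, "->", shutdown_policy(mode, backup))
```

Either outcome avoids the silent case: the operator learns at shutdown time that a backup was in progress, rather than at the next failed start.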

... I still don't know what's wrong with removing backup_label
upon a clean server shutdown ...

Yours,
Laurenz Albe

#12Bruce Momjian
bruce@momjian.us
In reply to: Albe Laurenz (#4)
Re: backup_label and server start

This has been saved for the 8.4 release:

http://momjian.postgresql.org/cgi-bin/pgpatches_hold

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#13Bruce Momjian
bruce@momjian.us
In reply to: Albe Laurenz (#4)
Re: backup_label and server start

Added to TODO:

o Fix server restart problem when the server was shutdown during
a PITR backup

http://archives.postgresql.org/pgsql-hackers/2007-11/msg00800.php

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +