Confusing message on startup after a crash while recovering

Started by Florian G. Pflug almost 19 years ago, 3 messages
#1 Florian G. Pflug
fgp@phlo.org

Hi

When postgres crashes during recovery, and is then restarted, it
says:
"database system was interrupted while in recovery at ...
This probably means that some data is corrupted and
you will have to use the last backup for recovery."

When I first read that message, I assumed that there are cases where
postgres can't recover from a crash that happened during recovery.
I guessed that some operations done during WAL restore are not
idempotent, and lead to corrupt data if performed twice.

Only after actually reading the source code of xlog.c, and seeing that
a similar (but better worded) warning is output after a crash during
archive log replay, I realized that this warning probably just means
that corrupt data could be the _cause_ of the crash during recovery, not
the _result_ of a crash during recovery.

I'd suggest that the text is changed to something along the line of:
"database system was interrupted while in recovery at ...
If this has occurred more than once some data may be corrupted and
you may need to restore from the last backup."

This would also match the message for "interrupted while doing archive
log replay" more closely.

greetings, Florian Pflug

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Florian G. Pflug (#1)
Re: Confusing message on startup after a crash while recovering

"Florian G. Pflug" <fgp@phlo.org> writes:

I'd suggest that the text is changed to something along the line of:
"database system was interrupted while in recovery at ...
If this has occurred more than once some data may be corrupted and
you may need to restore from the last backup."

It seems the real problem is that it's not specifying *which* data is
probably corrupted. Maybe:

HINT: If recovery fails repeatedly, it probably means that the recovery log
data is corrupted; you may have to restore from your last full backup.

Also, do we want to suggest use of pg_resetxlog in the message?

regards, tom lane

#3 Florian G. Pflug
fgp@phlo.org
In reply to: Tom Lane (#2)
Re: Confusing message on startup after a crash while recovering

Tom Lane wrote:

"Florian G. Pflug" <fgp@phlo.org> writes:

I'd suggest that the text is changed to something along the line of:
"database system was interrupted while in recovery at ...
If this has occurred more than once some data may be corrupted and
you may need to restore from the last backup."

It seems the real problem is that it's not specifying *which* data is
probably corrupted. Maybe:

HINT: If recovery fails repeatedly, it probably means that the recovery log
data is corrupted; you may have to restore from your last full backup.

IMHO that wording would be fine too. The important point for me is to
state clearly that corrupted data may be the _cause_ of the crash, and
not the _effect_ of the crash. And for the sake of consistency, the
messages for abort-during-recovery and abort-during-archivelog-replay
should be similar.
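
To illustrate the consistency point, here is a minimal sketch in C of how
both interrupted-recovery cases could share one hint. It is not the code
in xlog.c: the enum, its member names, and both functions are hypothetical
and exist only for this example. The hint text is the wording proposed
above in this thread, and the "HINT:" prefix formatting is an assumption.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the state found at startup; these names are
 * illustrative only and are not taken from the postgres sources. */
typedef enum {
    STATE_SHUTDOWN,
    STATE_IN_CRASH_RECOVERY,
    STATE_IN_ARCHIVE_RECOVERY,
    STATE_IN_PRODUCTION
} PrevState;

/* One shared text, so the two recovery cases cannot drift apart. */
static const char *const RECOVERY_HINT =
    "If recovery fails repeatedly, it probably means that the recovery log "
    "data is corrupted; you may have to restore from your last full backup.";

/* Returns the hint for the given previous state, or NULL if none applies. */
const char *startup_hint(PrevState prev)
{
    switch (prev) {
    case STATE_IN_CRASH_RECOVERY:
    case STATE_IN_ARCHIVE_RECOVERY:
        return RECOVERY_HINT;
    default:
        return NULL;
    }
}

/* Writes "HINT:  <text>" into buf, or an empty string when there is no
 * hint. Returns 0 when there is no hint, otherwise snprintf's result. */
int format_startup_hint(PrevState prev, char *buf, size_t len)
{
    const char *hint = startup_hint(prev);

    assert(buf != NULL && len > 0);
    if (hint == NULL) {
        buf[0] = '\0';
        return 0;
    }
    return snprintf(buf, len, "HINT:  %s", hint);
}
```

Because both cases return the same string, a later rewording has to be
made in only one place, and the hint describes corruption as a possible
cause of repeated failures rather than as a consequence of the crash.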

Also, do we want to suggest use of pg_resetxlog in the message?

I'd rather add some documentation on how to use pg_resetxlog to the
manual if it's not already there, and maybe reference that chapter in
a HINT message. In that manual chapter you can warn about the dangers
of pg_resetxlog, and advise users to back up the database before
using it. I think such a warning is important, because any documentation
of pg_resetxlog is targeted at users who are not familiar with postgres
internals, and those users are likely to shoot themselves in the foot
if you point them to pg_resetxlog.

greetings, Florian Pflug