Add checkpoint/redo LSNs to recovery errors.
Hackers,
This patch adds checkpoint/redo LSNs to recovery error messages where
they may be useful for debugging.
When backup_label is present the LSNs are already output in a log
message, but it still seems like a good idea to repeat them.
When backup_label is not present, the checkpoint LSN is not logged
unless backup recovery is in progress or the checkpoint is found. In the
case where a backup is restored but the backup_label is missing, the
checkpoint LSN is not logged so it is useful for debugging to have it in
the error message.
Regards,
-David
Attachments:
add-lsn-to-err-v1.patchtext/plain; charset=UTF-8; name=add-lsn-to-err-v1.patchDownload+6-4
On Thu, Feb 29, 2024 at 10:53:15AM +1300, David Steele wrote:
This patch adds checkpoint/redo LSNs to recovery error messages where they
may be useful for debugging.
Thanks for following up on that!
When backup_label is not present, the checkpoint LSN is not logged unless
backup recovery is in progress or the checkpoint is found. In the case where
a backup is restored but the backup_label is missing, the checkpoint LSN is
not logged so it is useful for debugging to have it in the error message.
ereport(PANIC,
- (errmsg("could not locate a valid checkpoint record")));
+ (errmsg("could not locate a valid checkpoint record at %X/%X",
+ LSN_FORMAT_ARGS(CheckPointLoc))));
}
I've seen this one in the field occasionally, so that's really a
welcome addition IMO.
I've scanned a bit xlogrecovery.c, and I have spotted a couple of that
could gain more information.
ereport(PANIC,
(errmsg("invalid redo record in shutdown checkpoint")));
[...]
ereport(PANIC,
(errmsg("invalid redo in checkpoint record")));
These two could mention CheckPointLoc and checkPoint.redo.
ereport(PANIC,
(errmsg("invalid next transaction ID")));
Perhaps some XID information could be added here?
ereport(FATAL,
(errmsg("WAL ends before consistent recovery point")));
[...]
ereport(FATAL,
(errmsg("WAL ends before end of online backup"),
These two are in xlog.c, and don't mention backupStartPoint for the
first one. Perhaps there's a point in adding some information about
EndOfLog and LocalMinRecoveryPoint as well?
--
Michael
On 2/29/24 16:42, Michael Paquier wrote:
On Thu, Feb 29, 2024 at 10:53:15AM +1300, David Steele wrote:
This patch adds checkpoint/redo LSNs to recovery error messages where they
may be useful for debugging.I've scanned a bit xlogrecovery.c, and I have spotted a couple of that
could gain more information.ereport(PANIC,
(errmsg("invalid redo record in shutdown checkpoint")));
[...]
ereport(PANIC,
(errmsg("invalid redo in checkpoint record")));
These two could mention CheckPointLoc and checkPoint.redo.ereport(PANIC,
(errmsg("invalid next transaction ID")));
Perhaps some XID information could be added here?ereport(FATAL,
(errmsg("WAL ends before consistent recovery point")));
[...]
ereport(FATAL,
(errmsg("WAL ends before end of online backup"),These two are in xlog.c, and don't mention backupStartPoint for the
first one. Perhaps there's a point in adding some information about
EndOfLog and LocalMinRecoveryPoint as well?
For now I'd like to just focus on these three messages that are related
to a missing backup_label or a misconfiguration of restore_command when
backup_label is present.
No doubt there are many other recovery log messages that could be
improved, but I'd rather not bog down the patch.
Regards,
-David
On Sun, Mar 10, 2024 at 04:58:19PM +1300, David Steele wrote:
No doubt there are many other recovery log messages that could be improved,
but I'd rather not bog down the patch.
Fair argument, and these additions are useful when taken
independently. I've applied your suggestions for now.
--
Michael