Add checkpoint/redo LSNs to recovery errors.

Started by David Steeleabout 2 years ago4 messageshackers
Jump to latest
#1David Steele
david@pgmasters.net

Hackers,

This patch adds checkpoint/redo LSNs to recovery error messages where
they may be useful for debugging.

When backup_label is present the LSNs are already output in a log
message, but it still seems like a good idea to repeat them.

When backup_label is not present, the checkpoint LSN is not logged
unless backup recovery is in progress or the checkpoint is found. In the
case where a backup is restored but the backup_label is missing, the
checkpoint LSN is not logged so it is useful for debugging to have it in
the error message.

Regards,
-David

Attachments:

add-lsn-to-err-v1.patchtext/plain; charset=UTF-8; name=add-lsn-to-err-v1.patchDownload+6-4
#2Michael Paquier
michael@paquier.xyz
In reply to: David Steele (#1)
Re: Add checkpoint/redo LSNs to recovery errors.

On Thu, Feb 29, 2024 at 10:53:15AM +1300, David Steele wrote:

This patch adds checkpoint/redo LSNs to recovery error messages where they
may be useful for debugging.

Thanks for following up on that!

When backup_label is not present, the checkpoint LSN is not logged unless
backup recovery is in progress or the checkpoint is found. In the case where
a backup is restored but the backup_label is missing, the checkpoint LSN is
not logged so it is useful for debugging to have it in the error message.

             ereport(PANIC,
-                    (errmsg("could not locate a valid checkpoint record")));
+                    (errmsg("could not locate a valid checkpoint record at %X/%X",
+                            LSN_FORMAT_ARGS(CheckPointLoc))));
         }

I've seen this one in the field occasionally, so that's really a
welcome addition IMO.

I've scanned a bit xlogrecovery.c, and I have spotted a couple of that
could gain more information.

ereport(PANIC,
(errmsg("invalid redo record in shutdown checkpoint")));
[...]
ereport(PANIC,
(errmsg("invalid redo in checkpoint record")));
These two could mention CheckPointLoc and checkPoint.redo.

ereport(PANIC,
(errmsg("invalid next transaction ID")));
Perhaps some XID information could be added here?

ereport(FATAL,
(errmsg("WAL ends before consistent recovery point")));
[...]
ereport(FATAL,
(errmsg("WAL ends before end of online backup"),

These two are in xlog.c, and don't mention backupStartPoint for the
first one. Perhaps there's a point in adding some information about
EndOfLog and LocalMinRecoveryPoint as well?
--
Michael

#3David Steele
david@pgmasters.net
In reply to: Michael Paquier (#2)
Re: Add checkpoint/redo LSNs to recovery errors.

On 2/29/24 16:42, Michael Paquier wrote:

On Thu, Feb 29, 2024 at 10:53:15AM +1300, David Steele wrote:

This patch adds checkpoint/redo LSNs to recovery error messages where they
may be useful for debugging.

I've scanned a bit xlogrecovery.c, and I have spotted a couple of that
could gain more information.

ereport(PANIC,
(errmsg("invalid redo record in shutdown checkpoint")));
[...]
ereport(PANIC,
(errmsg("invalid redo in checkpoint record")));
These two could mention CheckPointLoc and checkPoint.redo.

ereport(PANIC,
(errmsg("invalid next transaction ID")));
Perhaps some XID information could be added here?

ereport(FATAL,
(errmsg("WAL ends before consistent recovery point")));
[...]
ereport(FATAL,
(errmsg("WAL ends before end of online backup"),

These two are in xlog.c, and don't mention backupStartPoint for the
first one. Perhaps there's a point in adding some information about
EndOfLog and LocalMinRecoveryPoint as well?

For now I'd like to just focus on these three messages that are related
to a missing backup_label or a misconfiguration of restore_command when
backup_label is present.

No doubt there are many other recovery log messages that could be
improved, but I'd rather not bog down the patch.

Regards,
-David

#4Michael Paquier
michael@paquier.xyz
In reply to: David Steele (#3)
Re: Add checkpoint/redo LSNs to recovery errors.

On Sun, Mar 10, 2024 at 04:58:19PM +1300, David Steele wrote:

No doubt there are many other recovery log messages that could be improved,
but I'd rather not bog down the patch.

Fair argument, and these additions are useful when taken
independently. I've applied your suggestions for now.
--
Michael