Improve error context after some failed XLogReadRecord()
Hi all,
I have noticed $subject while looking at a patch in this area:
/messages/by-id/YYiqxtEqhgb5gIWL@paquier.xyz
While we don't expect the four callers of XLogReadRecord() related to
logical decoding to fail in the code paths changed by the patch
attached, I think that it would be good to provide more context to
users rather than just emitting an elog(ERROR) coming directly from
the WAL reader. This way, it becomes a bit easier to guess from where
the failure is coming.
A patch is attached to improve all those elog()s. I have tried not to
be too fancy about those new error strings, but I am sure that these
could be tweaked.
Thoughts?
--
Michael
Attachments:
xlogreader-errs.patchtext/x-diff; charset=us-asciiDownload
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index aae0ae5b8a..b7d9521576 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -605,9 +605,9 @@ DecodingContextFindStartpoint(LogicalDecodingContext *ctx)
/* the read_page callback waits for new WAL */
record = XLogReadRecord(ctx->reader, &err);
if (err)
- elog(ERROR, "%s", err);
+ elog(ERROR, "could not find logical decoding starting point: %s", err);
if (!record)
- elog(ERROR, "no record found"); /* shouldn't happen */
+ elog(ERROR, "could not find logical decoding starting point");
LogicalDecodingProcessRecord(ctx, ctx->reader);
diff --git a/src/backend/replication/logical/logicalfuncs.c b/src/backend/replication/logical/logicalfuncs.c
index 2609a0a710..6cd2279a2e 100644
--- a/src/backend/replication/logical/logicalfuncs.c
+++ b/src/backend/replication/logical/logicalfuncs.c
@@ -276,7 +276,7 @@ pg_logical_slot_get_changes_guts(FunctionCallInfo fcinfo, bool confirm, bool bin
record = XLogReadRecord(ctx->reader, &errm);
if (errm)
- elog(ERROR, "%s", errm);
+ elog(ERROR, "could not find record for logical decoding: %s", errm);
/*
* The {begin_txn,change,commit_txn}_wrapper callbacks above will
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index a80298ba53..9c6b4cf1ae 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -529,7 +529,8 @@ pg_logical_replication_slot_advance(XLogRecPtr moveto)
*/
record = XLogReadRecord(ctx->reader, &errm);
if (errm)
- elog(ERROR, "%s", errm);
+ elog(ERROR, "could not find record to advance replication slot: %s",
+ errm);
/*
* Process the record. Storage-level changes are ignored in
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index fff7dfc640..dd5066c3e3 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -2979,7 +2979,7 @@ XLogSendLogical(void)
/* xlog record was invalid */
if (errm != NULL)
- elog(ERROR, "%s", errm);
+ elog(ERROR, "could not find record to send logically-decoded data: %s", errm);
if (record != NULL)
{
At Tue, 9 Nov 2021 10:47:11 +0900, Michael Paquier <michael@paquier.xyz> wrote in
Hi all,
I have noticed $subject while looking at a patch in this area:
/messages/by-id/YYiqxtEqhgb5gIWL@paquier.xyzWhile we don't expect the four callers of XLogReadRecord() related to
logical decoding to fail in the code paths changed by the patch
attached, I think that it would be good to provide more context to
users rather than just emitting an elog(ERROR) coming directly from
the WAL reader. This way, it becomes a bit easier to guess from where
the failure is coming.A patch is attached to improve all those elog()s. I have tried not to
be too fancy about those new error strings, but I am sure that these
could be tweaked.Thoughts?
As the whole, I agree that any substantial message is far better than
"%s" in regard to identifying the line that emitted a log line.
+ elog(ERROR, "could not find record to advance replication slot: %s",
I prefer "could not read record while advancing replcation slot",
because it is not finding a record but it is reading successive
records. However, I don't strongly object to the current wording.
+ elog(ERROR, "could not find record to send logically-decoded data: %s", errm);
I prefer something like the above for the same reason. How about
"could not read record while sending logically-decoded data:". The
same with above, I'm not strongly opposed to the current wording.
Otherwise it looks good to me.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
On Tue, Nov 09, 2021 at 04:58:11PM +0900, Kyotaro Horiguchi wrote:
Otherwise it looks good to me.
What you are proposing is better than what I had in those two
contexts, so I have just used what you have suggested for both of
them. Thanks!
--
Michael