Improve error context after some failed XLogReadRecord()

Started by Michael Paquierabout 4 years ago3 messages
#1Michael Paquier
michael@paquier.xyz
1 attachment(s)

Hi all,

I have noticed $subject while looking at a patch in this area:
/messages/by-id/YYiqxtEqhgb5gIWL@paquier.xyz

While we don't expect the four callers of XLogReadRecord() related to
logical decoding to fail in the code paths changed by the patch
attached, I think that it would be good to provide more context to
users rather than just emitting an elog(ERROR) coming directly from
the WAL reader. This way, it becomes a bit easier to guess from where
the failure is coming.

A patch is attached to improve all those elog()s. I have tried not to
be too fancy about those new error strings, but I am sure that these
could be tweaked.

Thoughts?
--
Michael

Attachments:

xlogreader-errs.patchtext/x-diff; charset=us-asciiDownload
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index aae0ae5b8a..b7d9521576 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -605,9 +605,9 @@ DecodingContextFindStartpoint(LogicalDecodingContext *ctx)
 		/* the read_page callback waits for new WAL */
 		record = XLogReadRecord(ctx->reader, &err);
 		if (err)
-			elog(ERROR, "%s", err);
+			elog(ERROR, "could not find logical decoding starting point: %s", err);
 		if (!record)
-			elog(ERROR, "no record found"); /* shouldn't happen */
+			elog(ERROR, "could not find logical decoding starting point");
 
 		LogicalDecodingProcessRecord(ctx, ctx->reader);
 
diff --git a/src/backend/replication/logical/logicalfuncs.c b/src/backend/replication/logical/logicalfuncs.c
index 2609a0a710..6cd2279a2e 100644
--- a/src/backend/replication/logical/logicalfuncs.c
+++ b/src/backend/replication/logical/logicalfuncs.c
@@ -276,7 +276,7 @@ pg_logical_slot_get_changes_guts(FunctionCallInfo fcinfo, bool confirm, bool bin
 
 			record = XLogReadRecord(ctx->reader, &errm);
 			if (errm)
-				elog(ERROR, "%s", errm);
+				elog(ERROR, "could not find record for logical decoding: %s", errm);
 
 			/*
 			 * The {begin_txn,change,commit_txn}_wrapper callbacks above will
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index a80298ba53..9c6b4cf1ae 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -529,7 +529,8 @@ pg_logical_replication_slot_advance(XLogRecPtr moveto)
 			 */
 			record = XLogReadRecord(ctx->reader, &errm);
 			if (errm)
-				elog(ERROR, "%s", errm);
+				elog(ERROR, "could not find record to advance replication slot: %s",
+					 errm);
 
 			/*
 			 * Process the record.  Storage-level changes are ignored in
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index fff7dfc640..dd5066c3e3 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -2979,7 +2979,7 @@ XLogSendLogical(void)
 
 	/* xlog record was invalid */
 	if (errm != NULL)
-		elog(ERROR, "%s", errm);
+		elog(ERROR, "could not find record to send logically-decoded data: %s", errm);
 
 	if (record != NULL)
 	{
#2Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Michael Paquier (#1)
Re: Improve error context after some failed XLogReadRecord()

At Tue, 9 Nov 2021 10:47:11 +0900, Michael Paquier <michael@paquier.xyz> wrote in

Hi all,

I have noticed $subject while looking at a patch in this area:
/messages/by-id/YYiqxtEqhgb5gIWL@paquier.xyz

While we don't expect the four callers of XLogReadRecord() related to
logical decoding to fail in the code paths changed by the patch
attached, I think that it would be good to provide more context to
users rather than just emitting an elog(ERROR) coming directly from
the WAL reader. This way, it becomes a bit easier to guess from where
the failure is coming.

A patch is attached to improve all those elog()s. I have tried not to
be too fancy about those new error strings, but I am sure that these
could be tweaked.

Thoughts?

As the whole, I agree that any substantial message is far better than
"%s" in regard to identifying the line that emitted a log line.

+ elog(ERROR, "could not find record to advance replication slot: %s",

I prefer "could not read record while advancing replcation slot",
because it is not finding a record but it is reading successive
records. However, I don't strongly object to the current wording.

+ elog(ERROR, "could not find record to send logically-decoded data: %s", errm);

I prefer something like the above for the same reason. How about
"could not read record while sending logically-decoded data:". The
same with above, I'm not strongly opposed to the current wording.

Otherwise it looks good to me.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#3Michael Paquier
michael@paquier.xyz
In reply to: Kyotaro Horiguchi (#2)
Re: Improve error context after some failed XLogReadRecord()

On Tue, Nov 09, 2021 at 04:58:11PM +0900, Kyotaro Horiguchi wrote:

Otherwise it looks good to me.

What you are proposing is better than what I had in those two
contexts, so I have just used what you have suggested for both of
them. Thanks!
--
Michael