Add information to rm_redo_error_callback()

Started by Drouvot, Bertrandover 5 years ago13 messages
#1Drouvot, Bertrand
bdrouvot@amazon.com
1 attachment(s)

Hi hackers,

I've attached a small patch to add information to rm_redo_error_callback().

The changes attached in this patch came while working on the "Add
information during standby recovery conflicts" patch (See [1]https://commitfest.postgresql.org/29/2604).

The goal is to add more information during the callback (if doable), so
that something like:

2020-08-04 14:42:57.545 UTC [15459] CONTEXT:  WAL redo at 0/4A3B0DE0 for
Heap2/CLEAN: remxid 1168

would get extra information that way:

2020-08-04 14:42:57.545 UTC [15459] CONTEXT:  WAL redo at 0/4A3B0DE0 for
Heap2/CLEAN: remxid 1168, blkref #0: rel 1663/13586/16850 fork main blk 0

As this could be useful outside of [1]https://commitfest.postgresql.org/29/2604, a dedicated "sub" patch has been
created (thanks Sawada for the suggestion).

I will add this patch to the next commitfest. I look forward to your
feedback about the idea and/or implementation.

Regards,

Bertrand

[1]: https://commitfest.postgresql.org/29/2604

Attachments:

v1-0001-improve-rm_redo_error_callback.patchtext/plain; charset=UTF-8; name=v1-0001-improve-rm_redo_error_callback.patch; x-mac-creator=0; x-mac-type=0Download
 src/backend/access/transam/xlog.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 55cac186dc..cf3267a347 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -11745,10 +11745,22 @@ rm_redo_error_callback(void *arg)
 {
 	XLogReaderState *record = (XLogReaderState *) arg;
 	StringInfoData buf;
+	int block_id;
+	RelFileNode rnode;
+	ForkNumber forknum;
+	BlockNumber blknum;
 
 	initStringInfo(&buf);
 	xlog_outdesc(&buf, record);
 
+	for (block_id = 0; block_id <= record->max_block_id; block_id++)
+	{
+		if (XLogRecGetBlockTag(record, block_id, &rnode, &forknum, &blknum))
+			appendStringInfo(&buf,", blkref #%d: rel %u/%u/%u fork %s blk %u",
+						block_id, rnode.spcNode, rnode.dbNode,
+						rnode.relNode, forkNames[forknum],
+						blknum);
+	}
 	/* translator: %s is a WAL record description */
 	errcontext("WAL redo at %X/%X for %s",
 			   (uint32) (record->ReadRecPtr >> 32),
#2Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: Drouvot, Bertrand (#1)
Re: Add information to rm_redo_error_callback()

On Wed, 5 Aug 2020 at 00:37, Drouvot, Bertrand <bdrouvot@amazon.com> wrote:

Hi hackers,

I've attached a small patch to add information to rm_redo_error_callback().

The changes attached in this patch came while working on the "Add
information during standby recovery conflicts" patch (See [1]).

The goal is to add more information during the callback (if doable), so
that something like:

2020-08-04 14:42:57.545 UTC [15459] CONTEXT: WAL redo at 0/4A3B0DE0 for
Heap2/CLEAN: remxid 1168

would get extra information that way:

2020-08-04 14:42:57.545 UTC [15459] CONTEXT: WAL redo at 0/4A3B0DE0 for
Heap2/CLEAN: remxid 1168, blkref #0: rel 1663/13586/16850 fork main blk 0

As this could be useful outside of [1], a dedicated "sub" patch has been
created (thanks Sawada for the suggestion).

I will add this patch to the next commitfest. I look forward to your
feedback about the idea and/or implementation.

Thank you for starting the new thread for this patch!

I think this patch is simple enough and improves information shown in
errcontext.

I have two comments on the patch:

diff --git a/src/backend/access/transam/xlog.c
b/src/backend/access/transam/xlog.c
index 756b838e6a..8b2024e9e9 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -11749,10 +11749,22 @@ rm_redo_error_callback(void *arg)
 {
    XLogReaderState *record = (XLogReaderState *) arg;
    StringInfoData buf;
+   int block_id;
+   RelFileNode rnode;
+   ForkNumber forknum;
+   BlockNumber blknum;

initStringInfo(&buf);
xlog_outdesc(&buf, record);

+   for (block_id = 0; block_id <= record->max_block_id; block_id++)
+   {
+       if (XLogRecGetBlockTag(record, block_id, &rnode, &forknum, &blknum))
+           appendStringInfo(&buf,", blkref #%d: rel %u/%u/%u fork %s blk %u",
+                       block_id, rnode.spcNode, rnode.dbNode,
+                       rnode.relNode, forkNames[forknum],
+                       blknum);
+   }
    /* translator: %s is a WAL record description */
    errcontext("WAL redo at %X/%X for %s",
               (uint32) (record->ReadRecPtr >> 32),

rnode, forknum and blknum can be declared within the for loop.

I think it's better to put a new line just before the comment starting
from "translator:".

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#3Drouvot, Bertrand
bdrouvot@amazon.com
In reply to: Masahiko Sawada (#2)
1 attachment(s)
Re: Add information to rm_redo_error_callback()

Hi,

On 8/10/20 7:10 AM, Masahiko Sawada wrote:

CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.

On Wed, 5 Aug 2020 at 00:37, Drouvot, Bertrand <bdrouvot@amazon.com> wrote:

Hi hackers,

I've attached a small patch to add information to rm_redo_error_callback().

The changes attached in this patch came while working on the "Add
information during standby recovery conflicts" patch (See [1]).

The goal is to add more information during the callback (if doable), so
that something like:

2020-08-04 14:42:57.545 UTC [15459] CONTEXT: WAL redo at 0/4A3B0DE0 for
Heap2/CLEAN: remxid 1168

would get extra information that way:

2020-08-04 14:42:57.545 UTC [15459] CONTEXT: WAL redo at 0/4A3B0DE0 for
Heap2/CLEAN: remxid 1168, blkref #0: rel 1663/13586/16850 fork main blk 0

As this could be useful outside of [1], a dedicated "sub" patch has been
created (thanks Sawada for the suggestion).

I will add this patch to the next commitfest. I look forward to your
feedback about the idea and/or implementation.

Thank you for starting the new thread for this patch!

I think this patch is simple enough and improves information shown in
errcontext.

I have two comments on the patch:

diff --git a/src/backend/access/transam/xlog.c
b/src/backend/access/transam/xlog.c
index 756b838e6a..8b2024e9e9 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -11749,10 +11749,22 @@ rm_redo_error_callback(void *arg)
{
XLogReaderState *record = (XLogReaderState *) arg;
StringInfoData buf;
+   int block_id;
+   RelFileNode rnode;
+   ForkNumber forknum;
+   BlockNumber blknum;

initStringInfo(&buf);
xlog_outdesc(&buf, record);

+   for (block_id = 0; block_id <= record->max_block_id; block_id++)
+   {
+       if (XLogRecGetBlockTag(record, block_id, &rnode, &forknum, &blknum))
+           appendStringInfo(&buf,", blkref #%d: rel %u/%u/%u fork %s blk %u",
+                       block_id, rnode.spcNode, rnode.dbNode,
+                       rnode.relNode, forkNames[forknum],
+                       blknum);
+   }
/* translator: %s is a WAL record description */
errcontext("WAL redo at %X/%X for %s",
(uint32) (record->ReadRecPtr >> 32),

rnode, forknum and blknum can be declared within the for loop.

I think it's better to put a new line just before the comment starting
from "translator:".

Thanks for looking at it!

I've attached a new version as per your comments.

Thanks,

Bertrand

Attachments:

v1-0002-improve-rm_redo_error_callback.patchtext/plain; charset=UTF-8; name=v1-0002-improve-rm_redo_error_callback.patch; x-mac-creator=0; x-mac-type=0Download
 src/backend/access/transam/xlog.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 55cac186dc..757b2bf3d0 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -11745,10 +11745,24 @@ rm_redo_error_callback(void *arg)
 {
 	XLogReaderState *record = (XLogReaderState *) arg;
 	StringInfoData buf;
+	int block_id;
 
 	initStringInfo(&buf);
 	xlog_outdesc(&buf, record);
 
+	for (block_id = 0; block_id <= record->max_block_id; block_id++)
+	{
+		RelFileNode rnode;
+		ForkNumber forknum;
+		BlockNumber blknum;
+
+		if (XLogRecGetBlockTag(record, block_id, &rnode, &forknum, &blknum))
+			appendStringInfo(&buf,", blkref #%d: rel %u/%u/%u fork %s blk %u",
+						block_id, rnode.spcNode, rnode.dbNode,
+						rnode.relNode, forkNames[forknum],
+						blknum);
+	}
+
 	/* translator: %s is a WAL record description */
 	errcontext("WAL redo at %X/%X for %s",
 			   (uint32) (record->ReadRecPtr >> 32),
#4Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: Drouvot, Bertrand (#3)
Re: Add information to rm_redo_error_callback()

On Tue, 11 Aug 2020 at 00:07, Drouvot, Bertrand <bdrouvot@amazon.com> wrote:

Hi,

On 8/10/20 7:10 AM, Masahiko Sawada wrote:

CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.

On Wed, 5 Aug 2020 at 00:37, Drouvot, Bertrand <bdrouvot@amazon.com> wrote:

Hi hackers,

I've attached a small patch to add information to rm_redo_error_callback().

The changes attached in this patch came while working on the "Add
information during standby recovery conflicts" patch (See [1]).

The goal is to add more information during the callback (if doable), so
that something like:

2020-08-04 14:42:57.545 UTC [15459] CONTEXT: WAL redo at 0/4A3B0DE0 for
Heap2/CLEAN: remxid 1168

would get extra information that way:

2020-08-04 14:42:57.545 UTC [15459] CONTEXT: WAL redo at 0/4A3B0DE0 for
Heap2/CLEAN: remxid 1168, blkref #0: rel 1663/13586/16850 fork main blk 0

As this could be useful outside of [1], a dedicated "sub" patch has been
created (thanks Sawada for the suggestion).

I will add this patch to the next commitfest. I look forward to your
feedback about the idea and/or implementation.

Thank you for starting the new thread for this patch!

I think this patch is simple enough and improves information shown in
errcontext.

I have two comments on the patch:

diff --git a/src/backend/access/transam/xlog.c
b/src/backend/access/transam/xlog.c
index 756b838e6a..8b2024e9e9 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -11749,10 +11749,22 @@ rm_redo_error_callback(void *arg)
{
XLogReaderState *record = (XLogReaderState *) arg;
StringInfoData buf;
+   int block_id;
+   RelFileNode rnode;
+   ForkNumber forknum;
+   BlockNumber blknum;

initStringInfo(&buf);
xlog_outdesc(&buf, record);

+   for (block_id = 0; block_id <= record->max_block_id; block_id++)
+   {
+       if (XLogRecGetBlockTag(record, block_id, &rnode, &forknum, &blknum))
+           appendStringInfo(&buf,", blkref #%d: rel %u/%u/%u fork %s blk %u",
+                       block_id, rnode.spcNode, rnode.dbNode,
+                       rnode.relNode, forkNames[forknum],
+                       blknum);
+   }
/* translator: %s is a WAL record description */
errcontext("WAL redo at %X/%X for %s",
(uint32) (record->ReadRecPtr >> 32),

rnode, forknum and blknum can be declared within the for loop.

I think it's better to put a new line just before the comment starting
from "translator:".

Thanks for looking at it!

I've attached a new version as per your comments.

Thank you for updating the patch!

The patch looks good to me. I've set this patch as Ready for Committer.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#5Michael Paquier
michael@paquier.xyz
In reply to: Masahiko Sawada (#4)
Re: Add information to rm_redo_error_callback()

On Tue, Aug 11, 2020 at 02:45:50PM +0900, Masahiko Sawada wrote:

Thank you for updating the patch!

The patch looks good to me. I've set this patch as Ready for Committer.

+       for (block_id = 0; block_id <= record->max_block_id; block_id++)
+       {
+               RelFileNode rnode;
+               ForkNumber forknum;
+               BlockNumber blknum;

Doesn't this potentially create duplicate information in some of the
RM's desc() callbacks, and are we sure that the information provided
is worth having for all the RMs? As one example, gin_desc() looks at
some of the block information, so there are overlaps. It may be
worth thinking about showing more information for has_image and
apply_image if a block is in_use?
--
Michael

#6Masahiko Sawada
masahiko.sawada@2ndquadrant.com
In reply to: Michael Paquier (#5)
Re: Add information to rm_redo_error_callback()

On Tue, 11 Aug 2020 at 15:30, Michael Paquier <michael@paquier.xyz> wrote:

On Tue, Aug 11, 2020 at 02:45:50PM +0900, Masahiko Sawada wrote:

Thank you for updating the patch!

The patch looks good to me. I've set this patch as Ready for Committer.

+       for (block_id = 0; block_id <= record->max_block_id; block_id++)
+       {
+               RelFileNode rnode;
+               ForkNumber forknum;
+               BlockNumber blknum;

Doesn't this potentially create duplicate information in some of the
RM's desc() callbacks, and are we sure that the information provided
is worth having for all the RMs? As one example, gin_desc() looks at
some of the block information, so there are overlaps.

Yeah, there is duplicate information in some RMs. I thought that we
can change individual RM’s desc() functions to output the block
information but as long as I see the pg_waldump outputs these are not
annoying to me and many of RM’s desc() doesn’t show the block
information.

It may be
worth thinking about showing more information for has_image and
apply_image if a block is in_use?

Yes. I’m okay with adding information for has_image and apply_image
but IMHO I'm not sure how these shown in errcontext would help. If an
error related to has_image or apply_image happens, errmsg should show
something detailed information about FPI.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#7Drouvot, Bertrand
bdrouvot@amazon.com
In reply to: Masahiko Sawada (#6)
Re: Add information to rm_redo_error_callback()

Hi,

Thanks for the feedback.

On 8/11/20 12:03 PM, Masahiko Sawada wrote:

CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.

On Tue, 11 Aug 2020 at 15:30, Michael Paquier <michael@paquier.xyz> wrote:

On Tue, Aug 11, 2020 at 02:45:50PM +0900, Masahiko Sawada wrote:

Thank you for updating the patch!

The patch looks good to me. I've set this patch as Ready for Committer.

+       for (block_id = 0; block_id <= record->max_block_id; block_id++)
+       {
+               RelFileNode rnode;
+               ForkNumber forknum;
+               BlockNumber blknum;

Doesn't this potentially create duplicate information in some of the
RM's desc() callbacks, and are we sure that the information provided
is worth having for all the RMs? As one example, gin_desc() looks at
some of the block information, so there are overlaps.

Yeah, there is duplicate information in some RMs. I thought that we
can change individual RM’s desc() functions to output the block
information but as long as I see the pg_waldump outputs these are not
annoying to me and many of RM’s desc() doesn’t show the block
information.

Having this "pg_waldump" kind of format in this place
(rm_redo_error_callback()) ensures that we'll always see the same piece
of information during rm_redo.

I think it's good to guarantee that we'll always see the same piece of
information (should a new RM desc() be created in the future for
example), even if it could lead to some information overlap in some cases.

It may be
worth thinking about showing more information for has_image and
apply_image if a block is in_use?

Yes. I’m okay with adding information for has_image and apply_image
but IMHO I'm not sure how these shown in errcontext would help. If an
error related to has_image or apply_image happens, errmsg should show
something detailed information about FPI.

I am ok too, but I am also not sure that errcontext is the right place
for that.

Thanks

Bertrand

Show quoted text

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#8Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Drouvot, Bertrand (#7)
Re: Add information to rm_redo_error_callback()

On 2020-Aug-17, Drouvot, Bertrand wrote:

Having this "pg_waldump" kind of format in this place
(rm_redo_error_callback()) ensures that we'll always see the same piece of
information during rm_redo.

I think it's good to guarantee that we'll always see the same piece of
information (should a new RM desc() be created in the future for example),
even if it could lead to some information overlap in some cases.

I agree.

I think we should treat the changes to remove rm_desc-specific info
items that are redundant as separate improvements that don't need to
block this patch. They would be, at worst, only minor annoyances.
And the removal, as was said, can affect other things that we might want
to think about separately.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#9Michael Paquier
michael@paquier.xyz
In reply to: Drouvot, Bertrand (#7)
Re: Add information to rm_redo_error_callback()

On Mon, Aug 17, 2020 at 05:47:13PM +0200, Drouvot, Bertrand wrote:

I think it's good to guarantee that we'll always see the same piece of
information (should a new RM desc() be created in the future for example),
even if it could lead to some information overlap in some cases.

I am ok too, but I am also not sure that errcontext is the right place for
that.

Hmm. I still think that knowing at least about a FPW could be an
interesting piece of information even here. Anyway, instead of
copying a logic that exists already in xlog_outrec(), why not moving
the block information print into a separate routine out of the
WAL_DEBUG section, and just reuse the same format for the context of
the redo error callback? That would also be more consistent with what
we do in pg_waldump where we don't show the fork name of a block when
it is on a MAIN_FORKNUM. And this would avoid a third copy of the
same logic. If we add the XID, previous LSN and the record length
on the stack of what is printed, we could just reuse the existing
routine, still that's perhaps too much information displayed.
--
Michael

#10Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#9)
1 attachment(s)
Re: Add information to rm_redo_error_callback()

On Thu, Sep 24, 2020 at 03:03:46PM +0900, Michael Paquier wrote:

Hmm. I still think that knowing at least about a FPW could be an
interesting piece of information even here. Anyway, instead of
copying a logic that exists already in xlog_outrec(), why not moving
the block information print into a separate routine out of the
WAL_DEBUG section, and just reuse the same format for the context of
the redo error callback? That would also be more consistent with what
we do in pg_waldump where we don't show the fork name of a block when
it is on a MAIN_FORKNUM. And this would avoid a third copy of the
same logic. If we add the XID, previous LSN and the record length
on the stack of what is printed, we could just reuse the existing
routine, still that's perhaps too much information displayed.

Seeing nothing, I took a swing at that, and finished with the
attached that refactors the logic and prints the block information as
wanted. Any objections to that?
--
Michael

Attachments:

redo-callback-block-v2.patchtext/x-diff; charset=us-asciiDownload
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 79a77ebbfe..e96075158a 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -940,6 +940,7 @@ static bool CheckForStandbyTrigger(void);
 #ifdef WAL_DEBUG
 static void xlog_outrec(StringInfo buf, XLogReaderState *record);
 #endif
+static void xlog_block_info(StringInfo buf, XLogReaderState *record);
 static void xlog_outdesc(StringInfo buf, XLogReaderState *record);
 static void pg_start_backup_callback(int code, Datum arg);
 static void pg_stop_backup_callback(int code, Datum arg);
@@ -10258,6 +10259,19 @@ xlog_outrec(StringInfo buf, XLogReaderState *record)
 	appendStringInfo(buf, "; len %u",
 					 XLogRecGetDataLen(record));
 
+	xlog_block_info(buf, record);
+}
+#endif							/* WAL_DEBUG */
+
+/*
+ * Returns a string giving information about all the blocks in an
+ * XLogRecord.
+ */
+static void
+xlog_block_info(StringInfo buf, XLogReaderState *record)
+{
+	int block_id;
+
 	/* decode block references */
 	for (block_id = 0; block_id <= record->max_block_id; block_id++)
 	{
@@ -10284,7 +10298,6 @@ xlog_outrec(StringInfo buf, XLogReaderState *record)
 			appendStringInfoString(buf, " FPW");
 	}
 }
-#endif							/* WAL_DEBUG */
 
 /*
  * Returns a string describing an XLogRecord, consisting of its identity
@@ -11765,6 +11778,7 @@ rm_redo_error_callback(void *arg)
 
 	initStringInfo(&buf);
 	xlog_outdesc(&buf, record);
+	xlog_block_info(&buf, record);
 
 	/* translator: %s is a WAL record description */
 	errcontext("WAL redo at %X/%X for %s",
#11Drouvot, Bertrand
bdrouvot@amazon.com
In reply to: Michael Paquier (#10)
Re: Add information to rm_redo_error_callback()

Hi,

On 10/1/20 9:41 AM, Michael Paquier wrote:

On Thu, Sep 24, 2020 at 03:03:46PM +0900, Michael Paquier wrote:

Hmm. I still think that knowing at least about a FPW could be an
interesting piece of information even here. Anyway, instead of
copying a logic that exists already in xlog_outrec(), why not moving
the block information print into a separate routine out of the
WAL_DEBUG section, and just reuse the same format for the context of
the redo error callback? That would also be more consistent with what
we do in pg_waldump where we don't show the fork name of a block when
it is on a MAIN_FORKNUM. And this would avoid a third copy of the
same logic. If we add the XID, previous LSN and the record length
on the stack of what is printed, we could just reuse the existing
routine, still that's perhaps too much information displayed.

Seeing nothing, I took a swing at that, and finished with the
attached that refactors the logic and prints the block information as
wanted. Any objections to that?

Sorry for the late reply and thanks for looking at it!

Had a look at it and did a few tests: looks all good to me.

No objections at all, thanks!

Bertrand

Show quoted text

--
Michael

#12Michael Paquier
michael@paquier.xyz
In reply to: Drouvot, Bertrand (#11)
Re: Add information to rm_redo_error_callback()

On Thu, Oct 01, 2020 at 11:18:30AM +0200, Drouvot, Bertrand wrote:

Had a look at it and did a few tests: looks all good to me.

No objections at all, thanks!

Thanks for double-checking. Applied, then.
--
Michael

#13Drouvot, Bertrand
bdrouvot@amazon.com
In reply to: Michael Paquier (#12)
Re: Add information to rm_redo_error_callback()

On 10/2/20 2:47 AM, Michael Paquier wrote:

On Thu, Oct 01, 2020 at 11:18:30AM +0200, Drouvot, Bertrand wrote:

Had a look at it and did a few tests: looks all good to me.

No objections at all, thanks!

Thanks for double-checking. Applied, then.

Thanks!

Bertrand