Add LSN along with offset to error messages reported for WAL file read/write/validate header failures

Started by Bharath Rupireddyover 3 years ago42 messageshackers
Jump to latest
#1Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com

Hi,

I was recently asked about converting an offset reported in WAL read
error messages[1]errmsg("could not read from WAL segment %s, offset %u: %m", errmsg("could not read from WAL segment %s, offset %u: %m", errmsg("could not write to log file %s " "at offset %u, length %zu: %m", errmsg("unexpected timeline ID %u in WAL segment %s, offset %u", errmsg("could not read from WAL segment %s, offset %u: read %d of %zu", pg_log_error("received write-ahead log record for offset %u with no file open", "invalid magic number %04X in WAL segment %s, offset %u", "invalid info bits %04X in WAL segment %s, offset %u", "invalid info bits %04X in WAL segment %s, offset %u", "unexpected pageaddr %X/%X in WAL segment %s, offset %u", "out-of-sequence timeline ID %u (after %u) in WAL segment %s, offset %u", to an LSN with which pg_waldump can be used to
verify the records or WAL file around that LSN (basically one can
filter out the output based on LSN). AFAICS, there's no function that
takes offset as an input and produces an LSN and I ended up figuring
out LSN manually. And, for some of my work too, I was hitting errors
in XLogReaderValidatePageHeader() and adding recptr to those error
messages helped me debug issues faster.

We have a bunch of messages [1]errmsg("could not read from WAL segment %s, offset %u: %m", errmsg("could not read from WAL segment %s, offset %u: %m", errmsg("could not write to log file %s " "at offset %u, length %zu: %m", errmsg("unexpected timeline ID %u in WAL segment %s, offset %u", errmsg("could not read from WAL segment %s, offset %u: read %d of %zu", pg_log_error("received write-ahead log record for offset %u with no file open", "invalid magic number %04X in WAL segment %s, offset %u", "invalid info bits %04X in WAL segment %s, offset %u", "invalid info bits %04X in WAL segment %s, offset %u", "unexpected pageaddr %X/%X in WAL segment %s, offset %u", "out-of-sequence timeline ID %u (after %u) in WAL segment %s, offset %u", that have an offset, but not LSN in
the error message. Firstly, is there an easiest way to figure out LSN
from offset reported in the error messages? If not, is adding LSN to
these messages along with offset a good idea? Of course, we can't just
convert offset to LSN using XLogSegNoOffsetToRecPtr() and report, but
something meaningful like reporting the LSN of the page that we are
reading-in or writing-out etc.

Thoughts?

[1]: errmsg("could not read from WAL segment %s, offset %u: %m", errmsg("could not read from WAL segment %s, offset %u: %m", errmsg("could not write to log file %s " "at offset %u, length %zu: %m", errmsg("unexpected timeline ID %u in WAL segment %s, offset %u", errmsg("could not read from WAL segment %s, offset %u: read %d of %zu", pg_log_error("received write-ahead log record for offset %u with no file open", "invalid magic number %04X in WAL segment %s, offset %u", "invalid info bits %04X in WAL segment %s, offset %u", "invalid info bits %04X in WAL segment %s, offset %u", "unexpected pageaddr %X/%X in WAL segment %s, offset %u", "out-of-sequence timeline ID %u (after %u) in WAL segment %s, offset %u",
errmsg("could not read from WAL segment %s, offset %u: %m",
errmsg("could not read from WAL segment %s, offset %u: %m",
errmsg("could not write to log file %s "
"at offset %u, length %zu: %m",
errmsg("unexpected timeline ID %u in WAL segment %s, offset %u",
errmsg("could not read from WAL segment %s, offset %u: read %d of %zu",
pg_log_error("received write-ahead log record for offset %u with no file open",
"invalid magic number %04X in WAL segment %s, offset %u",
"invalid info bits %04X in WAL segment %s, offset %u",
"invalid info bits %04X in WAL segment %s, offset %u",
"unexpected pageaddr %X/%X in WAL segment %s, offset %u",
"out-of-sequence timeline ID %u (after %u) in WAL segment %s, offset %u",

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#2Nathan Bossart
nathandbossart@gmail.com
In reply to: Bharath Rupireddy (#1)
Re: Add LSN along with offset to error messages reported for WAL file read/write/validate header failures

On Mon, Sep 19, 2022 at 07:26:57PM +0530, Bharath Rupireddy wrote:

We have a bunch of messages [1] that have an offset, but not LSN in
the error message. Firstly, is there an easiest way to figure out LSN
from offset reported in the error messages? If not, is adding LSN to
these messages along with offset a good idea? Of course, we can't just
convert offset to LSN using XLogSegNoOffsetToRecPtr() and report, but
something meaningful like reporting the LSN of the page that we are
reading-in or writing-out etc.

It seems like you want the opposite of pg_walfile_name_offset(). Perhaps
we could add a function like pg_walfile_offset_lsn() that accepts a WAL
file name and byte offset and returns the LSN.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#3Nathan Bossart
nathandbossart@gmail.com
In reply to: Nathan Bossart (#2)
Re: Add LSN along with offset to error messages reported for WAL file read/write/validate header failures

On Mon, Sep 19, 2022 at 03:16:42PM -0700, Nathan Bossart wrote:

It seems like you want the opposite of pg_walfile_name_offset(). Perhaps
we could add a function like pg_walfile_offset_lsn() that accepts a WAL
file name and byte offset and returns the LSN.

Like so...

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachments:

v1-0001-introduce-pg_walfile_offset_lsn.patchtext/x-diff; charset=us-asciiDownload+86-1
#4Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Bharath Rupireddy (#1)
Re: Add LSN along with offset to error messages reported for WAL file read/write/validate header failures

On 2022-Sep-19, Bharath Rupireddy wrote:

We have a bunch of messages [1] that have an offset, but not LSN in
the error message. Firstly, is there an easiest way to figure out LSN
from offset reported in the error messages? If not, is adding LSN to
these messages along with offset a good idea? Of course, we can't just
convert offset to LSN using XLogSegNoOffsetToRecPtr() and report, but
something meaningful like reporting the LSN of the page that we are
reading-in or writing-out etc.

Maybe add errcontext() somewhere that reports the LSN would be
appropriate. For example, the page_read() callbacks have the LSN
readily available, so the ones in backend could install the errcontext
callback; or perhaps ReadPageInternal can do it #ifndef FRONTEND. Not
sure what is best of those options, but either of those sounds better
than sticking the LSN in a lower-level routine that doesn't necessarily
have the info already.

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/

#5Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Nathan Bossart (#3)
Re: Add LSN along with offset to error messages reported for WAL file read/write/validate header failures

On Tue, Sep 20, 2022 at 9:02 AM Nathan Bossart <nathandbossart@gmail.com> wrote:

On Mon, Sep 19, 2022 at 03:16:42PM -0700, Nathan Bossart wrote:

It seems like you want the opposite of pg_walfile_name_offset(). Perhaps
we could add a function like pg_walfile_offset_lsn() that accepts a WAL
file name and byte offset and returns the LSN.

Like so...

Yeah, something like this will be handy for sure, but I'm not sure if
we want this to be in core. Let's hear from others.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#6Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Alvaro Herrera (#4)
Re: Add LSN along with offset to error messages reported for WAL file read/write/validate header failures

On Tue, Sep 20, 2022 at 12:57 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2022-Sep-19, Bharath Rupireddy wrote:

We have a bunch of messages [1] that have an offset, but not LSN in
the error message. Firstly, is there an easiest way to figure out LSN
from offset reported in the error messages? If not, is adding LSN to
these messages along with offset a good idea? Of course, we can't just
convert offset to LSN using XLogSegNoOffsetToRecPtr() and report, but
something meaningful like reporting the LSN of the page that we are
reading-in or writing-out etc.

Maybe add errcontext() somewhere that reports the LSN would be
appropriate. For example, the page_read() callbacks have the LSN
readily available, so the ones in backend could install the errcontext
callback; or perhaps ReadPageInternal can do it #ifndef FRONTEND. Not
sure what is best of those options, but either of those sounds better
than sticking the LSN in a lower-level routine that doesn't necessarily
have the info already.

All of the error messages [1]errmsg("could not read from WAL segment %s, offset %u: %m", errmsg("could not read from WAL segment %s, offset %u: %m", errmsg("could not write to log file %s " "at offset %u, length %zu: %m", errmsg("unexpected timeline ID %u in WAL segment %s, offset %u", errmsg("could not read from WAL segment %s, offset %u: read %d of %zu", pg_log_error("received write-ahead log record for offset %u with no file open", "invalid magic number %04X in WAL segment %s, offset %u", "invalid info bits %04X in WAL segment %s, offset %u", "invalid info bits %04X in WAL segment %s, offset %u", "unexpected pageaddr %X/%X in WAL segment %s, offset %u", "out-of-sequence timeline ID %u (after %u) in WAL segment %s, offset %u", have the LSN from which offset was
calculated, I think we can just append that to the error messages
(something like ".... offset %u, LSN %X/%X: %m") and not complicate
it. Thoughts?

[1]: errmsg("could not read from WAL segment %s, offset %u: %m", errmsg("could not read from WAL segment %s, offset %u: %m", errmsg("could not write to log file %s " "at offset %u, length %zu: %m", errmsg("unexpected timeline ID %u in WAL segment %s, offset %u", errmsg("could not read from WAL segment %s, offset %u: read %d of %zu", pg_log_error("received write-ahead log record for offset %u with no file open", "invalid magic number %04X in WAL segment %s, offset %u", "invalid info bits %04X in WAL segment %s, offset %u", "invalid info bits %04X in WAL segment %s, offset %u", "unexpected pageaddr %X/%X in WAL segment %s, offset %u", "out-of-sequence timeline ID %u (after %u) in WAL segment %s, offset %u",
errmsg("could not read from WAL segment %s, offset %u: %m",
errmsg("could not read from WAL segment %s, offset %u: %m",
errmsg("could not write to log file %s "
"at offset %u, length %zu: %m",
errmsg("unexpected timeline ID %u in WAL segment %s, offset %u",
errmsg("could not read from WAL segment %s, offset %u: read %d of %zu",
pg_log_error("received write-ahead log record for offset %u with no file open",
"invalid magic number %04X in WAL segment %s, offset %u",
"invalid info bits %04X in WAL segment %s, offset %u",
"invalid info bits %04X in WAL segment %s, offset %u",
"unexpected pageaddr %X/%X in WAL segment %s, offset %u",
"out-of-sequence timeline ID %u (after %u) in WAL segment %s, offset %u",

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#7Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Bharath Rupireddy (#6)
Re: Add LSN along with offset to error messages reported for WAL file read/write/validate header failures

At Tue, 20 Sep 2022 17:40:36 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in

On Tue, Sep 20, 2022 at 12:57 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2022-Sep-19, Bharath Rupireddy wrote:

We have a bunch of messages [1] that have an offset, but not LSN in
the error message. Firstly, is there an easiest way to figure out LSN
from offset reported in the error messages? If not, is adding LSN to
these messages along with offset a good idea? Of course, we can't just
convert offset to LSN using XLogSegNoOffsetToRecPtr() and report, but
something meaningful like reporting the LSN of the page that we are
reading-in or writing-out etc.

Maybe add errcontext() somewhere that reports the LSN would be
appropriate. For example, the page_read() callbacks have the LSN
readily available, so the ones in backend could install the errcontext
callback; or perhaps ReadPageInternal can do it #ifndef FRONTEND. Not
sure what is best of those options, but either of those sounds better
than sticking the LSN in a lower-level routine that doesn't necessarily
have the info already.

All of the error messages [1] have the LSN from which offset was
calculated, I think we can just append that to the error messages
(something like ".... offset %u, LSN %X/%X: %m") and not complicate
it. Thoughts?

If all error-emitting site knows the LSN, we don't need the context
message. But *I* would like that the additional message looks like
"while reading record at LSN %X/%X" or slightly shorter version of
it. Because the targetRecPtr is the beginning of the current reading
record, not the LSN for the segment and offset. It may point to past
segments.

[1]
errmsg("could not read from WAL segment %s, offset %u: %m",
errmsg("could not read from WAL segment %s, offset %u: %m",
errmsg("could not write to log file %s "
"at offset %u, length %zu: %m",
errmsg("unexpected timeline ID %u in WAL segment %s, offset %u",
errmsg("could not read from WAL segment %s, offset %u: read %d of %zu",
pg_log_error("received write-ahead log record for offset %u with no file open",
"invalid magic number %04X in WAL segment %s, offset %u",
"invalid info bits %04X in WAL segment %s, offset %u",
"invalid info bits %04X in WAL segment %s, offset %u",
"unexpected pageaddr %X/%X in WAL segment %s, offset %u",
"out-of-sequence timeline ID %u (after %u) in WAL segment %s, offset %u",

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#8Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Kyotaro Horiguchi (#7)
Re: Add LSN along with offset to error messages reported for WAL file read/write/validate header failures

On Tue, Sep 27, 2022 at 8:31 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

If all error-emitting site knows the LSN, we don't need the context
message. But *I* would like that the additional message looks like
"while reading record at LSN %X/%X" or slightly shorter version of
it. Because the targetRecPtr is the beginning of the current reading
record, not the LSN for the segment and offset. It may point to past
segments.

I think we could just say "LSN %X/%X, offset %u", because the overall
context whether it's being read or written is implicit with the other
part of the message.

Please see the attached v1 patch.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v1-0001-Add-LSN-along-with-offset-to-error-messages-repor.patchapplication/x-patch; name=v1-0001-Add-LSN-along-with-offset-to-error-messages-repor.patchDownload+32-17
#9Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Bharath Rupireddy (#8)
Re: Add LSN along with offset to error messages reported for WAL file read/write/validate header failures

On Thu, Sep 29, 2022 at 7:43 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

Please see the attached v1 patch.

FWIW, I'm attaching Nathan's patch that introduced the new function
pg_walfile_offset_lsn as 0002 in the v1 patch set. Here's the CF entry
- https://commitfest.postgresql.org/40/3909/.

Please consider this for further review.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v1-0001-Add-LSN-along-with-offset-to-error-messages-repor.patchapplication/x-patch; name=v1-0001-Add-LSN-along-with-offset-to-error-messages-repor.patchDownload+32-17
v1-0002-introduce-pg_walfile_offset_lsn.patchapplication/x-patch; name=v1-0002-introduce-pg_walfile_offset_lsn.patchDownload+86-1
#10Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Bharath Rupireddy (#9)
Re: Add LSN along with offset to error messages reported for WAL file read/write/validate header failures

On Tue, Oct 4, 2022 at 2:58 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

On Thu, Sep 29, 2022 at 7:43 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:

Please see the attached v1 patch.

FWIW, I'm attaching Nathan's patch that introduced the new function
pg_walfile_offset_lsn as 0002 in the v1 patch set. Here's the CF entry
- https://commitfest.postgresql.org/40/3909/.

Please consider this for further review.

I'm attaching the v2 patch set after rebasing on to the latest HEAD.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v2-0001-Add-LSN-along-with-offset-to-error-messages-repor.patchapplication/x-patch; name=v2-0001-Add-LSN-along-with-offset-to-error-messages-repor.patchDownload+32-17
v2-0002-Introduce-pg_walfile_offset_lsn.patchapplication/x-patch; name=v2-0002-Introduce-pg_walfile_offset_lsn.patchDownload+86-1
#11Maxim Orlov
orlovmg@gmail.com
In reply to: Bharath Rupireddy (#10)
Re: Add LSN along with offset to error messages reported for WAL file read/write/validate header failures

Hi!

I've watched over the patch and consider it useful. Applies without
conflicts. The functionality of the patch itself is
meets declared functionality.

But, in my view, some improvements may be proposed. We should be more,
let's say, cautious (or a paranoid if you wish),
in pg_walfile_offset_lsn while dealing with user input values. What I mean
by that is:
- to init input vars of sscanf, i.e. tli, log andseg;
- check the return value of sscanf call;
- check filename max length.

Another question arises for me: is this function can be called during
recovery? If not, we should ereport about this, should we?

And one last note here: pg_walfile_offset_lsn is accepting NULL values and
return NULL in this case. From a theoretical
point of view, this is perfectly fine. Actually, I think this is exactly
how it supposed to be, but I'm not sure if there are no other opinions here.
--
Best regards,
Maxim Orlov.

#12Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Bharath Rupireddy (#10)
Re: Add LSN along with offset to error messages reported for WAL file read/write/validate header failures

Hmm, in 0002, why not return the timeline from the LSN too? It seems a
waste not to have it.

+		ereport(ERROR,
+				(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+				 errmsg("\"offset\" must not be negative or greater than or "
+						"equal to WAL segment size")));

I don't think the word offset should be in quotes; and please don't cut
the line. So I propose

errmsg("offset must not be negative or greater than or equal to the WAL segment size")));

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/

#13Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Alvaro Herrera (#12)
Re: Add LSN along with offset to error messages reported for WAL file read/write/validate header failures

On Fri, Nov 11, 2022 at 5:52 PM Maxim Orlov <orlovmg@gmail.com> wrote:

Hi!

I've watched over the patch and consider it useful. Applies without conflicts. The functionality of the patch itself is
meets declared functionality.

Thanks for reviewing.

But, in my view, some improvements may be proposed. We should be more, let's say, cautious (or a paranoid if you wish),
in pg_walfile_offset_lsn while dealing with user input values. What I mean by that is:
- to init input vars of sscanf, i.e. tli, log andseg;
- check the return value of sscanf call;
- check filename max length.

IsXLogFileName() will take care of this. Also, I've added a new inline
function XLogIdFromFileName() that parses file name and returns log
and seg along with XLogSegNo and timeline id. This new function avoids
an extra sscanf() call as well.

Another question arises for me: is this function can be called during recovery? If not, we should ereport about this, should we?

It's just a utility function and doesn't depend on any of the server's
current state (unlike pg_walfile_name()), hence it doesn't matter if
this function is called during recovery.

And one last note here: pg_walfile_offset_lsn is accepting NULL values and return NULL in this case. From a theoretical
point of view, this is perfectly fine. Actually, I think this is exactly how it supposed to be, but I'm not sure if there are no other opinions here.

These functions are marked as 'STRICT', meaning a null is returned,
without even calling the function, if any of the input parameters is
null. I think we can keep the behaviour the same as its friends.

On Fri, Nov 11, 2022 at 11:05 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

Thanks for reviewing.

Hmm, in 0002, why not return the timeline from the LSN too? It seems a
waste not to have it.

Yeah, that actually makes sense. We might be tempted to return segno
too, but it's not something that we emit to the user elsewhere,
whereas we emit timeline id.

+               ereport(ERROR,
+                               (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+                                errmsg("\"offset\" must not be negative or greater than or "
+                                               "equal to WAL segment size")));

I don't think the word offset should be in quotes; and please don't cut
the line. So I propose

errmsg("offset must not be negative or greater than or equal to the WAL segment size")));

Changed.

While on this, I noticed that the pg_walfile_name_offset() isn't
covered in tests. I took an opportunity and added a simple test case
along with pg_walfile_offset_lsn().

I'm attaching the v3 patch set for further review.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v3-0001-Add-LSN-along-with-offset-to-error-messages-repor.patchapplication/x-patch; name=v3-0001-Add-LSN-along-with-offset-to-error-messages-repor.patchDownload+32-17
v3-0002-Introduce-pg_walfile_offset_lsn.patchapplication/x-patch; name=v3-0002-Introduce-pg_walfile_offset_lsn.patchDownload+130-1
#14Maxim Orlov
orlovmg@gmail.com
In reply to: Bharath Rupireddy (#13)
Re: Add LSN along with offset to error messages reported for WAL file read/write/validate header failures

On Tue, 15 Nov 2022 at 13:02, Bharath Rupireddy <
bharath.rupireddyforpostgres@gmail.com> wrote:

On Fri, Nov 11, 2022 at 5:52 PM Maxim Orlov <orlovmg@gmail.com> wrote:

But, in my view, some improvements may be proposed. We should be more,

let's say, cautious (or a paranoid if you wish),

in pg_walfile_offset_lsn while dealing with user input values. What I

mean by that is:

- to init input vars of sscanf, i.e. tli, log andseg;
- check the return value of sscanf call;
- check filename max length.

IsXLogFileName() will take care of this. Also, I've added a new inline
function XLogIdFromFileName() that parses file name and returns log
and seg along with XLogSegNo and timeline id. This new function avoids
an extra sscanf() call as well.

Another question arises for me: is this function can be called during

recovery? If not, we should ereport about this, should we?

It's just a utility function and doesn't depend on any of the server's
current state (unlike pg_walfile_name()), hence it doesn't matter if
this function is called during recovery.

And one last note here: pg_walfile_offset_lsn is accepting NULL values

and return NULL in this case. From a theoretical

point of view, this is perfectly fine. Actually, I think this is exactly

how it supposed to be, but I'm not sure if there are no other opinions here.

These functions are marked as 'STRICT', meaning a null is returned,
without even calling the function, if any of the input parameters is
null. I think we can keep the behaviour the same as its friends.

Thanks for the explanations. I think you are right.

errmsg("offset must not be negative or greater than or equal to the

WAL segment size")));

Changed.

Confirm. And a timeline_id is added.

While on this, I noticed that the pg_walfile_name_offset() isn't
covered in tests. I took an opportunity and added a simple test case
along with pg_walfile_offset_lsn().

I'm attaching the v3 patch set for further review.

Great job! We should mark this patch as RFC, shall we?

--
Best regards,
Maxim Orlov.

#15Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Maxim Orlov (#14)
Re: Add LSN along with offset to error messages reported for WAL file read/write/validate header failures

On Wed, Nov 16, 2022 at 6:55 PM Maxim Orlov <orlovmg@gmail.com> wrote:

These functions are marked as 'STRICT', meaning a null is returned,
without even calling the function, if any of the input parameters is
null. I think we can keep the behaviour the same as its friends.

Thanks for the explanations. I think you are right.

Confirm. And a timeline_id is added.

While on this, I noticed that the pg_walfile_name_offset() isn't
covered in tests. I took an opportunity and added a simple test case
along with pg_walfile_offset_lsn().

I'm attaching the v3 patch set for further review.

Great job! We should mark this patch as RFC, shall we?

Please do, if you feel so. Thanks for your review.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#16Michael Paquier
michael@paquier.xyz
In reply to: Bharath Rupireddy (#15)
Re: Add LSN along with offset to error messages reported for WAL file read/write/validate header failures

On Thu, Nov 17, 2022 at 11:53:23AM +0530, Bharath Rupireddy wrote:

Please do, if you feel so. Thanks for your review.

I don't really mind the addition of the LSN when operating on a given
record where we are reading a location, like in the five error paths
for the header validation or the three ones in ReadRecord()

Now this one looks confusing:
+       XLogSegNoOffsetToRecPtr(openLogSegNo, startoffset,
+                               wal_segment_size, lsn);
        ereport(PANIC,
                (errcode_for_file_access(),
                 errmsg("could not write to log file %s "
-                       "at offset %u, length %zu: %m",
-                       xlogfname, startoffset, nleft)));
+                       "at offset %u, LSN %X/%X, length %zu: %m",
+                       xlogfname, startoffset,
+                       LSN_FORMAT_ARGS(lsn), nleft)));

This does not always refer to an exact LSN of a record as we may be in
the middle of a write, so I would leave it as-is.

Similarly the addition of wre_lsn would be confusing? The offset
looks kind of enough to me when referring to the middle of a page in
WALReadRaiseError().
--
Michael

#17Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Michael Paquier (#16)
Re: Add LSN along with offset to error messages reported for WAL file read/write/validate header failures

On Fri, Dec 2, 2022 at 12:50 PM Michael Paquier <michael@paquier.xyz> wrote:

On Thu, Nov 17, 2022 at 11:53:23AM +0530, Bharath Rupireddy wrote:

Please do, if you feel so. Thanks for your review.

I don't really mind the addition of the LSN when operating on a given
record where we are reading a location, like in the five error paths
for the header validation or the three ones in ReadRecord()

Thanks for reviewing.

Now this one looks confusing:
+       XLogSegNoOffsetToRecPtr(openLogSegNo, startoffset,
+                               wal_segment_size, lsn);
ereport(PANIC,
(errcode_for_file_access(),
errmsg("could not write to log file %s "
-                       "at offset %u, length %zu: %m",
-                       xlogfname, startoffset, nleft)));
+                       "at offset %u, LSN %X/%X, length %zu: %m",
+                       xlogfname, startoffset,
+                       LSN_FORMAT_ARGS(lsn), nleft)));

This does not always refer to an exact LSN of a record as we may be in
the middle of a write, so I would leave it as-is.

Similarly the addition of wre_lsn would be confusing? The offset
looks kind of enough to me when referring to the middle of a page in
WALReadRaiseError().

Yes, I removed those changes. Even if someone sees an offset of a
record within a WAL file elsewhere, they have the new utility function
(0002) pg_walfile_offset_lsn().

I'm attaching the v4 patch set for further review.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v4-0001-Emit-record-LSN-in-WAL-read-and-validate-error-me.patchapplication/x-patch; name=v4-0001-Emit-record-LSN-in-WAL-read-and-validate-error-me.patchDownload+18-11
v4-0002-Add-new-a-function-to-convert-offset-to-LSN.patchapplication/x-patch; name=v4-0002-Add-new-a-function-to-convert-offset-to-LSN.patchDownload+130-1
#18Michael Paquier
michael@paquier.xyz
In reply to: Bharath Rupireddy (#17)
Re: Add LSN along with offset to error messages reported for WAL file read/write/validate header failures

On Sat, Dec 03, 2022 at 09:07:38AM +0530, Bharath Rupireddy wrote:

Yes, I removed those changes. Even if someone sees an offset of a
record within a WAL file elsewhere, they have the new utility function
(0002) pg_walfile_offset_lsn().

I'm attaching the v4 patch set for further review.

+ * Compute an LSN and timline ID given a WAL file name and decimal byte offset.
s/timline/timeline/, exactly two times

+   Datum       values[2] = {0};
+   bool        isnull[2] = {0};
I would not hardcode the number of attributes of the record returned.

Regarding pg_walfile_offset_lsn(), I am not sure that this is the best
move we can do as it is possible to compile a LSN from 0/0 with just a
segment number, say:
select '0/0'::pg_lsn + :segno * setting::int + :offset
from pg_settings where name = 'wal_segment_size';

+   resultTupleDesc = CreateTemplateTupleDesc(2);
+   TupleDescInitEntry(resultTupleDesc, (AttrNumber) 1, "lsn",
+                      PG_LSNOID, -1, 0);
+   TupleDescInitEntry(resultTupleDesc, (AttrNumber) 2, "timeline_id",
+                      INT4OID, -1, 0);
Let's use get_call_result_type() to get the TupleDesc and to avoid a
duplication between pg_proc.dat and this code.

Hence I would tend to let XLogFromFileName do the job, while having a
SQL function that is just a thin wrapper around it that returns the
segment TLI and its number, leaving the offset out of the equation as
well as this new XLogIdFromFileName().
--
Michael

#19Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Michael Paquier (#18)
Re: Add LSN along with offset to error messages reported for WAL file read/write/validate header failures

On Mon, Dec 5, 2022 at 6:34 AM Michael Paquier <michael@paquier.xyz> wrote:

Regarding pg_walfile_offset_lsn(), I am not sure that this is the best
move we can do as it is possible to compile a LSN from 0/0 with just a
segment number, say:
select '0/0'::pg_lsn + :segno * setting::int + :offset
from pg_settings where name = 'wal_segment_size';

Nice.

+   resultTupleDesc = CreateTemplateTupleDesc(2);
+   TupleDescInitEntry(resultTupleDesc, (AttrNumber) 1, "lsn",
+                      PG_LSNOID, -1, 0);
+   TupleDescInitEntry(resultTupleDesc, (AttrNumber) 2, "timeline_id",
+                      INT4OID, -1, 0);
Let's use get_call_result_type() to get the TupleDesc and to avoid a
duplication between pg_proc.dat and this code.

Hence I would tend to let XLogFromFileName do the job, while having a
SQL function that is just a thin wrapper around it that returns the
segment TLI and its number, leaving the offset out of the equation as
well as this new XLogIdFromFileName().

So, a SQL function pg_dissect_walfile_name (or some other better name)
given a WAL file name returns the tli and seg number. Then the
pg_walfile_offset_lsn can just be a SQL-defined function (in
system_functions.sql) using this new function and pg_settings. If this
understanding is correct, it looks good to me at this point.

That said, let's also hear from others.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#20Michael Paquier
michael@paquier.xyz
In reply to: Bharath Rupireddy (#19)
Re: Add LSN along with offset to error messages reported for WAL file read/write/validate header failures

On Mon, Dec 05, 2022 at 08:48:25AM +0530, Bharath Rupireddy wrote:

So, a SQL function pg_dissect_walfile_name (or some other better name)
given a WAL file name returns the tli and seg number. Then the
pg_walfile_offset_lsn can just be a SQL-defined function (in
system_functions.sql) using this new function and pg_settings. If this
understanding is correct, it looks good to me at this point.

I would do without the SQL function that looks at pg_settings, FWIW.

That said, let's also hear from others.

Sure. Perhaps my set of suggestions will not get the majority,
though..
--
Michael

#21Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Michael Paquier (#18)
#22Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Kyotaro Horiguchi (#21)
#23Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Bharath Rupireddy (#22)
#24Michael Paquier
michael@paquier.xyz
In reply to: Kyotaro Horiguchi (#21)
#25Michael Paquier
michael@paquier.xyz
In reply to: Bharath Rupireddy (#23)
#26Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Michael Paquier (#25)
#27Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Bharath Rupireddy (#26)
#28Michael Paquier
michael@paquier.xyz
In reply to: Bharath Rupireddy (#27)
#29Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#28)
#30Magnus Hagander
magnus@hagander.net
In reply to: Michael Paquier (#29)
#31Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Magnus Hagander (#30)
#32Ian Lawrence Barwick
barwick@gmail.com
In reply to: Bharath Rupireddy (#31)
#33Michael Paquier
michael@paquier.xyz
In reply to: Bharath Rupireddy (#31)
#34Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Michael Paquier (#33)
#35Magnus Hagander
magnus@hagander.net
In reply to: Michael Paquier (#33)
#36Michael Paquier
michael@paquier.xyz
In reply to: Magnus Hagander (#35)
#37Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Michael Paquier (#36)
#38Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Bharath Rupireddy (#37)
#39Michael Paquier
michael@paquier.xyz
In reply to: Kyotaro Horiguchi (#38)
#40Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Michael Paquier (#39)
#41Michael Paquier
michael@paquier.xyz
In reply to: Bharath Rupireddy (#40)
#42Magnus Hagander
magnus@hagander.net
In reply to: Michael Paquier (#41)