Two noncritical bugs of pg_waldump

Started by Kyotaro Horiguchiabout 4 years ago9 messageshackers
Jump to latest
#1Kyotaro Horiguchi
horikyota.ntt@gmail.com

Hello.

pg_waldump complains at the end in any case. I noticed that the LSN
it shows in the finish message is incorrect. (I faintly thought that
I posted about this but I didn't find it..)

pg_waldump: fatal: error in WAL record at 0/15073F8: invalid record length at 0/1507470: wanted 24, got 0

xlogreader found the error at the record begins at 1507470, but
pg_waldump tells that error happens at 15073F8, which is actually the
beginning of the last sound record.

If I give an empty file to the tool it complains as the follows.

pg_waldump: fatal: could not read file "hoge": No such file or directory

No, the file exists. The cause is it reads uninitialized errno to
detect errors from the system call. read(2) is defined to set errno
always when it returns -1 and doesn't otherwise. Thus it seems to me
that it is better to check that the return value is less than zero
than to clear errno before the call to read().

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

0001-Fix-errornious-messages-of-pg_waldump.patchtext/x-patch; charset=us-asciiDownload+6-10
#2Nathan Bossart
nathandbossart@gmail.com
In reply to: Kyotaro Horiguchi (#1)
Re: Two noncritical bugs of pg_waldump

On Thu, Jan 27, 2022 at 10:07:38AM +0900, Kyotaro Horiguchi wrote:

pg_waldump complains at the end in any case. I noticed that the LSN
it shows in the finish message is incorrect. (I faintly thought that
I posted about this but I didn't find it..)

Is this thread [0]/messages/by-id/2B4510B2-3D70-4990-BFE3-0FE64041C08A@amazon.com what you are remembering?

[0]: /messages/by-id/2B4510B2-3D70-4990-BFE3-0FE64041C08A@amazon.com

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com/

#3Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Nathan Bossart (#2)
Re: Two noncritical bugs of pg_waldump

At Wed, 26 Jan 2022 17:25:14 -0800, Nathan Bossart <nathandbossart@gmail.com> wrote in

On Thu, Jan 27, 2022 at 10:07:38AM +0900, Kyotaro Horiguchi wrote:

pg_waldump complains at the end in any case. I noticed that the LSN
it shows in the finish message is incorrect. (I faintly thought that
I posted about this but I didn't find it..)

Is this thread [0] what you are remembering?

[0] /messages/by-id/2B4510B2-3D70-4990-BFE3-0FE64041C08A@amazon.com

Maybe exactly. I rememberd the discussion.

So the issue there is neither EndRecPtr and ReadRecPtr always points
to the current read LSN. The first proposal from Nathen was to use
currRecPtr but it was a private member. But after discussion, it
seems to me it is (at least now) exactly what we need here so if we
ofccially exposed the member the problem would be blown away. Do you
want to conitnue that?

Otherwise, I think we need to add a comment there about the known
issue.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#4Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Kyotaro Horiguchi (#3)
Re: Two noncritical bugs of pg_waldump

(this is off-topic)

At Thu, 27 Jan 2022 13:23:06 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in

At Wed, 26 Jan 2022 17:25:14 -0800, Nathan Bossart <nathandbossart@gmail.com> wrote in

On Thu, Jan 27, 2022 at 10:07:38AM +0900, Kyotaro Horiguchi wrote:

pg_waldump complains at the end in any case. I noticed that the LSN
it shows in the finish message is incorrect. (I faintly thought that
I posted about this but I didn't find it..)

Is this thread [0] what you are remembering?

[0] /messages/by-id/2B4510B2-3D70-4990-BFE3-0FE64041C08A@amazon.com

Maybe exactly. I rememberd the discussion.

So the issue there is neither EndRecPtr and ReadRecPtr always points
to the current read LSN. The first proposal from Nathen was to use

Mmm. Sorry for my fat finger, Nathan.

currRecPtr but it was a private member. But after discussion, it
seems to me it is (at least now) exactly what we need here so if we
ofccially exposed the member the problem would be blown away. Do you
want to conitnue that?

Otherwise, I think we need to add a comment there about the known
issue.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#5Nathan Bossart
nathandbossart@gmail.com
In reply to: Kyotaro Horiguchi (#3)
Re: Two noncritical bugs of pg_waldump

On Thu, Jan 27, 2022 at 01:23:06PM +0900, Kyotaro Horiguchi wrote:

So the issue there is neither EndRecPtr and ReadRecPtr always points
to the current read LSN. The first proposal from Nathen was to use
currRecPtr but it was a private member. But after discussion, it
seems to me it is (at least now) exactly what we need here so if we
ofccially exposed the member the problem would be blown away. Do you
want to conitnue that?

Presumably there is a good reason for keeping it private, but if not, maybe
that is a reasonable approach to consider. I don't think we wanted to
proceed with the approaches discussed on the old thread, anyway.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com/

#6Nathan Bossart
nathandbossart@gmail.com
In reply to: Kyotaro Horiguchi (#4)
Re: Two noncritical bugs of pg_waldump

On Thu, Jan 27, 2022 at 01:27:36PM +0900, Kyotaro Horiguchi wrote:

At Thu, 27 Jan 2022 13:23:06 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in

So the issue there is neither EndRecPtr and ReadRecPtr always points
to the current read LSN. The first proposal from Nathen was to use

Mmm. Sorry for my fat finger, Nathan.

It's okay. :)

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com/

#7Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Kyotaro Horiguchi (#1)
Re: Two noncritical bugs of pg_waldump

Hmm..

At Thu, 27 Jan 2022 10:07:38 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in

pg_waldump complains at the end in any case. I noticed that the LSN
it shows in the finish message is incorrect. (I faintly thought that
I posted about this but I didn't find it..)

pg_waldump: fatal: error in WAL record at 0/15073F8: invalid record length at 0/1507470: wanted 24, got 0

xlogreader found the error at the record begins at 1507470, but
pg_waldump tells that error happens at 15073F8, which is actually the
beginning of the last sound record.

It is arguable, but the following is indisputable.

If I give an empty file to the tool it complains as the follows.

pg_waldump: fatal: could not read file "hoge": No such file or directory

No, the file exists. The cause is it reads uninitialized errno to
detect errors from the system call. read(2) is defined to set errno
always when it returns -1 and doesn't otherwise. Thus it seems to me
that it is better to check that the return value is less than zero
than to clear errno before the call to read().

So I post a patch contains only the indisputable part.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

v1-0001-Fix-incorrect-error-handling-of-pg_waldump.patchtext/x-patch; charset=us-asciiDownload+5-9
#8Andres Freund
andres@anarazel.de
In reply to: Kyotaro Horiguchi (#7)
Re: Two noncritical bugs of pg_waldump

Hi,

On 2022-02-14 18:18:47 +0900, Kyotaro Horiguchi wrote:

If I give an empty file to the tool it complains as the follows.

pg_waldump: fatal: could not read file "hoge": No such file or directory

No, the file exists. The cause is it reads uninitialized errno to
detect errors from the system call. read(2) is defined to set errno
always when it returns -1 and doesn't otherwise. Thus it seems to me
that it is better to check that the return value is less than zero
than to clear errno before the call to read().

So I post a patch contains only the indisputable part.

Thanks for the report and fix. Pushed. This was surprisingly painful, all but
one branch had conflicts...

Greetings,

Andres Freund

#9Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Andres Freund (#8)
Re: Two noncritical bugs of pg_waldump

At Fri, 25 Feb 2022 10:48:47 -0800, Andres Freund <andres@anarazel.de> wrote in

Hi,

On 2022-02-14 18:18:47 +0900, Kyotaro Horiguchi wrote:

If I give an empty file to the tool it complains as the follows.

pg_waldump: fatal: could not read file "hoge": No such file or directory

No, the file exists. The cause is it reads uninitialized errno to
detect errors from the system call. read(2) is defined to set errno
always when it returns -1 and doesn't otherwise. Thus it seems to me
that it is better to check that the return value is less than zero
than to clear errno before the call to read().

So I post a patch contains only the indisputable part.

Thanks for the report and fix. Pushed. This was surprisingly painful, all but
one branch had conflicts...

Ah, I didn't expect that this is committed so quickly. I should have
created patches for all versions. Anyway thanks for committing this!

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center