Two noncritical bugs of pg_waldump

Started by Kyotaro Horiguchialmost 4 years ago9 messages
#1Kyotaro Horiguchi
horikyota.ntt@gmail.com
1 attachment(s)

Hello.

pg_waldump complains at the end in any case. I noticed that the LSN
it shows in the finish message is incorrect. (I faintly thought that
I posted about this but I didn't find it..)

pg_waldump: fatal: error in WAL record at 0/15073F8: invalid record length at 0/1507470: wanted 24, got 0

xlogreader found the error at the record begins at 1507470, but
pg_waldump tells that error happens at 15073F8, which is actually the
beginning of the last sound record.

If I give an empty file to the tool it complains as the follows.

pg_waldump: fatal: could not read file "hoge": No such file or directory

No, the file exists. The cause is it reads uninitialized errno to
detect errors from the system call. read(2) is defined to set errno
always when it returns -1 and doesn't otherwise. Thus it seems to me
that it is better to check that the return value is less than zero
than to clear errno before the call to read().

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

0001-Fix-errornious-messages-of-pg_waldump.patchtext/x-patch; charset=us-asciiDownload
From 773f14f01c4a5cfd334d0a80778aa819c55c1cb7 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Thu, 27 Jan 2022 09:47:10 +0900
Subject: [PATCH] Fix errornious messages of pg_waldump

Fix pg_waldump to give the LSN where read error happens instead of the
LSN of the last record successfully read. And fix to give the correct
error message for zero-sized blocks instead of a bogus system-call
error message.
---
 src/bin/pg_waldump/pg_waldump.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index a6251e1a96..58dc4044b5 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -222,15 +222,12 @@ search_directory(const char *directory, const char *fname)
 									 WalSegSz),
 							fname, WalSegSz);
 		}
+		else if (r < 0)
+			fatal_error("could not read file \"%s\": %m",
+						fname);
 		else
-		{
-			if (errno != 0)
-				fatal_error("could not read file \"%s\": %m",
-							fname);
-			else
-				fatal_error("could not read file \"%s\": read %d of %d",
-							fname, r, XLOG_BLCKSZ);
-		}
+			fatal_error("could not read file \"%s\": read %d of %d",
+						fname, r, XLOG_BLCKSZ);
 		close(fd);
 		return true;
 	}
@@ -1177,7 +1174,7 @@ main(int argc, char **argv)
 
 	if (errormsg)
 		fatal_error("error in WAL record at %X/%X: %s",
-					LSN_FORMAT_ARGS(xlogreader_state->ReadRecPtr),
+					LSN_FORMAT_ARGS(xlogreader_state->EndRecPtr),
 					errormsg);
 
 	XLogReaderFree(xlogreader_state);
-- 
2.27.0

#2Nathan Bossart
nathandbossart@gmail.com
In reply to: Kyotaro Horiguchi (#1)
Re: Two noncritical bugs of pg_waldump

On Thu, Jan 27, 2022 at 10:07:38AM +0900, Kyotaro Horiguchi wrote:

pg_waldump complains at the end in any case. I noticed that the LSN
it shows in the finish message is incorrect. (I faintly thought that
I posted about this but I didn't find it..)

Is this thread [0]/messages/by-id/2B4510B2-3D70-4990-BFE3-0FE64041C08A@amazon.com what you are remembering?

[0]: /messages/by-id/2B4510B2-3D70-4990-BFE3-0FE64041C08A@amazon.com

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com/

#3Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Nathan Bossart (#2)
Re: Two noncritical bugs of pg_waldump

At Wed, 26 Jan 2022 17:25:14 -0800, Nathan Bossart <nathandbossart@gmail.com> wrote in

On Thu, Jan 27, 2022 at 10:07:38AM +0900, Kyotaro Horiguchi wrote:

pg_waldump complains at the end in any case. I noticed that the LSN
it shows in the finish message is incorrect. (I faintly thought that
I posted about this but I didn't find it..)

Is this thread [0] what you are remembering?

[0] /messages/by-id/2B4510B2-3D70-4990-BFE3-0FE64041C08A@amazon.com

Maybe exactly. I rememberd the discussion.

So the issue there is neither EndRecPtr and ReadRecPtr always points
to the current read LSN. The first proposal from Nathen was to use
currRecPtr but it was a private member. But after discussion, it
seems to me it is (at least now) exactly what we need here so if we
ofccially exposed the member the problem would be blown away. Do you
want to conitnue that?

Otherwise, I think we need to add a comment there about the known
issue.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#4Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Kyotaro Horiguchi (#3)
Re: Two noncritical bugs of pg_waldump

(this is off-topic)

At Thu, 27 Jan 2022 13:23:06 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in

At Wed, 26 Jan 2022 17:25:14 -0800, Nathan Bossart <nathandbossart@gmail.com> wrote in

On Thu, Jan 27, 2022 at 10:07:38AM +0900, Kyotaro Horiguchi wrote:

pg_waldump complains at the end in any case. I noticed that the LSN
it shows in the finish message is incorrect. (I faintly thought that
I posted about this but I didn't find it..)

Is this thread [0] what you are remembering?

[0] /messages/by-id/2B4510B2-3D70-4990-BFE3-0FE64041C08A@amazon.com

Maybe exactly. I rememberd the discussion.

So the issue there is neither EndRecPtr and ReadRecPtr always points
to the current read LSN. The first proposal from Nathen was to use

Mmm. Sorry for my fat finger, Nathan.

currRecPtr but it was a private member. But after discussion, it
seems to me it is (at least now) exactly what we need here so if we
ofccially exposed the member the problem would be blown away. Do you
want to conitnue that?

Otherwise, I think we need to add a comment there about the known
issue.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#5Nathan Bossart
nathandbossart@gmail.com
In reply to: Kyotaro Horiguchi (#3)
Re: Two noncritical bugs of pg_waldump

On Thu, Jan 27, 2022 at 01:23:06PM +0900, Kyotaro Horiguchi wrote:

So the issue there is neither EndRecPtr and ReadRecPtr always points
to the current read LSN. The first proposal from Nathen was to use
currRecPtr but it was a private member. But after discussion, it
seems to me it is (at least now) exactly what we need here so if we
ofccially exposed the member the problem would be blown away. Do you
want to conitnue that?

Presumably there is a good reason for keeping it private, but if not, maybe
that is a reasonable approach to consider. I don't think we wanted to
proceed with the approaches discussed on the old thread, anyway.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com/

#6Nathan Bossart
nathandbossart@gmail.com
In reply to: Kyotaro Horiguchi (#4)
Re: Two noncritical bugs of pg_waldump

On Thu, Jan 27, 2022 at 01:27:36PM +0900, Kyotaro Horiguchi wrote:

At Thu, 27 Jan 2022 13:23:06 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in

So the issue there is neither EndRecPtr and ReadRecPtr always points
to the current read LSN. The first proposal from Nathen was to use

Mmm. Sorry for my fat finger, Nathan.

It's okay. :)

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com/

#7Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Kyotaro Horiguchi (#1)
1 attachment(s)
Re: Two noncritical bugs of pg_waldump

Hmm..

At Thu, 27 Jan 2022 10:07:38 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in

pg_waldump complains at the end in any case. I noticed that the LSN
it shows in the finish message is incorrect. (I faintly thought that
I posted about this but I didn't find it..)

pg_waldump: fatal: error in WAL record at 0/15073F8: invalid record length at 0/1507470: wanted 24, got 0

xlogreader found the error at the record begins at 1507470, but
pg_waldump tells that error happens at 15073F8, which is actually the
beginning of the last sound record.

It is arguable, but the following is indisputable.

If I give an empty file to the tool it complains as the follows.

pg_waldump: fatal: could not read file "hoge": No such file or directory

No, the file exists. The cause is it reads uninitialized errno to
detect errors from the system call. read(2) is defined to set errno
always when it returns -1 and doesn't otherwise. Thus it seems to me
that it is better to check that the return value is less than zero
than to clear errno before the call to read().

So I post a patch contains only the indisputable part.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

v1-0001-Fix-incorrect-error-handling-of-pg_waldump.patchtext/x-patch; charset=us-asciiDownload
From f43cf938b0686dc406a1c81863a6415de1190b40 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Mon, 14 Feb 2022 18:11:19 +0900
Subject: [PATCH v1] Fix incorrect error handling of pg_waldump

search_directory of pg_waldump.c puts a wrong assumption about the
return value from read(2) and results in a bogus error message.  It
should check (r < 0) instead of (r != XLOG_BLCKSZ) to correctly detect
errors that %m work correctly on.
---
 src/bin/pg_waldump/pg_waldump.c | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index a6251e1a96..2340dc247b 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -222,15 +222,12 @@ search_directory(const char *directory, const char *fname)
 									 WalSegSz),
 							fname, WalSegSz);
 		}
+		else if (r < 0)
+			fatal_error("could not read file \"%s\": %m",
+						fname);
 		else
-		{
-			if (errno != 0)
-				fatal_error("could not read file \"%s\": %m",
-							fname);
-			else
-				fatal_error("could not read file \"%s\": read %d of %d",
-							fname, r, XLOG_BLCKSZ);
-		}
+			fatal_error("could not read file \"%s\": read %d of %d",
+						fname, r, XLOG_BLCKSZ);
 		close(fd);
 		return true;
 	}
-- 
2.27.0

#8Andres Freund
andres@anarazel.de
In reply to: Kyotaro Horiguchi (#7)
Re: Two noncritical bugs of pg_waldump

Hi,

On 2022-02-14 18:18:47 +0900, Kyotaro Horiguchi wrote:

If I give an empty file to the tool it complains as the follows.

pg_waldump: fatal: could not read file "hoge": No such file or directory

No, the file exists. The cause is it reads uninitialized errno to
detect errors from the system call. read(2) is defined to set errno
always when it returns -1 and doesn't otherwise. Thus it seems to me
that it is better to check that the return value is less than zero
than to clear errno before the call to read().

So I post a patch contains only the indisputable part.

Thanks for the report and fix. Pushed. This was surprisingly painful, all but
one branch had conflicts...

Greetings,

Andres Freund

#9Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Andres Freund (#8)
Re: Two noncritical bugs of pg_waldump

At Fri, 25 Feb 2022 10:48:47 -0800, Andres Freund <andres@anarazel.de> wrote in

Hi,

On 2022-02-14 18:18:47 +0900, Kyotaro Horiguchi wrote:

If I give an empty file to the tool it complains as the follows.

pg_waldump: fatal: could not read file "hoge": No such file or directory

No, the file exists. The cause is it reads uninitialized errno to
detect errors from the system call. read(2) is defined to set errno
always when it returns -1 and doesn't otherwise. Thus it seems to me
that it is better to check that the return value is less than zero
than to clear errno before the call to read().

So I post a patch contains only the indisputable part.

Thanks for the report and fix. Pushed. This was surprisingly painful, all but
one branch had conflicts...

Ah, I didn't expect that this is committed so quickly. I should have
created patches for all versions. Anyway thanks for committing this!

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center