XLogBeginRead's header comment lies

Started by Robert Haasover 3 years ago7 messages
#1Robert Haas
robertmhaas@gmail.com

It claims that:

* 'RecPtr' should point to the beginning of a valid WAL record. Pointing at
* the beginning of a page is also OK, if there is a new record right after
* the page header, i.e. not a continuation.

But this actually doesn't seem to work. This function doesn't itself
have any problem with any LSNs you want to pass it, so if you call
this function with an LSN that is at the beginning of a page, you'll
end up with EndRecPtr set to the LSN you specify and DecodeRecPtr set
to NULL. When you then call XLogReadRecord, you'll reach
XLogDecodeNextRecord, which will do this:

if (state->DecodeRecPtr != InvalidXLogRecPtr)
{
/* read the record after the one we just read */

/*
* NextRecPtr is pointing to end+1 of the previous WAL record. If
* we're at a page boundary, no more records can fit on the current
* page. We must skip over the page header, but we can't do that until
* we've read in the page, since the header size is variable.
*/
}
else
{
/*
* Caller supplied a position to start at.
*
* In this case, NextRecPtr should already be pointing to a valid
* record starting position.
*/
Assert(XRecOffIsValid(RecPtr));
randAccess = true;
}

Since DecodeRecPtr is NULL, you take the else branch, and then you
fail an assertion.

I tried adding a --beginread argument to pg_waldump (patch attached)
to further verify this:

[rhaas pgsql]$ pg_waldump -n1
/Users/rhaas/pgstandby/pg_wal/0000000200000005000000A0
rmgr: Heap len (rec/tot): 72/ 72, tx: 5778572, lsn:
5/A0000028, prev 5/9FFFFFB8, desc: HOT_UPDATE off 39 xmax 5778572
flags 0x20 ; new off 62 xmax 0, blkref #0: rel 1663/16388/16402 blk 1
[rhaas pgsql]$ pg_waldump -n1 --beginread
/Users/rhaas/pgstandby/pg_wal/0000000200000005000000A0
Assertion failed: (((RecPtr) % 8192 >= (((uintptr_t)
((sizeof(XLogPageHeaderData))) + ((8) - 1)) & ~((uintptr_t) ((8) -
1))))), function XLogDecodeNextRecord, file xlogreader.c, line 582.
Abort trap: 6 (core dumped)

The WAL record begins at offset 0x28 in the block, which I believe is
the length of a long page header, so this is indeed a WAL segment that
begins with a brand new record, not a continuation record.

There are two ways we could fix this, I believe. One is to correct the
comment at the start of XLogBeginRead() to reflect the way things
actually work at present. The other is to correct the code to do what
the header comment claims. I would prefer the latter, because I'd like
to be able to use the EndRecPtr of the last record read by one
xlogreader as the starting point for a new xlogreader created at a
later time. I've found that, when there's no record spanning the block
boundary, the EndRecPtr points to the start of the next block, not the
start of the first record in the next block. I could dodge the problem
here by just always using XLogFindNextRecord() rather than
XLogBeginRecord(), but I'd actually like it to go boom if I somehow
end up trying to start from an LSN that's in the middle of a record
somewhere (or the middle of the page header) because those cases
shouldn't happen. But if I just have an LSN that happens to be the
start of the block header rather than the start of the record that
follows the block header, I'd like that case to be tolerated, because
the LSN I'm using came from the xlogreader machinery.

Thoughts?

--
Robert Haas
EDB: http://www.enterprisedb.com

#2Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#1)
1 attachment(s)
Re: XLogBeginRead's header comment lies

Forgot the attachment.

Attachments:

waldump-beginread.patchapplication/octet-stream; name=waldump-beginread.patchDownload
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index 6528113628..069c388510 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -700,6 +700,7 @@ main(int argc, char **argv)
 	XLogRecPtr	first_record;
 	char	   *waldir = NULL;
 	char	   *errormsg;
+	bool		beginread = false;
 
 	static struct option long_options[] = {
 		{"bkp-details", no_argument, NULL, 'b'},
@@ -719,6 +720,7 @@ main(int argc, char **argv)
 		{"xid", required_argument, NULL, 'x'},
 		{"version", no_argument, NULL, 'V'},
 		{"stats", optional_argument, NULL, 'z'},
+		{"beginread", no_argument, NULL, 'X'},
 		{NULL, 0, NULL, 0}
 	};
 
@@ -927,6 +929,9 @@ main(int argc, char **argv)
 				}
 				config.filter_by_xid_enabled = true;
 				break;
+			case 'X':
+				beginread = true;
+				break;
 			case 'z':
 				config.stats = true;
 				config.stats_per_record = false;
@@ -1073,6 +1078,15 @@ main(int argc, char **argv)
 	if (!xlogreader_state)
 		pg_fatal("out of memory while allocating a WAL reading processor");
 
+	if (beginread)
+	{
+		XLogBeginRead(xlogreader_state, private.startptr);
+		first_record = private.startptr;
+	}
+	else
+	{
+	/* NOT REINDENTED - THIS PATCH IS JUST FOR DEBUGGING */
+
 	/* first find a valid recptr to start from */
 	first_record = XLogFindNextRecord(xlogreader_state, private.startptr);
 
@@ -1094,6 +1108,9 @@ main(int argc, char **argv)
 			   LSN_FORMAT_ARGS(first_record),
 			   (uint32) (first_record - private.startptr));
 
+	/* NOT REINDENTED - THIS PATCH IS JUST FOR DEBUGGING */
+	}
+
 	if (config.stats == true && !config.quiet)
 		stats.startptr = first_record;
 
#3Dilip Kumar
dilipbalaut@gmail.com
In reply to: Robert Haas (#1)
1 attachment(s)
Re: XLogBeginRead's header comment lies

On Tue, Aug 16, 2022 at 11:28 PM Robert Haas <robertmhaas@gmail.com> wrote:

It claims that:

* 'RecPtr' should point to the beginning of a valid WAL record. Pointing at
* the beginning of a page is also OK, if there is a new record right after
* the page header, i.e. not a continuation.

But this actually doesn't seem to work. This function doesn't itself
have any problem with any LSNs you want to pass it, so if you call
this function with an LSN that is at the beginning of a page, you'll
end up with EndRecPtr set to the LSN you specify and DecodeRecPtr set
to NULL. When you then call XLogReadRecord, you'll reach
XLogDecodeNextRecord, which will do this:

if (state->DecodeRecPtr != InvalidXLogRecPtr)
{
/* read the record after the one we just read */

/*
* NextRecPtr is pointing to end+1 of the previous WAL record. If
* we're at a page boundary, no more records can fit on the current
* page. We must skip over the page header, but we can't do that until
* we've read in the page, since the header size is variable.
*/
}
else
{
/*
* Caller supplied a position to start at.
*
* In this case, NextRecPtr should already be pointing to a valid
* record starting position.
*/
Assert(XRecOffIsValid(RecPtr));
randAccess = true;
}

Since DecodeRecPtr is NULL, you take the else branch, and then you
fail an assertion.

I tried adding a --beginread argument to pg_waldump (patch attached)
to further verify this:

[rhaas pgsql]$ pg_waldump -n1
/Users/rhaas/pgstandby/pg_wal/0000000200000005000000A0
rmgr: Heap len (rec/tot): 72/ 72, tx: 5778572, lsn:
5/A0000028, prev 5/9FFFFFB8, desc: HOT_UPDATE off 39 xmax 5778572
flags 0x20 ; new off 62 xmax 0, blkref #0: rel 1663/16388/16402 blk 1
[rhaas pgsql]$ pg_waldump -n1 --beginread
/Users/rhaas/pgstandby/pg_wal/0000000200000005000000A0
Assertion failed: (((RecPtr) % 8192 >= (((uintptr_t)
((sizeof(XLogPageHeaderData))) + ((8) - 1)) & ~((uintptr_t) ((8) -
1))))), function XLogDecodeNextRecord, file xlogreader.c, line 582.
Abort trap: 6 (core dumped)

The WAL record begins at offset 0x28 in the block, which I believe is
the length of a long page header, so this is indeed a WAL segment that
begins with a brand new record, not a continuation record.

There are two ways we could fix this, I believe. One is to correct the
comment at the start of XLogBeginRead() to reflect the way things
actually work at present. The other is to correct the code to do what
the header comment claims. I would prefer the latter, because I'd like
to be able to use the EndRecPtr of the last record read by one
xlogreader as the starting point for a new xlogreader created at a
later time. I've found that, when there's no record spanning the block
boundary, the EndRecPtr points to the start of the next block, not the
start of the first record in the next block. I could dodge the problem
here by just always using XLogFindNextRecord() rather than
XLogBeginRecord(), but I'd actually like it to go boom if I somehow
end up trying to start from an LSN that's in the middle of a record
somewhere (or the middle of the page header) because those cases
shouldn't happen. But if I just have an LSN that happens to be the
start of the block header rather than the start of the record that
follows the block header, I'd like that case to be tolerated, because
the LSN I'm using came from the xlogreader machinery.

Thoughts?

Yeah I think it makes sense to make it work as per the comment in
XLogBeginRecord(). I think if we modify the Assert as per the comment
of XLogBeginRecord() then the remaining code of the
XLogDecodeNextRecord() is capable enough to take care of skipping the
page header if we are pointing at the beginning of the block.

See attached patch.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachments:

0001-Adjust-XLogDecodeNextRecord-assert-to-match-with-XLo.patchtext/x-patch; charset=US-ASCII; name=0001-Adjust-XLogDecodeNextRecord-assert-to-match-with-XLo.patchDownload
From aa3a39d8bf0e10f3664bdb8c9ca009f17c46d30d Mon Sep 17 00:00:00 2001
From: Dilip Kumar <dilip.kumar@enterprisedb.com>
Date: Wed, 17 Aug 2022 11:15:58 +0530
Subject: [PATCH] Adjust XLogDecodeNextRecord assert to match with
 XLogBeginRead

---
 src/backend/access/transam/xlogreader.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index 06e9154..4c901ad 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -577,9 +577,10 @@ XLogDecodeNextRecord(XLogReaderState *state, bool nonblocking)
 		 * Caller supplied a position to start at.
 		 *
 		 * In this case, NextRecPtr should already be pointing to a valid
-		 * record starting position.
+		 * record starting position or to beginning of a page, refer comments
+		 * atop XLogBeginRead.
 		 */
-		Assert(XRecOffIsValid(RecPtr));
+		Assert(RecPtr % XLOG_BLCKSZ == 0 || XRecOffIsValid(RecPtr));
 		randAccess = true;
 	}
 
-- 
1.8.3.1

#4Dilip Kumar
dilipbalaut@gmail.com
In reply to: Dilip Kumar (#3)
Re: XLogBeginRead's header comment lies

On Wed, Aug 17, 2022 at 11:18 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Tue, Aug 16, 2022 at 11:28 PM Robert Haas <robertmhaas@gmail.com> wrote:

Yeah I think it makes sense to make it work as per the comment in
XLogBeginRecord(). I think if we modify the Assert as per the comment
of XLogBeginRecord() then the remaining code of the
XLogDecodeNextRecord() is capable enough to take care of skipping the
page header if we are pointing at the beginning of the block.

See attached patch.

I think that is not sufficient, if there is a record continuing from
the previous page and we are pointing to the start of the page then
this assertion is not sufficient. I think if the
targetRecOff is zero then we should additionally read the header and
verify that XLP_FIRST_IS_CONTRECORD is not set.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#5Dilip Kumar
dilipbalaut@gmail.com
In reply to: Dilip Kumar (#4)
Re: XLogBeginRead's header comment lies

On Wed, Aug 17, 2022 at 11:31 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Wed, Aug 17, 2022 at 11:18 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Tue, Aug 16, 2022 at 11:28 PM Robert Haas <robertmhaas@gmail.com> wrote:

Yeah I think it makes sense to make it work as per the comment in
XLogBeginRecord(). I think if we modify the Assert as per the comment
of XLogBeginRecord() then the remaining code of the
XLogDecodeNextRecord() is capable enough to take care of skipping the
page header if we are pointing at the beginning of the block.

See attached patch.

I think that is not sufficient, if there is a record continuing from
the previous page and we are pointing to the start of the page then
this assertion is not sufficient. I think if the
targetRecOff is zero then we should additionally read the header and
verify that XLP_FIRST_IS_CONTRECORD is not set.

Thinking again, there is already a code in XLogDecodeNextRecord() to
error out if XLP_FIRST_IS_CONTRECORD is set so probably we don't need
to do anything else and the previous patch with modified assert should
just work fine?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#6Robert Haas
robertmhaas@gmail.com
In reply to: Dilip Kumar (#5)
Re: XLogBeginRead's header comment lies

On Wed, Aug 17, 2022 at 6:53 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

Thinking again, there is already a code in XLogDecodeNextRecord() to
error out if XLP_FIRST_IS_CONTRECORD is set so probably we don't need
to do anything else and the previous patch with modified assert should
just work fine?

Yeah, that looks right to me. I'm inclined to commit your patch with
some changes to wording of the comments. I'm also inclined not to
back-patch, since we don't know that this breaks anything for existing
users of the xlogreader facility.

If anyone doesn't want this committed or does want it back-patched,
please speak up.

Thanks,

--
Robert Haas
EDB: http://www.enterprisedb.com

#7Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#6)
Re: XLogBeginRead's header comment lies

On Wed, Aug 17, 2022 at 10:57 AM Robert Haas <robertmhaas@gmail.com> wrote:

Yeah, that looks right to me. I'm inclined to commit your patch with
some changes to wording of the comments. I'm also inclined not to
back-patch, since we don't know that this breaks anything for existing
users of the xlogreader facility.

Done.

--
Robert Haas
EDB: http://www.enterprisedb.com