Track Oldest Initialized WAL Buffer Page

Started by Bharath Rupireddyalmost 3 years ago8 messages
#1Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
1 attachment(s)

Hi,

While working on [1]/messages/by-id/CALj2ACXKKK=wbiG5_t6dGao5GoecMwRkhr7GjVBM_jg54+Na=Q@mail.gmail.com, I was looking for a quick way to tell if a WAL
record is present in the WAL buffers array without scanning but I
couldn't find one. Hence, I put up a patch that basically tracks the
oldest initialized WAL buffer page, named OldestInitializedPage, in
XLogCtl. With OldestInitializedPage, we can easily illustrate WAL
buffers array properties:

1) At any given point of time, pages in the WAL buffers array are
sorted in an ascending order from OldestInitializedPage till
InitializedUpTo. Note that we verify this property for assert-only
builds, see IsXLogBuffersArraySorted() in the patch for more details.

2) OldestInitializedPage is monotonically increasing (by virtue of how
postgres generates WAL records), that is, its value never decreases.
This property lets someone read its value without a lock. There's no
problem even if its value is slightly stale i.e. concurrently being
updated. One can still use it for finding if a given WAL record is
available in WAL buffers. At worst, one might get false positives
(i.e. OldestInitializedPage may tell that the WAL record is available
in WAL buffers, but when one actually looks at it, it isn't really
available). This is more efficient and performant than acquiring a
lock for reading. Note that we may not need a lock to read
OldestInitializedPage but we need to update it holding
WALBufMappingLock.

3) One can start traversing WAL buffers from OldestInitializedPage
till InitializedUpTo to list out all valid WAL records and stats, and
expose them via SQL-callable functions to users, for instance, as
pg_walinspect functions.

4) WAL buffers array is inherently organized as a circular, sorted and
rotated array with OldestInitializedPage as pivot/first element of the
array with the property where LSN of previous buffer page (if valid)
is greater than OldestInitializedPage and LSN of the next buffer page
(if
valid) is greater than OldestInitializedPage.

Thoughts?

[1]: /messages/by-id/CALj2ACXKKK=wbiG5_t6dGao5GoecMwRkhr7GjVBM_jg54+Na=Q@mail.gmail.com

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v1-0001-Track-Oldest-Initialized-WAL-Buffer-Page.patchapplication/x-patch; name=v1-0001-Track-Oldest-Initialized-WAL-Buffer-Page.patchDownload
From 683f8f2764970ca5737039577d64c91e491908d6 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Date: Mon, 6 Feb 2023 06:58:54 +0000
Subject: [PATCH v1] Track Oldest Initialized WAL Buffer Page

---
 src/backend/access/transam/xlog.c | 170 ++++++++++++++++++++++++++++++
 src/include/access/xlog.h         |   1 +
 2 files changed, 171 insertions(+)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index f9f0f6db8d..7625907542 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -504,6 +504,44 @@ typedef struct XLogCtlData
 	XLogRecPtr *xlblocks;		/* 1st byte ptr-s + XLOG_BLCKSZ */
 	int			XLogCacheBlck;	/* highest allocated xlog buffer index */
 
+	/*
+	 * Start address of oldest initialized page in XLog buffers.
+	 *
+	 * We mainly track oldest initialized page explicitly to quickly tell if a
+	 * given WAL record is available in XLog buffers. It also can be used for
+	 * other purposes, see notes below.
+	 *
+	 * OldestInitializedPage gives XLog buffers following properties:
+	 *
+	 * 1) At any given point of time, pages in XLog buffers array are sorted in
+	 * an ascending order from OldestInitializedPage till InitializedUpTo.
+	 * Note that we verify this property for assert-only builds, see
+	 * IsXLogBuffersArraySorted() for more details.
+	 *
+	 * 2) OldestInitializedPage is monotonically increasing (by virtue of how
+	 * postgres generates WAL records), that is, its value never decreases.
+	 * This property lets someone read its value without a lock. There's no
+	 * problem even if its value is slightly stale i.e. concurrently being
+	 * updated. One can still use it for finding if a given WAL record is
+	 * available in XLog buffers. At worst, one might get false positives (i.e.
+	 * OldestInitializedPage may tell that the WAL record is available in XLog
+	 * buffers, but when one actually looks at it, it isn't really available).
+	 * This is more efficient and performant than acquiring a lock for reading.
+	 * Note that we may not need a lock to read OldestInitializedPage but we
+	 * need to update it holding WALBufMappingLock.
+	 *
+	 * 3) One can start traversing XLog buffers from OldestInitializedPage till
+	 * InitializedUpTo to list out all valid WAL records and stats, and expose
+	 * them via SQL-callable functions to users.
+	 *
+	 * 4) XLog buffers array is inherently organized as a circular, sorted and
+	 * rotated array with OldestInitializedPage as pivot with the property
+	 * where LSN of previous buffer page (if valid) is greater than
+	 * OldestInitializedPage and LSN of next buffer page (if valid) is greater
+	 * than OldestInitializedPage.
+	 */
+	XLogRecPtr	OldestInitializedPage;
+
 	/*
 	 * InsertTimeLineID is the timeline into which new WAL is being inserted
 	 * and flushed. It is zero during recovery, and does not change once set.
@@ -580,6 +618,10 @@ static ControlFileData *ControlFile = NULL;
 #define NextBufIdx(idx)		\
 		(((idx) == XLogCtl->XLogCacheBlck) ? 0 : ((idx) + 1))
 
+/* Macro to retreat to previous buffer index. */
+#define PreviousBufIdx(idx)		\
+		(((idx) == 0) ? XLogCtl->XLogCacheBlck : ((idx) - 1))
+
 /*
  * XLogRecPtrToBufIdx returns the index of the WAL buffer that holds, or
  * would hold if it was in cache, the page containing 'recptr'.
@@ -698,6 +740,10 @@ static void WALInsertLockAcquireExclusive(void);
 static void WALInsertLockRelease(void);
 static void WALInsertLockUpdateInsertingAt(XLogRecPtr insertingAt);
 
+#ifdef USE_ASSERT_CHECKING
+static bool IsXLogBuffersArraySorted(void);
+#endif
+
 /*
  * Insert an XLOG record represented by an already-constructed chain of data
  * chunks.  This is a low-level routine; to construct the WAL record header
@@ -1925,6 +1971,53 @@ AdvanceXLInsertBuffer(XLogRecPtr upto, TimeLineID tli, bool opportunistic)
 		XLogCtl->InitializedUpTo = NewPageEndPtr;
 
 		npages++;
+
+		/*
+		 * Try updating oldest initialized XLog buffer page.
+		 *
+		 * Update it if we are initializing an XLog buffer page for the first
+		 * time or if XLog buffers are full and we are wrapping around.
+		 */
+		if (XLogRecPtrIsInvalid(XLogCtl->OldestInitializedPage) ||
+			(!XLogRecPtrIsInvalid(XLogCtl->OldestInitializedPage) &&
+			 XLogRecPtrToBufIdx(XLogCtl->OldestInitializedPage) == nextidx))
+		{
+			Assert(XLogCtl->OldestInitializedPage < NewPageBeginPtr);
+
+			XLogCtl->OldestInitializedPage = NewPageBeginPtr;
+		}
+
+		/*
+		 * Check some properties about XLog buffers array. We essentially
+		 * perform these checks as asserts to avoid extra costs.
+		 *
+		 * XXX: Perhaps these extra checks are too much for an assert build, so
+		 * placing them under WAL_DEBUG might be worth trying.
+		 */
+
+		/* OldestInitializedPage must have already been initialized. */
+		Assert(!XLogRecPtrIsInvalid(XLogCtl->OldestInitializedPage));
+
+		/*
+		 * OldestInitializedPage is always a starting address of XLog buffer
+		 * page.
+		 */
+		Assert((XLogCtl->OldestInitializedPage % XLOG_BLCKSZ) == 0);
+
+		/*
+		 * OldestInitializedPage and InitializedUpTo are always starting and
+		 * ending addresses of (same or different) XLog buffer page
+		 * respectively. Hence, they can never be same even if there's only one
+		 * initialized page in XLog buffers.
+		 */
+		Assert(XLogCtl->OldestInitializedPage != XLogCtl->InitializedUpTo);
+
+		/*
+		 * At any given point of time, pages in XLog buffers array are sorted
+		 * in an ascending order from OldestInitializedPage till
+		 * InitializedUpTo.
+		 */
+		Assert(IsXLogBuffersArraySorted());
 	}
 	LWLockRelease(WALBufMappingLock);
 
@@ -4616,6 +4709,7 @@ XLOGShmemInit(void)
 	XLogCtl->SharedRecoveryState = RECOVERY_STATE_CRASH;
 	XLogCtl->InstallXLogFileSegmentActive = false;
 	XLogCtl->WalWriterSleeping = false;
+	XLogCtl->OldestInitializedPage = InvalidXLogRecPtr;
 
 	SpinLockInit(&XLogCtl->Insert.insertpos_lck);
 	SpinLockInit(&XLogCtl->info_lck);
@@ -5622,6 +5716,14 @@ StartupXLOG(void)
 
 		XLogCtl->xlblocks[firstIdx] = endOfRecoveryInfo->lastPageBeginPtr + XLOG_BLCKSZ;
 		XLogCtl->InitializedUpTo = endOfRecoveryInfo->lastPageBeginPtr + XLOG_BLCKSZ;
+		XLogCtl->OldestInitializedPage = endOfRecoveryInfo->lastPageBeginPtr;
+
+		/*
+		 * OldestInitializedPage is always a starting address of XLog buffer
+		 * page.
+		 */
+		Assert(!XLogRecPtrIsInvalid(XLogCtl->OldestInitializedPage));
+		Assert((XLogCtl->OldestInitializedPage % XLOG_BLCKSZ) == 0);
 	}
 	else
 	{
@@ -8931,3 +9033,71 @@ SetWalWriterSleeping(bool sleeping)
 	XLogCtl->WalWriterSleeping = sleeping;
 	SpinLockRelease(&XLogCtl->info_lck);
 }
+
+#ifdef USE_ASSERT_CHECKING
+/*
+ * Returns whether or not XLog buffers array is sorted.
+ *
+ * XXX: Perhaps this function is too much for an assert build, so placing it
+ * under WAL_DEBUG might be worth trying.
+ */
+static bool
+IsXLogBuffersArraySorted(void)
+{
+	int	start;
+	int	end;
+	int	current;
+	int	next;
+	XLogRecPtr CurrentPage;
+	XLogRecPtr	NextPage;
+
+	start = XLogRecPtrToBufIdx(XLogCtl->OldestInitializedPage);
+	end = XLogRecPtrToBufIdx(XLogCtl->InitializedUpTo - XLOG_BLCKSZ);
+
+	if (start == end)
+		return true;
+
+	current = start;
+
+	while (current != end)
+	{
+		CurrentPage = XLogCtl->xlblocks[current];
+
+		next = NextBufIdx(current);
+		NextPage = XLogCtl->xlblocks[next];
+
+		if (!XLogRecPtrIsInvalid(NextPage) &&
+			CurrentPage > NextPage)
+			return false;
+
+		current = next;
+	}
+
+	Assert(XLogCtl->xlblocks[current] == XLogCtl->xlblocks[end]);
+
+	return true;
+}
+#endif
+
+/*
+ * Returns whether or not a given WAL record is available in XLog buffers.
+ *
+ * Note that we don't read OldestInitializedPage under a lock, see description
+ * near its definition in xlog.c for more details.
+ *
+ * Note that caller needs to pass in an LSN known to the server, not a future
+ * or unwritten or unflushed LSN.
+ */
+bool
+IsWALRecordAvailableInXLogBuffers(XLogRecPtr lsn)
+{
+	if (!XLogRecPtrIsInvalid(lsn) &&
+		!XLogRecPtrIsInvalid(XLogCtl->OldestInitializedPage) &&
+		lsn >= XLogCtl->OldestInitializedPage &&
+		lsn < XLogCtl->InitializedUpTo)
+	{
+		return true;
+	}
+
+	return false;
+}
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index cfe5409738..2afa53008e 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -257,6 +257,7 @@ extern void ReachedEndOfBackup(XLogRecPtr EndRecPtr, TimeLineID tli);
 extern void SetInstallXLogFileSegmentActive(void);
 extern bool IsInstallXLogFileSegmentActive(void);
 extern void XLogShutdownWalRcv(void);
+extern bool	IsWALRecordAvailableInXLogBuffers(XLogRecPtr lsn);
 
 /*
  * Routines to start, stop, and get status of a base backup.
-- 
2.34.1

#2Nathan Bossart
nathandbossart@gmail.com
In reply to: Bharath Rupireddy (#1)
Re: Track Oldest Initialized WAL Buffer Page

On Tue, Feb 07, 2023 at 07:30:00PM +0530, Bharath Rupireddy wrote:

+		/*
+		 * Try updating oldest initialized XLog buffer page.
+		 *
+		 * Update it if we are initializing an XLog buffer page for the first
+		 * time or if XLog buffers are full and we are wrapping around.
+		 */
+		if (XLogRecPtrIsInvalid(XLogCtl->OldestInitializedPage) ||
+			(!XLogRecPtrIsInvalid(XLogCtl->OldestInitializedPage) &&
+			 XLogRecPtrToBufIdx(XLogCtl->OldestInitializedPage) == nextidx))
+		{
+			Assert(XLogCtl->OldestInitializedPage < NewPageBeginPtr);
+
+			XLogCtl->OldestInitializedPage = NewPageBeginPtr;
+		}

nitpick: I think you can simplify the conditional to

if (XLogRecPtrIsInvalid(XLogCtl->OldestInitializedPage) ||
XLogRecPtrToBufIdx(XLogCtl->OldestInitializedPage) == nextidx)

It's confusing to me that OldestInitializedPage is set to NewPageBeginPtr.
Doesn't that set it to the beginning of the newest initialized page?

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#3Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Nathan Bossart (#2)
1 attachment(s)
Re: Track Oldest Initialized WAL Buffer Page

On Tue, Feb 28, 2023 at 5:52 AM Nathan Bossart <nathandbossart@gmail.com> wrote:

On Tue, Feb 07, 2023 at 07:30:00PM +0530, Bharath Rupireddy wrote:

+             /*
+              * Try updating oldest initialized XLog buffer page.
+              *
+              * Update it if we are initializing an XLog buffer page for the first
+              * time or if XLog buffers are full and we are wrapping around.
+              */
+             if (XLogRecPtrIsInvalid(XLogCtl->OldestInitializedPage) ||
+                     (!XLogRecPtrIsInvalid(XLogCtl->OldestInitializedPage) &&
+                      XLogRecPtrToBufIdx(XLogCtl->OldestInitializedPage) == nextidx))
+             {
+                     Assert(XLogCtl->OldestInitializedPage < NewPageBeginPtr);
+
+                     XLogCtl->OldestInitializedPage = NewPageBeginPtr;
+             }

nitpick: I think you can simplify the conditional to

if (XLogRecPtrIsInvalid(XLogCtl->OldestInitializedPage) ||
XLogRecPtrToBufIdx(XLogCtl->OldestInitializedPage) == nextidx)

Oh, yes, done that.

It's confusing to me that OldestInitializedPage is set to NewPageBeginPtr.
Doesn't that set it to the beginning of the newest initialized page?

Yes, that's the intention, see below. OldestInitializedPage points to
the start address of the oldest initialized page whereas the
InitializedUpTo points to the end address of the latest initialized
page. With this, one can easily track all the WAL between
OldestInitializedPage and InitializedUpTo.

+        /*
+         * OldestInitializedPage and InitializedUpTo are always starting and
+         * ending addresses of (same or different) XLog buffer page
+         * respectively. Hence, they can never be same even if there's only one
+         * initialized page in XLog buffers.
+         */
+        Assert(XLogCtl->OldestInitializedPage != XLogCtl->InitializedUpTo);

Thanks for looking at it. I'm attaching v2 patch with the above review
comment addressed for further review.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v2-0001-Track-Oldest-Initialized-WAL-Buffer-Page.patchapplication/x-patch; name=v2-0001-Track-Oldest-Initialized-WAL-Buffer-Page.patchDownload
From b20ddef1ca852f86cf309417e598f02ff91ff946 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Date: Tue, 28 Feb 2023 05:22:20 +0000
Subject: [PATCH v2] Track Oldest Initialized WAL Buffer Page

---
 src/backend/access/transam/xlog.c | 169 ++++++++++++++++++++++++++++++
 src/include/access/xlog.h         |   1 +
 2 files changed, 170 insertions(+)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index f9f0f6db8d..f4531d3fb5 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -504,6 +504,44 @@ typedef struct XLogCtlData
 	XLogRecPtr *xlblocks;		/* 1st byte ptr-s + XLOG_BLCKSZ */
 	int			XLogCacheBlck;	/* highest allocated xlog buffer index */
 
+	/*
+	 * Start address of oldest initialized page in XLog buffers.
+	 *
+	 * We mainly track oldest initialized page explicitly to quickly tell if a
+	 * given WAL record is available in XLog buffers. It also can be used for
+	 * other purposes, see notes below.
+	 *
+	 * OldestInitializedPage gives XLog buffers following properties:
+	 *
+	 * 1) At any given point of time, pages in XLog buffers array are sorted in
+	 * an ascending order from OldestInitializedPage till InitializedUpTo.
+	 * Note that we verify this property for assert-only builds, see
+	 * IsXLogBuffersArraySorted() for more details.
+	 *
+	 * 2) OldestInitializedPage is monotonically increasing (by virtue of how
+	 * postgres generates WAL records), that is, its value never decreases.
+	 * This property lets someone read its value without a lock. There's no
+	 * problem even if its value is slightly stale i.e. concurrently being
+	 * updated. One can still use it for finding if a given WAL record is
+	 * available in XLog buffers. At worst, one might get false positives (i.e.
+	 * OldestInitializedPage may tell that the WAL record is available in XLog
+	 * buffers, but when one actually looks at it, it isn't really available).
+	 * This is more efficient and performant than acquiring a lock for reading.
+	 * Note that we may not need a lock to read OldestInitializedPage but we
+	 * need to update it holding WALBufMappingLock.
+	 *
+	 * 3) One can start traversing XLog buffers from OldestInitializedPage till
+	 * InitializedUpTo to list out all valid WAL records and stats, and expose
+	 * them via SQL-callable functions to users.
+	 *
+	 * 4) XLog buffers array is inherently organized as a circular, sorted and
+	 * rotated array with OldestInitializedPage as pivot with the property
+	 * where LSN of previous buffer page (if valid) is greater than
+	 * OldestInitializedPage and LSN of next buffer page (if valid) is greater
+	 * than OldestInitializedPage.
+	 */
+	XLogRecPtr	OldestInitializedPage;
+
 	/*
 	 * InsertTimeLineID is the timeline into which new WAL is being inserted
 	 * and flushed. It is zero during recovery, and does not change once set.
@@ -580,6 +618,10 @@ static ControlFileData *ControlFile = NULL;
 #define NextBufIdx(idx)		\
 		(((idx) == XLogCtl->XLogCacheBlck) ? 0 : ((idx) + 1))
 
+/* Macro to retreat to previous buffer index. */
+#define PreviousBufIdx(idx)		\
+		(((idx) == 0) ? XLogCtl->XLogCacheBlck : ((idx) - 1))
+
 /*
  * XLogRecPtrToBufIdx returns the index of the WAL buffer that holds, or
  * would hold if it was in cache, the page containing 'recptr'.
@@ -698,6 +740,10 @@ static void WALInsertLockAcquireExclusive(void);
 static void WALInsertLockRelease(void);
 static void WALInsertLockUpdateInsertingAt(XLogRecPtr insertingAt);
 
+#ifdef USE_ASSERT_CHECKING
+static bool IsXLogBuffersArraySorted(void);
+#endif
+
 /*
  * Insert an XLOG record represented by an already-constructed chain of data
  * chunks.  This is a low-level routine; to construct the WAL record header
@@ -1925,6 +1971,52 @@ AdvanceXLInsertBuffer(XLogRecPtr upto, TimeLineID tli, bool opportunistic)
 		XLogCtl->InitializedUpTo = NewPageEndPtr;
 
 		npages++;
+
+		/*
+		 * Try updating oldest initialized XLog buffer page.
+		 *
+		 * Update it if we are initializing an XLog buffer page for the first
+		 * time or if XLog buffers are full and we are wrapping around.
+		 */
+		if (XLogRecPtrIsInvalid(XLogCtl->OldestInitializedPage) ||
+			XLogRecPtrToBufIdx(XLogCtl->OldestInitializedPage) == nextidx)
+		{
+			Assert(XLogCtl->OldestInitializedPage < NewPageBeginPtr);
+
+			XLogCtl->OldestInitializedPage = NewPageBeginPtr;
+		}
+
+		/*
+		 * Check some properties about XLog buffers array. We essentially
+		 * perform these checks as asserts to avoid extra costs.
+		 *
+		 * XXX: Perhaps these extra checks are too much for an assert build, so
+		 * placing them under WAL_DEBUG might be worth trying.
+		 */
+
+		/* OldestInitializedPage must have already been initialized. */
+		Assert(!XLogRecPtrIsInvalid(XLogCtl->OldestInitializedPage));
+
+		/*
+		 * OldestInitializedPage is always a starting address of XLog buffer
+		 * page.
+		 */
+		Assert((XLogCtl->OldestInitializedPage % XLOG_BLCKSZ) == 0);
+
+		/*
+		 * OldestInitializedPage and InitializedUpTo are always starting and
+		 * ending addresses of (same or different) XLog buffer page
+		 * respectively. Hence, they can never be same even if there's only one
+		 * initialized page in XLog buffers.
+		 */
+		Assert(XLogCtl->OldestInitializedPage != XLogCtl->InitializedUpTo);
+
+		/*
+		 * At any given point of time, pages in XLog buffers array are sorted
+		 * in an ascending order from OldestInitializedPage till
+		 * InitializedUpTo.
+		 */
+		Assert(IsXLogBuffersArraySorted());
 	}
 	LWLockRelease(WALBufMappingLock);
 
@@ -4616,6 +4708,7 @@ XLOGShmemInit(void)
 	XLogCtl->SharedRecoveryState = RECOVERY_STATE_CRASH;
 	XLogCtl->InstallXLogFileSegmentActive = false;
 	XLogCtl->WalWriterSleeping = false;
+	XLogCtl->OldestInitializedPage = InvalidXLogRecPtr;
 
 	SpinLockInit(&XLogCtl->Insert.insertpos_lck);
 	SpinLockInit(&XLogCtl->info_lck);
@@ -5622,6 +5715,14 @@ StartupXLOG(void)
 
 		XLogCtl->xlblocks[firstIdx] = endOfRecoveryInfo->lastPageBeginPtr + XLOG_BLCKSZ;
 		XLogCtl->InitializedUpTo = endOfRecoveryInfo->lastPageBeginPtr + XLOG_BLCKSZ;
+		XLogCtl->OldestInitializedPage = endOfRecoveryInfo->lastPageBeginPtr;
+
+		/*
+		 * OldestInitializedPage is always a starting address of XLog buffer
+		 * page.
+		 */
+		Assert(!XLogRecPtrIsInvalid(XLogCtl->OldestInitializedPage));
+		Assert((XLogCtl->OldestInitializedPage % XLOG_BLCKSZ) == 0);
 	}
 	else
 	{
@@ -8931,3 +9032,71 @@ SetWalWriterSleeping(bool sleeping)
 	XLogCtl->WalWriterSleeping = sleeping;
 	SpinLockRelease(&XLogCtl->info_lck);
 }
+
+#ifdef USE_ASSERT_CHECKING
+/*
+ * Returns whether or not XLog buffers array is sorted.
+ *
+ * XXX: Perhaps this function is too much for an assert build, so placing it
+ * under WAL_DEBUG might be worth trying.
+ */
+static bool
+IsXLogBuffersArraySorted(void)
+{
+	int	start;
+	int	end;
+	int	current;
+	int	next;
+	XLogRecPtr CurrentPage;
+	XLogRecPtr	NextPage;
+
+	start = XLogRecPtrToBufIdx(XLogCtl->OldestInitializedPage);
+	end = XLogRecPtrToBufIdx(XLogCtl->InitializedUpTo - XLOG_BLCKSZ);
+
+	if (start == end)
+		return true;
+
+	current = start;
+
+	while (current != end)
+	{
+		CurrentPage = XLogCtl->xlblocks[current];
+
+		next = NextBufIdx(current);
+		NextPage = XLogCtl->xlblocks[next];
+
+		if (!XLogRecPtrIsInvalid(NextPage) &&
+			CurrentPage > NextPage)
+			return false;
+
+		current = next;
+	}
+
+	Assert(XLogCtl->xlblocks[current] == XLogCtl->xlblocks[end]);
+
+	return true;
+}
+#endif
+
+/*
+ * Returns whether or not a given WAL record is available in XLog buffers.
+ *
+ * Note that we don't read OldestInitializedPage under a lock, see description
+ * near its definition in xlog.c for more details.
+ *
+ * Note that caller needs to pass in an LSN known to the server, not a future
+ * or unwritten or unflushed LSN.
+ */
+bool
+IsWALRecordAvailableInXLogBuffers(XLogRecPtr lsn)
+{
+	if (!XLogRecPtrIsInvalid(lsn) &&
+		!XLogRecPtrIsInvalid(XLogCtl->OldestInitializedPage) &&
+		lsn >= XLogCtl->OldestInitializedPage &&
+		lsn < XLogCtl->InitializedUpTo)
+	{
+		return true;
+	}
+
+	return false;
+}
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index cfe5409738..2afa53008e 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -257,6 +257,7 @@ extern void ReachedEndOfBackup(XLogRecPtr EndRecPtr, TimeLineID tli);
 extern void SetInstallXLogFileSegmentActive(void);
 extern bool IsInstallXLogFileSegmentActive(void);
 extern void XLogShutdownWalRcv(void);
+extern bool	IsWALRecordAvailableInXLogBuffers(XLogRecPtr lsn);
 
 /*
  * Routines to start, stop, and get status of a base backup.
-- 
2.34.1

#4Nathan Bossart
nathandbossart@gmail.com
In reply to: Bharath Rupireddy (#3)
Re: Track Oldest Initialized WAL Buffer Page

On Tue, Feb 28, 2023 at 11:12:29AM +0530, Bharath Rupireddy wrote:

On Tue, Feb 28, 2023 at 5:52 AM Nathan Bossart <nathandbossart@gmail.com> wrote:

It's confusing to me that OldestInitializedPage is set to NewPageBeginPtr.
Doesn't that set it to the beginning of the newest initialized page?

Yes, that's the intention, see below. OldestInitializedPage points to
the start address of the oldest initialized page whereas the
InitializedUpTo points to the end address of the latest initialized
page. With this, one can easily track all the WAL between
OldestInitializedPage and InitializedUpTo.

This is where I'm confused. Why would we set the variable for the start
address of the _oldest_ initialized page to the start address of the
_newest_ initialized page? I must be missing something obvious, so sorry
if this is a silly question.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#5Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Nathan Bossart (#4)
Re: Track Oldest Initialized WAL Buffer Page

On Wed, Mar 1, 2023 at 9:49 AM Nathan Bossart <nathandbossart@gmail.com> wrote:

On Tue, Feb 28, 2023 at 11:12:29AM +0530, Bharath Rupireddy wrote:

On Tue, Feb 28, 2023 at 5:52 AM Nathan Bossart <nathandbossart@gmail.com> wrote:

It's confusing to me that OldestInitializedPage is set to NewPageBeginPtr.
Doesn't that set it to the beginning of the newest initialized page?

Yes, that's the intention, see below. OldestInitializedPage points to
the start address of the oldest initialized page whereas the
InitializedUpTo points to the end address of the latest initialized
page. With this, one can easily track all the WAL between
OldestInitializedPage and InitializedUpTo.

This is where I'm confused. Why would we set the variable for the start
address of the _oldest_ initialized page to the start address of the
_newest_ initialized page? I must be missing something obvious, so sorry
if this is a silly question.

That's the crux of the patch. Let me clarify it a bit.

Firstly, we try to set OldestInitializedPage at the end of the
recovery but that's conditional, that is, only when the last replayed
WAL record spans partially to the end block.

Secondly, we set OldestInitializedPage while initializing the page for
the first time, so the missed-conditional case above gets coverd too.

And, OldestInitializedPage isn't updated for every new initialized
page, only when the previous OldestInitializedPage is being reused
i.e. the wal_buffers are full and it wraps around. Please see the
comment and the condition
XLogRecPtrToBufIdx(XLogCtl->OldestInitializedPage) == nextidx which
holds true if we're crossing-over/wrapping around previous
OldestInitializedPage.

+        /*
+         * Try updating oldest initialized XLog buffer page.
+         *
+         * Update it if we are initializing an XLog buffer page for the first
+         * time or if XLog buffers are full and we are wrapping around.
+         */
+        if (XLogRecPtrIsInvalid(XLogCtl->OldestInitializedPage) ||
+            XLogRecPtrToBufIdx(XLogCtl->OldestInitializedPage) == nextidx)
+        {
+            Assert(XLogCtl->OldestInitializedPage < NewPageBeginPtr);
+
+            XLogCtl->OldestInitializedPage = NewPageBeginPtr;
+        }

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#6Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Bharath Rupireddy (#1)
Re: Track Oldest Initialized WAL Buffer Page

On 07/02/2023 16:00, Bharath Rupireddy wrote:

Hi,

While working on [1], I was looking for a quick way to tell if a WAL
record is present in the WAL buffers array without scanning but I
couldn't find one.

/* The end-ptr of the page that contains the record */
expectedEndPtr += XLOG_BLCKSZ - recptr % XLOG_BLCKSZ;

/* get the buffer where the record is, if it's in WAL buffers at all */
idx = XLogRecPtrToBufIdx(recptr);

/* prevent the WAL buffer from being evicted while we look at it */
LWLockAcquire(WALBufMappingLock, LW_SHARED);

/* Check if the page we're interested in is in the buffer */
found = XLogCtl->xlblocks[idx] == expectedEndPtr;

LWLockRelease(WALBufMappingLock, LW_SHARED);

Hence, I put up a patch that basically tracks the
oldest initialized WAL buffer page, named OldestInitializedPage, in
XLogCtl. With OldestInitializedPage, we can easily illustrate WAL
buffers array properties:

1) At any given point of time, pages in the WAL buffers array are
sorted in an ascending order from OldestInitializedPage till
InitializedUpTo. Note that we verify this property for assert-only
builds, see IsXLogBuffersArraySorted() in the patch for more details.

2) OldestInitializedPage is monotonically increasing (by virtue of how
postgres generates WAL records), that is, its value never decreases.
This property lets someone read its value without a lock. There's no
problem even if its value is slightly stale i.e. concurrently being
updated. One can still use it for finding if a given WAL record is
available in WAL buffers. At worst, one might get false positives
(i.e. OldestInitializedPage may tell that the WAL record is available
in WAL buffers, but when one actually looks at it, it isn't really
available). This is more efficient and performant than acquiring a
lock for reading. Note that we may not need a lock to read
OldestInitializedPage but we need to update it holding
WALBufMappingLock.

You actually hint at the above solution here, so I'm confused. If you're
OK with slightly stale results, you can skip the WALBufferMappingLock
above too, and perform an atomic read of xlblocks[idx] instead.

3) One can start traversing WAL buffers from OldestInitializedPage
till InitializedUpTo to list out all valid WAL records and stats, and
expose them via SQL-callable functions to users, for instance, as
pg_walinspect functions.

4) WAL buffers array is inherently organized as a circular, sorted and
rotated array with OldestInitializedPage as pivot/first element of the
array with the property where LSN of previous buffer page (if valid)
is greater than OldestInitializedPage and LSN of the next buffer page
(if
valid) is greater than OldestInitializedPage.

These properties are true, maybe we should document them explicitly in a
comment. But I don't see the point of tracking OldestInitializedPage. It
seems cheap enough that we could, if there's a need for it, but I don't
see the need.

--
Heikki Linnakangas
Neon (https://neon.tech)

#7Daniel Gustafsson
daniel@yesql.se
In reply to: Heikki Linnakangas (#6)
Re: Track Oldest Initialized WAL Buffer Page

On 3 Jul 2023, at 15:27, Heikki Linnakangas <hlinnaka@iki.fi> wrote:

But I don't see the point of tracking OldestInitializedPage. It seems cheap enough that we could, if there's a need for it, but I don't see the need.

Based on the above comments, and the thread stalling, I am marking this
returned with feedback. Please feel free to continue the discussion here and
re-open a new entry in a future CF if there is a new version of the patch.

--
Daniel Gustafsson

#8Bharath Rupireddy
bharath.rupireddyforpostgres@gmail.com
In reply to: Heikki Linnakangas (#6)
Re: Track Oldest Initialized WAL Buffer Page

On Mon, Jul 3, 2023 at 6:57 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

Thanks a lot for responding. Sorry for being late.

On 07/02/2023 16:00, Bharath Rupireddy wrote:

Hi,

While working on [1], I was looking for a quick way to tell if a WAL
record is present in the WAL buffers array without scanning but I
couldn't find one.

/* The end-ptr of the page that contains the record */
expectedEndPtr += XLOG_BLCKSZ - recptr % XLOG_BLCKSZ;

/* get the buffer where the record is, if it's in WAL buffers at all */
idx = XLogRecPtrToBufIdx(recptr);

/* prevent the WAL buffer from being evicted while we look at it */
LWLockAcquire(WALBufMappingLock, LW_SHARED);

/* Check if the page we're interested in is in the buffer */
found = XLogCtl->xlblocks[idx] == expectedEndPtr;

LWLockRelease(WALBufMappingLock, LW_SHARED);

This is exactly what I'm doing in the 0001 patch here
/messages/by-id/CALj2ACU3ZYzjOv4vZTR+LFk5PL4ndUnbLS6E1vG2dhDBjQGy2A@mail.gmail.com.

My bad! I should have mentioned the requirement properly - I want to
avoid taking WALBufMappingLock to peek into wal_buffers to determine
if the WAL buffer page containing the required WAL record exists.

You actually hint at the above solution here, so I'm confused. If you're
OK with slightly stale results, you can skip the WALBufferMappingLock
above too, and perform an atomic read of xlblocks[idx] instead.

I get that and I see GetXLogBuffer first reading xlblocks without lock
and then to confirm it anyways takes the lock again in
AdvanceXLInsertBuffer.

* However, we don't hold a lock while we read the value. If someone has
* just initialized the page, it's possible that we get a "torn read" of
* the XLogRecPtr if 64-bit fetches are not atomic on this platform. In
* that case we will see a bogus value. That's ok, we'll grab the mapping
* lock (in AdvanceXLInsertBuffer) and retry if we see anything else than
* the page we're looking for. But it means that when we do this unlocked
* read, we might see a value that appears to be ahead of the page we're
* looking for. Don't PANIC on that, until we've verified the value while
* holding the lock.
*/

The the 0001 patch at
/messages/by-id/CALj2ACU3ZYzjOv4vZTR+LFk5PL4ndUnbLS6E1vG2dhDBjQGy2A@mail.gmail.com
reads the WAL buffer page with WALBufMappingLock. So, the patch can
avoid WALBufMappingLock and do something like [1]{ idx = XLogRecPtrToBufIdx(ptr); expectedEndPtr = ptr; expectedEndPtr += XLOG_BLCKSZ - ptr % XLOG_BLCKSZ;:

[1]: { idx = XLogRecPtrToBufIdx(ptr); expectedEndPtr = ptr; expectedEndPtr += XLOG_BLCKSZ - ptr % XLOG_BLCKSZ;
{
idx = XLogRecPtrToBufIdx(ptr);
expectedEndPtr = ptr;
expectedEndPtr += XLOG_BLCKSZ - ptr % XLOG_BLCKSZ;

/*
* Do a stale read of xlblocks without WALBufMappingLock. All the callers
* of this function are expected to read WAL that's already flushed to disk
* from WAL buffers. If this stale read says the requested WAL buffer page
* doesn't exist, it means that the WAL buffer page either is being or has
* already been replaced for reuse. If this stale read says the requested
* WAL buffer page exists, we then take WALBufMappingLock and re-read the
* xlblocks to ensure the WAL buffer page really exists and nobody is
* replacing it meanwhile.
*/
endptr = XLogCtl->xlblocks[idx];

/* Requested WAL isn't available in WAL buffers. */
if (expectedEndPtr != endptr)
break;

/*
* Requested WAL is available in WAL buffers, so recheck the existence
* under the WALBufMappingLock and read if the page still exists, otherwise
* return.
*/
LWLockAcquire(WALBufMappingLock, LW_SHARED);

endptr = XLogCtl->xlblocks[idx];

/* Requested WAL isn't available in WAL buffers. */
if (expectedEndPtr != endptr)
break;

/*
* We found the WAL buffer page containing the given XLogRecPtr.
Get starting
* address of the page and a pointer to the right location given
* XLogRecPtr in that page.
*/
page = XLogCtl->pages + idx * (Size) XLOG_BLCKSZ;
data = page + ptr % XLOG_BLCKSZ;

return data;
}

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com