Unlogged rel fake lsn vs GetVictimBuffer()

Started by Andres Freund3 months ago8 messageshackers

andres@anarazel.de

3 months ago

Hi,

In [1]/messages/by-id/6blo6ebftb6eic6cadvkhn64zhxl5klf4fxgxfgbwgoijhxyiq@mfz2kofehnjy I noticed that:

FlushBuffer() is careful to not use a page's LSN unless it's
BM_PERMANENT. However, GetVictimBuffer() is not:

/* Read the LSN while holding buffer header lock */
buf_state = LockBufHdr(buf_hdr);
lsn = BufferGetLSN(buf_hdr);
UnlockBufHdr(buf_hdr);

if (XLogNeedsFlush(lsn)
&& StrategyRejectBuffer(strategy, buf_hdr, from_ring))
{
LockBuffer(buf, BUFFER_LOCK_UNLOCK);
UnpinBuffer(buf_hdr);
goto again;
}

If this is an unlogged table, the call to XLogNeedsFlush() will use a fake
LSN, while XLogNeedsFlush() expects a real LSN.

The consequences seem luckily limited, I don't think anything today uses
bulkread strategies (which are the only ones where StrategyRejectBuffer()
returns false if a flush is needed) with gist or such. But it'd be completely
reasonable for XLogNeedsFlush() to have assertions about the LSN being in
range, we just don't today.

I think we ought to fix it, even if it is currently harmless. It seems like
it'd not be too hard for this to go wrong in the future, it'd e.g. make sense
for XLogNeedsFlush() to XLogCtl->LogwrtRqst.Write and wake up walwriter. Which
would have bad consequences if it were done with a fake LSN.

I also think it might be good for XLogNeedsFlush() to have an assertion
verifying that the LSN in the realm of the possible. It's too easy to feed it
something entirely bogus right now and never notice.

Greetings,

Andres Freund

[1]: /messages/by-id/6blo6ebftb6eic6cadvkhn64zhxl5klf4fxgxfgbwgoijhxyiq@mfz2kofehnjy

Melanie Plageman

melanieplageman@gmail.com

3 months ago

In reply to: Andres Freund (#1)

Re: Unlogged rel fake lsn vs GetVictimBuffer()

On Sat, Feb 28, 2026 at 1:09 PM Andres Freund <andres@anarazel.de> wrote:

I think we ought to fix it, even if it is currently harmless. It seems like
it'd not be too hard for this to go wrong in the future, it'd e.g. make sense
for XLogNeedsFlush() to XLogCtl->LogwrtRqst.Write and wake up walwriter. Which
would have bad consequences if it were done with a fake LSN.

I also think it might be good for XLogNeedsFlush() to have an assertion
verifying that the LSN in the realm of the possible. It's too easy to feed it
something entirely bogus right now and never notice.

I agree we should fix it. I noticed it while working on write
combining. I wrote the attached patch to clean up. It's more than you
were mentioning doing, but I think it makes the code nicer. You can of
course also add assertions to XLogNeedsFlush() too.

- Melanie

Andres Freund

andres@anarazel.de

3 months ago

In reply to: Melanie Plageman (#2)

Re: Unlogged rel fake lsn vs GetVictimBuffer()

Hi,

On 2026-02-28 13:58:48 -0500, Melanie Plageman wrote:

On Sat, Feb 28, 2026 at 1:09 PM Andres Freund <andres@anarazel.de> wrote:

I think we ought to fix it, even if it is currently harmless. It seems like
it'd not be too hard for this to go wrong in the future, it'd e.g. make sense
for XLogNeedsFlush() to XLogCtl->LogwrtRqst.Write and wake up walwriter. Which
would have bad consequences if it were done with a fake LSN.

I also think it might be good for XLogNeedsFlush() to have an assertion
verifying that the LSN in the realm of the possible. It's too easy to feed it
something entirely bogus right now and never notice.

I agree we should fix it. I noticed it while working on write
combining. I wrote the attached patch to clean up. It's more than you
were mentioning doing, but I think it makes the code nicer. You can of
course also add assertions to XLogNeedsFlush() too.

Ah.

From dd75df0df8184ff87034df30982f763fe4cd15c8 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 7 Jan 2026 13:32:18 -0500
Subject: [PATCH v0 03/15] Streamline buffer rejection for bulkreads of
unlogged tables

Bulk-read buffer access strategies reject reusing a buffer from the
buffer access strategy ring if doing so would require flushing WAL.
Unlogged relations never require WAL flushes, so this check can be
skipped. It can also be skipped for buffer access strategies other than
bulkread. This avoids taking the buffer header lock unnecessarily.

This under-sells the need for the change a bit, given that we'll just pass a
bogus value to XLogNeedsFlush() today for unlogged rels.

Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: /messages/by-id/flat/0198DBB9-4A76-49E4-87F8-43D46DD0FD76@gmail.com
---
src/backend/storage/buffer/bufmgr.c | 60 ++++++++++++++++++++-------
src/backend/storage/buffer/freelist.c | 13 +++++-
src/include/storage/buf_internals.h | 1 +
3 files changed, 57 insertions(+), 17 deletions(-)

diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 286742e6968..0b5adc9e4a6 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -2520,25 +2520,14 @@ again:
* If using a nondefault strategy, and writing the buffer would
* require a WAL flush, let the strategy decide whether to go ahead
* and write/reuse the buffer or to choose another victim.  We need a
-		 * lock to inspect the page LSN, so this can't be done inside
+		 * content lock to inspect the page LSN, so this can't be done inside
* StrategyGetBuffer.
*/
-		if (strategy != NULL)
+		if (strategy && StrategyRejectBuffer(strategy, buf_hdr, from_ring))
{
-			XLogRecPtr	lsn;
-
-			/* Read the LSN while holding buffer header lock */
-			buf_state = LockBufHdr(buf_hdr);
-			lsn = BufferGetLSN(buf_hdr);
-			UnlockBufHdr(buf_hdr);
-
-			if (XLogNeedsFlush(lsn)
-				&& StrategyRejectBuffer(strategy, buf_hdr, from_ring))
-			{
-				LockBuffer(buf, BUFFER_LOCK_UNLOCK);
-				UnpinBuffer(buf_hdr);
-				goto again;
-			}
+			LockBuffer(buf, BUFFER_LOCK_UNLOCK);
+			UnpinBuffer(buf_hdr);
+			goto again;
}

/* OK, do the I/O */
@@ -3438,6 +3427,45 @@ TrackNewBufferPin(Buffer buf)
BLCKSZ);
}

+/*
+ * Returns true if the buffer needs WAL flushed before it can be written out.
+ *
+ * If the result is required to be correct, the caller must hold a buffer
+ * content lock. If they only hold a shared content lock, we'll need to
+ * acquire the buffer header spinlock, so they must not already hold it.
+ */
+bool
+BufferNeedsWALFlush(BufferDesc *bufdesc, bool exclusive_locked)
+{
+	uint32		buf_state = pg_atomic_read_u64(&bufdesc->state);
+	Buffer		buffer;
+	char	   *page;
+	XLogRecPtr	lsn;
+
+	/*
+	 * Unlogged buffers can't need WAL flush. See FlushBuffer() for more
+	 * details on unlogged relations with LSNs.
+	 */
+	if (!(buf_state & BM_PERMANENT))
+		return false;
+
+	buffer = BufferDescriptorGetBuffer(bufdesc);
+	page = BufferGetPage(buffer);
+
+	if (!XLogHintBitIsNeeded() || BufferIsLocal(buffer) || exclusive_locked)
+		lsn = PageGetLSN(page);
+	else
+	{
+		/* Buffer is either share locked or not locked */
+		LockBufHdr(bufdesc);
+		lsn = PageGetLSN(page);
+		UnlockBufHdr(bufdesc);
+	}

I buy the exclusive_locked branch, but the rest seems a bit dubious:

- BufferIsLocal(buffer) is impossible, the buffer would not be permanent, and
I don't think we use strategy

- XLogHintBitIsNeeded() isn't *that* cheap and if we dirtied the buffer, the
relative cost of locking the buffer header isn't going to matter.

diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c
index b7687836188..37398865d19 100644
--- a/src/backend/storage/buffer/freelist.c
+++ b/src/backend/storage/buffer/freelist.c
@@ -780,12 +780,17 @@ IOContextForStrategy(BufferAccessStrategy strategy)
* be written out and doing so would require flushing WAL too.  This gives us
* a chance to choose a different victim.
*
+ * The buffer must be pinned and content locked and the buffer header spinlock
+ * must not be held.
+ *
* Returns true if buffer manager should ask for a new victim, and false
* if this buffer should be written and re-used.
*/
bool
StrategyRejectBuffer(BufferAccessStrategy strategy, BufferDesc *buf, bool from_ring)
{
+	Assert(strategy);
+
/* We only do this in bulkread mode */
if (strategy->btype != BAS_BULKREAD)
return false;
@@ -795,8 +800,14 @@ StrategyRejectBuffer(BufferAccessStrategy strategy, BufferDesc *buf, bool from_r
strategy->buffers[strategy->current] != BufferDescriptorGetBuffer(buf))
return false;

+	Assert(BufferIsLockedByMe(BufferDescriptorGetBuffer(buf)));
+	Assert(!(pg_atomic_read_u64(&buf->state) & BM_LOCKED));
+
+	if (!BufferNeedsWALFlush(buf, false))
+		return false;
+
/*
-	 * Remove the dirty buffer from the ring; necessary to prevent infinite
+	 * Remove the dirty buffer from the ring; necessary to prevent an infinite
* loop if all ring members are dirty.
*/
strategy->buffers[strategy->current] = InvalidBuffer;

It's a bit unfortunate that now you have an external function call from
bufmgr.c (for StrategyRejectBuffer()) and then an external call back to
bufmgr.c, just to then return back to freelist.c and then to bufmgr.c. And
you need to re-read the buf_state in BufferNeedsWALFlush() again.

I'd probably just call BufferNeedsWALFlush() in GetVictimBuffer() and pass it
the old buf_state, to avoid having to re-fetch it.

Greetings,

Andres Freund

Melanie Plageman

melanieplageman@gmail.com

3 months ago

In reply to: Andres Freund (#3)

Re: Unlogged rel fake lsn vs GetVictimBuffer()

On Sat, Feb 28, 2026 at 7:16 PM Andres Freund <andres@anarazel.de> wrote:

+BufferNeedsWALFlush(BufferDesc *bufdesc, bool exclusive_locked)
+{
+     uint32          buf_state = pg_atomic_read_u64(&bufdesc->state);
+     Buffer          buffer;
+     char       *page;
+     XLogRecPtr      lsn;
+
+     /*
+      * Unlogged buffers can't need WAL flush. See FlushBuffer() for more
+      * details on unlogged relations with LSNs.
+      */
+     if (!(buf_state & BM_PERMANENT))
+             return false;
+
+     buffer = BufferDescriptorGetBuffer(bufdesc);
+     page = BufferGetPage(buffer);
+
+     if (!XLogHintBitIsNeeded() || BufferIsLocal(buffer) || exclusive_locked)
+             lsn = PageGetLSN(page);
+     else
+     {
+             /* Buffer is either share locked or not locked */
+             LockBufHdr(bufdesc);
+             lsn = PageGetLSN(page);
+             UnlockBufHdr(bufdesc);
+     }

I buy the exclusive_locked branch, but the rest seems a bit dubious:

- BufferIsLocal(buffer) is impossible, the buffer would not be permanent, and
I don't think we use strategy

Changed it to an assert.

- XLogHintBitIsNeeded() isn't *that* cheap and if we dirtied the buffer, the
relative cost of locking the buffer header isn't going to matter.

BufferGetLSNAtomic() checks XLogHintBitIsNeeded() as part of what it
calls a "fast path". I suppose in the case of BufferNeedsWALFlush() we
expect it only to be called on dirty buffers, so perhaps we don't need
to worry about being "fast". But why would XLogHintBitIsNeeded() be
slower than the buffer header lock? Is accessing the ControlFile
variable expensive or is it just the external function call?

diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c
@@ -795,8 +800,14 @@ StrategyRejectBuffer(BufferAccessStrategy strategy, BufferDesc *buf, bool from_r
strategy->buffers[strategy->current] != BufferDescriptorGetBuffer(buf))
return false;
+     Assert(BufferIsLockedByMe(BufferDescriptorGetBuffer(buf)));
+     Assert(!(pg_atomic_read_u64(&buf->state) & BM_LOCKED));
+
+     if (!BufferNeedsWALFlush(buf, false))
+             return false;
It's a bit unfortunate that now you have an external function call from
bufmgr.c (for StrategyRejectBuffer()) and then an external call back to
bufmgr.c, just to then return back to freelist.c and then to bufmgr.c. And
you need to re-read the buf_state in BufferNeedsWALFlush() again.

I'd probably just call BufferNeedsWALFlush() in GetVictimBuffer() and pass it
the old buf_state, to avoid having to re-fetch it.

My idea was to avoid taking the buffer header lock if it is a
non-bulkread strategy -- which you can't do if you need to call
BufferNeedsWALFlush() from bufmgr.c. You have to check
BufferNeedsWALFlush() before StrategyRejectBuffer() because
StrategyRejectBuffer() sets the buffer to InvalidBuffer in the ring.
So you think that the external function call will be more expensive
than taking the buffer header lock?

And, I find the current StrategyRejectBuffer() kind of silly and hard
to understand -- its comment mentions needing WAL flush, but, of
course it doesn't check that at all. I've changed it as you suggest,
but I did really prefer it the other way from a code readability
perspective.

- Melanie

Melanie Plageman

melanieplageman@gmail.com

3 months ago

In reply to: Melanie Plageman (#4)

Re: Unlogged rel fake lsn vs GetVictimBuffer()

None of my points above are blockers, so barring any objections I plan
to push this later today/tomorrow. It hasn't been sitting out long but
it is pretty trivial and I don't think it has any correctness issues.

- Melanie

Melanie Plageman

melanieplageman@gmail.com

3 months ago

In reply to: Melanie Plageman (#5)

Re: Unlogged rel fake lsn vs GetVictimBuffer()

On Tue, Mar 10, 2026 at 3:31 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

None of my points above are blockers, so barring any objections I plan
to push this later today/tomorrow. It hasn't been sitting out long but
it is pretty trivial and I don't think it has any correctness issues.

Now, I'm thinking that I should allow BufferNeedsWALFlush() to be
called on local buffers. I removed it in v2 because Andres mentioned
it could never happen when called by StrategyRejectBuffer() (because
we don't use strategies on local buffers), but there's no reason
BufferNeedsWALFlush() can't be used more widely in the future.

- Melanie

Melanie Plageman

melanieplageman@gmail.com

3 months ago

In reply to: Melanie Plageman (#6)

Re: Unlogged rel fake lsn vs GetVictimBuffer()

On Tue, Mar 10, 2026 at 3:46 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

Now, I'm thinking that I should allow BufferNeedsWALFlush() to be
called on local buffers. I removed it in v2 because Andres mentioned
it could never happen when called by StrategyRejectBuffer() (because
we don't use strategies on local buffers), but there's no reason
BufferNeedsWALFlush() can't be used more widely in the future.

Well, due to 82467f627bd478569de, this is now a tiny one-liner. Will
push in an hour or so barring objections.

- Melanie

Melanie Plageman

melanieplageman@gmail.com

3 months ago

In reply to: Melanie Plageman (#7)

Re: Unlogged rel fake lsn vs GetVictimBuffer()

On Wed, Mar 11, 2026 at 2:03 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

Well, due to 82467f627bd478569de, this is now a tiny one-liner. Will
push in an hour or so barring objections.

This has been committed.

- Melanie

Unlogged rel fake lsn vs GetVictimBuffer()

Attachments:

Attachments:

Attachments: