BgBufferSync(): clarification about reusable_buffers variable

Started by M vd H11 months ago4 messages
#1M vd H
mvdholst@gmail.com

I have been reading the source code of the BgWriter, and there is some
code in BgBufferSync() that I don't fully understand.

In BgBufferSync(), we have the following code:

while (num_to_scan > 0 && reusable_buffers < upcoming_alloc_est)
{
int sync_state = SyncOneBuffer(next_to_clean, true,
wb_context);

if (++next_to_clean >= NBuffers)
{
next_to_clean = 0;
next_passes++;
}
num_to_scan--;

if (sync_state & BUF_WRITTEN)
{
reusable_buffers++;
if (++num_written >= bgwriter_lru_maxpages)
{
PendingBgWriterStats.maxwritten_clean++;
break;
}
}
else if (sync_state & BUF_REUSABLE)
reusable_buffers++;
}

In SyncOneBuffer(), we lock the bufHdr and then check if both the
refcount and usage_count are zero. If so, we mark the return value as
BUF_REUSABLE.
My understanding is that this means that the buffer could be reused
when am empty shared buffer is needed by a backend. However, in the
code above, we seem to track these in the reusable_buffers variable.
But that variable is always incremented when the buffer was written in
SyncOneBuffer() even though that buffer might have a non-zero refcount
or non-zero usage_count.

Initially, I thought this might be a bug, but then realized that the
reusable_buffers variable is used to make the BGWriter more or less
aggressive in reading pages, and now I'm unsure.

I discussed this with some of the PG committers at Microsoft, and we
all agree that the current code is a little confusing and it might
help if some more comments were added.

Does anybody know what the reusable_buffers variable really accounts
for, and whether the code as currently written is correct?

Thanks,

Marcel

--
Marcel van der Holst
mvdholst@gmail.com

#2Ashutosh Bapat
ashutosh.bapat.oss@gmail.com
In reply to: M vd H (#1)
1 attachment(s)
Re: BgBufferSync(): clarification about reusable_buffers variable

Hi Marcel,

On Sat, Feb 8, 2025 at 6:24 AM M vd H <mvdholst@gmail.com> wrote:

I have been reading the source code of the BgWriter, and there is some
code in BgBufferSync() that I don't fully understand.

In BgBufferSync(), we have the following code:

while (num_to_scan > 0 && reusable_buffers < upcoming_alloc_est)
{
int sync_state = SyncOneBuffer(next_to_clean, true,
wb_context);

if (++next_to_clean >= NBuffers)
{
next_to_clean = 0;
next_passes++;
}
num_to_scan--;

if (sync_state & BUF_WRITTEN)
{
reusable_buffers++;
if (++num_written >= bgwriter_lru_maxpages)
{
PendingBgWriterStats.maxwritten_clean++;
break;
}
}
else if (sync_state & BUF_REUSABLE)
reusable_buffers++;
}

In SyncOneBuffer(), we lock the bufHdr and then check if both the
refcount and usage_count are zero. If so, we mark the return value as
BUF_REUSABLE.
My understanding is that this means that the buffer could be reused
when am empty shared buffer is needed by a backend. However, in the
code above, we seem to track these in the reusable_buffers variable.
But that variable is always incremented when the buffer was written in
SyncOneBuffer() even though that buffer might have a non-zero refcount
or non-zero usage_count.

I also think tha reusable_buffers keep track of the number of reusable
buffers. BgBufferSync() calls SyncOneBuffer() with skip_recently_used
= true. In that case, if SyncOneBuffer() finds the buffer with
refcount or usage_count non-zero, it just unlocks the header and
returns. Hence when called from BgBufferSync(), SyncOneBuffer() would
write a buffer only when it is not used. Hence the result would be 0
or BUF_REUSABLE or BUF_REUSABLE | BUF_WRITTEN. It can never be just
BUF_WRITTEN.

I guess, a patch like the one attached will be more readable and clear.

I ran pgbench for 5 minutes with this patch applied and didn't see the
Assert failing. But I don't think that's a good enough test to cover
all scenarios.

--
Best Wishes,
Ashutosh Bapat

Attachments:

bg_buffer_sync_clarity.patchapplication/x-patch; name=bg_buffer_sync_clarity.patchDownload
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 80b0d0c5ded..bdacd5ad980 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -3414,17 +3414,19 @@ BgBufferSync(WritebackContext *wb_context)
 		}
 		num_to_scan--;
 
+		if (sync_state & BUF_REUSABLE)
+			reusable_buffers++;
+
 		if (sync_state & BUF_WRITTEN)
 		{
-			reusable_buffers++;
+			Assert(sync_state & BUF_REUSABLE);
+
 			if (++num_written >= bgwriter_lru_maxpages)
 			{
 				PendingBgWriterStats.maxwritten_clean++;
 				break;
 			}
 		}
-		else if (sync_state & BUF_REUSABLE)
-			reusable_buffers++;
 	}
 
 	PendingBgWriterStats.buf_written_clean += num_written;
#3Xuneng Zhou
xunengzhou@gmail.com
In reply to: Ashutosh Bapat (#2)
Re: BgBufferSync(): clarification about reusable_buffers variable

Hi, tks for working on this. I had a chance to look at this while
googling BgBufferSync
function.

I also think tha reusable_buffers keep track of the number of reusable
buffers. BgBufferSync() calls SyncOneBuffer() with skip_recently_used
= true. In that case, if SyncOneBuffer() finds the buffer with
refcount or usage_count non-zero, it just unlocks the header and
returns. Hence when called from BgBufferSync(), SyncOneBuffer() would
write a buffer only when it is not used. Hence the result would be 0
or BUF_REUSABLE or BUF_REUSABLE | BUF_WRITTEN. It can never be just
BUF_WRITTEN.

Agrees. For this call stack, if skip_recently_used is set to be true,
sync_state cannot be BUF_WRITTEN alone.

I guess, a patch like the one attached will be more readable and clear.

I'm new to this part of code, and I found the patch version seems to be
more straightforward and less prone to misinterpretation.

I ran pgbench for 5 minutes with this patch applied and didn't see the
Assert failing. But I don't think that's a good enough test to cover
all scenarios.

The patch LGTM.

--

Show quoted text

Best Wishes,
Ashutosh Bapat

#4Xuneng Zhou
xunengzhou@gmail.com
In reply to: Xuneng Zhou (#3)
1 attachment(s)
Re: BgBufferSync(): clarification about reusable_buffers variable

Here's a rebase.

Attachments:

bg_buffer_sync_clarity.patchapplication/octet-stream; name=bg_buffer_sync_clarity.patchDownload
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 0b317d2d80..93ba3dad52 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -3853,18 +3853,20 @@ BgBufferSync(WritebackContext *wb_context)
 			next_passes++;
 		}
 		num_to_scan--;
+        
+		if (sync_state & BUF_REUSABLE) 
+			reusable_buffers++;
 
 		if (sync_state & BUF_WRITTEN)
 		{
-			reusable_buffers++;
+			Assert(sync_state & BUF_REUSABLE);
+
 			if (++num_written >= bgwriter_lru_maxpages)
 			{
 				PendingBgWriterStats.maxwritten_clean++;
 				break;
 			}
 		}
-		else if (sync_state & BUF_REUSABLE)
-			reusable_buffers++;
 	}
 
 	PendingBgWriterStats.buf_written_clean += num_written;