From 4fb43ea17f6f34e9eb3ec7d5f740eb9ceb2c231e Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Tue, 3 Mar 2026 18:00:53 -0500
Subject: [PATCH v4 04/14] read_stream: Only increase distance when waiting for
 IO

This avoids increasing the distance to the maximum in cases where the IO
subsystem is already keeping up. This turns out to be important for
performance for two reasons:

- Pinning a lot of buffers is not cheap. If additional pins allow us to avoid
  IO waits, it's definitely worth it, but if we can already do all the
  necessary readahead at a distance of 16, reading ahead 512 buffers can
  increase the CPU overhead substantially.  This is particularly noticeable
  when the to-be-read blocks are already in the kernel page cache.

- If the read stream is read to completion, reading in data earlier than
  needed has limited consequences, leaving aside the CPU costs mentioned
  above. But if the read stream will not be fully consumed, e.g. because it is
  on the inner side of a nested loop join, the additional IO can be a serious
  performance issue. This is not commonly a problem for current read stream
  users, but the upcoming work to use a read stream to fetch table pages as
  part of an index scan encounters it frequently.

Note that this commit would have substantial performance downsides without
the earlier commits in this series. In particular, the earlier commit that
avoids decreasing the readahead distance when there was recent IO is crucial:
because this commit increases the distance less often, without it we would
very often end up not reading ahead aggressively enough.

Author:
Reviewed-by:
Discussion: https://postgr.es/m/
Backpatch:
---
 src/backend/storage/aio/read_stream.c | 39 +++++++++++++++++++++++----
 1 file changed, 34 insertions(+), 5 deletions(-)

diff --git a/src/backend/storage/aio/read_stream.c b/src/backend/storage/aio/read_stream.c
index 31f9e35dee3..4b7db4ab372 100644
--- a/src/backend/storage/aio/read_stream.c
+++ b/src/backend/storage/aio/read_stream.c
@@ -952,22 +952,51 @@ read_stream_next_buffer(ReadStream *stream, void **per_buffer_data)
 	{
 		int16		io_index = stream->oldest_io_index;
 		int32		distance;	/* wider temporary value, clamped below */
+		bool		needed_wait;
 
 		/* Sanity check that we still agree on the buffers. */
 		Assert(stream->ios[io_index].op.buffers ==
 			   &stream->buffers[oldest_buffer_index]);
 
-		WaitReadBuffers(&stream->ios[io_index].op);
+		needed_wait = WaitReadBuffers(&stream->ios[io_index].op);
 
 		Assert(stream->ios_in_progress > 0);
 		stream->ios_in_progress--;
 		if (++stream->oldest_io_index == stream->max_ios)
 			stream->oldest_io_index = 0;
 
-		/* Look-ahead distance ramps up rapidly after we do I/O. */
-		distance = stream->distance * 2;
-		distance = Min(distance, stream->max_pinned_buffers);
-		stream->distance = distance;
+		/*
+		 * If the IO was executed synchronously, we will never see
+		 * WaitReadBuffers() block. This is particularly crucial when
+		 * effective_io_concurrency=0 is used, as all IO will be
+		 * synchronous. Without treating synchronous IO as having waited, we'd
+		 * never allow the distance to get large enough to allow for IO
+		 * combining, resulting in bad performance.
+		 */
+		if (stream->ios[io_index].op.flags & READ_BUFFERS_SYNCHRONOUSLY)
+			needed_wait = true;
+
+		/*
+		 * Have the look-ahead distance ramp up rapidly after we needed to
+		 * wait for IO. We only increase the distance when we needed to wait,
+		 * to avoid increasing the distance further than necessary, as looking
+		 * ahead too far can be costly, both due to the cost of unnecessarily
+		 * pinning many buffers and due to doing IOs that may never be
+		 * consumed if the stream is ended/reset before completion.
+		 *
+		 * If we did not need to wait, the current distance was evidently
+		 * sufficient.
+		 *
+		 * NB: We must not increase the distance if we have already reached
+		 * the end of the stream, as stream->distance == 0 is used to keep
+		 * track of having reached the end.
+		 */
+		if (stream->distance > 0 && needed_wait)
+		{
+			distance = stream->distance * 2;
+			distance = Min(distance, stream->max_pinned_buffers);
+			stream->distance = distance;
+		}
 
 		/*
 		 * As we needed IO, prevent distance from being reduced within our
-- 
2.53.0.1.gb2826b52eb

