Use streaming read I/O in BRIN vacuuming

Started by Arseniy Mukhin4 months ago10 messages
#1Arseniy Mukhin
arseniy.mukhin.dev@gmail.com
1 attachment(s)

Hi,

B-tree, GiST and SP-GiST take advantage of the read stream API during
vacuuming. BRIN vacuum seems like a perfect candidate for it as well.
During a vacuum it reads the entire relation sequentially, page by
page.

PFA the patch that migrates BRIN vacuum to the read stream API.

Best regards,
Arseniy Mukhin

Attachments:

v1-0001-Use-streaming-read-I-O-in-BRIN-vacuuming.patchtext/x-patch; charset=US-ASCII; name=v1-0001-Use-streaming-read-I-O-in-BRIN-vacuuming.patchDownload
From fcc3ed0983151791f6d6934b7137cdcf2d839857 Mon Sep 17 00:00:00 2001
From: Arseniy Mukhin <arseniy.mukhin.dev@gmail.com>
Date: Sun, 31 Aug 2025 15:44:17 +0300
Subject: [PATCH v1] Use streaming read I/O in BRIN vacuuming

BRIN vacuum processes all index pages in physical order. Now it
uses the read stream API instead of explicitly invoking ReadBuffer().
---
 src/backend/access/brin/brin.c | 33 ++++++++++++++++++++++++---------
 1 file changed, 24 insertions(+), 9 deletions(-)

diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 7ff7467e462..80e85066872 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -2171,28 +2171,43 @@ union_tuples(BrinDesc *bdesc, BrinMemTuple *a, BrinTuple *b)
 static void
 brin_vacuum_scan(Relation idxrel, BufferAccessStrategy strategy)
 {
-	BlockNumber nblocks;
-	BlockNumber blkno;
+	BlockRangeReadStreamPrivate p;
+	ReadStream *stream;
+	Buffer		buf;
+
+	p.current_blocknum = 0;
+	p.last_exclusive = RelationGetNumberOfBlocks(idxrel);
+
+	/*
+	 * It is safe to use batchmode as block_range_read_stream_cb takes no
+	 * locks.
+	 */
+	stream = read_stream_begin_relation(READ_STREAM_MAINTENANCE |
+										READ_STREAM_FULL |
+										READ_STREAM_SEQUENTIAL |
+										READ_STREAM_USE_BATCHING,
+										strategy,
+										idxrel,
+										MAIN_FORKNUM,
+										block_range_read_stream_cb,
+										&p,
+										0);
 
 	/*
 	 * Scan the index in physical order, and clean up any possible mess in
 	 * each page.
 	 */
-	nblocks = RelationGetNumberOfBlocks(idxrel);
-	for (blkno = 0; blkno < nblocks; blkno++)
+	while ((buf = read_stream_next_buffer(stream, NULL)) != InvalidBuffer)
 	{
-		Buffer		buf;
-
 		CHECK_FOR_INTERRUPTS();
 
-		buf = ReadBufferExtended(idxrel, MAIN_FORKNUM, blkno,
-								 RBM_NORMAL, strategy);
-
 		brin_page_cleanup(idxrel, buf);
 
 		ReleaseBuffer(buf);
 	}
 
+	read_stream_end(stream);
+
 	/*
 	 * Update all upper pages in the index's FSM, as well.  This ensures not
 	 * only that we propagate leaf-page FSM updates made by brin_page_cleanup,
-- 
2.43.0

#2Andrey Borodin
x4mmm@yandex-team.ru
In reply to: Arseniy Mukhin (#1)
Re: Use streaming read I/O in BRIN vacuuming

Hi!

On 31 Aug 2025, at 21:17, Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> wrote:

PFA the patch that migrates BRIN vacuum to the read stream API.

The patch is nice and straightforward. Looks good to me.

Some notes that do not seem to me problem of this patch:
1. This comment is copied 7 times already across codebase.
"It is safe to use batchmode as block_range_read_stream_cb"
Maybe we can refactor comments and function names...
2. Somehow brin_vacuum_scan() avoid dance of getting RelationGetNumberOfBlocks() many times to be entirely sure everything is scanned. Unlike other index vacuums, see btvacuumscan() for example.

Best regards, Andrey Borodin.

#3Arseniy Mukhin
arseniy.mukhin.dev@gmail.com
In reply to: Andrey Borodin (#2)
Re: Use streaming read I/O in BRIN vacuuming

Hi!

On Sun, Aug 31, 2025 at 8:49 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:

Hi!

On 31 Aug 2025, at 21:17, Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> wrote:

PFA the patch that migrates BRIN vacuum to the read stream API.

The patch is nice and straightforward. Looks good to me.

Thank you for the review!

Some notes that do not seem to me problem of this patch:
1. This comment is copied 7 times already across codebase.
"It is safe to use batchmode as block_range_read_stream_cb"
Maybe we can refactor comments and function names...

Yes, I had similar thoughts, but having these comments at callsites
has its own benefits, there is a thread about these comments [0]/messages/by-id/CAE7r3M+P1Qcv7CYxi=Jw_d40L=sVRAxdDti-Vo4X5x0opZ3XVw@mail.gmail.com...

2. Somehow brin_vacuum_scan() avoid dance of getting RelationGetNumberOfBlocks() many times to be entirely sure everything is scanned. Unlike other index vacuums, see btvacuumscan() for example.

If I understand correctly, in other access methods you need to be sure
that you read the relation up to the end, so you don't leave any index
tuples that should be pruned. BRIN doesn't have a prune phase, there
is only a cleanup phase. So it seems it's not a big deal if you miss
several pages that were allocated during the vacuum.

[0]: /messages/by-id/CAE7r3M+P1Qcv7CYxi=Jw_d40L=sVRAxdDti-Vo4X5x0opZ3XVw@mail.gmail.com

Best regards,
Arseniy Mukhin

#4Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Arseniy Mukhin (#3)
Re: Use streaming read I/O in BRIN vacuuming

On Sun, Aug 31, 2025 at 12:48 PM Arseniy Mukhin
<arseniy.mukhin.dev@gmail.com> wrote:

Hi!

On Sun, Aug 31, 2025 at 8:49 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:

Hi!

On 31 Aug 2025, at 21:17, Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> wrote:

PFA the patch that migrates BRIN vacuum to the read stream API.

The patch is nice and straightforward. Looks good to me.

Thank you for the review!

Some notes that do not seem to me problem of this patch:
1. This comment is copied 7 times already across codebase.
"It is safe to use batchmode as block_range_read_stream_cb"
Maybe we can refactor comments and function names...

Yes, I had similar thoughts, but having these comments at callsites
has its own benefits, there is a thread about these comments [0]...

2. Somehow brin_vacuum_scan() avoid dance of getting RelationGetNumberOfBlocks() many times to be entirely sure everything is scanned. Unlike other index vacuums, see btvacuumscan() for example.

If I understand correctly, in other access methods you need to be sure
that you read the relation up to the end, so you don't leave any index
tuples that should be pruned. BRIN doesn't have a prune phase, there
is only a cleanup phase. So it seems it's not a big deal if you miss
several pages that were allocated during the vacuum.

Thank you for proposing the patch! I've reviewed the patch and have
some comments:

+   stream = read_stream_begin_relation(READ_STREAM_MAINTENANCE |
+                                       READ_STREAM_FULL |
+                                       READ_STREAM_SEQUENTIAL |
+                                       READ_STREAM_USE_BATCHING,
+                                       strategy,
+                                       idxrel,
+                                       MAIN_FORKNUM,
+                                       block_range_read_stream_cb,
+                                       &p,
+                                       0);

Unlike other index AM's it uses READ_STREAM_SEQUENTIAL. If there are
some reasons to use it, we should leave comments there.

---
Have you done any performance testing with this patch? Since BRIN
indexes typically don't grow very large, I'm curious how much benefit
the read_stream provides for BRIN index cleanup.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#5Nazir Bilal Yavuz
byavuz81@gmail.com
In reply to: Masahiko Sawada (#4)
Re: Use streaming read I/O in BRIN vacuuming

Hi,

On Thu, 9 Oct 2025 at 02:03, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Thank you for proposing the patch! I've reviewed the patch and have
some comments:

+   stream = read_stream_begin_relation(READ_STREAM_MAINTENANCE |
+                                       READ_STREAM_FULL |
+                                       READ_STREAM_SEQUENTIAL |
+                                       READ_STREAM_USE_BATCHING,
+                                       strategy,
+                                       idxrel,
+                                       MAIN_FORKNUM,
+                                       block_range_read_stream_cb,
+                                       &p,
+                                       0);

Unlike other index AM's it uses READ_STREAM_SEQUENTIAL. If there are
some reasons to use it, we should leave comments there.

Andres has some comments about retiring the READ_STREAM_SEQUENTIAL
flag as it is automatically detected and can potentially hurt
performance [1]/messages/by-id/go7c2sudqg4pp7dsabsak4ajugti4f3tyqoweja5ihcjiw65dc@yh6wi4zqys23.

[1]: /messages/by-id/go7c2sudqg4pp7dsabsak4ajugti4f3tyqoweja5ihcjiw65dc@yh6wi4zqys23

--
Regards,
Nazir Bilal Yavuz
Microsoft

#6Arseniy Mukhin
arseniy.mukhin.dev@gmail.com
In reply to: Nazir Bilal Yavuz (#5)
4 attachment(s)
Re: Use streaming read I/O in BRIN vacuuming

Hi,

On Thu, Oct 9, 2025 at 2:03 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sun, Aug 31, 2025 at 12:48 PM Arseniy Mukhin
<arseniy.mukhin.dev@gmail.com> wrote:

Hi!

On Sun, Aug 31, 2025 at 8:49 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:

Hi!

On 31 Aug 2025, at 21:17, Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> wrote:

PFA the patch that migrates BRIN vacuum to the read stream API.

The patch is nice and straightforward. Looks good to me.

Thank you for the review!

Some notes that do not seem to me problem of this patch:
1. This comment is copied 7 times already across codebase.
"It is safe to use batchmode as block_range_read_stream_cb"
Maybe we can refactor comments and function names...

Yes, I had similar thoughts, but having these comments at callsites
has its own benefits, there is a thread about these comments [0]...

2. Somehow brin_vacuum_scan() avoid dance of getting RelationGetNumberOfBlocks() many times to be entirely sure everything is scanned. Unlike other index vacuums, see btvacuumscan() for example.

If I understand correctly, in other access methods you need to be sure
that you read the relation up to the end, so you don't leave any index
tuples that should be pruned. BRIN doesn't have a prune phase, there
is only a cleanup phase. So it seems it's not a big deal if you miss
several pages that were allocated during the vacuum.

Thank you for proposing the patch! I've reviewed the patch and have
some comments:

Thank you for the review!

+   stream = read_stream_begin_relation(READ_STREAM_MAINTENANCE |
+                                       READ_STREAM_FULL |
+                                       READ_STREAM_SEQUENTIAL |
+                                       READ_STREAM_USE_BATCHING,
+                                       strategy,
+                                       idxrel,
+                                       MAIN_FORKNUM,
+                                       block_range_read_stream_cb,
+                                       &p,
+                                       0);

Unlike other index AM's it uses READ_STREAM_SEQUENTIAL. If there are
some reasons to use it, we should leave comments there.

Good point, thank you. I looked again at the usage of the
READ_STREAM_SEQUENTIAL flag, and I'm leaning toward not using it here.
But I'm not completely sure, so I decided to ask about the flag usage
in the thread [0]/messages/by-id/CAE7r3MJ5NHb1BMo1oCNWmfrG=Ytx-GKX44YNdA21FuQQQeq_Qg@mail.gmail.com.

---
Have you done any performance testing with this patch? Since BRIN
indexes typically don't grow very large, I'm curious how much benefit
the read_stream provides for BRIN index cleanup.

Yeah, BRIN is known as a compact index, so these changes should have
less impact, but I thought it should still be useful, just on a
smaller scale. I didn't run any measurements, as the changes are
similar to those made to other index AMs, where this was considered an
improvement. But after your question, I decided to try running some
benchmarks.

------------------------------------------
Performance testing
------------------------------------------

Setup & data (fill_data.sql)

shared_buffers = 1Gb
Table size: ~970Mb
Index size: ~30Mb (BRIN index artificially was made quite big, but it
lets us see patch impact better).

Test cases

1) Cold data - restart db, drop page cache, table seqscan, run vacuum.
This way we read index data from the disk (restart_with_cache_drop.sh)
2) Page cache - restart db, table seqscan, run vacuum. This way we
read index data from the page cache (restart_with_wo_cache_drop.sh)
3) Shared buffers - run vacuum with data in shared buffers (use query
with bitmap scan to warm cache) (vacuum_cached.sh)

Every io_method was used with every test case. I also tried to remove
READ_STREAM_SEQUENTIAL and measured it too (it's called
patched_wo_seq).

What were measured:

1) Duration of brin_vacuum_scan (without FreeSpaceMapVacuum(idxrel))
2) Avg read rate from vacuum report. Index has 4260 pages, and the
vacuum report says 4300 pages were read, so it looks like almost all
read rate is from brin_vacuum_scan().

Results

--- Cold data ---
Method     | Version         |   Duration (ms) |    Throughput (MB/s)
----------------------------------------------------------------------
io_uring   | master             |      14.6 ± 0.8 |       2068.1 ± 127.6
              | patched              |       9.9 ± 1.3 |       2947.3 ± 363.0
             | patched_wo_seq  |      10.5 ± 1.5 |    2817.5 ± 359.7
sync       | master                |      16.1 ± 1.9 |    1921.6 ± 211.6
              | patched               |      15.4 ± 1.6 |    1985.9 ± 182.3
              | patched_wo_seq  |      13.9 ± 1.2 |    2155.7 ± 180.4
worker     | master               |      16.4 ± 1.6 |     1850.0 ± 168.6
              | patched               |       9.2 ± 0.5 |       3077.3 ± 154.3
              | patched_wo_seq  |       9.0 ± 0.5 |     3133.3 ± 158.1
--- Page cache ---

Method | Version | Duration (ms) | Throughput (MB/s)
----------------------------------------------------------------------
io_uring | master | 10.4 ± 1.7 | 2971.3 ± 469.6
| patched | 7.3 ± 1.7 | 4152.7 ± 910.6
| patched_wo_seq | 7.0 ± 1.4 | 4208.0 ± 742.2
sync | master | 10.4 ± 2.3 | 3037.9 ± 655.6
| patched | 9.2 ± 1.5 | 3319.8 ± 646.7
| patched_wo_seq | 9.1 ± 1.6 | 3384.7 ± 674.9
worker | master | 10.2 ± 1.7 | 3013.4 ± 490.0
| patched | 3.0 ± 0.2 | 7878.6 ± 540.3
| patched_wo_seq | 3.0 ± 0.3 | 8127.3 ± 1093.7

--- Shared buffers ---

Method | Version | Duration (ms) | Throughput (MB/s)
----------------------------------------------------------------------
io_uring | master | 3.1 ± 0.3 | 0.0 ± 0.0
| patched | 3.2 ± 0.5 | 0.0 ± 0.0
| patched_wo_seq | 3.3 ± 0.4 | 0.0 ± 0.0
sync | master | 3.1 ± 0.4 | 0.0 ± 0.0
| patched | 3.3 ± 0.5 | 0.0 ± 0.0
| patched_wo_seq | 3.1 ± 0.5 | 0.0 ± 0.0
worker | master | 3.2 ± 0.4 | 0.0 ± 0.0
| patched | 3.4 ± 0.3 | 0.0 ± 0.0
| patched_wo_seq | 3.6 ± 0.3 | 0.0 ± 0.0

Looks like we have a win for 'cold data' and 'page cache' cases with
worker and io_uring, and some modest improvement for sync.
READ_STREAM_SEQUENTIAL doesn't seem to change anything. I don't have
much benchmarking experience, so hope I didn't make any bad
mistakes.I'm new to benchmarking, so I hope I haven't made any serious
mistakes.

Thank you!

[0]: /messages/by-id/CAE7r3MJ5NHb1BMo1oCNWmfrG=Ytx-GKX44YNdA21FuQQQeq_Qg@mail.gmail.com

Best regards,
Arseniy Mukhin

Attachments:

restart_with_wo_cache_drop.shapplication/x-shellscript; name=restart_with_wo_cache_drop.shDownload
restart_with_cache_drop.shapplication/x-shellscript; name=restart_with_cache_drop.shDownload
vacuum_cached.shapplication/x-shellscript; name=vacuum_cached.shDownload
fill_data.sqlapplication/sql; name=fill_data.sqlDownload
#7Arseniy Mukhin
arseniy.mukhin.dev@gmail.com
In reply to: Arseniy Mukhin (#6)
1 attachment(s)
Re: Use streaming read I/O in BRIN vacuuming

Hi,

On Mon, Oct 13, 2025 at 5:54 PM Arseniy Mukhin
<arseniy.mukhin.dev@gmail.com> wrote:

Hi,

On Thu, Oct 9, 2025 at 2:03 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sun, Aug 31, 2025 at 12:48 PM Arseniy Mukhin
<arseniy.mukhin.dev@gmail.com> wrote:

Hi!

On Sun, Aug 31, 2025 at 8:49 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:

Hi!

On 31 Aug 2025, at 21:17, Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> wrote:

PFA the patch that migrates BRIN vacuum to the read stream API.

The patch is nice and straightforward. Looks good to me.

Thank you for the review!

Some notes that do not seem to me problem of this patch:
1. This comment is copied 7 times already across codebase.
"It is safe to use batchmode as block_range_read_stream_cb"
Maybe we can refactor comments and function names...

Yes, I had similar thoughts, but having these comments at callsites
has its own benefits, there is a thread about these comments [0]...

2. Somehow brin_vacuum_scan() avoid dance of getting RelationGetNumberOfBlocks() many times to be entirely sure everything is scanned. Unlike other index vacuums, see btvacuumscan() for example.

If I understand correctly, in other access methods you need to be sure
that you read the relation up to the end, so you don't leave any index
tuples that should be pruned. BRIN doesn't have a prune phase, there
is only a cleanup phase. So it seems it's not a big deal if you miss
several pages that were allocated during the vacuum.

Thank you for proposing the patch! I've reviewed the patch and have
some comments:

Thank you for the review!

+   stream = read_stream_begin_relation(READ_STREAM_MAINTENANCE |
+                                       READ_STREAM_FULL |
+                                       READ_STREAM_SEQUENTIAL |
+                                       READ_STREAM_USE_BATCHING,
+                                       strategy,
+                                       idxrel,
+                                       MAIN_FORKNUM,
+                                       block_range_read_stream_cb,
+                                       &p,
+                                       0);

Unlike other index AM's it uses READ_STREAM_SEQUENTIAL. If there are
some reasons to use it, we should leave comments there.

Good point, thank you. I looked again at the usage of the
READ_STREAM_SEQUENTIAL flag, and I'm leaning toward not using it here.
But I'm not completely sure, so I decided to ask about the flag usage
in the thread [0].

I removed READ_STREAM_SEQUENTIAL. The comment around
READ_STREAM_SEQUENTIAL says it should be used for cases where
sequential access might not be correctly detected. We use
block_range_read_stream_cb here, so the pattern should be clear. PFA
the new version.

Best regards,
Arseniy Mukhin

Attachments:

v2-0001-Use-streaming-read-I-O-in-BRIN-vacuuming.patchtext/x-patch; charset=US-ASCII; name=v2-0001-Use-streaming-read-I-O-in-BRIN-vacuuming.patchDownload
From 9dd9176f98f57c77ecaff47c2ad1753657f807a6 Mon Sep 17 00:00:00 2001
From: Arseniy Mukhin <arseniy.mukhin.dev@gmail.com>
Date: Sun, 31 Aug 2025 15:44:17 +0300
Subject: [PATCH v2] Use streaming read I/O in BRIN vacuuming

BRIN vacuum processes all index pages in physical order. Now it
uses the read stream API instead of explicitly invoking ReadBuffer().
---
 src/backend/access/brin/brin.c | 32 +++++++++++++++++++++++---------
 1 file changed, 23 insertions(+), 9 deletions(-)

diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 2f7d1437919..cb3331921cb 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -2171,28 +2171,42 @@ union_tuples(BrinDesc *bdesc, BrinMemTuple *a, BrinTuple *b)
 static void
 brin_vacuum_scan(Relation idxrel, BufferAccessStrategy strategy)
 {
-	BlockNumber nblocks;
-	BlockNumber blkno;
+	BlockRangeReadStreamPrivate p;
+	ReadStream *stream;
+	Buffer		buf;
+
+	p.current_blocknum = 0;
+	p.last_exclusive = RelationGetNumberOfBlocks(idxrel);
+
+	/*
+	 * It is safe to use batchmode as block_range_read_stream_cb takes no
+	 * locks.
+	 */
+	stream = read_stream_begin_relation(READ_STREAM_MAINTENANCE |
+										READ_STREAM_FULL |
+										READ_STREAM_USE_BATCHING,
+										strategy,
+										idxrel,
+										MAIN_FORKNUM,
+										block_range_read_stream_cb,
+										&p,
+										0);
 
 	/*
 	 * Scan the index in physical order, and clean up any possible mess in
 	 * each page.
 	 */
-	nblocks = RelationGetNumberOfBlocks(idxrel);
-	for (blkno = 0; blkno < nblocks; blkno++)
+	while ((buf = read_stream_next_buffer(stream, NULL)) != InvalidBuffer)
 	{
-		Buffer		buf;
-
 		CHECK_FOR_INTERRUPTS();
 
-		buf = ReadBufferExtended(idxrel, MAIN_FORKNUM, blkno,
-								 RBM_NORMAL, strategy);
-
 		brin_page_cleanup(idxrel, buf);
 
 		ReleaseBuffer(buf);
 	}
 
+	read_stream_end(stream);
+
 	/*
 	 * Update all upper pages in the index's FSM, as well.  This ensures not
 	 * only that we propagate leaf-page FSM updates made by brin_page_cleanup,
-- 
2.43.0

#8Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Arseniy Mukhin (#7)
Re: Use streaming read I/O in BRIN vacuuming

On Wed, Nov 12, 2025 at 11:57 AM Arseniy Mukhin
<arseniy.mukhin.dev@gmail.com> wrote:

Hi,

On Mon, Oct 13, 2025 at 5:54 PM Arseniy Mukhin
<arseniy.mukhin.dev@gmail.com> wrote:

Hi,

On Thu, Oct 9, 2025 at 2:03 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sun, Aug 31, 2025 at 12:48 PM Arseniy Mukhin
<arseniy.mukhin.dev@gmail.com> wrote:

Hi!

On Sun, Aug 31, 2025 at 8:49 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:

Hi!

On 31 Aug 2025, at 21:17, Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> wrote:

PFA the patch that migrates BRIN vacuum to the read stream API.

The patch is nice and straightforward. Looks good to me.

Thank you for the review!

Some notes that do not seem to me problem of this patch:
1. This comment is copied 7 times already across codebase.
"It is safe to use batchmode as block_range_read_stream_cb"
Maybe we can refactor comments and function names...

Yes, I had similar thoughts, but having these comments at callsites
has its own benefits, there is a thread about these comments [0]...

2. Somehow brin_vacuum_scan() avoid dance of getting RelationGetNumberOfBlocks() many times to be entirely sure everything is scanned. Unlike other index vacuums, see btvacuumscan() for example.

If I understand correctly, in other access methods you need to be sure
that you read the relation up to the end, so you don't leave any index
tuples that should be pruned. BRIN doesn't have a prune phase, there
is only a cleanup phase. So it seems it's not a big deal if you miss
several pages that were allocated during the vacuum.

Thank you for proposing the patch! I've reviewed the patch and have
some comments:

Thank you for the review!

+   stream = read_stream_begin_relation(READ_STREAM_MAINTENANCE |
+                                       READ_STREAM_FULL |
+                                       READ_STREAM_SEQUENTIAL |
+                                       READ_STREAM_USE_BATCHING,
+                                       strategy,
+                                       idxrel,
+                                       MAIN_FORKNUM,
+                                       block_range_read_stream_cb,
+                                       &p,
+                                       0);

Unlike other index AM's it uses READ_STREAM_SEQUENTIAL. If there are
some reasons to use it, we should leave comments there.

Good point, thank you. I looked again at the usage of the
READ_STREAM_SEQUENTIAL flag, and I'm leaning toward not using it here.
But I'm not completely sure, so I decided to ask about the flag usage
in the thread [0].

I removed READ_STREAM_SEQUENTIAL. The comment around
READ_STREAM_SEQUENTIAL says it should be used for cases where
sequential access might not be correctly detected. We use
block_range_read_stream_cb here, so the pattern should be clear. PFA
the new version.

Thank you for updating the patch and sharing the performance test
results! The patch looks good to me. I'm going to push it, barring any
objections.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#9Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Masahiko Sawada (#8)
Re: Use streaming read I/O in BRIN vacuuming

On Thu, Nov 13, 2025 at 4:47 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Nov 12, 2025 at 11:57 AM Arseniy Mukhin
<arseniy.mukhin.dev@gmail.com> wrote:

Hi,

On Mon, Oct 13, 2025 at 5:54 PM Arseniy Mukhin
<arseniy.mukhin.dev@gmail.com> wrote:

Hi,

On Thu, Oct 9, 2025 at 2:03 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sun, Aug 31, 2025 at 12:48 PM Arseniy Mukhin
<arseniy.mukhin.dev@gmail.com> wrote:

Hi!

On Sun, Aug 31, 2025 at 8:49 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:

Hi!

On 31 Aug 2025, at 21:17, Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> wrote:

PFA the patch that migrates BRIN vacuum to the read stream API.

The patch is nice and straightforward. Looks good to me.

Thank you for the review!

Some notes that do not seem to me problem of this patch:
1. This comment is copied 7 times already across codebase.
"It is safe to use batchmode as block_range_read_stream_cb"
Maybe we can refactor comments and function names...

Yes, I had similar thoughts, but having these comments at callsites
has its own benefits, there is a thread about these comments [0]...

2. Somehow brin_vacuum_scan() avoid dance of getting RelationGetNumberOfBlocks() many times to be entirely sure everything is scanned. Unlike other index vacuums, see btvacuumscan() for example.

If I understand correctly, in other access methods you need to be sure
that you read the relation up to the end, so you don't leave any index
tuples that should be pruned. BRIN doesn't have a prune phase, there
is only a cleanup phase. So it seems it's not a big deal if you miss
several pages that were allocated during the vacuum.

Thank you for proposing the patch! I've reviewed the patch and have
some comments:

Thank you for the review!

+   stream = read_stream_begin_relation(READ_STREAM_MAINTENANCE |
+                                       READ_STREAM_FULL |
+                                       READ_STREAM_SEQUENTIAL |
+                                       READ_STREAM_USE_BATCHING,
+                                       strategy,
+                                       idxrel,
+                                       MAIN_FORKNUM,
+                                       block_range_read_stream_cb,
+                                       &p,
+                                       0);

Unlike other index AM's it uses READ_STREAM_SEQUENTIAL. If there are
some reasons to use it, we should leave comments there.

Good point, thank you. I looked again at the usage of the
READ_STREAM_SEQUENTIAL flag, and I'm leaning toward not using it here.
But I'm not completely sure, so I decided to ask about the flag usage
in the thread [0].

I removed READ_STREAM_SEQUENTIAL. The comment around
READ_STREAM_SEQUENTIAL says it should be used for cases where
sequential access might not be correctly detected. We use
block_range_read_stream_cb here, so the pattern should be clear. PFA
the new version.

Thank you for updating the patch and sharing the performance test
results! The patch looks good to me. I'm going to push it, barring any
objections.

Pushed.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#10Arseniy Mukhin
arseniy.mukhin.dev@gmail.com
In reply to: Masahiko Sawada (#9)
Re: Use streaming read I/O in BRIN vacuuming

On Tue, Nov 18, 2025 at 3:20 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Nov 13, 2025 at 4:47 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Nov 12, 2025 at 11:57 AM Arseniy Mukhin
<arseniy.mukhin.dev@gmail.com> wrote:

Hi,

On Mon, Oct 13, 2025 at 5:54 PM Arseniy Mukhin
<arseniy.mukhin.dev@gmail.com> wrote:

Hi,

On Thu, Oct 9, 2025 at 2:03 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sun, Aug 31, 2025 at 12:48 PM Arseniy Mukhin
<arseniy.mukhin.dev@gmail.com> wrote:

Hi!

On Sun, Aug 31, 2025 at 8:49 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:

Hi!

On 31 Aug 2025, at 21:17, Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> wrote:

PFA the patch that migrates BRIN vacuum to the read stream API.

The patch is nice and straightforward. Looks good to me.

Thank you for the review!

Some notes that do not seem to me problem of this patch:
1. This comment is copied 7 times already across codebase.
"It is safe to use batchmode as block_range_read_stream_cb"
Maybe we can refactor comments and function names...

Yes, I had similar thoughts, but having these comments at callsites
has its own benefits, there is a thread about these comments [0]...

2. Somehow brin_vacuum_scan() avoid dance of getting RelationGetNumberOfBlocks() many times to be entirely sure everything is scanned. Unlike other index vacuums, see btvacuumscan() for example.

If I understand correctly, in other access methods you need to be sure
that you read the relation up to the end, so you don't leave any index
tuples that should be pruned. BRIN doesn't have a prune phase, there
is only a cleanup phase. So it seems it's not a big deal if you miss
several pages that were allocated during the vacuum.

Thank you for proposing the patch! I've reviewed the patch and have
some comments:

Thank you for the review!

+   stream = read_stream_begin_relation(READ_STREAM_MAINTENANCE |
+                                       READ_STREAM_FULL |
+                                       READ_STREAM_SEQUENTIAL |
+                                       READ_STREAM_USE_BATCHING,
+                                       strategy,
+                                       idxrel,
+                                       MAIN_FORKNUM,
+                                       block_range_read_stream_cb,
+                                       &p,
+                                       0);

Unlike other index AM's it uses READ_STREAM_SEQUENTIAL. If there are
some reasons to use it, we should leave comments there.

Good point, thank you. I looked again at the usage of the
READ_STREAM_SEQUENTIAL flag, and I'm leaning toward not using it here.
But I'm not completely sure, so I decided to ask about the flag usage
in the thread [0].

I removed READ_STREAM_SEQUENTIAL. The comment around
READ_STREAM_SEQUENTIAL says it should be used for cases where
sequential access might not be correctly detected. We use
block_range_read_stream_cb here, so the pattern should be clear. PFA
the new version.

Thank you for updating the patch and sharing the performance test
results! The patch looks good to me. I'm going to push it, barring any
objections.

Pushed.

Thank you for reviewing and pushing!

Best regards,
Arseniy Mukhin