Use streaming read I/O in BRIN vacuuming

Started by Arseniy Mukhin10 months ago10 messageshackers

arseniy.mukhin.dev@gmail.com

10 months ago

Hi,

B-tree, GiST and SP-GiST take advantage of the read stream API during
vacuuming. BRIN vacuum seems like a perfect candidate for it as well.
During a vacuum it reads the entire relation sequentially, page by
page.

PFA the patch that migrates BRIN vacuum to the read stream API.

Best regards,
Arseniy Mukhin

Andrey Borodin

amborodin@acm.org

10 months ago

In reply to: Arseniy Mukhin (#1)

Re: Use streaming read I/O in BRIN vacuuming

Hi!

On 31 Aug 2025, at 21:17, Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> wrote:

PFA the patch that migrates BRIN vacuum to the read stream API.

The patch is nice and straightforward. Looks good to me.

Some notes that do not seem to me problem of this patch:
1. This comment is copied 7 times already across codebase.
"It is safe to use batchmode as block_range_read_stream_cb"
Maybe we can refactor comments and function names...
2. Somehow brin_vacuum_scan() avoid dance of getting RelationGetNumberOfBlocks() many times to be entirely sure everything is scanned. Unlike other index vacuums, see btvacuumscan() for example.

Best regards, Andrey Borodin.

Arseniy Mukhin

arseniy.mukhin.dev@gmail.com

10 months ago

In reply to: Andrey Borodin (#2)

Re: Use streaming read I/O in BRIN vacuuming

Hi!

On Sun, Aug 31, 2025 at 8:49 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:

Hi!

On 31 Aug 2025, at 21:17, Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> wrote:

PFA the patch that migrates BRIN vacuum to the read stream API.

The patch is nice and straightforward. Looks good to me.

Thank you for the review!

Some notes that do not seem to me problem of this patch:
1. This comment is copied 7 times already across codebase.
"It is safe to use batchmode as block_range_read_stream_cb"
Maybe we can refactor comments and function names...

Yes, I had similar thoughts, but having these comments at callsites
has its own benefits, there is a thread about these comments [0]/messages/by-id/CAE7r3M+P1Qcv7CYxi=Jw_d40L=sVRAxdDti-Vo4X5x0opZ3XVw@mail.gmail.com...

2. Somehow brin_vacuum_scan() avoid dance of getting RelationGetNumberOfBlocks() many times to be entirely sure everything is scanned. Unlike other index vacuums, see btvacuumscan() for example.

If I understand correctly, in other access methods you need to be sure
that you read the relation up to the end, so you don't leave any index
tuples that should be pruned. BRIN doesn't have a prune phase, there
is only a cleanup phase. So it seems it's not a big deal if you miss
several pages that were allocated during the vacuum.

[0]: /messages/by-id/CAE7r3M+P1Qcv7CYxi=Jw_d40L=sVRAxdDti-Vo4X5x0opZ3XVw@mail.gmail.com

Best regards,
Arseniy Mukhin

Masahiko Sawada

sawada.mshk@gmail.com

9 months ago

In reply to: Arseniy Mukhin (#3)

Re: Use streaming read I/O in BRIN vacuuming

On Sun, Aug 31, 2025 at 12:48 PM Arseniy Mukhin
<arseniy.mukhin.dev@gmail.com> wrote:

Hi!

On Sun, Aug 31, 2025 at 8:49 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:

Hi!

On 31 Aug 2025, at 21:17, Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> wrote:

PFA the patch that migrates BRIN vacuum to the read stream API.

The patch is nice and straightforward. Looks good to me.

Thank you for the review!

Some notes that do not seem to me problem of this patch:
1. This comment is copied 7 times already across codebase.
"It is safe to use batchmode as block_range_read_stream_cb"
Maybe we can refactor comments and function names...

Yes, I had similar thoughts, but having these comments at callsites
has its own benefits, there is a thread about these comments [0]...

2. Somehow brin_vacuum_scan() avoid dance of getting RelationGetNumberOfBlocks() many times to be entirely sure everything is scanned. Unlike other index vacuums, see btvacuumscan() for example.

If I understand correctly, in other access methods you need to be sure
that you read the relation up to the end, so you don't leave any index
tuples that should be pruned. BRIN doesn't have a prune phase, there
is only a cleanup phase. So it seems it's not a big deal if you miss
several pages that were allocated during the vacuum.

Thank you for proposing the patch! I've reviewed the patch and have
some comments:

+   stream = read_stream_begin_relation(READ_STREAM_MAINTENANCE |
+                                       READ_STREAM_FULL |
+                                       READ_STREAM_SEQUENTIAL |
+                                       READ_STREAM_USE_BATCHING,
+                                       strategy,
+                                       idxrel,
+                                       MAIN_FORKNUM,
+                                       block_range_read_stream_cb,
+                                       &p,
+                                       0);

Unlike other index AM's it uses READ_STREAM_SEQUENTIAL. If there are
some reasons to use it, we should leave comments there.

---
Have you done any performance testing with this patch? Since BRIN
indexes typically don't grow very large, I'm curious how much benefit
the read_stream provides for BRIN index cleanup.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Nazir Bilal Yavuz

byavuz81@gmail.com

9 months ago

In reply to: Masahiko Sawada (#4)

Re: Use streaming read I/O in BRIN vacuuming

Hi,

On Thu, 9 Oct 2025 at 02:03, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Thank you for proposing the patch! I've reviewed the patch and have
some comments:

+   stream = read_stream_begin_relation(READ_STREAM_MAINTENANCE |
+                                       READ_STREAM_FULL |
+                                       READ_STREAM_SEQUENTIAL |
+                                       READ_STREAM_USE_BATCHING,
+                                       strategy,
+                                       idxrel,
+                                       MAIN_FORKNUM,
+                                       block_range_read_stream_cb,
+                                       &p,
+                                       0);

Unlike other index AM's it uses READ_STREAM_SEQUENTIAL. If there are
some reasons to use it, we should leave comments there.

Andres has some comments about retiring the READ_STREAM_SEQUENTIAL
flag as it is automatically detected and can potentially hurt
performance [1]/messages/by-id/go7c2sudqg4pp7dsabsak4ajugti4f3tyqoweja5ihcjiw65dc@yh6wi4zqys23.

[1]: /messages/by-id/go7c2sudqg4pp7dsabsak4ajugti4f3tyqoweja5ihcjiw65dc@yh6wi4zqys23

--
Regards,
Nazir Bilal Yavuz
Microsoft

Arseniy Mukhin

arseniy.mukhin.dev@gmail.com

9 months ago

In reply to: Nazir Bilal Yavuz (#5)

Re: Use streaming read I/O in BRIN vacuuming

Hi,

On Thu, Oct 9, 2025 at 2:03 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sun, Aug 31, 2025 at 12:48 PM Arseniy Mukhin
<arseniy.mukhin.dev@gmail.com> wrote:

Hi!

On Sun, Aug 31, 2025 at 8:49 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:

Hi!

On 31 Aug 2025, at 21:17, Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> wrote:

PFA the patch that migrates BRIN vacuum to the read stream API.

The patch is nice and straightforward. Looks good to me.

Thank you for the review!

Some notes that do not seem to me problem of this patch:
1. This comment is copied 7 times already across codebase.
"It is safe to use batchmode as block_range_read_stream_cb"
Maybe we can refactor comments and function names...

Yes, I had similar thoughts, but having these comments at callsites
has its own benefits, there is a thread about these comments [0]...

2. Somehow brin_vacuum_scan() avoid dance of getting RelationGetNumberOfBlocks() many times to be entirely sure everything is scanned. Unlike other index vacuums, see btvacuumscan() for example.

If I understand correctly, in other access methods you need to be sure
that you read the relation up to the end, so you don't leave any index
tuples that should be pruned. BRIN doesn't have a prune phase, there
is only a cleanup phase. So it seems it's not a big deal if you miss
several pages that were allocated during the vacuum.

Thank you for proposing the patch! I've reviewed the patch and have
some comments:

Thank you for the review!

+   stream = read_stream_begin_relation(READ_STREAM_MAINTENANCE |
+                                       READ_STREAM_FULL |
+                                       READ_STREAM_SEQUENTIAL |
+                                       READ_STREAM_USE_BATCHING,
+                                       strategy,
+                                       idxrel,
+                                       MAIN_FORKNUM,
+                                       block_range_read_stream_cb,
+                                       &p,
+                                       0);

Unlike other index AM's it uses READ_STREAM_SEQUENTIAL. If there are
some reasons to use it, we should leave comments there.

Good point, thank you. I looked again at the usage of the
READ_STREAM_SEQUENTIAL flag, and I'm leaning toward not using it here.
But I'm not completely sure, so I decided to ask about the flag usage
in the thread [0]/messages/by-id/CAE7r3MJ5NHb1BMo1oCNWmfrG=Ytx-GKX44YNdA21FuQQQeq_Qg@mail.gmail.com.

---
Have you done any performance testing with this patch? Since BRIN
indexes typically don't grow very large, I'm curious how much benefit
the read_stream provides for BRIN index cleanup.

Yeah, BRIN is known as a compact index, so these changes should have
less impact, but I thought it should still be useful, just on a
smaller scale. I didn't run any measurements, as the changes are
similar to those made to other index AMs, where this was considered an
improvement. But after your question, I decided to try running some
benchmarks.

------------------------------------------
Performance testing
------------------------------------------

Setup & data (fill_data.sql)

shared_buffers = 1Gb
Table size: ~970Mb
Index size: ~30Mb (BRIN index artificially was made quite big, but it
lets us see patch impact better).

Test cases

1) Cold data - restart db, drop page cache, table seqscan, run vacuum.
This way we read index data from the disk (restart_with_cache_drop.sh)
2) Page cache - restart db, table seqscan, run vacuum. This way we
read index data from the page cache (restart_with_wo_cache_drop.sh)
3) Shared buffers - run vacuum with data in shared buffers (use query
with bitmap scan to warm cache) (vacuum_cached.sh)

Every io_method was used with every test case. I also tried to remove
READ_STREAM_SEQUENTIAL and measured it too (it's called
patched_wo_seq).

What were measured:

1) Duration of brin_vacuum_scan (without FreeSpaceMapVacuum(idxrel))
2) Avg read rate from vacuum report. Index has 4260 pages, and the
vacuum report says 4300 pages were read, so it looks like almost all
read rate is from brin_vacuum_scan().

Results

--- Cold data ---
Method     | Version         |   Duration (ms) |    Throughput (MB/s)
----------------------------------------------------------------------
io_uring   | master             |      14.6 ± 0.8 |       2068.1 ± 127.6
              | patched              |       9.9 ± 1.3 |       2947.3 ± 363.0
             | patched_wo_seq  |      10.5 ± 1.5 |    2817.5 ± 359.7
sync       | master                |      16.1 ± 1.9 |    1921.6 ± 211.6
              | patched               |      15.4 ± 1.6 |    1985.9 ± 182.3
              | patched_wo_seq  |      13.9 ± 1.2 |    2155.7 ± 180.4
worker     | master               |      16.4 ± 1.6 |     1850.0 ± 168.6
              | patched               |       9.2 ± 0.5 |       3077.3 ± 154.3
              | patched_wo_seq  |       9.0 ± 0.5 |     3133.3 ± 158.1

--- Page cache ---

Method | Version | Duration (ms) | Throughput (MB/s)
----------------------------------------------------------------------
io_uring | master | 10.4 ± 1.7 | 2971.3 ± 469.6
| patched | 7.3 ± 1.7 | 4152.7 ± 910.6
| patched_wo_seq | 7.0 ± 1.4 | 4208.0 ± 742.2
sync | master | 10.4 ± 2.3 | 3037.9 ± 655.6
| patched | 9.2 ± 1.5 | 3319.8 ± 646.7
| patched_wo_seq | 9.1 ± 1.6 | 3384.7 ± 674.9
worker | master | 10.2 ± 1.7 | 3013.4 ± 490.0
| patched | 3.0 ± 0.2 | 7878.6 ± 540.3
| patched_wo_seq | 3.0 ± 0.3 | 8127.3 ± 1093.7

--- Shared buffers ---

Method | Version | Duration (ms) | Throughput (MB/s)
----------------------------------------------------------------------
io_uring | master | 3.1 ± 0.3 | 0.0 ± 0.0
| patched | 3.2 ± 0.5 | 0.0 ± 0.0
| patched_wo_seq | 3.3 ± 0.4 | 0.0 ± 0.0
sync | master | 3.1 ± 0.4 | 0.0 ± 0.0
| patched | 3.3 ± 0.5 | 0.0 ± 0.0
| patched_wo_seq | 3.1 ± 0.5 | 0.0 ± 0.0
worker | master | 3.2 ± 0.4 | 0.0 ± 0.0
| patched | 3.4 ± 0.3 | 0.0 ± 0.0
| patched_wo_seq | 3.6 ± 0.3 | 0.0 ± 0.0

Looks like we have a win for 'cold data' and 'page cache' cases with
worker and io_uring, and some modest improvement for sync.
READ_STREAM_SEQUENTIAL doesn't seem to change anything. I don't have
much benchmarking experience, so hope I didn't make any bad
mistakes.I'm new to benchmarking, so I hope I haven't made any serious
mistakes.

Thank you!

[0]: /messages/by-id/CAE7r3MJ5NHb1BMo1oCNWmfrG=Ytx-GKX44YNdA21FuQQQeq_Qg@mail.gmail.com

Best regards,
Arseniy Mukhin

Arseniy Mukhin

arseniy.mukhin.dev@gmail.com

8 months ago

In reply to: Arseniy Mukhin (#6)

Re: Use streaming read I/O in BRIN vacuuming

Hi,

On Mon, Oct 13, 2025 at 5:54 PM Arseniy Mukhin
<arseniy.mukhin.dev@gmail.com> wrote:

Hi,

On Thu, Oct 9, 2025 at 2:03 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sun, Aug 31, 2025 at 12:48 PM Arseniy Mukhin
<arseniy.mukhin.dev@gmail.com> wrote:

Hi!

On Sun, Aug 31, 2025 at 8:49 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:

Hi!

On 31 Aug 2025, at 21:17, Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> wrote:

PFA the patch that migrates BRIN vacuum to the read stream API.

The patch is nice and straightforward. Looks good to me.

Thank you for the review!

Some notes that do not seem to me problem of this patch:
1. This comment is copied 7 times already across codebase.
"It is safe to use batchmode as block_range_read_stream_cb"
Maybe we can refactor comments and function names...

Yes, I had similar thoughts, but having these comments at callsites
has its own benefits, there is a thread about these comments [0]...

2. Somehow brin_vacuum_scan() avoid dance of getting RelationGetNumberOfBlocks() many times to be entirely sure everything is scanned. Unlike other index vacuums, see btvacuumscan() for example.

If I understand correctly, in other access methods you need to be sure
that you read the relation up to the end, so you don't leave any index
tuples that should be pruned. BRIN doesn't have a prune phase, there
is only a cleanup phase. So it seems it's not a big deal if you miss
several pages that were allocated during the vacuum.

Thank you for proposing the patch! I've reviewed the patch and have
some comments:

Thank you for the review!
+   stream = read_stream_begin_relation(READ_STREAM_MAINTENANCE |
+                                       READ_STREAM_FULL |
+                                       READ_STREAM_SEQUENTIAL |
+                                       READ_STREAM_USE_BATCHING,
+                                       strategy,
+                                       idxrel,
+                                       MAIN_FORKNUM,
+                                       block_range_read_stream_cb,
+                                       &p,
+                                       0);
Unlike other index AM's it uses READ_STREAM_SEQUENTIAL. If there are
some reasons to use it, we should leave comments there.
Good point, thank you. I looked again at the usage of the
READ_STREAM_SEQUENTIAL flag, and I'm leaning toward not using it here.
But I'm not completely sure, so I decided to ask about the flag usage
in the thread [0].

I removed READ_STREAM_SEQUENTIAL. The comment around
READ_STREAM_SEQUENTIAL says it should be used for cases where
sequential access might not be correctly detected. We use
block_range_read_stream_cb here, so the pattern should be clear. PFA
the new version.

Best regards,
Arseniy Mukhin

Masahiko Sawada

sawada.mshk@gmail.com

8 months ago

In reply to: Arseniy Mukhin (#7)

Re: Use streaming read I/O in BRIN vacuuming

On Wed, Nov 12, 2025 at 11:57 AM Arseniy Mukhin
<arseniy.mukhin.dev@gmail.com> wrote:

Hi,

On Mon, Oct 13, 2025 at 5:54 PM Arseniy Mukhin
<arseniy.mukhin.dev@gmail.com> wrote:
Hi,

On Thu, Oct 9, 2025 at 2:03 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sun, Aug 31, 2025 at 12:48 PM Arseniy Mukhin
<arseniy.mukhin.dev@gmail.com> wrote:

Hi!

On Sun, Aug 31, 2025 at 8:49 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:

Hi!

On 31 Aug 2025, at 21:17, Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> wrote:

PFA the patch that migrates BRIN vacuum to the read stream API.

The patch is nice and straightforward. Looks good to me.

Thank you for the review!

Some notes that do not seem to me problem of this patch:
1. This comment is copied 7 times already across codebase.
"It is safe to use batchmode as block_range_read_stream_cb"
Maybe we can refactor comments and function names...

Yes, I had similar thoughts, but having these comments at callsites
has its own benefits, there is a thread about these comments [0]...

2. Somehow brin_vacuum_scan() avoid dance of getting RelationGetNumberOfBlocks() many times to be entirely sure everything is scanned. Unlike other index vacuums, see btvacuumscan() for example.

If I understand correctly, in other access methods you need to be sure
that you read the relation up to the end, so you don't leave any index
tuples that should be pruned. BRIN doesn't have a prune phase, there
is only a cleanup phase. So it seems it's not a big deal if you miss
several pages that were allocated during the vacuum.

Thank you for proposing the patch! I've reviewed the patch and have
some comments:

Thank you for the review!
+   stream = read_stream_begin_relation(READ_STREAM_MAINTENANCE |
+                                       READ_STREAM_FULL |
+                                       READ_STREAM_SEQUENTIAL |
+                                       READ_STREAM_USE_BATCHING,
+                                       strategy,
+                                       idxrel,
+                                       MAIN_FORKNUM,
+                                       block_range_read_stream_cb,
+                                       &p,
+                                       0);
Unlike other index AM's it uses READ_STREAM_SEQUENTIAL. If there are
some reasons to use it, we should leave comments there.
Good point, thank you. I looked again at the usage of the
READ_STREAM_SEQUENTIAL flag, and I'm leaning toward not using it here.
But I'm not completely sure, so I decided to ask about the flag usage
in the thread [0].
I removed READ_STREAM_SEQUENTIAL. The comment around
READ_STREAM_SEQUENTIAL says it should be used for cases where
sequential access might not be correctly detected. We use
block_range_read_stream_cb here, so the pattern should be clear. PFA
the new version.

Thank you for updating the patch and sharing the performance test
results! The patch looks good to me. I'm going to push it, barring any
objections.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Masahiko Sawada

sawada.mshk@gmail.com

7 months ago

In reply to: Masahiko Sawada (#8)

Re: Use streaming read I/O in BRIN vacuuming

On Thu, Nov 13, 2025 at 4:47 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Nov 12, 2025 at 11:57 AM Arseniy Mukhin
<arseniy.mukhin.dev@gmail.com> wrote:
Hi,

On Mon, Oct 13, 2025 at 5:54 PM Arseniy Mukhin
<arseniy.mukhin.dev@gmail.com> wrote:
Hi,

On Thu, Oct 9, 2025 at 2:03 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sun, Aug 31, 2025 at 12:48 PM Arseniy Mukhin
<arseniy.mukhin.dev@gmail.com> wrote:

Hi!

On Sun, Aug 31, 2025 at 8:49 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:

Hi!

On 31 Aug 2025, at 21:17, Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> wrote:

PFA the patch that migrates BRIN vacuum to the read stream API.

The patch is nice and straightforward. Looks good to me.

Thank you for the review!

Some notes that do not seem to me problem of this patch:
1. This comment is copied 7 times already across codebase.
"It is safe to use batchmode as block_range_read_stream_cb"
Maybe we can refactor comments and function names...

Yes, I had similar thoughts, but having these comments at callsites
has its own benefits, there is a thread about these comments [0]...

2. Somehow brin_vacuum_scan() avoid dance of getting RelationGetNumberOfBlocks() many times to be entirely sure everything is scanned. Unlike other index vacuums, see btvacuumscan() for example.

If I understand correctly, in other access methods you need to be sure
that you read the relation up to the end, so you don't leave any index
tuples that should be pruned. BRIN doesn't have a prune phase, there
is only a cleanup phase. So it seems it's not a big deal if you miss
several pages that were allocated during the vacuum.

Thank you for proposing the patch! I've reviewed the patch and have
some comments:

Thank you for the review!
+   stream = read_stream_begin_relation(READ_STREAM_MAINTENANCE |
+                                       READ_STREAM_FULL |
+                                       READ_STREAM_SEQUENTIAL |
+                                       READ_STREAM_USE_BATCHING,
+                                       strategy,
+                                       idxrel,
+                                       MAIN_FORKNUM,
+                                       block_range_read_stream_cb,
+                                       &p,
+                                       0);
Unlike other index AM's it uses READ_STREAM_SEQUENTIAL. If there are
some reasons to use it, we should leave comments there.
Good point, thank you. I looked again at the usage of the
READ_STREAM_SEQUENTIAL flag, and I'm leaning toward not using it here.
But I'm not completely sure, so I decided to ask about the flag usage
in the thread [0].
I removed READ_STREAM_SEQUENTIAL. The comment around
READ_STREAM_SEQUENTIAL says it should be used for cases where
sequential access might not be correctly detected. We use
block_range_read_stream_cb here, so the pattern should be clear. PFA
the new version.
Thank you for updating the patch and sharing the performance test
results! The patch looks good to me. I'm going to push it, barring any
objections.

Pushed.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#10

Arseniy Mukhin

arseniy.mukhin.dev@gmail.com

7 months ago

In reply to: Masahiko Sawada (#9)

Re: Use streaming read I/O in BRIN vacuuming

On Tue, Nov 18, 2025 at 3:20 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Nov 13, 2025 at 4:47 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Wed, Nov 12, 2025 at 11:57 AM Arseniy Mukhin
<arseniy.mukhin.dev@gmail.com> wrote:
Hi,

On Mon, Oct 13, 2025 at 5:54 PM Arseniy Mukhin
<arseniy.mukhin.dev@gmail.com> wrote:
Hi,

On Thu, Oct 9, 2025 at 2:03 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sun, Aug 31, 2025 at 12:48 PM Arseniy Mukhin
<arseniy.mukhin.dev@gmail.com> wrote:

Hi!

On Sun, Aug 31, 2025 at 8:49 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:

Hi!

On 31 Aug 2025, at 21:17, Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> wrote:

PFA the patch that migrates BRIN vacuum to the read stream API.

The patch is nice and straightforward. Looks good to me.

Thank you for the review!

Some notes that do not seem to me problem of this patch:
1. This comment is copied 7 times already across codebase.
"It is safe to use batchmode as block_range_read_stream_cb"
Maybe we can refactor comments and function names...

Yes, I had similar thoughts, but having these comments at callsites
has its own benefits, there is a thread about these comments [0]...

2. Somehow brin_vacuum_scan() avoid dance of getting RelationGetNumberOfBlocks() many times to be entirely sure everything is scanned. Unlike other index vacuums, see btvacuumscan() for example.

If I understand correctly, in other access methods you need to be sure
that you read the relation up to the end, so you don't leave any index
tuples that should be pruned. BRIN doesn't have a prune phase, there
is only a cleanup phase. So it seems it's not a big deal if you miss
several pages that were allocated during the vacuum.

Thank you for proposing the patch! I've reviewed the patch and have
some comments:

Thank you for the review!
+   stream = read_stream_begin_relation(READ_STREAM_MAINTENANCE |
+                                       READ_STREAM_FULL |
+                                       READ_STREAM_SEQUENTIAL |
+                                       READ_STREAM_USE_BATCHING,
+                                       strategy,
+                                       idxrel,
+                                       MAIN_FORKNUM,
+                                       block_range_read_stream_cb,
+                                       &p,
+                                       0);
Unlike other index AM's it uses READ_STREAM_SEQUENTIAL. If there are
some reasons to use it, we should leave comments there.
Good point, thank you. I looked again at the usage of the
READ_STREAM_SEQUENTIAL flag, and I'm leaning toward not using it here.
But I'm not completely sure, so I decided to ask about the flag usage
in the thread [0].
I removed READ_STREAM_SEQUENTIAL. The comment around
READ_STREAM_SEQUENTIAL says it should be used for cases where
sequential access might not be correctly detected. We use
block_range_read_stream_cb here, so the pattern should be clear. PFA
the new version.
Thank you for updating the patch and sharing the performance test
results! The patch looks good to me. I'm going to push it, barring any
objections.
Pushed.

Thank you for reviewing and pushing!

Best regards,
Arseniy Mukhin

Use streaming read I/O in BRIN vacuuming

Attachments:

Attachments:

Attachments: