Using read_stream in index vacuum

Started by Andrey Borodin, over 1 year ago · 36 messages · hackers
#1 Andrey Borodin
amborodin@acm.org

Hi hackers!

At a recent hacking workshop [0] Thomas mentioned that patches using the new API would be welcomed.
So I prototyped streamlining of B-tree vacuum for discussion.
When cleaning an index we must visit every index tuple, so we uphold a special invariant:
after checking the trailing block, it must still be the last block according to a subsequent RelationGetNumberOfBlocks(rel) call.

This invariant does not allow us to completely replace the block loop with streamlining. That's why streamlining is done only for the number of blocks returned by the first RelationGetNumberOfBlocks(rel) call. The tail is processed with regular ReadBufferExtended().

Also, it's worth mentioning that we have to jump to blocks to the left of a recently split page. We also do that with regular ReadBufferExtended(). That's why btvacuumpage() now accepts a buffer, not a block number.

I've benchmarked the patch on my laptop (MacBook Air M3) with following workload:
1. Initialization
create unlogged table x as select random() r from generate_series(1,1e7);
create index on x(r);
create index on x(r);
create index on x(r);
create index on x(r);
create index on x(r);
create index on x(r);
create index on x(r);
vacuum;
2. pgbench with 1 client
insert into x select random() from generate_series(0,10) x;
vacuum x;

On my laptop I see a ~3% increase in pgbench TPS (from ~101 to ~104), but the statistical noise is very significant, bigger than the performance change. Perhaps a less noisy benchmark can be devised.

What do you think? If this approach seems worthwhile, I can adapt the same technique to other AMs.

Best regards, Andrey Borodin.

[0]: https://rhaas.blogspot.com/2024/08/postgresql-hacking-workshop-september.html

Attachments:

0001-Prototype-B-tree-vacuum-streamlineing.patch (application/octet-stream, +54 −13)
#2 Junwang Zhao
zhjwpku@gmail.com
In reply to: Andrey Borodin (#1)
Re: Using read_stream in index vacuum

Hi Andrey,

On Sat, Oct 19, 2024 at 5:39 PM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

Hi hackers!

On a recent hacking workshop [0] Thomas mentioned that patches using new API would be welcomed.
So I prototyped streamlining of B-tree vacuum for a discussion.
When cleaning an index we must visit every index tuple, thus we uphold a special invariant:
After checking a trailing block, it must be last according to subsequent RelationGetNumberOfBlocks(rel) call.

This invariant does not allow us to completely replace block loop with streamlining. That's why streamlining is done only for number of blocks returned by first RelationGetNumberOfBlocks(rel) call. A tail is processed with regular ReadBufferExtended().

I'm wondering why this is the case. ISTM that we can do *p.current_blocknum = scanblkno* and *p.last_exclusive = num_pages* in each loop of the outer for?

+ /* We only streamline number of blocks that are know at the beginning */
know -> known

+ * However, we do not depent on it much, and in future ths
+ * expetation might change.

depent -> depend
ths -> this
expetation -> expectation

Also, it's worth mentioning that we have to jump to the left blocks from a recently split pages. We also do it with regular ReadBufferExtended(). That's why signature btvacuumpage() now accepts a buffer, not a block number.

I've benchmarked the patch on my laptop (MacBook Air M3) with following workload:
1. Initialization
create unlogged table x as select random() r from generate_series(1,1e7);
create index on x(r);
create index on x(r);
create index on x(r);
create index on x(r);
create index on x(r);
create index on x(r);
create index on x(r);
vacuum;
2. pgbench with 1 client
insert into x select random() from generate_series(0,10) x;
vacuum x;

On my laptop I see a ~3% increase in pgbench TPS (from ~101 to ~104), but the statistical noise is very significant, bigger than the performance change. Perhaps a less noisy benchmark can be devised.

What do you think? If this approach seems worthwhile, I can adapt same technology to other AMs.

I think this is a use case where the read stream API fits very well, thanks.

Best regards, Andrey Borodin.

[0] https://rhaas.blogspot.com/2024/08/postgresql-hacking-workshop-september.html

--
Regards
Junwang Zhao

#3 Andrey Borodin
amborodin@acm.org
In reply to: Junwang Zhao (#2)
Re: Using read_stream in index vacuum

On 19 Oct 2024, at 20:41, Junwang Zhao <zhjwpku@gmail.com> wrote:

I'm wondering why this is the case. ISTM that we can do *p.current_blocknum = scanblkno* and *p.last_exclusive = num_pages* in each loop of the outer for?

Thanks for reviewing!
AFAIK we cannot restart a stream once it's finished, so we have a race condition between the main loop and the callback.
Resolving this race condition would make the code much more complex for a relatively small benefit.

I'll address the typos in the next patch version, thank you for looking into this.

Best regards, Andrey Borodin.

#4 Junwang Zhao
zhjwpku@gmail.com
In reply to: Andrey Borodin (#3)
Re: Using read_stream in index vacuum

On Sun, Oct 20, 2024 at 3:19 PM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

On 19 Oct 2024, at 20:41, Junwang Zhao <zhjwpku@gmail.com> wrote:

I'm wondering why this is the case. ISTM that we can do *p.current_blocknum = scanblkno* and *p.last_exclusive = num_pages* in each loop of the outer for?

Thanks for reviewing!
AFAIK we cannot restart stream if it's finished, so we have a race condition of main loop and callback caller.
Resolving this race condition would make code much more complex for a relatively small benefit.

I'm not sure I expressed myself correctly: I didn't mean to restart the stream, I meant we can create a new stream for each iteration of the outer loop. I attached a refactoring patch 0002 based on your 0001; correct me if I'm wrong.

I'll address typos in next patch version, thank you for looking into this.

Best regards, Andrey Borodin.

--
Regards
Junwang Zhao

Attachments:

v2-0001-Prototype-B-tree-vacuum-streamlineing.patch (application/octet-stream, +54 −13)
v2-0002-refactor.patch (application/octet-stream, +16 −41)
#5 Andrey Borodin
amborodin@acm.org
In reply to: Junwang Zhao (#4)
Re: Using read_stream in index vacuum

On 20 Oct 2024, at 15:16, Junwang Zhao <zhjwpku@gmail.com> wrote:

I'm not sure I expressed myself correctly: I didn't mean to restart the stream, I meant we can create a new stream for each iteration of the outer loop. I attached a refactoring patch 0002 based on your 0001; correct me if I'm wrong.

I really like how the code looks with this refactoring. But I think we need some input from Thomas.
Is it OK if we start a handful of streams, one page each, at the end of the vacuum scan? How costly is it to start a new stream?

Best regards, Andrey Borodin.

#6 Melanie Plageman
melanieplageman@gmail.com
In reply to: Andrey Borodin (#5)
Re: Using read_stream in index vacuum

On Sun, Oct 20, 2024 at 10:19 AM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

On 20 Oct 2024, at 15:16, Junwang Zhao <zhjwpku@gmail.com> wrote:

I'm not sure I expressed myself correctly: I didn't mean to restart the stream, I meant we can create a new stream for each iteration of the outer loop. I attached a refactoring patch 0002 based on your 0001; correct me if I'm wrong.

I really like how the code looks with this refactoring. But I think we need some input from Thomas.
Is it OK if we start a handful of streams, one page each, at the end of the vacuum scan? How costly is it to start a new stream?

You shouldn't need either loop in btvacuumscan().

For the inner loop:

for (; scanblkno < num_pages; scanblkno++)
{
    Buffer buf = read_stream_next_buffer(stream, NULL);
    btvacuumpage(&vstate, buf);
    if (info->report_progress)
        pgstat_progress_update_param(PROGRESS_SCAN_BLOCKS_DONE,
                                     scanblkno);
}

you should just be able to do something like

    Buffer buf = read_stream_next_buffer(stream, NULL);

    if (BufferIsValid(buf))
        btvacuumpage(&vstate, buf);

Obviously I am eliding some details and clean-up and such. But your
read stream callback should be responsible for advancing the block
number and thus you shouldn't need to loop like this in
btvacuumscan().

The whole point of the read stream callback provided by the caller is
that the logic to get the next block should be contained there
(read_stream_get_block() is an exception to this).

For the outer loop, I feel like we have options. For example, maybe
the read stream callback can call RelationGetNumberOfBlocks(). I mean
maybe we don't want to have to take a relation extension lock in a
callback.

So, alternatively, we could add some kind of restartable flag to the
read stream API. So, after the callback returns InvalidBlockNumber and
the read_stream_next_buffer() returns NULL, we could call something
like read_stream_update() or read_stream_restart() or something. We
would have updated the BlockRangeReadStreamPrivate->last_exclusive
value. In your case it might not be substantially different
operationally than making new read streams (since you are not
allocating memory for a per-buffer data structure). But, I think the
code would read much better than making new read stream objects in a
loop.

- Melanie

#7 Melanie Plageman
melanieplageman@gmail.com
In reply to: Melanie Plageman (#6)
Re: Using read_stream in index vacuum

On Mon, Oct 21, 2024 at 3:34 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

For the outer loop, I feel like we have options. For example, maybe
the read stream callback can call RelationGetNumberOfBlocks(). I mean
maybe we don't want to have to take a relation extension lock in a
callback.

Also, given this note in btvacuumscan:

* XXX: Now that new pages are locked with RBM_ZERO_AND_LOCK, I don't
* think the use of the extension lock is still required.

Maybe we can stop requiring the extension lock and then I think it
might be okay to call RelationGetNumberOfBlocks() in the callback.
Probably needs more thought though.

- Melanie

#8 Andrey Borodin
amborodin@acm.org
In reply to: Melanie Plageman (#6)
Re: Using read_stream in index vacuum

Melanie, thanks for your comments.

21.10.2024, 22:34, "Melanie Plageman" <melanieplageman@gmail.com>:

The whole point of the read stream callback provided by the caller is
that the logic to get the next block should be there

We must get the number of blocks after examining the last block. But the callback returning EOF might be called before that. With the current API we have to restart.

Removing the extension lock will not change this.

Best regards, Andrey Borodin.

#9 Melanie Plageman
melanieplageman@gmail.com
In reply to: Andrey Borodin (#8)
Re: Using read_stream in index vacuum

On Mon, Oct 21, 2024 at 4:49 PM Andrei Borodin <x4mmm@yandex-team.ru> wrote:

21.10.2024, 22:34, "Melanie Plageman" <melanieplageman@gmail.com>:

The whole point of the read stream callback provided by the caller is
that the logic to get the next block should be there

We must get the number of blocks after examining the last block. But the callback returning EOF might be called before that. With the current API we have to restart.

Removing the extension lock will not change this.

I was suggesting you call RelationGetNumberOfBlocks() once
current_block == last_exclusive in the callback itself.

- Melanie

#10 Andrey Borodin
amborodin@acm.org
In reply to: Melanie Plageman (#9)
Re: Using read_stream in index vacuum

On 22 Oct 2024, at 00:05, Melanie Plageman <melanieplageman@gmail.com> wrote:

I was suggesting you call RelationGetNumberOfBlocks() once
current_block == last_exclusive in the callback itself.

Consider the following sequence of events:

0. We schedule some buffers for IO.
1. We call RelationGetNumberOfBlocks() in the callback when current_block == last_exclusive and return InvalidBlockNumber to signal EOF.
After this:
2. Some page gets split into a new page with number last_exclusive.
3. The buffers from IO are returned and vacuumed, but not the one with number last_exclusive, because it was never scheduled.

Maybe I'm missing something...

Best regards, Andrey Borodin.

#11 Melanie Plageman
melanieplageman@gmail.com
In reply to: Andrey Borodin (#10)
Re: Using read_stream in index vacuum

On Tue, Oct 22, 2024 at 2:30 AM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

On 22 Oct 2024, at 00:05, Melanie Plageman <melanieplageman@gmail.com> wrote:

I was suggesting you call RelationGetNumberOfBlocks() once
current_block == last_exclusive in the callback itself.

Consider the following sequence of events:

0. We schedule some buffers for IO.
1. We call RelationGetNumberOfBlocks() in the callback when current_block == last_exclusive and return InvalidBlockNumber to signal EOF.
After this:
2. Some page gets split into a new page with number last_exclusive.
3. The buffers from IO are returned and vacuumed, but not the one with number last_exclusive, because it was never scheduled.

Ah, right, the callback might return InvalidBlockNumber far before
we've actually read (and vacuumed) the blocks it is specifying.

I ran into something similar when trying to use the read stream API
for index prefetching. I added TIDs from the index to a queue that was
passed to the read stream and available in the callback. When the
queue was empty, I needed to check if there were more index entries
and, if so, add more TIDs to the queue (instead of ending the read
stream). So, I wanted some way for the callback to return
InvalidBlockNumber when there might actually be more blocks to
request. This is a kind of "restarting" behavior.

In that case, though, the reason the callback couldn't get more TIDs
when the queue was empty was because of layering violations and not,
like in the case of btree vacuum, because the index might be in a
different state after vacuuming the "last" block. Perhaps there is a
way to make the read stream restartable, though.

I just can't help wondering if there is a way to refactor the code
(potentially in a more invasive way) to make it more natural to use
the read stream API here. I usually hate when people give me such
unhelpful review feedback, though. So, carry on.

- Melanie

#12 Andrey Borodin
amborodin@acm.org
In reply to: Melanie Plageman (#11)
Re: Using read_stream in index vacuum

On 22 Oct 2024, at 16:42, Melanie Plageman <melanieplageman@gmail.com> wrote:

Ah, right, the callback might return InvalidBlockNumber far before
we've actually read (and vacuumed) the blocks it is specifying.

I've discussed the issue with Thomas on PGConf.eu and he proposed to use stream reset.
PFA v3.

Best regards, Andrey Borodin.

Attachments:

v3-0001-Prototype-B-tree-vacuum-streamlineing.patch (application/octet-stream, +34 −13)
#13 Junwang Zhao
zhjwpku@gmail.com
In reply to: Andrey Borodin (#12)
Re: Using read_stream in index vacuum

On Wed, Oct 23, 2024 at 9:32 PM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

On 22 Oct 2024, at 16:42, Melanie Plageman <melanieplageman@gmail.com> wrote:

Ah, right, the callback might return InvalidBlockNumber far before
we've actually read (and vacuumed) the blocks it is specifying.

I've discussed the issue with Thomas on PGConf.eu and he proposed to use stream reset.
PFA v3.

Yeah, read_stream_reset fits better.

The patch LGTM, thanks.

Best regards, Andrey Borodin.

--
Regards
Junwang Zhao

#14 Melanie Plageman
melanieplageman@gmail.com
In reply to: Andrey Borodin (#12)
Re: Using read_stream in index vacuum

On Wed, Oct 23, 2024 at 9:32 AM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

On 22 Oct 2024, at 16:42, Melanie Plageman <melanieplageman@gmail.com> wrote:

Ah, right, the callback might return InvalidBlockNumber far before
we've actually read (and vacuumed) the blocks it is specifying.

I've discussed the issue with Thomas on PGConf.eu and he proposed to use stream reset.

That approach seems promising.

PFA v3.

Note that you don't check if buf is valid here and break out of the
inner loop when it is invalid.

for (; scanblkno < num_pages; scanblkno++)
{
    Buffer buf = read_stream_next_buffer(stream, NULL);
    btvacuumpage(&vstate, buf);
    ...
}

By doing read_stream_reset() before you first invoke read_stream_next_buffer(), it seems like you invalidate the distance set in read_stream_begin_relation():

    if (flags & READ_STREAM_FULL)
        stream->distance = Min(max_pinned_buffers, io_combine_limit);

becomes

    stream->distance = 1

in read_stream_reset().

I still don't really like the inner loop using scanblkno:

for (; scanblkno < num_pages; scanblkno++)
{
    Buffer buf = read_stream_next_buffer(stream, NULL);
    btvacuumpage(&vstate, buf);
    if (info->report_progress)
        pgstat_progress_update_param(PROGRESS_SCAN_BLOCKS_DONE,
                                     scanblkno);
}

Since you already advance a block number in the callback, I find it
confusing to also use the block number as a loop condition here. I
think it would be clearer to loop on read_stream_next_buffer()
returning a valid buffer (and then move the progress reporting into
btvacuumpage()).

But, I think I would need to study this btree code more to do a more
informed review of the patch.

- Melanie

#15 Andrey Borodin
amborodin@acm.org
In reply to: Melanie Plageman (#14)
Re: Using read_stream in index vacuum

Thanks!

On 23 Oct 2024, at 18:17, Melanie Plageman <melanieplageman@gmail.com> wrote:

Note that you don't check if buf is valid here and break out of the
inner loop when it is invalid.

I've added two asserts to clarify expectations.

By doing read_stream_reset() before you first invoke
read_stream_next_buffer(), seems like you invalidate the distance set
in read_stream_begin_relation()

I've moved the reset so the happy path has a proper distance.

Since you already advance a block number in the callback, I find it
confusing to also use the block number as a loop condition here. I
think it would be clearer to loop on read_stream_next_buffer()
returning a valid buffer (and then move the progress reporting into
btvacuumpage()).

I'll think about how to restructure the flow there...

I've also added a TAP test to check the logic of stream resetting.

Best regards, Andrey Borodin.

Attachments:

v4-0001-Prototype-B-tree-vacuum-streamlineing.patch (application/octet-stream, +34 −13)
v4-0004-Move-reset.patch (application/octet-stream, +5 −6)
v4-0003-Add-asserts-to-address-review-feedback.patch (application/octet-stream, +2 −1)
v4-0002-Add-TAP-tests-that-verifies-B-tree-vacuum.patch (application/octet-stream, +87 −1)
#16 Andrey Borodin
amborodin@acm.org
In reply to: Andrey Borodin (#15)
Re: Using read_stream in index vacuum

On 23 Oct 2024, at 20:57, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

I'll think how to restructure flow there...

OK, I've understood how it should be structured. PFA v5. Sorry for the noise.

Best regards, Andrey Borodin.

Attachments:

v5-0001-Prototype-B-tree-vacuum-streamlineing.patch (application/octet-stream, +130 −19)
#17 Melanie Plageman
melanieplageman@gmail.com
In reply to: Andrey Borodin (#16)
Re: Using read_stream in index vacuum

On Wed, Oct 23, 2024 at 4:29 PM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

On 23 Oct 2024, at 20:57, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

I'll think how to restructure flow there...

OK, I've understood how it should be structured. PFA v5. Sorry for the noise.

I think this would be a bit nicer:

while (BufferIsValid(buf = read_stream_next_buffer(stream, NULL)))
{
    block = btvacuumpage(&vstate, buf);
    if (info->report_progress)
        pgstat_progress_update_param(PROGRESS_SCAN_BLOCKS_DONE, block);
}

Maybe change btvacuumpage() to return the block number to avoid the
extra BufferGetBlockNumber() calls (those add up).

While looking at this, I started to wonder if it isn't incorrect that
we are not calling pgstat_progress_update_param() for the blocks that
we backtrack and read in btvacuumpage() too (on master as well).
btvacuumpage() may actually have scanned more than one block, so...

Unrelated to code review, but btree index vacuum has the same issues
that kept us from committing streaming heap vacuum last release --
interactions between the buffer access strategy ring buffer size and
the larger reads -- one of which is an increase in the number of WAL
syncs and writes required. Thomas mentions it here [1] and here [2] is
the thread where he proposes adding vectored writeback to address some
of these issues.

- Melanie

[1]: /messages/by-id/CA+hUKGKN3oy0bN_3yv8hd78a4+M1tJC9z7mD8+f+yA+GeoFUwQ@mail.gmail.com
[2]: /messages/by-id/CA+hUKGK1in4FiWtisXZ+Jo-cNSbWjmBcPww3w3DBM+whJTABXA@mail.gmail.com

#18 Andrey Borodin
amborodin@acm.org
In reply to: Melanie Plageman (#17)
Re: Using read_stream in index vacuum

I've also added GiST vacuum to the patchset.

On 24 Oct 2024, at 01:04, Melanie Plageman <melanieplageman@gmail.com> wrote:

On Wed, Oct 23, 2024 at 4:29 PM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

On 23 Oct 2024, at 20:57, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

I'll think how to restructure flow there...

OK, I've understood how it should be structured. PFA v5. Sorry for the noise.

I think this would be a bit nicer:

while (BufferIsValid(buf = read_stream_next_buffer(stream, NULL)))
{
    block = btvacuumpage(&vstate, buf);
    if (info->report_progress)
        pgstat_progress_update_param(PROGRESS_SCAN_BLOCKS_DONE, block);
}

OK.

Maybe change btvacuumpage() to return the block number to avoid the
extra BufferGetBlockNumber() calls (those add up).

Makes sense... now we have a non-obvious function prototype, but I think it's worth it.

While looking at this, I started to wonder if it isn't incorrect that
we are not calling pgstat_progress_update_param() for the blocks that
we backtrack and read in btvacuumpage() too (on master as well).
btvacuumpage() may actually have scanned more than one block, so...

It's correct: the user wants to know progress, and backtracks do not count towards progress.
Though it must be a relatively infrequent case.

Unrelated to code review, but btree index vacuum has the same issues
that kept us from committing streaming heap vacuum last release --
interactions between the buffer access strategy ring buffer size and
the larger reads -- one of which is an increase in the number of WAL
syncs and writes required. Thomas mentions it here [1] and here [2] is
the thread where he proposes adding vectored writeback to address some
of these issues.

Do I need to do anything about it?

Thank you!

Best regards, Andrey Borodin.

Attachments:

v6-0001-Prototype-B-tree-vacuum-streamlineing.patch (application/octet-stream, +131 −22)
v6-0002-Use-read_stream-in-GiST-vacuum.patch (application/octet-stream, +33 −14)
#19 Andrey Borodin
amborodin@acm.org
In reply to: Andrey Borodin (#18)
Re: Using read_stream in index vacuum

On 24 Oct 2024, at 10:15, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

I've also added GiST vacuum to the patchset.

I decided to add SP-GiST as well while at pgconf.eu.

Best regards, Andrey Borodin.

Attachments:

v7-0001-Prototype-B-tree-vacuum-streamlineing.patch (application/octet-stream, +131 −22)
v7-0002-Use-read_stream-in-GiST-vacuum.patch (application/octet-stream, +33 −14)
v7-0003-Use-read_stream-in-SP-GiST-vacuum.patch (application/octet-stream, +28 −11)
#20 Rahila Syed
rahilasyed90@gmail.com
In reply to: Andrey Borodin (#19)
Re: Using read_stream in index vacuum

Hi Andrey,

I ran the following test with
v7-0001-Prototype-B-tree-vacuum-streamlineing.patch
to measure the performance improvement.

--Table size of approx 2GB (Fits in RAM)
postgres=# create unlogged table x_big as select i from generate_series(1,6e7) i;
SELECT 60000000
postgres=# create index on x_big(i);
CREATE INDEX
-- Perform updates to create dead tuples.
postgres=# do $$
declare
var int := 0;
begin
for counter in 1 .. 1e7 loop
var := (SELECT floor(random() * (1e7 - 1 + 1) * 1));
UPDATE x_big SET i = i + 5 WHERE i = var;
end loop;
end;
$$;
postgres=# CREATE EXTENSION pg_buffercache;
CREATE EXTENSION
-- Evict Postgres buffer cache for this relation.
postgres=# SELECT DISTINCT pg_buffercache_evict(bufferid)
FROM pg_buffercache
WHERE relfilenode = pg_relation_filenode('x_big');
pg_buffercache_evict
----------------------
t
(1 row)

postgres=# \timing on
Timing is on.
postgres=# vacuum x_big;
VACUUM

The timing does not seem to have improved with the patch.
Timing with the patch: Time: 9525.696 ms (00:09.526)
Timing without the patch: Time: 9468.739 ms (00:09.469)

While writing this email, I realized I evicted buffers for the table
and not the index. I will perform the test again. However,
I would like to know your opinion on whether this looks like
a valid test.
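For the record, evicting the index's buffers instead could look like this (x_big_i_idx is an assumption: the default name PostgreSQL would generate for `create index on x_big(i)`; adjust if the actual index name differs):

```sql
-- Evict shared buffers belonging to the index rather than the heap.
SELECT DISTINCT pg_buffercache_evict(bufferid)
FROM pg_buffercache
WHERE relfilenode = pg_relation_filenode('x_big_i_idx');
```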

Thank you,
Rahila Syed

On Thu, Oct 24, 2024 at 4:45 PM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:


On 24 Oct 2024, at 10:15, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

I've also added GiST vacuum to the patchset.

I decided to add up a SP-GiST while having launched on pgconf.eu.

Best regards, Andrey Borodin.

#21 Andrey Borodin
amborodin@acm.org
In reply to: Rahila Syed (#20)
#22 Kirill Reshke
reshkekirill@gmail.com
In reply to: Andrey Borodin (#21)
#23 Andrey Borodin
amborodin@acm.org
In reply to: Kirill Reshke (#22)
#24 Kirill Reshke
reshkekirill@gmail.com
In reply to: Andrey Borodin (#23)
#25 Melanie Plageman
melanieplageman@gmail.com
In reply to: Kirill Reshke (#24)
#26 Melanie Plageman
melanieplageman@gmail.com
In reply to: Melanie Plageman (#25)
#27 Andrey Borodin
amborodin@acm.org
In reply to: Melanie Plageman (#26)
#28 Melanie Plageman
melanieplageman@gmail.com
In reply to: Melanie Plageman (#25)
#29 Andrey Borodin
amborodin@acm.org
In reply to: Melanie Plageman (#28)
#30 Melanie Plageman
melanieplageman@gmail.com
In reply to: Andrey Borodin (#29)
#31 Andrey Borodin
amborodin@acm.org
In reply to: Melanie Plageman (#30)
#32 Melanie Plageman
melanieplageman@gmail.com
In reply to: Andrey Borodin (#31)
#33 Melanie Plageman
melanieplageman@gmail.com
In reply to: Melanie Plageman (#32)
#34 Andrey Borodin
amborodin@acm.org
In reply to: Melanie Plageman (#33)
#35 Andrey Borodin
amborodin@acm.org
In reply to: Melanie Plageman (#33)
#36 Melanie Plageman
melanieplageman@gmail.com
In reply to: Andrey Borodin (#35)