Using read stream in autoprewarm

Started by Nazir Bilal Yavuzover 1 year ago44 messages

byavuz81@gmail.com

over 1 year ago

Hi,

I am working on using the read stream in autoprewarm. I observed ~10%
performance gain with this change. The patch is attached.

The downside of the read stream approach is that a new read stream
object needs to be created for each database, relation and fork. I was
wondering if this would cause a regression but it did not (at least
depending on results of my testing). Another downside could be the
code getting complicated.

For the testing,
- I created 50 databases with each of them having 50 tables and the
size of the tables are 520KB.
- patched: 51157 ms
- master: 56769 ms
- I created 5 databases with each of them having 1 table and the size
of the tables are 3GB.
- patched: 32679 ms
- master: 36706 ms

I put debugging message with timing information in
autoprewarm_database_main() function, then run autoprewarm 100 times
(by restarting the server) and cleared the OS cache before each
restart. Also, I ensured that the block number of the buffer returning
from the read stream API is correct. I am not sure if that much
testing is enough for this kind of change.

Any feedback would be appreciated.

--
Regards,
Nazir Bilal Yavuz
Microsoft

Andrey Borodin

amborodin@acm.org

over 1 year ago

In reply to: Nazir Bilal Yavuz (#1)

Re: Using read stream in autoprewarm

On 8 Aug 2024, at 11:32, Nazir Bilal Yavuz <byavuz81@gmail.com> wrote:

Any feedback would be appreciated.

I've took a look into the patch. It seems to me that you add new block numbers to the read stream until you have buffers. So when there are no more buffers you will still have some queued blocks.
Maybe can you change the logic so that number of free buffers must be enough to allocate all blocks in look-ahead distance?

Thanks!

Best regards, Andrey Borodin.

Stepan Neretin

sndcppg@gmail.com

over 1 year ago

In reply to: Andrey Borodin (#2)

Re: Using read stream in autoprewarm

Dear Nazir,

At first A quick look it looks good. I will take a closer look at it
tomorrow. Could you please let me know about the performance tests and
graphics?

Best regards, Stepan Neretin!

Nazir Bilal Yavuz

byavuz81@gmail.com

over 1 year ago

In reply to: Andrey Borodin (#2)

Re: Using read stream in autoprewarm

Hi,

Thanks for looking into this!

On Thu, 31 Oct 2024 at 21:18, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

On 8 Aug 2024, at 11:32, Nazir Bilal Yavuz <byavuz81@gmail.com> wrote:

Any feedback would be appreciated.

I've took a look into the patch. It seems to me that you add new block numbers to the read stream until you have buffers. So when there are no more buffers you will still have some queued blocks.
Maybe can you change the logic so that number of free buffers must be enough to allocate all blocks in look-ahead distance?

I see what you mean. When the have_free_buffer() function returns
false in the callback, there are still queued blocks in the stream
although there are no free buffers in the buffer pool. I think the
best way to solve this is to get the number of free buffers in the
buffer pool by 'BufferStrategyControl.lastFreeBuffer -
BufferStrategyControl.firstFreeBuffer' and then compare it with
'stream->pending_read_nblocks'. When the 'stream->pending_read_nblocks
== number_of_free_buffers_in_buffer_pool', end the stream. The problem
with that is stream->pending_read_nblocks isn't public, also I am not
sure whether 'BufferStrategyControl.lastFreeBuffer -
BufferStrategyControl.firstFreeBuffer' is safe to use.

--
Regards,
Nazir Bilal Yavuz
Microsoft

Nazir Bilal Yavuz

byavuz81@gmail.com

over 1 year ago

In reply to: Stepan Neretin (#3)

Re: Using read stream in autoprewarm

Hi,

Thanks for looking into this!

On Thu, 31 Oct 2024 at 22:02, Stepan Neretin <sndcppg@gmail.com> wrote:

At first A quick look it looks good. I will take a closer look at it tomorrow. Could you please let me know about the performance tests and graphics?

Sorry but I didn't understand what you mean by performance tests and
graphics. Do you want something else than the information in the first
email [1]postgr.es/m/CAN55FZ3n8Gd%2BhajbL%3D5UkGzu_aHGRqnn%2BxktXq2fuds%3D1AOR6Q%40mail.gmail.com?

[1]: postgr.es/m/CAN55FZ3n8Gd%2BhajbL%3D5UkGzu_aHGRqnn%2BxktXq2fuds%3D1AOR6Q%40mail.gmail.com

--
Regards,
Nazir Bilal Yavuz
Microsoft

Andrey Borodin

amborodin@acm.org

over 1 year ago

In reply to: Nazir Bilal Yavuz (#4)

Re: Using read stream in autoprewarm

On 1 Nov 2024, at 12:51, Nazir Bilal Yavuz <byavuz81@gmail.com> wrote:

am not
sure whether 'BufferStrategyControl.lastFreeBuffer -
BufferStrategyControl.firstFreeBuffer' is safe to use.

Ugh... it will work. But it seems to me too dirty hack. There's no scalable way to know size of a free list.
Let's just comment that we might read some more buffers if database does not fit into memory?
Alternatively we can count size of a free list on the start.

Best regards, Andrey Borodin.

Nazir Bilal Yavuz

byavuz81@gmail.com

over 1 year ago

In reply to: Andrey Borodin (#6)

Re: Using read stream in autoprewarm

Hi,

On Fri, 1 Nov 2024 at 21:06, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

On 1 Nov 2024, at 12:51, Nazir Bilal Yavuz <byavuz81@gmail.com> wrote:

am not
sure whether 'BufferStrategyControl.lastFreeBuffer -
BufferStrategyControl.firstFreeBuffer' is safe to use.

Ugh... it will work. But it seems to me too dirty hack. There's no scalable way to know size of a free list.
Let's just comment that we might read some more buffers if database does not fit into memory?
Alternatively we can count size of a free list on the start.

I agree that it is too dirty to hack. There is a minor problem with
the counting size of a free list on the start. There may be other
processes that fill the buffer pool concurrently, so we can still end
up doing unnecessary I/Os. That said, I believe this approach remains
an improvement.

The first patch includes the comment you suggested, and the second
patch implements counting size of a free list on the start.

--
Regards,
Nazir Bilal Yavuz
Microsoft

Matheus Alcantara

mths.dev@pm.me

over 1 year ago

In reply to: Nazir Bilal Yavuz (#1)

Re: Using read stream in autoprewarm

Hi,

Newer reviewer here, trying to understand more about the read stream API.

On Tuesday, November 26th, 2024 at 11:07 AM, Nazir Bilal Yavuz <byavuz81@gmail.com> wrote:

Any feedback would be appreciated.

I've executed the same test of 5 databases with each of them having 1 table of
3GB of size and I've got very similar results.

I've also tested using a single database with 4 tables with ~60GB of size and
the results compared with master was more closer but still an improvement. Note
that I've also increased the default shared_buffers to 7GB to see how it works
with large buffer pools.
- patched: 5.4259 s
- master: 5.53186 s

Not to much to say about the code, I'm currently learning more about the read
stream API and Postgresql hacking itself. Just some minor points and questions
about the patches.

v2-0002-Count-free-buffers-at-the-start-of-the-autoprewar.patch
--- a/src/backend/storage/buffer/freelist.c
+/*
+ * get_number_of_free_buffers -- a lockless way to get the number of free
+ *								 buffers in buffer pool.
+ *
+ * Note that result continuosly changes as free buffers are moved out by other
+ * operations.
+ */
+int
+get_number_of_free_buffers(void)

typo on continuosly -> continuously

v2-0001-Use-read-stream-in-autoprewarm.patch
+	bool	   *rs_have_free_buffer = per_buffer_data;
+
+
+	*rs_have_free_buffer = true;
+

Not sure if I understand why this variable is needed, it seems that it is only
written and never read? Just as comparison, the block_range_read_stream_cb
callback used on pg_prewarm seems to not use the per_buffer_data parameter.

--
Matheus Alcantara
EDB: https://www.enterprisedb.com

Nazir Bilal Yavuz

byavuz81@gmail.com

over 1 year ago

In reply to: Matheus Alcantara (#8)

Re: Using read stream in autoprewarm

Hi,

Thank you for looking into this!

On Wed, 27 Nov 2024 at 16:50, Matheus Alcantara <mths.dev@pm.me> wrote:

I've executed the same test of 5 databases with each of them having 1 table of
3GB of size and I've got very similar results.

I've also tested using a single database with 4 tables with ~60GB of size and
the results compared with master was more closer but still an improvement. Note
that I've also increased the default shared_buffers to 7GB to see how it works
with large buffer pools.
- patched: 5.4259 s
- master: 5.53186 s

Thanks for the testing.

Not to much to say about the code, I'm currently learning more about the read
stream API and Postgresql hacking itself. Just some minor points and questions
about the patches.
v2-0002-Count-free-buffers-at-the-start-of-the-autoprewar.patch
--- a/src/backend/storage/buffer/freelist.c
+/*
+ * get_number_of_free_buffers -- a lockless way to get the number of free
+ *                                                              buffers in buffer pool.
+ *
+ * Note that result continuosly changes as free buffers are moved out by other
+ * operations.
+ */
+int
+get_number_of_free_buffers(void)
typo on continuosly -> continuously

Done.

v2-0001-Use-read-stream-in-autoprewarm.patch
+       bool       *rs_have_free_buffer = per_buffer_data;
+
+
+       *rs_have_free_buffer = true;
+
Not sure if I understand why this variable is needed, it seems that it is only
written and never read? Just as comparison, the block_range_read_stream_cb
callback used on pg_prewarm seems to not use the per_buffer_data parameter.

Actually, it is read in the main loop of the
autoprewarm_database_main() function:

/* There are no free buffers left in shared buffers, break the loop. */
else if (!(*rs_have_free_buffer))
break;

apw_read_stream_next_block() callback function sets
rs_have_free_buffer's value to false when there is no free buffer left
in the shared buffers. And the code above terminates the main loop in
the autoprewarm_database_main() function when it is set to false.

block_range_read_stream_cb() callback is used when the callback only
needs to loop over the block numbers. However, for the autoprewarm
case; the callback function needs to do additional checks so another
callback and the use of this variable are required.

v3 is attached.

--
Regards,
Nazir Bilal Yavuz
Microsoft

#10

Matheus Alcantara

matheusssilv97@gmail.com

over 1 year ago

In reply to: Nazir Bilal Yavuz (#9)

Re: Using read stream in autoprewarm

On Wednesday, November 27th, 2024 at 11:19 AM, Nazir Bilal Yavuz
<byavuz81@gmail.com> wrote:

v2-0001-Use-read-stream-in-autoprewarm.patch
+ bool *rs_have_free_buffer = per_buffer_data;
+
+
+ *rs_have_free_buffer = true;
+
Not sure if I understand why this variable is needed, it seems that

it is only

written and never read? Just as comparison, the

block_range_read_stream_cb

callback used on pg_prewarm seems to not use the per_buffer_data

parameter.

Actually, it is read in the main loop of the
autoprewarm_database_main() function:

/* There are no free buffers left in shared buffers, break the loop. */
else if (!(*rs_have_free_buffer))
break;

apw_read_stream_next_block() callback function sets
rs_have_free_buffer's value to false when there is no free buffer left
in the shared buffers. And the code above terminates the main loop in
the autoprewarm_database_main() function when it is set to false.

block_range_read_stream_cb() callback is used when the callback only
needs to loop over the block numbers. However, for the autoprewarm
case; the callback function needs to do additional checks so another
callback and the use of this variable are required.

Ohh, I see, thanks very much for the explanation.

v3 is attached.

Thanks.

I don't know if there is another way that this patch could be tested?
Looking
forward on other reviews on this.

--
Matheus Alcantara
EDB: https://www.enterprisedb.com

#11

Kirill Reshke

reshkekirill@gmail.com

over 1 year ago

In reply to: Nazir Bilal Yavuz (#9)

Re: Using read stream in autoprewarm

On Wed, 27 Nov 2024 at 19:20, Nazir Bilal Yavuz <byavuz81@gmail.com> wrote:

Hi,

Thank you for looking into this!

On Wed, 27 Nov 2024 at 16:50, Matheus Alcantara <mths.dev@pm.me> wrote:

I've executed the same test of 5 databases with each of them having 1 table of
3GB of size and I've got very similar results.

I've also tested using a single database with 4 tables with ~60GB of size and
the results compared with master was more closer but still an improvement. Note
that I've also increased the default shared_buffers to 7GB to see how it works
with large buffer pools.
- patched: 5.4259 s
- master: 5.53186 s

Thanks for the testing.
Not to much to say about the code, I'm currently learning more about the read
stream API and Postgresql hacking itself. Just some minor points and questions
about the patches.
v2-0002-Count-free-buffers-at-the-start-of-the-autoprewar.patch
--- a/src/backend/storage/buffer/freelist.c
+/*
+ * get_number_of_free_buffers -- a lockless way to get the number of free
+ *                                                              buffers in buffer pool.
+ *
+ * Note that result continuosly changes as free buffers are moved out by other
+ * operations.
+ */
+int
+get_number_of_free_buffers(void)
typo on continuosly -> continuously
Done.
v2-0001-Use-read-stream-in-autoprewarm.patch
+       bool       *rs_have_free_buffer = per_buffer_data;
+
+
+       *rs_have_free_buffer = true;
+
Not sure if I understand why this variable is needed, it seems that it is only
written and never read? Just as comparison, the block_range_read_stream_cb
callback used on pg_prewarm seems to not use the per_buffer_data parameter.
Actually, it is read in the main loop of the
autoprewarm_database_main() function:

/* There are no free buffers left in shared buffers, break the loop. */
else if (!(*rs_have_free_buffer))
break;

apw_read_stream_next_block() callback function sets
rs_have_free_buffer's value to false when there is no free buffer left
in the shared buffers. And the code above terminates the main loop in
the autoprewarm_database_main() function when it is set to false.

block_range_read_stream_cb() callback is used when the callback only
needs to loop over the block numbers. However, for the autoprewarm
case; the callback function needs to do additional checks so another
callback and the use of this variable are required.

v3 is attached.

--
Regards,
Nazir Bilal Yavuz
Microsoft

Hi!

+ old_blk = &(p->block_info[p->pos - 1]);
+ cur_blk = &(p->block_info[p->pos]);

Should we Assert(p->pos > 0 && p->pos < *something*)

Patch tested with no regression.

--
Best regards,
Kirill Reshke

#12

Nazir Bilal Yavuz

byavuz81@gmail.com

over 1 year ago

In reply to: Kirill Reshke (#11)

Re: Using read stream in autoprewarm

Hi,

Thank you for looking into this.

On Fri, 29 Nov 2024 at 06:55, Kirill Reshke <reshkekirill@gmail.com> wrote:

+ old_blk = &(p->block_info[p->pos - 1]);
+ cur_blk = &(p->block_info[p->pos]);
Should we Assert(p->pos > 0 && p->pos < *something*)

I think it is worth adding:

+ Assert(p->pos > 0 && p->pos < p->max_pos);
+
+ old_blk = &(p->block_info[p->pos - 1]);
+ cur_blk = &(p->block_info[p->pos]);

v4 is attached.

--
Regards,
Nazir Bilal Yavuz
Microsoft

#13

Kirill Reshke

reshkekirill@gmail.com

over 1 year ago

In reply to: Nazir Bilal Yavuz (#12)

Re: Using read stream in autoprewarm

On Fri, 29 Nov 2024 at 16:19, Nazir Bilal Yavuz <byavuz81@gmail.com> wrote:

v4 is attached.

Hi!

I feel like we are ready to mark this as RFC, WDYT?

--
Best regards,
Kirill Reshke

#14

Andrey Borodin

amborodin@acm.org

over 1 year ago

In reply to: Kirill Reshke (#13)

Re: Using read stream in autoprewarm

On 2 Dec 2024, at 16:16, Kirill Reshke <reshkekirill@gmail.com> wrote:

I feel like we are ready to mark this as RFC, WDYT?

Best regards, Andrey Borodin.

#15

Nazir Bilal Yavuz

byavuz81@gmail.com

over 1 year ago

In reply to: Andrey Borodin (#14)

Re: Using read stream in autoprewarm

Hi,

On Mon, 2 Dec 2024 at 16:30, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

On 2 Dec 2024, at 16:16, Kirill Reshke <reshkekirill@gmail.com> wrote:

I feel like we are ready to mark this as RFC, WDYT?

+1

Done.

--
Regards,
Nazir Bilal Yavuz
Microsoft

Using read stream in autoprewarm

Attachments:

Attachments:

Attachments:

Attachments:

Attachments: