Streaming read-ready sequential scan code

Started by Melanie Plageman, almost 2 years ago, 52 messages
#1 Melanie Plageman
melanieplageman@gmail.com
4 attachment(s)

Hi,

Last year, David and I worked on a round of refactoring for
heapgettup() and heapgettup_pagemode() [1]. Now that the streaming
read API has been proposed [2], there is a bit more refactoring that
can be done on master to prepare sequential scan to support streaming
reads.

Patches 0001 and 0002 in the attached patchset do this new round of
refactoring. 0003 is the remainder of the streaming read API that is
not yet in master. 0004 is the sequential scan streaming read user.
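
To give a concrete picture of what a streaming read user looks like, here
is a minimal sketch of the consumer pattern, modeled on the pg_prewarm
conversion included in 0003 below. The names my_stream_state, my_next_block,
and my_scan_range are mine, for illustration only -- they are not part of
the patchset.

/*
 * Sketch of a streaming read consumer (assumes storage/streaming_read.h
 * and storage/bufmgr.h; illustrative only).
 */
struct my_stream_state
{
	BlockNumber next_block;
	BlockNumber last_block;
};

/* Callback: tell the streaming read machinery which block to read next. */
static BlockNumber
my_next_block(PgStreamingRead *pgsr, void *pgsr_private, void *per_buffer_data)
{
	struct my_stream_state *state = pgsr_private;

	if (state->next_block <= state->last_block)
		return state->next_block++;

	return InvalidBlockNumber;	/* no more blocks; the stream drains */
}

static void
my_scan_range(Relation rel, BlockNumber first, BlockNumber last)
{
	struct my_stream_state state;
	PgStreamingRead *pgsr;
	Buffer		buf;

	state.next_block = first;
	state.last_block = last;

	pgsr = pg_streaming_read_buffer_alloc(PGSR_FLAG_DEFAULT, &state,
										  0,	/* no per-buffer data */
										  NULL, /* default strategy */
										  BMR_REL(rel), MAIN_FORKNUM,
										  my_next_block);

	/* Buffers come back pinned, having been read ahead where possible. */
	while ((buf = pg_streaming_read_buffer_get_next(pgsr, NULL)) != InvalidBuffer)
	{
		/* ... use the page in buf ... */
		ReleaseBuffer(buf);
	}

	pg_streaming_read_free(pgsr);
}

The relevant property for sequential scans is the consumer loop: the caller
keeps asking for buffers until it gets InvalidBuffer, which is what motivates
the control flow change described next.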

The primary change needed to drop in streaming read support is that
heapgettup() and heapgettup_pagemode() must treat running out of valid
buffers, rather than running out of valid BlockNumbers, as the signal
that the relation has been entirely processed. Naturally, streaming
reads prefetch ahead of the block currently being processed by the
scan, so all blocks will have been requested long before all blocks
have been processed.
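
Concretely, paraphrasing the 0002 hunks attached below, the loops in
heapgettup() and heapgettup_pagemode() go from iterating over block numbers
to iterating until no valid buffer comes back:

	/* before (with 0001 applied): driven by block numbers */
	while (block != InvalidBlockNumber)
	{
		heapfetchbuf((TableScanDesc) scan, block);
		/* ... return qualifying tuples from the page ... */
		block = heapgettup_advance_block(scan, block, dir);
	}

	/* after (with 0002): driven by buffers; heapfetchbuf() picks the block */
	while (true)
	{
		heapfetchbuf((TableScanDesc) scan, dir);
		if (!BufferIsValid(scan->rs_cbuf))
			break;				/* relation entirely processed */
		/* ... return qualifying tuples from the page ... */
	}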

To change this, I split up heapgetpage() into two functions -- one
responsible for getting blocks into buffers and the other for
processing a page (pruning, checking tuple visibility, etc). As a
consequence, I had to change the other caller of heapgetpage() (sample
scans). Since I was doing this anyway, I made a few changes there. It
is arguable that those changes could be split up differently between
0001 and 0004. However, I wanted 0004 to be *only* the sequential scan
streaming read user code.
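
As a rough sketch of the resulting division of labor (illustrative
declarations only; the real definitions are in the heapam.c hunks below):

	/*
	 * Get the scan's next block into a pinned buffer in scan->rs_cbuf;
	 * after 0002 it also chooses that block, leaving rs_cbuf invalid when
	 * the relation is exhausted.
	 */
	static inline void heapfetchbuf(TableScanDesc sscan, ScanDirection dir);

	/*
	 * Process the page in scan->rs_cbuf: prune it and, for pagemode scans,
	 * record the visible tuple offsets (rs_ntuples et al.).
	 */
	static void heapbuildvis(TableScanDesc sscan);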

There is an outstanding question about where to allocate the
PgStreamingRead object for sequential scans (see TODO in 0004).
However, I thought I would keep this thread focused on 0001 and 0002.

Though logically the performance with 0001 and 0002 should be the same
as master (no new non-inline function calls, no additional looping),
I've done a bit of profiling anyway. I created a large multi-GB table,
read it all into shared buffers (disabling the large sequential scan
bulkread optimization), and did a sequential SELECT count(*) from the
table. From the profiles below, you'll notice that master and the
patch are basically the same. Actual percentages vary from run to run.
Execution time is the same.

patch
15.49% postgres postgres [.] ExecInterpExpr
11.03% postgres postgres [.] heapgettup_pagemode
10.85% postgres postgres [.] ExecStoreBufferHeapTuple
9.14% postgres postgres [.] heap_getnextslot
8.39% postgres postgres [.] heapbuildvis
6.47% postgres postgres [.] SeqNext

master
14.16% postgres postgres [.] ExecInterpExpr
11.54% postgres postgres [.] heapgettup_pagemode
10.63% postgres postgres [.] ExecStoreBufferHeapTuple
10.22% postgres postgres [.] heap_getnextslot
8.53% postgres postgres [.] heapgetpage
5.35% postgres postgres [.] SeqNext

- Melanie

[1]: /messages/by-id/CAAKRu_YSOnhKsDyFcqJsKtBSrd32DP-jjXmv7hL0BPD-z0TGXQ@mail.gmail.com
[2]: /messages/by-id/CA+hUKGJkOiOCa+mag4BF+zHo7qo=o9CFheB8=g6uT5TUm2gkvA@mail.gmail.com

Attachments:

v1-0002-Replace-blocks-with-buffers-in-heapgettup-control.patch (text/x-patch)
From a3176b7a8fce30097d8452bbb6d89a5d87a0b764 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 29 Jan 2024 11:50:01 -0500
Subject: [PATCH v1 2/4] Replace blocks with buffers in heapgettup control flow

Future commits will introduce the streaming read API and the sequential
scan streaming read API user. Streaming read API users implement a
callback which returns the next block to read. Sequential scans
previously looped through the blocks in the relation, synchronously
reading in a block and then processing it. An InvalidBlockNumber
returned by heapgettup_advance_block() meant that the relation was
exhausted and all blocks had been processed.

The streaming read API may exhaust the blocks in a relation (having read
all of them into buffers) before they have all been processed by the
sequential scan. As such, the sequential scan should continue processing
blocks until heapfetchbuf() returns InvalidBuffer.

Note that this commit does not implement the streaming read API user. It
simply restructures heapgettup() and heapgettup_pagemode() to use
buffers instead of blocks for control flow.

Not all sequential scans will support streaming reads. As such, this
code will remain for compatibility even after sequential scans support
streaming reads.
---
 src/backend/access/heap/heapam.c | 75 ++++++++++++++------------------
 1 file changed, 33 insertions(+), 42 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 608d43c4efd..9e3e6d8b52b 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -87,6 +87,9 @@ static Bitmapset *HeapDetermineColumnsInfo(Relation relation,
 static bool heap_acquire_tuplock(Relation relation, ItemPointer tid,
 								 LockTupleMode mode, LockWaitPolicy wait_policy,
 								 bool *have_tuple_lock);
+static inline BlockNumber heapgettup_advance_block(HeapScanDesc scan,
+												   BlockNumber block, ScanDirection dir);
+static inline BlockNumber heapgettup_initial_block(HeapScanDesc scan, ScanDirection dir);
 static void compute_new_xmax_infomask(TransactionId xmax, uint16 old_infomask,
 									  uint16 old_infomask2, TransactionId add_to_xmax,
 									  LockTupleMode mode, bool is_update,
@@ -467,12 +470,10 @@ heapbuildvis(TableScanDesc sscan)
  * returns with that pinned buffer saved in the scan descriptor.
  */
 static inline void
-heapfetchbuf(TableScanDesc sscan, BlockNumber block)
+heapfetchbuf(TableScanDesc sscan, ScanDirection dir)
 {
 	HeapScanDesc scan = (HeapScanDesc) sscan;
 
-	Assert(block < scan->rs_nblocks);
-
 	/* release previous scan buffer, if any */
 	if (BufferIsValid(scan->rs_cbuf))
 	{
@@ -487,10 +488,19 @@ heapfetchbuf(TableScanDesc sscan, BlockNumber block)
 	 */
 	CHECK_FOR_INTERRUPTS();
 
-	/* read page using selected strategy */
-	scan->rs_cbuf = ReadBufferExtended(scan->rs_base.rs_rd, MAIN_FORKNUM, block,
-									   RBM_NORMAL, scan->rs_strategy);
-	scan->rs_cblock = block;
+	if (!scan->rs_inited)
+	{
+		scan->rs_cblock = heapgettup_initial_block(scan, dir);
+		Assert(scan->rs_cblock != InvalidBlockNumber || !BufferIsValid(scan->rs_cbuf));
+		scan->rs_inited = true;
+	}
+	else
+		scan->rs_cblock = heapgettup_advance_block(scan, scan->rs_cblock, dir);
+
+	/* read block if valid */
+	if (BlockNumberIsValid(scan->rs_cblock))
+		scan->rs_cbuf = ReadBufferExtended(scan->rs_base.rs_rd, MAIN_FORKNUM,
+										   scan->rs_cblock, RBM_NORMAL, scan->rs_strategy);
 }
 
 /*
@@ -500,7 +510,7 @@ heapfetchbuf(TableScanDesc sscan, BlockNumber block)
  * occur with empty tables and in parallel scans when parallel workers get all
  * of the pages before we can get a chance to get our first page.
  */
-static BlockNumber
+BlockNumber
 heapgettup_initial_block(HeapScanDesc scan, ScanDirection dir)
 {
 	Assert(!scan->rs_inited);
@@ -640,7 +650,7 @@ heapgettup_continue_page(HeapScanDesc scan, ScanDirection dir, int *linesleft,
  * This also adjusts rs_numblocks when a limit has been imposed by
  * heap_setscanlimits().
  */
-static inline BlockNumber
+BlockNumber
 heapgettup_advance_block(HeapScanDesc scan, BlockNumber block, ScanDirection dir)
 {
 	if (ScanDirectionIsForward(dir))
@@ -738,23 +748,13 @@ heapgettup(HeapScanDesc scan,
 		   ScanKey key)
 {
 	HeapTuple	tuple = &(scan->rs_ctup);
-	BlockNumber block;
 	Page		page;
 	OffsetNumber lineoff;
 	int			linesleft;
 
-	if (unlikely(!scan->rs_inited))
-	{
-		block = heapgettup_initial_block(scan, dir);
-		/* ensure rs_cbuf is invalid when we get InvalidBlockNumber */
-		Assert(block != InvalidBlockNumber || !BufferIsValid(scan->rs_cbuf));
-		scan->rs_inited = true;
-	}
-	else
+	if (scan->rs_inited)
 	{
 		/* continue from previously returned page/tuple */
-		block = scan->rs_cblock;
-
 		LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
 		page = heapgettup_continue_page(scan, dir, &linesleft, &lineoff);
 		goto continue_page;
@@ -764,9 +764,12 @@ heapgettup(HeapScanDesc scan,
 	 * advance the scan until we find a qualifying tuple or run out of stuff
 	 * to scan
 	 */
-	while (block != InvalidBlockNumber)
+	while (true)
 	{
-		heapfetchbuf((TableScanDesc) scan, block);
+		heapfetchbuf((TableScanDesc) scan, dir);
+		if (!BufferIsValid(scan->rs_cbuf))
+			break;
+		Assert(BufferGetBlockNumber(scan->rs_cbuf) == scan->rs_cblock);
 		LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
 		page = heapgettup_start_page(scan, dir, &linesleft, &lineoff);
 continue_page:
@@ -788,7 +791,7 @@ continue_page:
 
 			tuple->t_data = (HeapTupleHeader) PageGetItem(page, lpp);
 			tuple->t_len = ItemIdGetLength(lpp);
-			ItemPointerSet(&(tuple->t_self), block, lineoff);
+			ItemPointerSet(&(tuple->t_self), scan->rs_cblock, lineoff);
 
 			visible = HeapTupleSatisfiesVisibility(tuple,
 												   scan->rs_base.rs_snapshot,
@@ -818,9 +821,6 @@ continue_page:
 		 * it's time to move to the next.
 		 */
 		LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
-
-		/* get the BlockNumber to scan next */
-		block = heapgettup_advance_block(scan, block, dir);
 	}
 
 	/* end of scan */
@@ -853,22 +853,13 @@ heapgettup_pagemode(HeapScanDesc scan,
 					ScanKey key)
 {
 	HeapTuple	tuple = &(scan->rs_ctup);
-	BlockNumber block;
 	Page		page;
 	int			lineindex;
 	int			linesleft;
 
-	if (unlikely(!scan->rs_inited))
-	{
-		block = heapgettup_initial_block(scan, dir);
-		/* ensure rs_cbuf is invalid when we get InvalidBlockNumber */
-		Assert(block != InvalidBlockNumber || !BufferIsValid(scan->rs_cbuf));
-		scan->rs_inited = true;
-	}
-	else
+	if (scan->rs_inited)
 	{
 		/* continue from previously returned page/tuple */
-		block = scan->rs_cblock;	/* current page */
 		page = BufferGetPage(scan->rs_cbuf);
 
 		lineindex = scan->rs_cindex + dir;
@@ -885,9 +876,12 @@ heapgettup_pagemode(HeapScanDesc scan,
 	 * advance the scan until we find a qualifying tuple or run out of stuff
 	 * to scan
 	 */
-	while (block != InvalidBlockNumber)
+	while (true)
 	{
-		heapfetchbuf((TableScanDesc) scan, block);
+		heapfetchbuf((TableScanDesc) scan, dir);
+		if (!BufferIsValid(scan->rs_cbuf))
+			break;
+		Assert(BufferGetBlockNumber(scan->rs_cbuf) == scan->rs_cblock);
 		heapbuildvis((TableScanDesc) scan);
 		page = BufferGetPage(scan->rs_cbuf);
 		linesleft = scan->rs_ntuples;
@@ -907,7 +901,7 @@ continue_page:
 
 			tuple->t_data = (HeapTupleHeader) PageGetItem(page, lpp);
 			tuple->t_len = ItemIdGetLength(lpp);
-			ItemPointerSet(&(tuple->t_self), block, lineoff);
+			ItemPointerSet(&(tuple->t_self), scan->rs_cblock, lineoff);
 
 			/* skip any tuples that don't match the scan key */
 			if (key != NULL &&
@@ -918,9 +912,6 @@ continue_page:
 			scan->rs_cindex = lineindex;
 			return;
 		}
-
-		/* get the BlockNumber to scan next */
-		block = heapgettup_advance_block(scan, block, dir);
 	}
 
 	/* end of scan */
-- 
2.37.2

v1-0003-Streaming-Read-API.patch (text/x-patch)
From ee81c2f74d09b8883da322adf13fc8a2186a7094 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Sat, 22 Jul 2023 17:31:54 +1200
Subject: [PATCH v1 3/4] Streaming Read API

---
 contrib/pg_prewarm/pg_prewarm.c          |  40 +-
 src/backend/access/transam/xlogutils.c   |   2 +-
 src/backend/postmaster/bgwriter.c        |   8 +-
 src/backend/postmaster/checkpointer.c    |  15 +-
 src/backend/storage/Makefile             |   2 +-
 src/backend/storage/aio/Makefile         |  14 +
 src/backend/storage/aio/meson.build      |   5 +
 src/backend/storage/aio/streaming_read.c | 435 ++++++++++++++++++
 src/backend/storage/buffer/bufmgr.c      | 560 +++++++++++++++--------
 src/backend/storage/buffer/localbuf.c    |  14 +-
 src/backend/storage/meson.build          |   1 +
 src/backend/storage/smgr/smgr.c          |  49 +-
 src/include/storage/bufmgr.h             |  22 +
 src/include/storage/smgr.h               |   4 +-
 src/include/storage/streaming_read.h     |  45 ++
 src/include/utils/rel.h                  |   6 -
 src/tools/pgindent/typedefs.list         |   2 +
 17 files changed, 986 insertions(+), 238 deletions(-)
 create mode 100644 src/backend/storage/aio/Makefile
 create mode 100644 src/backend/storage/aio/meson.build
 create mode 100644 src/backend/storage/aio/streaming_read.c
 create mode 100644 src/include/storage/streaming_read.h

diff --git a/contrib/pg_prewarm/pg_prewarm.c b/contrib/pg_prewarm/pg_prewarm.c
index 8541e4d6e46..9617bf130bd 100644
--- a/contrib/pg_prewarm/pg_prewarm.c
+++ b/contrib/pg_prewarm/pg_prewarm.c
@@ -20,6 +20,7 @@
 #include "miscadmin.h"
 #include "storage/bufmgr.h"
 #include "storage/smgr.h"
+#include "storage/streaming_read.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
@@ -38,6 +39,25 @@ typedef enum
 
 static PGIOAlignedBlock blockbuffer;
 
+struct pg_prewarm_streaming_read_private
+{
+	BlockNumber blocknum;
+	int64		last_block;
+};
+
+static BlockNumber
+pg_prewarm_streaming_read_next(PgStreamingRead *pgsr,
+							   void *pgsr_private,
+							   void *per_buffer_data)
+{
+	struct pg_prewarm_streaming_read_private *p = pgsr_private;
+
+	if (p->blocknum <= p->last_block)
+		return p->blocknum++;
+
+	return InvalidBlockNumber;
+}
+
 /*
  * pg_prewarm(regclass, mode text, fork text,
  *			  first_block int8, last_block int8)
@@ -183,18 +203,36 @@ pg_prewarm(PG_FUNCTION_ARGS)
 	}
 	else if (ptype == PREWARM_BUFFER)
 	{
+		struct pg_prewarm_streaming_read_private p;
+		PgStreamingRead *pgsr;
+
 		/*
 		 * In buffer mode, we actually pull the data into shared_buffers.
 		 */
+
+		/* Set up the private state for our streaming buffer read callback. */
+		p.blocknum = first_block;
+		p.last_block = last_block;
+
+		pgsr = pg_streaming_read_buffer_alloc(PGSR_FLAG_DEFAULT,
+											  &p,
+											  0,
+											  NULL,
+											  BMR_REL(rel),
+											  forkNumber,
+											  pg_prewarm_streaming_read_next);
+
 		for (block = first_block; block <= last_block; ++block)
 		{
 			Buffer		buf;
 
 			CHECK_FOR_INTERRUPTS();
-			buf = ReadBufferExtended(rel, forkNumber, block, RBM_NORMAL, NULL);
+			buf = pg_streaming_read_buffer_get_next(pgsr, NULL);
 			ReleaseBuffer(buf);
 			++blocks_done;
 		}
+		Assert(pg_streaming_read_buffer_get_next(pgsr, NULL) == InvalidBuffer);
+		pg_streaming_read_free(pgsr);
 	}
 
 	/* Close relation, release lock. */
diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index aa8667abd10..8775b5789be 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -657,7 +657,7 @@ XLogDropDatabase(Oid dbid)
 	 * This is unnecessarily heavy-handed, as it will close SMgrRelation
 	 * objects for other databases as well. DROP DATABASE occurs seldom enough
 	 * that it's not worth introducing a variant of smgrclose for just this
-	 * purpose. XXX: Or should we rather leave the smgr entries dangling?
+	 * purpose.
 	 */
 	smgrcloseall();
 
diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
index d7d6cc0cd7b..13e5376619e 100644
--- a/src/backend/postmaster/bgwriter.c
+++ b/src/backend/postmaster/bgwriter.c
@@ -246,10 +246,12 @@ BackgroundWriterMain(void)
 		if (FirstCallSinceLastCheckpoint())
 		{
 			/*
-			 * After any checkpoint, close all smgr files.  This is so we
-			 * won't hang onto smgr references to deleted files indefinitely.
+			 * After any checkpoint, free all smgr objects.  Otherwise we
+			 * would never do so for dropped relations, as the bgwriter does
+			 * not process shared invalidation messages or call
+			 * AtEOXact_SMgr().
 			 */
-			smgrcloseall();
+			smgrdestroyall();
 		}
 
 		/*
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 5e949fc885b..5d843b61426 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -469,10 +469,12 @@ CheckpointerMain(void)
 				ckpt_performed = CreateRestartPoint(flags);
 
 			/*
-			 * After any checkpoint, close all smgr files.  This is so we
-			 * won't hang onto smgr references to deleted files indefinitely.
+			 * After any checkpoint, free all smgr objects.  Otherwise we
+			 * would never do so for dropped relations, as the checkpointer
+			 * does not process shared invalidation messages or call
+			 * AtEOXact_SMgr().
 			 */
-			smgrcloseall();
+			smgrdestroyall();
 
 			/*
 			 * Indicate checkpoint completion to any waiting backends.
@@ -958,11 +960,8 @@ RequestCheckpoint(int flags)
 		 */
 		CreateCheckPoint(flags | CHECKPOINT_IMMEDIATE);
 
-		/*
-		 * After any checkpoint, close all smgr files.  This is so we won't
-		 * hang onto smgr references to deleted files indefinitely.
-		 */
-		smgrcloseall();
+		/* Free all smgr objects, as CheckpointerMain() normally would. */
+		smgrdestroyall();
 
 		return;
 	}
diff --git a/src/backend/storage/Makefile b/src/backend/storage/Makefile
index 8376cdfca20..eec03f6f2b4 100644
--- a/src/backend/storage/Makefile
+++ b/src/backend/storage/Makefile
@@ -8,6 +8,6 @@ subdir = src/backend/storage
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
-SUBDIRS     = buffer file freespace ipc large_object lmgr page smgr sync
+SUBDIRS     = aio buffer file freespace ipc large_object lmgr page smgr sync
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/storage/aio/Makefile b/src/backend/storage/aio/Makefile
new file mode 100644
index 00000000000..bcab44c802f
--- /dev/null
+++ b/src/backend/storage/aio/Makefile
@@ -0,0 +1,14 @@
+#
+# Makefile for storage/aio
+#
+# src/backend/storage/aio/Makefile
+#
+
+subdir = src/backend/storage/aio
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = \
+	streaming_read.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/storage/aio/meson.build b/src/backend/storage/aio/meson.build
new file mode 100644
index 00000000000..39aef2a84a2
--- /dev/null
+++ b/src/backend/storage/aio/meson.build
@@ -0,0 +1,5 @@
+# Copyright (c) 2024, PostgreSQL Global Development Group
+
+backend_sources += files(
+  'streaming_read.c',
+)
diff --git a/src/backend/storage/aio/streaming_read.c b/src/backend/storage/aio/streaming_read.c
new file mode 100644
index 00000000000..19605090fea
--- /dev/null
+++ b/src/backend/storage/aio/streaming_read.c
@@ -0,0 +1,435 @@
+#include "postgres.h"
+
+#include "storage/streaming_read.h"
+#include "utils/rel.h"
+
+/*
+ * Element type for PgStreamingRead's circular array of block ranges.
+ *
+ * For hits, need_complete is false and there is just one block per
+ * range, already pinned and ready for use.
+ *
+ * For misses, need_complete is true and buffers[] holds a range of
+ * blocks that are contiguous in storage (though the buffers may not be
+ * contiguous in memory), so we can complete them with a single call to
+ * CompleteReadBuffers().
+ */
+typedef struct PgStreamingReadRange
+{
+	bool		advice_issued;
+	bool		need_complete;
+	BlockNumber blocknum;
+	int			nblocks;
+	int			per_buffer_data_index[MAX_BUFFERS_PER_TRANSFER];
+	Buffer		buffers[MAX_BUFFERS_PER_TRANSFER];
+} PgStreamingReadRange;
+
+struct PgStreamingRead
+{
+	int			max_ios;
+	int			ios_in_progress;
+	int			ios_in_progress_trigger;
+	int			max_pinned_buffers;
+	int			pinned_buffers;
+	int			pinned_buffers_trigger;
+	int			next_tail_buffer;
+	bool		finished;
+	void	   *pgsr_private;
+	PgStreamingReadBufferCB callback;
+	BufferAccessStrategy strategy;
+	BufferManagerRelation bmr;
+	ForkNumber	forknum;
+
+	bool		advice_enabled;
+
+	/* Next expected block, for detecting sequential access. */
+	BlockNumber seq_blocknum;
+
+	/* Space for optional per-buffer private data. */
+	size_t		per_buffer_data_size;
+	void	   *per_buffer_data;
+	int			per_buffer_data_next;
+
+	/* Circular buffer of ranges. */
+	int			size;
+	int			head;
+	int			tail;
+	PgStreamingReadRange ranges[FLEXIBLE_ARRAY_MEMBER];
+};
+
+static PgStreamingRead *
+pg_streaming_read_buffer_alloc_internal(int flags,
+										void *pgsr_private,
+										size_t per_buffer_data_size,
+										BufferAccessStrategy strategy)
+{
+	PgStreamingRead *pgsr;
+	int			size;
+	int			max_ios;
+	uint32		max_pinned_buffers;
+
+
+	/*
+	 * Decide how many assumed I/Os we will allow to run concurrently.  That
+	 * is, advice to the kernel to tell it that we will soon read.  This
+	 * number also affects how far we look ahead for opportunities to start
+	 * more I/Os.
+	 */
+	if (flags & PGSR_FLAG_MAINTENANCE)
+		max_ios = maintenance_io_concurrency;
+	else
+		max_ios = effective_io_concurrency;
+
+	/*
+	 * The desired level of I/O concurrency controls how far we are willing
+	 * to look ahead.  We also clamp it to at least MAX_BUFFERS_PER_TRANSFER
+	 * so that we have a chance to build up a full-sized read, even when
+	 * max_ios is zero.
+	 */
+	max_pinned_buffers = Max(max_ios * 4, MAX_BUFFERS_PER_TRANSFER);
+
+	/*
+	 * The *_io_concurrency GUCs might be 0.  We want to allow at least
+	 * one, to keep our gating logic simple.
+	 */
+	max_ios = Max(max_ios, 1);
+
+	/*
+	 * Don't allow this backend to pin too many buffers.  For now we'll apply
+	 * the limit for the shared buffer pool and the local buffer pool, without
+	 * worrying which it is.
+	 */
+	LimitAdditionalPins(&max_pinned_buffers);
+	LimitAdditionalLocalPins(&max_pinned_buffers);
+	Assert(max_pinned_buffers > 0);
+
+	/*
+	 * pgsr->ranges is a circular buffer.  When it is empty, head == tail.
+	 * When it is full, there is an empty element between head and tail.  Head
+	 * can also be empty (nblocks == 0), therefore we need two extra elements
+	 * for non-occupied ranges, on top of max_pinned_buffers to allow for the
+	 * maximum possible number of occupied ranges of the smallest possible
+	 * size of one.
+	 */
+	size = max_pinned_buffers + 2;
+
+	pgsr = (PgStreamingRead *)
+		palloc0(offsetof(PgStreamingRead, ranges) +
+				sizeof(pgsr->ranges[0]) * size);
+
+	pgsr->max_ios = max_ios;
+	pgsr->per_buffer_data_size = per_buffer_data_size;
+	pgsr->max_pinned_buffers = max_pinned_buffers;
+	pgsr->pgsr_private = pgsr_private;
+	pgsr->strategy = strategy;
+	pgsr->size = size;
+
+#ifdef USE_PREFETCH
+
+	/*
+	 * This system supports prefetching advice.  As long as direct I/O isn't
+	 * enabled, and the caller hasn't promised sequential access, we can use
+	 * it.
+	 */
+	if ((io_direct_flags & IO_DIRECT_DATA) == 0 &&
+		(flags & PGSR_FLAG_SEQUENTIAL) == 0)
+		pgsr->advice_enabled = true;
+#endif
+
+	/*
+	 * We want to avoid creating ranges that are smaller than they could be
+	 * just because we hit max_pinned_buffers.  We only look ahead when the
+	 * number of pinned buffers falls below this trigger number, or put
+	 * another way, we stop looking ahead when we wouldn't be able to build a
+	 * "full sized" range.
+	 */
+	pgsr->pinned_buffers_trigger =
+		Max(1, (int) max_pinned_buffers - MAX_BUFFERS_PER_TRANSFER);
+
+	/* Space for the callback to store extra data along with each block. */
+	if (per_buffer_data_size)
+		pgsr->per_buffer_data = palloc(per_buffer_data_size * max_pinned_buffers);
+
+	return pgsr;
+}
+
+/*
+ * Create a new streaming read object that can be used to perform the
+ * equivalent of a series of ReadBuffer() calls for one fork of one relation.
+ * Internally, it generates larger vectored reads where possible by looking
+ * ahead.
+ */
+PgStreamingRead *
+pg_streaming_read_buffer_alloc(int flags,
+							   void *pgsr_private,
+							   size_t per_buffer_data_size,
+							   BufferAccessStrategy strategy,
+							   BufferManagerRelation bmr,
+							   ForkNumber forknum,
+							   PgStreamingReadBufferCB next_block_cb)
+{
+	PgStreamingRead *result;
+
+	result = pg_streaming_read_buffer_alloc_internal(flags,
+													 pgsr_private,
+													 per_buffer_data_size,
+													 strategy);
+	result->callback = next_block_cb;
+	result->bmr = bmr;
+	result->forknum = forknum;
+
+	return result;
+}
+
+/*
+ * Start building a new range.  This is called after the previous one
+ * reached maximum size, or the callback's next block can't be merged with it.
+ *
+ * Since the previous head range has now reached its full potential size, this
+ * is also a good time to issue 'prefetch' advice, because we know we'll
+ * soon be reading.  In future, we could start an actual I/O here.
+ */
+static PgStreamingReadRange *
+pg_streaming_read_new_range(PgStreamingRead *pgsr)
+{
+	PgStreamingReadRange *head_range;
+
+	head_range = &pgsr->ranges[pgsr->head];
+	Assert(head_range->nblocks > 0);
+
+	/*
+	 * If a call to CompleteReadBuffers() will be needed, we can issue
+	 * advice to the kernel to get the read started.  We suppress it if the
+	 * access pattern appears to be completely sequential, though, because on
+	 * some systems that interferes with the kernel's own sequential read
+	 * ahead heuristics and hurts performance.
+	 */
+	if (pgsr->advice_enabled)
+	{
+		BlockNumber blocknum = head_range->blocknum;
+		int			nblocks = head_range->nblocks;
+
+		if (head_range->need_complete && blocknum != pgsr->seq_blocknum)
+		{
+			SMgrRelation smgr =
+				pgsr->bmr.smgr ? pgsr->bmr.smgr :
+				RelationGetSmgr(pgsr->bmr.rel);
+
+			Assert(!head_range->advice_issued);
+
+			smgrprefetch(smgr, pgsr->forknum, blocknum, nblocks);
+
+			/*
+			 * Count this as an I/O that is concurrently in progress, though
+			 * we don't really know if the kernel generates a physical I/O.
+			 */
+			head_range->advice_issued = true;
+			pgsr->ios_in_progress++;
+		}
+
+		/* Remember the block after this range, for sequence detection. */
+		pgsr->seq_blocknum = blocknum + nblocks;
+	}
+
+	/* Create a new head range.  There must be space. */
+	Assert(pgsr->size > pgsr->max_pinned_buffers);
+	Assert((pgsr->head + 1) % pgsr->size != pgsr->tail);
+	if (++pgsr->head == pgsr->size)
+		pgsr->head = 0;
+	head_range = &pgsr->ranges[pgsr->head];
+	head_range->nblocks = 0;
+
+	return head_range;
+}
+
+static void
+pg_streaming_read_look_ahead(PgStreamingRead *pgsr)
+{
+	/*
+	 * If we're finished or can't start more I/O, then don't look ahead.
+	 */
+	if (pgsr->finished || pgsr->ios_in_progress == pgsr->max_ios)
+		return;
+
+	/*
+	 * We'll also wait until the number of pinned buffers falls below our
+	 * trigger level, so that we have the chance to create a full range.
+	 */
+	if (pgsr->pinned_buffers >= pgsr->pinned_buffers_trigger)
+		return;
+
+	do
+	{
+		BufferManagerRelation bmr;
+		ForkNumber	forknum;
+		BlockNumber blocknum;
+		Buffer		buffer;
+		bool		found;
+		bool		need_complete;
+		PgStreamingReadRange *head_range;
+		void	   *per_buffer_data;
+
+		/* Do we have a full-sized range? */
+		head_range = &pgsr->ranges[pgsr->head];
+		if (head_range->nblocks == lengthof(head_range->buffers))
+		{
+			Assert(head_range->need_complete);
+			head_range = pg_streaming_read_new_range(pgsr);
+
+			/*
+			 * Give up now if I/O is saturated, or we wouldn't be able to form
+			 * another full range after this due to the pin limit.
+			 */
+			if (pgsr->pinned_buffers >= pgsr->pinned_buffers_trigger ||
+				pgsr->ios_in_progress == pgsr->max_ios)
+				break;
+		}
+
+		per_buffer_data = (char *) pgsr->per_buffer_data +
+			pgsr->per_buffer_data_size * pgsr->per_buffer_data_next;
+
+		/* Find out which block the callback wants to read next. */
+		blocknum = pgsr->callback(pgsr, pgsr->pgsr_private, per_buffer_data);
+		if (blocknum == InvalidBlockNumber)
+		{
+			pgsr->finished = true;
+			break;
+		}
+		bmr = pgsr->bmr;
+		forknum = pgsr->forknum;
+
+		Assert(pgsr->pinned_buffers < pgsr->max_pinned_buffers);
+
+		buffer = PrepareReadBuffer(bmr,
+								   forknum,
+								   blocknum,
+								   pgsr->strategy,
+								   &found);
+		pgsr->pinned_buffers++;
+
+		need_complete = !found;
+
+		/* Is there a head range that we can't extend? */
+		head_range = &pgsr->ranges[pgsr->head];
+		if (head_range->nblocks > 0 &&
+			(!need_complete ||
+			 !head_range->need_complete ||
+			 head_range->blocknum + head_range->nblocks != blocknum))
+		{
+			/* Yes, time to start building a new one. */
+			head_range = pg_streaming_read_new_range(pgsr);
+			Assert(head_range->nblocks == 0);
+		}
+
+		if (head_range->nblocks == 0)
+		{
+			/* Initialize a new range beginning at this block. */
+			head_range->blocknum = blocknum;
+			head_range->need_complete = need_complete;
+			head_range->advice_issued = false;
+		}
+		else
+		{
+			/* We can extend an existing range by one block. */
+			Assert(head_range->blocknum + head_range->nblocks == blocknum);
+			Assert(head_range->need_complete);
+		}
+
+		head_range->per_buffer_data_index[head_range->nblocks] = pgsr->per_buffer_data_next++;
+		head_range->buffers[head_range->nblocks] = buffer;
+		head_range->nblocks++;
+
+		if (pgsr->per_buffer_data_next == pgsr->max_pinned_buffers)
+			pgsr->per_buffer_data_next = 0;
+
+	} while (pgsr->pinned_buffers < pgsr->max_pinned_buffers &&
+			 pgsr->ios_in_progress < pgsr->max_ios);
+
+	if (pgsr->ranges[pgsr->head].nblocks > 0)
+		pg_streaming_read_new_range(pgsr);
+}
+
+Buffer
+pg_streaming_read_buffer_get_next(PgStreamingRead *pgsr, void **per_buffer_data)
+{
+	pg_streaming_read_look_ahead(pgsr);
+
+	/* See if we have one buffer to return. */
+	while (pgsr->tail != pgsr->head)
+	{
+		PgStreamingReadRange *tail_range;
+
+		tail_range = &pgsr->ranges[pgsr->tail];
+
+		/*
+		 * Do we need to perform an I/O before returning the buffers from this
+		 * range?
+		 */
+		if (tail_range->need_complete)
+		{
+			CompleteReadBuffers(pgsr->bmr,
+								tail_range->buffers,
+								pgsr->forknum,
+								tail_range->blocknum,
+								tail_range->nblocks,
+								false,
+								pgsr->strategy);
+			tail_range->need_complete = false;
+
+			/*
+			 * We don't really know if the kernel generated a physical I/O
+			 * when we issued advice, let alone when it finished, but it has
+			 * certainly finished after a read call returns.
+			 */
+			if (tail_range->advice_issued)
+				pgsr->ios_in_progress--;
+		}
+
+		/* Are there more buffers available in this range? */
+		if (pgsr->next_tail_buffer < tail_range->nblocks)
+		{
+			int			buffer_index;
+			Buffer		buffer;
+
+			buffer_index = pgsr->next_tail_buffer++;
+			buffer = tail_range->buffers[buffer_index];
+
+			Assert(BufferIsValid(buffer));
+
+			/* We are giving away ownership of this pinned buffer. */
+			Assert(pgsr->pinned_buffers > 0);
+			pgsr->pinned_buffers--;
+
+			if (per_buffer_data)
+				*per_buffer_data = (char *) pgsr->per_buffer_data +
+					tail_range->per_buffer_data_index[buffer_index] *
+					pgsr->per_buffer_data_size;
+
+			return buffer;
+		}
+
+		/* Advance tail to next range, if there is one. */
+		if (++pgsr->tail == pgsr->size)
+			pgsr->tail = 0;
+		pgsr->next_tail_buffer = 0;
+	}
+
+	Assert(pgsr->pinned_buffers == 0);
+
+	return InvalidBuffer;
+}
+
+void
+pg_streaming_read_free(PgStreamingRead *pgsr)
+{
+	Buffer		buffer;
+
+	/* Stop looking ahead, and unpin anything that wasn't consumed. */
+	pgsr->finished = true;
+	while ((buffer = pg_streaming_read_buffer_get_next(pgsr, NULL)) != InvalidBuffer)
+		ReleaseBuffer(buffer);
+
+	if (pgsr->per_buffer_data)
+		pfree(pgsr->per_buffer_data);
+	pfree(pgsr);
+}
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 7d601bef6dd..2157a97b973 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -472,7 +472,7 @@ ForgetPrivateRefCountEntry(PrivateRefCountEntry *ref)
 )
 
 
-static Buffer ReadBuffer_common(SMgrRelation smgr, char relpersistence,
+static Buffer ReadBuffer_common(BufferManagerRelation bmr,
 								ForkNumber forkNum, BlockNumber blockNum,
 								ReadBufferMode mode, BufferAccessStrategy strategy,
 								bool *hit);
@@ -501,7 +501,7 @@ static uint32 WaitBufHdrUnlocked(BufferDesc *buf);
 static int	SyncOneBuffer(int buf_id, bool skip_recently_used,
 						  WritebackContext *wb_context);
 static void WaitIO(BufferDesc *buf);
-static bool StartBufferIO(BufferDesc *buf, bool forInput);
+static bool StartBufferIO(BufferDesc *buf, bool forInput, bool nowait);
 static void TerminateBufferIO(BufferDesc *buf, bool clear_dirty,
 							  uint32 set_flag_bits, bool forget_owner);
 static void AbortBufferIO(Buffer buffer);
@@ -795,15 +795,9 @@ ReadBufferExtended(Relation reln, ForkNumber forkNum, BlockNumber blockNum,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("cannot access temporary tables of other sessions")));
 
-	/*
-	 * Read the buffer, and update pgstat counters to reflect a cache hit or
-	 * miss.
-	 */
-	pgstat_count_buffer_read(reln);
-	buf = ReadBuffer_common(RelationGetSmgr(reln), reln->rd_rel->relpersistence,
+	buf = ReadBuffer_common(BMR_REL(reln),
 							forkNum, blockNum, mode, strategy, &hit);
-	if (hit)
-		pgstat_count_buffer_hit(reln);
+
 	return buf;
 }
 
@@ -827,8 +821,9 @@ ReadBufferWithoutRelcache(RelFileLocator rlocator, ForkNumber forkNum,
 
 	SMgrRelation smgr = smgropen(rlocator, InvalidBackendId);
 
-	return ReadBuffer_common(smgr, permanent ? RELPERSISTENCE_PERMANENT :
-							 RELPERSISTENCE_UNLOGGED, forkNum, blockNum,
+	return ReadBuffer_common(BMR_SMGR(smgr, permanent ? RELPERSISTENCE_PERMANENT :
+									  RELPERSISTENCE_UNLOGGED),
+							 forkNum, blockNum,
 							 mode, strategy, &hit);
 }
 
@@ -1002,7 +997,7 @@ ExtendBufferedRelTo(BufferManagerRelation bmr,
 		bool		hit;
 
 		Assert(extended_by == 0);
-		buffer = ReadBuffer_common(bmr.smgr, bmr.relpersistence,
+		buffer = ReadBuffer_common(bmr,
 								   fork, extend_to - 1, mode, strategy,
 								   &hit);
 	}
@@ -1016,18 +1011,11 @@ ExtendBufferedRelTo(BufferManagerRelation bmr,
  * *hit is set to true if the request was satisfied from shared buffer cache.
  */
 static Buffer
-ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
+ReadBuffer_common(BufferManagerRelation bmr, ForkNumber forkNum,
 				  BlockNumber blockNum, ReadBufferMode mode,
 				  BufferAccessStrategy strategy, bool *hit)
 {
-	BufferDesc *bufHdr;
-	Block		bufBlock;
-	bool		found;
-	IOContext	io_context;
-	IOObject	io_object;
-	bool		isLocalBuf = SmgrIsTemp(smgr);
-
-	*hit = false;
+	Buffer		buffer;
 
 	/*
 	 * Backward compatibility path, most code should use ExtendBufferedRel()
@@ -1046,175 +1034,339 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 		if (mode == RBM_ZERO_AND_LOCK || mode == RBM_ZERO_AND_CLEANUP_LOCK)
 			flags |= EB_LOCK_FIRST;
 
-		return ExtendBufferedRel(BMR_SMGR(smgr, relpersistence),
-								 forkNum, strategy, flags);
+		*hit = false;
+
+		return ExtendBufferedRel(bmr, forkNum, strategy, flags);
 	}
 
-	TRACE_POSTGRESQL_BUFFER_READ_START(forkNum, blockNum,
-									   smgr->smgr_rlocator.locator.spcOid,
-									   smgr->smgr_rlocator.locator.dbOid,
-									   smgr->smgr_rlocator.locator.relNumber,
-									   smgr->smgr_rlocator.backend);
+	buffer = PrepareReadBuffer(bmr,
+							   forkNum,
+							   blockNum,
+							   strategy,
+							   hit);
+
+	/* At this point we do NOT hold any locks. */
 
+	if (mode == RBM_ZERO_AND_CLEANUP_LOCK || mode == RBM_ZERO_AND_LOCK)
+	{
+		/* if we just want zeroes and a lock, we're done */
+		ZeroBuffer(buffer, mode);
+	}
+	else if (!*hit)
+	{
+		/* we might need to perform I/O */
+		CompleteReadBuffers(bmr,
+							&buffer,
+							forkNum,
+							blockNum,
+							1,
+							mode == RBM_ZERO_ON_ERROR,
+							strategy);
+	}
+
+	return buffer;
+}
+
+/*
+ * Prepare to read a block.  The buffer is pinned.  If this is a 'hit', then
+ * the returned buffer can be used immediately.  Otherwise, a physical read
+ * should be completed with CompleteReadBuffers(), or the buffer should be
+ * zeroed with ZeroBuffer().  PrepareReadBuffer() followed by
+ * CompleteReadBuffers() or ZeroBuffer() is equivalent to ReadBuffer(), but
+ * the caller has the opportunity to combine reads of multiple neighboring
+ * blocks into one CompleteReadBuffers() call.
+ *
+ * *foundPtr is set to true for a hit, and false for a miss.
+ */
+Buffer
+PrepareReadBuffer(BufferManagerRelation bmr,
+				  ForkNumber forkNum,
+				  BlockNumber blockNum,
+				  BufferAccessStrategy strategy,
+				  bool *foundPtr)
+{
+	BufferDesc *bufHdr;
+	bool		isLocalBuf;
+	IOContext	io_context;
+	IOObject	io_object;
+
+	Assert(blockNum != P_NEW);
+
+	if (bmr.rel)
+	{
+		bmr.smgr = RelationGetSmgr(bmr.rel);
+		bmr.relpersistence = bmr.rel->rd_rel->relpersistence;
+	}
+
+	isLocalBuf = SmgrIsTemp(bmr.smgr);
 	if (isLocalBuf)
 	{
-		/*
-		 * We do not use a BufferAccessStrategy for I/O of temporary tables.
-		 * However, in some cases, the "strategy" may not be NULL, so we can't
-		 * rely on IOContextForStrategy() to set the right IOContext for us.
-		 * This may happen in cases like CREATE TEMPORARY TABLE AS...
-		 */
 		io_context = IOCONTEXT_NORMAL;
 		io_object = IOOBJECT_TEMP_RELATION;
-		bufHdr = LocalBufferAlloc(smgr, forkNum, blockNum, &found);
-		if (found)
-			pgBufferUsage.local_blks_hit++;
-		else if (mode == RBM_NORMAL || mode == RBM_NORMAL_NO_LOG ||
-				 mode == RBM_ZERO_ON_ERROR)
-			pgBufferUsage.local_blks_read++;
 	}
 	else
 	{
-		/*
-		 * lookup the buffer.  IO_IN_PROGRESS is set if the requested block is
-		 * not currently in memory.
-		 */
 		io_context = IOContextForStrategy(strategy);
 		io_object = IOOBJECT_RELATION;
-		bufHdr = BufferAlloc(smgr, relpersistence, forkNum, blockNum,
-							 strategy, &found, io_context);
-		if (found)
-			pgBufferUsage.shared_blks_hit++;
-		else if (mode == RBM_NORMAL || mode == RBM_NORMAL_NO_LOG ||
-				 mode == RBM_ZERO_ON_ERROR)
-			pgBufferUsage.shared_blks_read++;
 	}
 
-	/* At this point we do NOT hold any locks. */
+	TRACE_POSTGRESQL_BUFFER_READ_START(forkNum, blockNum,
+									   bmr.smgr->smgr_rlocator.locator.spcOid,
+									   bmr.smgr->smgr_rlocator.locator.dbOid,
+									   bmr.smgr->smgr_rlocator.locator.relNumber,
+									   bmr.smgr->smgr_rlocator.backend);
 
-	/* if it was already in the buffer pool, we're done */
-	if (found)
+	ResourceOwnerEnlarge(CurrentResourceOwner);
+	if (isLocalBuf)
+	{
+		bufHdr = LocalBufferAlloc(bmr.smgr, forkNum, blockNum, foundPtr);
+		if (*foundPtr)
+			pgBufferUsage.local_blks_hit++;
+	}
+	else
+	{
+		bufHdr = BufferAlloc(bmr.smgr, bmr.relpersistence, forkNum, blockNum,
+							 strategy, foundPtr, io_context);
+		if (*foundPtr)
+			pgBufferUsage.shared_blks_hit++;
+	}
+	if (bmr.rel)
+	{
+		/*
+		 * While pgBufferUsage's "read" counter isn't bumped unless we reach
+		 * CompleteReadBuffers() (so, not for hits, and not for buffers that
+		 * are zeroed instead), the per-relation stats always count them.
+		 */
+		pgstat_count_buffer_read(bmr.rel);
+		if (*foundPtr)
+			pgstat_count_buffer_hit(bmr.rel);
+	}
+	if (*foundPtr)
 	{
-		/* Just need to update stats before we exit */
-		*hit = true;
 		VacuumPageHit++;
 		pgstat_count_io_op(io_object, io_context, IOOP_HIT);
-
 		if (VacuumCostActive)
 			VacuumCostBalance += VacuumCostPageHit;
 
 		TRACE_POSTGRESQL_BUFFER_READ_DONE(forkNum, blockNum,
-										  smgr->smgr_rlocator.locator.spcOid,
-										  smgr->smgr_rlocator.locator.dbOid,
-										  smgr->smgr_rlocator.locator.relNumber,
-										  smgr->smgr_rlocator.backend,
-										  found);
+										  bmr.smgr->smgr_rlocator.locator.spcOid,
+										  bmr.smgr->smgr_rlocator.locator.dbOid,
+										  bmr.smgr->smgr_rlocator.locator.relNumber,
+										  bmr.smgr->smgr_rlocator.backend,
+										  true);
+	}
 
-		/*
-		 * In RBM_ZERO_AND_LOCK mode the caller expects the page to be locked
-		 * on return.
-		 */
-		if (!isLocalBuf)
-		{
-			if (mode == RBM_ZERO_AND_LOCK)
-				LWLockAcquire(BufferDescriptorGetContentLock(bufHdr),
-							  LW_EXCLUSIVE);
-			else if (mode == RBM_ZERO_AND_CLEANUP_LOCK)
-				LockBufferForCleanup(BufferDescriptorGetBuffer(bufHdr));
-		}
+	return BufferDescriptorGetBuffer(bufHdr);
+}
 
-		return BufferDescriptorGetBuffer(bufHdr);
+static inline bool
+CompleteReadBuffersCanStartIO(Buffer buffer, bool nowait)
+{
+	if (BufferIsLocal(buffer))
+	{
+		BufferDesc *bufHdr = GetLocalBufferDescriptor(-buffer - 1);
+
+		return (pg_atomic_read_u32(&bufHdr->state) & BM_VALID) == 0;
 	}
+	else
+		return StartBufferIO(GetBufferDescriptor(buffer - 1), true, nowait);
+}
 
-	/*
-	 * if we have gotten to this point, we have allocated a buffer for the
-	 * page but its contents are not yet valid.  IO_IN_PROGRESS is set for it,
-	 * if it's a shared buffer.
-	 */
-	Assert(!(pg_atomic_read_u32(&bufHdr->state) & BM_VALID));	/* spinlock not needed */
+/*
+ * Complete a set of reads prepared with PrepareReadBuffer().  The buffers must
+ * cover a cluster of neighboring block numbers.
+ *
+ * Typically this performs one physical vector read covering the block range,
+ * but if some of the buffers have already been read in the meantime by any
+ * backend, zero or multiple reads may be performed.
+ */
+void
+CompleteReadBuffers(BufferManagerRelation bmr,
+					Buffer *buffers,
+					ForkNumber forknum,
+					BlockNumber blocknum,
+					int nblocks,
+					bool zero_on_error,
+					BufferAccessStrategy strategy)
+{
+	bool		isLocalBuf;
+	IOContext	io_context;
+	IOObject	io_object;
 
-	bufBlock = isLocalBuf ? LocalBufHdrGetBlock(bufHdr) : BufHdrGetBlock(bufHdr);
+	if (bmr.rel)
+	{
+		bmr.smgr = RelationGetSmgr(bmr.rel);
+		bmr.relpersistence = bmr.rel->rd_rel->relpersistence;
+	}
+
+	isLocalBuf = SmgrIsTemp(bmr.smgr);
+	if (isLocalBuf)
+	{
+		io_context = IOCONTEXT_NORMAL;
+		io_object = IOOBJECT_TEMP_RELATION;
+	}
+	else
+	{
+		io_context = IOContextForStrategy(strategy);
+		io_object = IOOBJECT_RELATION;
+	}
 
 	/*
-	 * Read in the page, unless the caller intends to overwrite it and just
-	 * wants us to allocate a buffer.
+	 * We count all these blocks as read by this backend.  This is traditional
+	 * behavior, but might turn out not to be true if we find that someone
+	 * else has beaten us and completed the read of some of these blocks.  In
+	 * that case the system globally double-counts, but we traditionally don't
+	 * count this as a "hit", and we don't have a separate counter for "miss,
+	 * but another backend completed the read".
 	 */
-	if (mode == RBM_ZERO_AND_LOCK || mode == RBM_ZERO_AND_CLEANUP_LOCK)
-		MemSet((char *) bufBlock, 0, BLCKSZ);
+	if (isLocalBuf)
+		pgBufferUsage.local_blks_read += nblocks;
 	else
+		pgBufferUsage.shared_blks_read += nblocks;
+
+	for (int i = 0; i < nblocks; ++i)
 	{
-		instr_time	io_start = pgstat_prepare_io_time(track_io_timing);
+		int			io_buffers_len;
+		Buffer		io_buffers[MAX_BUFFERS_PER_TRANSFER];
+		void	   *io_pages[MAX_BUFFERS_PER_TRANSFER];
+		instr_time	io_start;
+		BlockNumber io_first_block;
 
-		smgrread(smgr, forkNum, blockNum, bufBlock);
+#ifdef USE_ASSERT_CHECKING
 
-		pgstat_count_io_op_time(io_object, io_context,
-								IOOP_READ, io_start, 1);
+		/*
+		 * We could get all the information from buffer headers, but it can be
+		 * expensive to access buffer header cache lines so we make the caller
+		 * provide all the information we need, and assert that it is
+		 * consistent.
+		 */
+		{
+			RelFileLocator xlocator;
+			ForkNumber	xforknum;
+			BlockNumber xblocknum;
+
+			BufferGetTag(buffers[i], &xlocator, &xforknum, &xblocknum);
+			Assert(RelFileLocatorEquals(bmr.smgr->smgr_rlocator.locator, xlocator));
+			Assert(xforknum == forknum);
+			Assert(xblocknum == blocknum + i);
+		}
+#endif
+
+		/*
+		 * Skip this block if someone else has already completed it.  If an
+		 * I/O is already in progress in another backend, this will wait for
+		 * the outcome: either done, or something went wrong and we will
+		 * retry.
+		 */
+		if (!CompleteReadBuffersCanStartIO(buffers[i], false))
+		{
+			/*
+			 * Report this as a 'hit' for this backend, even though it must
+			 * have started out as a miss in PrepareReadBuffer().
+			 */
+			TRACE_POSTGRESQL_BUFFER_READ_DONE(forknum, blocknum + i,
+											  bmr.smgr->smgr_rlocator.locator.spcOid,
+											  bmr.smgr->smgr_rlocator.locator.dbOid,
+											  bmr.smgr->smgr_rlocator.locator.relNumber,
+											  bmr.smgr->smgr_rlocator.backend,
+											  true);
+			continue;
+		}
+
+		/* We found a buffer that we need to read in. */
+		io_buffers[0] = buffers[i];
+		io_pages[0] = BufferGetBlock(buffers[i]);
+		io_first_block = blocknum + i;
+		io_buffers_len = 1;
 
-		/* check for garbage data */
-		if (!PageIsVerifiedExtended((Page) bufBlock, blockNum,
-									PIV_LOG_WARNING | PIV_REPORT_STAT))
+		/*
+		 * How many neighboring-on-disk blocks can we scatter-read into
+		 * other buffers at the same time?  In this case we don't wait if we
+		 * see an I/O already in progress.  We already hold BM_IO_IN_PROGRESS
+		 * for the head block, so we should get on with that I/O as soon as
+		 * possible.  We'll come back to this block again, above.
+		 */
+		while ((i + 1) < nblocks &&
+			   CompleteReadBuffersCanStartIO(buffers[i + 1], true))
+		{
+			/* Must be consecutive block numbers. */
+			Assert(BufferGetBlockNumber(buffers[i + 1]) ==
+				   BufferGetBlockNumber(buffers[i]) + 1);
+
+			io_buffers[io_buffers_len] = buffers[++i];
+			io_pages[io_buffers_len++] = BufferGetBlock(buffers[i]);
+		}
+
+		io_start = pgstat_prepare_io_time(track_io_timing);
+		smgrreadv(bmr.smgr, forknum, io_first_block, io_pages, io_buffers_len);
+		pgstat_count_io_op_time(io_object, io_context, IOOP_READ, io_start,
+								io_buffers_len);
+
+		/* Verify each block we read, and terminate the I/O. */
+		for (int j = 0; j < io_buffers_len; ++j)
 		{
-			if (mode == RBM_ZERO_ON_ERROR || zero_damaged_pages)
+			BufferDesc *bufHdr;
+			Block		bufBlock;
+
+			if (isLocalBuf)
 			{
-				ereport(WARNING,
-						(errcode(ERRCODE_DATA_CORRUPTED),
-						 errmsg("invalid page in block %u of relation %s; zeroing out page",
-								blockNum,
-								relpath(smgr->smgr_rlocator, forkNum))));
-				MemSet((char *) bufBlock, 0, BLCKSZ);
+				bufHdr = GetLocalBufferDescriptor(-io_buffers[j] - 1);
+				bufBlock = LocalBufHdrGetBlock(bufHdr);
 			}
 			else
-				ereport(ERROR,
-						(errcode(ERRCODE_DATA_CORRUPTED),
-						 errmsg("invalid page in block %u of relation %s",
-								blockNum,
-								relpath(smgr->smgr_rlocator, forkNum))));
-		}
-	}
-
-	/*
-	 * In RBM_ZERO_AND_LOCK / RBM_ZERO_AND_CLEANUP_LOCK mode, grab the buffer
-	 * content lock before marking the page as valid, to make sure that no
-	 * other backend sees the zeroed page before the caller has had a chance
-	 * to initialize it.
-	 *
-	 * Since no-one else can be looking at the page contents yet, there is no
-	 * difference between an exclusive lock and a cleanup-strength lock. (Note
-	 * that we cannot use LockBuffer() or LockBufferForCleanup() here, because
-	 * they assert that the buffer is already valid.)
-	 */
-	if ((mode == RBM_ZERO_AND_LOCK || mode == RBM_ZERO_AND_CLEANUP_LOCK) &&
-		!isLocalBuf)
-	{
-		LWLockAcquire(BufferDescriptorGetContentLock(bufHdr), LW_EXCLUSIVE);
-	}
+			{
+				bufHdr = GetBufferDescriptor(io_buffers[j] - 1);
+				bufBlock = BufHdrGetBlock(bufHdr);
+			}
 
-	if (isLocalBuf)
-	{
-		/* Only need to adjust flags */
-		uint32		buf_state = pg_atomic_read_u32(&bufHdr->state);
+			/* check for garbage data */
+			if (!PageIsVerifiedExtended((Page) bufBlock, io_first_block + j,
+										PIV_LOG_WARNING | PIV_REPORT_STAT))
+			{
+				if (zero_on_error || zero_damaged_pages)
+				{
+					ereport(WARNING,
+							(errcode(ERRCODE_DATA_CORRUPTED),
+							 errmsg("invalid page in block %u of relation %s; zeroing out page",
+									io_first_block + j,
+									relpath(bmr.smgr->smgr_rlocator, forknum))));
+					memset(bufBlock, 0, BLCKSZ);
+				}
+				else
+					ereport(ERROR,
+							(errcode(ERRCODE_DATA_CORRUPTED),
+							 errmsg("invalid page in block %u of relation %s",
+									io_first_block + j,
+									relpath(bmr.smgr->smgr_rlocator, forknum))));
+			}
 
-		buf_state |= BM_VALID;
-		pg_atomic_unlocked_write_u32(&bufHdr->state, buf_state);
-	}
-	else
-	{
-		/* Set BM_VALID, terminate IO, and wake up any waiters */
-		TerminateBufferIO(bufHdr, false, BM_VALID, true);
-	}
+			/* Terminate I/O and set BM_VALID. */
+			if (isLocalBuf)
+			{
+				uint32		buf_state = pg_atomic_read_u32(&bufHdr->state);
 
-	VacuumPageMiss++;
-	if (VacuumCostActive)
-		VacuumCostBalance += VacuumCostPageMiss;
+				buf_state |= BM_VALID;
+				pg_atomic_unlocked_write_u32(&bufHdr->state, buf_state);
+			}
+			else
+			{
+				/* Set BM_VALID, terminate IO, and wake up any waiters */
+				TerminateBufferIO(bufHdr, false, BM_VALID, true);
+			}
 
-	TRACE_POSTGRESQL_BUFFER_READ_DONE(forkNum, blockNum,
-									  smgr->smgr_rlocator.locator.spcOid,
-									  smgr->smgr_rlocator.locator.dbOid,
-									  smgr->smgr_rlocator.locator.relNumber,
-									  smgr->smgr_rlocator.backend,
-									  found);
+			/* Report I/Os as completing individually. */
+			TRACE_POSTGRESQL_BUFFER_READ_DONE(forknum, io_first_block + j,
+											  bmr.smgr->smgr_rlocator.locator.spcOid,
+											  bmr.smgr->smgr_rlocator.locator.dbOid,
+											  bmr.smgr->smgr_rlocator.locator.relNumber,
+											  bmr.smgr->smgr_rlocator.backend,
+											  false);
+		}
 
-	return BufferDescriptorGetBuffer(bufHdr);
+		VacuumPageMiss += io_buffers_len;
+		if (VacuumCostActive)
+			VacuumCostBalance += VacuumCostPageMiss * io_buffers_len;
+	}
 }
 
 /*
@@ -1228,11 +1380,8 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
  *
  * The returned buffer is pinned and is already marked as holding the
  * desired page.  If it already did have the desired page, *foundPtr is
- * set true.  Otherwise, *foundPtr is set false and the buffer is marked
- * as IO_IN_PROGRESS; ReadBuffer will now need to do I/O to fill it.
- *
- * *foundPtr is actually redundant with the buffer's BM_VALID flag, but
- * we keep it for simplicity in ReadBuffer.
+ * set true.  Otherwise, *foundPtr is set false.  A read should be
+ * performed with CompleteReadBuffers().
  *
  * io_context is passed as an output parameter to avoid calling
  * IOContextForStrategy() when there is a shared buffers hit and no IO
@@ -1291,19 +1440,10 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 		{
 			/*
 			 * We can only get here if (a) someone else is still reading in
-			 * the page, or (b) a previous read attempt failed.  We have to
-			 * wait for any active read attempt to finish, and then set up our
-			 * own read attempt if the page is still not BM_VALID.
-			 * StartBufferIO does it all.
+			 * the page, (b) a previous read attempt failed, or (c) someone
+			 * called PrepareReadBuffer() but not yet CompleteReadBuffers().
 			 */
-			if (StartBufferIO(buf, true))
-			{
-				/*
-				 * If we get here, previous attempts to read the buffer must
-				 * have failed ... but we shall bravely try again.
-				 */
-				*foundPtr = false;
-			}
+			*foundPtr = false;
 		}
 
 		return buf;
@@ -1368,19 +1508,10 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 		{
 			/*
 			 * We can only get here if (a) someone else is still reading in
-			 * the page, or (b) a previous read attempt failed.  We have to
-			 * wait for any active read attempt to finish, and then set up our
-			 * own read attempt if the page is still not BM_VALID.
-			 * StartBufferIO does it all.
+			 * the page, (b) a previous read attempt failed, or (c) someone
+			 * called PrepareReadBuffer() but not yet CompleteReadBuffers().
 			 */
-			if (StartBufferIO(existing_buf_hdr, true))
-			{
-				/*
-				 * If we get here, previous attempts to read the buffer must
-				 * have failed ... but we shall bravely try again.
-				 */
-				*foundPtr = false;
-			}
+			*foundPtr = false;
 		}
 
 		return existing_buf_hdr;
@@ -1412,15 +1543,9 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 	LWLockRelease(newPartitionLock);
 
 	/*
-	 * Buffer contents are currently invalid.  Try to obtain the right to
-	 * start I/O.  If StartBufferIO returns false, then someone else managed
-	 * to read it before we did, so there's nothing left for BufferAlloc() to
-	 * do.
+	 * Buffer contents are currently invalid.
 	 */
-	if (StartBufferIO(victim_buf_hdr, true))
-		*foundPtr = false;
-	else
-		*foundPtr = true;
+	*foundPtr = false;
 
 	return victim_buf_hdr;
 }
@@ -1774,7 +1899,7 @@ again:
  * pessimistic, but outside of toy-sized shared_buffers it should allow
  * sufficient pins.
  */
-static void
+void
 LimitAdditionalPins(uint32 *additional_pins)
 {
 	uint32		max_backends;
@@ -2043,7 +2168,7 @@ ExtendBufferedRelShared(BufferManagerRelation bmr,
 
 				buf_state &= ~BM_VALID;
 				UnlockBufHdr(existing_hdr, buf_state);
-			} while (!StartBufferIO(existing_hdr, true));
+			} while (!StartBufferIO(existing_hdr, true, false));
 		}
 		else
 		{
@@ -2066,7 +2191,7 @@ ExtendBufferedRelShared(BufferManagerRelation bmr,
 			LWLockRelease(partition_lock);
 
 			/* XXX: could combine the locked operations in it with the above */
-			StartBufferIO(victim_buf_hdr, true);
+			StartBufferIO(victim_buf_hdr, true, false);
 		}
 	}
 
@@ -2381,7 +2506,12 @@ PinBuffer(BufferDesc *buf, BufferAccessStrategy strategy)
 	else
 	{
 		/*
-		 * If we previously pinned the buffer, it must surely be valid.
+		 * If we previously pinned the buffer, it is likely to be valid, but
+		 * it may not be if PrepareReadBuffer() was called and
+		 * CompleteReadBuffers() hasn't been called yet.  We'll check by
+		 * loading the flags without locking.  This is racy, but it's OK to
+		 * return false spuriously: when CompleteReadBuffers() calls
+		 * StartBufferIO(), it'll see that it's now valid.
 		 *
 		 * Note: We deliberately avoid a Valgrind client request here.
 		 * Individual access methods can optionally superimpose buffer page
@@ -2390,7 +2520,7 @@ PinBuffer(BufferDesc *buf, BufferAccessStrategy strategy)
 		 * that the buffer page is legitimately non-accessible here.  We
 		 * cannot meddle with that.
 		 */
-		result = true;
+		result = (pg_atomic_read_u32(&buf->state) & BM_VALID) != 0;
 	}
 
 	ref->refcount++;
@@ -3458,7 +3588,7 @@ FlushBuffer(BufferDesc *buf, SMgrRelation reln, IOObject io_object,
 	 * someone else flushed the buffer before we could, so we need not do
 	 * anything.
 	 */
-	if (!StartBufferIO(buf, false))
+	if (!StartBufferIO(buf, false, false))
 		return;
 
 	/* Setup error traceback support for ereport() */
@@ -4845,6 +4975,46 @@ ConditionalLockBuffer(Buffer buffer)
 									LW_EXCLUSIVE);
 }
 
+/*
+ * Zero a buffer, and lock it as RBM_ZERO_AND_LOCK or
+ * RBM_ZERO_AND_CLEANUP_LOCK would.  The buffer must be already pinned.  It
+ * does not have to be valid, but it is valid and locked on return.
+ */
+void
+ZeroBuffer(Buffer buffer, ReadBufferMode mode)
+{
+	BufferDesc *bufHdr;
+	uint32		buf_state;
+
+	Assert(mode == RBM_ZERO_AND_LOCK || mode == RBM_ZERO_AND_CLEANUP_LOCK);
+
+	if (BufferIsLocal(buffer))
+		bufHdr = GetLocalBufferDescriptor(-buffer - 1);
+	else
+	{
+		bufHdr = GetBufferDescriptor(buffer - 1);
+		if (mode == RBM_ZERO_AND_LOCK)
+			LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
+		else
+			LockBufferForCleanup(buffer);
+	}
+
+	memset(BufferGetPage(buffer), 0, BLCKSZ);
+
+	if (BufferIsLocal(buffer))
+	{
+		buf_state = pg_atomic_read_u32(&bufHdr->state);
+		buf_state |= BM_VALID;
+		pg_atomic_unlocked_write_u32(&bufHdr->state, buf_state);
+	}
+	else
+	{
+		buf_state = LockBufHdr(bufHdr);
+		buf_state |= BM_VALID;
+		UnlockBufHdr(bufHdr, buf_state);
+	}
+}
+
 /*
  * Verify that this backend is pinning the buffer exactly once.
  *
@@ -5197,9 +5367,15 @@ WaitIO(BufferDesc *buf)
  *
  * Returns true if we successfully marked the buffer as I/O busy,
  * false if someone else already did the work.
+ *
+ * If nowait is true, then we don't wait for an I/O to be finished by another
+ * backend.  In that case, false indicates either that the I/O was already
+ * finished, or is still in progress.  This is useful for callers that want to
+ * find out if they can perform the I/O as part of a larger operation, without
+ * waiting for the answer or distinguishing the reasons why not.
  */
 static bool
-StartBufferIO(BufferDesc *buf, bool forInput)
+StartBufferIO(BufferDesc *buf, bool forInput, bool nowait)
 {
 	uint32		buf_state;
 
@@ -5212,6 +5388,8 @@ StartBufferIO(BufferDesc *buf, bool forInput)
 		if (!(buf_state & BM_IO_IN_PROGRESS))
 			break;
 		UnlockBufHdr(buf, buf_state);
+		if (nowait)
+			return false;
 		WaitIO(buf);
 	}
 
diff --git a/src/backend/storage/buffer/localbuf.c b/src/backend/storage/buffer/localbuf.c
index 1be4f4f8daf..717b8f58daf 100644
--- a/src/backend/storage/buffer/localbuf.c
+++ b/src/backend/storage/buffer/localbuf.c
@@ -109,10 +109,9 @@ PrefetchLocalBuffer(SMgrRelation smgr, ForkNumber forkNum,
  * LocalBufferAlloc -
  *	  Find or create a local buffer for the given page of the given relation.
  *
- * API is similar to bufmgr.c's BufferAlloc, except that we do not need
- * to do any locking since this is all local.   Also, IO_IN_PROGRESS
- * does not get set.  Lastly, we support only default access strategy
- * (hence, usage_count is always advanced).
+ * API is similar to bufmgr.c's BufferAlloc, except that we do not need to do
+ * any locking since this is all local.  We support only default access
+ * strategy (hence, usage_count is always advanced).
  */
 BufferDesc *
 LocalBufferAlloc(SMgrRelation smgr, ForkNumber forkNum, BlockNumber blockNum,
@@ -288,7 +287,7 @@ GetLocalVictimBuffer(void)
 }
 
 /* see LimitAdditionalPins() */
-static void
+void
 LimitAdditionalLocalPins(uint32 *additional_pins)
 {
 	uint32		max_pins;
@@ -298,9 +297,10 @@ LimitAdditionalLocalPins(uint32 *additional_pins)
 
 	/*
 	 * In contrast to LimitAdditionalPins() other backends don't play a role
-	 * here. We can allow up to NLocBuffer pins in total.
+	 * here. We can allow up to NLocBuffer pins in total, but NLocBuffer might
+	 * not be initialized yet, so read num_temp_buffers instead.
 	 */
-	max_pins = (NLocBuffer - NLocalPinnedBuffers);
+	max_pins = (num_temp_buffers - NLocalPinnedBuffers);
 
 	if (*additional_pins >= max_pins)
 		*additional_pins = max_pins;
diff --git a/src/backend/storage/meson.build b/src/backend/storage/meson.build
index 40345bdca27..739d13293fb 100644
--- a/src/backend/storage/meson.build
+++ b/src/backend/storage/meson.build
@@ -1,5 +1,6 @@
 # Copyright (c) 2022-2024, PostgreSQL Global Development Group
 
+subdir('aio')
 subdir('buffer')
 subdir('file')
 subdir('freespace')
diff --git a/src/backend/storage/smgr/smgr.c b/src/backend/storage/smgr/smgr.c
index 563a0be5c74..0d7272e796e 100644
--- a/src/backend/storage/smgr/smgr.c
+++ b/src/backend/storage/smgr/smgr.c
@@ -147,7 +147,9 @@ smgrshutdown(int code, Datum arg)
 /*
  * smgropen() -- Return an SMgrRelation object, creating it if need be.
  *
- * This does not attempt to actually open the underlying file.
+ * This does not attempt to actually open the underlying files.  The returned
+ * object remains valid at least until AtEOXact_SMgr() is called, or until
+ * smgrdestroy() is called in non-transaction backends.
  */
 SMgrRelation
 smgropen(RelFileLocator rlocator, BackendId backend)
@@ -259,10 +261,10 @@ smgrexists(SMgrRelation reln, ForkNumber forknum)
 }
 
 /*
- * smgrclose() -- Close and delete an SMgrRelation object.
+ * smgrdestroy() -- Delete an SMgrRelation object.
  */
 void
-smgrclose(SMgrRelation reln)
+smgrdestroy(SMgrRelation reln)
 {
 	SMgrRelation *owner;
 	ForkNumber	forknum;
@@ -289,12 +291,14 @@ smgrclose(SMgrRelation reln)
 }
 
 /*
- * smgrrelease() -- Release all resources used by this object.
+ * smgrclose() -- Release all resources used by this object.
  *
- * The object remains valid.
+ * The object remains valid, but is moved to the unowned list where it will
+ * be destroyed by AtEOXact_SMgr().  It may be re-owned if it is accessed by a
+ * relation before then.
  */
 void
-smgrrelease(SMgrRelation reln)
+smgrclose(SMgrRelation reln)
 {
 	for (ForkNumber forknum = 0; forknum <= MAX_FORKNUM; forknum++)
 	{
@@ -302,15 +306,20 @@ smgrrelease(SMgrRelation reln)
 		reln->smgr_cached_nblocks[forknum] = InvalidBlockNumber;
 	}
 	reln->smgr_targblock = InvalidBlockNumber;
+
+	if (reln->smgr_owner)
+	{
+		*reln->smgr_owner = NULL;
+		reln->smgr_owner = NULL;
+		dlist_push_tail(&unowned_relns, &reln->node);
+	}
 }
 
 /*
- * smgrreleaseall() -- Release resources used by all objects.
- *
- * This is called for PROCSIGNAL_BARRIER_SMGRRELEASE.
+ * smgrcloseall() -- Close all objects.
  */
 void
-smgrreleaseall(void)
+smgrcloseall(void)
 {
 	HASH_SEQ_STATUS status;
 	SMgrRelation reln;
@@ -322,14 +331,17 @@ smgrreleaseall(void)
 	hash_seq_init(&status, SMgrRelationHash);
 
 	while ((reln = (SMgrRelation) hash_seq_search(&status)) != NULL)
-		smgrrelease(reln);
+		smgrclose(reln);
 }
 
 /*
- * smgrcloseall() -- Close all existing SMgrRelation objects.
+ * smgrdestroyall() -- Destroy all SMgrRelation objects.
+ *
+ * It must be known that there are no pointers to SMgrRelations, other than
+ * those registered with smgrsetowner().
  */
 void
-smgrcloseall(void)
+smgrdestroyall(void)
 {
 	HASH_SEQ_STATUS status;
 	SMgrRelation reln;
@@ -341,7 +353,7 @@ smgrcloseall(void)
 	hash_seq_init(&status, SMgrRelationHash);
 
 	while ((reln = (SMgrRelation) hash_seq_search(&status)) != NULL)
-		smgrclose(reln);
+		smgrdestroy(reln);
 }
 
 /*
@@ -733,7 +745,8 @@ smgrimmedsync(SMgrRelation reln, ForkNumber forknum)
  * AtEOXact_SMgr
  *
  * This routine is called during transaction commit or abort (it doesn't
- * particularly care which).  All transient SMgrRelation objects are closed.
+ * particularly care which).  All transient SMgrRelation objects are
+ * destroyed.
  *
  * We do this as a compromise between wanting transient SMgrRelations to
  * live awhile (to amortize the costs of blind writes of multiple blocks)
@@ -747,7 +760,7 @@ AtEOXact_SMgr(void)
 	dlist_mutable_iter iter;
 
 	/*
-	 * Zap all unowned SMgrRelations.  We rely on smgrclose() to remove each
+	 * Zap all unowned SMgrRelations.  We rely on smgrdestroy() to remove each
 	 * one from the list.
 	 */
 	dlist_foreach_modify(iter, &unowned_relns)
@@ -757,7 +770,7 @@ AtEOXact_SMgr(void)
 
 		Assert(rel->smgr_owner == NULL);
 
-		smgrclose(rel);
+		smgrdestroy(rel);
 	}
 }
 
@@ -768,6 +781,6 @@ AtEOXact_SMgr(void)
 bool
 ProcessBarrierSmgrRelease(void)
 {
-	smgrreleaseall();
+	smgrcloseall();
 	return true;
 }
diff --git a/src/include/storage/bufmgr.h b/src/include/storage/bufmgr.h
index d51d46d3353..a38f1acb37a 100644
--- a/src/include/storage/bufmgr.h
+++ b/src/include/storage/bufmgr.h
@@ -14,6 +14,7 @@
 #ifndef BUFMGR_H
 #define BUFMGR_H
 
+#include "port/pg_iovec.h"
 #include "storage/block.h"
 #include "storage/buf.h"
 #include "storage/bufpage.h"
@@ -158,6 +159,11 @@ extern PGDLLIMPORT int32 *LocalRefCount;
 #define BUFFER_LOCK_SHARE		1
 #define BUFFER_LOCK_EXCLUSIVE	2
 
+/*
+ * Maximum number of buffers for multi-buffer I/O functions.  This is set to
+ * allow 128kB transfers, unless BLCKSZ and IOV_MAX imply a smaller maximum.
+ */
+#define MAX_BUFFERS_PER_TRANSFER Min(PG_IOV_MAX, (128 * 1024) / BLCKSZ)
 
 /*
  * prototypes for functions in bufmgr.c
@@ -177,6 +183,18 @@ extern Buffer ReadBufferWithoutRelcache(RelFileLocator rlocator,
 										ForkNumber forkNum, BlockNumber blockNum,
 										ReadBufferMode mode, BufferAccessStrategy strategy,
 										bool permanent);
+extern Buffer PrepareReadBuffer(BufferManagerRelation bmr,
+								ForkNumber forkNum,
+								BlockNumber blockNum,
+								BufferAccessStrategy strategy,
+								bool *foundPtr);
+extern void CompleteReadBuffers(BufferManagerRelation bmr,
+								Buffer *buffers,
+								ForkNumber forknum,
+								BlockNumber blocknum,
+								int nblocks,
+								bool zero_on_error,
+								BufferAccessStrategy strategy);
 extern void ReleaseBuffer(Buffer buffer);
 extern void UnlockReleaseBuffer(Buffer buffer);
 extern bool BufferIsExclusiveLocked(Buffer buffer);
@@ -247,9 +265,13 @@ extern void LockBufferForCleanup(Buffer buffer);
 extern bool ConditionalLockBufferForCleanup(Buffer buffer);
 extern bool IsBufferCleanupOK(Buffer buffer);
 extern bool HoldingBufferPinThatDelaysRecovery(void);
+extern void ZeroBuffer(Buffer buffer, ReadBufferMode mode);
 
 extern bool BgBufferSync(struct WritebackContext *wb_context);
 
+extern void LimitAdditionalPins(uint32 *additional_pins);
+extern void LimitAdditionalLocalPins(uint32 *additional_pins);
+
 /* in buf_init.c */
 extern void InitBufferPool(void);
 extern Size BufferShmemSize(void);
diff --git a/src/include/storage/smgr.h b/src/include/storage/smgr.h
index 527cd2a0568..d8ffe397faf 100644
--- a/src/include/storage/smgr.h
+++ b/src/include/storage/smgr.h
@@ -85,8 +85,8 @@ extern void smgrclearowner(SMgrRelation *owner, SMgrRelation reln);
 extern void smgrclose(SMgrRelation reln);
 extern void smgrcloseall(void);
 extern void smgrcloserellocator(RelFileLocatorBackend rlocator);
-extern void smgrrelease(SMgrRelation reln);
-extern void smgrreleaseall(void);
+extern void smgrdestroy(SMgrRelation reln);
+extern void smgrdestroyall(void);
 extern void smgrcreate(SMgrRelation reln, ForkNumber forknum, bool isRedo);
 extern void smgrdosyncall(SMgrRelation *rels, int nrels);
 extern void smgrdounlinkall(SMgrRelation *rels, int nrels, bool isRedo);
diff --git a/src/include/storage/streaming_read.h b/src/include/storage/streaming_read.h
new file mode 100644
index 00000000000..40c3408c541
--- /dev/null
+++ b/src/include/storage/streaming_read.h
@@ -0,0 +1,45 @@
+#ifndef STREAMING_READ_H
+#define STREAMING_READ_H
+
+#include "storage/bufmgr.h"
+#include "storage/fd.h"
+#include "storage/smgr.h"
+
+/* Default tuning, reasonable for many users. */
+#define PGSR_FLAG_DEFAULT 0x00
+
+/*
+ * I/O streams that are performing maintenance work on behalf of potentially
+ * many users.
+ */
+#define PGSR_FLAG_MAINTENANCE 0x01
+
+/*
+ * We usually avoid issuing prefetch advice automatically when sequential
+ * access is detected, but this flag explicitly disables it, for cases that
+ * might not be correctly detected.  Explicit advice is known to perform worse
+ * than letting the kernel (at least Linux) detect sequential access.
+ */
+#define PGSR_FLAG_SEQUENTIAL 0x02
+
+struct PgStreamingRead;
+typedef struct PgStreamingRead PgStreamingRead;
+
+/* Callback that returns the next block number to read. */
+typedef BlockNumber (*PgStreamingReadBufferCB) (PgStreamingRead *pgsr,
+												void *pgsr_private,
+												void *per_buffer_private);
+
+extern PgStreamingRead *pg_streaming_read_buffer_alloc(int flags,
+													   void *pgsr_private,
+													   size_t per_buffer_private_size,
+													   BufferAccessStrategy strategy,
+													   BufferManagerRelation bmr,
+													   ForkNumber forknum,
+													   PgStreamingReadBufferCB next_block_cb);
+
+extern void pg_streaming_read_prefetch(PgStreamingRead *pgsr);
+extern Buffer pg_streaming_read_buffer_get_next(PgStreamingRead *pgsr, void **per_buffer_private);
+extern void pg_streaming_read_free(PgStreamingRead *pgsr);
+
+#endif
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index a584b1ddff3..6636cc82c09 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -561,12 +561,6 @@ typedef struct ViewOptions
  *
  * Very little code is authorized to touch rel->rd_smgr directly.  Instead
  * use this function to fetch its value.
- *
- * Note: since a relcache flush can cause the file handle to be closed again,
- * it's unwise to hold onto the pointer returned by this function for any
- * long period.  Recommended practice is to just re-execute RelationGetSmgr
- * each time you need to access the SMgrRelation.  It's quite cheap in
- * comparison to whatever an smgr function is going to do.
  */
 static inline SMgrRelation
 RelationGetSmgr(Relation rel)
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 29fd1cae641..018ebbcbaae 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2089,6 +2089,8 @@ PgStat_TableCounts
 PgStat_TableStatus
 PgStat_TableXactStatus
 PgStat_WalStats
+PgStreamingRead
+PgStreamingReadRange
 PgXmlErrorContext
 PgXmlStrictness
 Pg_finfo_record
-- 
2.37.2

v1-0004-Sequential-scans-support-streaming-read.patchtext/x-patch; charset=US-ASCII; name=v1-0004-Sequential-scans-support-streaming-read.patchDownload
From 1a87519763a0fa67433a0049dcb3f9f021bd5e11 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 29 Jan 2024 12:22:38 -0500
Subject: [PATCH v1 4/4] Sequential scans support streaming read

Add streaming read support for sequential scans. Do so by implementing
the streaming read callback to get the next block and save this block in
the scan descriptor.

The PgStreamingRead object is allocated in initscan(). This means it
will be freed and reallocated on rescan. Implementing a streaming read
reset function is a TODO for that API.

Currently, only forwards scans are supported by the streaming read API,
so if a scan switches from forwards to backwards, the PgStreamingRead
object will need to be freed. This also means that if a scan switches
from backwards to forwards, it will not use streaming reads.

Distinguishing between a scan that has yet to be initialized, one that
doesn't support streaming reads and one that has switched scan
directions is one reason why it is difficult to wait until
heapfetchbuf() to allocate the PgStreamingRead object.
---
 src/backend/access/heap/heapam.c | 106 ++++++++++++++++++++++++++++---
 src/include/access/heapam.h      |   3 +
 2 files changed, 100 insertions(+), 9 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 9e3e6d8b52b..cc20e0f972c 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -65,6 +65,7 @@
 #include "storage/smgr.h"
 #include "storage/spin.h"
 #include "storage/standby.h"
+#include "storage/streaming_read.h"
 #include "utils/datum.h"
 #include "utils/inval.h"
 #include "utils/lsyscache.h"
@@ -228,6 +229,27 @@ static const int MultiXactStatusLock[MaxMultiXactStatus + 1] =
  * ----------------------------------------------------------------
  */
 
+static BlockNumber
+heap_pgsr_next_single(PgStreamingRead *pgsr, void *pgsr_private,
+					  void *per_buffer_data)
+{
+	HeapScanDesc scan = (HeapScanDesc) pgsr_private;
+
+	/* Only forward scans support streaming reads */
+	if (!scan->rs_inited)
+	{
+		scan->rs_prefetch_block = heapgettup_initial_block(scan,
+														   ForwardScanDirection);
+		scan->rs_inited = true;
+	}
+	else
+		scan->rs_prefetch_block = heapgettup_advance_block(scan,
+														   scan->rs_prefetch_block,
+														   ForwardScanDirection);
+
+	return scan->rs_prefetch_block;
+}
+
 /* ----------------
  *		initscan - scan code common to heap_beginscan and heap_rescan
  * ----------------
@@ -345,6 +367,36 @@ initscan(HeapScanDesc scan, ScanKey key, bool keep_startblock)
 	 */
 	if (scan->rs_base.rs_flags & SO_TYPE_SEQSCAN)
 		pgstat_count_heap_scan(scan->rs_base.rs_rd);
+
+	scan->rs_prefetch_block = InvalidBlockNumber;
+
+	/* pgsr is freed and reallocated on rescan */
+	if (scan->pgsr)
+		pg_streaming_read_free(scan->pgsr);
+	scan->pgsr = NULL;
+	scan->rs_prefetch_block = InvalidBlockNumber;
+
+	/*
+	 * This streaming read cannot be allocated in the per tuple memory context
+	 * which is the current memory context during heapgettup[_pagemode](), as
+	 * the per tuple context is often reset before the end of the query. There
+	 * was discussion of allocating the pgsr when rs_inited is false. We could
+	 * switch into a memory context that doesn't get reset to allocate it
+	 * there, but 1) we probably want to reuse the pgsr across rescans and 2)
+	 * we have to free the pgsr if the scan changes from forwards to a
+	 * backwards scan anyway, so we better just allocate it here.
+	 */
+	if (!RelationUsesLocalBuffers(scan->rs_base.rs_rd) &&
+		(scan->rs_base.rs_flags & SO_TYPE_SEQSCAN))
+	{
+		scan->pgsr = pg_streaming_read_buffer_alloc(PGSR_FLAG_SEQUENTIAL,
+													scan,
+													0,
+													scan->rs_strategy,
+													BMR_REL(scan->rs_base.rs_rd),
+													MAIN_FORKNUM,
+													heap_pgsr_next_single);
+	}
 }
 
 /*
@@ -488,19 +540,41 @@ heapfetchbuf(TableScanDesc sscan, ScanDirection dir)
 	 */
 	CHECK_FOR_INTERRUPTS();
 
-	if (!scan->rs_inited)
+	/*
+	 * Backwards scans aren't supported with streaming read. At the time of
+	 * allocation, the scan direction is not determined. Note that this means
+	 * that if the scan switches from backwards to forwards, the forward scan
+	 * will not use streaming reads.
+	 */
+	if (!ScanDirectionIsForward(dir) && scan->pgsr)
 	{
-		scan->rs_cblock = heapgettup_initial_block(scan, dir);
-		Assert(scan->rs_cblock != InvalidBlockNumber || !BufferIsValid(scan->rs_cbuf));
-		scan->rs_inited = true;
+		pg_streaming_read_free(scan->pgsr);
+		scan->pgsr = NULL;
+		scan->rs_prefetch_block = InvalidBlockNumber;
+	}
+
+	if (scan->pgsr)
+	{
+		scan->rs_cbuf = pg_streaming_read_buffer_get_next(scan->pgsr, NULL);
+		if (BufferIsValid(scan->rs_cbuf))
+			scan->rs_cblock = BufferGetBlockNumber(scan->rs_cbuf);
 	}
 	else
-		scan->rs_cblock = heapgettup_advance_block(scan, scan->rs_cblock, dir);
+	{
+		if (!scan->rs_inited)
+		{
+			scan->rs_cblock = heapgettup_initial_block(scan, dir);
+			Assert(scan->rs_cblock != InvalidBlockNumber || !BufferIsValid(scan->rs_cbuf));
+			scan->rs_inited = true;
+		}
+		else
+			scan->rs_cblock = heapgettup_advance_block(scan, scan->rs_cblock, dir);
 
-	/* read block if valid */
-	if (BlockNumberIsValid(scan->rs_cblock))
-		scan->rs_cbuf = ReadBufferExtended(scan->rs_base.rs_rd, MAIN_FORKNUM,
-										   scan->rs_cblock, RBM_NORMAL, scan->rs_strategy);
+		/* read block if valid */
+		if (BlockNumberIsValid(scan->rs_cblock))
+			scan->rs_cbuf = ReadBufferExtended(scan->rs_base.rs_rd, MAIN_FORKNUM,
+											   scan->rs_cblock, RBM_NORMAL, scan->rs_strategy);
+	}
 }
 
 /*
@@ -1001,6 +1075,15 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 	else
 		scan->rs_parallelworkerdata = NULL;
 
+	/*
+	 * TODO: implement pg_streaming_read_reset(), then allocate the streaming
+	 * reads here. Currently, they are allocated in initscan() which will free
+	 * and reallocate the pgsr on each rescan. Fixing this is especially
+	 * important for nested loop join. For now, set this to NULL to ensure the
+	 * streaming read is allocated in initscan().
+	 */
+	scan->pgsr = NULL;
+
 	/*
 	 * we do this here instead of in initscan() because heap_rescan also calls
 	 * initscan() and we don't want to allocate memory again
@@ -1065,6 +1148,11 @@ heap_endscan(TableScanDesc sscan)
 	if (BufferIsValid(scan->rs_cbuf))
 		ReleaseBuffer(scan->rs_cbuf);
 
+	if (scan->pgsr)
+		pg_streaming_read_free(scan->pgsr);
+	scan->pgsr = NULL;
+	scan->rs_prefetch_block = InvalidBlockNumber;
+
 	/*
 	 * decrement relation reference count and free scan descriptor storage
 	 */
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 4a3a017c33a..8e702454367 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -59,6 +59,7 @@ typedef struct HeapScanDescData
 	bool		rs_inited;		/* false = scan not init'd yet */
 	OffsetNumber rs_coffset;	/* current offset # in non-page-at-a-time mode */
 	BlockNumber rs_cblock;		/* current block # in scan, if any */
+	BlockNumber rs_prefetch_block;	/* block being prefetched */
 	Buffer		rs_cbuf;		/* current buffer in scan, if any */
 	/* NB: if rs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
 
@@ -72,6 +73,8 @@ typedef struct HeapScanDescData
 	 */
 	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
 
+	struct PgStreamingRead *pgsr;
+
 	/* these fields only used in page-at-a-time mode and for bitmap scans */
 	int			rs_cindex;		/* current tuple's index in vistuples */
 	int			rs_ntuples;		/* number of visible tuples on page */
-- 
2.37.2

v1-0001-Split-heapgetpage-into-two-parts.patchtext/x-patch; charset=US-ASCII; name=v1-0001-Split-heapgetpage-into-two-parts.patchDownload
From 9b8547083774007caf60a5bf1a538e4c94634690 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 27 Jan 2024 18:39:37 -0500
Subject: [PATCH v1 1/4] Split heapgetpage into two parts

heapgetpage(), a per-block utility function used in heap scans, read a
passed-in block into a buffer and then, if page-at-a-time processing was
enabled, pruned the page and built an array of its visible tuples. This
was used for sequential and sample scans.

Future commits will add support for streaming reads. The streaming read
API will read in the blocks specified by a callback, but any significant
per-page processing should be done synchronously on the buffer yielded
by the streaming read API. To support this, separate the logic in
heapgetpage() to read a block into a buffer from that which prunes the
page and builds an array of the visible tuples. The former is now
heapfetchbuf() and the latter is now heapbuildvis().

Future commits will push the logic for selecting the next block into
heapfetchbuf() in cases when streaming reads are not supported (such as
backwards sequential scans). Because this logic differs for sample scans
and sequential scans, inline the code to read the block into a buffer
for sample scans.

This has the added benefit of allowing for a bit of refactoring in
heapam_scan_sample_next_block(), including unpinning the previous buffer
before invoking the callback to select the next block.
---
 src/backend/access/heap/heapam.c         | 76 ++++++++++++++----------
 src/backend/access/heap/heapam_handler.c | 40 +++++++++----
 src/include/access/heapam.h              |  2 +-
 3 files changed, 74 insertions(+), 44 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 707460a5364..608d43c4efd 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -367,17 +367,18 @@ heap_setscanlimits(TableScanDesc sscan, BlockNumber startBlk, BlockNumber numBlk
 }
 
 /*
- * heapgetpage - subroutine for heapgettup()
+ * heapbuildvis - Utility function for heap scans.
  *
- * This routine reads and pins the specified page of the relation.
- * In page-at-a-time mode it performs additional work, namely determining
- * which tuples on the page are visible.
+ * Given a page residing in a buffer saved in the scan descriptor, prune the
+ * page and determine which of its tuples are visible, saving their offsets
+ * in an array in the scan descriptor.
  */
 void
-heapgetpage(TableScanDesc sscan, BlockNumber block)
+heapbuildvis(TableScanDesc sscan)
 {
 	HeapScanDesc scan = (HeapScanDesc) sscan;
-	Buffer		buffer;
+	Buffer		buffer = scan->rs_cbuf;
+	BlockNumber block = scan->rs_cblock;
 	Snapshot	snapshot;
 	Page		page;
 	int			lines;
@@ -385,31 +386,8 @@ heapgetpage(TableScanDesc sscan, BlockNumber block)
 	OffsetNumber lineoff;
 	bool		all_visible;
 
-	Assert(block < scan->rs_nblocks);
-
-	/* release previous scan buffer, if any */
-	if (BufferIsValid(scan->rs_cbuf))
-	{
-		ReleaseBuffer(scan->rs_cbuf);
-		scan->rs_cbuf = InvalidBuffer;
-	}
-
-	/*
-	 * Be sure to check for interrupts at least once per page.  Checks at
-	 * higher code levels won't be able to stop a seqscan that encounters many
-	 * pages' worth of consecutive dead tuples.
-	 */
-	CHECK_FOR_INTERRUPTS();
-
-	/* read page using selected strategy */
-	scan->rs_cbuf = ReadBufferExtended(scan->rs_base.rs_rd, MAIN_FORKNUM, block,
-									   RBM_NORMAL, scan->rs_strategy);
-	scan->rs_cblock = block;
-
-	if (!(scan->rs_base.rs_flags & SO_ALLOW_PAGEMODE))
-		return;
+	Assert(BufferGetBlockNumber(buffer) == block);
 
-	buffer = scan->rs_cbuf;
 	snapshot = scan->rs_base.rs_snapshot;
 
 	/*
@@ -482,6 +460,39 @@ heapgetpage(TableScanDesc sscan, BlockNumber block)
 	scan->rs_ntuples = ntup;
 }
 
+/*
+ * heapfetchbuf - subroutine for heapgettup()
+ *
+ * This routine reads the specified block of the relation into a buffer and
+ * returns with that pinned buffer saved in the scan descriptor.
+ */
+static inline void
+heapfetchbuf(TableScanDesc sscan, BlockNumber block)
+{
+	HeapScanDesc scan = (HeapScanDesc) sscan;
+
+	Assert(block < scan->rs_nblocks);
+
+	/* release previous scan buffer, if any */
+	if (BufferIsValid(scan->rs_cbuf))
+	{
+		ReleaseBuffer(scan->rs_cbuf);
+		scan->rs_cbuf = InvalidBuffer;
+	}
+
+	/*
+	 * Be sure to check for interrupts at least once per page.  Checks at
+	 * higher code levels won't be able to stop a seqscan that encounters many
+	 * pages' worth of consecutive dead tuples.
+	 */
+	CHECK_FOR_INTERRUPTS();
+
+	/* read page using selected strategy */
+	scan->rs_cbuf = ReadBufferExtended(scan->rs_base.rs_rd, MAIN_FORKNUM, block,
+									   RBM_NORMAL, scan->rs_strategy);
+	scan->rs_cblock = block;
+}
+
 /*
  * heapgettup_initial_block - return the first BlockNumber to scan
  *
@@ -755,7 +766,7 @@ heapgettup(HeapScanDesc scan,
 	 */
 	while (block != InvalidBlockNumber)
 	{
-		heapgetpage((TableScanDesc) scan, block);
+		heapfetchbuf((TableScanDesc) scan, block);
 		LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
 		page = heapgettup_start_page(scan, dir, &linesleft, &lineoff);
 continue_page:
@@ -876,7 +887,8 @@ heapgettup_pagemode(HeapScanDesc scan,
 	 */
 	while (block != InvalidBlockNumber)
 	{
-		heapgetpage((TableScanDesc) scan, block);
+		heapfetchbuf((TableScanDesc) scan, block);
+		heapbuildvis((TableScanDesc) scan);
 		page = BufferGetPage(scan->rs_cbuf);
 		linesleft = scan->rs_ntuples;
 		lineindex = ScanDirectionIsForward(dir) ? 0 : linesleft - 1;
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index d15a02b2be7..75bdd341e7a 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2278,11 +2278,14 @@ heapam_scan_sample_next_block(TableScanDesc scan, SampleScanState *scanstate)
 	if (hscan->rs_nblocks == 0)
 		return false;
 
-	if (tsm->NextSampleBlock)
+	if (BufferIsValid(hscan->rs_cbuf))
 	{
-		blockno = tsm->NextSampleBlock(scanstate, hscan->rs_nblocks);
-		hscan->rs_cblock = blockno;
+		ReleaseBuffer(hscan->rs_cbuf);
+		hscan->rs_cbuf = InvalidBuffer;
 	}
+
+	if (tsm->NextSampleBlock)
+		blockno = tsm->NextSampleBlock(scanstate, hscan->rs_nblocks);
 	else
 	{
 		/* scanning table sequentially */
@@ -2324,20 +2327,35 @@ heapam_scan_sample_next_block(TableScanDesc scan, SampleScanState *scanstate)
 		}
 	}
 
-	if (!BlockNumberIsValid(blockno))
+	hscan->rs_cblock = blockno;
+
+	if (!BlockNumberIsValid(hscan->rs_cblock))
 	{
-		if (BufferIsValid(hscan->rs_cbuf))
-			ReleaseBuffer(hscan->rs_cbuf);
-		hscan->rs_cbuf = InvalidBuffer;
-		hscan->rs_cblock = InvalidBlockNumber;
 		hscan->rs_inited = false;
-
 		return false;
 	}
 
-	heapgetpage(scan, blockno);
-	hscan->rs_inited = true;
+	Assert(hscan->rs_cblock < hscan->rs_nblocks);
+
+	/*
+	 * We may scan multiple pages before finding tuples to yield or finishing
+	 * the scan. Since we want to check for interrupts at least once per page,
+	 * do so here.
+	 */
+	CHECK_FOR_INTERRUPTS();
+
+	/* Read page using selected strategy */
+	hscan->rs_cbuf = ReadBufferExtended(hscan->rs_base.rs_rd, MAIN_FORKNUM,
+										hscan->rs_cblock, RBM_NORMAL, hscan->rs_strategy);
 
+	/*
+	 * If pagemode is allowed, prune the page and build an array of visible
+	 * tuple offsets.
+	 */
+	if (hscan->rs_base.rs_flags & SO_ALLOW_PAGEMODE)
+		heapbuildvis(scan);
+
+	hscan->rs_inited = true;
 	return true;
 }
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 932ec0d6f2b..4a3a017c33a 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -246,7 +246,7 @@ extern TableScanDesc heap_beginscan(Relation relation, Snapshot snapshot,
 									uint32 flags);
 extern void heap_setscanlimits(TableScanDesc sscan, BlockNumber startBlk,
 							   BlockNumber numBlks);
-extern void heapgetpage(TableScanDesc sscan, BlockNumber block);
+extern void heapbuildvis(TableScanDesc sscan);
 extern void heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
 						bool allow_strat, bool allow_sync, bool allow_pagemode);
 extern void heap_endscan(TableScanDesc sscan);
-- 
2.37.2

#2David Rowley
dgrowleyml@gmail.com
In reply to: Melanie Plageman (#1)
Re: Streaming read-ready sequential scan code

On Tue, 30 Jan 2024 at 10:17, Melanie Plageman
<melanieplageman@gmail.com> wrote:

Though logically the performance with 0001 and 0002 should be the same
as master (no new non-inline function calls, no additional looping),
I've done a bit of profiling anyway. I created a large multi-GB table,
read it all into shared buffers (disabling the large sequential scan
bulkread optimization), and did a sequential SELECT count(*) from the
table. From the profiles below, you'll notice that master and the
patch are basically the same. Actual percentages vary from run-to-run.
Execution time is the same.

Can you also run a test on a Seqscan with a filter that filters out
all tuples? There's less overhead in other parts of the executor with
such a query.

David

#3Melanie Plageman
melanieplageman@gmail.com
In reply to: David Rowley (#2)
Re: Streaming read-ready sequential scan code

On Mon, Jan 29, 2024 at 4:24 PM David Rowley <dgrowleyml@gmail.com> wrote:

On Tue, 30 Jan 2024 at 10:17, Melanie Plageman
<melanieplageman@gmail.com> wrote:

Though logically the performance with 0001 and 0002 should be the same
as master (no new non-inline function calls, no additional looping),
I've done a bit of profiling anyway. I created a large multi-GB table,
read it all into shared buffers (disabling the large sequential scan
bulkread optimization), and did a sequential SELECT count(*) from the
table. From the profiles below, you'll notice that master and the
patch are basically the same. Actual percentages vary from run-to-run.
Execution time is the same.

Can you also run a test on a Seqscan with a filter that filters out
all tuples? There's less overhead in other parts of the executor with
such a query.

Yes, of course. Thank you so much for taking a look!

While I was at it, I changed the table schema to be entirely composed
of INT type columns and regenerated the data. Note that, both in this
example and my previous example, I ensured that the table was vacuumed
beforehand (and autovacuum disabled for the table) so there wasn't any
on-access pruning happening (heapgetpage() does that in pagemode).

This is the schema:
CREATE TABLE foo(id INT, a INT, b INT, c INT, d INT, e INT, f INT, g INT)
with (autovacuum_enabled=false);

I added 46000000 rows to the table, making it 2.6 GB. Shared buffers
is double that. Before profiling, I did a SELECT * from the table with
the large sequential scan bulkread optimization disabled. Then I
vacuumed the table. Finally, I turned up parallel_setup_cost high
enough to disable query parallelism.

The query I profiled was:
SELECT * FROM foo WHERE id = 0;
With the data I generated, 0 rows match that condition.

Profiles below. Execution time essentially the same.

patch:
17.08% postgres postgres [.] ExecInterpExpr
11.17% postgres postgres [.] tts_buffer_heap_getsomeattrs
10.64% postgres postgres [.] ExecStoreBufferHeapTuple
9.82% postgres postgres [.] heap_getnextslot
9.13% postgres postgres [.] heapgettup_pagemode
8.98% postgres postgres [.] heapbuildvis
5.40% postgres postgres [.] HeapCheckForSerializableConflictOut
5.16% postgres postgres [.] SeqNext

master:
17.89% postgres postgres [.] ExecInterpExpr
12.28% postgres postgres [.] tts_buffer_heap_getsomeattrs
10.54% postgres postgres [.] ExecStoreBufferHeapTuple
10.11% postgres postgres [.] heapgettup_pagemode
8.52% postgres postgres [.] heapgetpage
8.28% postgres postgres [.] heap_getnextslot
5.00% postgres postgres [.] HeapCheckForSerializableConflictOut
4.71% postgres postgres [.] SeqNext

- Melanie

#4Melanie Plageman
melanieplageman@gmail.com
In reply to: Melanie Plageman (#1)
Re: Streaming read-ready sequential scan code

On Mon, Jan 29, 2024 at 4:17 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

There is an outstanding question about where to allocate the
PgStreamingRead object for sequential scans

I've written three alternative implementations of the actual streaming
read user for sequential scan which handle the question of where to
allocate the streaming read object and how to handle changing scan
direction in different ways.

Option A) https://github.com/melanieplageman/postgres/tree/seqscan_pgsr_initscan_allocation
- Allocates the streaming read object in initscan(). Since we do not
know the scan direction at this time, if the scan ends up not being a
forwards scan, the streaming read object must later be freed -- so
this will sometimes allocate a streaming read object it never uses.
- Only supports ForwardScanDirection and once the scan direction
changes, streaming is never supported again -- even if we return to
ForwardScanDirection
- Must maintain a "fallback" codepath that does not use the streaming read API

Option B) https://github.com/melanieplageman/postgres/tree/seqscan_pgsr_heapgettup_alloc_forward_only
- Allocates the streaming read object in heapgettup[_pagemode]() when
it has not been previously allocated. To do this it has to record and
switch into a different memory context than the per-tuple context. It
only allocates the streaming read object if it is a forwards scan. It
frees the streaming read object if the scan direction is later
changed.
- Only supports ForwardScanDirection and once the scan direction
changes, streaming is never supported again -- even if we return to
ForwardScanDirection
- Must maintain a "fallback" codepath that does not use the streaming read API

Option C) https://github.com/melanieplageman/postgres/tree/seqscan_pgsr_all_dir_stream
- Allocates the streaming read object in heapgettup[_pagemode]() when
it has not been previously allocated. To do this it has to record and
switch into a different memory context than the per-tuple context.
- All scan directions support streaming. To do this, the scan
direction has to be tracked and we must check if the direction has
changed on every heapgettup[_pagemode]() invocation to avoid returning
wrong results.
- No "fallback" codepath as all heap sequential scans will use the
streaming read API

As you can see, each option has pros and cons. I'm interested in what
others think about which we should choose.
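
To make the memory context point in options B and C concrete, the
allocation has to happen somewhere that outlives the per-tuple context,
roughly like this (a sketch, not taken from either branch; scan_cxt is a
placeholder for whatever context lives as long as the scan descriptor,
and the field names here are illustrative):

    /*
     * Sketch only: allocate the streaming read object lazily, outside the
     * per-tuple memory context.  scan_cxt is hypothetical.
     */
    if (scan->rs_pgsr == NULL && ScanDirectionIsForward(dir))
    {
        MemoryContext oldcxt = MemoryContextSwitchTo(scan_cxt);

        scan->rs_pgsr = pg_streaming_read_buffer_alloc(PGSR_FLAG_SEQUENTIAL,
                                                       scan, 0,
                                                       scan->rs_strategy,
                                                       BMR_REL(scan->rs_base.rs_rd),
                                                       MAIN_FORKNUM,
                                                       heap_pgsr_next_single);
        MemoryContextSwitchTo(oldcxt);
    }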

- Melanie

#5Robert Haas
robertmhaas@gmail.com
In reply to: Melanie Plageman (#4)
Re: Streaming read-ready sequential scan code

On Tue, Feb 20, 2024 at 4:35 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:

I've written three alternative implementations of the actual streaming
read user for sequential scan which handle the question of where to
allocate the streaming read object and how to handle changing scan
direction in different ways.

It's weird to me that the prospect of changing the scan direction
causes such complexity. I mean, why doesn't a streaming read object
have a forget_all_my_previous_requests() method or somesuch?

--
Robert Haas
EDB: http://www.enterprisedb.com

#6Melanie Plageman
melanieplageman@gmail.com
In reply to: Robert Haas (#5)
Re: Streaming read-ready sequential scan code

On Tue, Feb 20, 2024 at 6:13 AM Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Feb 20, 2024 at 4:35 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:

I've written three alternative implementations of the actual streaming
read user for sequential scan which handle the question of where to
allocate the streaming read object and how to handle changing scan
direction in different ways.

It's weird to me that the prospect of changing the scan direction
causes such complexity. I mean, why doesn't a streaming read object
have a forget_all_my_previous_requests() method or somesuch?

Basically, that is what pg_streaming_read_free() does. It goes through
and releases the buffers it had pinned and frees any per-buffer data
allocated.

The complexity with the sequential scan streaming read user and scan
direction is just that it has to detect when the scan direction
changes and do the releasing/freeing and reallocation. The scan
direction is passed to heapgettup[_pagemode](), so this is something
that can change on a tuple-to-tuple basis.

It is less that doing this is complicated and more that it is annoying
and distracting to have to check for and handle a very unimportant and
uncommon case in the main path of the common case.
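
For concreteness, the detection itself is only a few lines at the top of
heapfetchbuf() in the patches upthread -- roughly (a sketch, using the
field names from v1-0004):

    /*
     * Streaming reads only support forward scans.  If the direction has
     * changed, drop the streaming read object (which unpins its look-ahead
     * buffers and frees any per-buffer data) and fall back to the
     * non-streaming path for the rest of the scan.
     */
    if (!ScanDirectionIsForward(dir) && scan->pgsr)
    {
        pg_streaming_read_free(scan->pgsr);
        scan->pgsr = NULL;
        scan->rs_prefetch_block = InvalidBlockNumber;
    }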

- Melanie

#7Melanie Plageman
melanieplageman@gmail.com
In reply to: Melanie Plageman (#4)
5 attachment(s)
Re: Streaming read-ready sequential scan code

On Mon, Feb 19, 2024 at 6:05 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

On Mon, Jan 29, 2024 at 4:17 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

There is an outstanding question about where to allocate the
PgStreamingRead object for sequential scans

I've written three alternative implementations of the actual streaming
read user for sequential scan which handle the question of where to
allocate the streaming read object and how to handle changing scan
direction in different ways.

Option A) https://github.com/melanieplageman/postgres/tree/seqscan_pgsr_initscan_allocation
- Allocates the streaming read object in initscan(). Since we do not
know the scan direction at this time, if the scan ends up not being a
forwards scan, the streaming read object must later be freed -- so
this will sometimes allocate a streaming read object it never uses.
- Only supports ForwardScanDirection and once the scan direction
changes, streaming is never supported again -- even if we return to
ForwardScanDirection
- Must maintain a "fallback" codepath that does not use the streaming read API

Attached is a version of this patch which implements a "reset"
function for the streaming read API, which should be cheaper than the
full pg_streaming_read_free() on rescan. This can easily be ported to
work on any of my proposed implementations (A/B/C). I implemented it
on A as an example.
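
To illustrate the intended call pattern (a sketch only, not lifted from
the attached patches): with a reset function available, initscan() can
keep the streaming read object across rescans instead of freeing and
reallocating it, along these lines:

    /* Sketch: allocate the streaming read object once, reset it on rescan. */
    if (scan->rs_pgsr != NULL)
        pg_streaming_read_reset(scan->rs_pgsr); /* drops unconsumed pinned buffers */
    else if (!RelationUsesLocalBuffers(scan->rs_base.rs_rd) &&
             (scan->rs_base.rs_flags & SO_TYPE_SEQSCAN))
        scan->rs_pgsr = pg_streaming_read_buffer_alloc(PGSR_FLAG_SEQUENTIAL,
                                                       scan, 0,
                                                       scan->rs_strategy,
                                                       BMR_REL(scan->rs_base.rs_rd),
                                                       MAIN_FORKNUM,
                                                       heap_pgsr_next_single);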

- Melanie

Attachments:

v2-0002-Replace-blocks-with-buffers-in-heapgettup-control.patchtext/x-patch; charset=US-ASCII; name=v2-0002-Replace-blocks-with-buffers-in-heapgettup-control.patchDownload
From ee93bf41b41aefd6fe20c41be15d2fd5e5d75181 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 29 Jan 2024 11:50:01 -0500
Subject: [PATCH v2 2/5] Replace blocks with buffers in heapgettup control flow

Future commits will introduce the streaming read API and the sequential
scan streaming read API user. Streaming read API users implement a
callback which returns the next block to read. Sequential scans
previously looped through the blocks in the relation, synchronously
reading in a block and then processing it. An InvalidBlockNumber
returned by heapgettup_advance_block() meant that the relation was
exhausted and all blocks had been processed.

The streaming read API may exhaust the blocks in a relation (having read
all of them into buffers) before they have all been processed by the
sequential scan. As such, the sequential scan should continue processing
blocks until heapfetchbuf() returns InvalidBuffer.

Note that this commit does not implement the streaming read API user. It
simply restructures heapgettup() and heapgettup_pagemode() to use
buffers instead of blocks for control flow.

Not all sequential scans will support streaming reads. As such, this
code will remain for compatibility even after sequential scans support
streaming reads.
---
 src/backend/access/heap/heapam.c | 79 ++++++++++++++------------------
 1 file changed, 35 insertions(+), 44 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 449221da6ac..e0fe3d9c326 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -87,6 +87,9 @@ static Bitmapset *HeapDetermineColumnsInfo(Relation relation,
 static bool heap_acquire_tuplock(Relation relation, ItemPointer tid,
 								 LockTupleMode mode, LockWaitPolicy wait_policy,
 								 bool *have_tuple_lock);
+static inline BlockNumber heapgettup_advance_block(HeapScanDesc scan,
+												   BlockNumber block, ScanDirection dir);
+static inline BlockNumber heapgettup_initial_block(HeapScanDesc scan, ScanDirection dir);
 static void compute_new_xmax_infomask(TransactionId xmax, uint16 old_infomask,
 									  uint16 old_infomask2, TransactionId add_to_xmax,
 									  LockTupleMode mode, bool is_update,
@@ -463,14 +466,12 @@ heapbuildvis(TableScanDesc sscan)
 /*
  * heapfetchbuf - subroutine for heapgettup()
  *
- * This routine reads the specified block of the relation into a buffer and
- * returns with that pinned buffer saved in the scan descriptor.
+ * This routine reads the next block of the relation into a buffer and returns
+ * with that pinned buffer saved in the scan descriptor.
  */
 static inline void
-heapfetchbuf(HeapScanDesc scan, BlockNumber block)
+heapfetchbuf(HeapScanDesc scan, ScanDirection dir)
 {
-	Assert(block < scan->rs_nblocks);
-
 	/* release previous scan buffer, if any */
 	if (BufferIsValid(scan->rs_cbuf))
 	{
@@ -485,10 +486,19 @@ heapfetchbuf(HeapScanDesc scan, BlockNumber block)
 	 */
 	CHECK_FOR_INTERRUPTS();
 
-	/* read page using selected strategy */
-	scan->rs_cbuf = ReadBufferExtended(scan->rs_base.rs_rd, MAIN_FORKNUM, block,
-									   RBM_NORMAL, scan->rs_strategy);
-	scan->rs_cblock = block;
+	if (!scan->rs_inited)
+	{
+		scan->rs_cblock = heapgettup_initial_block(scan, dir);
+		Assert(scan->rs_cblock != InvalidBlockNumber || !BufferIsValid(scan->rs_cbuf));
+		scan->rs_inited = true;
+	}
+	else
+		scan->rs_cblock = heapgettup_advance_block(scan, scan->rs_cblock, dir);
+
+	/* read block if valid */
+	if (BlockNumberIsValid(scan->rs_cblock))
+		scan->rs_cbuf = ReadBufferExtended(scan->rs_base.rs_rd, MAIN_FORKNUM,
+										   scan->rs_cblock, RBM_NORMAL, scan->rs_strategy);
 }
 
 /*
@@ -498,7 +508,7 @@ heapfetchbuf(HeapScanDesc scan, BlockNumber block)
  * occur with empty tables and in parallel scans when parallel workers get all
  * of the pages before we can get a chance to get our first page.
  */
-static BlockNumber
+BlockNumber
 heapgettup_initial_block(HeapScanDesc scan, ScanDirection dir)
 {
 	Assert(!scan->rs_inited);
@@ -638,7 +648,7 @@ heapgettup_continue_page(HeapScanDesc scan, ScanDirection dir, int *linesleft,
  * This also adjusts rs_numblocks when a limit has been imposed by
  * heap_setscanlimits().
  */
-static inline BlockNumber
+BlockNumber
 heapgettup_advance_block(HeapScanDesc scan, BlockNumber block, ScanDirection dir)
 {
 	if (ScanDirectionIsForward(dir))
@@ -736,23 +746,13 @@ heapgettup(HeapScanDesc scan,
 		   ScanKey key)
 {
 	HeapTuple	tuple = &(scan->rs_ctup);
-	BlockNumber block;
 	Page		page;
 	OffsetNumber lineoff;
 	int			linesleft;
 
-	if (unlikely(!scan->rs_inited))
-	{
-		block = heapgettup_initial_block(scan, dir);
-		/* ensure rs_cbuf is invalid when we get InvalidBlockNumber */
-		Assert(block != InvalidBlockNumber || !BufferIsValid(scan->rs_cbuf));
-		scan->rs_inited = true;
-	}
-	else
+	if (scan->rs_inited)
 	{
 		/* continue from previously returned page/tuple */
-		block = scan->rs_cblock;
-
 		LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
 		page = heapgettup_continue_page(scan, dir, &linesleft, &lineoff);
 		goto continue_page;
@@ -762,9 +762,12 @@ heapgettup(HeapScanDesc scan,
 	 * advance the scan until we find a qualifying tuple or run out of stuff
 	 * to scan
 	 */
-	while (block != InvalidBlockNumber)
+	while (true)
 	{
-		heapfetchbuf(scan, block);
+		heapfetchbuf(scan, dir);
+		if (!BufferIsValid(scan->rs_cbuf))
+			break;
+		Assert(BufferGetBlockNumber(scan->rs_cbuf) == scan->rs_cblock);
 		LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
 		page = heapgettup_start_page(scan, dir, &linesleft, &lineoff);
 continue_page:
@@ -786,7 +789,7 @@ continue_page:
 
 			tuple->t_data = (HeapTupleHeader) PageGetItem(page, lpp);
 			tuple->t_len = ItemIdGetLength(lpp);
-			ItemPointerSet(&(tuple->t_self), block, lineoff);
+			ItemPointerSet(&(tuple->t_self), scan->rs_cblock, lineoff);
 
 			visible = HeapTupleSatisfiesVisibility(tuple,
 												   scan->rs_base.rs_snapshot,
@@ -816,9 +819,6 @@ continue_page:
 		 * it's time to move to the next.
 		 */
 		LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
-
-		/* get the BlockNumber to scan next */
-		block = heapgettup_advance_block(scan, block, dir);
 	}
 
 	/* end of scan */
@@ -851,22 +851,13 @@ heapgettup_pagemode(HeapScanDesc scan,
 					ScanKey key)
 {
 	HeapTuple	tuple = &(scan->rs_ctup);
-	BlockNumber block;
 	Page		page;
 	int			lineindex;
 	int			linesleft;
 
-	if (unlikely(!scan->rs_inited))
-	{
-		block = heapgettup_initial_block(scan, dir);
-		/* ensure rs_cbuf is invalid when we get InvalidBlockNumber */
-		Assert(block != InvalidBlockNumber || !BufferIsValid(scan->rs_cbuf));
-		scan->rs_inited = true;
-	}
-	else
+	if (scan->rs_inited)
 	{
 		/* continue from previously returned page/tuple */
-		block = scan->rs_cblock;	/* current page */
 		page = BufferGetPage(scan->rs_cbuf);
 
 		lineindex = scan->rs_cindex + dir;
@@ -883,9 +874,12 @@ heapgettup_pagemode(HeapScanDesc scan,
 	 * advance the scan until we find a qualifying tuple or run out of stuff
 	 * to scan
 	 */
-	while (block != InvalidBlockNumber)
+	while (true)
 	{
-		heapfetchbuf(scan, block);
+		heapfetchbuf(scan, dir);
+		if (!BufferIsValid(scan->rs_cbuf))
+			break;
+		Assert(BufferGetBlockNumber(scan->rs_cbuf) == scan->rs_cblock);
 		heapbuildvis((TableScanDesc) scan);
 		page = BufferGetPage(scan->rs_cbuf);
 		linesleft = scan->rs_ntuples;
@@ -905,7 +899,7 @@ continue_page:
 
 			tuple->t_data = (HeapTupleHeader) PageGetItem(page, lpp);
 			tuple->t_len = ItemIdGetLength(lpp);
-			ItemPointerSet(&(tuple->t_self), block, lineoff);
+			ItemPointerSet(&(tuple->t_self), scan->rs_cblock, lineoff);
 
 			/* skip any tuples that don't match the scan key */
 			if (key != NULL &&
@@ -916,9 +910,6 @@ continue_page:
 			scan->rs_cindex = lineindex;
 			return;
 		}
-
-		/* get the BlockNumber to scan next */
-		block = heapgettup_advance_block(scan, block, dir);
 	}
 
 	/* end of scan */
-- 
2.37.2

v2-0004-Add-pg_streaming_read_reset.patchtext/x-patch; charset=US-ASCII; name=v2-0004-Add-pg_streaming_read_reset.patchDownload
From 38cc779724b92514a1b63ba27057fe438b06df29 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 26 Feb 2024 15:33:39 -0500
Subject: [PATCH v2 4/5] Add pg_streaming_read_reset

For rescan, we want to reuse the streaming read object and simply
release the buffers that were pinned by the streaming read
infrastructure.
---
 src/backend/storage/aio/streaming_read.c | 18 ++++++++++++++++++
 src/include/storage/streaming_read.h     |  1 +
 2 files changed, 19 insertions(+)

diff --git a/src/backend/storage/aio/streaming_read.c b/src/backend/storage/aio/streaming_read.c
index 19605090fea..4479f101abe 100644
--- a/src/backend/storage/aio/streaming_read.c
+++ b/src/backend/storage/aio/streaming_read.c
@@ -433,3 +433,21 @@ pg_streaming_read_free(PgStreamingRead *pgsr)
 		pfree(pgsr->per_buffer_data);
 	pfree(pgsr);
 }
+
+
+/*
+ * Reset a streaming read object by releasing all of the buffers. Note that
+ * max_ios is not recalculated, so any changes to maintenance_io_concurrency and
+ * effective_io_concurrency will have no effect.
+ */
+void
+pg_streaming_read_reset(PgStreamingRead *pgsr)
+{
+	Buffer		buffer;
+
+	/* Stop looking ahead, and unpin anything that wasn't consumed. */
+	pgsr->finished = true;
+	while ((buffer = pg_streaming_read_buffer_get_next(pgsr, NULL)) != InvalidBuffer)
+		ReleaseBuffer(buffer);
+	pgsr->finished = false;
+}
diff --git a/src/include/storage/streaming_read.h b/src/include/storage/streaming_read.h
index 40c3408c541..1acfab20ff5 100644
--- a/src/include/storage/streaming_read.h
+++ b/src/include/storage/streaming_read.h
@@ -41,5 +41,6 @@ extern PgStreamingRead *pg_streaming_read_buffer_alloc(int flags,
 extern void pg_streaming_read_prefetch(PgStreamingRead *pgsr);
 extern Buffer pg_streaming_read_buffer_get_next(PgStreamingRead *pgsr, void **per_buffer_private);
 extern void pg_streaming_read_free(PgStreamingRead *pgsr);
+extern void pg_streaming_read_reset(PgStreamingRead *pgsr);
 
 #endif
-- 
2.37.2

v2-0001-Split-heapgetpage-into-two-parts.patchtext/x-patch; charset=US-ASCII; name=v2-0001-Split-heapgetpage-into-two-parts.patchDownload
From 77de7d5a2560ec4c9ddad4fd7bd1e0783cf81729 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 27 Jan 2024 18:39:37 -0500
Subject: [PATCH v2 1/5] Split heapgetpage into two parts

heapgetpage(), a per-block utility function used in heap scans, read a
passed-in block into a buffer and then, if page-at-a-time processing was
enabled, pruned the page and built an array of its visible tuples. This
was used for sequential and sample scans.

Future commits will add support for streaming reads. The streaming read
API will read in the blocks specified by a callback, but any significant
per-page processing should be done synchronously on the buffer yielded
by the streaming read API. To support this, separate the logic in
heapgetpage() to read a block into a buffer from that which prunes the
page and builds an array of the visible tuples. The former is now
heapfetchbuf() and the latter is now heapbuildvis().

Future commits will push the logic for selecting the next block into
heapfetchbuf() in cases when streaming reads are not supported (such as
backwards sequential scans). Because this logic differs for sample scans
and sequential scans, inline the code to read the block into a buffer
for sample scans.

This has the added benefit of allowing for a bit of refactoring in
heapam_scan_sample_next_block(), including unpinning the previous buffer
before invoking the callback to select the next block.
---
 src/backend/access/heap/heapam.c         | 74 ++++++++++++++----------
 src/backend/access/heap/heapam_handler.c | 40 +++++++++----
 src/include/access/heapam.h              |  2 +-
 3 files changed, 72 insertions(+), 44 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 707460a5364..449221da6ac 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -367,17 +367,18 @@ heap_setscanlimits(TableScanDesc sscan, BlockNumber startBlk, BlockNumber numBlk
 }
 
 /*
- * heapgetpage - subroutine for heapgettup()
+ * heapbuildvis - Utility function for heap scans.
  *
- * This routine reads and pins the specified page of the relation.
- * In page-at-a-time mode it performs additional work, namely determining
- * which tuples on the page are visible.
+ * Given a page residing in a buffer saved in the scan descriptor, prune the
+ * page and determine which of its tuples are visible, saving their offsets
+ * in an array in the scan descriptor.
  */
 void
-heapgetpage(TableScanDesc sscan, BlockNumber block)
+heapbuildvis(TableScanDesc sscan)
 {
 	HeapScanDesc scan = (HeapScanDesc) sscan;
-	Buffer		buffer;
+	Buffer		buffer = scan->rs_cbuf;
+	BlockNumber block = scan->rs_cblock;
 	Snapshot	snapshot;
 	Page		page;
 	int			lines;
@@ -385,31 +386,8 @@ heapgetpage(TableScanDesc sscan, BlockNumber block)
 	OffsetNumber lineoff;
 	bool		all_visible;
 
-	Assert(block < scan->rs_nblocks);
+	Assert(BufferGetBlockNumber(buffer) == block);
 
-	/* release previous scan buffer, if any */
-	if (BufferIsValid(scan->rs_cbuf))
-	{
-		ReleaseBuffer(scan->rs_cbuf);
-		scan->rs_cbuf = InvalidBuffer;
-	}
-
-	/*
-	 * Be sure to check for interrupts at least once per page.  Checks at
-	 * higher code levels won't be able to stop a seqscan that encounters many
-	 * pages' worth of consecutive dead tuples.
-	 */
-	CHECK_FOR_INTERRUPTS();
-
-	/* read page using selected strategy */
-	scan->rs_cbuf = ReadBufferExtended(scan->rs_base.rs_rd, MAIN_FORKNUM, block,
-									   RBM_NORMAL, scan->rs_strategy);
-	scan->rs_cblock = block;
-
-	if (!(scan->rs_base.rs_flags & SO_ALLOW_PAGEMODE))
-		return;
-
-	buffer = scan->rs_cbuf;
 	snapshot = scan->rs_base.rs_snapshot;
 
 	/*
@@ -482,6 +460,37 @@ heapgetpage(TableScanDesc sscan, BlockNumber block)
 	scan->rs_ntuples = ntup;
 }
 
+/*
+ * heapfetchbuf - subroutine for heapgettup()
+ *
+ * This routine reads the specified block of the relation into a buffer and
+ * returns with that pinned buffer saved in the scan descriptor.
+ */
+static inline void
+heapfetchbuf(HeapScanDesc scan, BlockNumber block)
+{
+	Assert(block < scan->rs_nblocks);
+
+	/* release previous scan buffer, if any */
+	if (BufferIsValid(scan->rs_cbuf))
+	{
+		ReleaseBuffer(scan->rs_cbuf);
+		scan->rs_cbuf = InvalidBuffer;
+	}
+
+	/*
+	 * Be sure to check for interrupts at least once per page.  Checks at
+	 * higher code levels won't be able to stop a seqscan that encounters many
+	 * pages' worth of consecutive dead tuples.
+	 */
+	CHECK_FOR_INTERRUPTS();
+
+	/* read page using selected strategy */
+	scan->rs_cbuf = ReadBufferExtended(scan->rs_base.rs_rd, MAIN_FORKNUM, block,
+									   RBM_NORMAL, scan->rs_strategy);
+	scan->rs_cblock = block;
+}
+
 /*
  * heapgettup_initial_block - return the first BlockNumber to scan
  *
@@ -755,7 +764,7 @@ heapgettup(HeapScanDesc scan,
 	 */
 	while (block != InvalidBlockNumber)
 	{
-		heapgetpage((TableScanDesc) scan, block);
+		heapfetchbuf(scan, block);
 		LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
 		page = heapgettup_start_page(scan, dir, &linesleft, &lineoff);
 continue_page:
@@ -876,7 +885,8 @@ heapgettup_pagemode(HeapScanDesc scan,
 	 */
 	while (block != InvalidBlockNumber)
 	{
-		heapgetpage((TableScanDesc) scan, block);
+		heapfetchbuf(scan, block);
+		heapbuildvis((TableScanDesc) scan);
 		page = BufferGetPage(scan->rs_cbuf);
 		linesleft = scan->rs_ntuples;
 		lineindex = ScanDirectionIsForward(dir) ? 0 : linesleft - 1;
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index d15a02b2be7..75bdd341e7a 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2278,11 +2278,14 @@ heapam_scan_sample_next_block(TableScanDesc scan, SampleScanState *scanstate)
 	if (hscan->rs_nblocks == 0)
 		return false;
 
-	if (tsm->NextSampleBlock)
+	if (BufferIsValid(hscan->rs_cbuf))
 	{
-		blockno = tsm->NextSampleBlock(scanstate, hscan->rs_nblocks);
-		hscan->rs_cblock = blockno;
+		ReleaseBuffer(hscan->rs_cbuf);
+		hscan->rs_cbuf = InvalidBuffer;
 	}
+
+	if (tsm->NextSampleBlock)
+		blockno = tsm->NextSampleBlock(scanstate, hscan->rs_nblocks);
 	else
 	{
 		/* scanning table sequentially */
@@ -2324,20 +2327,35 @@ heapam_scan_sample_next_block(TableScanDesc scan, SampleScanState *scanstate)
 		}
 	}
 
-	if (!BlockNumberIsValid(blockno))
+	hscan->rs_cblock = blockno;
+
+	if (!BlockNumberIsValid(hscan->rs_cblock))
 	{
-		if (BufferIsValid(hscan->rs_cbuf))
-			ReleaseBuffer(hscan->rs_cbuf);
-		hscan->rs_cbuf = InvalidBuffer;
-		hscan->rs_cblock = InvalidBlockNumber;
 		hscan->rs_inited = false;
-
 		return false;
 	}
 
-	heapgetpage(scan, blockno);
-	hscan->rs_inited = true;
+	Assert(hscan->rs_cblock < hscan->rs_nblocks);
+
+	/*
+	 * We may scan multiple pages before finding tuples to yield or finishing
+	 * the scan. Since we want to check for interrupts at least once per page,
+	 * do so here.
+	 */
+	CHECK_FOR_INTERRUPTS();
+
+	/* Read page using selected strategy */
+	hscan->rs_cbuf = ReadBufferExtended(hscan->rs_base.rs_rd, MAIN_FORKNUM,
+										hscan->rs_cblock, RBM_NORMAL, hscan->rs_strategy);
 
+	/*
+	 * If pagemode is allowed, prune the page and build an array of visible
+	 * tuple offsets.
+	 */
+	if (hscan->rs_base.rs_flags & SO_ALLOW_PAGEMODE)
+		heapbuildvis(scan);
+
+	hscan->rs_inited = true;
 	return true;
 }
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 932ec0d6f2b..4a3a017c33a 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -246,7 +246,7 @@ extern TableScanDesc heap_beginscan(Relation relation, Snapshot snapshot,
 									uint32 flags);
 extern void heap_setscanlimits(TableScanDesc sscan, BlockNumber startBlk,
 							   BlockNumber numBlks);
-extern void heapgetpage(TableScanDesc sscan, BlockNumber block);
+extern void heapbuildvis(TableScanDesc sscan);
 extern void heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
 						bool allow_strat, bool allow_sync, bool allow_pagemode);
 extern void heap_endscan(TableScanDesc sscan);
-- 
2.37.2

v2-0005-Sequential-scans-support-streaming-read.patchtext/x-patch; charset=US-ASCII; name=v2-0005-Sequential-scans-support-streaming-read.patchDownload
From 56c2033318efd0062d4d0690f12a59654112cb37 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 26 Feb 2024 15:41:32 -0500
Subject: [PATCH v2 5/5] Sequential scans support streaming read

Implementing streaming read support for heap sequential scans includes
three parts:

Allocate the streaming read object in initscan(). On rescan, free the
existing streaming read object.

Implement a callback returning the next block to prefetch to the
streaming read API.

Invoke the streaming read API when a new page is needed and streaming
reads are enabled. If the scan direction is not forwards, the streaming
read object must be freed so that the fallback method can be used.
Streaming will not be re-enabled for a scan even if it changes back to
forwards.
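
As a rough sketch of how these three parts fit together (not part of the
patch): seqscan_streaming_sketch() below is invented for illustration.
It uses the API from the attached Streaming Read API patch and the real
heap_pgsr_next_single() callback added to heapam.c in this patch, but it
releases each buffer immediately, whereas the real heapfetchbuf() keeps
the pin in rs_cbuf.

	/* Sketch only; conceptually lives in heapam.c, next to the callback. */
	static void
	seqscan_streaming_sketch(HeapScanDesc scan)
	{
		PgStreamingRead *pgsr;
		Buffer		buf;

		/* Part 1: allocate one streaming read object for the scan. */
		pgsr = pg_streaming_read_buffer_alloc(PGSR_FLAG_SEQUENTIAL,
											  scan,	/* passed to callback */
											  0,	/* no per-buffer data */
											  scan->rs_strategy,
											  BMR_REL(scan->rs_base.rs_rd),
											  MAIN_FORKNUM,
											  heap_pgsr_next_single);

		/*
		 * Part 3: each call returns the next pinned buffer in scan order,
		 * while look-ahead and prefetching happen inside the streaming read
		 * object, driven by the part-2 callback.
		 */
		while ((buf = pg_streaming_read_buffer_get_next(pgsr, NULL)) != InvalidBuffer)
		{
			/* ... process the visible tuples on BufferGetPage(buf) ... */
			ReleaseBuffer(buf);
		}

		pg_streaming_read_free(pgsr);
	}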

ci-os-only:
---
 src/backend/access/heap/heapam.c | 96 +++++++++++++++++++++++++++++---
 src/include/access/heapam.h      | 10 ++++
 2 files changed, 97 insertions(+), 9 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index e0fe3d9c326..c88b4221864 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -65,6 +65,7 @@
 #include "storage/smgr.h"
 #include "storage/spin.h"
 #include "storage/standby.h"
+#include "storage/streaming_read.h"
 #include "utils/datum.h"
 #include "utils/inval.h"
 #include "utils/lsyscache.h"
@@ -228,6 +229,29 @@ static const int MultiXactStatusLock[MaxMultiXactStatus + 1] =
  * ----------------------------------------------------------------
  */
 
+static BlockNumber
+heap_pgsr_next_single(PgStreamingRead *pgsr, void *pgsr_private,
+					  void *per_buffer_data)
+{
+	HeapScanDesc scan = (HeapScanDesc) pgsr_private;
+
+	/*
+	 * Hard-code ScanDirection to ForwardScanDirection since only forward
+	 * scans support streaming reads.
+	 */
+	if (!scan->rs_inited)
+	{
+		scan->rs_prefetch_block = heapgettup_initial_block(scan, ForwardScanDirection);
+		scan->rs_inited = true;
+	}
+	else
+		scan->rs_prefetch_block = heapgettup_advance_block(scan,
+														   scan->rs_prefetch_block,
+														   ForwardScanDirection);
+
+	return scan->rs_prefetch_block;
+}
+
 /* ----------------
  *		initscan - scan code common to heap_beginscan and heap_rescan
  * ----------------
@@ -329,6 +353,7 @@ initscan(HeapScanDesc scan, ScanKey key, bool keep_startblock)
 	ItemPointerSetInvalid(&scan->rs_ctup.t_self);
 	scan->rs_cbuf = InvalidBuffer;
 	scan->rs_cblock = InvalidBlockNumber;
+	scan->rs_prefetch_block = InvalidBlockNumber;
 
 	/* page-at-a-time fields are always invalid when not rs_inited */
 
@@ -486,19 +511,44 @@ heapfetchbuf(HeapScanDesc scan, ScanDirection dir)
 	 */
 	CHECK_FOR_INTERRUPTS();
 
-	if (!scan->rs_inited)
+	/*
+	 * Only forward scans use streaming reads. The scan direction is not yet
+	 * determined when the streaming read object is allocated, so all
+	 * sequential scans start out with a streaming read object. If a scan
+	 * then requests blocks in any direction other than forward, the
+	 * streaming read object is freed and the scan will not use streaming
+	 * reads again, even if the scan direction later changes back to
+	 * forward.
+	 */
+	if (!ScanDirectionIsForward(dir) && scan->rs_pgsr)
 	{
-		scan->rs_cblock = heapgettup_initial_block(scan, dir);
-		Assert(scan->rs_cblock != InvalidBlockNumber || !BufferIsValid(scan->rs_cbuf));
-		scan->rs_inited = true;
+		pg_streaming_read_free(scan->rs_pgsr);
+		scan->rs_pgsr = NULL;
+		scan->rs_prefetch_block = InvalidBlockNumber;
+	}
+
+	if (scan->rs_pgsr)
+	{
+		scan->rs_cbuf = pg_streaming_read_buffer_get_next(scan->rs_pgsr, NULL);
+		if (BufferIsValid(scan->rs_cbuf))
+			scan->rs_cblock = BufferGetBlockNumber(scan->rs_cbuf);
 	}
 	else
-		scan->rs_cblock = heapgettup_advance_block(scan, scan->rs_cblock, dir);
+	{
+		if (!scan->rs_inited)
+		{
+			scan->rs_cblock = heapgettup_initial_block(scan, dir);
+			Assert(scan->rs_cblock != InvalidBlockNumber || !BufferIsValid(scan->rs_cbuf));
+			scan->rs_inited = true;
+		}
+		else
+			scan->rs_cblock = heapgettup_advance_block(scan, scan->rs_cblock, dir);
 
-	/* read block if valid */
-	if (BlockNumberIsValid(scan->rs_cblock))
-		scan->rs_cbuf = ReadBufferExtended(scan->rs_base.rs_rd, MAIN_FORKNUM,
-										   scan->rs_cblock, RBM_NORMAL, scan->rs_strategy);
+		/* read block if valid */
+		if (BlockNumberIsValid(scan->rs_cblock))
+			scan->rs_cbuf = ReadBufferExtended(scan->rs_base.rs_rd, MAIN_FORKNUM,
+											   scan->rs_cblock, RBM_NORMAL, scan->rs_strategy);
+	}
 }
 
 /*
@@ -999,6 +1049,8 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 	else
 		scan->rs_parallelworkerdata = NULL;
 
+	scan->rs_pgsr = NULL;
+
 	/*
 	 * we do this here instead of in initscan() because heap_rescan also calls
 	 * initscan() and we don't want to allocate memory again
@@ -1010,6 +1062,21 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 
 	initscan(scan, key, false);
 
+	/*
+	 * We do not know the scan direction yet. If the scan does not end up
+	 * being a forward scan, the streaming read object will be freed.
+	 */
+	if (scan->rs_base.rs_flags & SO_TYPE_SEQSCAN)
+	{
+		scan->rs_pgsr = pg_streaming_read_buffer_alloc(PGSR_FLAG_SEQUENTIAL,
+													   scan,
+													   0,
+													   scan->rs_strategy,
+													   BMR_REL(scan->rs_base.rs_rd),
+													   MAIN_FORKNUM,
+													   heap_pgsr_next_single);
+	}
+
 	return (TableScanDesc) scan;
 }
 
@@ -1048,6 +1115,14 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
 	 * reinitialize scan descriptor
 	 */
 	initscan(scan, key, true);
+
+	/*
+	 * The streaming read object is reset on rescan. This must be done after
+	 * initscan(), as some state referred to by pg_streaming_read_reset() is
+	 * reset in initscan().
+	 */
+	if (scan->rs_pgsr)
+		pg_streaming_read_reset(scan->rs_pgsr);
 }
 
 void
@@ -1063,6 +1138,9 @@ heap_endscan(TableScanDesc sscan)
 	if (BufferIsValid(scan->rs_cbuf))
 		ReleaseBuffer(scan->rs_cbuf);
 
+	if (scan->rs_pgsr)
+		pg_streaming_read_free(scan->rs_pgsr);
+
 	/*
 	 * decrement relation reference count and free scan descriptor storage
 	 */
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 4a3a017c33a..fe4ab40e18d 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -59,6 +59,7 @@ typedef struct HeapScanDescData
 	bool		rs_inited;		/* false = scan not init'd yet */
 	OffsetNumber rs_coffset;	/* current offset # in non-page-at-a-time mode */
 	BlockNumber rs_cblock;		/* current block # in scan, if any */
+	BlockNumber rs_prefetch_block;	/* block being prefetched */
 	Buffer		rs_cbuf;		/* current buffer in scan, if any */
 	/* NB: if rs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
 
@@ -72,6 +73,15 @@ typedef struct HeapScanDescData
 	 */
 	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
 
+	/*
+	 * Streaming read object allocated for each sequential scan and
+	 * reallocated on rescan. If a sequential scan ends up requesting tuples
+	 * in any direction except ForwardScanDirection, the streaming read object
+	 * is freed and the scan falls back to a standard, synchronous block reading
+	 * method.
+	 */
+	struct PgStreamingRead *rs_pgsr;
+
 	/* these fields only used in page-at-a-time mode and for bitmap scans */
 	int			rs_cindex;		/* current tuple's index in vistuples */
 	int			rs_ntuples;		/* number of visible tuples on page */
-- 
2.37.2

Attachment: v2-0003-Streaming-Read-API.patch (text/x-patch)
From cf57c54e52c1a9c879ffbe9e7532907e7135266d Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Sat, 22 Jul 2023 17:31:54 +1200
Subject: [PATCH v2 3/5] Streaming Read API

---
 contrib/pg_prewarm/pg_prewarm.c          |  40 +-
 src/backend/access/transam/xlogutils.c   |   2 +-
 src/backend/postmaster/bgwriter.c        |   8 +-
 src/backend/postmaster/checkpointer.c    |  15 +-
 src/backend/storage/Makefile             |   2 +-
 src/backend/storage/aio/Makefile         |  14 +
 src/backend/storage/aio/meson.build      |   5 +
 src/backend/storage/aio/streaming_read.c | 435 ++++++++++++++++++
 src/backend/storage/buffer/bufmgr.c      | 560 +++++++++++++++--------
 src/backend/storage/buffer/localbuf.c    |  14 +-
 src/backend/storage/meson.build          |   1 +
 src/backend/storage/smgr/smgr.c          |  49 +-
 src/include/storage/bufmgr.h             |  22 +
 src/include/storage/smgr.h               |   4 +-
 src/include/storage/streaming_read.h     |  45 ++
 src/include/utils/rel.h                  |   6 -
 src/tools/pgindent/typedefs.list         |   2 +
 17 files changed, 986 insertions(+), 238 deletions(-)
 create mode 100644 src/backend/storage/aio/Makefile
 create mode 100644 src/backend/storage/aio/meson.build
 create mode 100644 src/backend/storage/aio/streaming_read.c
 create mode 100644 src/include/storage/streaming_read.h

diff --git a/contrib/pg_prewarm/pg_prewarm.c b/contrib/pg_prewarm/pg_prewarm.c
index 8541e4d6e46..9617bf130bd 100644
--- a/contrib/pg_prewarm/pg_prewarm.c
+++ b/contrib/pg_prewarm/pg_prewarm.c
@@ -20,6 +20,7 @@
 #include "miscadmin.h"
 #include "storage/bufmgr.h"
 #include "storage/smgr.h"
+#include "storage/streaming_read.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
@@ -38,6 +39,25 @@ typedef enum
 
 static PGIOAlignedBlock blockbuffer;
 
+struct pg_prewarm_streaming_read_private
+{
+	BlockNumber blocknum;
+	int64		last_block;
+};
+
+static BlockNumber
+pg_prewarm_streaming_read_next(PgStreamingRead *pgsr,
+							   void *pgsr_private,
+							   void *per_buffer_data)
+{
+	struct pg_prewarm_streaming_read_private *p = pgsr_private;
+
+	if (p->blocknum <= p->last_block)
+		return p->blocknum++;
+
+	return InvalidBlockNumber;
+}
+
 /*
  * pg_prewarm(regclass, mode text, fork text,
  *			  first_block int8, last_block int8)
@@ -183,18 +203,36 @@ pg_prewarm(PG_FUNCTION_ARGS)
 	}
 	else if (ptype == PREWARM_BUFFER)
 	{
+		struct pg_prewarm_streaming_read_private p;
+		PgStreamingRead *pgsr;
+
 		/*
 		 * In buffer mode, we actually pull the data into shared_buffers.
 		 */
+
+		/* Set up the private state for our streaming buffer read callback. */
+		p.blocknum = first_block;
+		p.last_block = last_block;
+
+		pgsr = pg_streaming_read_buffer_alloc(PGSR_FLAG_DEFAULT,
+											  &p,
+											  0,
+											  NULL,
+											  BMR_REL(rel),
+											  forkNumber,
+											  pg_prewarm_streaming_read_next);
+
 		for (block = first_block; block <= last_block; ++block)
 		{
 			Buffer		buf;
 
 			CHECK_FOR_INTERRUPTS();
-			buf = ReadBufferExtended(rel, forkNumber, block, RBM_NORMAL, NULL);
+			buf = pg_streaming_read_buffer_get_next(pgsr, NULL);
 			ReleaseBuffer(buf);
 			++blocks_done;
 		}
+		Assert(pg_streaming_read_buffer_get_next(pgsr, NULL) == InvalidBuffer);
+		pg_streaming_read_free(pgsr);
 	}
 
 	/* Close relation, release lock. */
diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index aa8667abd10..8775b5789be 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -657,7 +657,7 @@ XLogDropDatabase(Oid dbid)
 	 * This is unnecessarily heavy-handed, as it will close SMgrRelation
 	 * objects for other databases as well. DROP DATABASE occurs seldom enough
 	 * that it's not worth introducing a variant of smgrclose for just this
-	 * purpose. XXX: Or should we rather leave the smgr entries dangling?
+	 * purpose.
 	 */
 	smgrcloseall();
 
diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
index d7d6cc0cd7b..13e5376619e 100644
--- a/src/backend/postmaster/bgwriter.c
+++ b/src/backend/postmaster/bgwriter.c
@@ -246,10 +246,12 @@ BackgroundWriterMain(void)
 		if (FirstCallSinceLastCheckpoint())
 		{
 			/*
-			 * After any checkpoint, close all smgr files.  This is so we
-			 * won't hang onto smgr references to deleted files indefinitely.
+			 * After any checkpoint, free all smgr objects.  Otherwise we
+			 * would never do so for dropped relations, as the bgwriter does
+			 * not process shared invalidation messages or call
+			 * AtEOXact_SMgr().
 			 */
-			smgrcloseall();
+			smgrdestroyall();
 		}
 
 		/*
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 5e949fc885b..5d843b61426 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -469,10 +469,12 @@ CheckpointerMain(void)
 				ckpt_performed = CreateRestartPoint(flags);
 
 			/*
-			 * After any checkpoint, close all smgr files.  This is so we
-			 * won't hang onto smgr references to deleted files indefinitely.
+			 * After any checkpoint, free all smgr objects.  Otherwise we
+			 * would never do so for dropped relations, as the checkpointer
+			 * does not process shared invalidation messages or call
+			 * AtEOXact_SMgr().
 			 */
-			smgrcloseall();
+			smgrdestroyall();
 
 			/*
 			 * Indicate checkpoint completion to any waiting backends.
@@ -958,11 +960,8 @@ RequestCheckpoint(int flags)
 		 */
 		CreateCheckPoint(flags | CHECKPOINT_IMMEDIATE);
 
-		/*
-		 * After any checkpoint, close all smgr files.  This is so we won't
-		 * hang onto smgr references to deleted files indefinitely.
-		 */
-		smgrcloseall();
+		/* Free all smgr objects, as CheckpointerMain() normally would. */
+		smgrdestroyall();
 
 		return;
 	}
diff --git a/src/backend/storage/Makefile b/src/backend/storage/Makefile
index 8376cdfca20..eec03f6f2b4 100644
--- a/src/backend/storage/Makefile
+++ b/src/backend/storage/Makefile
@@ -8,6 +8,6 @@ subdir = src/backend/storage
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
-SUBDIRS     = buffer file freespace ipc large_object lmgr page smgr sync
+SUBDIRS     = aio buffer file freespace ipc large_object lmgr page smgr sync
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/storage/aio/Makefile b/src/backend/storage/aio/Makefile
new file mode 100644
index 00000000000..bcab44c802f
--- /dev/null
+++ b/src/backend/storage/aio/Makefile
@@ -0,0 +1,14 @@
+#
+# Makefile for storage/aio
+#
+# src/backend/storage/aio/Makefile
+#
+
+subdir = src/backend/storage/aio
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = \
+	streaming_read.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/storage/aio/meson.build b/src/backend/storage/aio/meson.build
new file mode 100644
index 00000000000..39aef2a84a2
--- /dev/null
+++ b/src/backend/storage/aio/meson.build
@@ -0,0 +1,5 @@
+# Copyright (c) 2024, PostgreSQL Global Development Group
+
+backend_sources += files(
+  'streaming_read.c',
+)
diff --git a/src/backend/storage/aio/streaming_read.c b/src/backend/storage/aio/streaming_read.c
new file mode 100644
index 00000000000..19605090fea
--- /dev/null
+++ b/src/backend/storage/aio/streaming_read.c
@@ -0,0 +1,435 @@
+#include "postgres.h"
+
+#include "storage/streaming_read.h"
+#include "utils/rel.h"
+
+/*
+ * Element type for PgStreamingRead's circular array of block ranges.
+ *
+ * For hits, need_complete is false and there is just one block per
+ * range, already pinned and ready for use.
+ *
+ * For misses, need_complete is true and buffers[] holds a range of
+ * blocks that are contiguous in storage (though the buffers may not be
+ * contiguous in memory), so we can complete them with a single call to
+ * CompleteReadBuffers().
+ */
+typedef struct PgStreamingReadRange
+{
+	bool		advice_issued;
+	bool		need_complete;
+	BlockNumber blocknum;
+	int			nblocks;
+	int			per_buffer_data_index[MAX_BUFFERS_PER_TRANSFER];
+	Buffer		buffers[MAX_BUFFERS_PER_TRANSFER];
+} PgStreamingReadRange;
+
+struct PgStreamingRead
+{
+	int			max_ios;
+	int			ios_in_progress;
+	int			ios_in_progress_trigger;
+	int			max_pinned_buffers;
+	int			pinned_buffers;
+	int			pinned_buffers_trigger;
+	int			next_tail_buffer;
+	bool		finished;
+	void	   *pgsr_private;
+	PgStreamingReadBufferCB callback;
+	BufferAccessStrategy strategy;
+	BufferManagerRelation bmr;
+	ForkNumber	forknum;
+
+	bool		advice_enabled;
+
+	/* Next expected block, for detecting sequential access. */
+	BlockNumber seq_blocknum;
+
+	/* Space for optional per-buffer private data. */
+	size_t		per_buffer_data_size;
+	void	   *per_buffer_data;
+	int			per_buffer_data_next;
+
+	/* Circular buffer of ranges. */
+	int			size;
+	int			head;
+	int			tail;
+	PgStreamingReadRange ranges[FLEXIBLE_ARRAY_MEMBER];
+};
+
+static PgStreamingRead *
+pg_streaming_read_buffer_alloc_internal(int flags,
+										void *pgsr_private,
+										size_t per_buffer_data_size,
+										BufferAccessStrategy strategy)
+{
+	PgStreamingRead *pgsr;
+	int			size;
+	int			max_ios;
+	uint32		max_pinned_buffers;
+
+
+	/*
+	 * Decide how many assumed I/Os we will allow to run concurrently.  That
+	 * is, how much advice to issue to the kernel that we will soon read.  This
+	 * number also affects how far we look ahead for opportunities to start
+	 * more I/Os.
+	 */
+	if (flags & PGSR_FLAG_MAINTENANCE)
+		max_ios = maintenance_io_concurrency;
+	else
+		max_ios = effective_io_concurrency;
+
+	/*
+	 * The desired level of I/O concurrency controls how far ahead we are
+	 * willing to look.  We also clamp it to at least
+	 * MAX_BUFFERS_PER_TRANSFER so that we have a chance to build up a
+	 * full-sized read, even when max_ios is zero.
+	 */
+	max_pinned_buffers = Max(max_ios * 4, MAX_BUFFERS_PER_TRANSFER);
+
+	/*
+	 * With the *_io_concurrency GUCs we might have 0.  We want to allow at
+	 * least one, to keep our gating logic simple.
+	 */
+	max_ios = Max(max_ios, 1);
+
+	/*
+	 * Don't allow this backend to pin too many buffers.  For now we'll apply
+	 * the limit for the shared buffer pool and the local buffer pool, without
+	 * worrying which it is.
+	 */
+	LimitAdditionalPins(&max_pinned_buffers);
+	LimitAdditionalLocalPins(&max_pinned_buffers);
+	Assert(max_pinned_buffers > 0);
+
+	/*
+	 * pgsr->ranges is a circular buffer.  When it is empty, head == tail.
+	 * When it is full, there is an empty element between head and tail.  Head
+	 * can also be empty (nblocks == 0), so we need two extra elements for
+	 * non-occupied ranges, on top of max_pinned_buffers, to allow for the
+	 * maximum possible number of occupied ranges of the smallest possible
+	 * size of one.
+	 */
+	size = max_pinned_buffers + 2;
+
+	pgsr = (PgStreamingRead *)
+		palloc0(offsetof(PgStreamingRead, ranges) +
+				sizeof(pgsr->ranges[0]) * size);
+
+	pgsr->max_ios = max_ios;
+	pgsr->per_buffer_data_size = per_buffer_data_size;
+	pgsr->max_pinned_buffers = max_pinned_buffers;
+	pgsr->pgsr_private = pgsr_private;
+	pgsr->strategy = strategy;
+	pgsr->size = size;
+
+#ifdef USE_PREFETCH
+
+	/*
+	 * This system supports prefetching advice.  As long as direct I/O isn't
+	 * enabled, and the caller hasn't promised sequential access, we can use
+	 * it.
+	 */
+	if ((io_direct_flags & IO_DIRECT_DATA) == 0 &&
+		(flags & PGSR_FLAG_SEQUENTIAL) == 0)
+		pgsr->advice_enabled = true;
+#endif
+
+	/*
+	 * We want to avoid creating ranges that are smaller than they could be
+	 * just because we hit max_pinned_buffers.  We only look ahead when the
+	 * number of pinned buffers falls below this trigger number, or put
+	 * another way, we stop looking ahead when we wouldn't be able to build a
+	 * "full sized" range.
+	 */
+	pgsr->pinned_buffers_trigger =
+		Max(1, (int) max_pinned_buffers - MAX_BUFFERS_PER_TRANSFER);
+
+	/* Space for the callback to store extra data along with each block. */
+	if (per_buffer_data_size)
+		pgsr->per_buffer_data = palloc(per_buffer_data_size * max_pinned_buffers);
+
+	return pgsr;
+}
+
+/*
+ * Create a new streaming read object that can be used to perform the
+ * equivalent of a series of ReadBuffer() calls for one fork of one relation.
+ * Internally, it generates larger vectored reads where possible by looking
+ * ahead.
+ */
+PgStreamingRead *
+pg_streaming_read_buffer_alloc(int flags,
+							   void *pgsr_private,
+							   size_t per_buffer_data_size,
+							   BufferAccessStrategy strategy,
+							   BufferManagerRelation bmr,
+							   ForkNumber forknum,
+							   PgStreamingReadBufferCB next_block_cb)
+{
+	PgStreamingRead *result;
+
+	result = pg_streaming_read_buffer_alloc_internal(flags,
+													 pgsr_private,
+													 per_buffer_data_size,
+													 strategy);
+	result->callback = next_block_cb;
+	result->bmr = bmr;
+	result->forknum = forknum;
+
+	return result;
+}
+
+/*
+ * Start building a new range.  This is called after the previous one
+ * reached maximum size, or the callback's next block can't be merged with it.
+ *
+ * Since the previous head range has now reached its full potential size, this
+ * is also a good time to issue 'prefetch' advice, because we know we'll
+ * soon be reading.  In future, we could start an actual I/O here.
+ */
+static PgStreamingReadRange *
+pg_streaming_read_new_range(PgStreamingRead *pgsr)
+{
+	PgStreamingReadRange *head_range;
+
+	head_range = &pgsr->ranges[pgsr->head];
+	Assert(head_range->nblocks > 0);
+
+	/*
+	 * If a call to CompleteReadBuffers() will be needed, we can issue advice
+	 * to the kernel now to get the read started.  We suppress it if the
+	 * access pattern appears to be completely sequential, though, because on
+	 * some systems that interferes with the kernel's own sequential read-ahead
+	 * heuristics and hurts performance.
+	 */
+	if (pgsr->advice_enabled)
+	{
+		BlockNumber blocknum = head_range->blocknum;
+		int			nblocks = head_range->nblocks;
+
+		if (head_range->need_complete && blocknum != pgsr->seq_blocknum)
+		{
+			SMgrRelation smgr =
+				pgsr->bmr.smgr ? pgsr->bmr.smgr :
+				RelationGetSmgr(pgsr->bmr.rel);
+
+			Assert(!head_range->advice_issued);
+
+			smgrprefetch(smgr, pgsr->forknum, blocknum, nblocks);
+
+			/*
+			 * Count this as an I/O that is concurrently in progress, though
+			 * we don't really know if the kernel generates a physical I/O.
+			 */
+			head_range->advice_issued = true;
+			pgsr->ios_in_progress++;
+		}
+
+		/* Remember the block after this range, for sequence detection. */
+		pgsr->seq_blocknum = blocknum + nblocks;
+	}
+
+	/* Create a new head range.  There must be space. */
+	Assert(pgsr->size > pgsr->max_pinned_buffers);
+	Assert((pgsr->head + 1) % pgsr->size != pgsr->tail);
+	if (++pgsr->head == pgsr->size)
+		pgsr->head = 0;
+	head_range = &pgsr->ranges[pgsr->head];
+	head_range->nblocks = 0;
+
+	return head_range;
+}
+
+static void
+pg_streaming_read_look_ahead(PgStreamingRead *pgsr)
+{
+	/*
+	 * If we're finished or can't start more I/O, then don't look ahead.
+	 */
+	if (pgsr->finished || pgsr->ios_in_progress == pgsr->max_ios)
+		return;
+
+	/*
+	 * We'll also wait until the number of pinned buffers falls below our
+	 * trigger level, so that we have the chance to create a full range.
+	 */
+	if (pgsr->pinned_buffers >= pgsr->pinned_buffers_trigger)
+		return;
+
+	do
+	{
+		BufferManagerRelation bmr;
+		ForkNumber	forknum;
+		BlockNumber blocknum;
+		Buffer		buffer;
+		bool		found;
+		bool		need_complete;
+		PgStreamingReadRange *head_range;
+		void	   *per_buffer_data;
+
+		/* Do we have a full-sized range? */
+		head_range = &pgsr->ranges[pgsr->head];
+		if (head_range->nblocks == lengthof(head_range->buffers))
+		{
+			Assert(head_range->need_complete);
+			head_range = pg_streaming_read_new_range(pgsr);
+
+			/*
+			 * Give up now if I/O is saturated, or we wouldn't be able to form
+			 * another full range after this due to the pin limit.
+			 */
+			if (pgsr->pinned_buffers >= pgsr->pinned_buffers_trigger ||
+				pgsr->ios_in_progress == pgsr->max_ios)
+				break;
+		}
+
+		per_buffer_data = (char *) pgsr->per_buffer_data +
+			pgsr->per_buffer_data_size * pgsr->per_buffer_data_next;
+
+		/* Find out which block the callback wants to read next. */
+		blocknum = pgsr->callback(pgsr, pgsr->pgsr_private, per_buffer_data);
+		if (blocknum == InvalidBlockNumber)
+		{
+			pgsr->finished = true;
+			break;
+		}
+		bmr = pgsr->bmr;
+		forknum = pgsr->forknum;
+
+		Assert(pgsr->pinned_buffers < pgsr->max_pinned_buffers);
+
+		buffer = PrepareReadBuffer(bmr,
+								   forknum,
+								   blocknum,
+								   pgsr->strategy,
+								   &found);
+		pgsr->pinned_buffers++;
+
+		need_complete = !found;
+
+		/* Is there a head range that we can't extend? */
+		head_range = &pgsr->ranges[pgsr->head];
+		if (head_range->nblocks > 0 &&
+			(!need_complete ||
+			 !head_range->need_complete ||
+			 head_range->blocknum + head_range->nblocks != blocknum))
+		{
+			/* Yes, time to start building a new one. */
+			head_range = pg_streaming_read_new_range(pgsr);
+			Assert(head_range->nblocks == 0);
+		}
+
+		if (head_range->nblocks == 0)
+		{
+			/* Initialize a new range beginning at this block. */
+			head_range->blocknum = blocknum;
+			head_range->need_complete = need_complete;
+			head_range->advice_issued = false;
+		}
+		else
+		{
+			/* We can extend an existing range by one block. */
+			Assert(head_range->blocknum + head_range->nblocks == blocknum);
+			Assert(head_range->need_complete);
+		}
+
+		head_range->per_buffer_data_index[head_range->nblocks] = pgsr->per_buffer_data_next++;
+		head_range->buffers[head_range->nblocks] = buffer;
+		head_range->nblocks++;
+
+		if (pgsr->per_buffer_data_next == pgsr->max_pinned_buffers)
+			pgsr->per_buffer_data_next = 0;
+
+	} while (pgsr->pinned_buffers < pgsr->max_pinned_buffers &&
+			 pgsr->ios_in_progress < pgsr->max_ios);
+
+	if (pgsr->ranges[pgsr->head].nblocks > 0)
+		pg_streaming_read_new_range(pgsr);
+}
+
+Buffer
+pg_streaming_read_buffer_get_next(PgStreamingRead *pgsr, void **per_buffer_data)
+{
+	pg_streaming_read_look_ahead(pgsr);
+
+	/* See if we have one buffer to return. */
+	while (pgsr->tail != pgsr->head)
+	{
+		PgStreamingReadRange *tail_range;
+
+		tail_range = &pgsr->ranges[pgsr->tail];
+
+		/*
+		 * Do we need to perform an I/O before returning the buffers from this
+		 * range?
+		 */
+		if (tail_range->need_complete)
+		{
+			CompleteReadBuffers(pgsr->bmr,
+								tail_range->buffers,
+								pgsr->forknum,
+								tail_range->blocknum,
+								tail_range->nblocks,
+								false,
+								pgsr->strategy);
+			tail_range->need_complete = false;
+
+			/*
+			 * We don't really know if the kernel generated a physical I/O
+			 * when we issued advice, let alone when it finished, but it has
+			 * certainly finished after a read call returns.
+			 */
+			if (tail_range->advice_issued)
+				pgsr->ios_in_progress--;
+		}
+
+		/* Are there more buffers available in this range? */
+		if (pgsr->next_tail_buffer < tail_range->nblocks)
+		{
+			int			buffer_index;
+			Buffer		buffer;
+
+			buffer_index = pgsr->next_tail_buffer++;
+			buffer = tail_range->buffers[buffer_index];
+
+			Assert(BufferIsValid(buffer));
+
+			/* We are giving away ownership of this pinned buffer. */
+			Assert(pgsr->pinned_buffers > 0);
+			pgsr->pinned_buffers--;
+
+			if (per_buffer_data)
+				*per_buffer_data = (char *) pgsr->per_buffer_data +
+					tail_range->per_buffer_data_index[buffer_index] *
+					pgsr->per_buffer_data_size;
+
+			return buffer;
+		}
+
+		/* Advance tail to next range, if there is one. */
+		if (++pgsr->tail == pgsr->size)
+			pgsr->tail = 0;
+		pgsr->next_tail_buffer = 0;
+	}
+
+	Assert(pgsr->pinned_buffers == 0);
+
+	return InvalidBuffer;
+}
+
+void
+pg_streaming_read_free(PgStreamingRead *pgsr)
+{
+	Buffer		buffer;
+
+	/* Stop looking ahead, and unpin anything that wasn't consumed. */
+	pgsr->finished = true;
+	while ((buffer = pg_streaming_read_buffer_get_next(pgsr, NULL)) != InvalidBuffer)
+		ReleaseBuffer(buffer);
+
+	if (pgsr->per_buffer_data)
+		pfree(pgsr->per_buffer_data);
+	pfree(pgsr);
+}
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 7d601bef6dd..2157a97b973 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -472,7 +472,7 @@ ForgetPrivateRefCountEntry(PrivateRefCountEntry *ref)
 )
 
 
-static Buffer ReadBuffer_common(SMgrRelation smgr, char relpersistence,
+static Buffer ReadBuffer_common(BufferManagerRelation bmr,
 								ForkNumber forkNum, BlockNumber blockNum,
 								ReadBufferMode mode, BufferAccessStrategy strategy,
 								bool *hit);
@@ -501,7 +501,7 @@ static uint32 WaitBufHdrUnlocked(BufferDesc *buf);
 static int	SyncOneBuffer(int buf_id, bool skip_recently_used,
 						  WritebackContext *wb_context);
 static void WaitIO(BufferDesc *buf);
-static bool StartBufferIO(BufferDesc *buf, bool forInput);
+static bool StartBufferIO(BufferDesc *buf, bool forInput, bool nowait);
 static void TerminateBufferIO(BufferDesc *buf, bool clear_dirty,
 							  uint32 set_flag_bits, bool forget_owner);
 static void AbortBufferIO(Buffer buffer);
@@ -795,15 +795,9 @@ ReadBufferExtended(Relation reln, ForkNumber forkNum, BlockNumber blockNum,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("cannot access temporary tables of other sessions")));
 
-	/*
-	 * Read the buffer, and update pgstat counters to reflect a cache hit or
-	 * miss.
-	 */
-	pgstat_count_buffer_read(reln);
-	buf = ReadBuffer_common(RelationGetSmgr(reln), reln->rd_rel->relpersistence,
+	buf = ReadBuffer_common(BMR_REL(reln),
 							forkNum, blockNum, mode, strategy, &hit);
-	if (hit)
-		pgstat_count_buffer_hit(reln);
+
 	return buf;
 }
 
@@ -827,8 +821,9 @@ ReadBufferWithoutRelcache(RelFileLocator rlocator, ForkNumber forkNum,
 
 	SMgrRelation smgr = smgropen(rlocator, InvalidBackendId);
 
-	return ReadBuffer_common(smgr, permanent ? RELPERSISTENCE_PERMANENT :
-							 RELPERSISTENCE_UNLOGGED, forkNum, blockNum,
+	return ReadBuffer_common(BMR_SMGR(smgr, permanent ? RELPERSISTENCE_PERMANENT :
+									  RELPERSISTENCE_UNLOGGED),
+							 forkNum, blockNum,
 							 mode, strategy, &hit);
 }
 
@@ -1002,7 +997,7 @@ ExtendBufferedRelTo(BufferManagerRelation bmr,
 		bool		hit;
 
 		Assert(extended_by == 0);
-		buffer = ReadBuffer_common(bmr.smgr, bmr.relpersistence,
+		buffer = ReadBuffer_common(bmr,
 								   fork, extend_to - 1, mode, strategy,
 								   &hit);
 	}
@@ -1016,18 +1011,11 @@ ExtendBufferedRelTo(BufferManagerRelation bmr,
  * *hit is set to true if the request was satisfied from shared buffer cache.
  */
 static Buffer
-ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
+ReadBuffer_common(BufferManagerRelation bmr, ForkNumber forkNum,
 				  BlockNumber blockNum, ReadBufferMode mode,
 				  BufferAccessStrategy strategy, bool *hit)
 {
-	BufferDesc *bufHdr;
-	Block		bufBlock;
-	bool		found;
-	IOContext	io_context;
-	IOObject	io_object;
-	bool		isLocalBuf = SmgrIsTemp(smgr);
-
-	*hit = false;
+	Buffer		buffer;
 
 	/*
 	 * Backward compatibility path, most code should use ExtendBufferedRel()
@@ -1046,175 +1034,339 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 		if (mode == RBM_ZERO_AND_LOCK || mode == RBM_ZERO_AND_CLEANUP_LOCK)
 			flags |= EB_LOCK_FIRST;
 
-		return ExtendBufferedRel(BMR_SMGR(smgr, relpersistence),
-								 forkNum, strategy, flags);
+		*hit = false;
+
+		return ExtendBufferedRel(bmr, forkNum, strategy, flags);
 	}
 
-	TRACE_POSTGRESQL_BUFFER_READ_START(forkNum, blockNum,
-									   smgr->smgr_rlocator.locator.spcOid,
-									   smgr->smgr_rlocator.locator.dbOid,
-									   smgr->smgr_rlocator.locator.relNumber,
-									   smgr->smgr_rlocator.backend);
+	buffer = PrepareReadBuffer(bmr,
+							   forkNum,
+							   blockNum,
+							   strategy,
+							   hit);
+
+	/* At this point we do NOT hold any locks. */
 
+	if (mode == RBM_ZERO_AND_CLEANUP_LOCK || mode == RBM_ZERO_AND_LOCK)
+	{
+		/* if we just want zeroes and a lock, we're done */
+		ZeroBuffer(buffer, mode);
+	}
+	else if (!*hit)
+	{
+		/* we might need to perform I/O */
+		CompleteReadBuffers(bmr,
+							&buffer,
+							forkNum,
+							blockNum,
+							1,
+							mode == RBM_ZERO_ON_ERROR,
+							strategy);
+	}
+
+	return buffer;
+}
+
+/*
+ * Prepare to read a block.  The buffer is pinned.  If this is a 'hit', then
+ * the returned buffer can be used immediately.  Otherwise, a physical read
+ * should be completed with CompleteReadBuffers(), or the buffer should be
+ * zeroed with ZeroBuffer().  PrepareReadBuffer() followed by
+ * CompleteReadBuffers() or ZeroBuffer() is equivalent to ReadBuffer(), but
+ * the caller has the opportunity to combine reads of multiple neighboring
+ * blocks into one CompleteReadBuffers() call.
+ *
+ * *foundPtr is set to true for a hit, and false for a miss.
+ */
+Buffer
+PrepareReadBuffer(BufferManagerRelation bmr,
+				  ForkNumber forkNum,
+				  BlockNumber blockNum,
+				  BufferAccessStrategy strategy,
+				  bool *foundPtr)
+{
+	BufferDesc *bufHdr;
+	bool		isLocalBuf;
+	IOContext	io_context;
+	IOObject	io_object;
+
+	Assert(blockNum != P_NEW);
+
+	if (bmr.rel)
+	{
+		bmr.smgr = RelationGetSmgr(bmr.rel);
+		bmr.relpersistence = bmr.rel->rd_rel->relpersistence;
+	}
+
+	isLocalBuf = SmgrIsTemp(bmr.smgr);
 	if (isLocalBuf)
 	{
-		/*
-		 * We do not use a BufferAccessStrategy for I/O of temporary tables.
-		 * However, in some cases, the "strategy" may not be NULL, so we can't
-		 * rely on IOContextForStrategy() to set the right IOContext for us.
-		 * This may happen in cases like CREATE TEMPORARY TABLE AS...
-		 */
 		io_context = IOCONTEXT_NORMAL;
 		io_object = IOOBJECT_TEMP_RELATION;
-		bufHdr = LocalBufferAlloc(smgr, forkNum, blockNum, &found);
-		if (found)
-			pgBufferUsage.local_blks_hit++;
-		else if (mode == RBM_NORMAL || mode == RBM_NORMAL_NO_LOG ||
-				 mode == RBM_ZERO_ON_ERROR)
-			pgBufferUsage.local_blks_read++;
 	}
 	else
 	{
-		/*
-		 * lookup the buffer.  IO_IN_PROGRESS is set if the requested block is
-		 * not currently in memory.
-		 */
 		io_context = IOContextForStrategy(strategy);
 		io_object = IOOBJECT_RELATION;
-		bufHdr = BufferAlloc(smgr, relpersistence, forkNum, blockNum,
-							 strategy, &found, io_context);
-		if (found)
-			pgBufferUsage.shared_blks_hit++;
-		else if (mode == RBM_NORMAL || mode == RBM_NORMAL_NO_LOG ||
-				 mode == RBM_ZERO_ON_ERROR)
-			pgBufferUsage.shared_blks_read++;
 	}
 
-	/* At this point we do NOT hold any locks. */
+	TRACE_POSTGRESQL_BUFFER_READ_START(forkNum, blockNum,
+									   bmr.smgr->smgr_rlocator.locator.spcOid,
+									   bmr.smgr->smgr_rlocator.locator.dbOid,
+									   bmr.smgr->smgr_rlocator.locator.relNumber,
+									   bmr.smgr->smgr_rlocator.backend);
 
-	/* if it was already in the buffer pool, we're done */
-	if (found)
+	ResourceOwnerEnlarge(CurrentResourceOwner);
+	if (isLocalBuf)
+	{
+		bufHdr = LocalBufferAlloc(bmr.smgr, forkNum, blockNum, foundPtr);
+		if (*foundPtr)
+			pgBufferUsage.local_blks_hit++;
+	}
+	else
+	{
+		bufHdr = BufferAlloc(bmr.smgr, bmr.relpersistence, forkNum, blockNum,
+							 strategy, foundPtr, io_context);
+		if (*foundPtr)
+			pgBufferUsage.shared_blks_hit++;
+	}
+	if (bmr.rel)
+	{
+		/*
+		 * While pgBufferUsage's "read" counter isn't bumped unless we reach
+		 * CompleteReadBuffers() (so, not for hits, and not for buffers that
+		 * are zeroed instead), the per-relation stats always count them.
+		 */
+		pgstat_count_buffer_read(bmr.rel);
+		if (*foundPtr)
+			pgstat_count_buffer_hit(bmr.rel);
+	}
+	if (*foundPtr)
 	{
-		/* Just need to update stats before we exit */
-		*hit = true;
 		VacuumPageHit++;
 		pgstat_count_io_op(io_object, io_context, IOOP_HIT);
-
 		if (VacuumCostActive)
 			VacuumCostBalance += VacuumCostPageHit;
 
 		TRACE_POSTGRESQL_BUFFER_READ_DONE(forkNum, blockNum,
-										  smgr->smgr_rlocator.locator.spcOid,
-										  smgr->smgr_rlocator.locator.dbOid,
-										  smgr->smgr_rlocator.locator.relNumber,
-										  smgr->smgr_rlocator.backend,
-										  found);
+										  bmr.smgr->smgr_rlocator.locator.spcOid,
+										  bmr.smgr->smgr_rlocator.locator.dbOid,
+										  bmr.smgr->smgr_rlocator.locator.relNumber,
+										  bmr.smgr->smgr_rlocator.backend,
+										  true);
+	}
 
-		/*
-		 * In RBM_ZERO_AND_LOCK mode the caller expects the page to be locked
-		 * on return.
-		 */
-		if (!isLocalBuf)
-		{
-			if (mode == RBM_ZERO_AND_LOCK)
-				LWLockAcquire(BufferDescriptorGetContentLock(bufHdr),
-							  LW_EXCLUSIVE);
-			else if (mode == RBM_ZERO_AND_CLEANUP_LOCK)
-				LockBufferForCleanup(BufferDescriptorGetBuffer(bufHdr));
-		}
+	return BufferDescriptorGetBuffer(bufHdr);
+}
 
-		return BufferDescriptorGetBuffer(bufHdr);
+static inline bool
+CompleteReadBuffersCanStartIO(Buffer buffer, bool nowait)
+{
+	if (BufferIsLocal(buffer))
+	{
+		BufferDesc *bufHdr = GetLocalBufferDescriptor(-buffer - 1);
+
+		return (pg_atomic_read_u32(&bufHdr->state) & BM_VALID) == 0;
 	}
+	else
+		return StartBufferIO(GetBufferDescriptor(buffer - 1), true, nowait);
+}
 
-	/*
-	 * if we have gotten to this point, we have allocated a buffer for the
-	 * page but its contents are not yet valid.  IO_IN_PROGRESS is set for it,
-	 * if it's a shared buffer.
-	 */
-	Assert(!(pg_atomic_read_u32(&bufHdr->state) & BM_VALID));	/* spinlock not needed */
+/*
+ * Complete a set of reads prepared with PrepareReadBuffer().  The buffers must
+ * cover a cluster of neighboring block numbers.
+ *
+ * Typically this performs one physical vector read covering the block range,
+ * but if some of the buffers have already been read in the meantime by any
+ * backend, zero or multiple reads may be performed.
+ */
+void
+CompleteReadBuffers(BufferManagerRelation bmr,
+					Buffer *buffers,
+					ForkNumber forknum,
+					BlockNumber blocknum,
+					int nblocks,
+					bool zero_on_error,
+					BufferAccessStrategy strategy)
+{
+	bool		isLocalBuf;
+	IOContext	io_context;
+	IOObject	io_object;
 
-	bufBlock = isLocalBuf ? LocalBufHdrGetBlock(bufHdr) : BufHdrGetBlock(bufHdr);
+	if (bmr.rel)
+	{
+		bmr.smgr = RelationGetSmgr(bmr.rel);
+		bmr.relpersistence = bmr.rel->rd_rel->relpersistence;
+	}
+
+	isLocalBuf = SmgrIsTemp(bmr.smgr);
+	if (isLocalBuf)
+	{
+		io_context = IOCONTEXT_NORMAL;
+		io_object = IOOBJECT_TEMP_RELATION;
+	}
+	else
+	{
+		io_context = IOContextForStrategy(strategy);
+		io_object = IOOBJECT_RELATION;
+	}
 
 	/*
-	 * Read in the page, unless the caller intends to overwrite it and just
-	 * wants us to allocate a buffer.
+	 * We count all these blocks as read by this backend.  This is traditional
+	 * behavior, but might turn out not to be true if we find that someone
+	 * else has beaten us and completed the read of some of these blocks.  In
+	 * that case the system globally double-counts, but we traditionally don't
+	 * count this as a "hit", and we don't have a separate counter for "miss,
+	 * but another backend completed the read".
 	 */
-	if (mode == RBM_ZERO_AND_LOCK || mode == RBM_ZERO_AND_CLEANUP_LOCK)
-		MemSet((char *) bufBlock, 0, BLCKSZ);
+	if (isLocalBuf)
+		pgBufferUsage.local_blks_read += nblocks;
 	else
+		pgBufferUsage.shared_blks_read += nblocks;
+
+	for (int i = 0; i < nblocks; ++i)
 	{
-		instr_time	io_start = pgstat_prepare_io_time(track_io_timing);
+		int			io_buffers_len;
+		Buffer		io_buffers[MAX_BUFFERS_PER_TRANSFER];
+		void	   *io_pages[MAX_BUFFERS_PER_TRANSFER];
+		instr_time	io_start;
+		BlockNumber io_first_block;
 
-		smgrread(smgr, forkNum, blockNum, bufBlock);
+#ifdef USE_ASSERT_CHECKING
 
-		pgstat_count_io_op_time(io_object, io_context,
-								IOOP_READ, io_start, 1);
+		/*
+		 * We could get all the information from buffer headers, but it can be
+		 * expensive to access buffer header cache lines so we make the caller
+		 * provide all the information we need, and assert that it is
+		 * consistent.
+		 */
+		{
+			RelFileLocator xlocator;
+			ForkNumber	xforknum;
+			BlockNumber xblocknum;
+
+			BufferGetTag(buffers[i], &xlocator, &xforknum, &xblocknum);
+			Assert(RelFileLocatorEquals(bmr.smgr->smgr_rlocator.locator, xlocator));
+			Assert(xforknum == forknum);
+			Assert(xblocknum == blocknum + i);
+		}
+#endif
+
+		/*
+		 * Skip this block if someone else has already completed it.  If an
+		 * I/O is already in progress in another backend, this will wait for
+		 * the outcome: either done, or something went wrong and we will
+		 * retry.
+		 */
+		if (!CompleteReadBuffersCanStartIO(buffers[i], false))
+		{
+			/*
+			 * Report this as a 'hit' for this backend, even though it must
+			 * have started out as a miss in PrepareReadBuffer().
+			 */
+			TRACE_POSTGRESQL_BUFFER_READ_DONE(forknum, blocknum + i,
+											  bmr.smgr->smgr_rlocator.locator.spcOid,
+											  bmr.smgr->smgr_rlocator.locator.dbOid,
+											  bmr.smgr->smgr_rlocator.locator.relNumber,
+											  bmr.smgr->smgr_rlocator.backend,
+											  true);
+			continue;
+		}
+
+		/* We found a buffer that we need to read in. */
+		io_buffers[0] = buffers[i];
+		io_pages[0] = BufferGetBlock(buffers[i]);
+		io_first_block = blocknum + i;
+		io_buffers_len = 1;
 
-		/* check for garbage data */
-		if (!PageIsVerifiedExtended((Page) bufBlock, blockNum,
-									PIV_LOG_WARNING | PIV_REPORT_STAT))
+		/*
+		 * How many neighboring-on-disk blocks can we scatter-read into
+		 * other buffers at the same time?  In this case we don't wait if we
+		 * see an I/O already in progress.  We already hold BM_IO_IN_PROGRESS
+		 * for the head block, so we should get on with that I/O as soon as
+		 * possible.  We'll come back to this block again, above.
+		 */
+		while ((i + 1) < nblocks &&
+			   CompleteReadBuffersCanStartIO(buffers[i + 1], true))
+		{
+			/* Must be consecutive block numbers. */
+			Assert(BufferGetBlockNumber(buffers[i + 1]) ==
+				   BufferGetBlockNumber(buffers[i]) + 1);
+
+			io_buffers[io_buffers_len] = buffers[++i];
+			io_pages[io_buffers_len++] = BufferGetBlock(buffers[i]);
+		}
+
+		io_start = pgstat_prepare_io_time(track_io_timing);
+		smgrreadv(bmr.smgr, forknum, io_first_block, io_pages, io_buffers_len);
+		pgstat_count_io_op_time(io_object, io_context, IOOP_READ, io_start,
+								io_buffers_len);
+
+		/* Verify each block we read, and terminate the I/O. */
+		for (int j = 0; j < io_buffers_len; ++j)
 		{
-			if (mode == RBM_ZERO_ON_ERROR || zero_damaged_pages)
+			BufferDesc *bufHdr;
+			Block		bufBlock;
+
+			if (isLocalBuf)
 			{
-				ereport(WARNING,
-						(errcode(ERRCODE_DATA_CORRUPTED),
-						 errmsg("invalid page in block %u of relation %s; zeroing out page",
-								blockNum,
-								relpath(smgr->smgr_rlocator, forkNum))));
-				MemSet((char *) bufBlock, 0, BLCKSZ);
+				bufHdr = GetLocalBufferDescriptor(-io_buffers[j] - 1);
+				bufBlock = LocalBufHdrGetBlock(bufHdr);
 			}
 			else
-				ereport(ERROR,
-						(errcode(ERRCODE_DATA_CORRUPTED),
-						 errmsg("invalid page in block %u of relation %s",
-								blockNum,
-								relpath(smgr->smgr_rlocator, forkNum))));
-		}
-	}
-
-	/*
-	 * In RBM_ZERO_AND_LOCK / RBM_ZERO_AND_CLEANUP_LOCK mode, grab the buffer
-	 * content lock before marking the page as valid, to make sure that no
-	 * other backend sees the zeroed page before the caller has had a chance
-	 * to initialize it.
-	 *
-	 * Since no-one else can be looking at the page contents yet, there is no
-	 * difference between an exclusive lock and a cleanup-strength lock. (Note
-	 * that we cannot use LockBuffer() or LockBufferForCleanup() here, because
-	 * they assert that the buffer is already valid.)
-	 */
-	if ((mode == RBM_ZERO_AND_LOCK || mode == RBM_ZERO_AND_CLEANUP_LOCK) &&
-		!isLocalBuf)
-	{
-		LWLockAcquire(BufferDescriptorGetContentLock(bufHdr), LW_EXCLUSIVE);
-	}
+			{
+				bufHdr = GetBufferDescriptor(io_buffers[j] - 1);
+				bufBlock = BufHdrGetBlock(bufHdr);
+			}
 
-	if (isLocalBuf)
-	{
-		/* Only need to adjust flags */
-		uint32		buf_state = pg_atomic_read_u32(&bufHdr->state);
+			/* check for garbage data */
+			if (!PageIsVerifiedExtended((Page) bufBlock, io_first_block + j,
+										PIV_LOG_WARNING | PIV_REPORT_STAT))
+			{
+				if (zero_on_error || zero_damaged_pages)
+				{
+					ereport(WARNING,
+							(errcode(ERRCODE_DATA_CORRUPTED),
+							 errmsg("invalid page in block %u of relation %s; zeroing out page",
+									io_first_block + j,
+									relpath(bmr.smgr->smgr_rlocator, forknum))));
+					memset(bufBlock, 0, BLCKSZ);
+				}
+				else
+					ereport(ERROR,
+							(errcode(ERRCODE_DATA_CORRUPTED),
+							 errmsg("invalid page in block %u of relation %s",
+									io_first_block + j,
+									relpath(bmr.smgr->smgr_rlocator, forknum))));
+			}
 
-		buf_state |= BM_VALID;
-		pg_atomic_unlocked_write_u32(&bufHdr->state, buf_state);
-	}
-	else
-	{
-		/* Set BM_VALID, terminate IO, and wake up any waiters */
-		TerminateBufferIO(bufHdr, false, BM_VALID, true);
-	}
+			/* Terminate I/O and set BM_VALID. */
+			if (isLocalBuf)
+			{
+				uint32		buf_state = pg_atomic_read_u32(&bufHdr->state);
 
-	VacuumPageMiss++;
-	if (VacuumCostActive)
-		VacuumCostBalance += VacuumCostPageMiss;
+				buf_state |= BM_VALID;
+				pg_atomic_unlocked_write_u32(&bufHdr->state, buf_state);
+			}
+			else
+			{
+				/* Set BM_VALID, terminate IO, and wake up any waiters */
+				TerminateBufferIO(bufHdr, false, BM_VALID, true);
+			}
 
-	TRACE_POSTGRESQL_BUFFER_READ_DONE(forkNum, blockNum,
-									  smgr->smgr_rlocator.locator.spcOid,
-									  smgr->smgr_rlocator.locator.dbOid,
-									  smgr->smgr_rlocator.locator.relNumber,
-									  smgr->smgr_rlocator.backend,
-									  found);
+			/* Report I/Os as completing individually. */
+			TRACE_POSTGRESQL_BUFFER_READ_DONE(forknum, io_first_block + j,
+											  bmr.smgr->smgr_rlocator.locator.spcOid,
+											  bmr.smgr->smgr_rlocator.locator.dbOid,
+											  bmr.smgr->smgr_rlocator.locator.relNumber,
+											  bmr.smgr->smgr_rlocator.backend,
+											  false);
+		}
 
-	return BufferDescriptorGetBuffer(bufHdr);
+		VacuumPageMiss += io_buffers_len;
+		if (VacuumCostActive)
+			VacuumCostBalance += VacuumCostPageMiss * io_buffers_len;
+	}
 }
 
 /*
@@ -1228,11 +1380,8 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
  *
  * The returned buffer is pinned and is already marked as holding the
  * desired page.  If it already did have the desired page, *foundPtr is
- * set true.  Otherwise, *foundPtr is set false and the buffer is marked
- * as IO_IN_PROGRESS; ReadBuffer will now need to do I/O to fill it.
- *
- * *foundPtr is actually redundant with the buffer's BM_VALID flag, but
- * we keep it for simplicity in ReadBuffer.
+ * set true.  Otherwise, *foundPtr is set false.  A read should be
+ * performed with CompleteReadBuffers().
  *
  * io_context is passed as an output parameter to avoid calling
  * IOContextForStrategy() when there is a shared buffers hit and no IO
@@ -1291,19 +1440,10 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 		{
 			/*
 			 * We can only get here if (a) someone else is still reading in
-			 * the page, or (b) a previous read attempt failed.  We have to
-			 * wait for any active read attempt to finish, and then set up our
-			 * own read attempt if the page is still not BM_VALID.
-			 * StartBufferIO does it all.
+			 * the page, (b) a previous read attempt failed, or (c) someone
+			 * called PrepareReadBuffer() but not yet CompleteReadBuffers().
 			 */
-			if (StartBufferIO(buf, true))
-			{
-				/*
-				 * If we get here, previous attempts to read the buffer must
-				 * have failed ... but we shall bravely try again.
-				 */
-				*foundPtr = false;
-			}
+			*foundPtr = false;
 		}
 
 		return buf;
@@ -1368,19 +1508,10 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 		{
 			/*
 			 * We can only get here if (a) someone else is still reading in
-			 * the page, or (b) a previous read attempt failed.  We have to
-			 * wait for any active read attempt to finish, and then set up our
-			 * own read attempt if the page is still not BM_VALID.
-			 * StartBufferIO does it all.
+			 * the page, (b) a previous read attempt failed, or (c) someone
+			 * called PrepareReadBuffer() but not yet CompleteReadBuffers().
 			 */
-			if (StartBufferIO(existing_buf_hdr, true))
-			{
-				/*
-				 * If we get here, previous attempts to read the buffer must
-				 * have failed ... but we shall bravely try again.
-				 */
-				*foundPtr = false;
-			}
+			*foundPtr = false;
 		}
 
 		return existing_buf_hdr;
@@ -1412,15 +1543,9 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 	LWLockRelease(newPartitionLock);
 
 	/*
-	 * Buffer contents are currently invalid.  Try to obtain the right to
-	 * start I/O.  If StartBufferIO returns false, then someone else managed
-	 * to read it before we did, so there's nothing left for BufferAlloc() to
-	 * do.
+	 * Buffer contents are currently invalid.
 	 */
-	if (StartBufferIO(victim_buf_hdr, true))
-		*foundPtr = false;
-	else
-		*foundPtr = true;
+	*foundPtr = false;
 
 	return victim_buf_hdr;
 }
@@ -1774,7 +1899,7 @@ again:
  * pessimistic, but outside of toy-sized shared_buffers it should allow
  * sufficient pins.
  */
-static void
+void
 LimitAdditionalPins(uint32 *additional_pins)
 {
 	uint32		max_backends;
@@ -2043,7 +2168,7 @@ ExtendBufferedRelShared(BufferManagerRelation bmr,
 
 				buf_state &= ~BM_VALID;
 				UnlockBufHdr(existing_hdr, buf_state);
-			} while (!StartBufferIO(existing_hdr, true));
+			} while (!StartBufferIO(existing_hdr, true, false));
 		}
 		else
 		{
@@ -2066,7 +2191,7 @@ ExtendBufferedRelShared(BufferManagerRelation bmr,
 			LWLockRelease(partition_lock);
 
 			/* XXX: could combine the locked operations in it with the above */
-			StartBufferIO(victim_buf_hdr, true);
+			StartBufferIO(victim_buf_hdr, true, false);
 		}
 	}
 
@@ -2381,7 +2506,12 @@ PinBuffer(BufferDesc *buf, BufferAccessStrategy strategy)
 	else
 	{
 		/*
-		 * If we previously pinned the buffer, it must surely be valid.
+		 * If we previously pinned the buffer, it is likely to be valid, but
+		 * it may not be if PrepareReadBuffer() was called and
+		 * CompleteReadBuffers() hasn't been called yet.  We'll check by
+		 * loading the flags without locking.  This is racy, but it's OK to
+		 * return false spuriously: when CompleteReadBuffers() calls
+		 * StartBufferIO(), it'll see that it's now valid.
 		 *
 		 * Note: We deliberately avoid a Valgrind client request here.
 		 * Individual access methods can optionally superimpose buffer page
@@ -2390,7 +2520,7 @@ PinBuffer(BufferDesc *buf, BufferAccessStrategy strategy)
 		 * that the buffer page is legitimately non-accessible here.  We
 		 * cannot meddle with that.
 		 */
-		result = true;
+		result = (pg_atomic_read_u32(&buf->state) & BM_VALID) != 0;
 	}
 
 	ref->refcount++;
@@ -3458,7 +3588,7 @@ FlushBuffer(BufferDesc *buf, SMgrRelation reln, IOObject io_object,
 	 * someone else flushed the buffer before we could, so we need not do
 	 * anything.
 	 */
-	if (!StartBufferIO(buf, false))
+	if (!StartBufferIO(buf, false, false))
 		return;
 
 	/* Setup error traceback support for ereport() */
@@ -4845,6 +4975,46 @@ ConditionalLockBuffer(Buffer buffer)
 									LW_EXCLUSIVE);
 }
 
+/*
+ * Zero a buffer, and lock it as RBM_ZERO_AND_LOCK or
+ * RBM_ZERO_AND_CLEANUP_LOCK would.  The buffer must already be pinned.  It
+ * does not have to be valid, but it is valid and locked on return.
+ */
+void
+ZeroBuffer(Buffer buffer, ReadBufferMode mode)
+{
+	BufferDesc *bufHdr;
+	uint32		buf_state;
+
+	Assert(mode == RBM_ZERO_AND_LOCK || mode == RBM_ZERO_AND_CLEANUP_LOCK);
+
+	if (BufferIsLocal(buffer))
+		bufHdr = GetLocalBufferDescriptor(-buffer - 1);
+	else
+	{
+		bufHdr = GetBufferDescriptor(buffer - 1);
+		if (mode == RBM_ZERO_AND_LOCK)
+			LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
+		else
+			LockBufferForCleanup(buffer);
+	}
+
+	memset(BufferGetPage(buffer), 0, BLCKSZ);
+
+	if (BufferIsLocal(buffer))
+	{
+		buf_state = pg_atomic_read_u32(&bufHdr->state);
+		buf_state |= BM_VALID;
+		pg_atomic_unlocked_write_u32(&bufHdr->state, buf_state);
+	}
+	else
+	{
+		buf_state = LockBufHdr(bufHdr);
+		buf_state |= BM_VALID;
+		UnlockBufHdr(bufHdr, buf_state);
+	}
+}
+
 /*
  * Verify that this backend is pinning the buffer exactly once.
  *
@@ -5197,9 +5367,15 @@ WaitIO(BufferDesc *buf)
  *
  * Returns true if we successfully marked the buffer as I/O busy,
  * false if someone else already did the work.
+ *
+ * If nowait is true, then we don't wait for an I/O to be finished by another
+ * backend.  In that case, false indicates either that the I/O was already
+ * finished, or is still in progress.  This is useful for callers that want to
+ * find out if they can perform the I/O as part of a larger operation, without
+ * waiting for the answer or distinguishing the reasons why not.
  */
 static bool
-StartBufferIO(BufferDesc *buf, bool forInput)
+StartBufferIO(BufferDesc *buf, bool forInput, bool nowait)
 {
 	uint32		buf_state;
 
@@ -5212,6 +5388,8 @@ StartBufferIO(BufferDesc *buf, bool forInput)
 		if (!(buf_state & BM_IO_IN_PROGRESS))
 			break;
 		UnlockBufHdr(buf, buf_state);
+		if (nowait)
+			return false;
 		WaitIO(buf);
 	}
 
diff --git a/src/backend/storage/buffer/localbuf.c b/src/backend/storage/buffer/localbuf.c
index 1be4f4f8daf..717b8f58daf 100644
--- a/src/backend/storage/buffer/localbuf.c
+++ b/src/backend/storage/buffer/localbuf.c
@@ -109,10 +109,9 @@ PrefetchLocalBuffer(SMgrRelation smgr, ForkNumber forkNum,
  * LocalBufferAlloc -
  *	  Find or create a local buffer for the given page of the given relation.
  *
- * API is similar to bufmgr.c's BufferAlloc, except that we do not need
- * to do any locking since this is all local.   Also, IO_IN_PROGRESS
- * does not get set.  Lastly, we support only default access strategy
- * (hence, usage_count is always advanced).
+ * API is similar to bufmgr.c's BufferAlloc, except that we do not need to do
+ * any locking since this is all local.  We support only default access
+ * strategy (hence, usage_count is always advanced).
  */
 BufferDesc *
 LocalBufferAlloc(SMgrRelation smgr, ForkNumber forkNum, BlockNumber blockNum,
@@ -288,7 +287,7 @@ GetLocalVictimBuffer(void)
 }
 
 /* see LimitAdditionalPins() */
-static void
+void
 LimitAdditionalLocalPins(uint32 *additional_pins)
 {
 	uint32		max_pins;
@@ -298,9 +297,10 @@ LimitAdditionalLocalPins(uint32 *additional_pins)
 
 	/*
 	 * In contrast to LimitAdditionalPins() other backends don't play a role
-	 * here. We can allow up to NLocBuffer pins in total.
+	 * here. We can allow up to NLocBuffer pins in total, but NLocBuffer might
+	 * not be initialized yet, so read num_temp_buffers instead.
 	 */
-	max_pins = (NLocBuffer - NLocalPinnedBuffers);
+	max_pins = (num_temp_buffers - NLocalPinnedBuffers);
 
 	if (*additional_pins >= max_pins)
 		*additional_pins = max_pins;
diff --git a/src/backend/storage/meson.build b/src/backend/storage/meson.build
index 40345bdca27..739d13293fb 100644
--- a/src/backend/storage/meson.build
+++ b/src/backend/storage/meson.build
@@ -1,5 +1,6 @@
 # Copyright (c) 2022-2024, PostgreSQL Global Development Group
 
+subdir('aio')
 subdir('buffer')
 subdir('file')
 subdir('freespace')
diff --git a/src/backend/storage/smgr/smgr.c b/src/backend/storage/smgr/smgr.c
index 563a0be5c74..0d7272e796e 100644
--- a/src/backend/storage/smgr/smgr.c
+++ b/src/backend/storage/smgr/smgr.c
@@ -147,7 +147,9 @@ smgrshutdown(int code, Datum arg)
 /*
  * smgropen() -- Return an SMgrRelation object, creating it if need be.
  *
- * This does not attempt to actually open the underlying file.
+ * This does not attempt to actually open the underlying files.  The returned
+ * object remains valid at least until AtEOXact_SMgr() is called, or until
+ * smgrdestroy() is called in non-transaction backends.
  */
 SMgrRelation
 smgropen(RelFileLocator rlocator, BackendId backend)
@@ -259,10 +261,10 @@ smgrexists(SMgrRelation reln, ForkNumber forknum)
 }
 
 /*
- * smgrclose() -- Close and delete an SMgrRelation object.
+ * smgrdestroy() -- Delete an SMgrRelation object.
  */
 void
-smgrclose(SMgrRelation reln)
+smgrdestroy(SMgrRelation reln)
 {
 	SMgrRelation *owner;
 	ForkNumber	forknum;
@@ -289,12 +291,14 @@ smgrclose(SMgrRelation reln)
 }
 
 /*
- * smgrrelease() -- Release all resources used by this object.
+ * smgrclose() -- Release all resources used by this object.
  *
- * The object remains valid.
+ * The object remains valid, but is moved to the unowned list where it will
+ * be destroyed by AtEOXact_SMgr().  It may be re-owned if it is accessed by a
+ * relation before then.
  */
 void
-smgrrelease(SMgrRelation reln)
+smgrclose(SMgrRelation reln)
 {
 	for (ForkNumber forknum = 0; forknum <= MAX_FORKNUM; forknum++)
 	{
@@ -302,15 +306,20 @@ smgrrelease(SMgrRelation reln)
 		reln->smgr_cached_nblocks[forknum] = InvalidBlockNumber;
 	}
 	reln->smgr_targblock = InvalidBlockNumber;
+
+	if (reln->smgr_owner)
+	{
+		*reln->smgr_owner = NULL;
+		reln->smgr_owner = NULL;
+		dlist_push_tail(&unowned_relns, &reln->node);
+	}
 }
 
 /*
- * smgrreleaseall() -- Release resources used by all objects.
- *
- * This is called for PROCSIGNAL_BARRIER_SMGRRELEASE.
+ * smgrcloseall() -- Close all objects.
  */
 void
-smgrreleaseall(void)
+smgrcloseall(void)
 {
 	HASH_SEQ_STATUS status;
 	SMgrRelation reln;
@@ -322,14 +331,17 @@ smgrreleaseall(void)
 	hash_seq_init(&status, SMgrRelationHash);
 
 	while ((reln = (SMgrRelation) hash_seq_search(&status)) != NULL)
-		smgrrelease(reln);
+		smgrclose(reln);
 }
 
 /*
- * smgrcloseall() -- Close all existing SMgrRelation objects.
+ * smgrdestroyall() -- Destroy all SMgrRelation objects.
+ *
+ * It must be known that there are no pointers to SMgrRelations, other than
+ * those registered with smgrsetowner().
  */
 void
-smgrcloseall(void)
+smgrdestroyall(void)
 {
 	HASH_SEQ_STATUS status;
 	SMgrRelation reln;
@@ -341,7 +353,7 @@ smgrcloseall(void)
 	hash_seq_init(&status, SMgrRelationHash);
 
 	while ((reln = (SMgrRelation) hash_seq_search(&status)) != NULL)
-		smgrclose(reln);
+		smgrdestroy(reln);
 }
 
 /*
@@ -733,7 +745,8 @@ smgrimmedsync(SMgrRelation reln, ForkNumber forknum)
  * AtEOXact_SMgr
  *
  * This routine is called during transaction commit or abort (it doesn't
- * particularly care which).  All transient SMgrRelation objects are closed.
+ * particularly care which).  All transient SMgrRelation objects are
+ * destroyed.
  *
  * We do this as a compromise between wanting transient SMgrRelations to
  * live awhile (to amortize the costs of blind writes of multiple blocks)
@@ -747,7 +760,7 @@ AtEOXact_SMgr(void)
 	dlist_mutable_iter iter;
 
 	/*
-	 * Zap all unowned SMgrRelations.  We rely on smgrclose() to remove each
+	 * Zap all unowned SMgrRelations.  We rely on smgrdestroy() to remove each
 	 * one from the list.
 	 */
 	dlist_foreach_modify(iter, &unowned_relns)
@@ -757,7 +770,7 @@ AtEOXact_SMgr(void)
 
 		Assert(rel->smgr_owner == NULL);
 
-		smgrclose(rel);
+		smgrdestroy(rel);
 	}
 }
 
@@ -768,6 +781,6 @@ AtEOXact_SMgr(void)
 bool
 ProcessBarrierSmgrRelease(void)
 {
-	smgrreleaseall();
+	smgrcloseall();
 	return true;
 }
diff --git a/src/include/storage/bufmgr.h b/src/include/storage/bufmgr.h
index d51d46d3353..a38f1acb37a 100644
--- a/src/include/storage/bufmgr.h
+++ b/src/include/storage/bufmgr.h
@@ -14,6 +14,7 @@
 #ifndef BUFMGR_H
 #define BUFMGR_H
 
+#include "port/pg_iovec.h"
 #include "storage/block.h"
 #include "storage/buf.h"
 #include "storage/bufpage.h"
@@ -158,6 +159,11 @@ extern PGDLLIMPORT int32 *LocalRefCount;
 #define BUFFER_LOCK_SHARE		1
 #define BUFFER_LOCK_EXCLUSIVE	2
 
+/*
+ * Maximum number of buffers for multi-buffer I/O functions.  This is set to
+ * allow 128kB transfers, unless BLCKSZ and IOV_MAX imply a smaller maximum.
+ */
+#define MAX_BUFFERS_PER_TRANSFER Min(PG_IOV_MAX, (128 * 1024) / BLCKSZ)
 
 /*
  * prototypes for functions in bufmgr.c
@@ -177,6 +183,18 @@ extern Buffer ReadBufferWithoutRelcache(RelFileLocator rlocator,
 										ForkNumber forkNum, BlockNumber blockNum,
 										ReadBufferMode mode, BufferAccessStrategy strategy,
 										bool permanent);
+extern Buffer PrepareReadBuffer(BufferManagerRelation bmr,
+								ForkNumber forkNum,
+								BlockNumber blockNum,
+								BufferAccessStrategy strategy,
+								bool *foundPtr);
+extern void CompleteReadBuffers(BufferManagerRelation bmr,
+								Buffer *buffers,
+								ForkNumber forknum,
+								BlockNumber blocknum,
+								int nblocks,
+								bool zero_on_error,
+								BufferAccessStrategy strategy);
 extern void ReleaseBuffer(Buffer buffer);
 extern void UnlockReleaseBuffer(Buffer buffer);
 extern bool BufferIsExclusiveLocked(Buffer buffer);
@@ -247,9 +265,13 @@ extern void LockBufferForCleanup(Buffer buffer);
 extern bool ConditionalLockBufferForCleanup(Buffer buffer);
 extern bool IsBufferCleanupOK(Buffer buffer);
 extern bool HoldingBufferPinThatDelaysRecovery(void);
+extern void ZeroBuffer(Buffer buffer, ReadBufferMode mode);
 
 extern bool BgBufferSync(struct WritebackContext *wb_context);
 
+extern void LimitAdditionalPins(uint32 *additional_pins);
+extern void LimitAdditionalLocalPins(uint32 *additional_pins);
+
 /* in buf_init.c */
 extern void InitBufferPool(void);
 extern Size BufferShmemSize(void);
diff --git a/src/include/storage/smgr.h b/src/include/storage/smgr.h
index 527cd2a0568..d8ffe397faf 100644
--- a/src/include/storage/smgr.h
+++ b/src/include/storage/smgr.h
@@ -85,8 +85,8 @@ extern void smgrclearowner(SMgrRelation *owner, SMgrRelation reln);
 extern void smgrclose(SMgrRelation reln);
 extern void smgrcloseall(void);
 extern void smgrcloserellocator(RelFileLocatorBackend rlocator);
-extern void smgrrelease(SMgrRelation reln);
-extern void smgrreleaseall(void);
+extern void smgrdestroy(SMgrRelation reln);
+extern void smgrdestroyall(void);
 extern void smgrcreate(SMgrRelation reln, ForkNumber forknum, bool isRedo);
 extern void smgrdosyncall(SMgrRelation *rels, int nrels);
 extern void smgrdounlinkall(SMgrRelation *rels, int nrels, bool isRedo);
diff --git a/src/include/storage/streaming_read.h b/src/include/storage/streaming_read.h
new file mode 100644
index 00000000000..40c3408c541
--- /dev/null
+++ b/src/include/storage/streaming_read.h
@@ -0,0 +1,45 @@
+#ifndef STREAMING_READ_H
+#define STREAMING_READ_H
+
+#include "storage/bufmgr.h"
+#include "storage/fd.h"
+#include "storage/smgr.h"
+
+/* Default tuning, reasonable for many users. */
+#define PGSR_FLAG_DEFAULT 0x00
+
+/*
+ * I/O streams that are performing maintenance work on behalf of potentially
+ * many users.
+ */
+#define PGSR_FLAG_MAINTENANCE 0x01
+
+/*
+ * We usually avoid issuing prefetch advice automatically when sequential
+ * access is detected, but this flag explicitly disables it, for cases that
+ * might not be correctly detected.  Explicit advice is known to perform worse
+ * than letting the kernel (at least Linux) detect sequential access.
+ */
+#define PGSR_FLAG_SEQUENTIAL 0x02
+
+struct PgStreamingRead;
+typedef struct PgStreamingRead PgStreamingRead;
+
+/* Callback that returns the next block number to read. */
+typedef BlockNumber (*PgStreamingReadBufferCB) (PgStreamingRead *pgsr,
+												void *pgsr_private,
+												void *per_buffer_private);
+
+extern PgStreamingRead *pg_streaming_read_buffer_alloc(int flags,
+													   void *pgsr_private,
+													   size_t per_buffer_private_size,
+													   BufferAccessStrategy strategy,
+													   BufferManagerRelation bmr,
+													   ForkNumber forknum,
+													   PgStreamingReadBufferCB next_block_cb);
+
+extern void pg_streaming_read_prefetch(PgStreamingRead *pgsr);
+extern Buffer pg_streaming_read_buffer_get_next(PgStreamingRead *pgsr, void **per_buffer_private);
+extern void pg_streaming_read_free(PgStreamingRead *pgsr);
+
+#endif
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index a584b1ddff3..6636cc82c09 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -561,12 +561,6 @@ typedef struct ViewOptions
  *
  * Very little code is authorized to touch rel->rd_smgr directly.  Instead
  * use this function to fetch its value.
- *
- * Note: since a relcache flush can cause the file handle to be closed again,
- * it's unwise to hold onto the pointer returned by this function for any
- * long period.  Recommended practice is to just re-execute RelationGetSmgr
- * each time you need to access the SMgrRelation.  It's quite cheap in
- * comparison to whatever an smgr function is going to do.
  */
 static inline SMgrRelation
 RelationGetSmgr(Relation rel)
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 29fd1cae641..018ebbcbaae 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2089,6 +2089,8 @@ PgStat_TableCounts
 PgStat_TableStatus
 PgStat_TableXactStatus
 PgStat_WalStats
+PgStreamingRead
+PgStreamingReadRange
 PgXmlErrorContext
 PgXmlStrictness
 Pg_finfo_record
-- 
2.37.2

#8Melanie Plageman
melanieplageman@gmail.com
In reply to: Melanie Plageman (#7)
5 attachment(s)
Re: Streaming read-ready sequential scan code

On Mon, Feb 26, 2024 at 03:56:57PM -0500, Melanie Plageman wrote:

On Mon, Feb 19, 2024 at 6:05 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

On Mon, Jan 29, 2024 at 4:17 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

There is an outstanding question about where to allocate the
PgStreamingRead object for sequential scans

I've written three alternative implementations of the actual streaming
read user for sequential scan which handle the question of where to
allocate the streaming read object and how to handle changing scan
direction in different ways.

Option A) https://github.com/melanieplageman/postgres/tree/seqscan_pgsr_initscan_allocation
- Allocates the streaming read object in initscan(). Since we do not
know the scan direction at this time, if the scan ends up not being a
forwards scan, the streaming read object must later be freed -- so
this will sometimes allocate a streaming read object it never uses.
- Only supports ForwardScanDirection and once the scan direction
changes, streaming is never supported again -- even if we return to
ForwardScanDirection
- Must maintain a "fallback" codepath that does not use the streaming read API

Attached is a version of this patch which implements a "reset"
function for the streaming read API; on rescan, this should be cheaper
than a full pg_streaming_read_free() followed by re-allocation. This
can easily be ported to
on A as an example.

Attached is the latest version of this patchset -- rebased in light of
Thomas' updates to the streaming read API [1]. I have chosen the
approach I think we should go with. It is a hybrid of my previously
proposed approaches.

The streaming read is allocated in heap_beginscan() and then reset on
rescan and when the scan direction changes. I only check whether the
scan direction has changed when a new page is needed. With this
implementation no fallback method is needed, so we can remove the
non-streaming read code for heap sequential scans.
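
A rough sketch of the allocation and callback, using the signatures from
the proposed streaming_read.h: the callback name heap_scan_stream_read_next,
the scan fields rs_pgsr and rs_prefetch_block, and the flag choice are
assumptions for illustration; the sketch also ignores parallel and
synchronized scans, which the real code would presumably route through
heapgettup_advance_block().

/* Callback: tell the streaming read machinery which block to read next. */
static BlockNumber
heap_scan_stream_read_next(PgStreamingRead *pgsr, void *pgsr_private,
                           void *per_buffer_data)
{
    HeapScanDesc scan = (HeapScanDesc) pgsr_private;

    /*
     * rs_prefetch_block (assumed name) tracks how far ahead the stream has
     * advanced, independently of rs_cblock, the block the scan is currently
     * processing.  A plain forward scan starting at block 0 is assumed.
     */
    if (scan->rs_prefetch_block >= scan->rs_nblocks)
        return InvalidBlockNumber;  /* end of stream */

    return scan->rs_prefetch_block++;
}

/* In heap_beginscan(), roughly (rs_pgsr is an assumed field): */
scan->rs_pgsr = pg_streaming_read_buffer_alloc(PGSR_FLAG_SEQUENTIAL,
                                               scan,
                                               0,   /* no per-buffer data */
                                               scan->rs_strategy,
                                               BMR_REL(scan->rs_base.rs_rd),
                                               MAIN_FORKNUM,
                                               heap_scan_stream_read_next);

heapfetchbuf() would then consume the stream with
pg_streaming_read_buffer_get_next(scan->rs_pgsr, NULL), releasing the
previous rs_cbuf first, and the scan ends when that returns InvalidBuffer.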

Because heapgettup() and heapgettup_pagemode() are also used for TID
range scans, this patch also happens to implement streaming reads for
TID range scans.

- Melanie

[1]: /messages/by-id/CA+hUKGJtLyxcAEvLhVUhgD4fMQkOu3PDaj8Qb9SR_UsmzgsBpQ@mail.gmail.com

Attachments:

v3-0001-Split-heapgetpage-into-two-parts.patch (text/x-diff; charset=us-ascii)
From 9227b1b621c473ab189ac0cefba1b9aaed02aa09 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 27 Jan 2024 18:39:37 -0500
Subject: [PATCH v3 1/5] Split heapgetpage into two parts

heapgetpage(), a per-block utility function used in heap scans, read a
passed-in block into a buffer and then, if page-at-a-time processing was
enabled, pruned the page and built an array of its visible tuples. This
was used for sequential and sample scans.

Future commits will add support for streaming reads. The streaming read
API will read in the blocks specified by a callback, but any significant
per-page processing should be done synchronously on the buffer yielded
by the streaming read API. To support this, separate the logic in
heapgetpage() to read a block into a buffer from that which prunes the
page and builds an array of the visible tuples. The former is now
heapfetchbuf() and the latter is now heapbuildvis().

Future commits will push the logic for selecting the next block into
heapfetchbuf() in cases when streaming reads are not supported (such as
backwards sequential scans). Because this logic differs for sample scans
and sequential scans, inline the code to read the block into a buffer
for sample scans.

This has the added benefit of allowing for a bit of refactoring in
heapam_scan_sample_next_block(), including unpinning the previous buffer
before invoking the callback to select the next block.
---
 src/backend/access/heap/heapam.c         | 74 ++++++++++++++----------
 src/backend/access/heap/heapam_handler.c | 40 +++++++++----
 src/include/access/heapam.h              |  2 +-
 3 files changed, 72 insertions(+), 44 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 707460a5364..449221da6ac 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -367,17 +367,18 @@ heap_setscanlimits(TableScanDesc sscan, BlockNumber startBlk, BlockNumber numBlk
 }
 
 /*
- * heapgetpage - subroutine for heapgettup()
+ * heapbuildvis - Utility function for heap scans.
  *
- * This routine reads and pins the specified page of the relation.
- * In page-at-a-time mode it performs additional work, namely determining
- * which tuples on the page are visible.
+ * Given a page residing in a buffer saved in the scan descriptor, prune the
+ * page and determine which of its tuples are visible, saving their offsets
+ * in an array in the scan descriptor.
  */
 void
-heapgetpage(TableScanDesc sscan, BlockNumber block)
+heapbuildvis(TableScanDesc sscan)
 {
 	HeapScanDesc scan = (HeapScanDesc) sscan;
-	Buffer		buffer;
+	Buffer		buffer = scan->rs_cbuf;
+	BlockNumber block = scan->rs_cblock;
 	Snapshot	snapshot;
 	Page		page;
 	int			lines;
@@ -385,31 +386,8 @@ heapgetpage(TableScanDesc sscan, BlockNumber block)
 	OffsetNumber lineoff;
 	bool		all_visible;
 
-	Assert(block < scan->rs_nblocks);
+	Assert(BufferGetBlockNumber(buffer) == block);
 
-	/* release previous scan buffer, if any */
-	if (BufferIsValid(scan->rs_cbuf))
-	{
-		ReleaseBuffer(scan->rs_cbuf);
-		scan->rs_cbuf = InvalidBuffer;
-	}
-
-	/*
-	 * Be sure to check for interrupts at least once per page.  Checks at
-	 * higher code levels won't be able to stop a seqscan that encounters many
-	 * pages' worth of consecutive dead tuples.
-	 */
-	CHECK_FOR_INTERRUPTS();
-
-	/* read page using selected strategy */
-	scan->rs_cbuf = ReadBufferExtended(scan->rs_base.rs_rd, MAIN_FORKNUM, block,
-									   RBM_NORMAL, scan->rs_strategy);
-	scan->rs_cblock = block;
-
-	if (!(scan->rs_base.rs_flags & SO_ALLOW_PAGEMODE))
-		return;
-
-	buffer = scan->rs_cbuf;
 	snapshot = scan->rs_base.rs_snapshot;
 
 	/*
@@ -482,6 +460,37 @@ heapgetpage(TableScanDesc sscan, BlockNumber block)
 	scan->rs_ntuples = ntup;
 }
 
+/*
+ * heapfetchbuf - subroutine for heapgettup()
+ *
+ * This routine reads the specified block of the relation into a buffer and
+ * returns with that pinned buffer saved in the scan descriptor.
+ */
+static inline void
+heapfetchbuf(HeapScanDesc scan, BlockNumber block)
+{
+	Assert(block < scan->rs_nblocks);
+
+	/* release previous scan buffer, if any */
+	if (BufferIsValid(scan->rs_cbuf))
+	{
+		ReleaseBuffer(scan->rs_cbuf);
+		scan->rs_cbuf = InvalidBuffer;
+	}
+
+	/*
+	 * Be sure to check for interrupts at least once per page.  Checks at
+	 * higher code levels won't be able to stop a seqscan that encounters many
+	 * pages' worth of consecutive dead tuples.
+	 */
+	CHECK_FOR_INTERRUPTS();
+
+	/* read page using selected strategy */
+	scan->rs_cbuf = ReadBufferExtended(scan->rs_base.rs_rd, MAIN_FORKNUM, block,
+									   RBM_NORMAL, scan->rs_strategy);
+	scan->rs_cblock = block;
+}
+
 /*
  * heapgettup_initial_block - return the first BlockNumber to scan
  *
@@ -755,7 +764,7 @@ heapgettup(HeapScanDesc scan,
 	 */
 	while (block != InvalidBlockNumber)
 	{
-		heapgetpage((TableScanDesc) scan, block);
+		heapfetchbuf(scan, block);
 		LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
 		page = heapgettup_start_page(scan, dir, &linesleft, &lineoff);
 continue_page:
@@ -876,7 +885,8 @@ heapgettup_pagemode(HeapScanDesc scan,
 	 */
 	while (block != InvalidBlockNumber)
 	{
-		heapgetpage((TableScanDesc) scan, block);
+		heapfetchbuf(scan, block);
+		heapbuildvis((TableScanDesc) scan);
 		page = BufferGetPage(scan->rs_cbuf);
 		linesleft = scan->rs_ntuples;
 		lineindex = ScanDirectionIsForward(dir) ? 0 : linesleft - 1;
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 680a50bf8b1..8dc137995f0 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2276,11 +2276,14 @@ heapam_scan_sample_next_block(TableScanDesc scan, SampleScanState *scanstate)
 	if (hscan->rs_nblocks == 0)
 		return false;
 
-	if (tsm->NextSampleBlock)
+	if (BufferIsValid(hscan->rs_cbuf))
 	{
-		blockno = tsm->NextSampleBlock(scanstate, hscan->rs_nblocks);
-		hscan->rs_cblock = blockno;
+		ReleaseBuffer(hscan->rs_cbuf);
+		hscan->rs_cbuf = InvalidBuffer;
 	}
+
+	if (tsm->NextSampleBlock)
+		blockno = tsm->NextSampleBlock(scanstate, hscan->rs_nblocks);
 	else
 	{
 		/* scanning table sequentially */
@@ -2322,20 +2325,35 @@ heapam_scan_sample_next_block(TableScanDesc scan, SampleScanState *scanstate)
 		}
 	}
 
-	if (!BlockNumberIsValid(blockno))
+	hscan->rs_cblock = blockno;
+
+	if (!BlockNumberIsValid(hscan->rs_cblock))
 	{
-		if (BufferIsValid(hscan->rs_cbuf))
-			ReleaseBuffer(hscan->rs_cbuf);
-		hscan->rs_cbuf = InvalidBuffer;
-		hscan->rs_cblock = InvalidBlockNumber;
 		hscan->rs_inited = false;
-
 		return false;
 	}
 
-	heapgetpage(scan, blockno);
-	hscan->rs_inited = true;
+	Assert(hscan->rs_cblock < hscan->rs_nblocks);
+
+	/*
+	 * We may scan multiple pages before finding tuples to yield or finishing
+	 * the scan. Since we want to check for interrupts at least once per page,
+	 * do so here.
+	 */
+	CHECK_FOR_INTERRUPTS();
+
+	/* Read page using selected strategy */
+	hscan->rs_cbuf = ReadBufferExtended(hscan->rs_base.rs_rd, MAIN_FORKNUM,
+										hscan->rs_cblock, RBM_NORMAL, hscan->rs_strategy);
 
+	/*
+	 * If pagemode is allowed, prune the page and build an array of visible
+	 * tuple offsets.
+	 */
+	if (hscan->rs_base.rs_flags & SO_ALLOW_PAGEMODE)
+		heapbuildvis(scan);
+
+	hscan->rs_inited = true;
 	return true;
 }
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 4b133f68593..e2b1b2a3ad9 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -246,7 +246,7 @@ extern TableScanDesc heap_beginscan(Relation relation, Snapshot snapshot,
 									uint32 flags);
 extern void heap_setscanlimits(TableScanDesc sscan, BlockNumber startBlk,
 							   BlockNumber numBlks);
-extern void heapgetpage(TableScanDesc sscan, BlockNumber block);
+extern void heapbuildvis(TableScanDesc sscan);
 extern void heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
 						bool allow_strat, bool allow_sync, bool allow_pagemode);
 extern void heap_endscan(TableScanDesc sscan);
-- 
2.37.2
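
In short: in page-at-a-time mode, what a single heapgetpage() call used to
do is now spelled as the following pair (a restatement of the new call
sites in the hunk above, not additional code):

    heapfetchbuf(scan, block);          /* pin the block into scan->rs_cbuf */
    heapbuildvis((TableScanDesc) scan); /* prune and record visible tuple offsets */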

v3-0002-Replace-blocks-with-buffers-in-heapgettup-control.patch (text/x-diff; charset=us-ascii)
From 4b6d9059aa32625e91d38f1d414ec514d4073197 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 29 Jan 2024 11:50:01 -0500
Subject: [PATCH v3 2/5] Replace blocks with buffers in heapgettup control flow

Future commits will introduce the streaming read API and the sequential
scan streaming read API user. Streaming read API users implement a
callback which returns the next block to read. Sequential scans
previously looped through the blocks in the relation, synchronously
reading in a block and then processing it. An InvalidBlockNumber
returned by heapgettup_advance_block() meant that the relation was
exhausted and all blocks had been processed.

The streaming read API may exhaust the blocks in a relation (having read
all of them into buffers) before they have all been processed by the
sequential scan. As such, the sequential scan should continue processing
blocks until heapfetchbuf() returns InvalidBuffer.

Note that this commit does not implement the streaming read API user. It
simply restructures heapgettup() and heapgettup_pagemode() to use
buffers instead of blocks for control flow.

Not all sequential scans will support streaming reads. As such, this
code will remain for compatibility even after sequential scans support
streaming reads.
---
 src/backend/access/heap/heapam.c | 79 ++++++++++++++------------------
 1 file changed, 35 insertions(+), 44 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 449221da6ac..e0fe3d9c326 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -87,6 +87,9 @@ static Bitmapset *HeapDetermineColumnsInfo(Relation relation,
 static bool heap_acquire_tuplock(Relation relation, ItemPointer tid,
 								 LockTupleMode mode, LockWaitPolicy wait_policy,
 								 bool *have_tuple_lock);
+static inline BlockNumber heapgettup_advance_block(HeapScanDesc scan,
+												   BlockNumber block, ScanDirection dir);
+static inline BlockNumber heapgettup_initial_block(HeapScanDesc scan, ScanDirection dir);
 static void compute_new_xmax_infomask(TransactionId xmax, uint16 old_infomask,
 									  uint16 old_infomask2, TransactionId add_to_xmax,
 									  LockTupleMode mode, bool is_update,
@@ -463,14 +466,12 @@ heapbuildvis(TableScanDesc sscan)
 /*
  * heapfetchbuf - subroutine for heapgettup()
  *
- * This routine reads the specified block of the relation into a buffer and
- * returns with that pinned buffer saved in the scan descriptor.
+ * This routine reads the next block of the relation into a buffer and returns
+ * with that pinned buffer saved in the scan descriptor.
  */
 static inline void
-heapfetchbuf(HeapScanDesc scan, BlockNumber block)
+heapfetchbuf(HeapScanDesc scan, ScanDirection dir)
 {
-	Assert(block < scan->rs_nblocks);
-
 	/* release previous scan buffer, if any */
 	if (BufferIsValid(scan->rs_cbuf))
 	{
@@ -485,10 +486,19 @@ heapfetchbuf(HeapScanDesc scan, BlockNumber block)
 	 */
 	CHECK_FOR_INTERRUPTS();
 
-	/* read page using selected strategy */
-	scan->rs_cbuf = ReadBufferExtended(scan->rs_base.rs_rd, MAIN_FORKNUM, block,
-									   RBM_NORMAL, scan->rs_strategy);
-	scan->rs_cblock = block;
+	if (!scan->rs_inited)
+	{
+		scan->rs_cblock = heapgettup_initial_block(scan, dir);
+		Assert(scan->rs_cblock != InvalidBlockNumber || !BufferIsValid(scan->rs_cbuf));
+		scan->rs_inited = true;
+	}
+	else
+		scan->rs_cblock = heapgettup_advance_block(scan, scan->rs_cblock, dir);
+
+	/* read block if valid */
+	if (BlockNumberIsValid(scan->rs_cblock))
+		scan->rs_cbuf = ReadBufferExtended(scan->rs_base.rs_rd, MAIN_FORKNUM,
+										   scan->rs_cblock, RBM_NORMAL, scan->rs_strategy);
 }
 
 /*
@@ -498,7 +508,7 @@ heapfetchbuf(HeapScanDesc scan, BlockNumber block)
  * occur with empty tables and in parallel scans when parallel workers get all
  * of the pages before we can get a chance to get our first page.
  */
-static BlockNumber
+BlockNumber
 heapgettup_initial_block(HeapScanDesc scan, ScanDirection dir)
 {
 	Assert(!scan->rs_inited);
@@ -638,7 +648,7 @@ heapgettup_continue_page(HeapScanDesc scan, ScanDirection dir, int *linesleft,
  * This also adjusts rs_numblocks when a limit has been imposed by
  * heap_setscanlimits().
  */
-static inline BlockNumber
+BlockNumber
 heapgettup_advance_block(HeapScanDesc scan, BlockNumber block, ScanDirection dir)
 {
 	if (ScanDirectionIsForward(dir))
@@ -736,23 +746,13 @@ heapgettup(HeapScanDesc scan,
 		   ScanKey key)
 {
 	HeapTuple	tuple = &(scan->rs_ctup);
-	BlockNumber block;
 	Page		page;
 	OffsetNumber lineoff;
 	int			linesleft;
 
-	if (unlikely(!scan->rs_inited))
-	{
-		block = heapgettup_initial_block(scan, dir);
-		/* ensure rs_cbuf is invalid when we get InvalidBlockNumber */
-		Assert(block != InvalidBlockNumber || !BufferIsValid(scan->rs_cbuf));
-		scan->rs_inited = true;
-	}
-	else
+	if (scan->rs_inited)
 	{
 		/* continue from previously returned page/tuple */
-		block = scan->rs_cblock;
-
 		LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
 		page = heapgettup_continue_page(scan, dir, &linesleft, &lineoff);
 		goto continue_page;
@@ -762,9 +762,12 @@ heapgettup(HeapScanDesc scan,
 	 * advance the scan until we find a qualifying tuple or run out of stuff
 	 * to scan
 	 */
-	while (block != InvalidBlockNumber)
+	while (true)
 	{
-		heapfetchbuf(scan, block);
+		heapfetchbuf(scan, dir);
+		if (!BufferIsValid(scan->rs_cbuf))
+			break;
+		Assert(BufferGetBlockNumber(scan->rs_cbuf) == scan->rs_cblock);
 		LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
 		page = heapgettup_start_page(scan, dir, &linesleft, &lineoff);
 continue_page:
@@ -786,7 +789,7 @@ continue_page:
 
 			tuple->t_data = (HeapTupleHeader) PageGetItem(page, lpp);
 			tuple->t_len = ItemIdGetLength(lpp);
-			ItemPointerSet(&(tuple->t_self), block, lineoff);
+			ItemPointerSet(&(tuple->t_self), scan->rs_cblock, lineoff);
 
 			visible = HeapTupleSatisfiesVisibility(tuple,
 												   scan->rs_base.rs_snapshot,
@@ -816,9 +819,6 @@ continue_page:
 		 * it's time to move to the next.
 		 */
 		LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
-
-		/* get the BlockNumber to scan next */
-		block = heapgettup_advance_block(scan, block, dir);
 	}
 
 	/* end of scan */
@@ -851,22 +851,13 @@ heapgettup_pagemode(HeapScanDesc scan,
 					ScanKey key)
 {
 	HeapTuple	tuple = &(scan->rs_ctup);
-	BlockNumber block;
 	Page		page;
 	int			lineindex;
 	int			linesleft;
 
-	if (unlikely(!scan->rs_inited))
-	{
-		block = heapgettup_initial_block(scan, dir);
-		/* ensure rs_cbuf is invalid when we get InvalidBlockNumber */
-		Assert(block != InvalidBlockNumber || !BufferIsValid(scan->rs_cbuf));
-		scan->rs_inited = true;
-	}
-	else
+	if (scan->rs_inited)
 	{
 		/* continue from previously returned page/tuple */
-		block = scan->rs_cblock;	/* current page */
 		page = BufferGetPage(scan->rs_cbuf);
 
 		lineindex = scan->rs_cindex + dir;
@@ -883,9 +874,12 @@ heapgettup_pagemode(HeapScanDesc scan,
 	 * advance the scan until we find a qualifying tuple or run out of stuff
 	 * to scan
 	 */
-	while (block != InvalidBlockNumber)
+	while (true)
 	{
-		heapfetchbuf(scan, block);
+		heapfetchbuf(scan, dir);
+		if (!BufferIsValid(scan->rs_cbuf))
+			break;
+		Assert(BufferGetBlockNumber(scan->rs_cbuf) == scan->rs_cblock);
 		heapbuildvis((TableScanDesc) scan);
 		page = BufferGetPage(scan->rs_cbuf);
 		linesleft = scan->rs_ntuples;
@@ -905,7 +899,7 @@ continue_page:
 
 			tuple->t_data = (HeapTupleHeader) PageGetItem(page, lpp);
 			tuple->t_len = ItemIdGetLength(lpp);
-			ItemPointerSet(&(tuple->t_self), block, lineoff);
+			ItemPointerSet(&(tuple->t_self), scan->rs_cblock, lineoff);
 
 			/* skip any tuples that don't match the scan key */
 			if (key != NULL &&
@@ -916,9 +910,6 @@ continue_page:
 			scan->rs_cindex = lineindex;
 			return;
 		}
-
-		/* get the BlockNumber to scan next */
-		block = heapgettup_advance_block(scan, block, dir);
 	}
 
 	/* end of scan */
-- 
2.37.2

v3-0003-Streaming-Read-API.patch (text/x-diff; charset=us-ascii)
From ca3b34481f5d1404ff0fdb60c65ec0b7befd86fb Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Mon, 26 Feb 2024 23:48:31 +1300
Subject: [PATCH v3 3/5] Streaming Read API

---
 contrib/pg_prewarm/pg_prewarm.c          |  40 +-
 src/backend/storage/Makefile             |   2 +-
 src/backend/storage/aio/Makefile         |  14 +
 src/backend/storage/aio/meson.build      |   5 +
 src/backend/storage/aio/streaming_read.c | 612 ++++++++++++++++++++++
 src/backend/storage/buffer/bufmgr.c      | 641 ++++++++++++++++-------
 src/backend/storage/buffer/localbuf.c    |  14 +-
 src/backend/storage/meson.build          |   1 +
 src/include/storage/bufmgr.h             |  45 ++
 src/include/storage/streaming_read.h     |  52 ++
 src/tools/pgindent/typedefs.list         |   3 +
 11 files changed, 1218 insertions(+), 211 deletions(-)
 create mode 100644 src/backend/storage/aio/Makefile
 create mode 100644 src/backend/storage/aio/meson.build
 create mode 100644 src/backend/storage/aio/streaming_read.c
 create mode 100644 src/include/storage/streaming_read.h

diff --git a/contrib/pg_prewarm/pg_prewarm.c b/contrib/pg_prewarm/pg_prewarm.c
index 8541e4d6e46..1cc84bcb0c2 100644
--- a/contrib/pg_prewarm/pg_prewarm.c
+++ b/contrib/pg_prewarm/pg_prewarm.c
@@ -20,6 +20,7 @@
 #include "miscadmin.h"
 #include "storage/bufmgr.h"
 #include "storage/smgr.h"
+#include "storage/streaming_read.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
@@ -38,6 +39,25 @@ typedef enum
 
 static PGIOAlignedBlock blockbuffer;
 
+struct pg_prewarm_streaming_read_private
+{
+	BlockNumber blocknum;
+	int64		last_block;
+};
+
+static BlockNumber
+pg_prewarm_streaming_read_next(PgStreamingRead *pgsr,
+							   void *pgsr_private,
+							   void *per_buffer_data)
+{
+	struct pg_prewarm_streaming_read_private *p = pgsr_private;
+
+	if (p->blocknum <= p->last_block)
+		return p->blocknum++;
+
+	return InvalidBlockNumber;
+}
+
 /*
  * pg_prewarm(regclass, mode text, fork text,
  *			  first_block int8, last_block int8)
@@ -183,18 +203,36 @@ pg_prewarm(PG_FUNCTION_ARGS)
 	}
 	else if (ptype == PREWARM_BUFFER)
 	{
+		struct pg_prewarm_streaming_read_private p;
+		PgStreamingRead *pgsr;
+
 		/*
 		 * In buffer mode, we actually pull the data into shared_buffers.
 		 */
+
+		/* Set up the private state for our streaming buffer read callback. */
+		p.blocknum = first_block;
+		p.last_block = last_block;
+
+		pgsr = pg_streaming_read_buffer_alloc(PGSR_FLAG_FULL,
+											  &p,
+											  0,
+											  NULL,
+											  BMR_REL(rel),
+											  forkNumber,
+											  pg_prewarm_streaming_read_next);
+
 		for (block = first_block; block <= last_block; ++block)
 		{
 			Buffer		buf;
 
 			CHECK_FOR_INTERRUPTS();
-			buf = ReadBufferExtended(rel, forkNumber, block, RBM_NORMAL, NULL);
+			buf = pg_streaming_read_buffer_get_next(pgsr, NULL);
 			ReleaseBuffer(buf);
 			++blocks_done;
 		}
+		Assert(pg_streaming_read_buffer_get_next(pgsr, NULL) == InvalidBuffer);
+		pg_streaming_read_free(pgsr);
 	}
 
 	/* Close relation, release lock. */
diff --git a/src/backend/storage/Makefile b/src/backend/storage/Makefile
index 8376cdfca20..eec03f6f2b4 100644
--- a/src/backend/storage/Makefile
+++ b/src/backend/storage/Makefile
@@ -8,6 +8,6 @@ subdir = src/backend/storage
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
-SUBDIRS     = buffer file freespace ipc large_object lmgr page smgr sync
+SUBDIRS     = aio buffer file freespace ipc large_object lmgr page smgr sync
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/storage/aio/Makefile b/src/backend/storage/aio/Makefile
new file mode 100644
index 00000000000..bcab44c802f
--- /dev/null
+++ b/src/backend/storage/aio/Makefile
@@ -0,0 +1,14 @@
+#
+# Makefile for storage/aio
+#
+# src/backend/storage/aio/Makefile
+#
+
+subdir = src/backend/storage/aio
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = \
+	streaming_read.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/storage/aio/meson.build b/src/backend/storage/aio/meson.build
new file mode 100644
index 00000000000..39aef2a84a2
--- /dev/null
+++ b/src/backend/storage/aio/meson.build
@@ -0,0 +1,5 @@
+# Copyright (c) 2024, PostgreSQL Global Development Group
+
+backend_sources += files(
+  'streaming_read.c',
+)
diff --git a/src/backend/storage/aio/streaming_read.c b/src/backend/storage/aio/streaming_read.c
new file mode 100644
index 00000000000..71f2c4a70b6
--- /dev/null
+++ b/src/backend/storage/aio/streaming_read.c
@@ -0,0 +1,612 @@
+#include "postgres.h"
+
+#include "storage/streaming_read.h"
+#include "utils/rel.h"
+
+/*
+ * Element type for PgStreamingRead's circular array of block ranges.
+ */
+typedef struct PgStreamingReadRange
+{
+	bool		need_wait;
+	bool		advice_issued;
+	BlockNumber blocknum;
+	int			nblocks;
+	int			per_buffer_data_index;
+	Buffer		buffers[MAX_BUFFERS_PER_TRANSFER];
+	ReadBuffersOperation operation;
+} PgStreamingReadRange;
+
+/*
+ * Streaming read object.
+ */
+struct PgStreamingRead
+{
+	int			max_ios;
+	int			ios_in_progress;
+	int			max_pinned_buffers;
+	int			pinned_buffers;
+	int			pinned_buffers_trigger;
+	int			next_tail_buffer;
+	int			ramp_up_pin_limit;
+	int			ramp_up_pin_stall;
+	bool		finished;
+	bool		advice_enabled;
+	void	   *pgsr_private;
+	PgStreamingReadBufferCB callback;
+
+	BufferAccessStrategy strategy;
+	BufferManagerRelation bmr;
+	ForkNumber	forknum;
+
+	/* Sometimes we need to buffer one block for flow control. */
+	BlockNumber unget_blocknum;
+	void	   *unget_per_buffer_data;
+
+	/* Next expected block, for detecting sequential access. */
+	BlockNumber seq_blocknum;
+
+	/* Space for optional per-buffer private data. */
+	size_t		per_buffer_data_size;
+	void	   *per_buffer_data;
+
+	/* Circular buffer of ranges. */
+	int			size;
+	int			head;
+	int			tail;
+	PgStreamingReadRange ranges[FLEXIBLE_ARRAY_MEMBER];
+};
+
+static PgStreamingRead *
+pg_streaming_read_buffer_alloc_internal(int flags,
+										void *pgsr_private,
+										size_t per_buffer_data_size,
+										BufferAccessStrategy strategy)
+{
+	PgStreamingRead *pgsr;
+	int			size;
+	int			max_ios;
+	uint32		max_pinned_buffers;
+
+
+	/*
+	 * Decide how many assumed I/Os we will allow to run concurrently.  That
+	 * is, advice to the kernel to tell it that we will soon read.  This
+	 * number also affects how far we look ahead for opportunities to start
+	 * more I/Os.
+	 */
+	if (flags & PGSR_FLAG_MAINTENANCE)
+		max_ios = maintenance_io_concurrency;
+	else
+		max_ios = effective_io_concurrency;
+
+	/*
+	 * The desired level of I/O concurrency controls how far ahead we are
+	 * willing to look ahead.  We also clamp it to at least
+	 * MAX_BUFFERS_PER_TRANSFER so that we can have a chance to build up a full
+	 * sized read, even when max_ios is zero.
+	 */
+	max_pinned_buffers = Max(max_ios * 4, MAX_BUFFERS_PER_TRANSFER);
+
+	/*
+	 * The *_io_concurrency GUCs might be set to 0, but we want to allow at
+	 * least one, to keep our gating logic simple.
+	 */
+	max_ios = Max(max_ios, 1);
+
+	/*
+	 * Don't allow this backend to pin too many buffers.  For now we'll apply
+	 * the limit for the shared buffer pool and the local buffer pool, without
+	 * worrying which it is.
+	 */
+	LimitAdditionalPins(&max_pinned_buffers);
+	LimitAdditionalLocalPins(&max_pinned_buffers);
+	Assert(max_pinned_buffers > 0);
+
+	/*
+	 * pgsr->ranges is a circular buffer.  When it is empty, head == tail.
+	 * When it is full, there is an empty element between head and tail.  Head
+	 * can also be empty (nblocks == 0), therefore we need two extra elements
+	 * for non-occupied ranges, on top of max_pinned_buffers to allow for the
+	 * maximum possible number of occupied ranges of the smallest possible
+	 * size of one.
+	 */
+	size = max_pinned_buffers + 2;
+
+	pgsr = (PgStreamingRead *)
+		palloc0(offsetof(PgStreamingRead, ranges) +
+				sizeof(pgsr->ranges[0]) * size);
+
+	pgsr->max_ios = max_ios;
+	pgsr->per_buffer_data_size = per_buffer_data_size;
+	pgsr->max_pinned_buffers = max_pinned_buffers;
+	pgsr->pgsr_private = pgsr_private;
+	pgsr->strategy = strategy;
+	pgsr->size = size;
+
+	pgsr->unget_blocknum = InvalidBlockNumber;
+
+#ifdef USE_PREFETCH
+
+	/*
+	 * This system supports prefetching advice.  As long as direct I/O isn't
+	 * enabled, and the caller hasn't promised sequential access, we can use
+	 * it.
+	 */
+	if ((io_direct_flags & IO_DIRECT_DATA) == 0 &&
+		(flags & PGSR_FLAG_SEQUENTIAL) == 0)
+		pgsr->advice_enabled = true;
+#endif
+
+	/*
+	 * We start off building small ranges, but double that quickly, for the
+	 * benefit of users that don't know how far ahead they'll read.  This can
+	 * be disabled by users that already know they'll read all the way.
+	 */
+	if (flags & PGSR_FLAG_FULL)
+		pgsr->ramp_up_pin_limit = INT_MAX;
+	else
+		pgsr->ramp_up_pin_limit = 1;
+
+	/*
+	 * We want to avoid creating ranges that are smaller than they could be
+	 * just because we hit max_pinned_buffers.  We only look ahead when the
+	 * number of pinned buffers falls below this trigger number, or put
+	 * another way, we stop looking ahead when we wouldn't be able to build a
+	 * "full sized" range.
+	 */
+	pgsr->pinned_buffers_trigger =
+		Max(1, (int) max_pinned_buffers - MAX_BUFFERS_PER_TRANSFER);
+
+	/* Space for the callback to store extra data along with each block. */
+	if (per_buffer_data_size)
+		pgsr->per_buffer_data = palloc(per_buffer_data_size * max_pinned_buffers);
+
+	return pgsr;
+}
+
+/*
+ * Create a new streaming read object that can be used to perform the
+ * equivalent of a series of ReadBuffer() calls for one fork of one relation.
+ * Internally, it generates larger vectored reads where possible by looking
+ * ahead.
+ */
+PgStreamingRead *
+pg_streaming_read_buffer_alloc(int flags,
+							   void *pgsr_private,
+							   size_t per_buffer_data_size,
+							   BufferAccessStrategy strategy,
+							   BufferManagerRelation bmr,
+							   ForkNumber forknum,
+							   PgStreamingReadBufferCB next_block_cb)
+{
+	PgStreamingRead *result;
+
+	result = pg_streaming_read_buffer_alloc_internal(flags,
+													 pgsr_private,
+													 per_buffer_data_size,
+													 strategy);
+	result->callback = next_block_cb;
+	result->bmr = bmr;
+	result->forknum = forknum;
+
+	return result;
+}
+
+/*
+ * Find the per-buffer data index for the Nth block of a range.
+ */
+static int
+get_per_buffer_data_index(PgStreamingRead *pgsr, PgStreamingReadRange *range, int n)
+{
+	int			result;
+
+	/*
+	 * Find slot in the circular buffer of per-buffer data, without using the
+	 * expensive % operator.
+	 */
+	result = range->per_buffer_data_index + n;
+	if (result >= pgsr->max_pinned_buffers)
+		result -= pgsr->max_pinned_buffers;
+	Assert(result == (range->per_buffer_data_index + n) % pgsr->max_pinned_buffers);
+
+	return result;
+}
+
+/*
+ * Return a pointer to the per-buffer data by index.
+ */
+static void *
+get_per_buffer_data_by_index(PgStreamingRead *pgsr, int per_buffer_data_index)
+{
+	return (char *) pgsr->per_buffer_data +
+		pgsr->per_buffer_data_size * per_buffer_data_index;
+}
+
+/*
+ * Return a pointer to the per-buffer data for the Nth block of a range.
+ */
+static void *
+get_per_buffer_data(PgStreamingRead *pgsr, PgStreamingReadRange *range, int n)
+{
+	return get_per_buffer_data_by_index(pgsr,
+										get_per_buffer_data_index(pgsr,
+																  range,
+																  n));
+}
+
+/*
+ * Start reading the head range, and create a new head range.  The new head
+ * range is returned.  It may not be empty, if StartReadBuffers() couldn't
+ * start the entire range; in that case the returned range contains the
+ * remaining portion of the range.
+ */
+static PgStreamingReadRange *
+pg_streaming_read_start_head_range(PgStreamingRead *pgsr)
+{
+	PgStreamingReadRange *head_range;
+	PgStreamingReadRange *new_head_range;
+	int			nblocks_pinned;
+	int			flags;
+
+	/* Caller should make sure we never exceed max_ios. */
+	Assert(pgsr->ios_in_progress < pgsr->max_ios);
+
+	/* Should only call if the head range has some blocks to read. */
+	head_range = &pgsr->ranges[pgsr->head];
+	Assert(head_range->nblocks > 0);
+
+	/*
+	 * If advice hasn't been suppressed, this system supports it, and this
+	 * isn't a strictly sequential pattern, then we'll issue advice.
+	 */
+	if (pgsr->advice_enabled && head_range->blocknum != pgsr->seq_blocknum)
+		flags = READ_BUFFERS_ISSUE_ADVICE;
+	else
+		flags = 0;
+
+
+	/* Start reading as many blocks as we can from the head range. */
+	nblocks_pinned = head_range->nblocks;
+	head_range->need_wait =
+		StartReadBuffers(pgsr->bmr,
+						 head_range->buffers,
+						 pgsr->forknum,
+						 head_range->blocknum,
+						 &nblocks_pinned,
+						 pgsr->strategy,
+						 flags,
+						 &head_range->operation);
+
+	/* Did that start an I/O? */
+	if (head_range->need_wait && (flags & READ_BUFFERS_ISSUE_ADVICE))
+	{
+		head_range->advice_issued = true;
+		pgsr->ios_in_progress++;
+		Assert(pgsr->ios_in_progress <= pgsr->max_ios);
+	}
+
+	/*
+	 * StartReadBuffers() might have pinned fewer blocks than we asked it to,
+	 * but always at least one.
+	 */
+	Assert(nblocks_pinned <= head_range->nblocks);
+	Assert(nblocks_pinned >= 1);
+	pgsr->pinned_buffers += nblocks_pinned;
+
+	/*
+	 * Remember where the next block would be after that, so we can detect
+	 * sequential access next time.
+	 */
+	pgsr->seq_blocknum = head_range->blocknum + nblocks_pinned;
+
+	/*
+	 * Create a new head range.  There must be space, because we have enough
+	 * elements for every range to hold just one block, up to the pin limit.
+	 */
+	Assert(pgsr->size > pgsr->max_pinned_buffers);
+	Assert((pgsr->head + 1) % pgsr->size != pgsr->tail);
+	if (++pgsr->head == pgsr->size)
+		pgsr->head = 0;
+	new_head_range = &pgsr->ranges[pgsr->head];
+	new_head_range->nblocks = 0;
+	new_head_range->advice_issued = false;
+
+	/*
+	 * If we didn't manage to start the whole read above, we split the range,
+	 * moving the remainder into the new head range.
+	 */
+	if (nblocks_pinned < head_range->nblocks)
+	{
+		int			nblocks_remaining = head_range->nblocks - nblocks_pinned;
+
+		head_range->nblocks = nblocks_pinned;
+
+		new_head_range->blocknum = head_range->blocknum + nblocks_pinned;
+		new_head_range->nblocks = nblocks_remaining;
+	}
+
+	/* The new range has per-buffer data starting after the previous range. */
+	new_head_range->per_buffer_data_index =
+		get_per_buffer_data_index(pgsr, head_range, nblocks_pinned);
+
+	return new_head_range;
+}
+
+/*
+ * Ask the callback which block it would like us to read next, with a small
+ * buffer in front to allow pg_streaming_unget_block() to work.
+ */
+static BlockNumber
+pg_streaming_get_block(PgStreamingRead *pgsr, void *per_buffer_data)
+{
+	BlockNumber result;
+
+	if (unlikely(pgsr->unget_blocknum != InvalidBlockNumber))
+	{
+		/*
+		 * If we had to unget a block, now it is time to return that one
+		 * again.
+		 */
+		result = pgsr->unget_blocknum;
+		pgsr->unget_blocknum = InvalidBlockNumber;
+
+		/*
+		 * The same per_buffer_data element must have been used, and still
+		 * contains whatever data the callback wrote into it.  So we just
+		 * sanity-check that we were called with the value that
+		 * pg_streaming_unget_block() pushed back.
+		 */
+		Assert(per_buffer_data == pgsr->unget_per_buffer_data);
+	}
+	else
+	{
+		/* Use the installed callback directly. */
+		result = pgsr->callback(pgsr, pgsr->pgsr_private, per_buffer_data);
+	}
+
+	return result;
+}
+
+/*
+ * In order to deal with short reads in StartReadBuffers(), we sometimes need
+ * to defer handling of a block until later.  This *must* be called with the
+ * last value returned by pg_streaming_get_block().
+ */
+static void
+pg_streaming_unget_block(PgStreamingRead *pgsr, BlockNumber blocknum, void *per_buffer_data)
+{
+	Assert(pgsr->unget_blocknum == InvalidBlockNumber);
+	pgsr->unget_blocknum = blocknum;
+	pgsr->unget_per_buffer_data = per_buffer_data;
+}
+
+static void
+pg_streaming_read_look_ahead(PgStreamingRead *pgsr)
+{
+	PgStreamingReadRange *range;
+
+	/*
+	 * If we're still ramping up, we may have to stall to wait for buffers to
+	 * be consumed first before we do any more prefetching.
+	 */
+	if (pgsr->ramp_up_pin_stall > 0)
+	{
+		Assert(pgsr->pinned_buffers > 0);
+		return;
+	}
+
+	/*
+	 * If we're finished or can't start more I/O, then don't look ahead.
+	 */
+	if (pgsr->finished || pgsr->ios_in_progress == pgsr->max_ios)
+		return;
+
+	/*
+	 * We'll also wait until the number of pinned buffers falls below our
+	 * trigger level, so that we have the chance to create a full range.
+	 */
+	if (pgsr->pinned_buffers >= pgsr->pinned_buffers_trigger)
+		return;
+
+	do
+	{
+		BlockNumber blocknum;
+		void	   *per_buffer_data;
+
+		/* Do we have a full-sized range? */
+		range = &pgsr->ranges[pgsr->head];
+		if (range->nblocks == lengthof(range->buffers))
+		{
+			/* Start as much of it as we can. */
+			range = pg_streaming_read_start_head_range(pgsr);
+
+			/* If we're now at the I/O limit, stop here. */
+			if (pgsr->ios_in_progress == pgsr->max_ios)
+				return;
+
+			/*
+			 * If we couldn't form a full range, then stop here to avoid
+			 * creating small I/O.
+			 */
+			if (pgsr->pinned_buffers >= pgsr->pinned_buffers_trigger)
+				return;
+
+			/*
+			 * That might have only been partially started, but always
+			 * processes at least one so that'll do for now.
+			 */
+			Assert(range->nblocks < lengthof(range->buffers));
+		}
+
+		/* Find per-buffer data slot for the next block. */
+		per_buffer_data = get_per_buffer_data(pgsr, range, range->nblocks);
+
+		/* Find out which block the callback wants to read next. */
+		blocknum = pg_streaming_get_block(pgsr, per_buffer_data);
+		if (blocknum == InvalidBlockNumber)
+		{
+			/* End of stream. */
+			pgsr->finished = true;
+			break;
+		}
+
+		/*
+		 * Is there a head range that we cannot extend, because the requested
+		 * block is not consecutive?
+		 */
+		if (range->nblocks > 0 &&
+			range->blocknum + range->nblocks != blocknum)
+		{
+			/* Yes.  Start it, so we can begin building a new one. */
+			range = pg_streaming_read_start_head_range(pgsr);
+
+			/*
+			 * It's possible that it was only partially started, and we have a
+			 * new range with the remainder.  Keep starting I/Os until we get
+			 * it all out of the way, or we hit the I/O limit.
+			 */
+			while (range->nblocks > 0 && pgsr->ios_in_progress < pgsr->max_ios)
+				range = pg_streaming_read_start_head_range(pgsr);
+
+			/*
+			 * We have to 'unget' the block returned by the callback if we
+			 * don't have enough I/O capacity left to start something.
+			 */
+			if (pgsr->ios_in_progress == pgsr->max_ios)
+			{
+				pg_streaming_unget_block(pgsr, blocknum, per_buffer_data);
+				return;
+			}
+		}
+
+		/* If we have a new, empty range, initialize the start block. */
+		if (range->nblocks == 0)
+		{
+			range->blocknum = blocknum;
+		}
+
+		/* This block extends the range by one. */
+		Assert(range->blocknum + range->nblocks == blocknum);
+		range->nblocks++;
+
+	} while (pgsr->pinned_buffers + range->nblocks < pgsr->max_pinned_buffers &&
+			 pgsr->pinned_buffers + range->nblocks < pgsr->ramp_up_pin_limit);
+
+	/* If we've hit the ramp-up limit, insert a stall. */
+	if (pgsr->pinned_buffers + range->nblocks >= pgsr->ramp_up_pin_limit)
+	{
+		/* Can't get here if an earlier stall hasn't finished. */
+		Assert(pgsr->ramp_up_pin_stall == 0);
+		/* Don't do any more prefetching until these buffers are consumed. */
+		pgsr->ramp_up_pin_stall = pgsr->ramp_up_pin_limit;
+		/* Double it.  It will soon be out of the way. */
+		pgsr->ramp_up_pin_limit *= 2;
+	}
+
+	/* Start as much as we can. */
+	while (range->nblocks > 0)
+	{
+		range = pg_streaming_read_start_head_range(pgsr);
+		if (pgsr->ios_in_progress == pgsr->max_ios)
+			break;
+	}
+}
+
+Buffer
+pg_streaming_read_buffer_get_next(PgStreamingRead *pgsr, void **per_buffer_data)
+{
+	pg_streaming_read_look_ahead(pgsr);
+
+	/* See if we have one buffer to return. */
+	while (pgsr->tail != pgsr->head)
+	{
+		PgStreamingReadRange *tail_range;
+
+		tail_range = &pgsr->ranges[pgsr->tail];
+
+		/*
+		 * Do we need to perform an I/O before returning the buffers from this
+		 * range?
+		 */
+		if (tail_range->need_wait)
+		{
+			WaitReadBuffers(&tail_range->operation);
+			tail_range->need_wait = false;
+
+			/*
+			 * We don't really know if the kernel generated a physical I/O
+			 * when we issued advice, let alone when it finished, but it has
+			 * certainly finished now because we've performed the read.
+			 */
+			if (tail_range->advice_issued)
+			{
+				Assert(pgsr->ios_in_progress > 0);
+				pgsr->ios_in_progress--;
+			}
+		}
+
+		/* Are there more buffers available in this range? */
+		if (pgsr->next_tail_buffer < tail_range->nblocks)
+		{
+			int			buffer_index;
+			Buffer		buffer;
+
+			buffer_index = pgsr->next_tail_buffer++;
+			buffer = tail_range->buffers[buffer_index];
+
+			Assert(BufferIsValid(buffer));
+
+			/* We are giving away ownership of this pinned buffer. */
+			Assert(pgsr->pinned_buffers > 0);
+			pgsr->pinned_buffers--;
+
+			if (pgsr->ramp_up_pin_stall > 0)
+				pgsr->ramp_up_pin_stall--;
+
+			if (per_buffer_data)
+				*per_buffer_data = get_per_buffer_data(pgsr, tail_range, buffer_index);
+
+			return buffer;
+		}
+
+		/* Advance tail to next range, if there is one. */
+		if (++pgsr->tail == pgsr->size)
+			pgsr->tail = 0;
+		pgsr->next_tail_buffer = 0;
+
+		/*
+		 * If tail crashed into head, and head is not empty, then it is time
+		 * to start that range.
+		 */
+		if (pgsr->tail == pgsr->head &&
+			pgsr->ranges[pgsr->head].nblocks > 0)
+			pg_streaming_read_start_head_range(pgsr);
+	}
+
+	Assert(pgsr->pinned_buffers == 0);
+
+	return InvalidBuffer;
+}
+
+void
+pg_streaming_read_free(PgStreamingRead *pgsr)
+{
+	Buffer		buffer;
+
+	/* Stop looking ahead. */
+	pgsr->finished = true;
+
+	/* Unpin anything that wasn't consumed. */
+	while ((buffer = pg_streaming_read_buffer_get_next(pgsr, NULL)) != InvalidBuffer)
+		ReleaseBuffer(buffer);
+
+	Assert(pgsr->pinned_buffers == 0);
+	Assert(pgsr->ios_in_progress == 0);
+
+	/* Release memory. */
+	if (pgsr->per_buffer_data)
+		pfree(pgsr->per_buffer_data);
+
+	pfree(pgsr);
+}
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index bdf89bbc4dc..3b1b0ad99df 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -19,6 +19,11 @@
  *		and pin it so that no one can destroy it while this process
  *		is using it.
  *
+ * StartReadBuffers() -- as above, but for multiple contiguous blocks in
+ *		two steps.
+ *
+ * WaitReadBuffers() -- second step of StartReadBuffers().
+ *
  * ReleaseBuffer() -- unpin a buffer
  *
  * MarkBufferDirty() -- mark a pinned buffer's contents as "dirty".
@@ -472,10 +477,9 @@ ForgetPrivateRefCountEntry(PrivateRefCountEntry *ref)
 )
 
 
-static Buffer ReadBuffer_common(SMgrRelation smgr, char relpersistence,
+static Buffer ReadBuffer_common(BufferManagerRelation bmr,
 								ForkNumber forkNum, BlockNumber blockNum,
-								ReadBufferMode mode, BufferAccessStrategy strategy,
-								bool *hit);
+								ReadBufferMode mode, BufferAccessStrategy strategy);
 static BlockNumber ExtendBufferedRelCommon(BufferManagerRelation bmr,
 										   ForkNumber fork,
 										   BufferAccessStrategy strategy,
@@ -501,7 +505,7 @@ static uint32 WaitBufHdrUnlocked(BufferDesc *buf);
 static int	SyncOneBuffer(int buf_id, bool skip_recently_used,
 						  WritebackContext *wb_context);
 static void WaitIO(BufferDesc *buf);
-static bool StartBufferIO(BufferDesc *buf, bool forInput);
+static bool StartBufferIO(BufferDesc *buf, bool forInput, bool nowait);
 static void TerminateBufferIO(BufferDesc *buf, bool clear_dirty,
 							  uint32 set_flag_bits, bool forget_owner);
 static void AbortBufferIO(Buffer buffer);
@@ -782,7 +786,6 @@ Buffer
 ReadBufferExtended(Relation reln, ForkNumber forkNum, BlockNumber blockNum,
 				   ReadBufferMode mode, BufferAccessStrategy strategy)
 {
-	bool		hit;
 	Buffer		buf;
 
 	/*
@@ -795,15 +798,9 @@ ReadBufferExtended(Relation reln, ForkNumber forkNum, BlockNumber blockNum,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("cannot access temporary tables of other sessions")));
 
-	/*
-	 * Read the buffer, and update pgstat counters to reflect a cache hit or
-	 * miss.
-	 */
-	pgstat_count_buffer_read(reln);
-	buf = ReadBuffer_common(RelationGetSmgr(reln), reln->rd_rel->relpersistence,
-							forkNum, blockNum, mode, strategy, &hit);
-	if (hit)
-		pgstat_count_buffer_hit(reln);
+	buf = ReadBuffer_common(BMR_REL(reln),
+							forkNum, blockNum, mode, strategy);
+
 	return buf;
 }
 
@@ -823,13 +820,12 @@ ReadBufferWithoutRelcache(RelFileLocator rlocator, ForkNumber forkNum,
 						  BlockNumber blockNum, ReadBufferMode mode,
 						  BufferAccessStrategy strategy, bool permanent)
 {
-	bool		hit;
-
 	SMgrRelation smgr = smgropen(rlocator, InvalidBackendId);
 
-	return ReadBuffer_common(smgr, permanent ? RELPERSISTENCE_PERMANENT :
-							 RELPERSISTENCE_UNLOGGED, forkNum, blockNum,
-							 mode, strategy, &hit);
+	return ReadBuffer_common(BMR_SMGR(smgr, permanent ? RELPERSISTENCE_PERMANENT :
+									  RELPERSISTENCE_UNLOGGED),
+							 forkNum, blockNum,
+							 mode, strategy);
 }
 
 /*
@@ -995,35 +991,68 @@ ExtendBufferedRelTo(BufferManagerRelation bmr,
 	 */
 	if (buffer == InvalidBuffer)
 	{
-		bool		hit;
-
 		Assert(extended_by == 0);
-		buffer = ReadBuffer_common(bmr.smgr, bmr.relpersistence,
-								   fork, extend_to - 1, mode, strategy,
-								   &hit);
+		buffer = ReadBuffer_common(bmr, fork, extend_to - 1, mode, strategy);
 	}
 
 	return buffer;
 }
 
+/*
+ * Zero a buffer and lock it, as part of the implementation of
+ * RBM_ZERO_AND_LOCK or RBM_ZERO_AND_CLEANUP_LOCK.  The buffer must be already
+ * pinned.  It does not have to be valid, but it is valid and locked on
+ * return.
+ */
+static void
+ZeroBuffer(Buffer buffer, ReadBufferMode mode)
+{
+	BufferDesc *bufHdr;
+	uint32		buf_state;
+
+	Assert(mode == RBM_ZERO_AND_LOCK || mode == RBM_ZERO_AND_CLEANUP_LOCK);
+
+	if (BufferIsLocal(buffer))
+		bufHdr = GetLocalBufferDescriptor(-buffer - 1);
+	else
+	{
+		bufHdr = GetBufferDescriptor(buffer - 1);
+		if (mode == RBM_ZERO_AND_LOCK)
+			LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
+		else
+			LockBufferForCleanup(buffer);
+	}
+
+	memset(BufferGetPage(buffer), 0, BLCKSZ);
+
+	if (BufferIsLocal(buffer))
+	{
+		buf_state = pg_atomic_read_u32(&bufHdr->state);
+		buf_state |= BM_VALID;
+		pg_atomic_unlocked_write_u32(&bufHdr->state, buf_state);
+	}
+	else
+	{
+		buf_state = LockBufHdr(bufHdr);
+		buf_state |= BM_VALID;
+		UnlockBufHdr(bufHdr, buf_state);
+	}
+}
+
 /*
  * ReadBuffer_common -- common logic for all ReadBuffer variants
  *
  * *hit is set to true if the request was satisfied from shared buffer cache.
  */
 static Buffer
-ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
+ReadBuffer_common(BufferManagerRelation bmr, ForkNumber forkNum,
 				  BlockNumber blockNum, ReadBufferMode mode,
-				  BufferAccessStrategy strategy, bool *hit)
+				  BufferAccessStrategy strategy)
 {
-	BufferDesc *bufHdr;
-	Block		bufBlock;
-	bool		found;
-	IOContext	io_context;
-	IOObject	io_object;
-	bool		isLocalBuf = SmgrIsTemp(smgr);
-
-	*hit = false;
+	ReadBuffersOperation operation;
+	Buffer		buffer;
+	int			nblocks;
+	int			flags;
 
 	/*
 	 * Backward compatibility path, most code should use ExtendBufferedRel()
@@ -1042,181 +1071,404 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 		if (mode == RBM_ZERO_AND_LOCK || mode == RBM_ZERO_AND_CLEANUP_LOCK)
 			flags |= EB_LOCK_FIRST;
 
-		return ExtendBufferedRel(BMR_SMGR(smgr, relpersistence),
-								 forkNum, strategy, flags);
+		return ExtendBufferedRel(bmr, forkNum, strategy, flags);
 	}
 
-	TRACE_POSTGRESQL_BUFFER_READ_START(forkNum, blockNum,
-									   smgr->smgr_rlocator.locator.spcOid,
-									   smgr->smgr_rlocator.locator.dbOid,
-									   smgr->smgr_rlocator.locator.relNumber,
-									   smgr->smgr_rlocator.backend);
+	nblocks = 1;
+	if (mode == RBM_ZERO_ON_ERROR)
+		flags = READ_BUFFERS_ZERO_ON_ERROR;
+	else
+		flags = 0;
+	if (StartReadBuffers(bmr,
+						 &buffer,
+						 forkNum,
+						 blockNum,
+						 &nblocks,
+						 strategy,
+						 flags,
+						 &operation))
+		WaitReadBuffers(&operation);
+	Assert(nblocks == 1);		/* single block can't be short */
+
+	if (mode == RBM_ZERO_AND_CLEANUP_LOCK || mode == RBM_ZERO_AND_LOCK)
+		ZeroBuffer(buffer, mode);
+
+	return buffer;
+}
+
+static Buffer
+PrepareReadBuffer(BufferManagerRelation bmr,
+				  ForkNumber forkNum,
+				  BlockNumber blockNum,
+				  BufferAccessStrategy strategy,
+				  bool *foundPtr)
+{
+	BufferDesc *bufHdr;
+	bool		isLocalBuf;
+	IOContext	io_context;
+	IOObject	io_object;
+
+	Assert(blockNum != P_NEW);
 
+	Assert(bmr.smgr);
+
+	isLocalBuf = SmgrIsTemp(bmr.smgr);
 	if (isLocalBuf)
 	{
-		/*
-		 * We do not use a BufferAccessStrategy for I/O of temporary tables.
-		 * However, in some cases, the "strategy" may not be NULL, so we can't
-		 * rely on IOContextForStrategy() to set the right IOContext for us.
-		 * This may happen in cases like CREATE TEMPORARY TABLE AS...
-		 */
 		io_context = IOCONTEXT_NORMAL;
 		io_object = IOOBJECT_TEMP_RELATION;
-		bufHdr = LocalBufferAlloc(smgr, forkNum, blockNum, &found);
-		if (found)
-			pgBufferUsage.local_blks_hit++;
-		else if (mode == RBM_NORMAL || mode == RBM_NORMAL_NO_LOG ||
-				 mode == RBM_ZERO_ON_ERROR)
-			pgBufferUsage.local_blks_read++;
 	}
 	else
 	{
-		/*
-		 * lookup the buffer.  IO_IN_PROGRESS is set if the requested block is
-		 * not currently in memory.
-		 */
 		io_context = IOContextForStrategy(strategy);
 		io_object = IOOBJECT_RELATION;
-		bufHdr = BufferAlloc(smgr, relpersistence, forkNum, blockNum,
-							 strategy, &found, io_context);
-		if (found)
-			pgBufferUsage.shared_blks_hit++;
-		else if (mode == RBM_NORMAL || mode == RBM_NORMAL_NO_LOG ||
-				 mode == RBM_ZERO_ON_ERROR)
-			pgBufferUsage.shared_blks_read++;
 	}
 
-	/* At this point we do NOT hold any locks. */
+	TRACE_POSTGRESQL_BUFFER_READ_START(forkNum, blockNum,
+									   bmr.smgr->smgr_rlocator.locator.spcOid,
+									   bmr.smgr->smgr_rlocator.locator.dbOid,
+									   bmr.smgr->smgr_rlocator.locator.relNumber,
+									   bmr.smgr->smgr_rlocator.backend);
 
-	/* if it was already in the buffer pool, we're done */
-	if (found)
+	ResourceOwnerEnlarge(CurrentResourceOwner);
+	if (isLocalBuf)
+	{
+		bufHdr = LocalBufferAlloc(bmr.smgr, forkNum, blockNum, foundPtr);
+		if (*foundPtr)
+			pgBufferUsage.local_blks_hit++;
+	}
+	else
+	{
+		bufHdr = BufferAlloc(bmr.smgr, bmr.relpersistence, forkNum, blockNum,
+							 strategy, foundPtr, io_context);
+		if (*foundPtr)
+			pgBufferUsage.shared_blks_hit++;
+	}
+	if (bmr.rel)
+	{
+		/*
+		 * While pgBufferUsage's "read" counter isn't bumped unless we reach
+		 * WaitReadBuffers() (so, not for hits, and not for buffers that are
+		 * zeroed instead), the per-relation stats always count them.
+		 */
+		pgstat_count_buffer_read(bmr.rel);
+		if (*foundPtr)
+			pgstat_count_buffer_hit(bmr.rel);
+	}
+	if (*foundPtr)
 	{
-		/* Just need to update stats before we exit */
-		*hit = true;
 		VacuumPageHit++;
 		pgstat_count_io_op(io_object, io_context, IOOP_HIT);
-
 		if (VacuumCostActive)
 			VacuumCostBalance += VacuumCostPageHit;
 
 		TRACE_POSTGRESQL_BUFFER_READ_DONE(forkNum, blockNum,
-										  smgr->smgr_rlocator.locator.spcOid,
-										  smgr->smgr_rlocator.locator.dbOid,
-										  smgr->smgr_rlocator.locator.relNumber,
-										  smgr->smgr_rlocator.backend,
-										  found);
+										  bmr.smgr->smgr_rlocator.locator.spcOid,
+										  bmr.smgr->smgr_rlocator.locator.dbOid,
+										  bmr.smgr->smgr_rlocator.locator.relNumber,
+										  bmr.smgr->smgr_rlocator.backend,
+										  true);
+	}
 
-		/*
-		 * In RBM_ZERO_AND_LOCK mode the caller expects the page to be locked
-		 * on return.
-		 */
-		if (!isLocalBuf)
-		{
-			if (mode == RBM_ZERO_AND_LOCK)
-				LWLockAcquire(BufferDescriptorGetContentLock(bufHdr),
-							  LW_EXCLUSIVE);
-			else if (mode == RBM_ZERO_AND_CLEANUP_LOCK)
-				LockBufferForCleanup(BufferDescriptorGetBuffer(bufHdr));
-		}
+	return BufferDescriptorGetBuffer(bufHdr);
+}
 
-		return BufferDescriptorGetBuffer(bufHdr);
+/*
+ * Begin reading a range of blocks beginning at blockNum and extending for
+ * *nblocks.  On return, up to *nblocks pinned buffers holding those blocks
+ * are written into the buffers array, and *nblocks is updated to contain the
+ * actual number, which may be fewer than requested.
+ *
+ * If false is returned, no I/O is necessary and WaitReadBuffers() need not
+ * be called.  If true is returned, one I/O has been started, and
+ * WaitReadBuffers() must be called with the same operation object before the
+ * buffers are accessed.  Along with the operation object, the caller-supplied
+ * array of buffers must remain valid until WaitReadBuffers() is called.
+ *
+ * Currently the I/O is only started with optional operating system advice,
+ * and the real I/O happens in WaitReadBuffers().  In future work, true I/O
+ * could be initiated here.
+ */
+bool
+StartReadBuffers(BufferManagerRelation bmr,
+				 Buffer *buffers,
+				 ForkNumber forkNum,
+				 BlockNumber blockNum,
+				 int *nblocks,
+				 BufferAccessStrategy strategy,
+				 int flags,
+				 ReadBuffersOperation *operation)
+{
+	int			actual_nblocks = *nblocks;
+
+	if (bmr.rel)
+	{
+		bmr.smgr = RelationGetSmgr(bmr.rel);
+		bmr.relpersistence = bmr.rel->rd_rel->relpersistence;
 	}
 
-	/*
-	 * if we have gotten to this point, we have allocated a buffer for the
-	 * page but its contents are not yet valid.  IO_IN_PROGRESS is set for it,
-	 * if it's a shared buffer.
-	 */
-	Assert(!(pg_atomic_read_u32(&bufHdr->state) & BM_VALID));	/* spinlock not needed */
+	operation->bmr = bmr;
+	operation->forknum = forkNum;
+	operation->blocknum = blockNum;
+	operation->buffers = buffers;
+	operation->nblocks = actual_nblocks;
+	operation->strategy = strategy;
+	operation->flags = flags;
 
-	bufBlock = isLocalBuf ? LocalBufHdrGetBlock(bufHdr) : BufHdrGetBlock(bufHdr);
+	operation->io_buffers_len = 0;
 
-	/*
-	 * Read in the page, unless the caller intends to overwrite it and just
-	 * wants us to allocate a buffer.
-	 */
-	if (mode == RBM_ZERO_AND_LOCK || mode == RBM_ZERO_AND_CLEANUP_LOCK)
-		MemSet((char *) bufBlock, 0, BLCKSZ);
-	else
+	for (int i = 0; i < actual_nblocks; ++i)
 	{
-		instr_time	io_start = pgstat_prepare_io_time(track_io_timing);
+		bool		found;
 
-		smgrread(smgr, forkNum, blockNum, bufBlock);
+		buffers[i] = PrepareReadBuffer(bmr,
+									   forkNum,
+									   blockNum + i,
+									   strategy,
+									   &found);
 
-		pgstat_count_io_op_time(io_object, io_context,
-								IOOP_READ, io_start, 1);
+		if (found)
+		{
+			/*
+			 * Terminate the read as soon as we get a hit.  It could be a
+			 * single buffer hit, or it could be a hit that follows a readable
+			 * range.  We don't want to create more than one readable range,
+			 * so we stop here.
+			 */
+			actual_nblocks = operation->nblocks = *nblocks = i + 1;
+		}
+		else
+		{
+			/* Extend the readable range to cover this block. */
+			operation->io_buffers_len++;
+		}
+	}
 
-		/* check for garbage data */
-		if (!PageIsVerifiedExtended((Page) bufBlock, blockNum,
-									PIV_LOG_WARNING | PIV_REPORT_STAT))
+	if (operation->io_buffers_len > 0)
+	{
+		if (flags & READ_BUFFERS_ISSUE_ADVICE)
 		{
-			if (mode == RBM_ZERO_ON_ERROR || zero_damaged_pages)
-			{
-				ereport(WARNING,
-						(errcode(ERRCODE_DATA_CORRUPTED),
-						 errmsg("invalid page in block %u of relation %s; zeroing out page",
-								blockNum,
-								relpath(smgr->smgr_rlocator, forkNum))));
-				MemSet((char *) bufBlock, 0, BLCKSZ);
-			}
-			else
-				ereport(ERROR,
-						(errcode(ERRCODE_DATA_CORRUPTED),
-						 errmsg("invalid page in block %u of relation %s",
-								blockNum,
-								relpath(smgr->smgr_rlocator, forkNum))));
+			/*
+			 * In theory we should only do this if PrepareReadBuffer() had to
+			 * allocate new buffers above.  That way, if two calls to
+			 * StartReadBuffers() were made for the same blocks before
+			 * WaitReadBuffers(), only the first would issue the advice.
+			 * That'd be a better simulation of true asynchronous I/O, which
+			 * would only start the I/O once, but isn't done here for
+			 * simplicity.  Note also that the following call might actually
+			 * issue two advice calls if we cross a segment boundary; in a
+			 * true asynchronous version we might choose to process only one
+			 * real I/O at a time in that case.
+			 */
+			smgrprefetch(bmr.smgr, forkNum, blockNum, operation->io_buffers_len);
 		}
+
+		/* Indicate that WaitReadBuffers() should be called. */
+		return true;
 	}
+	else
+	{
+		return false;
+	}
+}
 
-	/*
-	 * In RBM_ZERO_AND_LOCK / RBM_ZERO_AND_CLEANUP_LOCK mode, grab the buffer
-	 * content lock before marking the page as valid, to make sure that no
-	 * other backend sees the zeroed page before the caller has had a chance
-	 * to initialize it.
-	 *
-	 * Since no-one else can be looking at the page contents yet, there is no
-	 * difference between an exclusive lock and a cleanup-strength lock. (Note
-	 * that we cannot use LockBuffer() or LockBufferForCleanup() here, because
-	 * they assert that the buffer is already valid.)
-	 */
-	if ((mode == RBM_ZERO_AND_LOCK || mode == RBM_ZERO_AND_CLEANUP_LOCK) &&
-		!isLocalBuf)
+static inline bool
+WaitReadBuffersCanStartIO(Buffer buffer, bool nowait)
+{
+	if (BufferIsLocal(buffer))
 	{
-		LWLockAcquire(BufferDescriptorGetContentLock(bufHdr), LW_EXCLUSIVE);
+		BufferDesc *bufHdr = GetLocalBufferDescriptor(-buffer - 1);
+
+		return (pg_atomic_read_u32(&bufHdr->state) & BM_VALID) == 0;
 	}
+	else
+		return StartBufferIO(GetBufferDescriptor(buffer - 1), true, nowait);
+}
+
+void
+WaitReadBuffers(ReadBuffersOperation *operation)
+{
+	BufferManagerRelation bmr;
+	Buffer	   *buffers;
+	int			nblocks;
+	BlockNumber blocknum;
+	ForkNumber	forknum;
+	bool		isLocalBuf;
+	IOContext	io_context;
+	IOObject	io_object;
+
+	/*
+	 * Currently operations are only allowed to include a read of some range,
+	 * with an optional extra buffer that is already pinned at the end.  So
+	 * nblocks can be at most one more than io_buffers_len.
+	 */
+	Assert((operation->nblocks == operation->io_buffers_len) ||
+		   (operation->nblocks == operation->io_buffers_len + 1));
 
+	/* Find the range of the physical read we need to perform. */
+	nblocks = operation->io_buffers_len;
+	if (nblocks == 0)
+		return;					/* nothing to do */
+
+	buffers = &operation->buffers[0];
+	blocknum = operation->blocknum;
+	forknum = operation->forknum;
+	bmr = operation->bmr;
+
+	isLocalBuf = SmgrIsTemp(bmr.smgr);
 	if (isLocalBuf)
 	{
-		/* Only need to adjust flags */
-		uint32		buf_state = pg_atomic_read_u32(&bufHdr->state);
-
-		buf_state |= BM_VALID;
-		pg_atomic_unlocked_write_u32(&bufHdr->state, buf_state);
+		io_context = IOCONTEXT_NORMAL;
+		io_object = IOOBJECT_TEMP_RELATION;
 	}
 	else
 	{
-		/* Set BM_VALID, terminate IO, and wake up any waiters */
-		TerminateBufferIO(bufHdr, false, BM_VALID, true);
+		io_context = IOContextForStrategy(operation->strategy);
+		io_object = IOOBJECT_RELATION;
 	}
 
-	VacuumPageMiss++;
-	if (VacuumCostActive)
-		VacuumCostBalance += VacuumCostPageMiss;
+	/*
+	 * We count all these blocks as read by this backend.  This is traditional
+	 * behavior, but might turn out to be not true if we find that someone
+	 * else has beaten us and completed the read of some of these blocks.  In
+	 * that case the system globally double-counts, but we traditionally don't
+	 * count this as a "hit", and we don't have a separate counter for "miss,
+	 * but another backend completed the read".
+	 */
+	if (isLocalBuf)
+		pgBufferUsage.local_blks_read += nblocks;
+	else
+		pgBufferUsage.shared_blks_read += nblocks;
 
-	TRACE_POSTGRESQL_BUFFER_READ_DONE(forkNum, blockNum,
-									  smgr->smgr_rlocator.locator.spcOid,
-									  smgr->smgr_rlocator.locator.dbOid,
-									  smgr->smgr_rlocator.locator.relNumber,
-									  smgr->smgr_rlocator.backend,
-									  found);
+	for (int i = 0; i < nblocks; ++i)
+	{
+		int			io_buffers_len;
+		Buffer		io_buffers[MAX_BUFFERS_PER_TRANSFER];
+		void	   *io_pages[MAX_BUFFERS_PER_TRANSFER];
+		instr_time	io_start;
+		BlockNumber io_first_block;
 
-	return BufferDescriptorGetBuffer(bufHdr);
+		/*
+		 * Skip this block if someone else has already completed it.  If an
+		 * I/O is already in progress in another backend, this will wait for
+		 * the outcome: either done, or something went wrong and we will
+		 * retry.
+		 */
+		if (!WaitReadBuffersCanStartIO(buffers[i], false))
+		{
+			/*
+			 * Report this as a 'hit' for this backend, even though it must
+			 * have started out as a miss in PrepareReadBuffer().
+			 */
+			TRACE_POSTGRESQL_BUFFER_READ_DONE(forknum, blocknum + i,
+											  bmr.smgr->smgr_rlocator.locator.spcOid,
+											  bmr.smgr->smgr_rlocator.locator.dbOid,
+											  bmr.smgr->smgr_rlocator.locator.relNumber,
+											  bmr.smgr->smgr_rlocator.backend,
+											  true);
+			continue;
+		}
+
+		/* We found a buffer that we need to read in. */
+		io_buffers[0] = buffers[i];
+		io_pages[0] = BufferGetBlock(buffers[i]);
+		io_first_block = blocknum + i;
+		io_buffers_len = 1;
+
+		/*
+		 * How many neighboring-on-disk blocks can we scatter-read into
+		 * other buffers at the same time?  In this case we don't wait if we
+		 * see an I/O already in progress.  We already hold BM_IO_IN_PROGRESS
+		 * for the head block, so we should get on with that I/O as soon as
+		 * possible.  We'll come back to this block again, above.
+		 */
+		while ((i + 1) < nblocks &&
+			   WaitReadBuffersCanStartIO(buffers[i + 1], true))
+		{
+			/* Must be consecutive block numbers. */
+			Assert(BufferGetBlockNumber(buffers[i + 1]) ==
+				   BufferGetBlockNumber(buffers[i]) + 1);
+
+			io_buffers[io_buffers_len] = buffers[++i];
+			io_pages[io_buffers_len++] = BufferGetBlock(buffers[i]);
+		}
+
+		io_start = pgstat_prepare_io_time(track_io_timing);
+		smgrreadv(bmr.smgr, forknum, io_first_block, io_pages, io_buffers_len);
+		pgstat_count_io_op_time(io_object, io_context, IOOP_READ, io_start,
+								io_buffers_len);
+
+		/* Verify each block we read, and terminate the I/O. */
+		for (int j = 0; j < io_buffers_len; ++j)
+		{
+			BufferDesc *bufHdr;
+			Block		bufBlock;
+
+			if (isLocalBuf)
+			{
+				bufHdr = GetLocalBufferDescriptor(-io_buffers[j] - 1);
+				bufBlock = LocalBufHdrGetBlock(bufHdr);
+			}
+			else
+			{
+				bufHdr = GetBufferDescriptor(io_buffers[j] - 1);
+				bufBlock = BufHdrGetBlock(bufHdr);
+			}
+
+			/* check for garbage data */
+			if (!PageIsVerifiedExtended((Page) bufBlock, io_first_block + j,
+										PIV_LOG_WARNING | PIV_REPORT_STAT))
+			{
+				if ((operation->flags & READ_BUFFERS_ZERO_ON_ERROR) || zero_damaged_pages)
+				{
+					ereport(WARNING,
+							(errcode(ERRCODE_DATA_CORRUPTED),
+							 errmsg("invalid page in block %u of relation %s; zeroing out page",
+									io_first_block + j,
+									relpath(bmr.smgr->smgr_rlocator, forknum))));
+					memset(bufBlock, 0, BLCKSZ);
+				}
+				else
+					ereport(ERROR,
+							(errcode(ERRCODE_DATA_CORRUPTED),
+							 errmsg("invalid page in block %u of relation %s",
+									io_first_block + j,
+									relpath(bmr.smgr->smgr_rlocator, forknum))));
+			}
+
+			/* Terminate I/O and set BM_VALID. */
+			if (isLocalBuf)
+			{
+				uint32		buf_state = pg_atomic_read_u32(&bufHdr->state);
+
+				buf_state |= BM_VALID;
+				pg_atomic_unlocked_write_u32(&bufHdr->state, buf_state);
+			}
+			else
+			{
+				/* Set BM_VALID, terminate IO, and wake up any waiters */
+				TerminateBufferIO(bufHdr, false, BM_VALID, true);
+			}
+
+			/* Report I/Os as completing individually. */
+			TRACE_POSTGRESQL_BUFFER_READ_DONE(forknum, io_first_block + j,
+											  bmr.smgr->smgr_rlocator.locator.spcOid,
+											  bmr.smgr->smgr_rlocator.locator.dbOid,
+											  bmr.smgr->smgr_rlocator.locator.relNumber,
+											  bmr.smgr->smgr_rlocator.backend,
+											  false);
+		}
+
+		VacuumPageMiss += io_buffers_len;
+		if (VacuumCostActive)
+			VacuumCostBalance += VacuumCostPageMiss * io_buffers_len;
+	}
 }
 
 /*
- * BufferAlloc -- subroutine for ReadBuffer.  Handles lookup of a shared
- *		buffer.  If no buffer exists already, selects a replacement
- *		victim and evicts the old page, but does NOT read in new page.
+ * BufferAlloc -- subroutine for StartReadBuffers.  Handles lookup of a shared
+ *		buffer.  If no buffer exists already, selects a replacement victim and
+ *		evicts the old page, but does NOT read in new page.
  *
  * "strategy" can be a buffer replacement strategy object, or NULL for
  * the default strategy.  The selected buffer's usage_count is advanced when
@@ -1224,11 +1476,7 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
  *
  * The returned buffer is pinned and is already marked as holding the
  * desired page.  If it already did have the desired page, *foundPtr is
- * set true.  Otherwise, *foundPtr is set false and the buffer is marked
- * as IO_IN_PROGRESS; ReadBuffer will now need to do I/O to fill it.
- *
- * *foundPtr is actually redundant with the buffer's BM_VALID flag, but
- * we keep it for simplicity in ReadBuffer.
+ * set true.  Otherwise, *foundPtr is set false.
  *
  * io_context is passed as an output parameter to avoid calling
  * IOContextForStrategy() when there is a shared buffers hit and no IO
@@ -1287,19 +1535,10 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 		{
 			/*
 			 * We can only get here if (a) someone else is still reading in
-			 * the page, or (b) a previous read attempt failed.  We have to
-			 * wait for any active read attempt to finish, and then set up our
-			 * own read attempt if the page is still not BM_VALID.
-			 * StartBufferIO does it all.
+			 * the page, (b) a previous read attempt failed, or (c) someone
+			 * called StartReadBuffers() but not yet WaitReadBuffers().
 			 */
-			if (StartBufferIO(buf, true))
-			{
-				/*
-				 * If we get here, previous attempts to read the buffer must
-				 * have failed ... but we shall bravely try again.
-				 */
-				*foundPtr = false;
-			}
+			*foundPtr = false;
 		}
 
 		return buf;
@@ -1364,19 +1603,10 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 		{
 			/*
 			 * We can only get here if (a) someone else is still reading in
-			 * the page, or (b) a previous read attempt failed.  We have to
-			 * wait for any active read attempt to finish, and then set up our
-			 * own read attempt if the page is still not BM_VALID.
-			 * StartBufferIO does it all.
+			 * the page, (b) a previous read attempt failed, or (c) someone
+			 * called StartReadBuffers() but not yet WaitReadBuffers().
 			 */
-			if (StartBufferIO(existing_buf_hdr, true))
-			{
-				/*
-				 * If we get here, previous attempts to read the buffer must
-				 * have failed ... but we shall bravely try again.
-				 */
-				*foundPtr = false;
-			}
+			*foundPtr = false;
 		}
 
 		return existing_buf_hdr;
@@ -1408,15 +1638,9 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 	LWLockRelease(newPartitionLock);
 
 	/*
-	 * Buffer contents are currently invalid.  Try to obtain the right to
-	 * start I/O.  If StartBufferIO returns false, then someone else managed
-	 * to read it before we did, so there's nothing left for BufferAlloc() to
-	 * do.
+	 * Buffer contents are currently invalid.
 	 */
-	if (StartBufferIO(victim_buf_hdr, true))
-		*foundPtr = false;
-	else
-		*foundPtr = true;
+	*foundPtr = false;
 
 	return victim_buf_hdr;
 }
@@ -1770,7 +1994,7 @@ again:
  * pessimistic, but outside of toy-sized shared_buffers it should allow
  * sufficient pins.
  */
-static void
+void
 LimitAdditionalPins(uint32 *additional_pins)
 {
 	uint32		max_backends;
@@ -2035,7 +2259,7 @@ ExtendBufferedRelShared(BufferManagerRelation bmr,
 
 				buf_state &= ~BM_VALID;
 				UnlockBufHdr(existing_hdr, buf_state);
-			} while (!StartBufferIO(existing_hdr, true));
+			} while (!StartBufferIO(existing_hdr, true, false));
 		}
 		else
 		{
@@ -2058,7 +2282,7 @@ ExtendBufferedRelShared(BufferManagerRelation bmr,
 			LWLockRelease(partition_lock);
 
 			/* XXX: could combine the locked operations in it with the above */
-			StartBufferIO(victim_buf_hdr, true);
+			StartBufferIO(victim_buf_hdr, true, false);
 		}
 	}
 
@@ -2373,7 +2597,12 @@ PinBuffer(BufferDesc *buf, BufferAccessStrategy strategy)
 	else
 	{
 		/*
-		 * If we previously pinned the buffer, it must surely be valid.
+		 * If we previously pinned the buffer, it is likely to be valid, but
+		 * it may not be if StartReadBuffers() was called and
+		 * WaitReadBuffers() hasn't been called yet.  We'll check by loading
+		 * the flags without locking.  This is racy, but it's OK to return
+		 * false spuriously: when WaitReadBuffers() calls StartBufferIO(),
+		 * it'll see that it's now valid.
 		 *
 		 * Note: We deliberately avoid a Valgrind client request here.
 		 * Individual access methods can optionally superimpose buffer page
@@ -2382,7 +2611,7 @@ PinBuffer(BufferDesc *buf, BufferAccessStrategy strategy)
 		 * that the buffer page is legitimately non-accessible here.  We
 		 * cannot meddle with that.
 		 */
-		result = true;
+		result = (pg_atomic_read_u32(&buf->state) & BM_VALID) != 0;
 	}
 
 	ref->refcount++;
@@ -3450,7 +3679,7 @@ FlushBuffer(BufferDesc *buf, SMgrRelation reln, IOObject io_object,
 	 * someone else flushed the buffer before we could, so we need not do
 	 * anything.
 	 */
-	if (!StartBufferIO(buf, false))
+	if (!StartBufferIO(buf, false, false))
 		return;
 
 	/* Setup error traceback support for ereport() */
@@ -5185,9 +5414,15 @@ WaitIO(BufferDesc *buf)
  *
  * Returns true if we successfully marked the buffer as I/O busy,
  * false if someone else already did the work.
+ *
+ * If nowait is true, then we don't wait for an I/O to be finished by another
+ * backend.  In that case, false indicates either that the I/O was already
+ * finished, or is still in progress.  This is useful for callers that want to
+ * find out if they can perform the I/O as part of a larger operation, without
+ * waiting for the answer or distinguishing the reasons why not.
  */
 static bool
-StartBufferIO(BufferDesc *buf, bool forInput)
+StartBufferIO(BufferDesc *buf, bool forInput, bool nowait)
 {
 	uint32		buf_state;
 
@@ -5200,6 +5435,8 @@ StartBufferIO(BufferDesc *buf, bool forInput)
 		if (!(buf_state & BM_IO_IN_PROGRESS))
 			break;
 		UnlockBufHdr(buf, buf_state);
+		if (nowait)
+			return false;
 		WaitIO(buf);
 	}
 
diff --git a/src/backend/storage/buffer/localbuf.c b/src/backend/storage/buffer/localbuf.c
index 1f02fed250e..6956d4e5b49 100644
--- a/src/backend/storage/buffer/localbuf.c
+++ b/src/backend/storage/buffer/localbuf.c
@@ -109,10 +109,9 @@ PrefetchLocalBuffer(SMgrRelation smgr, ForkNumber forkNum,
  * LocalBufferAlloc -
  *	  Find or create a local buffer for the given page of the given relation.
  *
- * API is similar to bufmgr.c's BufferAlloc, except that we do not need
- * to do any locking since this is all local.   Also, IO_IN_PROGRESS
- * does not get set.  Lastly, we support only default access strategy
- * (hence, usage_count is always advanced).
+ * API is similar to bufmgr.c's BufferAlloc, except that we do not need to do
+ * any locking since this is all local.  We support only default access
+ * strategy (hence, usage_count is always advanced).
  */
 BufferDesc *
 LocalBufferAlloc(SMgrRelation smgr, ForkNumber forkNum, BlockNumber blockNum,
@@ -288,7 +287,7 @@ GetLocalVictimBuffer(void)
 }
 
 /* see LimitAdditionalPins() */
-static void
+void
 LimitAdditionalLocalPins(uint32 *additional_pins)
 {
 	uint32		max_pins;
@@ -298,9 +297,10 @@ LimitAdditionalLocalPins(uint32 *additional_pins)
 
 	/*
 	 * In contrast to LimitAdditionalPins() other backends don't play a role
-	 * here. We can allow up to NLocBuffer pins in total.
+	 * here. We can allow up to NLocBuffer pins in total, but NLocBuffer might
+	 * not be initialized yet, so read num_temp_buffers instead.
 	 */
-	max_pins = (NLocBuffer - NLocalPinnedBuffers);
+	max_pins = (num_temp_buffers - NLocalPinnedBuffers);
 
 	if (*additional_pins >= max_pins)
 		*additional_pins = max_pins;
diff --git a/src/backend/storage/meson.build b/src/backend/storage/meson.build
index 40345bdca27..739d13293fb 100644
--- a/src/backend/storage/meson.build
+++ b/src/backend/storage/meson.build
@@ -1,5 +1,6 @@
 # Copyright (c) 2022-2024, PostgreSQL Global Development Group
 
+subdir('aio')
 subdir('buffer')
 subdir('file')
 subdir('freespace')
diff --git a/src/include/storage/bufmgr.h b/src/include/storage/bufmgr.h
index d51d46d3353..b57f71f97e3 100644
--- a/src/include/storage/bufmgr.h
+++ b/src/include/storage/bufmgr.h
@@ -14,6 +14,7 @@
 #ifndef BUFMGR_H
 #define BUFMGR_H
 
+#include "port/pg_iovec.h"
 #include "storage/block.h"
 #include "storage/buf.h"
 #include "storage/bufpage.h"
@@ -158,6 +159,11 @@ extern PGDLLIMPORT int32 *LocalRefCount;
 #define BUFFER_LOCK_SHARE		1
 #define BUFFER_LOCK_EXCLUSIVE	2
 
+/*
+ * Maximum number of buffers for multi-buffer I/O functions.  This is set to
+ * allow 128kB transfers, unless BLCKSZ and IOV_MAX imply a smaller maximum.
+ */
+#define MAX_BUFFERS_PER_TRANSFER Min(PG_IOV_MAX, (128 * 1024) / BLCKSZ)
 
 /*
  * prototypes for functions in bufmgr.c
@@ -177,6 +183,42 @@ extern Buffer ReadBufferWithoutRelcache(RelFileLocator rlocator,
 										ForkNumber forkNum, BlockNumber blockNum,
 										ReadBufferMode mode, BufferAccessStrategy strategy,
 										bool permanent);
+
+#define READ_BUFFERS_ZERO_ON_ERROR 0x01
+#define READ_BUFFERS_ISSUE_ADVICE 0x02
+
+/*
+ * Private state used by StartReadBuffers() and WaitReadBuffers().  Declared
+ * in public header only to allow inclusion in other structs, but contents
+ * should not be accessed.
+ */
+struct ReadBuffersOperation
+{
+	/* Parameters passed in to StartReadBuffers(). */
+	BufferManagerRelation bmr;
+	Buffer	   *buffers;
+	ForkNumber	forknum;
+	BlockNumber blocknum;
+	int			nblocks;
+	BufferAccessStrategy strategy;
+	int			flags;
+
+	/* Range of buffers, if we need to perform a read. */
+	int			io_buffers_len;
+};
+
+typedef struct ReadBuffersOperation ReadBuffersOperation;
+
+extern bool StartReadBuffers(BufferManagerRelation bmr,
+							 Buffer *buffers,
+							 ForkNumber forknum,
+							 BlockNumber blocknum,
+							 int *nblocks,
+							 BufferAccessStrategy strategy,
+							 int flags,
+							 ReadBuffersOperation *operation);
+extern void WaitReadBuffers(ReadBuffersOperation *operation);
+
 extern void ReleaseBuffer(Buffer buffer);
 extern void UnlockReleaseBuffer(Buffer buffer);
 extern bool BufferIsExclusiveLocked(Buffer buffer);
@@ -250,6 +292,9 @@ extern bool HoldingBufferPinThatDelaysRecovery(void);
 
 extern bool BgBufferSync(struct WritebackContext *wb_context);
 
+extern void LimitAdditionalPins(uint32 *additional_pins);
+extern void LimitAdditionalLocalPins(uint32 *additional_pins);
+
 /* in buf_init.c */
 extern void InitBufferPool(void);
 extern Size BufferShmemSize(void);
diff --git a/src/include/storage/streaming_read.h b/src/include/storage/streaming_read.h
new file mode 100644
index 00000000000..c4d3892bb26
--- /dev/null
+++ b/src/include/storage/streaming_read.h
@@ -0,0 +1,52 @@
+#ifndef STREAMING_READ_H
+#define STREAMING_READ_H
+
+#include "storage/bufmgr.h"
+#include "storage/fd.h"
+#include "storage/smgr.h"
+
+/* Default tuning, reasonable for many users. */
+#define PGSR_FLAG_DEFAULT 0x00
+
+/*
+ * I/O streams that are performing maintenance work on behalf of potentially
+ * many users.
+ */
+#define PGSR_FLAG_MAINTENANCE 0x01
+
+/*
+ * We usually avoid issuing prefetch advice automatically when sequential
+ * access is detected, but this flag explicitly disables it, for cases that
+ * might not be correctly detected.  Explicit advice is known to perform worse
+ * than letting the kernel (at least Linux) detect sequential access.
+ */
+#define PGSR_FLAG_SEQUENTIAL 0x02
+
+/*
+ * We usually ramp up from smaller reads to larger ones, to support users who
+ * don't know if it's worth reading lots of buffers yet.  This flag disables
+ * that, declaring ahead of time that we'll be reading all available buffers.
+ */
+#define PGSR_FLAG_FULL 0x04
+
+struct PgStreamingRead;
+typedef struct PgStreamingRead PgStreamingRead;
+
+/* Callback that returns the next block number to read. */
+typedef BlockNumber (*PgStreamingReadBufferCB) (PgStreamingRead *pgsr,
+												void *pgsr_private,
+												void *per_buffer_private);
+
+extern PgStreamingRead *pg_streaming_read_buffer_alloc(int flags,
+													   void *pgsr_private,
+													   size_t per_buffer_private_size,
+													   BufferAccessStrategy strategy,
+													   BufferManagerRelation bmr,
+													   ForkNumber forknum,
+													   PgStreamingReadBufferCB next_block_cb);
+
+extern void pg_streaming_read_prefetch(PgStreamingRead *pgsr);
+extern Buffer pg_streaming_read_buffer_get_next(PgStreamingRead *pgsr, void **per_buffer_private);
+extern void pg_streaming_read_free(PgStreamingRead *pgsr);
+
+#endif
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index fc8b15d0cf2..cfb58cf4836 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2097,6 +2097,8 @@ PgStat_TableCounts
 PgStat_TableStatus
 PgStat_TableXactStatus
 PgStat_WalStats
+PgStreamingRead
+PgStreamingReadRange
 PgXmlErrorContext
 PgXmlStrictness
 Pg_finfo_record
@@ -2267,6 +2269,7 @@ ReInitializeDSMForeignScan_function
 ReScanForeignScan_function
 ReadBufPtrType
 ReadBufferMode
+ReadBuffersOperation
 ReadBytePtrType
 ReadExtraTocPtrType
 ReadFunc
-- 
2.37.2
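
For readers skimming the bufmgr changes, the calling convention of the new
StartReadBuffers()/WaitReadBuffers() pair boils down to roughly the following
sketch of a hypothetical caller ("rel" and "blkno" are placeholders, not names
from the patch, and the caller is assumed not to request blocks past the end
of the relation):

	Buffer		buffers[MAX_BUFFERS_PER_TRANSFER];
	ReadBuffersOperation op;
	int			nblocks = MAX_BUFFERS_PER_TRANSFER;

	/* Pin up to nblocks consecutive blocks of rel starting at blkno. */
	if (StartReadBuffers(BMR_REL(rel), buffers, MAIN_FORKNUM, blkno,
						 &nblocks, NULL, 0, &op))
		WaitReadBuffers(&op);	/* performs the actual read for the misses */

	/* StartReadBuffers() shrank nblocks to the number of buffers pinned. */
	for (int i = 0; i < nblocks; i++)
		ReleaseBuffer(buffers[i]);

A hit anywhere in the requested range just shortens *nblocks (the hit ends the
readable range), so a caller that wants the remaining blocks calls
StartReadBuffers() again for them.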

v3-0004-Add-pg_streaming_read_reset.patch (text/x-diff)
From 614195777b3ee675d74d98953b086e0f8a4a494d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 26 Feb 2024 15:33:39 -0500
Subject: [PATCH v3 4/5] Add pg_streaming_read_reset

For rescan, we want to reuse the streaming read object and simply
release the buffers that were pinned by the streaming read
infrastructure.
---
 src/backend/storage/aio/streaming_read.c | 18 ++++++++++++++++++
 src/include/storage/streaming_read.h     |  1 +
 2 files changed, 19 insertions(+)

diff --git a/src/backend/storage/aio/streaming_read.c b/src/backend/storage/aio/streaming_read.c
index 71f2c4a70b6..70f3ef051f8 100644
--- a/src/backend/storage/aio/streaming_read.c
+++ b/src/backend/storage/aio/streaming_read.c
@@ -610,3 +610,21 @@ pg_streaming_read_free(PgStreamingRead *pgsr)
 
 	pfree(pgsr);
 }
+
+
+/*
+ * Reset a streaming read object by releasing all of the buffers. Note that
+ * max_ios is not recalculated, so any changes to maintenance_io_concurrency and
+ * effective_io_concurrency will have no effect.
+ */
+void
+pg_streaming_read_reset(PgStreamingRead *pgsr)
+{
+	Buffer		buffer;
+
+	/* Stop looking ahead, and unpin anything that wasn't consumed. */
+	pgsr->finished = true;
+	while ((buffer = pg_streaming_read_buffer_get_next(pgsr, NULL)) != InvalidBuffer)
+		ReleaseBuffer(buffer);
+	pgsr->finished = false;
+}
diff --git a/src/include/storage/streaming_read.h b/src/include/storage/streaming_read.h
index c4d3892bb26..63cef719e42 100644
--- a/src/include/storage/streaming_read.h
+++ b/src/include/storage/streaming_read.h
@@ -48,5 +48,6 @@ extern PgStreamingRead *pg_streaming_read_buffer_alloc(int flags,
 extern void pg_streaming_read_prefetch(PgStreamingRead *pgsr);
 extern Buffer pg_streaming_read_buffer_get_next(PgStreamingRead *pgsr, void **per_buffer_private);
 extern void pg_streaming_read_free(PgStreamingRead *pgsr);
+extern void pg_streaming_read_reset(PgStreamingRead *pgsr);
 
 #endif
-- 
2.37.2
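
A usage sketch (this mirrors what the next patch does in heap_rescan() rather
than introducing new API): with the reset function, a rescan can keep its
PgStreamingRead around and simply do

	if (scan->rs_pgsr)
		pg_streaming_read_reset(scan->rs_pgsr);

instead of calling pg_streaming_read_free() and then building a new object
with pg_streaming_read_buffer_alloc().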

v3-0005-Sequential-scans-and-TID-range-scans-stream-reads.patch (text/x-diff)
From 2e3c2890ff7aa13c2b33f66ed1cfb8c3b58ea21c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 26 Feb 2024 15:41:32 -0500
Subject: [PATCH v3 5/5] Sequential scans and TID range scans stream reads

Implementing streaming read support for heap sequential scans and TID
range scans includes three parts:

Allocate the streaming read object in heap_beginscan(). On rescan, reset
the streaming read by releasing all pinned buffers and resetting the
prefetch block.

Implement a callback returning the next block to prefetch to the
streaming read API.

Invoke the streaming read API when a new page is needed and streaming
reads are enabled. When the scan direction changes, reset the streaming
read.
---
 src/backend/access/heap/heapam.c | 88 ++++++++++++++++++++++++++++----
 src/include/access/heapam.h      | 12 +++++
 2 files changed, 90 insertions(+), 10 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index e0fe3d9c326..7143a3efb9a 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -65,6 +65,7 @@
 #include "storage/smgr.h"
 #include "storage/spin.h"
 #include "storage/standby.h"
+#include "storage/streaming_read.h"
 #include "utils/datum.h"
 #include "utils/inval.h"
 #include "utils/lsyscache.h"
@@ -228,6 +229,29 @@ static const int MultiXactStatusLock[MaxMultiXactStatus + 1] =
  * ----------------------------------------------------------------
  */
 
+static BlockNumber
+heap_pgsr_next_single(PgStreamingRead *pgsr, void *pgsr_private,
+					  void *per_buffer_data)
+{
+	HeapScanDesc scan = (HeapScanDesc) pgsr_private;
+
+	/*
+	 * Hard-code ScanDirection to ForwardScanDirection since only forward
+	 * scans support streaming reads.
+	 */
+	if (!scan->rs_inited)
+	{
+		scan->rs_prefetch_block = heapgettup_initial_block(scan, scan->rs_dir);
+		scan->rs_inited = true;
+	}
+	else
+		scan->rs_prefetch_block = heapgettup_advance_block(scan,
+														   scan->rs_prefetch_block,
+														   scan->rs_dir);
+
+	return scan->rs_prefetch_block;
+}
+
 /* ----------------
  *		initscan - scan code common to heap_beginscan and heap_rescan
  * ----------------
@@ -330,6 +354,13 @@ initscan(HeapScanDesc scan, ScanKey key, bool keep_startblock)
 	scan->rs_cbuf = InvalidBuffer;
 	scan->rs_cblock = InvalidBlockNumber;
 
+	/*
+	 * Initialize to ForwardScanDirection because it is most common and heap
+	 * scans usually must go forwards before going backward.
+	 */
+	scan->rs_dir = ForwardScanDirection;
+	scan->rs_prefetch_block = InvalidBlockNumber;
+
 	/* page-at-a-time fields are always invalid when not rs_inited */
 
 	/*
@@ -472,6 +503,8 @@ heapbuildvis(TableScanDesc sscan)
 static inline void
 heapfetchbuf(HeapScanDesc scan, ScanDirection dir)
 {
+	Assert(scan->rs_pgsr);
+
 	/* release previous scan buffer, if any */
 	if (BufferIsValid(scan->rs_cbuf))
 	{
@@ -486,19 +519,23 @@ heapfetchbuf(HeapScanDesc scan, ScanDirection dir)
 	 */
 	CHECK_FOR_INTERRUPTS();
 
-	if (!scan->rs_inited)
+	/*
+	 * If the scan direction is changing, reset the prefetch block to the
+	 * current block. Otherwise, we will incorrectly prefetch the blocks
+	 * between the prefetch block and the current block again before
+	 * prefetching blocks in the new, correct scan direction.
+	 */
+	if (scan->rs_dir != dir && scan->rs_pgsr)
 	{
-		scan->rs_cblock = heapgettup_initial_block(scan, dir);
-		Assert(scan->rs_cblock != InvalidBlockNumber || !BufferIsValid(scan->rs_cbuf));
-		scan->rs_inited = true;
+		scan->rs_prefetch_block = scan->rs_cblock;
+		pg_streaming_read_reset(scan->rs_pgsr);
 	}
-	else
-		scan->rs_cblock = heapgettup_advance_block(scan, scan->rs_cblock, dir);
 
-	/* read block if valid */
-	if (BlockNumberIsValid(scan->rs_cblock))
-		scan->rs_cbuf = ReadBufferExtended(scan->rs_base.rs_rd, MAIN_FORKNUM,
-										   scan->rs_cblock, RBM_NORMAL, scan->rs_strategy);
+	scan->rs_dir = dir;
+
+	scan->rs_cbuf = pg_streaming_read_buffer_get_next(scan->rs_pgsr, NULL);
+	if (BufferIsValid(scan->rs_cbuf))
+		scan->rs_cblock = BufferGetBlockNumber(scan->rs_cbuf);
 }
 
 /*
@@ -827,6 +864,7 @@ continue_page:
 
 	scan->rs_cbuf = InvalidBuffer;
 	scan->rs_cblock = InvalidBlockNumber;
+	scan->rs_prefetch_block = InvalidBlockNumber;
 	tuple->t_data = NULL;
 	scan->rs_inited = false;
 }
@@ -917,6 +955,7 @@ continue_page:
 		ReleaseBuffer(scan->rs_cbuf);
 	scan->rs_cbuf = InvalidBuffer;
 	scan->rs_cblock = InvalidBlockNumber;
+	scan->rs_prefetch_block = InvalidBlockNumber;
 	tuple->t_data = NULL;
 	scan->rs_inited = false;
 }
@@ -999,6 +1038,8 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 	else
 		scan->rs_parallelworkerdata = NULL;
 
+	scan->rs_pgsr = NULL;
+
 	/*
 	 * we do this here instead of in initscan() because heap_rescan also calls
 	 * initscan() and we don't want to allocate memory again
@@ -1010,6 +1051,22 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 
 	initscan(scan, key, false);
 
+	/*
+	 * We do not know the scan direction yet. If the scan does not end up
+	 * being a forward scan, the streaming read object will be freed.
+	 */
+	if (scan->rs_base.rs_flags & SO_TYPE_SEQSCAN ||
+		scan->rs_base.rs_flags & SO_TYPE_TIDRANGESCAN)
+	{
+		scan->rs_pgsr = pg_streaming_read_buffer_alloc(PGSR_FLAG_SEQUENTIAL,
+													   scan,
+													   0,
+													   scan->rs_strategy,
+													   BMR_REL(scan->rs_base.rs_rd),
+													   MAIN_FORKNUM,
+													   heap_pgsr_next_single);
+	}
+
 	return (TableScanDesc) scan;
 }
 
@@ -1048,6 +1105,14 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
 	 * reinitialize scan descriptor
 	 */
 	initscan(scan, key, true);
+
+	/*
+	 * The streaming read object is reset on rescan. This must be done after
+	 * initscan(), as some state referred to by pg_streaming_read_reset() is
+	 * reset in initscan().
+	 */
+	if (scan->rs_pgsr)
+		pg_streaming_read_reset(scan->rs_pgsr);
 }
 
 void
@@ -1063,6 +1128,9 @@ heap_endscan(TableScanDesc sscan)
 	if (BufferIsValid(scan->rs_cbuf))
 		ReleaseBuffer(scan->rs_cbuf);
 
+	if (scan->rs_pgsr)
+		pg_streaming_read_free(scan->rs_pgsr);
+
 	/*
 	 * decrement relation reference count and free scan descriptor storage
 	 */
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index e2b1b2a3ad9..0f8f51ce326 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -72,6 +72,18 @@ typedef struct HeapScanDescData
 	 */
 	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
 
+	/*
+	 * Fields used for streaming reads by sequential scans and TID range
+	 * scans. The streaming read object is allocated at the beginning of the
+	 * scan and reset on rescan or when the scan direction changes. The scan
+	 * direction is saved each time a new page is requested. If the scan
+	 * direction changes from one page to the next, the streaming read object
+	 * releases all previously pinned buffers and resets the prefetch block.
+	 */
+	ScanDirection rs_dir;
+	struct PgStreamingRead *rs_pgsr;
+	BlockNumber rs_prefetch_block;
+
 	/* these fields only used in page-at-a-time mode and for bitmap scans */
 	int			rs_cindex;		/* current tuple's index in vistuples */
 	int			rs_ntuples;		/* number of visible tuples on page */
-- 
2.37.2
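
For anyone wiring up another streaming read user, the callback contract used
above is simply "return the next BlockNumber to read, or InvalidBlockNumber
when there is nothing left".  A minimal hypothetical callback (the names here
are made up for illustration, not taken from the patches) could look like:

	typedef struct MyScanState
	{
		BlockNumber next_block;
		BlockNumber nblocks;
	} MyScanState;

	static BlockNumber
	my_next_block_cb(PgStreamingRead *pgsr, void *pgsr_private,
					 void *per_buffer_data)
	{
		MyScanState *state = (MyScanState *) pgsr_private;

		if (state->next_block < state->nblocks)
			return state->next_block++;
		return InvalidBlockNumber;	/* end of stream */
	}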

#9Melanie Plageman
melanieplageman@gmail.com
In reply to: Melanie Plageman (#8)
5 attachment(s)
Re: Streaming read-ready sequential scan code

On Wed, Feb 28, 2024 at 12:30 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

On Mon, Feb 26, 2024 at 03:56:57PM -0500, Melanie Plageman wrote:

On Mon, Feb 19, 2024 at 6:05 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

On Mon, Jan 29, 2024 at 4:17 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

There is an outstanding question about where to allocate the
PgStreamingRead object for sequential scans

I've written three alternative implementations of the actual streaming
read user for sequential scan which handle the question of where to
allocate the streaming read object and how to handle changing scan
direction in different ways.

Option A) https://github.com/melanieplageman/postgres/tree/seqscan_pgsr_initscan_allocation
- Allocates the streaming read object in initscan(). Since we do not
know the scan direction at this time, if the scan ends up not being a
forwards scan, the streaming read object must later be freed -- so
this will sometimes allocate a streaming read object it never uses.
- Only supports ForwardScanDirection and once the scan direction
changes, streaming is never supported again -- even if we return to
ForwardScanDirection
- Must maintain a "fallback" codepath that does not use the streaming read API

Attached is a version of this patch which implements a "reset"
function for the streaming read API which should be cheaper than the
full pg_streaming_read_free() on rescan. This can easily be ported to
work on any of my proposed implementations (A/B/C). I implemented it
on A as an example.

Attached is the latest version of this patchset -- rebased in light of
Thomas' updates to the streaming read API [1]. I have chosen the
approach I think we should go with. It is a hybrid of my previously
proposed approaches.

While investigating some performance concerns, Andres pointed out that
the members I add to HeapScanDescData in this patch push rs_cindex and
rs_ntuples to another cacheline and introduce a 4-byte hole. Attached
v4's HeapScanDescData is as well-packed as on master and its members
are reordered so that rs_cindex and rs_ntuples are back on the second
cacheline of the struct's data.
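
To illustrate the layout issue (this is the v3 member order shown in the
earlier patch, not necessarily v4's final arrangement): on a typical 64-bit
build,

	ScanDirection rs_dir;				/* 4 bytes */
	/* 4-byte hole so the next pointer can be 8-byte aligned */
	struct PgStreamingRead *rs_pgsr;	/* 8 bytes */
	BlockNumber rs_prefetch_block;		/* 4 bytes */

wastes 4 bytes and pushes everything after it further down the struct, while
an order like

	struct PgStreamingRead *rs_pgsr;
	BlockNumber rs_prefetch_block;
	ScanDirection rs_dir;

packs the two 4-byte members together and leaves no hole.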

- Melanie

Attachments:

v4-0004-Add-pg_streaming_read_reset.patch (text/x-patch)
From 614195777b3ee675d74d98953b086e0f8a4a494d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 26 Feb 2024 15:33:39 -0500
Subject: [PATCH v4 4/5] Add pg_streaming_read_reset

For rescan, we want to reuse the streaming read object and simply
release the buffers that were pinned by the streaming read
infrastructure.
---
 src/backend/storage/aio/streaming_read.c | 18 ++++++++++++++++++
 src/include/storage/streaming_read.h     |  1 +
 2 files changed, 19 insertions(+)

diff --git a/src/backend/storage/aio/streaming_read.c b/src/backend/storage/aio/streaming_read.c
index 71f2c4a70b6..70f3ef051f8 100644
--- a/src/backend/storage/aio/streaming_read.c
+++ b/src/backend/storage/aio/streaming_read.c
@@ -610,3 +610,21 @@ pg_streaming_read_free(PgStreamingRead *pgsr)
 
 	pfree(pgsr);
 }
+
+
+/*
+ * Reset a streaming read object by releasing all of the buffers. Note that
+ * max_ios is not recalculated, so any changes to maintenance_io_concurrency and
+ * effective_io_concurrency will have no effect.
+ */
+void
+pg_streaming_read_reset(PgStreamingRead *pgsr)
+{
+	Buffer		buffer;
+
+	/* Stop looking ahead, and unpin anything that wasn't consumed. */
+	pgsr->finished = true;
+	while ((buffer = pg_streaming_read_buffer_get_next(pgsr, NULL)) != InvalidBuffer)
+		ReleaseBuffer(buffer);
+	pgsr->finished = false;
+}
diff --git a/src/include/storage/streaming_read.h b/src/include/storage/streaming_read.h
index c4d3892bb26..63cef719e42 100644
--- a/src/include/storage/streaming_read.h
+++ b/src/include/storage/streaming_read.h
@@ -48,5 +48,6 @@ extern PgStreamingRead *pg_streaming_read_buffer_alloc(int flags,
 extern void pg_streaming_read_prefetch(PgStreamingRead *pgsr);
 extern Buffer pg_streaming_read_buffer_get_next(PgStreamingRead *pgsr, void **per_buffer_private);
 extern void pg_streaming_read_free(PgStreamingRead *pgsr);
+extern void pg_streaming_read_reset(PgStreamingRead *pgsr);
 
 #endif
-- 
2.40.1

v4-0002-Replace-blocks-with-buffers-in-heapgettup-control.patch (text/x-patch)
From 4b6d9059aa32625e91d38f1d414ec514d4073197 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 29 Jan 2024 11:50:01 -0500
Subject: [PATCH v4 2/5] Replace blocks with buffers in heapgettup control flow

Future commits will introduce the streaming read API and the sequential
scan streaming read API user. Streaming read API users implement a
callback which returns the next block to read. Sequential scans
previously looped through the blocks in the relation, synchronously
reading in a block and then processing it. An InvalidBlockNumber
returned by heapgettup_advance_block() meant that the relation was
exhausted and all blocks had been processed.

The streaming read API may exhaust the blocks in a relation (having read
all of them into buffers) before they have all been processed by the
sequential scan. As such, the sequential scan should continue processing
blocks until heapfetchbuf() returns InvalidBuffer.

Note that this commit does not implement the streaming read API user. It
simply restructures heapgettup() and heapgettup_pagemode() to use
buffers instead of blocks for control flow.

Not all sequential scans will support streaming reads. As such, this
code will remain for compatibility even after sequential scans support
streaming reads.
---
 src/backend/access/heap/heapam.c | 79 ++++++++++++++------------------
 1 file changed, 35 insertions(+), 44 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 449221da6ac..e0fe3d9c326 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -87,6 +87,9 @@ static Bitmapset *HeapDetermineColumnsInfo(Relation relation,
 static bool heap_acquire_tuplock(Relation relation, ItemPointer tid,
 								 LockTupleMode mode, LockWaitPolicy wait_policy,
 								 bool *have_tuple_lock);
+static inline BlockNumber heapgettup_advance_block(HeapScanDesc scan,
+												   BlockNumber block, ScanDirection dir);
+static inline BlockNumber heapgettup_initial_block(HeapScanDesc scan, ScanDirection dir);
 static void compute_new_xmax_infomask(TransactionId xmax, uint16 old_infomask,
 									  uint16 old_infomask2, TransactionId add_to_xmax,
 									  LockTupleMode mode, bool is_update,
@@ -463,14 +466,12 @@ heapbuildvis(TableScanDesc sscan)
 /*
  * heapfetchbuf - subroutine for heapgettup()
  *
- * This routine reads the specified block of the relation into a buffer and
- * returns with that pinned buffer saved in the scan descriptor.
+ * This routine reads the next block of the relation into a buffer and returns
+ * with that pinned buffer saved in the scan descriptor.
  */
 static inline void
-heapfetchbuf(HeapScanDesc scan, BlockNumber block)
+heapfetchbuf(HeapScanDesc scan, ScanDirection dir)
 {
-	Assert(block < scan->rs_nblocks);
-
 	/* release previous scan buffer, if any */
 	if (BufferIsValid(scan->rs_cbuf))
 	{
@@ -485,10 +486,19 @@ heapfetchbuf(HeapScanDesc scan, BlockNumber block)
 	 */
 	CHECK_FOR_INTERRUPTS();
 
-	/* read page using selected strategy */
-	scan->rs_cbuf = ReadBufferExtended(scan->rs_base.rs_rd, MAIN_FORKNUM, block,
-									   RBM_NORMAL, scan->rs_strategy);
-	scan->rs_cblock = block;
+	if (!scan->rs_inited)
+	{
+		scan->rs_cblock = heapgettup_initial_block(scan, dir);
+		Assert(scan->rs_cblock != InvalidBlockNumber || !BufferIsValid(scan->rs_cbuf));
+		scan->rs_inited = true;
+	}
+	else
+		scan->rs_cblock = heapgettup_advance_block(scan, scan->rs_cblock, dir);
+
+	/* read block if valid */
+	if (BlockNumberIsValid(scan->rs_cblock))
+		scan->rs_cbuf = ReadBufferExtended(scan->rs_base.rs_rd, MAIN_FORKNUM,
+										   scan->rs_cblock, RBM_NORMAL, scan->rs_strategy);
 }
 
 /*
@@ -498,7 +508,7 @@ heapfetchbuf(HeapScanDesc scan, BlockNumber block)
  * occur with empty tables and in parallel scans when parallel workers get all
  * of the pages before we can get a chance to get our first page.
  */
-static BlockNumber
+BlockNumber
 heapgettup_initial_block(HeapScanDesc scan, ScanDirection dir)
 {
 	Assert(!scan->rs_inited);
@@ -638,7 +648,7 @@ heapgettup_continue_page(HeapScanDesc scan, ScanDirection dir, int *linesleft,
  * This also adjusts rs_numblocks when a limit has been imposed by
  * heap_setscanlimits().
  */
-static inline BlockNumber
+BlockNumber
 heapgettup_advance_block(HeapScanDesc scan, BlockNumber block, ScanDirection dir)
 {
 	if (ScanDirectionIsForward(dir))
@@ -736,23 +746,13 @@ heapgettup(HeapScanDesc scan,
 		   ScanKey key)
 {
 	HeapTuple	tuple = &(scan->rs_ctup);
-	BlockNumber block;
 	Page		page;
 	OffsetNumber lineoff;
 	int			linesleft;
 
-	if (unlikely(!scan->rs_inited))
-	{
-		block = heapgettup_initial_block(scan, dir);
-		/* ensure rs_cbuf is invalid when we get InvalidBlockNumber */
-		Assert(block != InvalidBlockNumber || !BufferIsValid(scan->rs_cbuf));
-		scan->rs_inited = true;
-	}
-	else
+	if (scan->rs_inited)
 	{
 		/* continue from previously returned page/tuple */
-		block = scan->rs_cblock;
-
 		LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
 		page = heapgettup_continue_page(scan, dir, &linesleft, &lineoff);
 		goto continue_page;
@@ -762,9 +762,12 @@ heapgettup(HeapScanDesc scan,
 	 * advance the scan until we find a qualifying tuple or run out of stuff
 	 * to scan
 	 */
-	while (block != InvalidBlockNumber)
+	while (true)
 	{
-		heapfetchbuf(scan, block);
+		heapfetchbuf(scan, dir);
+		if (!BufferIsValid(scan->rs_cbuf))
+			break;
+		Assert(BufferGetBlockNumber(scan->rs_cbuf) == scan->rs_cblock);
 		LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
 		page = heapgettup_start_page(scan, dir, &linesleft, &lineoff);
 continue_page:
@@ -786,7 +789,7 @@ continue_page:
 
 			tuple->t_data = (HeapTupleHeader) PageGetItem(page, lpp);
 			tuple->t_len = ItemIdGetLength(lpp);
-			ItemPointerSet(&(tuple->t_self), block, lineoff);
+			ItemPointerSet(&(tuple->t_self), scan->rs_cblock, lineoff);
 
 			visible = HeapTupleSatisfiesVisibility(tuple,
 												   scan->rs_base.rs_snapshot,
@@ -816,9 +819,6 @@ continue_page:
 		 * it's time to move to the next.
 		 */
 		LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
-
-		/* get the BlockNumber to scan next */
-		block = heapgettup_advance_block(scan, block, dir);
 	}
 
 	/* end of scan */
@@ -851,22 +851,13 @@ heapgettup_pagemode(HeapScanDesc scan,
 					ScanKey key)
 {
 	HeapTuple	tuple = &(scan->rs_ctup);
-	BlockNumber block;
 	Page		page;
 	int			lineindex;
 	int			linesleft;
 
-	if (unlikely(!scan->rs_inited))
-	{
-		block = heapgettup_initial_block(scan, dir);
-		/* ensure rs_cbuf is invalid when we get InvalidBlockNumber */
-		Assert(block != InvalidBlockNumber || !BufferIsValid(scan->rs_cbuf));
-		scan->rs_inited = true;
-	}
-	else
+	if (scan->rs_inited)
 	{
 		/* continue from previously returned page/tuple */
-		block = scan->rs_cblock;	/* current page */
 		page = BufferGetPage(scan->rs_cbuf);
 
 		lineindex = scan->rs_cindex + dir;
@@ -883,9 +874,12 @@ heapgettup_pagemode(HeapScanDesc scan,
 	 * advance the scan until we find a qualifying tuple or run out of stuff
 	 * to scan
 	 */
-	while (block != InvalidBlockNumber)
+	while (true)
 	{
-		heapfetchbuf(scan, block);
+		heapfetchbuf(scan, dir);
+		if (!BufferIsValid(scan->rs_cbuf))
+			break;
+		Assert(BufferGetBlockNumber(scan->rs_cbuf) == scan->rs_cblock);
 		heapbuildvis((TableScanDesc) scan);
 		page = BufferGetPage(scan->rs_cbuf);
 		linesleft = scan->rs_ntuples;
@@ -905,7 +899,7 @@ continue_page:
 
 			tuple->t_data = (HeapTupleHeader) PageGetItem(page, lpp);
 			tuple->t_len = ItemIdGetLength(lpp);
-			ItemPointerSet(&(tuple->t_self), block, lineoff);
+			ItemPointerSet(&(tuple->t_self), scan->rs_cblock, lineoff);
 
 			/* skip any tuples that don't match the scan key */
 			if (key != NULL &&
@@ -916,9 +910,6 @@ continue_page:
 			scan->rs_cindex = lineindex;
 			return;
 		}
-
-		/* get the BlockNumber to scan next */
-		block = heapgettup_advance_block(scan, block, dir);
 	}
 
 	/* end of scan */
-- 
2.40.1
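
Put another way, the loop shape in heapgettup()/heapgettup_pagemode() changes
from roughly

	block = heapgettup_initial_block(scan, dir);
	while (block != InvalidBlockNumber)
	{
		/* read block into scan->rs_cbuf and process the page */
		block = heapgettup_advance_block(scan, block, dir);
	}

to

	while (true)
	{
		heapfetchbuf(scan, dir);	/* chooses and reads the next block */
		if (!BufferIsValid(scan->rs_cbuf))
			break;
		/* process the page in scan->rs_cbuf */
	}

so that "no more valid buffers", rather than "no more block numbers", is what
ends the scan once streaming reads are plugged in.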

v4-0001-Split-heapgetpage-into-two-parts.patch (text/x-patch)
From 9227b1b621c473ab189ac0cefba1b9aaed02aa09 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 27 Jan 2024 18:39:37 -0500
Subject: [PATCH v4 1/5] Split heapgetpage into two parts

heapgetpage(), a per-block utility function used in heap scans, read a
passed-in block into a buffer and then, if page-at-a-time processing was
enabled, pruned the page and built an array of its visible tuples. This
was used for sequential and sample scans.

Future commits will add support for streaming reads. The streaming read
API will read in the blocks specified by a callback, but any significant
per-page processing should be done synchronously on the buffer yielded
by the streaming read API. To support this, separate the logic in
heapgetpage() to read a block into a buffer from that which prunes the
page and builds an array of the visible tuples. The former is now
heapfetchbuf() and the latter is now heapbuildvis().

Future commits will push the logic for selecting the next block into
heapfetchbuf() in cases when streaming reads are not supported (such as
backwards sequential scans). Because this logic differs for sample scans
and sequential scans, inline the code to read the block into a buffer
for sample scans.

This has the added benefit of allowing for a bit of refactoring in
heapam_scan_sample_next_block(), including unpinning the previous buffer
before invoking the callback to select the next block.
---
 src/backend/access/heap/heapam.c         | 74 ++++++++++++++----------
 src/backend/access/heap/heapam_handler.c | 40 +++++++++----
 src/include/access/heapam.h              |  2 +-
 3 files changed, 72 insertions(+), 44 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 707460a5364..449221da6ac 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -367,17 +367,18 @@ heap_setscanlimits(TableScanDesc sscan, BlockNumber startBlk, BlockNumber numBlk
 }
 
 /*
- * heapgetpage - subroutine for heapgettup()
+ * heapbuildvis - Utility function for heap scans.
  *
- * This routine reads and pins the specified page of the relation.
- * In page-at-a-time mode it performs additional work, namely determining
- * which tuples on the page are visible.
+ * Given a page residing in a buffer saved in the scan descriptor, prune the
+ * page and determine which of its tuples are visible, saving their offsets
+ * in an array in the scan descriptor.
  */
 void
-heapgetpage(TableScanDesc sscan, BlockNumber block)
+heapbuildvis(TableScanDesc sscan)
 {
 	HeapScanDesc scan = (HeapScanDesc) sscan;
-	Buffer		buffer;
+	Buffer		buffer = scan->rs_cbuf;
+	BlockNumber block = scan->rs_cblock;
 	Snapshot	snapshot;
 	Page		page;
 	int			lines;
@@ -385,31 +386,8 @@ heapgetpage(TableScanDesc sscan, BlockNumber block)
 	OffsetNumber lineoff;
 	bool		all_visible;
 
-	Assert(block < scan->rs_nblocks);
+	Assert(BufferGetBlockNumber(buffer) == block);
 
-	/* release previous scan buffer, if any */
-	if (BufferIsValid(scan->rs_cbuf))
-	{
-		ReleaseBuffer(scan->rs_cbuf);
-		scan->rs_cbuf = InvalidBuffer;
-	}
-
-	/*
-	 * Be sure to check for interrupts at least once per page.  Checks at
-	 * higher code levels won't be able to stop a seqscan that encounters many
-	 * pages' worth of consecutive dead tuples.
-	 */
-	CHECK_FOR_INTERRUPTS();
-
-	/* read page using selected strategy */
-	scan->rs_cbuf = ReadBufferExtended(scan->rs_base.rs_rd, MAIN_FORKNUM, block,
-									   RBM_NORMAL, scan->rs_strategy);
-	scan->rs_cblock = block;
-
-	if (!(scan->rs_base.rs_flags & SO_ALLOW_PAGEMODE))
-		return;
-
-	buffer = scan->rs_cbuf;
 	snapshot = scan->rs_base.rs_snapshot;
 
 	/*
@@ -482,6 +460,37 @@ heapgetpage(TableScanDesc sscan, BlockNumber block)
 	scan->rs_ntuples = ntup;
 }
 
+/*
+ * heapfetchbuf - subroutine for heapgettup()
+ *
+ * This routine reads the specified block of the relation into a buffer and
+ * returns with that pinned buffer saved in the scan descriptor.
+ */
+static inline void
+heapfetchbuf(HeapScanDesc scan, BlockNumber block)
+{
+	Assert(block < scan->rs_nblocks);
+
+	/* release previous scan buffer, if any */
+	if (BufferIsValid(scan->rs_cbuf))
+	{
+		ReleaseBuffer(scan->rs_cbuf);
+		scan->rs_cbuf = InvalidBuffer;
+	}
+
+	/*
+	 * Be sure to check for interrupts at least once per page.  Checks at
+	 * higher code levels won't be able to stop a seqscan that encounters many
+	 * pages' worth of consecutive dead tuples.
+	 */
+	CHECK_FOR_INTERRUPTS();
+
+	/* read page using selected strategy */
+	scan->rs_cbuf = ReadBufferExtended(scan->rs_base.rs_rd, MAIN_FORKNUM, block,
+									   RBM_NORMAL, scan->rs_strategy);
+	scan->rs_cblock = block;
+}
+
 /*
  * heapgettup_initial_block - return the first BlockNumber to scan
  *
@@ -755,7 +764,7 @@ heapgettup(HeapScanDesc scan,
 	 */
 	while (block != InvalidBlockNumber)
 	{
-		heapgetpage((TableScanDesc) scan, block);
+		heapfetchbuf(scan, block);
 		LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
 		page = heapgettup_start_page(scan, dir, &linesleft, &lineoff);
 continue_page:
@@ -876,7 +885,8 @@ heapgettup_pagemode(HeapScanDesc scan,
 	 */
 	while (block != InvalidBlockNumber)
 	{
-		heapgetpage((TableScanDesc) scan, block);
+		heapfetchbuf(scan, block);
+		heapbuildvis((TableScanDesc) scan);
 		page = BufferGetPage(scan->rs_cbuf);
 		linesleft = scan->rs_ntuples;
 		lineindex = ScanDirectionIsForward(dir) ? 0 : linesleft - 1;
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 680a50bf8b1..8dc137995f0 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2276,11 +2276,14 @@ heapam_scan_sample_next_block(TableScanDesc scan, SampleScanState *scanstate)
 	if (hscan->rs_nblocks == 0)
 		return false;
 
-	if (tsm->NextSampleBlock)
+	if (BufferIsValid(hscan->rs_cbuf))
 	{
-		blockno = tsm->NextSampleBlock(scanstate, hscan->rs_nblocks);
-		hscan->rs_cblock = blockno;
+		ReleaseBuffer(hscan->rs_cbuf);
+		hscan->rs_cbuf = InvalidBuffer;
 	}
+
+	if (tsm->NextSampleBlock)
+		blockno = tsm->NextSampleBlock(scanstate, hscan->rs_nblocks);
 	else
 	{
 		/* scanning table sequentially */
@@ -2322,20 +2325,35 @@ heapam_scan_sample_next_block(TableScanDesc scan, SampleScanState *scanstate)
 		}
 	}
 
-	if (!BlockNumberIsValid(blockno))
+	hscan->rs_cblock = blockno;
+
+	if (!BlockNumberIsValid(hscan->rs_cblock))
 	{
-		if (BufferIsValid(hscan->rs_cbuf))
-			ReleaseBuffer(hscan->rs_cbuf);
-		hscan->rs_cbuf = InvalidBuffer;
-		hscan->rs_cblock = InvalidBlockNumber;
 		hscan->rs_inited = false;
-
 		return false;
 	}
 
-	heapgetpage(scan, blockno);
-	hscan->rs_inited = true;
+	Assert(hscan->rs_cblock < hscan->rs_nblocks);
+
+	/*
+	 * We may scan multiple pages before finding tuples to yield or finishing
+	 * the scan. Since we want to check for interrupts at least once per page,
+	 * do so here.
+	 */
+	CHECK_FOR_INTERRUPTS();
+
+	/* Read page using selected strategy */
+	hscan->rs_cbuf = ReadBufferExtended(hscan->rs_base.rs_rd, MAIN_FORKNUM,
+										hscan->rs_cblock, RBM_NORMAL, hscan->rs_strategy);
 
+	/*
+	 * If pagemode is allowed, prune the page and build an array of visible
+	 * tuple offsets.
+	 */
+	if (hscan->rs_base.rs_flags & SO_ALLOW_PAGEMODE)
+		heapbuildvis(scan);
+
+	hscan->rs_inited = true;
 	return true;
 }
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 4b133f68593..e2b1b2a3ad9 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -246,7 +246,7 @@ extern TableScanDesc heap_beginscan(Relation relation, Snapshot snapshot,
 									uint32 flags);
 extern void heap_setscanlimits(TableScanDesc sscan, BlockNumber startBlk,
 							   BlockNumber numBlks);
-extern void heapgetpage(TableScanDesc sscan, BlockNumber block);
+extern void heapbuildvis(TableScanDesc sscan);
 extern void heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
 						bool allow_strat, bool allow_sync, bool allow_pagemode);
 extern void heap_endscan(TableScanDesc sscan);
-- 
2.40.1

v4-0003-Streaming-Read-API.patch (text/x-patch; charset=US-ASCII)
From ca3b34481f5d1404ff0fdb60c65ec0b7befd86fb Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Mon, 26 Feb 2024 23:48:31 +1300
Subject: [PATCH v4 3/5] Streaming Read API

---
 contrib/pg_prewarm/pg_prewarm.c          |  40 +-
 src/backend/storage/Makefile             |   2 +-
 src/backend/storage/aio/Makefile         |  14 +
 src/backend/storage/aio/meson.build      |   5 +
 src/backend/storage/aio/streaming_read.c | 612 ++++++++++++++++++++++
 src/backend/storage/buffer/bufmgr.c      | 641 ++++++++++++++++-------
 src/backend/storage/buffer/localbuf.c    |  14 +-
 src/backend/storage/meson.build          |   1 +
 src/include/storage/bufmgr.h             |  45 ++
 src/include/storage/streaming_read.h     |  52 ++
 src/tools/pgindent/typedefs.list         |   3 +
 11 files changed, 1218 insertions(+), 211 deletions(-)
 create mode 100644 src/backend/storage/aio/Makefile
 create mode 100644 src/backend/storage/aio/meson.build
 create mode 100644 src/backend/storage/aio/streaming_read.c
 create mode 100644 src/include/storage/streaming_read.h

diff --git a/contrib/pg_prewarm/pg_prewarm.c b/contrib/pg_prewarm/pg_prewarm.c
index 8541e4d6e46..1cc84bcb0c2 100644
--- a/contrib/pg_prewarm/pg_prewarm.c
+++ b/contrib/pg_prewarm/pg_prewarm.c
@@ -20,6 +20,7 @@
 #include "miscadmin.h"
 #include "storage/bufmgr.h"
 #include "storage/smgr.h"
+#include "storage/streaming_read.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
@@ -38,6 +39,25 @@ typedef enum
 
 static PGIOAlignedBlock blockbuffer;
 
+struct pg_prewarm_streaming_read_private
+{
+	BlockNumber blocknum;
+	int64		last_block;
+};
+
+static BlockNumber
+pg_prewarm_streaming_read_next(PgStreamingRead *pgsr,
+							   void *pgsr_private,
+							   void *per_buffer_data)
+{
+	struct pg_prewarm_streaming_read_private *p = pgsr_private;
+
+	if (p->blocknum <= p->last_block)
+		return p->blocknum++;
+
+	return InvalidBlockNumber;
+}
+
 /*
  * pg_prewarm(regclass, mode text, fork text,
  *			  first_block int8, last_block int8)
@@ -183,18 +203,36 @@ pg_prewarm(PG_FUNCTION_ARGS)
 	}
 	else if (ptype == PREWARM_BUFFER)
 	{
+		struct pg_prewarm_streaming_read_private p;
+		PgStreamingRead *pgsr;
+
 		/*
 		 * In buffer mode, we actually pull the data into shared_buffers.
 		 */
+
+		/* Set up the private state for our streaming buffer read callback. */
+		p.blocknum = first_block;
+		p.last_block = last_block;
+
+		pgsr = pg_streaming_read_buffer_alloc(PGSR_FLAG_FULL,
+											  &p,
+											  0,
+											  NULL,
+											  BMR_REL(rel),
+											  forkNumber,
+											  pg_prewarm_streaming_read_next);
+
 		for (block = first_block; block <= last_block; ++block)
 		{
 			Buffer		buf;
 
 			CHECK_FOR_INTERRUPTS();
-			buf = ReadBufferExtended(rel, forkNumber, block, RBM_NORMAL, NULL);
+			buf = pg_streaming_read_buffer_get_next(pgsr, NULL);
 			ReleaseBuffer(buf);
 			++blocks_done;
 		}
+		Assert(pg_streaming_read_buffer_get_next(pgsr, NULL) == InvalidBuffer);
+		pg_streaming_read_free(pgsr);
 	}
 
 	/* Close relation, release lock. */
diff --git a/src/backend/storage/Makefile b/src/backend/storage/Makefile
index 8376cdfca20..eec03f6f2b4 100644
--- a/src/backend/storage/Makefile
+++ b/src/backend/storage/Makefile
@@ -8,6 +8,6 @@ subdir = src/backend/storage
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
-SUBDIRS     = buffer file freespace ipc large_object lmgr page smgr sync
+SUBDIRS     = aio buffer file freespace ipc large_object lmgr page smgr sync
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/storage/aio/Makefile b/src/backend/storage/aio/Makefile
new file mode 100644
index 00000000000..bcab44c802f
--- /dev/null
+++ b/src/backend/storage/aio/Makefile
@@ -0,0 +1,14 @@
+#
+# Makefile for storage/aio
+#
+# src/backend/storage/aio/Makefile
+#
+
+subdir = src/backend/storage/aio
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = \
+	streaming_read.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/storage/aio/meson.build b/src/backend/storage/aio/meson.build
new file mode 100644
index 00000000000..39aef2a84a2
--- /dev/null
+++ b/src/backend/storage/aio/meson.build
@@ -0,0 +1,5 @@
+# Copyright (c) 2024, PostgreSQL Global Development Group
+
+backend_sources += files(
+  'streaming_read.c',
+)
diff --git a/src/backend/storage/aio/streaming_read.c b/src/backend/storage/aio/streaming_read.c
new file mode 100644
index 00000000000..71f2c4a70b6
--- /dev/null
+++ b/src/backend/storage/aio/streaming_read.c
@@ -0,0 +1,612 @@
+#include "postgres.h"
+
+#include "storage/streaming_read.h"
+#include "utils/rel.h"
+
+/*
+ * Element type for PgStreamingRead's circular array of block ranges.
+ */
+typedef struct PgStreamingReadRange
+{
+	bool		need_wait;
+	bool		advice_issued;
+	BlockNumber blocknum;
+	int			nblocks;
+	int			per_buffer_data_index;
+	Buffer		buffers[MAX_BUFFERS_PER_TRANSFER];
+	ReadBuffersOperation operation;
+} PgStreamingReadRange;
+
+/*
+ * Streaming read object.
+ */
+struct PgStreamingRead
+{
+	int			max_ios;
+	int			ios_in_progress;
+	int			max_pinned_buffers;
+	int			pinned_buffers;
+	int			pinned_buffers_trigger;
+	int			next_tail_buffer;
+	int			ramp_up_pin_limit;
+	int			ramp_up_pin_stall;
+	bool		finished;
+	bool		advice_enabled;
+	void	   *pgsr_private;
+	PgStreamingReadBufferCB callback;
+
+	BufferAccessStrategy strategy;
+	BufferManagerRelation bmr;
+	ForkNumber	forknum;
+
+	/* Sometimes we need to buffer one block for flow control. */
+	BlockNumber unget_blocknum;
+	void	   *unget_per_buffer_data;
+
+	/* Next expected block, for detecting sequential access. */
+	BlockNumber seq_blocknum;
+
+	/* Space for optional per-buffer private data. */
+	size_t		per_buffer_data_size;
+	void	   *per_buffer_data;
+
+	/* Circular buffer of ranges. */
+	int			size;
+	int			head;
+	int			tail;
+	PgStreamingReadRange ranges[FLEXIBLE_ARRAY_MEMBER];
+};
+
+static PgStreamingRead *
+pg_streaming_read_buffer_alloc_internal(int flags,
+										void *pgsr_private,
+										size_t per_buffer_data_size,
+										BufferAccessStrategy strategy)
+{
+	PgStreamingRead *pgsr;
+	int			size;
+	int			max_ios;
+	uint32		max_pinned_buffers;
+
+
+	/*
+	 * Decide how many assumed I/Os we will allow to run concurrently.  That
+	 * is, advice to the kernel to tell it that we will soon read.  This
+	 * number also affects how far we look ahead for opportunities to start
+	 * more I/Os.
+	 */
+	if (flags & PGSR_FLAG_MAINTENANCE)
+		max_ios = maintenance_io_concurrency;
+	else
+		max_ios = effective_io_concurrency;
+
+	/*
+	 * The desired level of I/O concurrency controls how far ahead we are
+	 * willing to look.  We also clamp it to at least
+	 * MAX_BUFFERS_PER_TRANSFER so that we have a chance to build up a
+	 * full-sized read, even when max_ios is zero.
+	 */
+	max_pinned_buffers = Max(max_ios * 4, MAX_BUFFERS_PER_TRANSFER);
+
+	/*
+	 * The *_io_concurrency GUCs might be set to 0, but we want to allow at
+	 * least one, to keep our gating logic simple.
+	 */
+	max_ios = Max(max_ios, 1);
+
+	/*
+	 * Don't allow this backend to pin too many buffers.  For now we'll apply
+	 * the limit for the shared buffer pool and the local buffer pool, without
+	 * worrying which it is.
+	 */
+	LimitAdditionalPins(&max_pinned_buffers);
+	LimitAdditionalLocalPins(&max_pinned_buffers);
+	Assert(max_pinned_buffers > 0);
+
+	/*
+	 * pgsr->ranges is a circular buffer.  When it is empty, head == tail.
+	 * When it is full, there is an empty element between head and tail.  Head
+	 * can also be empty (nblocks == 0), therefore we need two extra elements
+	 * for non-occupied ranges, on top of max_pinned_buffers to allow for the
+	 * maximum possible number of occupied ranges of the smallest possible
+	 * size of one.
+	 */
+	size = max_pinned_buffers + 2;
+
+	pgsr = (PgStreamingRead *)
+		palloc0(offsetof(PgStreamingRead, ranges) +
+				sizeof(pgsr->ranges[0]) * size);
+
+	pgsr->max_ios = max_ios;
+	pgsr->per_buffer_data_size = per_buffer_data_size;
+	pgsr->max_pinned_buffers = max_pinned_buffers;
+	pgsr->pgsr_private = pgsr_private;
+	pgsr->strategy = strategy;
+	pgsr->size = size;
+
+	pgsr->unget_blocknum = InvalidBlockNumber;
+
+#ifdef USE_PREFETCH
+
+	/*
+	 * This system supports prefetching advice.  As long as direct I/O isn't
+	 * enabled, and the caller hasn't promised sequential access, we can use
+	 * it.
+	 */
+	if ((io_direct_flags & IO_DIRECT_DATA) == 0 &&
+		(flags & PGSR_FLAG_SEQUENTIAL) == 0)
+		pgsr->advice_enabled = true;
+#endif
+
+	/*
+	 * We start off building small ranges, but double that quickly, for the
+	 * benefit of users that don't know how far ahead they'll read.  This can
+	 * be disabled by users that already know they'll read all the way.
+	 */
+	if (flags & PGSR_FLAG_FULL)
+		pgsr->ramp_up_pin_limit = INT_MAX;
+	else
+		pgsr->ramp_up_pin_limit = 1;
+
+	/*
+	 * We want to avoid creating ranges that are smaller than they could be
+	 * just because we hit max_pinned_buffers.  We only look ahead when the
+	 * number of pinned buffers falls below this trigger number, or put
+	 * another way, we stop looking ahead when we wouldn't be able to build a
+	 * "full sized" range.
+	 */
+	pgsr->pinned_buffers_trigger =
+		Max(1, (int) max_pinned_buffers - MAX_BUFFERS_PER_TRANSFER);
+
+	/* Space for the callback to store extra data along with each block. */
+	if (per_buffer_data_size)
+		pgsr->per_buffer_data = palloc(per_buffer_data_size * max_pinned_buffers);
+
+	return pgsr;
+}
+
+/*
+ * Create a new streaming read object that can be used to perform the
+ * equivalent of a series of ReadBuffer() calls for one fork of one relation.
+ * Internally, it generates larger vectored reads where possible by looking
+ * ahead.
+ */
+PgStreamingRead *
+pg_streaming_read_buffer_alloc(int flags,
+							   void *pgsr_private,
+							   size_t per_buffer_data_size,
+							   BufferAccessStrategy strategy,
+							   BufferManagerRelation bmr,
+							   ForkNumber forknum,
+							   PgStreamingReadBufferCB next_block_cb)
+{
+	PgStreamingRead *result;
+
+	result = pg_streaming_read_buffer_alloc_internal(flags,
+													 pgsr_private,
+													 per_buffer_data_size,
+													 strategy);
+	result->callback = next_block_cb;
+	result->bmr = bmr;
+	result->forknum = forknum;
+
+	return result;
+}
+
+/*
+ * Find the per-buffer data index for the Nth block of a range.
+ */
+static int
+get_per_buffer_data_index(PgStreamingRead *pgsr, PgStreamingReadRange *range, int n)
+{
+	int			result;
+
+	/*
+	 * Find slot in the circular buffer of per-buffer data, without using the
+	 * expensive % operator.
+	 */
+	result = range->per_buffer_data_index + n;
+	if (result >= pgsr->max_pinned_buffers)
+		result -= pgsr->max_pinned_buffers;
+	Assert(result == (range->per_buffer_data_index + n) % pgsr->max_pinned_buffers);
+
+	return result;
+}
+
+/*
+ * Return a pointer to the per-buffer data by index.
+ */
+static void *
+get_per_buffer_data_by_index(PgStreamingRead *pgsr, int per_buffer_data_index)
+{
+	return (char *) pgsr->per_buffer_data +
+		pgsr->per_buffer_data_size * per_buffer_data_index;
+}
+
+/*
+ * Return a pointer to the per-buffer data for the Nth block of a range.
+ */
+static void *
+get_per_buffer_data(PgStreamingRead *pgsr, PgStreamingReadRange *range, int n)
+{
+	return get_per_buffer_data_by_index(pgsr,
+										get_per_buffer_data_index(pgsr,
+																  range,
+																  n));
+}
+
+/*
+ * Start reading the head range, and create a new head range.  The new head
+ * range is returned.  It may not be empty, if StartReadBuffers() couldn't
+ * start the entire range; in that case the returned range contains the
+ * remaining portion of the range.
+ */
+static PgStreamingReadRange *
+pg_streaming_read_start_head_range(PgStreamingRead *pgsr)
+{
+	PgStreamingReadRange *head_range;
+	PgStreamingReadRange *new_head_range;
+	int			nblocks_pinned;
+	int			flags;
+
+	/* Caller should make sure we never exceed max_ios. */
+	Assert(pgsr->ios_in_progress < pgsr->max_ios);
+
+	/* Should only call if the head range has some blocks to read. */
+	head_range = &pgsr->ranges[pgsr->head];
+	Assert(head_range->nblocks > 0);
+
+	/*
+	 * If advice hasn't been suppressed, this system supports it, and this
+	 * isn't a strictly sequential pattern, then we'll issue advice.
+	 */
+	if (pgsr->advice_enabled && head_range->blocknum != pgsr->seq_blocknum)
+		flags = READ_BUFFERS_ISSUE_ADVICE;
+	else
+		flags = 0;
+
+
+	/* Start reading as many blocks as we can from the head range. */
+	nblocks_pinned = head_range->nblocks;
+	head_range->need_wait =
+		StartReadBuffers(pgsr->bmr,
+						 head_range->buffers,
+						 pgsr->forknum,
+						 head_range->blocknum,
+						 &nblocks_pinned,
+						 pgsr->strategy,
+						 flags,
+						 &head_range->operation);
+
+	/* Did that start an I/O? */
+	if (head_range->need_wait && (flags & READ_BUFFERS_ISSUE_ADVICE))
+	{
+		head_range->advice_issued = true;
+		pgsr->ios_in_progress++;
+		Assert(pgsr->ios_in_progress <= pgsr->max_ios);
+	}
+
+	/*
+	 * StartReadBuffers() might have pinned fewer blocks than we asked it to,
+	 * but always at least one.
+	 */
+	Assert(nblocks_pinned <= head_range->nblocks);
+	Assert(nblocks_pinned >= 1);
+	pgsr->pinned_buffers += nblocks_pinned;
+
+	/*
+	 * Remember where the next block would be after that, so we can detect
+	 * sequential access next time.
+	 */
+	pgsr->seq_blocknum = head_range->blocknum + nblocks_pinned;
+
+	/*
+	 * Create a new head range.  There must be space, because we have enough
+	 * elements for every range to hold just one block, up to the pin limit.
+	 */
+	Assert(pgsr->size > pgsr->max_pinned_buffers);
+	Assert((pgsr->head + 1) % pgsr->size != pgsr->tail);
+	if (++pgsr->head == pgsr->size)
+		pgsr->head = 0;
+	new_head_range = &pgsr->ranges[pgsr->head];
+	new_head_range->nblocks = 0;
+	new_head_range->advice_issued = false;
+
+	/*
+	 * If we didn't manage to start the whole read above, we split the range,
+	 * moving the remainder into the new head range.
+	 */
+	if (nblocks_pinned < head_range->nblocks)
+	{
+		int			nblocks_remaining = head_range->nblocks - nblocks_pinned;
+
+		head_range->nblocks = nblocks_pinned;
+
+		new_head_range->blocknum = head_range->blocknum + nblocks_pinned;
+		new_head_range->nblocks = nblocks_remaining;
+	}
+
+	/* The new range has per-buffer data starting after the previous range. */
+	new_head_range->per_buffer_data_index =
+		get_per_buffer_data_index(pgsr, head_range, nblocks_pinned);
+
+	return new_head_range;
+}
+
+/*
+ * Ask the callback which block it would like us to read next, with a small
+ * buffer in front to allow pg_streaming_unget_block() to work.
+ */
+static BlockNumber
+pg_streaming_get_block(PgStreamingRead *pgsr, void *per_buffer_data)
+{
+	BlockNumber result;
+
+	if (unlikely(pgsr->unget_blocknum != InvalidBlockNumber))
+	{
+		/*
+		 * If we had to unget a block, now it is time to return that one
+		 * again.
+		 */
+		result = pgsr->unget_blocknum;
+		pgsr->unget_blocknum = InvalidBlockNumber;
+
+		/*
+		 * The same per_buffer_data element must have been used, and still
+		 * contains whatever data the callback wrote into it.  So we just
+		 * sanity-check that we were called with the value that
+		 * pg_streaming_unget_block() pushed back.
+		 */
+		Assert(per_buffer_data == pgsr->unget_per_buffer_data);
+	}
+	else
+	{
+		/* Use the installed callback directly. */
+		result = pgsr->callback(pgsr, pgsr->pgsr_private, per_buffer_data);
+	}
+
+	return result;
+}
+
+/*
+ * In order to deal with short reads in StartReadBuffers(), we sometimes need
+ * to defer handling of a block until later.  This *must* be called with the
+ * last value returned by pg_streaming_get_block().
+ */
+static void
+pg_streaming_unget_block(PgStreamingRead *pgsr, BlockNumber blocknum, void *per_buffer_data)
+{
+	Assert(pgsr->unget_blocknum == InvalidBlockNumber);
+	pgsr->unget_blocknum = blocknum;
+	pgsr->unget_per_buffer_data = per_buffer_data;
+}
+
+static void
+pg_streaming_read_look_ahead(PgStreamingRead *pgsr)
+{
+	PgStreamingReadRange *range;
+
+	/*
+	 * If we're still ramping up, we may have to stall to wait for buffers to
+	 * be consumed first before we do any more prefetching.
+	 */
+	if (pgsr->ramp_up_pin_stall > 0)
+	{
+		Assert(pgsr->pinned_buffers > 0);
+		return;
+	}
+
+	/*
+	 * If we're finished or can't start more I/O, then don't look ahead.
+	 */
+	if (pgsr->finished || pgsr->ios_in_progress == pgsr->max_ios)
+		return;
+
+	/*
+	 * We'll also wait until the number of pinned buffers falls below our
+	 * trigger level, so that we have the chance to create a full range.
+	 */
+	if (pgsr->pinned_buffers >= pgsr->pinned_buffers_trigger)
+		return;
+
+	do
+	{
+		BlockNumber blocknum;
+		void	   *per_buffer_data;
+
+		/* Do we have a full-sized range? */
+		range = &pgsr->ranges[pgsr->head];
+		if (range->nblocks == lengthof(range->buffers))
+		{
+			/* Start as much of it as we can. */
+			range = pg_streaming_read_start_head_range(pgsr);
+
+			/* If we're now at the I/O limit, stop here. */
+			if (pgsr->ios_in_progress == pgsr->max_ios)
+				return;
+
+			/*
+			 * If we couldn't form a full range, then stop here to avoid
+			 * creating small I/O.
+			 */
+			if (pgsr->pinned_buffers >= pgsr->pinned_buffers_trigger)
+				return;
+
+			/*
+			 * That might have been only partially started, but it always
+			 * processes at least one block, so that'll do for now.
+			 */
+			Assert(range->nblocks < lengthof(range->buffers));
+		}
+
+		/* Find per-buffer data slot for the next block. */
+		per_buffer_data = get_per_buffer_data(pgsr, range, range->nblocks);
+
+		/* Find out which block the callback wants to read next. */
+		blocknum = pg_streaming_get_block(pgsr, per_buffer_data);
+		if (blocknum == InvalidBlockNumber)
+		{
+			/* End of stream. */
+			pgsr->finished = true;
+			break;
+		}
+
+		/*
+		 * Is there a head range that we cannot extend, because the requested
+		 * block is not consecutive?
+		 */
+		if (range->nblocks > 0 &&
+			range->blocknum + range->nblocks != blocknum)
+		{
+			/* Yes.  Start it, so we can begin building a new one. */
+			range = pg_streaming_read_start_head_range(pgsr);
+
+			/*
+			 * It's possible that it was only partially started, and we have a
+			 * new range with the remainder.  Keep starting I/Os until we get
+			 * it all out of the way, or we hit the I/O limit.
+			 */
+			while (range->nblocks > 0 && pgsr->ios_in_progress < pgsr->max_ios)
+				range = pg_streaming_read_start_head_range(pgsr);
+
+			/*
+			 * We have to 'unget' the block returned by the callback if we
+			 * don't have enough I/O capacity left to start something.
+			 */
+			if (pgsr->ios_in_progress == pgsr->max_ios)
+			{
+				pg_streaming_unget_block(pgsr, blocknum, per_buffer_data);
+				return;
+			}
+		}
+
+		/* If we have a new, empty range, initialize the start block. */
+		if (range->nblocks == 0)
+		{
+			range->blocknum = blocknum;
+		}
+
+		/* This block extends the range by one. */
+		Assert(range->blocknum + range->nblocks == blocknum);
+		range->nblocks++;
+
+	} while (pgsr->pinned_buffers + range->nblocks < pgsr->max_pinned_buffers &&
+			 pgsr->pinned_buffers + range->nblocks < pgsr->ramp_up_pin_limit);
+
+	/* If we've hit the ramp-up limit, insert a stall. */
+	if (pgsr->pinned_buffers + range->nblocks >= pgsr->ramp_up_pin_limit)
+	{
+		/* Can't get here if an earlier stall hasn't finished. */
+		Assert(pgsr->ramp_up_pin_stall == 0);
+		/* Don't do any more prefetching until these buffers are consumed. */
+		pgsr->ramp_up_pin_stall = pgsr->ramp_up_pin_limit;
+		/* Double it.  It will soon be out of the way. */
+		pgsr->ramp_up_pin_limit *= 2;
+	}
+
+	/* Start as much as we can. */
+	while (range->nblocks > 0)
+	{
+		range = pg_streaming_read_start_head_range(pgsr);
+		if (pgsr->ios_in_progress == pgsr->max_ios)
+			break;
+	}
+}
+
+Buffer
+pg_streaming_read_buffer_get_next(PgStreamingRead *pgsr, void **per_buffer_data)
+{
+	pg_streaming_read_look_ahead(pgsr);
+
+	/* See if we have one buffer to return. */
+	while (pgsr->tail != pgsr->head)
+	{
+		PgStreamingReadRange *tail_range;
+
+		tail_range = &pgsr->ranges[pgsr->tail];
+
+		/*
+		 * Do we need to perform an I/O before returning the buffers from this
+		 * range?
+		 */
+		if (tail_range->need_wait)
+		{
+			WaitReadBuffers(&tail_range->operation);
+			tail_range->need_wait = false;
+
+			/*
+			 * We don't really know if the kernel generated a physical I/O
+			 * when we issued advice, let alone when it finished, but it has
+			 * certainly finished now because we've performed the read.
+			 */
+			if (tail_range->advice_issued)
+			{
+				Assert(pgsr->ios_in_progress > 0);
+				pgsr->ios_in_progress--;
+			}
+		}
+
+		/* Are there more buffers available in this range? */
+		if (pgsr->next_tail_buffer < tail_range->nblocks)
+		{
+			int			buffer_index;
+			Buffer		buffer;
+
+			buffer_index = pgsr->next_tail_buffer++;
+			buffer = tail_range->buffers[buffer_index];
+
+			Assert(BufferIsValid(buffer));
+
+			/* We are giving away ownership of this pinned buffer. */
+			Assert(pgsr->pinned_buffers > 0);
+			pgsr->pinned_buffers--;
+
+			if (pgsr->ramp_up_pin_stall > 0)
+				pgsr->ramp_up_pin_stall--;
+
+			if (per_buffer_data)
+				*per_buffer_data = get_per_buffer_data(pgsr, tail_range, buffer_index);
+
+			return buffer;
+		}
+
+		/* Advance tail to next range, if there is one. */
+		if (++pgsr->tail == pgsr->size)
+			pgsr->tail = 0;
+		pgsr->next_tail_buffer = 0;
+
+		/*
+		 * If tail crashed into head, and head is not empty, then it is time
+		 * to start that range.
+		 */
+		if (pgsr->tail == pgsr->head &&
+			pgsr->ranges[pgsr->head].nblocks > 0)
+			pg_streaming_read_start_head_range(pgsr);
+	}
+
+	Assert(pgsr->pinned_buffers == 0);
+
+	return InvalidBuffer;
+}
+
+void
+pg_streaming_read_free(PgStreamingRead *pgsr)
+{
+	Buffer		buffer;
+
+	/* Stop looking ahead. */
+	pgsr->finished = true;
+
+	/* Unpin anything that wasn't consumed. */
+	while ((buffer = pg_streaming_read_buffer_get_next(pgsr, NULL)) != InvalidBuffer)
+		ReleaseBuffer(buffer);
+
+	Assert(pgsr->pinned_buffers == 0);
+	Assert(pgsr->ios_in_progress == 0);
+
+	/* Release memory. */
+	if (pgsr->per_buffer_data)
+		pfree(pgsr->per_buffer_data);
+
+	pfree(pgsr);
+}
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index bdf89bbc4dc..3b1b0ad99df 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -19,6 +19,11 @@
  *		and pin it so that no one can destroy it while this process
  *		is using it.
  *
+ * StartReadBuffers() -- as above, but for multiple contiguous blocks in
+ *		two steps.
+ *
+ * WaitReadBuffers() -- second step of StartReadBuffers().
+ *
  * ReleaseBuffer() -- unpin a buffer
  *
  * MarkBufferDirty() -- mark a pinned buffer's contents as "dirty".
@@ -472,10 +477,9 @@ ForgetPrivateRefCountEntry(PrivateRefCountEntry *ref)
 )
 
 
-static Buffer ReadBuffer_common(SMgrRelation smgr, char relpersistence,
+static Buffer ReadBuffer_common(BufferManagerRelation bmr,
 								ForkNumber forkNum, BlockNumber blockNum,
-								ReadBufferMode mode, BufferAccessStrategy strategy,
-								bool *hit);
+								ReadBufferMode mode, BufferAccessStrategy strategy);
 static BlockNumber ExtendBufferedRelCommon(BufferManagerRelation bmr,
 										   ForkNumber fork,
 										   BufferAccessStrategy strategy,
@@ -501,7 +505,7 @@ static uint32 WaitBufHdrUnlocked(BufferDesc *buf);
 static int	SyncOneBuffer(int buf_id, bool skip_recently_used,
 						  WritebackContext *wb_context);
 static void WaitIO(BufferDesc *buf);
-static bool StartBufferIO(BufferDesc *buf, bool forInput);
+static bool StartBufferIO(BufferDesc *buf, bool forInput, bool nowait);
 static void TerminateBufferIO(BufferDesc *buf, bool clear_dirty,
 							  uint32 set_flag_bits, bool forget_owner);
 static void AbortBufferIO(Buffer buffer);
@@ -782,7 +786,6 @@ Buffer
 ReadBufferExtended(Relation reln, ForkNumber forkNum, BlockNumber blockNum,
 				   ReadBufferMode mode, BufferAccessStrategy strategy)
 {
-	bool		hit;
 	Buffer		buf;
 
 	/*
@@ -795,15 +798,9 @@ ReadBufferExtended(Relation reln, ForkNumber forkNum, BlockNumber blockNum,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("cannot access temporary tables of other sessions")));
 
-	/*
-	 * Read the buffer, and update pgstat counters to reflect a cache hit or
-	 * miss.
-	 */
-	pgstat_count_buffer_read(reln);
-	buf = ReadBuffer_common(RelationGetSmgr(reln), reln->rd_rel->relpersistence,
-							forkNum, blockNum, mode, strategy, &hit);
-	if (hit)
-		pgstat_count_buffer_hit(reln);
+	buf = ReadBuffer_common(BMR_REL(reln),
+							forkNum, blockNum, mode, strategy);
+
 	return buf;
 }
 
@@ -823,13 +820,12 @@ ReadBufferWithoutRelcache(RelFileLocator rlocator, ForkNumber forkNum,
 						  BlockNumber blockNum, ReadBufferMode mode,
 						  BufferAccessStrategy strategy, bool permanent)
 {
-	bool		hit;
-
 	SMgrRelation smgr = smgropen(rlocator, InvalidBackendId);
 
-	return ReadBuffer_common(smgr, permanent ? RELPERSISTENCE_PERMANENT :
-							 RELPERSISTENCE_UNLOGGED, forkNum, blockNum,
-							 mode, strategy, &hit);
+	return ReadBuffer_common(BMR_SMGR(smgr, permanent ? RELPERSISTENCE_PERMANENT :
+									  RELPERSISTENCE_UNLOGGED),
+							 forkNum, blockNum,
+							 mode, strategy);
 }
 
 /*
@@ -995,35 +991,68 @@ ExtendBufferedRelTo(BufferManagerRelation bmr,
 	 */
 	if (buffer == InvalidBuffer)
 	{
-		bool		hit;
-
 		Assert(extended_by == 0);
-		buffer = ReadBuffer_common(bmr.smgr, bmr.relpersistence,
-								   fork, extend_to - 1, mode, strategy,
-								   &hit);
+		buffer = ReadBuffer_common(bmr, fork, extend_to - 1, mode, strategy);
 	}
 
 	return buffer;
 }
 
+/*
+ * Zero a buffer and lock it, as part of the implementation of
+ * RBM_ZERO_AND_LOCK or RBM_ZERO_AND_CLEANUP_LOCK.  The buffer must be already
+ * pinned.  It does not have to be valid, but it is valid and locked on
+ * return.
+ */
+static void
+ZeroBuffer(Buffer buffer, ReadBufferMode mode)
+{
+	BufferDesc *bufHdr;
+	uint32		buf_state;
+
+	Assert(mode == RBM_ZERO_AND_LOCK || mode == RBM_ZERO_AND_CLEANUP_LOCK);
+
+	if (BufferIsLocal(buffer))
+		bufHdr = GetLocalBufferDescriptor(-buffer - 1);
+	else
+	{
+		bufHdr = GetBufferDescriptor(buffer - 1);
+		if (mode == RBM_ZERO_AND_LOCK)
+			LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
+		else
+			LockBufferForCleanup(buffer);
+	}
+
+	memset(BufferGetPage(buffer), 0, BLCKSZ);
+
+	if (BufferIsLocal(buffer))
+	{
+		buf_state = pg_atomic_read_u32(&bufHdr->state);
+		buf_state |= BM_VALID;
+		pg_atomic_unlocked_write_u32(&bufHdr->state, buf_state);
+	}
+	else
+	{
+		buf_state = LockBufHdr(bufHdr);
+		buf_state |= BM_VALID;
+		UnlockBufHdr(bufHdr, buf_state);
+	}
+}
+
 /*
  * ReadBuffer_common -- common logic for all ReadBuffer variants
  *
  * *hit is set to true if the request was satisfied from shared buffer cache.
  */
 static Buffer
-ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
+ReadBuffer_common(BufferManagerRelation bmr, ForkNumber forkNum,
 				  BlockNumber blockNum, ReadBufferMode mode,
-				  BufferAccessStrategy strategy, bool *hit)
+				  BufferAccessStrategy strategy)
 {
-	BufferDesc *bufHdr;
-	Block		bufBlock;
-	bool		found;
-	IOContext	io_context;
-	IOObject	io_object;
-	bool		isLocalBuf = SmgrIsTemp(smgr);
-
-	*hit = false;
+	ReadBuffersOperation operation;
+	Buffer		buffer;
+	int			nblocks;
+	int			flags;
 
 	/*
 	 * Backward compatibility path, most code should use ExtendBufferedRel()
@@ -1042,181 +1071,404 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 		if (mode == RBM_ZERO_AND_LOCK || mode == RBM_ZERO_AND_CLEANUP_LOCK)
 			flags |= EB_LOCK_FIRST;
 
-		return ExtendBufferedRel(BMR_SMGR(smgr, relpersistence),
-								 forkNum, strategy, flags);
+		return ExtendBufferedRel(bmr, forkNum, strategy, flags);
 	}
 
-	TRACE_POSTGRESQL_BUFFER_READ_START(forkNum, blockNum,
-									   smgr->smgr_rlocator.locator.spcOid,
-									   smgr->smgr_rlocator.locator.dbOid,
-									   smgr->smgr_rlocator.locator.relNumber,
-									   smgr->smgr_rlocator.backend);
+	nblocks = 1;
+	if (mode == RBM_ZERO_ON_ERROR)
+		flags = READ_BUFFERS_ZERO_ON_ERROR;
+	else
+		flags = 0;
+	if (StartReadBuffers(bmr,
+						 &buffer,
+						 forkNum,
+						 blockNum,
+						 &nblocks,
+						 strategy,
+						 flags,
+						 &operation))
+		WaitReadBuffers(&operation);
+	Assert(nblocks == 1);		/* single block can't be short */
+
+	if (mode == RBM_ZERO_AND_CLEANUP_LOCK || mode == RBM_ZERO_AND_LOCK)
+		ZeroBuffer(buffer, mode);
+
+	return buffer;
+}
+
+static Buffer
+PrepareReadBuffer(BufferManagerRelation bmr,
+				  ForkNumber forkNum,
+				  BlockNumber blockNum,
+				  BufferAccessStrategy strategy,
+				  bool *foundPtr)
+{
+	BufferDesc *bufHdr;
+	bool		isLocalBuf;
+	IOContext	io_context;
+	IOObject	io_object;
+
+	Assert(blockNum != P_NEW);
 
+	Assert(bmr.smgr);
+
+	isLocalBuf = SmgrIsTemp(bmr.smgr);
 	if (isLocalBuf)
 	{
-		/*
-		 * We do not use a BufferAccessStrategy for I/O of temporary tables.
-		 * However, in some cases, the "strategy" may not be NULL, so we can't
-		 * rely on IOContextForStrategy() to set the right IOContext for us.
-		 * This may happen in cases like CREATE TEMPORARY TABLE AS...
-		 */
 		io_context = IOCONTEXT_NORMAL;
 		io_object = IOOBJECT_TEMP_RELATION;
-		bufHdr = LocalBufferAlloc(smgr, forkNum, blockNum, &found);
-		if (found)
-			pgBufferUsage.local_blks_hit++;
-		else if (mode == RBM_NORMAL || mode == RBM_NORMAL_NO_LOG ||
-				 mode == RBM_ZERO_ON_ERROR)
-			pgBufferUsage.local_blks_read++;
 	}
 	else
 	{
-		/*
-		 * lookup the buffer.  IO_IN_PROGRESS is set if the requested block is
-		 * not currently in memory.
-		 */
 		io_context = IOContextForStrategy(strategy);
 		io_object = IOOBJECT_RELATION;
-		bufHdr = BufferAlloc(smgr, relpersistence, forkNum, blockNum,
-							 strategy, &found, io_context);
-		if (found)
-			pgBufferUsage.shared_blks_hit++;
-		else if (mode == RBM_NORMAL || mode == RBM_NORMAL_NO_LOG ||
-				 mode == RBM_ZERO_ON_ERROR)
-			pgBufferUsage.shared_blks_read++;
 	}
 
-	/* At this point we do NOT hold any locks. */
+	TRACE_POSTGRESQL_BUFFER_READ_START(forkNum, blockNum,
+									   bmr.smgr->smgr_rlocator.locator.spcOid,
+									   bmr.smgr->smgr_rlocator.locator.dbOid,
+									   bmr.smgr->smgr_rlocator.locator.relNumber,
+									   bmr.smgr->smgr_rlocator.backend);
 
-	/* if it was already in the buffer pool, we're done */
-	if (found)
+	ResourceOwnerEnlarge(CurrentResourceOwner);
+	if (isLocalBuf)
+	{
+		bufHdr = LocalBufferAlloc(bmr.smgr, forkNum, blockNum, foundPtr);
+		if (*foundPtr)
+			pgBufferUsage.local_blks_hit++;
+	}
+	else
+	{
+		bufHdr = BufferAlloc(bmr.smgr, bmr.relpersistence, forkNum, blockNum,
+							 strategy, foundPtr, io_context);
+		if (*foundPtr)
+			pgBufferUsage.shared_blks_hit++;
+	}
+	if (bmr.rel)
+	{
+		/*
+		 * While pgBufferUsage's "read" counter isn't bumped unless we reach
+		 * WaitReadBuffers() (so, not for hits, and not for buffers that are
+		 * zeroed instead), the per-relation stats always count them.
+		 */
+		pgstat_count_buffer_read(bmr.rel);
+		if (*foundPtr)
+			pgstat_count_buffer_hit(bmr.rel);
+	}
+	if (*foundPtr)
 	{
-		/* Just need to update stats before we exit */
-		*hit = true;
 		VacuumPageHit++;
 		pgstat_count_io_op(io_object, io_context, IOOP_HIT);
-
 		if (VacuumCostActive)
 			VacuumCostBalance += VacuumCostPageHit;
 
 		TRACE_POSTGRESQL_BUFFER_READ_DONE(forkNum, blockNum,
-										  smgr->smgr_rlocator.locator.spcOid,
-										  smgr->smgr_rlocator.locator.dbOid,
-										  smgr->smgr_rlocator.locator.relNumber,
-										  smgr->smgr_rlocator.backend,
-										  found);
+										  bmr.smgr->smgr_rlocator.locator.spcOid,
+										  bmr.smgr->smgr_rlocator.locator.dbOid,
+										  bmr.smgr->smgr_rlocator.locator.relNumber,
+										  bmr.smgr->smgr_rlocator.backend,
+										  true);
+	}
 
-		/*
-		 * In RBM_ZERO_AND_LOCK mode the caller expects the page to be locked
-		 * on return.
-		 */
-		if (!isLocalBuf)
-		{
-			if (mode == RBM_ZERO_AND_LOCK)
-				LWLockAcquire(BufferDescriptorGetContentLock(bufHdr),
-							  LW_EXCLUSIVE);
-			else if (mode == RBM_ZERO_AND_CLEANUP_LOCK)
-				LockBufferForCleanup(BufferDescriptorGetBuffer(bufHdr));
-		}
+	return BufferDescriptorGetBuffer(bufHdr);
+}
 
-		return BufferDescriptorGetBuffer(bufHdr);
+/*
+ * Begin reading a range of *nblocks blocks starting at blockNum.  On return,
+ * up to *nblocks pinned buffers holding those blocks
+ * are written into the buffers array, and *nblocks is updated to contain the
+ * actual number, which may be fewer than requested.
+ *
+ * If false is returned, no I/O is necessary and WaitReadBuffers() need not
+ * be called.  If true is returned, one I/O has been started, and
+ * WaitReadBuffers() must be called with the same operation object before the
+ * buffers are accessed.  Along with the operation object, the caller-supplied
+ * array of buffers must remain valid until WaitReadBuffers() is called.
+ *
+ * Currently the I/O is only started with optional operating system advice,
+ * and the real I/O happens in WaitReadBuffers().  In future work, true I/O
+ * could be initiated here.
+ */
+bool
+StartReadBuffers(BufferManagerRelation bmr,
+				 Buffer *buffers,
+				 ForkNumber forkNum,
+				 BlockNumber blockNum,
+				 int *nblocks,
+				 BufferAccessStrategy strategy,
+				 int flags,
+				 ReadBuffersOperation *operation)
+{
+	int			actual_nblocks = *nblocks;
+
+	if (bmr.rel)
+	{
+		bmr.smgr = RelationGetSmgr(bmr.rel);
+		bmr.relpersistence = bmr.rel->rd_rel->relpersistence;
 	}
 
-	/*
-	 * if we have gotten to this point, we have allocated a buffer for the
-	 * page but its contents are not yet valid.  IO_IN_PROGRESS is set for it,
-	 * if it's a shared buffer.
-	 */
-	Assert(!(pg_atomic_read_u32(&bufHdr->state) & BM_VALID));	/* spinlock not needed */
+	operation->bmr = bmr;
+	operation->forknum = forkNum;
+	operation->blocknum = blockNum;
+	operation->buffers = buffers;
+	operation->nblocks = actual_nblocks;
+	operation->strategy = strategy;
+	operation->flags = flags;
 
-	bufBlock = isLocalBuf ? LocalBufHdrGetBlock(bufHdr) : BufHdrGetBlock(bufHdr);
+	operation->io_buffers_len = 0;
 
-	/*
-	 * Read in the page, unless the caller intends to overwrite it and just
-	 * wants us to allocate a buffer.
-	 */
-	if (mode == RBM_ZERO_AND_LOCK || mode == RBM_ZERO_AND_CLEANUP_LOCK)
-		MemSet((char *) bufBlock, 0, BLCKSZ);
-	else
+	for (int i = 0; i < actual_nblocks; ++i)
 	{
-		instr_time	io_start = pgstat_prepare_io_time(track_io_timing);
+		bool		found;
 
-		smgrread(smgr, forkNum, blockNum, bufBlock);
+		buffers[i] = PrepareReadBuffer(bmr,
+									   forkNum,
+									   blockNum + i,
+									   strategy,
+									   &found);
 
-		pgstat_count_io_op_time(io_object, io_context,
-								IOOP_READ, io_start, 1);
+		if (found)
+		{
+			/*
+			 * Terminate the read as soon as we get a hit.  It could be a
+			 * single buffer hit, or it could be a hit that follows a readable
+			 * range.  We don't want to create more than one readable range,
+			 * so we stop here.
+			 */
+			actual_nblocks = operation->nblocks = *nblocks = i + 1;
+		}
+		else
+		{
+			/* Extend the readable range to cover this block. */
+			operation->io_buffers_len++;
+		}
+	}
 
-		/* check for garbage data */
-		if (!PageIsVerifiedExtended((Page) bufBlock, blockNum,
-									PIV_LOG_WARNING | PIV_REPORT_STAT))
+	if (operation->io_buffers_len > 0)
+	{
+		if (flags & READ_BUFFERS_ISSUE_ADVICE)
 		{
-			if (mode == RBM_ZERO_ON_ERROR || zero_damaged_pages)
-			{
-				ereport(WARNING,
-						(errcode(ERRCODE_DATA_CORRUPTED),
-						 errmsg("invalid page in block %u of relation %s; zeroing out page",
-								blockNum,
-								relpath(smgr->smgr_rlocator, forkNum))));
-				MemSet((char *) bufBlock, 0, BLCKSZ);
-			}
-			else
-				ereport(ERROR,
-						(errcode(ERRCODE_DATA_CORRUPTED),
-						 errmsg("invalid page in block %u of relation %s",
-								blockNum,
-								relpath(smgr->smgr_rlocator, forkNum))));
+			/*
+			 * In theory we should only do this if PrepareReadBuffer() had to
+			 * allocate new buffers above.  That way, if two calls to
+			 * StartReadBuffers() were made for the same blocks before
+			 * WaitReadBuffers(), only the first would issue the advice.
+			 * That'd be a better simulation of true asynchronous I/O, which
+			 * would only start the I/O once, but isn't done here for
+			 * simplicity.  Note also that the following call might actually
+			 * issue two advice calls if we cross a segment boundary; in a
+			 * true asynchronous version we might choose to process only one
+			 * real I/O at a time in that case.
+			 */
+			smgrprefetch(bmr.smgr, forkNum, blockNum, operation->io_buffers_len);
 		}
+
+		/* Indicate that WaitReadBuffers() should be called. */
+		return true;
 	}
+	else
+	{
+		return false;
+	}
+}
 
-	/*
-	 * In RBM_ZERO_AND_LOCK / RBM_ZERO_AND_CLEANUP_LOCK mode, grab the buffer
-	 * content lock before marking the page as valid, to make sure that no
-	 * other backend sees the zeroed page before the caller has had a chance
-	 * to initialize it.
-	 *
-	 * Since no-one else can be looking at the page contents yet, there is no
-	 * difference between an exclusive lock and a cleanup-strength lock. (Note
-	 * that we cannot use LockBuffer() or LockBufferForCleanup() here, because
-	 * they assert that the buffer is already valid.)
-	 */
-	if ((mode == RBM_ZERO_AND_LOCK || mode == RBM_ZERO_AND_CLEANUP_LOCK) &&
-		!isLocalBuf)
+static inline bool
+WaitReadBuffersCanStartIO(Buffer buffer, bool nowait)
+{
+	if (BufferIsLocal(buffer))
 	{
-		LWLockAcquire(BufferDescriptorGetContentLock(bufHdr), LW_EXCLUSIVE);
+		BufferDesc *bufHdr = GetLocalBufferDescriptor(-buffer - 1);
+
+		return (pg_atomic_read_u32(&bufHdr->state) & BM_VALID) == 0;
 	}
+	else
+		return StartBufferIO(GetBufferDescriptor(buffer - 1), true, nowait);
+}
+
+void
+WaitReadBuffers(ReadBuffersOperation *operation)
+{
+	BufferManagerRelation bmr;
+	Buffer	   *buffers;
+	int			nblocks;
+	BlockNumber blocknum;
+	ForkNumber	forknum;
+	bool		isLocalBuf;
+	IOContext	io_context;
+	IOObject	io_object;
+
+	/*
+	 * Currently operations are only allowed to include a read of some range,
+	 * with an optional extra buffer that is already pinned at the end.  So
+	 * nblocks can be at most one more than io_buffers_len.
+	 */
+	Assert((operation->nblocks == operation->io_buffers_len) ||
+		   (operation->nblocks == operation->io_buffers_len + 1));
 
+	/* Find the range of the physical read we need to perform. */
+	nblocks = operation->io_buffers_len;
+	if (nblocks == 0)
+		return;					/* nothing to do */
+
+	buffers = &operation->buffers[0];
+	blocknum = operation->blocknum;
+	forknum = operation->forknum;
+	bmr = operation->bmr;
+
+	isLocalBuf = SmgrIsTemp(bmr.smgr);
 	if (isLocalBuf)
 	{
-		/* Only need to adjust flags */
-		uint32		buf_state = pg_atomic_read_u32(&bufHdr->state);
-
-		buf_state |= BM_VALID;
-		pg_atomic_unlocked_write_u32(&bufHdr->state, buf_state);
+		io_context = IOCONTEXT_NORMAL;
+		io_object = IOOBJECT_TEMP_RELATION;
 	}
 	else
 	{
-		/* Set BM_VALID, terminate IO, and wake up any waiters */
-		TerminateBufferIO(bufHdr, false, BM_VALID, true);
+		io_context = IOContextForStrategy(operation->strategy);
+		io_object = IOOBJECT_RELATION;
 	}
 
-	VacuumPageMiss++;
-	if (VacuumCostActive)
-		VacuumCostBalance += VacuumCostPageMiss;
+	/*
+	 * We count all these blocks as read by this backend.  This is traditional
+	 * behavior, but might turn out to be not true if we find that someone
+	 * else has beaten us and completed the read of some of these blocks.  In
+	 * that case the system globally double-counts, but we traditionally don't
+	 * count this as a "hit", and we don't have a separate counter for "miss,
+	 * but another backend completed the read".
+	 */
+	if (isLocalBuf)
+		pgBufferUsage.local_blks_read += nblocks;
+	else
+		pgBufferUsage.shared_blks_read += nblocks;
 
-	TRACE_POSTGRESQL_BUFFER_READ_DONE(forkNum, blockNum,
-									  smgr->smgr_rlocator.locator.spcOid,
-									  smgr->smgr_rlocator.locator.dbOid,
-									  smgr->smgr_rlocator.locator.relNumber,
-									  smgr->smgr_rlocator.backend,
-									  found);
+	for (int i = 0; i < nblocks; ++i)
+	{
+		int			io_buffers_len;
+		Buffer		io_buffers[MAX_BUFFERS_PER_TRANSFER];
+		void	   *io_pages[MAX_BUFFERS_PER_TRANSFER];
+		instr_time	io_start;
+		BlockNumber io_first_block;
 
-	return BufferDescriptorGetBuffer(bufHdr);
+		/*
+		 * Skip this block if someone else has already completed it.  If an
+		 * I/O is already in progress in another backend, this will wait for
+		 * the outcome: either done, or something went wrong and we will
+		 * retry.
+		 */
+		if (!WaitReadBuffersCanStartIO(buffers[i], false))
+		{
+			/*
+			 * Report this as a 'hit' for this backend, even though it must
+			 * have started out as a miss in PrepareReadBuffer().
+			 */
+			TRACE_POSTGRESQL_BUFFER_READ_DONE(forknum, blocknum + i,
+											  bmr.smgr->smgr_rlocator.locator.spcOid,
+											  bmr.smgr->smgr_rlocator.locator.dbOid,
+											  bmr.smgr->smgr_rlocator.locator.relNumber,
+											  bmr.smgr->smgr_rlocator.backend,
+											  true);
+			continue;
+		}
+
+		/* We found a buffer that we need to read in. */
+		io_buffers[0] = buffers[i];
+		io_pages[0] = BufferGetBlock(buffers[i]);
+		io_first_block = blocknum + i;
+		io_buffers_len = 1;
+
+		/*
+		 * How many neighboring-on-disk blocks can we scatter-read into
+		 * other buffers at the same time?  In this case we don't wait if we
+		 * see an I/O already in progress.  We already hold BM_IO_IN_PROGRESS
+		 * for the head block, so we should get on with that I/O as soon as
+		 * possible.  We'll come back to this block again, above.
+		 */
+		while ((i + 1) < nblocks &&
+			   WaitReadBuffersCanStartIO(buffers[i + 1], true))
+		{
+			/* Must be consecutive block numbers. */
+			Assert(BufferGetBlockNumber(buffers[i + 1]) ==
+				   BufferGetBlockNumber(buffers[i]) + 1);
+
+			io_buffers[io_buffers_len] = buffers[++i];
+			io_pages[io_buffers_len++] = BufferGetBlock(buffers[i]);
+		}
+
+		io_start = pgstat_prepare_io_time(track_io_timing);
+		smgrreadv(bmr.smgr, forknum, io_first_block, io_pages, io_buffers_len);
+		pgstat_count_io_op_time(io_object, io_context, IOOP_READ, io_start,
+								io_buffers_len);
+
+		/* Verify each block we read, and terminate the I/O. */
+		for (int j = 0; j < io_buffers_len; ++j)
+		{
+			BufferDesc *bufHdr;
+			Block		bufBlock;
+
+			if (isLocalBuf)
+			{
+				bufHdr = GetLocalBufferDescriptor(-io_buffers[j] - 1);
+				bufBlock = LocalBufHdrGetBlock(bufHdr);
+			}
+			else
+			{
+				bufHdr = GetBufferDescriptor(io_buffers[j] - 1);
+				bufBlock = BufHdrGetBlock(bufHdr);
+			}
+
+			/* check for garbage data */
+			if (!PageIsVerifiedExtended((Page) bufBlock, io_first_block + j,
+										PIV_LOG_WARNING | PIV_REPORT_STAT))
+			{
+				if ((operation->flags & READ_BUFFERS_ZERO_ON_ERROR) || zero_damaged_pages)
+				{
+					ereport(WARNING,
+							(errcode(ERRCODE_DATA_CORRUPTED),
+							 errmsg("invalid page in block %u of relation %s; zeroing out page",
+									io_first_block + j,
+									relpath(bmr.smgr->smgr_rlocator, forknum))));
+					memset(bufBlock, 0, BLCKSZ);
+				}
+				else
+					ereport(ERROR,
+							(errcode(ERRCODE_DATA_CORRUPTED),
+							 errmsg("invalid page in block %u of relation %s",
+									io_first_block + j,
+									relpath(bmr.smgr->smgr_rlocator, forknum))));
+			}
+
+			/* Terminate I/O and set BM_VALID. */
+			if (isLocalBuf)
+			{
+				uint32		buf_state = pg_atomic_read_u32(&bufHdr->state);
+
+				buf_state |= BM_VALID;
+				pg_atomic_unlocked_write_u32(&bufHdr->state, buf_state);
+			}
+			else
+			{
+				/* Set BM_VALID, terminate IO, and wake up any waiters */
+				TerminateBufferIO(bufHdr, false, BM_VALID, true);
+			}
+
+			/* Report I/Os as completing individually. */
+			TRACE_POSTGRESQL_BUFFER_READ_DONE(forknum, io_first_block + j,
+											  bmr.smgr->smgr_rlocator.locator.spcOid,
+											  bmr.smgr->smgr_rlocator.locator.dbOid,
+											  bmr.smgr->smgr_rlocator.locator.relNumber,
+											  bmr.smgr->smgr_rlocator.backend,
+											  false);
+		}
+
+		VacuumPageMiss += io_buffers_len;
+		if (VacuumCostActive)
+			VacuumCostBalance += VacuumCostPageMiss * io_buffers_len;
+	}
 }
 
 /*
- * BufferAlloc -- subroutine for ReadBuffer.  Handles lookup of a shared
- *		buffer.  If no buffer exists already, selects a replacement
- *		victim and evicts the old page, but does NOT read in new page.
+ * BufferAlloc -- subroutine for StartReadBuffers.  Handles lookup of a shared
+ *		buffer.  If no buffer exists already, selects a replacement victim and
+ *		evicts the old page, but does NOT read in new page.
  *
  * "strategy" can be a buffer replacement strategy object, or NULL for
  * the default strategy.  The selected buffer's usage_count is advanced when
@@ -1224,11 +1476,7 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
  *
  * The returned buffer is pinned and is already marked as holding the
  * desired page.  If it already did have the desired page, *foundPtr is
- * set true.  Otherwise, *foundPtr is set false and the buffer is marked
- * as IO_IN_PROGRESS; ReadBuffer will now need to do I/O to fill it.
- *
- * *foundPtr is actually redundant with the buffer's BM_VALID flag, but
- * we keep it for simplicity in ReadBuffer.
+ * set true.  Otherwise, *foundPtr is set false.
  *
  * io_context is passed as an output parameter to avoid calling
  * IOContextForStrategy() when there is a shared buffers hit and no IO
@@ -1287,19 +1535,10 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 		{
 			/*
 			 * We can only get here if (a) someone else is still reading in
-			 * the page, or (b) a previous read attempt failed.  We have to
-			 * wait for any active read attempt to finish, and then set up our
-			 * own read attempt if the page is still not BM_VALID.
-			 * StartBufferIO does it all.
+			 * the page, (b) a previous read attempt failed, or (c) someone
+			 * called StartReadBuffers() but not yet WaitReadBuffers().
 			 */
-			if (StartBufferIO(buf, true))
-			{
-				/*
-				 * If we get here, previous attempts to read the buffer must
-				 * have failed ... but we shall bravely try again.
-				 */
-				*foundPtr = false;
-			}
+			*foundPtr = false;
 		}
 
 		return buf;
@@ -1364,19 +1603,10 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 		{
 			/*
 			 * We can only get here if (a) someone else is still reading in
-			 * the page, or (b) a previous read attempt failed.  We have to
-			 * wait for any active read attempt to finish, and then set up our
-			 * own read attempt if the page is still not BM_VALID.
-			 * StartBufferIO does it all.
+			 * the page, (b) a previous read attempt failed, or (c) someone
+			 * called StartReadBuffers() but not yet WaitReadBuffers().
 			 */
-			if (StartBufferIO(existing_buf_hdr, true))
-			{
-				/*
-				 * If we get here, previous attempts to read the buffer must
-				 * have failed ... but we shall bravely try again.
-				 */
-				*foundPtr = false;
-			}
+			*foundPtr = false;
 		}
 
 		return existing_buf_hdr;
@@ -1408,15 +1638,9 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 	LWLockRelease(newPartitionLock);
 
 	/*
-	 * Buffer contents are currently invalid.  Try to obtain the right to
-	 * start I/O.  If StartBufferIO returns false, then someone else managed
-	 * to read it before we did, so there's nothing left for BufferAlloc() to
-	 * do.
+	 * Buffer contents are currently invalid.
 	 */
-	if (StartBufferIO(victim_buf_hdr, true))
-		*foundPtr = false;
-	else
-		*foundPtr = true;
+	*foundPtr = false;
 
 	return victim_buf_hdr;
 }
@@ -1770,7 +1994,7 @@ again:
  * pessimistic, but outside of toy-sized shared_buffers it should allow
  * sufficient pins.
  */
-static void
+void
 LimitAdditionalPins(uint32 *additional_pins)
 {
 	uint32		max_backends;
@@ -2035,7 +2259,7 @@ ExtendBufferedRelShared(BufferManagerRelation bmr,
 
 				buf_state &= ~BM_VALID;
 				UnlockBufHdr(existing_hdr, buf_state);
-			} while (!StartBufferIO(existing_hdr, true));
+			} while (!StartBufferIO(existing_hdr, true, false));
 		}
 		else
 		{
@@ -2058,7 +2282,7 @@ ExtendBufferedRelShared(BufferManagerRelation bmr,
 			LWLockRelease(partition_lock);
 
 			/* XXX: could combine the locked operations in it with the above */
-			StartBufferIO(victim_buf_hdr, true);
+			StartBufferIO(victim_buf_hdr, true, false);
 		}
 	}
 
@@ -2373,7 +2597,12 @@ PinBuffer(BufferDesc *buf, BufferAccessStrategy strategy)
 	else
 	{
 		/*
-		 * If we previously pinned the buffer, it must surely be valid.
+		 * If we previously pinned the buffer, it is likely to be valid, but
+		 * it may not be if StartReadBuffers() was called and
+		 * WaitReadBuffers() hasn't been called yet.  We'll check by loading
+		 * the flags without locking.  This is racy, but it's OK to return
+		 * false spuriously: when WaitReadBuffers() calls StartBufferIO(),
+		 * it'll see that it's now valid.
 		 *
 		 * Note: We deliberately avoid a Valgrind client request here.
 		 * Individual access methods can optionally superimpose buffer page
@@ -2382,7 +2611,7 @@ PinBuffer(BufferDesc *buf, BufferAccessStrategy strategy)
 		 * that the buffer page is legitimately non-accessible here.  We
 		 * cannot meddle with that.
 		 */
-		result = true;
+		result = (pg_atomic_read_u32(&buf->state) & BM_VALID) != 0;
 	}
 
 	ref->refcount++;
@@ -3450,7 +3679,7 @@ FlushBuffer(BufferDesc *buf, SMgrRelation reln, IOObject io_object,
 	 * someone else flushed the buffer before we could, so we need not do
 	 * anything.
 	 */
-	if (!StartBufferIO(buf, false))
+	if (!StartBufferIO(buf, false, false))
 		return;
 
 	/* Setup error traceback support for ereport() */
@@ -5185,9 +5414,15 @@ WaitIO(BufferDesc *buf)
  *
  * Returns true if we successfully marked the buffer as I/O busy,
  * false if someone else already did the work.
+ *
+ * If nowait is true, then we don't wait for an I/O to be finished by another
+ * backend.  In that case, false indicates that the I/O was either already
+ * finished or still in progress.  This is useful for callers that want to
+ * find out if they can perform the I/O as part of a larger operation, without
+ * waiting for the answer or distinguishing the reasons why not.
  */
 static bool
-StartBufferIO(BufferDesc *buf, bool forInput)
+StartBufferIO(BufferDesc *buf, bool forInput, bool nowait)
 {
 	uint32		buf_state;
 
@@ -5200,6 +5435,8 @@ StartBufferIO(BufferDesc *buf, bool forInput)
 		if (!(buf_state & BM_IO_IN_PROGRESS))
 			break;
 		UnlockBufHdr(buf, buf_state);
+		if (nowait)
+			return false;
 		WaitIO(buf);
 	}
 
diff --git a/src/backend/storage/buffer/localbuf.c b/src/backend/storage/buffer/localbuf.c
index 1f02fed250e..6956d4e5b49 100644
--- a/src/backend/storage/buffer/localbuf.c
+++ b/src/backend/storage/buffer/localbuf.c
@@ -109,10 +109,9 @@ PrefetchLocalBuffer(SMgrRelation smgr, ForkNumber forkNum,
  * LocalBufferAlloc -
  *	  Find or create a local buffer for the given page of the given relation.
  *
- * API is similar to bufmgr.c's BufferAlloc, except that we do not need
- * to do any locking since this is all local.   Also, IO_IN_PROGRESS
- * does not get set.  Lastly, we support only default access strategy
- * (hence, usage_count is always advanced).
+ * API is similar to bufmgr.c's BufferAlloc, except that we do not need to do
+ * any locking since this is all local.  We support only default access
+ * strategy (hence, usage_count is always advanced).
  */
 BufferDesc *
 LocalBufferAlloc(SMgrRelation smgr, ForkNumber forkNum, BlockNumber blockNum,
@@ -288,7 +287,7 @@ GetLocalVictimBuffer(void)
 }
 
 /* see LimitAdditionalPins() */
-static void
+void
 LimitAdditionalLocalPins(uint32 *additional_pins)
 {
 	uint32		max_pins;
@@ -298,9 +297,10 @@ LimitAdditionalLocalPins(uint32 *additional_pins)
 
 	/*
 	 * In contrast to LimitAdditionalPins() other backends don't play a role
-	 * here. We can allow up to NLocBuffer pins in total.
+	 * here. We can allow up to NLocBuffer pins in total, but NLocBuffer might
+	 * not be initialized yet, so read num_temp_buffers instead.
 	 */
-	max_pins = (NLocBuffer - NLocalPinnedBuffers);
+	max_pins = (num_temp_buffers - NLocalPinnedBuffers);
 
 	if (*additional_pins >= max_pins)
 		*additional_pins = max_pins;
diff --git a/src/backend/storage/meson.build b/src/backend/storage/meson.build
index 40345bdca27..739d13293fb 100644
--- a/src/backend/storage/meson.build
+++ b/src/backend/storage/meson.build
@@ -1,5 +1,6 @@
 # Copyright (c) 2022-2024, PostgreSQL Global Development Group
 
+subdir('aio')
 subdir('buffer')
 subdir('file')
 subdir('freespace')
diff --git a/src/include/storage/bufmgr.h b/src/include/storage/bufmgr.h
index d51d46d3353..b57f71f97e3 100644
--- a/src/include/storage/bufmgr.h
+++ b/src/include/storage/bufmgr.h
@@ -14,6 +14,7 @@
 #ifndef BUFMGR_H
 #define BUFMGR_H
 
+#include "port/pg_iovec.h"
 #include "storage/block.h"
 #include "storage/buf.h"
 #include "storage/bufpage.h"
@@ -158,6 +159,11 @@ extern PGDLLIMPORT int32 *LocalRefCount;
 #define BUFFER_LOCK_SHARE		1
 #define BUFFER_LOCK_EXCLUSIVE	2
 
+/*
+ * Maximum number of buffers for multi-buffer I/O functions.  This is set to
+ * allow 128kB transfers, unless BLCKSZ and IOV_MAX imply a smaller maximum.
+ */
+#define MAX_BUFFERS_PER_TRANSFER Min(PG_IOV_MAX, (128 * 1024) / BLCKSZ)
 
 /*
  * prototypes for functions in bufmgr.c
@@ -177,6 +183,42 @@ extern Buffer ReadBufferWithoutRelcache(RelFileLocator rlocator,
 										ForkNumber forkNum, BlockNumber blockNum,
 										ReadBufferMode mode, BufferAccessStrategy strategy,
 										bool permanent);
+
+#define READ_BUFFERS_ZERO_ON_ERROR 0x01
+#define READ_BUFFERS_ISSUE_ADVICE 0x02
+
+/*
+ * Private state used by StartReadBuffers() and WaitReadBuffers().  Declared
+ * in public header only to allow inclusion in other structs, but contents
+ * should not be accessed.
+ */
+struct ReadBuffersOperation
+{
+	/* Parameters passed in to StartReadBuffers(). */
+	BufferManagerRelation bmr;
+	Buffer	   *buffers;
+	ForkNumber	forknum;
+	BlockNumber blocknum;
+	int			nblocks;
+	BufferAccessStrategy strategy;
+	int			flags;
+
+	/* Range of buffers, if we need to perform a read. */
+	int			io_buffers_len;
+};
+
+typedef struct ReadBuffersOperation ReadBuffersOperation;
+
+extern bool StartReadBuffers(BufferManagerRelation bmr,
+							 Buffer *buffers,
+							 ForkNumber forknum,
+							 BlockNumber blocknum,
+							 int *nblocks,
+							 BufferAccessStrategy strategy,
+							 int flags,
+							 ReadBuffersOperation *operation);
+extern void WaitReadBuffers(ReadBuffersOperation *operation);
+
 extern void ReleaseBuffer(Buffer buffer);
 extern void UnlockReleaseBuffer(Buffer buffer);
 extern bool BufferIsExclusiveLocked(Buffer buffer);
@@ -250,6 +292,9 @@ extern bool HoldingBufferPinThatDelaysRecovery(void);
 
 extern bool BgBufferSync(struct WritebackContext *wb_context);
 
+extern void LimitAdditionalPins(uint32 *additional_pins);
+extern void LimitAdditionalLocalPins(uint32 *additional_pins);
+
 /* in buf_init.c */
 extern void InitBufferPool(void);
 extern Size BufferShmemSize(void);
diff --git a/src/include/storage/streaming_read.h b/src/include/storage/streaming_read.h
new file mode 100644
index 00000000000..c4d3892bb26
--- /dev/null
+++ b/src/include/storage/streaming_read.h
@@ -0,0 +1,52 @@
+#ifndef STREAMING_READ_H
+#define STREAMING_READ_H
+
+#include "storage/bufmgr.h"
+#include "storage/fd.h"
+#include "storage/smgr.h"
+
+/* Default tuning, reasonable for many users. */
+#define PGSR_FLAG_DEFAULT 0x00
+
+/*
+ * I/O streams that are performing maintenance work on behalf of potentially
+ * many users.
+ */
+#define PGSR_FLAG_MAINTENANCE 0x01
+
+/*
+ * We usually avoid issuing prefetch advice automatically when sequential
+ * access is detected, but this flag explicitly disables it, for cases that
+ * might not be correctly detected.  Explicit advice is known to perform worse
+ * than letting the kernel (at least Linux) detect sequential access.
+ */
+#define PGSR_FLAG_SEQUENTIAL 0x02
+
+/*
+ * We usually ramp up from smaller reads to larger ones, to support users who
+ * don't know if it's worth reading lots of buffers yet.  This flag disables
+ * that, declaring ahead of time that we'll be reading all available buffers.
+ */
+#define PGSR_FLAG_FULL 0x04
+
+struct PgStreamingRead;
+typedef struct PgStreamingRead PgStreamingRead;
+
+/* Callback that returns the next block number to read. */
+typedef BlockNumber (*PgStreamingReadBufferCB) (PgStreamingRead *pgsr,
+												void *pgsr_private,
+												void *per_buffer_private);
+
+extern PgStreamingRead *pg_streaming_read_buffer_alloc(int flags,
+													   void *pgsr_private,
+													   size_t per_buffer_private_size,
+													   BufferAccessStrategy strategy,
+													   BufferManagerRelation bmr,
+													   ForkNumber forknum,
+													   PgStreamingReadBufferCB next_block_cb);
+
+extern void pg_streaming_read_prefetch(PgStreamingRead *pgsr);
+extern Buffer pg_streaming_read_buffer_get_next(PgStreamingRead *pgsr, void **per_buffer_private);
+extern void pg_streaming_read_free(PgStreamingRead *pgsr);
+
+#endif
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index fc8b15d0cf2..cfb58cf4836 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2097,6 +2097,8 @@ PgStat_TableCounts
 PgStat_TableStatus
 PgStat_TableXactStatus
 PgStat_WalStats
+PgStreamingRead
+PgStreamingReadRange
 PgXmlErrorContext
 PgXmlStrictness
 Pg_finfo_record
@@ -2267,6 +2269,7 @@ ReInitializeDSMForeignScan_function
 ReScanForeignScan_function
 ReadBufPtrType
 ReadBufferMode
+ReadBuffersOperation
 ReadBytePtrType
 ReadExtraTocPtrType
 ReadFunc
-- 
2.40.1

v4-0005-Sequential-scans-and-TID-range-scans-stream-reads.patchtext/x-patch; charset=US-ASCII; name=v4-0005-Sequential-scans-and-TID-range-scans-stream-reads.patchDownload
From 6091d8d7307883d76bb7cf830b69fd06ad1462b0 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 26 Feb 2024 15:41:32 -0500
Subject: [PATCH v4 5/5] Sequential scans and TID range scans stream reads

Implementing streaming read support for heap sequential scans and TID
range scans includes three parts:

Allocate the streaming read object in heap_beginscan(). On rescan, reset
the streaming read by releasing all pinned buffers and resetting the
prefetch block.

Implement a callback returning the next block to prefetch to the
streaming read API.

Invoke the streaming read API when a new page is needed and streaming
reads are enabled. When the scan direction changes, reset the streaming
read.

ci-os-only:
---
 src/backend/access/heap/heapam.c | 88 ++++++++++++++++++++++++++++----
 src/include/access/heapam.h      | 16 +++++-
 2 files changed, 93 insertions(+), 11 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index e0fe3d9c326..c4c184ac2cc 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -65,6 +65,7 @@
 #include "storage/smgr.h"
 #include "storage/spin.h"
 #include "storage/standby.h"
+#include "storage/streaming_read.h"
 #include "utils/datum.h"
 #include "utils/inval.h"
 #include "utils/lsyscache.h"
@@ -228,6 +229,29 @@ static const int MultiXactStatusLock[MaxMultiXactStatus + 1] =
  * ----------------------------------------------------------------
  */
 
+static BlockNumber
+heap_scan_pgsr_next(PgStreamingRead *pgsr, void *pgsr_private,
+					  void *per_buffer_data)
+{
+	HeapScanDesc scan = (HeapScanDesc) pgsr_private;
+
+	/*
+	 * Hard-code ScanDirection to ForwardScanDirection since only forward
+	 * scans support streaming reads.
+	 */
+	if (!scan->rs_inited)
+	{
+		scan->rs_prefetch_block = heapgettup_initial_block(scan, scan->rs_dir);
+		scan->rs_inited = true;
+	}
+	else
+		scan->rs_prefetch_block = heapgettup_advance_block(scan,
+														   scan->rs_prefetch_block,
+														   scan->rs_dir);
+
+	return scan->rs_prefetch_block;
+}
+
 /* ----------------
  *		initscan - scan code common to heap_beginscan and heap_rescan
  * ----------------
@@ -330,6 +354,13 @@ initscan(HeapScanDesc scan, ScanKey key, bool keep_startblock)
 	scan->rs_cbuf = InvalidBuffer;
 	scan->rs_cblock = InvalidBlockNumber;
 
+	/*
+	 * Initialize to ForwardScanDirection because it is most common and heap
+	 * scans usually must go forwards before going backward.
+	 */
+	scan->rs_dir = ForwardScanDirection;
+	scan->rs_prefetch_block = InvalidBlockNumber;
+
 	/* page-at-a-time fields are always invalid when not rs_inited */
 
 	/*
@@ -472,6 +503,8 @@ heapbuildvis(TableScanDesc sscan)
 static inline void
 heapfetchbuf(HeapScanDesc scan, ScanDirection dir)
 {
+	Assert(scan->rs_pgsr);
+
 	/* release previous scan buffer, if any */
 	if (BufferIsValid(scan->rs_cbuf))
 	{
@@ -486,19 +519,23 @@ heapfetchbuf(HeapScanDesc scan, ScanDirection dir)
 	 */
 	CHECK_FOR_INTERRUPTS();
 
-	if (!scan->rs_inited)
+	/*
+	 * If the scan direction is changing, reset the prefetch block to the
+	 * current block. Otherwise, we will incorrectly prefetch the blocks
+	 * between the prefetch block and the current block again before
+	 * prefetching blocks in the new, correct scan direction.
+	 */
+	if (scan->rs_dir != dir && scan->rs_pgsr)
 	{
-		scan->rs_cblock = heapgettup_initial_block(scan, dir);
-		Assert(scan->rs_cblock != InvalidBlockNumber || !BufferIsValid(scan->rs_cbuf));
-		scan->rs_inited = true;
+		scan->rs_prefetch_block = scan->rs_cblock;
+		pg_streaming_read_reset(scan->rs_pgsr);
 	}
-	else
-		scan->rs_cblock = heapgettup_advance_block(scan, scan->rs_cblock, dir);
 
-	/* read block if valid */
-	if (BlockNumberIsValid(scan->rs_cblock))
-		scan->rs_cbuf = ReadBufferExtended(scan->rs_base.rs_rd, MAIN_FORKNUM,
-										   scan->rs_cblock, RBM_NORMAL, scan->rs_strategy);
+	scan->rs_dir = dir;
+
+	scan->rs_cbuf = pg_streaming_read_buffer_get_next(scan->rs_pgsr, NULL);
+	if (BufferIsValid(scan->rs_cbuf))
+		scan->rs_cblock = BufferGetBlockNumber(scan->rs_cbuf);
 }
 
 /*
@@ -827,6 +864,7 @@ continue_page:
 
 	scan->rs_cbuf = InvalidBuffer;
 	scan->rs_cblock = InvalidBlockNumber;
+	scan->rs_prefetch_block = InvalidBlockNumber;
 	tuple->t_data = NULL;
 	scan->rs_inited = false;
 }
@@ -917,6 +955,7 @@ continue_page:
 		ReleaseBuffer(scan->rs_cbuf);
 	scan->rs_cbuf = InvalidBuffer;
 	scan->rs_cblock = InvalidBlockNumber;
+	scan->rs_prefetch_block = InvalidBlockNumber;
 	tuple->t_data = NULL;
 	scan->rs_inited = false;
 }
@@ -999,6 +1038,8 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 	else
 		scan->rs_parallelworkerdata = NULL;
 
+	scan->rs_pgsr = NULL;
+
 	/*
 	 * we do this here instead of in initscan() because heap_rescan also calls
 	 * initscan() and we don't want to allocate memory again
@@ -1010,6 +1051,22 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 
 	initscan(scan, key, false);
 
+	/*
+	 * We do not know the scan direction yet. If the scan does not end up
+	 * being a forward scan, the streaming read object will be freed.
+	 */
+	if (scan->rs_base.rs_flags & SO_TYPE_SEQSCAN ||
+		scan->rs_base.rs_flags & SO_TYPE_TIDRANGESCAN)
+	{
+		scan->rs_pgsr = pg_streaming_read_buffer_alloc(PGSR_FLAG_SEQUENTIAL,
+													   scan,
+													   0,
+													   scan->rs_strategy,
+													   BMR_REL(scan->rs_base.rs_rd),
+													   MAIN_FORKNUM,
+													   heap_scan_pgsr_next);
+	}
+
 	return (TableScanDesc) scan;
 }
 
@@ -1048,6 +1105,14 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
 	 * reinitialize scan descriptor
 	 */
 	initscan(scan, key, true);
+
+	/*
+	 * The streaming read object is reset on rescan. This must be done after
+	 * initscan(), as some state referred to by pg_streaming_read_reset() is
+	 * reset in initscan().
+	 */
+	if (scan->rs_pgsr)
+		pg_streaming_read_reset(scan->rs_pgsr);
 }
 
 void
@@ -1063,6 +1128,9 @@ heap_endscan(TableScanDesc sscan)
 	if (BufferIsValid(scan->rs_cbuf))
 		ReleaseBuffer(scan->rs_cbuf);
 
+	if (scan->rs_pgsr)
+		pg_streaming_read_free(scan->rs_pgsr);
+
 	/*
 	 * decrement relation reference count and free scan descriptor storage
 	 */
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index e2b1b2a3ad9..a476ae62a8f 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -72,9 +72,23 @@ typedef struct HeapScanDescData
 	 */
 	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
 
-	/* these fields only used in page-at-a-time mode and for bitmap scans */
+	/* only used in page-at-a-time mode and for bitmap scans */
 	int			rs_cindex;		/* current tuple's index in vistuples */
 	int			rs_ntuples;		/* number of visible tuples on page */
+
+	/*
+	 * Fields used for streaming reads by sequential scans and TID range
+	 * scans. The streaming read object is allocated at the beginning of the
+	 * scan and reset on rescan or when the scan direction changes. The scan
+	 * direction is saved each time a new page is requested. If the scan
+	 * direction changes from one page to the next, the streaming read object
+	 * releases all previously pinned buffers and resets the prefetch block.
+	 */
+	BlockNumber rs_prefetch_block;
+	ScanDirection rs_dir;
+	struct PgStreamingRead *rs_pgsr;
+
+	/* only used in page-at-a-time mode and for bitmap scans */
 	OffsetNumber rs_vistuples[MaxHeapTuplesPerPage];	/* their offsets */
 }			HeapScanDescData;
 typedef struct HeapScanDescData *HeapScanDesc;
-- 
2.40.1

#10Melanie Plageman
melanieplageman@gmail.com
In reply to: Melanie Plageman (#9)
5 attachment(s)
Re: Streaming read-ready sequential scan code

On Sat, Mar 02, 2024 at 06:07:48PM -0500, Melanie Plageman wrote:

On Wed, Feb 28, 2024 at 12:30 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

On Mon, Feb 26, 2024 at 03:56:57PM -0500, Melanie Plageman wrote:

On Mon, Feb 19, 2024 at 6:05 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

On Mon, Jan 29, 2024 at 4:17 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

There is an outstanding question about where to allocate the
PgStreamingRead object for sequential scans

I've written three alternative implementations of the actual streaming
read user for sequential scan which handle the question of where to
allocate the streaming read object and how to handle changing scan
direction in different ways.

Option A) https://github.com/melanieplageman/postgres/tree/seqscan_pgsr_initscan_allocation
- Allocates the streaming read object in initscan(). Since we do not
know the scan direction at this time, if the scan ends up not being a
forwards scan, the streaming read object must later be freed -- so
this will sometimes allocate a streaming read object it never uses.
- Only supports ForwardScanDirection and once the scan direction
changes, streaming is never supported again -- even if we return to
ForwardScanDirection
- Must maintain a "fallback" codepath that does not use the streaming read API

Attached is a version of this patch which implements a "reset"
function for the streaming read API which should be cheaper than the
full pg_streaming_read_free() on rescan. This can easily be ported to
work on any of my proposed implementations (A/B/C). I implemented it
on A as an example.

Attached is the latest version of this patchset -- rebased in light of
Thomas' updates to the streaming read API [1]. I have chosen the
approach I think we should go with. It is a hybrid of my previously
proposed approaches.
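
To make the lifecycle of the chosen approach easier to follow, here is a
rough sketch of how the scan code drives the streaming read object. It is
distilled from the attached patches rather than quoted verbatim, and it
omits error handling and the non-streaming fallback path:

    /* heap_beginscan(): allocate the stream once per scan */
    scan->rs_pgsr = pg_streaming_read_buffer_alloc(PGSR_FLAG_SEQUENTIAL,
                                                   scan,    /* pgsr_private */
                                                   0,       /* no per-buffer data */
                                                   scan->rs_strategy,
                                                   BMR_REL(scan->rs_base.rs_rd),
                                                   MAIN_FORKNUM,
                                                   heap_scan_pgsr_next);

    /* heapfetchbuf(): consume one pinned buffer per call */
    scan->rs_cbuf = pg_streaming_read_buffer_get_next(scan->rs_pgsr, NULL);
    if (BufferIsValid(scan->rs_cbuf))
        scan->rs_cblock = BufferGetBlockNumber(scan->rs_cbuf);

    /*
     * heap_rescan() or a scan direction change: keep the object, but drop
     * its pins and prefetch position.
     */
    pg_streaming_read_reset(scan->rs_pgsr);

    /* heap_endscan(): release any remaining pins and free the object */
    pg_streaming_read_free(scan->rs_pgsr);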

While investigating some performance concerns, Andres pointed out that
the members I add to HeapScanDescData in this patch push rs_cindex and
rs_ntuples to another cacheline and introduce a 4-byte hole. Attached
v4's HeapScanDescData is as well-packed as on master and its members
are reordered so that rs_cindex and rs_ntuples are back on the second
cacheline of the struct's data.
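
As a sanity check on that layout (my own illustration, not part of the
patch), a throwaway compile-time assertion along these lines could be
added to a scratch build to confirm that rs_cindex and rs_ntuples still
end within the first two 64-byte cache lines:

    #include "postgres.h"
    #include "access/heapam.h"

    /*
     * Assumes 64-byte cache lines; rs_ntuples follows rs_cindex, so it is
     * enough to check that rs_ntuples ends within the first 128 bytes.
     */
    StaticAssertDecl(offsetof(HeapScanDescData, rs_ntuples) + sizeof(int) <= 128,
                     "rs_ntuples pushed past the second cache line");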

I did some additional profiling and realized that dropping the
unlikely() hints from the frequently executed rs_inited checks was
negatively impacting performance. v5 adds them back and also makes a
few other very minor cleanups.
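
For reference, this is the hint in question, as it appears (lightly
abridged) in heapfetchbuf() in the attached 0002; unlikely() tells the
compiler that the one-time initialization path is cold:

    if (unlikely(!scan->rs_inited))
    {
        /* cold path: choose the starting block exactly once per scan */
        scan->rs_cblock = heapgettup_initial_block(scan, dir);
        scan->rs_inited = true;
    }
    else
        scan->rs_cblock = heapgettup_advance_block(scan, scan->rs_cblock, dir);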

Note that this patch set includes a not-yet-released version of Thomas
Munro's Streaming Read API with new ramp-up logic that seems to fix a
performance issue I saw with my test case when all of the sequential
scan's blocks are in shared buffers. Once he sends the official new
version, I will rebase this and point to his explanation in that thread.
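
The core of that ramp-up, roughly as it appears in the attached 0003 (the
variable need_io below is just a stand-in for "StartReadBuffers() had to
start a read"), is a per-stream look-ahead distance that grows quickly
when real I/O happens and decays slowly while everything is already in
shared buffers:

    if (need_io)
    {
        /* grow the look-ahead window rapidly, up to the pin limit */
        pgsr->distance = Min(pgsr->distance * 2, pgsr->max_pinned_buffers);
    }
    else if (pgsr->distance > MAX_BUFFERS_PER_TRANSFER)
    {
        /* all cached: shrink slowly, but stay big enough for a full-sized read */
        pgsr->distance--;
    }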

- Melanie

Attachments:

v5-0001-Split-heapgetpage-into-two-parts.patchtext/x-diff; charset=us-asciiDownload
From 29827c23a11061846a7c145f430aa4712c1c30f3 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 27 Jan 2024 18:39:37 -0500
Subject: [PATCH v5 1/5] Split heapgetpage into two parts

heapgetpage(), a per-block utility function used in heap scans, read a
passed-in block into a buffer and then, if page-at-a-time processing was
enabled, pruned the page and built an array of its visible tuples. This
was used for sequential and sample scans.

Future commits will add support for streaming reads. The streaming read
API will read in the blocks specified by a callback, but any significant
per-page processing should be done synchronously on the buffer yielded
by the streaming read API. To support this, separate the logic in
heapgetpage() to read a block into a buffer from that which prunes the
page and builds an array of the visible tuples. The former is now
heapfetchbuf() and the latter is now heapbuildvis().

Future commits will push the logic for selecting the next block into
heapfetchbuf() in cases when streaming reads are not supported (such as
backwards sequential scans). Because this logic differs for sample scans
and sequential scans, inline the code to read the block into a buffer
for sample scans.

This has the added benefit of allowing for a bit of refactoring in
heapam_scan_sample_next_block(), including unpinning the previous buffer
before invoking the callback to select the next block.
---
 src/backend/access/heap/heapam.c         | 74 ++++++++++++++----------
 src/backend/access/heap/heapam_handler.c | 40 +++++++++----
 src/include/access/heapam.h              |  2 +-
 3 files changed, 72 insertions(+), 44 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 34bc60f625f..aef90d28473 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -363,17 +363,18 @@ heap_setscanlimits(TableScanDesc sscan, BlockNumber startBlk, BlockNumber numBlk
 }
 
 /*
- * heapgetpage - subroutine for heapgettup()
+ * heapbuildvis - Utility function for heap scans.
  *
- * This routine reads and pins the specified page of the relation.
- * In page-at-a-time mode it performs additional work, namely determining
- * which tuples on the page are visible.
+ * Given a page residing in a buffer saved in the scan descriptor, prune the
+ * page and determine which of its tuples are all visible, saving their offsets
+ * in an array in the scan descriptor.
  */
 void
-heapgetpage(TableScanDesc sscan, BlockNumber block)
+heapbuildvis(TableScanDesc sscan)
 {
 	HeapScanDesc scan = (HeapScanDesc) sscan;
-	Buffer		buffer;
+	Buffer		buffer = scan->rs_cbuf;
+	BlockNumber block = scan->rs_cblock;
 	Snapshot	snapshot;
 	Page		page;
 	int			lines;
@@ -381,31 +382,8 @@ heapgetpage(TableScanDesc sscan, BlockNumber block)
 	OffsetNumber lineoff;
 	bool		all_visible;
 
-	Assert(block < scan->rs_nblocks);
+	Assert(BufferGetBlockNumber(buffer) == block);
 
-	/* release previous scan buffer, if any */
-	if (BufferIsValid(scan->rs_cbuf))
-	{
-		ReleaseBuffer(scan->rs_cbuf);
-		scan->rs_cbuf = InvalidBuffer;
-	}
-
-	/*
-	 * Be sure to check for interrupts at least once per page.  Checks at
-	 * higher code levels won't be able to stop a seqscan that encounters many
-	 * pages' worth of consecutive dead tuples.
-	 */
-	CHECK_FOR_INTERRUPTS();
-
-	/* read page using selected strategy */
-	scan->rs_cbuf = ReadBufferExtended(scan->rs_base.rs_rd, MAIN_FORKNUM, block,
-									   RBM_NORMAL, scan->rs_strategy);
-	scan->rs_cblock = block;
-
-	if (!(scan->rs_base.rs_flags & SO_ALLOW_PAGEMODE))
-		return;
-
-	buffer = scan->rs_cbuf;
 	snapshot = scan->rs_base.rs_snapshot;
 
 	/*
@@ -478,6 +456,37 @@ heapgetpage(TableScanDesc sscan, BlockNumber block)
 	scan->rs_ntuples = ntup;
 }
 
+/*
+ * heapfetchbuf - subroutine for heapgettup()
+ *
+ * This routine reads the specified block of the relation into a buffer and
+ * returns with that pinned buffer saved in the scan descriptor.
+ */
+static inline void
+heapfetchbuf(HeapScanDesc scan, BlockNumber block)
+{
+	Assert(block < scan->rs_nblocks);
+
+	/* release previous scan buffer, if any */
+	if (BufferIsValid(scan->rs_cbuf))
+	{
+		ReleaseBuffer(scan->rs_cbuf);
+		scan->rs_cbuf = InvalidBuffer;
+	}
+
+	/*
+	 * Be sure to check for interrupts at least once per page.  Checks at
+	 * higher code levels won't be able to stop a seqscan that encounters many
+	 * pages' worth of consecutive dead tuples.
+	 */
+	CHECK_FOR_INTERRUPTS();
+
+	/* read page using selected strategy */
+	scan->rs_cbuf = ReadBufferExtended(scan->rs_base.rs_rd, MAIN_FORKNUM, block,
+									   RBM_NORMAL, scan->rs_strategy);
+	scan->rs_cblock = block;
+}
+
 /*
  * heapgettup_initial_block - return the first BlockNumber to scan
  *
@@ -751,7 +760,7 @@ heapgettup(HeapScanDesc scan,
 	 */
 	while (block != InvalidBlockNumber)
 	{
-		heapgetpage((TableScanDesc) scan, block);
+		heapfetchbuf(scan, block);
 		LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
 		page = heapgettup_start_page(scan, dir, &linesleft, &lineoff);
 continue_page:
@@ -872,7 +881,8 @@ heapgettup_pagemode(HeapScanDesc scan,
 	 */
 	while (block != InvalidBlockNumber)
 	{
-		heapgetpage((TableScanDesc) scan, block);
+		heapfetchbuf(scan, block);
+		heapbuildvis((TableScanDesc) scan);
 		page = BufferGetPage(scan->rs_cbuf);
 		linesleft = scan->rs_ntuples;
 		lineindex = ScanDirectionIsForward(dir) ? 0 : linesleft - 1;
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 680a50bf8b1..8dc137995f0 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2276,11 +2276,14 @@ heapam_scan_sample_next_block(TableScanDesc scan, SampleScanState *scanstate)
 	if (hscan->rs_nblocks == 0)
 		return false;
 
-	if (tsm->NextSampleBlock)
+	if (BufferIsValid(hscan->rs_cbuf))
 	{
-		blockno = tsm->NextSampleBlock(scanstate, hscan->rs_nblocks);
-		hscan->rs_cblock = blockno;
+		ReleaseBuffer(hscan->rs_cbuf);
+		hscan->rs_cbuf = InvalidBuffer;
 	}
+
+	if (tsm->NextSampleBlock)
+		blockno = tsm->NextSampleBlock(scanstate, hscan->rs_nblocks);
 	else
 	{
 		/* scanning table sequentially */
@@ -2322,20 +2325,35 @@ heapam_scan_sample_next_block(TableScanDesc scan, SampleScanState *scanstate)
 		}
 	}
 
-	if (!BlockNumberIsValid(blockno))
+	hscan->rs_cblock = blockno;
+
+	if (!BlockNumberIsValid(hscan->rs_cblock))
 	{
-		if (BufferIsValid(hscan->rs_cbuf))
-			ReleaseBuffer(hscan->rs_cbuf);
-		hscan->rs_cbuf = InvalidBuffer;
-		hscan->rs_cblock = InvalidBlockNumber;
 		hscan->rs_inited = false;
-
 		return false;
 	}
 
-	heapgetpage(scan, blockno);
-	hscan->rs_inited = true;
+	Assert(hscan->rs_cblock < hscan->rs_nblocks);
+
+	/*
+	 * We may scan multiple pages before finding tuples to yield or finishing
+	 * the scan. Since we want to check for interrupts at least once per page,
+	 * do so here.
+	 */
+	CHECK_FOR_INTERRUPTS();
+
+	/* Read page using selected strategy */
+	hscan->rs_cbuf = ReadBufferExtended(hscan->rs_base.rs_rd, MAIN_FORKNUM,
+										hscan->rs_cblock, RBM_NORMAL, hscan->rs_strategy);
 
+	/*
+	 * If pagemode is allowed, prune the page and build an array of visible
+	 * tuple offsets.
+	 */
+	if (hscan->rs_base.rs_flags & SO_ALLOW_PAGEMODE)
+		heapbuildvis(scan);
+
+	hscan->rs_inited = true;
 	return true;
 }
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 4b133f68593..e2b1b2a3ad9 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -246,7 +246,7 @@ extern TableScanDesc heap_beginscan(Relation relation, Snapshot snapshot,
 									uint32 flags);
 extern void heap_setscanlimits(TableScanDesc sscan, BlockNumber startBlk,
 							   BlockNumber numBlks);
-extern void heapgetpage(TableScanDesc sscan, BlockNumber block);
+extern void heapbuildvis(TableScanDesc sscan);
 extern void heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
 						bool allow_strat, bool allow_sync, bool allow_pagemode);
 extern void heap_endscan(TableScanDesc sscan);
-- 
2.40.1

v5-0002-Replace-blocks-with-buffers-in-heapgettup-control.patchtext/x-diff; charset=us-asciiDownload
From a2ead3ce07f88023a573a8be86f21b6e7ed13047 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 29 Jan 2024 11:50:01 -0500
Subject: [PATCH v5 2/5] Replace blocks with buffers in heapgettup control flow

Future commits will introduce the streaming read API and the sequential
scan streaming read API user. Streaming read API users implement a
callback which returns the next block to read. Sequential scans
previously looped through the blocks in the relation, synchronously
reading in a block and then processing it. An InvalidBlockNumber
returned by heapgettup_advance_block() meant that the relation was
exhausted and all blocks had been processed.

The streaming read API may exhaust the blocks in a relation (having read
all of them into buffers) before they have all been processed by the
sequential scan. As such, the sequential scan should continue processing
blocks until heapfetchbuf() returns InvalidBuffer.

Note that this commit does not implement the streaming read API user. It
simply restructures heapgettup() and heapgettup_pagemode() to use
buffers instead of blocks for control flow.

Not all sequential scans will support streaming reads. As such, this
code will remain for compatibility even after sequential scans support
streaming reads.
---
 src/backend/access/heap/heapam.c | 79 ++++++++++++++------------------
 1 file changed, 35 insertions(+), 44 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index aef90d28473..adde61fca60 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -83,6 +83,9 @@ static Bitmapset *HeapDetermineColumnsInfo(Relation relation,
 static bool heap_acquire_tuplock(Relation relation, ItemPointer tid,
 								 LockTupleMode mode, LockWaitPolicy wait_policy,
 								 bool *have_tuple_lock);
+static inline BlockNumber heapgettup_advance_block(HeapScanDesc scan,
+												   BlockNumber block, ScanDirection dir);
+static inline BlockNumber heapgettup_initial_block(HeapScanDesc scan, ScanDirection dir);
 static void compute_new_xmax_infomask(TransactionId xmax, uint16 old_infomask,
 									  uint16 old_infomask2, TransactionId add_to_xmax,
 									  LockTupleMode mode, bool is_update,
@@ -459,14 +462,12 @@ heapbuildvis(TableScanDesc sscan)
 /*
  * heapfetchbuf - subroutine for heapgettup()
  *
- * This routine reads the specified block of the relation into a buffer and
- * returns with that pinned buffer saved in the scan descriptor.
+ * This routine reads the next block of the relation into a buffer and returns
+ * with that pinned buffer saved in the scan descriptor.
  */
 static inline void
-heapfetchbuf(HeapScanDesc scan, BlockNumber block)
+heapfetchbuf(HeapScanDesc scan, ScanDirection dir)
 {
-	Assert(block < scan->rs_nblocks);
-
 	/* release previous scan buffer, if any */
 	if (BufferIsValid(scan->rs_cbuf))
 	{
@@ -481,10 +482,19 @@ heapfetchbuf(HeapScanDesc scan, BlockNumber block)
 	 */
 	CHECK_FOR_INTERRUPTS();
 
-	/* read page using selected strategy */
-	scan->rs_cbuf = ReadBufferExtended(scan->rs_base.rs_rd, MAIN_FORKNUM, block,
-									   RBM_NORMAL, scan->rs_strategy);
-	scan->rs_cblock = block;
+	if (unlikely(!scan->rs_inited))
+	{
+		scan->rs_cblock = heapgettup_initial_block(scan, dir);
+		Assert(scan->rs_cblock != InvalidBlockNumber || !BufferIsValid(scan->rs_cbuf));
+		scan->rs_inited = true;
+	}
+	else
+		scan->rs_cblock = heapgettup_advance_block(scan, scan->rs_cblock, dir);
+
+	/* read block if valid */
+	if (BlockNumberIsValid(scan->rs_cblock))
+		scan->rs_cbuf = ReadBufferExtended(scan->rs_base.rs_rd, MAIN_FORKNUM,
+										   scan->rs_cblock, RBM_NORMAL, scan->rs_strategy);
 }
 
 /*
@@ -494,7 +504,7 @@ heapfetchbuf(HeapScanDesc scan, BlockNumber block)
  * occur with empty tables and in parallel scans when parallel workers get all
  * of the pages before we can get a chance to get our first page.
  */
-static BlockNumber
+BlockNumber
 heapgettup_initial_block(HeapScanDesc scan, ScanDirection dir)
 {
 	Assert(!scan->rs_inited);
@@ -634,7 +644,7 @@ heapgettup_continue_page(HeapScanDesc scan, ScanDirection dir, int *linesleft,
  * This also adjusts rs_numblocks when a limit has been imposed by
  * heap_setscanlimits().
  */
-static inline BlockNumber
+BlockNumber
 heapgettup_advance_block(HeapScanDesc scan, BlockNumber block, ScanDirection dir)
 {
 	if (ScanDirectionIsForward(dir))
@@ -732,23 +742,13 @@ heapgettup(HeapScanDesc scan,
 		   ScanKey key)
 {
 	HeapTuple	tuple = &(scan->rs_ctup);
-	BlockNumber block;
 	Page		page;
 	OffsetNumber lineoff;
 	int			linesleft;
 
-	if (unlikely(!scan->rs_inited))
-	{
-		block = heapgettup_initial_block(scan, dir);
-		/* ensure rs_cbuf is invalid when we get InvalidBlockNumber */
-		Assert(block != InvalidBlockNumber || !BufferIsValid(scan->rs_cbuf));
-		scan->rs_inited = true;
-	}
-	else
+	if (likely(scan->rs_inited))
 	{
 		/* continue from previously returned page/tuple */
-		block = scan->rs_cblock;
-
 		LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
 		page = heapgettup_continue_page(scan, dir, &linesleft, &lineoff);
 		goto continue_page;
@@ -758,9 +758,12 @@ heapgettup(HeapScanDesc scan,
 	 * advance the scan until we find a qualifying tuple or run out of stuff
 	 * to scan
 	 */
-	while (block != InvalidBlockNumber)
+	while (true)
 	{
-		heapfetchbuf(scan, block);
+		heapfetchbuf(scan, dir);
+		if (!BufferIsValid(scan->rs_cbuf))
+			break;
+		Assert(BufferGetBlockNumber(scan->rs_cbuf) == scan->rs_cblock);
 		LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
 		page = heapgettup_start_page(scan, dir, &linesleft, &lineoff);
 continue_page:
@@ -782,7 +785,7 @@ continue_page:
 
 			tuple->t_data = (HeapTupleHeader) PageGetItem(page, lpp);
 			tuple->t_len = ItemIdGetLength(lpp);
-			ItemPointerSet(&(tuple->t_self), block, lineoff);
+			ItemPointerSet(&(tuple->t_self), scan->rs_cblock, lineoff);
 
 			visible = HeapTupleSatisfiesVisibility(tuple,
 												   scan->rs_base.rs_snapshot,
@@ -812,9 +815,6 @@ continue_page:
 		 * it's time to move to the next.
 		 */
 		LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
-
-		/* get the BlockNumber to scan next */
-		block = heapgettup_advance_block(scan, block, dir);
 	}
 
 	/* end of scan */
@@ -847,22 +847,13 @@ heapgettup_pagemode(HeapScanDesc scan,
 					ScanKey key)
 {
 	HeapTuple	tuple = &(scan->rs_ctup);
-	BlockNumber block;
 	Page		page;
 	int			lineindex;
 	int			linesleft;
 
-	if (unlikely(!scan->rs_inited))
-	{
-		block = heapgettup_initial_block(scan, dir);
-		/* ensure rs_cbuf is invalid when we get InvalidBlockNumber */
-		Assert(block != InvalidBlockNumber || !BufferIsValid(scan->rs_cbuf));
-		scan->rs_inited = true;
-	}
-	else
+	if (likely(scan->rs_inited))
 	{
 		/* continue from previously returned page/tuple */
-		block = scan->rs_cblock;	/* current page */
 		page = BufferGetPage(scan->rs_cbuf);
 
 		lineindex = scan->rs_cindex + dir;
@@ -879,9 +870,12 @@ heapgettup_pagemode(HeapScanDesc scan,
 	 * advance the scan until we find a qualifying tuple or run out of stuff
 	 * to scan
 	 */
-	while (block != InvalidBlockNumber)
+	while (true)
 	{
-		heapfetchbuf(scan, block);
+		heapfetchbuf(scan, dir);
+		if (!BufferIsValid(scan->rs_cbuf))
+			break;
+		Assert(BufferGetBlockNumber(scan->rs_cbuf) == scan->rs_cblock);
 		heapbuildvis((TableScanDesc) scan);
 		page = BufferGetPage(scan->rs_cbuf);
 		linesleft = scan->rs_ntuples;
@@ -901,7 +895,7 @@ continue_page:
 
 			tuple->t_data = (HeapTupleHeader) PageGetItem(page, lpp);
 			tuple->t_len = ItemIdGetLength(lpp);
-			ItemPointerSet(&(tuple->t_self), block, lineoff);
+			ItemPointerSet(&(tuple->t_self), scan->rs_cblock, lineoff);
 
 			/* skip any tuples that don't match the scan key */
 			if (key != NULL &&
@@ -912,9 +906,6 @@ continue_page:
 			scan->rs_cindex = lineindex;
 			return;
 		}
-
-		/* get the BlockNumber to scan next */
-		block = heapgettup_advance_block(scan, block, dir);
 	}
 
 	/* end of scan */
-- 
2.40.1

v5-0003-fixup-Streaming-Read-API.patchtext/x-diff; charset=us-asciiDownload
From 3bf107c6bac885ce65253850abb357eedbe94ad8 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Mon, 26 Feb 2024 23:48:31 +1300
Subject: [PATCH v5 3/5] fixup: Streaming Read API

---
 contrib/pg_prewarm/pg_prewarm.c          |  40 +-
 src/backend/storage/Makefile             |   2 +-
 src/backend/storage/aio/Makefile         |  14 +
 src/backend/storage/aio/meson.build      |   5 +
 src/backend/storage/aio/streaming_read.c | 611 +++++++++++++++++++++
 src/backend/storage/buffer/bufmgr.c      | 641 ++++++++++++++++-------
 src/backend/storage/buffer/localbuf.c    |  14 +-
 src/backend/storage/meson.build          |   1 +
 src/include/storage/bufmgr.h             |  45 ++
 src/include/storage/streaming_read.h     |  52 ++
 src/tools/pgindent/typedefs.list         |   3 +
 11 files changed, 1217 insertions(+), 211 deletions(-)
 create mode 100644 src/backend/storage/aio/Makefile
 create mode 100644 src/backend/storage/aio/meson.build
 create mode 100644 src/backend/storage/aio/streaming_read.c
 create mode 100644 src/include/storage/streaming_read.h

diff --git a/contrib/pg_prewarm/pg_prewarm.c b/contrib/pg_prewarm/pg_prewarm.c
index 8541e4d6e46..1cc84bcb0c2 100644
--- a/contrib/pg_prewarm/pg_prewarm.c
+++ b/contrib/pg_prewarm/pg_prewarm.c
@@ -20,6 +20,7 @@
 #include "miscadmin.h"
 #include "storage/bufmgr.h"
 #include "storage/smgr.h"
+#include "storage/streaming_read.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
@@ -38,6 +39,25 @@ typedef enum
 
 static PGIOAlignedBlock blockbuffer;
 
+struct pg_prewarm_streaming_read_private
+{
+	BlockNumber blocknum;
+	int64		last_block;
+};
+
+static BlockNumber
+pg_prewarm_streaming_read_next(PgStreamingRead *pgsr,
+							   void *pgsr_private,
+							   void *per_buffer_data)
+{
+	struct pg_prewarm_streaming_read_private *p = pgsr_private;
+
+	if (p->blocknum <= p->last_block)
+		return p->blocknum++;
+
+	return InvalidBlockNumber;
+}
+
 /*
  * pg_prewarm(regclass, mode text, fork text,
  *			  first_block int8, last_block int8)
@@ -183,18 +203,36 @@ pg_prewarm(PG_FUNCTION_ARGS)
 	}
 	else if (ptype == PREWARM_BUFFER)
 	{
+		struct pg_prewarm_streaming_read_private p;
+		PgStreamingRead *pgsr;
+
 		/*
 		 * In buffer mode, we actually pull the data into shared_buffers.
 		 */
+
+		/* Set up the private state for our streaming buffer read callback. */
+		p.blocknum = first_block;
+		p.last_block = last_block;
+
+		pgsr = pg_streaming_read_buffer_alloc(PGSR_FLAG_FULL,
+											  &p,
+											  0,
+											  NULL,
+											  BMR_REL(rel),
+											  forkNumber,
+											  pg_prewarm_streaming_read_next);
+
 		for (block = first_block; block <= last_block; ++block)
 		{
 			Buffer		buf;
 
 			CHECK_FOR_INTERRUPTS();
-			buf = ReadBufferExtended(rel, forkNumber, block, RBM_NORMAL, NULL);
+			buf = pg_streaming_read_buffer_get_next(pgsr, NULL);
 			ReleaseBuffer(buf);
 			++blocks_done;
 		}
+		Assert(pg_streaming_read_buffer_get_next(pgsr, NULL) == InvalidBuffer);
+		pg_streaming_read_free(pgsr);
 	}
 
 	/* Close relation, release lock. */
diff --git a/src/backend/storage/Makefile b/src/backend/storage/Makefile
index 8376cdfca20..eec03f6f2b4 100644
--- a/src/backend/storage/Makefile
+++ b/src/backend/storage/Makefile
@@ -8,6 +8,6 @@ subdir = src/backend/storage
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
-SUBDIRS     = buffer file freespace ipc large_object lmgr page smgr sync
+SUBDIRS     = aio buffer file freespace ipc large_object lmgr page smgr sync
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/storage/aio/Makefile b/src/backend/storage/aio/Makefile
new file mode 100644
index 00000000000..bcab44c802f
--- /dev/null
+++ b/src/backend/storage/aio/Makefile
@@ -0,0 +1,14 @@
+#
+# Makefile for storage/aio
+#
+# src/backend/storage/aio/Makefile
+#
+
+subdir = src/backend/storage/aio
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = \
+	streaming_read.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/storage/aio/meson.build b/src/backend/storage/aio/meson.build
new file mode 100644
index 00000000000..39aef2a84a2
--- /dev/null
+++ b/src/backend/storage/aio/meson.build
@@ -0,0 +1,5 @@
+# Copyright (c) 2024, PostgreSQL Global Development Group
+
+backend_sources += files(
+  'streaming_read.c',
+)
diff --git a/src/backend/storage/aio/streaming_read.c b/src/backend/storage/aio/streaming_read.c
new file mode 100644
index 00000000000..e437abebe4c
--- /dev/null
+++ b/src/backend/storage/aio/streaming_read.c
@@ -0,0 +1,611 @@
+#include "postgres.h"
+
+#include "storage/streaming_read.h"
+#include "utils/rel.h"
+
+/*
+ * Element type for PgStreamingRead's circular array of block ranges.
+ */
+typedef struct PgStreamingReadRange
+{
+	bool		need_wait;
+	bool		advice_issued;
+	BlockNumber blocknum;
+	int			nblocks;
+	int			per_buffer_data_index;
+	Buffer		buffers[MAX_BUFFERS_PER_TRANSFER];
+	ReadBuffersOperation operation;
+} PgStreamingReadRange;
+
+/*
+ * Streaming read object.
+ */
+struct PgStreamingRead
+{
+	int			max_ios;
+	int			ios_in_progress;
+	int			max_pinned_buffers;
+	int			pinned_buffers;
+	int			pinned_buffers_trigger;
+	int			next_tail_buffer;
+	int			distance;
+	bool		finished;
+	bool		advice_enabled;
+	void	   *pgsr_private;
+	PgStreamingReadBufferCB callback;
+
+	BufferAccessStrategy strategy;
+	BufferManagerRelation bmr;
+	ForkNumber	forknum;
+
+	/* Sometimes we need to buffer one block for flow control. */
+	BlockNumber unget_blocknum;
+	void	   *unget_per_buffer_data;
+
+	/* Next expected block, for detecting sequential access. */
+	BlockNumber seq_blocknum;
+
+	/* Space for optional per-buffer private data. */
+	size_t		per_buffer_data_size;
+	void	   *per_buffer_data;
+
+	/* Circular buffer of ranges. */
+	int			size;
+	int			head;
+	int			tail;
+	PgStreamingReadRange ranges[FLEXIBLE_ARRAY_MEMBER];
+};
+
+static PgStreamingRead *
+pg_streaming_read_buffer_alloc_internal(int flags,
+										void *pgsr_private,
+										size_t per_buffer_data_size,
+										BufferAccessStrategy strategy)
+{
+	PgStreamingRead *pgsr;
+	int			size;
+	int			max_ios;
+	uint32		max_pinned_buffers;
+
+
+	/*
+	 * Decide how many assumed I/Os we will allow to run concurrently.  That
+	 * is, advice to the kernel to tell it that we will soon read.  This
+	 * number also affects how far we look ahead for opportunities to start
+	 * more I/Os.
+	 */
+	if (flags & PGSR_FLAG_MAINTENANCE)
+		max_ios = maintenance_io_concurrency;
+	else
+		max_ios = effective_io_concurrency;
+
+	/*
+	 * The desired level of I/O concurrency controls how far ahead we are
+	 * willing to look ahead.  We also clamp it to at least
+	 * MAX_BUFFERS_PER_TRANSFER so that we can have a chance to build up a full
+	 * sized read, even when max_ios is zero.
+	 */
+	max_pinned_buffers = Max(max_ios * 4, MAX_BUFFERS_PER_TRANSFER);
+
+	/*
+	 * The *_io_concurrency GUCs might be set to 0, but we want to allow at
+	 * least one, to keep our gating logic simple.
+	 */
+	max_ios = Max(max_ios, 1);
+
+	/*
+	 * Don't allow this backend to pin too many buffers.  For now we'll apply
+	 * the limit for the shared buffer pool and the local buffer pool, without
+	 * worrying which it is.
+	 */
+	LimitAdditionalPins(&max_pinned_buffers);
+	LimitAdditionalLocalPins(&max_pinned_buffers);
+	Assert(max_pinned_buffers > 0);
+
+	/*
+	 * pgsr->ranges is a circular buffer.  When it is empty, head == tail.
+	 * When it is full, there is an empty element between head and tail.  Head
+	 * can also be empty (nblocks == 0), therefore we need two extra elements
+	 * for non-occupied ranges, on top of max_pinned_buffers to allow for the
+	 * maximum possible number of occupied ranges of the smallest possible
+	 * size of one.
+	 */
+	size = max_pinned_buffers + 2;
+
+	pgsr = (PgStreamingRead *)
+		palloc0(offsetof(PgStreamingRead, ranges) +
+				sizeof(pgsr->ranges[0]) * size);
+
+	pgsr->max_ios = max_ios;
+	pgsr->per_buffer_data_size = per_buffer_data_size;
+	pgsr->max_pinned_buffers = max_pinned_buffers;
+	pgsr->pgsr_private = pgsr_private;
+	pgsr->strategy = strategy;
+	pgsr->size = size;
+
+	pgsr->unget_blocknum = InvalidBlockNumber;
+
+#ifdef USE_PREFETCH
+
+	/*
+	 * This system supports prefetching advice.  As long as direct I/O isn't
+	 * enabled, and the caller hasn't promised sequential access, we can use
+	 * it.
+	 */
+	if ((io_direct_flags & IO_DIRECT_DATA) == 0 &&
+		(flags & PGSR_FLAG_SEQUENTIAL) == 0)
+		pgsr->advice_enabled = true;
+#endif
+
+	/*
+	 * Skip the initial ramp-up phase if the caller says we're going to be
+	 * reading the whole relation.  This way we start out doing full-sized
+	 * reads.
+	 */
+	if (flags & PGSR_FLAG_FULL)
+		pgsr->distance = Min(MAX_BUFFERS_PER_TRANSFER, pgsr->max_pinned_buffers);
+	else
+		pgsr->distance = 1;
+
+	/*
+	 * We want to avoid creating ranges that are smaller than they could be
+	 * just because we hit max_pinned_buffers.  We only look ahead when the
+	 * number of pinned buffers falls below this trigger number, or put
+	 * another way, we stop looking ahead when we wouldn't be able to build a
+	 * "full sized" range.
+	 */
+	pgsr->pinned_buffers_trigger =
+		Max(1, (int) max_pinned_buffers - MAX_BUFFERS_PER_TRANSFER);
+
+	/* Space for the callback to store extra data along with each block. */
+	if (per_buffer_data_size)
+		pgsr->per_buffer_data = palloc(per_buffer_data_size * max_pinned_buffers);
+
+	return pgsr;
+}
+
+/*
+ * Create a new streaming read object that can be used to perform the
+ * equivalent of a series of ReadBuffer() calls for one fork of one relation.
+ * Internally, it generates larger vectored reads where possible by looking
+ * ahead.
+ */
+PgStreamingRead *
+pg_streaming_read_buffer_alloc(int flags,
+							   void *pgsr_private,
+							   size_t per_buffer_data_size,
+							   BufferAccessStrategy strategy,
+							   BufferManagerRelation bmr,
+							   ForkNumber forknum,
+							   PgStreamingReadBufferCB next_block_cb)
+{
+	PgStreamingRead *result;
+
+	result = pg_streaming_read_buffer_alloc_internal(flags,
+													 pgsr_private,
+													 per_buffer_data_size,
+													 strategy);
+	result->callback = next_block_cb;
+	result->bmr = bmr;
+	result->forknum = forknum;
+
+	return result;
+}
+
+/*
+ * Find the per-buffer data index for the Nth block of a range.
+ */
+static int
+get_per_buffer_data_index(PgStreamingRead *pgsr, PgStreamingReadRange *range, int n)
+{
+	int			result;
+
+	/*
+	 * Find slot in the circular buffer of per-buffer data, without using the
+	 * expensive % operator.
+	 */
+	result = range->per_buffer_data_index + n;
+	if (result >= pgsr->max_pinned_buffers)
+		result -= pgsr->max_pinned_buffers;
+	Assert(result == (range->per_buffer_data_index + n) % pgsr->max_pinned_buffers);
+
+	return result;
+}
+
+/*
+ * Return a pointer to the per-buffer data by index.
+ */
+static void *
+get_per_buffer_data_by_index(PgStreamingRead *pgsr, int per_buffer_data_index)
+{
+	return (char *) pgsr->per_buffer_data +
+		pgsr->per_buffer_data_size * per_buffer_data_index;
+}
+
+/*
+ * Return a pointer to the per-buffer data for the Nth block of a range.
+ */
+static void *
+get_per_buffer_data(PgStreamingRead *pgsr, PgStreamingReadRange *range, int n)
+{
+	return get_per_buffer_data_by_index(pgsr,
+										get_per_buffer_data_index(pgsr,
+																  range,
+																  n));
+}
+
+/*
+ * Start reading the head range, and create a new head range.  The new head
+ * range is returned.  It may not be empty, if StartReadBuffers() couldn't
+ * start the entire range; in that case the returned range contains the
+ * remaining portion of the range.
+ */
+static PgStreamingReadRange *
+pg_streaming_read_start_head_range(PgStreamingRead *pgsr)
+{
+	PgStreamingReadRange *head_range;
+	PgStreamingReadRange *new_head_range;
+	int			nblocks_pinned;
+	int			flags;
+
+	/* Caller should make sure we never exceed max_ios. */
+	Assert(pgsr->ios_in_progress < pgsr->max_ios);
+
+	/* Should only call if the head range has some blocks to read. */
+	head_range = &pgsr->ranges[pgsr->head];
+	Assert(head_range->nblocks > 0);
+
+	/*
+	 * If advice hasn't been suppressed, this system supports it, and this
+	 * isn't a strictly sequential pattern, then we'll issue advice.
+	 */
+	if (pgsr->advice_enabled && head_range->blocknum != pgsr->seq_blocknum)
+		flags = READ_BUFFERS_ISSUE_ADVICE;
+	else
+		flags = 0;
+
+
+	/* Start reading as many blocks as we can from the head range. */
+	nblocks_pinned = head_range->nblocks;
+	head_range->need_wait =
+		StartReadBuffers(pgsr->bmr,
+						 head_range->buffers,
+						 pgsr->forknum,
+						 head_range->blocknum,
+						 &nblocks_pinned,
+						 pgsr->strategy,
+						 flags,
+						 &head_range->operation);
+
+	if (head_range->need_wait)
+	{
+		/*
+		 * I/O necessary.  Look-ahead distance increases rapidly until it hits
+		 * the pin limit.
+		 */
+		if (pgsr->distance < pgsr->max_pinned_buffers)
+		{
+			int		distance;
+
+			distance = pgsr->distance * 2;
+			distance = Min(distance, pgsr->max_pinned_buffers);
+			pgsr->distance = distance;
+		}
+
+		/* Count an I/O in progress until we've "waited". */
+		if (flags & READ_BUFFERS_ISSUE_ADVICE)
+		{
+			head_range->advice_issued = true;
+			pgsr->ios_in_progress++;
+			Assert(pgsr->ios_in_progress <= pgsr->max_ios);
+		}
+	}
+	else
+	{
+		/*
+		 * No I/O necessary. Look-ahead distance decays slowly, but stays high
+		 * enough to form a full sized I/O.
+		 */
+		if (pgsr->distance > MAX_BUFFERS_PER_TRANSFER)
+			pgsr->distance--;
+	}
+
+	/*
+	 * StartReadBuffers() might have pinned fewer blocks than we asked it to,
+	 * but always at least one.
+	 */
+	Assert(nblocks_pinned <= head_range->nblocks);
+	Assert(nblocks_pinned >= 1);
+	pgsr->pinned_buffers += nblocks_pinned;
+
+	/*
+	 * Remember where the next block would be after that, so we can detect
+	 * sequential access next time.
+	 */
+	pgsr->seq_blocknum = head_range->blocknum + nblocks_pinned;
+
+	/*
+	 * Create a new head range.  There must be space, because we have enough
+	 * elements for every range to hold just one block, up to the pin limit.
+	 */
+	Assert(pgsr->size > pgsr->max_pinned_buffers);
+	Assert((pgsr->head + 1) % pgsr->size != pgsr->tail);
+	if (++pgsr->head == pgsr->size)
+		pgsr->head = 0;
+	new_head_range = &pgsr->ranges[pgsr->head];
+	new_head_range->nblocks = 0;
+	new_head_range->advice_issued = false;
+
+	/*
+	 * If we didn't manage to start the whole read above, we split the range,
+	 * moving the remainder into the new head range.
+	 */
+	if (nblocks_pinned < head_range->nblocks)
+	{
+		int			nblocks_remaining = head_range->nblocks - nblocks_pinned;
+
+		head_range->nblocks = nblocks_pinned;
+
+		new_head_range->blocknum = head_range->blocknum + nblocks_pinned;
+		new_head_range->nblocks = nblocks_remaining;
+	}
+
+	/* The new range has per-buffer data starting after the previous range. */
+	new_head_range->per_buffer_data_index =
+		get_per_buffer_data_index(pgsr, head_range, nblocks_pinned);
+
+	return new_head_range;
+}
+
+/*
+ * Ask the callback which block it would like us to read next, with a small
+ * buffer in front to allow pg_streaming_unget_block() to work.
+ */
+static BlockNumber
+pg_streaming_get_block(PgStreamingRead *pgsr, void *per_buffer_data)
+{
+	BlockNumber result;
+
+	if (unlikely(pgsr->unget_blocknum != InvalidBlockNumber))
+	{
+		/*
+		 * If we had to unget a block, now it is time to return that one
+		 * again.
+		 */
+		result = pgsr->unget_blocknum;
+		pgsr->unget_blocknum = InvalidBlockNumber;
+
+		/*
+		 * The same per_buffer_data element must have been used, and still
+		 * contains whatever data the callback wrote into it.  So we just
+		 * sanity-check that we were called with the value that
+		 * pg_streaming_unget_block() pushed back.
+		 */
+		Assert(per_buffer_data == pgsr->unget_per_buffer_data);
+	}
+	else
+	{
+		/* Use the installed callback directly. */
+		result = pgsr->callback(pgsr, pgsr->pgsr_private, per_buffer_data);
+	}
+
+	return result;
+}
+
+/*
+ * In order to deal with short reads in StartReadBuffers(), we sometimes need
+ * to defer handling of a block until later.  This *must* be called with the
+ * last value returned by pg_streaming_get_block().
+ */
+static void
+pg_streaming_unget_block(PgStreamingRead *pgsr, BlockNumber blocknum, void *per_buffer_data)
+{
+	Assert(pgsr->unget_blocknum == InvalidBlockNumber);
+	pgsr->unget_blocknum = blocknum;
+	pgsr->unget_per_buffer_data = per_buffer_data;
+}
+
+static void
+pg_streaming_read_look_ahead(PgStreamingRead *pgsr)
+{
+	PgStreamingReadRange *range;
+
+	/*
+	 * If we're finished or can't start more I/O, then don't look ahead.
+	 */
+	if (pgsr->finished || pgsr->ios_in_progress == pgsr->max_ios)
+		return;
+
+	/*
+	 * We'll also wait until the number of pinned buffers falls below our
+	 * trigger level, so that we have the chance to create a full range.
+	 */
+	if (pgsr->pinned_buffers >= pgsr->pinned_buffers_trigger)
+		return;
+
+	do
+	{
+		BlockNumber blocknum;
+		void	   *per_buffer_data;
+
+		/* Do we have a full-sized range? */
+		range = &pgsr->ranges[pgsr->head];
+		if (range->nblocks == lengthof(range->buffers))
+		{
+			/* Start as much of it as we can. */
+			range = pg_streaming_read_start_head_range(pgsr);
+
+			/* If we're now at the I/O limit, stop here. */
+			if (pgsr->ios_in_progress == pgsr->max_ios)
+				return;
+
+			/*
+			 * If we couldn't form a full range, then stop here to avoid
+			 * creating small I/O.
+			 */
+			if (pgsr->pinned_buffers >= pgsr->pinned_buffers_trigger)
+				return;
+
+			/*
+			 * That might have only been partially started, but always
+			 * processes at least one so that'll do for now.
+			 */
+			Assert(range->nblocks < lengthof(range->buffers));
+		}
+
+		/* Find per-buffer data slot for the next block. */
+		per_buffer_data = get_per_buffer_data(pgsr, range, range->nblocks);
+
+		/* Find out which block the callback wants to read next. */
+		blocknum = pg_streaming_get_block(pgsr, per_buffer_data);
+		if (blocknum == InvalidBlockNumber)
+		{
+			/* End of stream. */
+			pgsr->finished = true;
+			break;
+		}
+
+		/*
+		 * Is there a head range that we cannot extend, because the requested
+		 * block is not consecutive?
+		 */
+		if (range->nblocks > 0 &&
+			range->blocknum + range->nblocks != blocknum)
+		{
+			/* Yes.  Start it, so we can begin building a new one. */
+			range = pg_streaming_read_start_head_range(pgsr);
+
+			/*
+			 * It's possible that it was only partially started, and we have a
+			 * new range with the remainder.  Keep starting I/Os until we get
+			 * it all out of the way, or we hit the I/O limit.
+			 */
+			while (range->nblocks > 0 && pgsr->ios_in_progress < pgsr->max_ios)
+				range = pg_streaming_read_start_head_range(pgsr);
+
+			/*
+			 * We have to 'unget' the block returned by the callback if we
+			 * don't have enough I/O capacity left to start something.
+			 */
+			if (pgsr->ios_in_progress == pgsr->max_ios)
+			{
+				pg_streaming_unget_block(pgsr, blocknum, per_buffer_data);
+				return;
+			}
+		}
+
+		/* If we have a new, empty range, initialize the start block. */
+		if (range->nblocks == 0)
+		{
+			range->blocknum = blocknum;
+		}
+
+		/* This block extends the range by one. */
+		Assert(range->blocknum + range->nblocks == blocknum);
+		range->nblocks++;
+
+	} while (pgsr->pinned_buffers + range->nblocks < pgsr->distance);
+
+	/* Start as much as we can. */
+	while (range->nblocks > 0)
+	{
+		range = pg_streaming_read_start_head_range(pgsr);
+		if (pgsr->ios_in_progress == pgsr->max_ios)
+			break;
+	}
+}
+
+Buffer
+pg_streaming_read_buffer_get_next(PgStreamingRead *pgsr, void **per_buffer_data)
+{
+	pg_streaming_read_look_ahead(pgsr);
+
+	/* See if we have one buffer to return. */
+	while (pgsr->tail != pgsr->head)
+	{
+		PgStreamingReadRange *tail_range;
+
+		tail_range = &pgsr->ranges[pgsr->tail];
+
+		/*
+		 * Do we need to perform an I/O before returning the buffers from this
+		 * range?
+		 */
+		if (tail_range->need_wait)
+		{
+			WaitReadBuffers(&tail_range->operation);
+			tail_range->need_wait = false;
+
+			/*
+			 * We don't really know if the kernel generated a physical I/O
+			 * when we issued advice, let alone when it finished, but it has
+			 * certainly finished now because we've performed the read.
+			 */
+			if (tail_range->advice_issued)
+			{
+				Assert(pgsr->ios_in_progress > 0);
+				pgsr->ios_in_progress--;
+			}
+		}
+
+		/* Are there more buffers available in this range? */
+		if (pgsr->next_tail_buffer < tail_range->nblocks)
+		{
+			int			buffer_index;
+			Buffer		buffer;
+
+			buffer_index = pgsr->next_tail_buffer++;
+			buffer = tail_range->buffers[buffer_index];
+
+			Assert(BufferIsValid(buffer));
+
+			/* We are giving away ownership of this pinned buffer. */
+			Assert(pgsr->pinned_buffers > 0);
+			pgsr->pinned_buffers--;
+
+			if (per_buffer_data)
+				*per_buffer_data = get_per_buffer_data(pgsr, tail_range, buffer_index);
+
+			return buffer;
+		}
+
+		/* Advance tail to next range, if there is one. */
+		if (++pgsr->tail == pgsr->size)
+			pgsr->tail = 0;
+		pgsr->next_tail_buffer = 0;
+
+		/*
+		 * If tail crashed into head, and head is not empty, then it is time
+		 * to start that range.
+		 */
+		if (pgsr->tail == pgsr->head &&
+			pgsr->ranges[pgsr->head].nblocks > 0)
+			pg_streaming_read_start_head_range(pgsr);
+	}
+
+	Assert(pgsr->pinned_buffers == 0);
+
+	return InvalidBuffer;
+}
+
+void
+pg_streaming_read_free(PgStreamingRead *pgsr)
+{
+	Buffer		buffer;
+
+	/* Stop looking ahead. */
+	pgsr->finished = true;
+
+	/* Unpin anything that wasn't consumed. */
+	while ((buffer = pg_streaming_read_buffer_get_next(pgsr, NULL)) != InvalidBuffer)
+		ReleaseBuffer(buffer);
+
+	Assert(pgsr->pinned_buffers == 0);
+	Assert(pgsr->ios_in_progress == 0);
+
+	/* Release memory. */
+	if (pgsr->per_buffer_data)
+		pfree(pgsr->per_buffer_data);
+
+	pfree(pgsr);
+}
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index f0f8d4259c5..729d1f91721 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -19,6 +19,11 @@
  *		and pin it so that no one can destroy it while this process
  *		is using it.
  *
+ * StartReadBuffers() -- as above, but for multiple contiguous blocks in
+ *		two steps.
+ *
+ * WaitReadBuffers() -- second step of StartReadBuffers().
+ *
  * ReleaseBuffer() -- unpin a buffer
  *
  * MarkBufferDirty() -- mark a pinned buffer's contents as "dirty".
@@ -471,10 +476,9 @@ ForgetPrivateRefCountEntry(PrivateRefCountEntry *ref)
 )
 
 
-static Buffer ReadBuffer_common(SMgrRelation smgr, char relpersistence,
+static Buffer ReadBuffer_common(BufferManagerRelation bmr,
 								ForkNumber forkNum, BlockNumber blockNum,
-								ReadBufferMode mode, BufferAccessStrategy strategy,
-								bool *hit);
+								ReadBufferMode mode, BufferAccessStrategy strategy);
 static BlockNumber ExtendBufferedRelCommon(BufferManagerRelation bmr,
 										   ForkNumber fork,
 										   BufferAccessStrategy strategy,
@@ -500,7 +504,7 @@ static uint32 WaitBufHdrUnlocked(BufferDesc *buf);
 static int	SyncOneBuffer(int buf_id, bool skip_recently_used,
 						  WritebackContext *wb_context);
 static void WaitIO(BufferDesc *buf);
-static bool StartBufferIO(BufferDesc *buf, bool forInput);
+static bool StartBufferIO(BufferDesc *buf, bool forInput, bool nowait);
 static void TerminateBufferIO(BufferDesc *buf, bool clear_dirty,
 							  uint32 set_flag_bits, bool forget_owner);
 static void AbortBufferIO(Buffer buffer);
@@ -781,7 +785,6 @@ Buffer
 ReadBufferExtended(Relation reln, ForkNumber forkNum, BlockNumber blockNum,
 				   ReadBufferMode mode, BufferAccessStrategy strategy)
 {
-	bool		hit;
 	Buffer		buf;
 
 	/*
@@ -794,15 +797,9 @@ ReadBufferExtended(Relation reln, ForkNumber forkNum, BlockNumber blockNum,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("cannot access temporary tables of other sessions")));
 
-	/*
-	 * Read the buffer, and update pgstat counters to reflect a cache hit or
-	 * miss.
-	 */
-	pgstat_count_buffer_read(reln);
-	buf = ReadBuffer_common(RelationGetSmgr(reln), reln->rd_rel->relpersistence,
-							forkNum, blockNum, mode, strategy, &hit);
-	if (hit)
-		pgstat_count_buffer_hit(reln);
+	buf = ReadBuffer_common(BMR_REL(reln),
+							forkNum, blockNum, mode, strategy);
+
 	return buf;
 }
 
@@ -822,13 +819,12 @@ ReadBufferWithoutRelcache(RelFileLocator rlocator, ForkNumber forkNum,
 						  BlockNumber blockNum, ReadBufferMode mode,
 						  BufferAccessStrategy strategy, bool permanent)
 {
-	bool		hit;
-
 	SMgrRelation smgr = smgropen(rlocator, INVALID_PROC_NUMBER);
 
-	return ReadBuffer_common(smgr, permanent ? RELPERSISTENCE_PERMANENT :
-							 RELPERSISTENCE_UNLOGGED, forkNum, blockNum,
-							 mode, strategy, &hit);
+	return ReadBuffer_common(BMR_SMGR(smgr, permanent ? RELPERSISTENCE_PERMANENT :
+									  RELPERSISTENCE_UNLOGGED),
+							 forkNum, blockNum,
+							 mode, strategy);
 }
 
 /*
@@ -994,35 +990,68 @@ ExtendBufferedRelTo(BufferManagerRelation bmr,
 	 */
 	if (buffer == InvalidBuffer)
 	{
-		bool		hit;
-
 		Assert(extended_by == 0);
-		buffer = ReadBuffer_common(bmr.smgr, bmr.relpersistence,
-								   fork, extend_to - 1, mode, strategy,
-								   &hit);
+		buffer = ReadBuffer_common(bmr, fork, extend_to - 1, mode, strategy);
 	}
 
 	return buffer;
 }
 
+/*
+ * Zero a buffer and lock it, as part of the implementation of
+ * RBM_ZERO_AND_LOCK or RBM_ZERO_AND_CLEANUP_LOCK.  The buffer must already be
+ * pinned.  It does not have to be valid, but it is valid and locked on
+ * return.
+ */
+static void
+ZeroBuffer(Buffer buffer, ReadBufferMode mode)
+{
+	BufferDesc *bufHdr;
+	uint32		buf_state;
+
+	Assert(mode == RBM_ZERO_AND_LOCK || mode == RBM_ZERO_AND_CLEANUP_LOCK);
+
+	if (BufferIsLocal(buffer))
+		bufHdr = GetLocalBufferDescriptor(-buffer - 1);
+	else
+	{
+		bufHdr = GetBufferDescriptor(buffer - 1);
+		if (mode == RBM_ZERO_AND_LOCK)
+			LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
+		else
+			LockBufferForCleanup(buffer);
+	}
+
+	memset(BufferGetPage(buffer), 0, BLCKSZ);
+
+	if (BufferIsLocal(buffer))
+	{
+		buf_state = pg_atomic_read_u32(&bufHdr->state);
+		buf_state |= BM_VALID;
+		pg_atomic_unlocked_write_u32(&bufHdr->state, buf_state);
+	}
+	else
+	{
+		buf_state = LockBufHdr(bufHdr);
+		buf_state |= BM_VALID;
+		UnlockBufHdr(bufHdr, buf_state);
+	}
+}
+
 /*
  * ReadBuffer_common -- common logic for all ReadBuffer variants
  *
  * *hit is set to true if the request was satisfied from shared buffer cache.
  */
 static Buffer
-ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
+ReadBuffer_common(BufferManagerRelation bmr, ForkNumber forkNum,
 				  BlockNumber blockNum, ReadBufferMode mode,
-				  BufferAccessStrategy strategy, bool *hit)
+				  BufferAccessStrategy strategy)
 {
-	BufferDesc *bufHdr;
-	Block		bufBlock;
-	bool		found;
-	IOContext	io_context;
-	IOObject	io_object;
-	bool		isLocalBuf = SmgrIsTemp(smgr);
-
-	*hit = false;
+	ReadBuffersOperation operation;
+	Buffer		buffer;
+	int			nblocks;
+	int			flags;
 
 	/*
 	 * Backward compatibility path, most code should use ExtendBufferedRel()
@@ -1041,181 +1070,404 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 		if (mode == RBM_ZERO_AND_LOCK || mode == RBM_ZERO_AND_CLEANUP_LOCK)
 			flags |= EB_LOCK_FIRST;
 
-		return ExtendBufferedRel(BMR_SMGR(smgr, relpersistence),
-								 forkNum, strategy, flags);
+		return ExtendBufferedRel(bmr, forkNum, strategy, flags);
 	}
 
-	TRACE_POSTGRESQL_BUFFER_READ_START(forkNum, blockNum,
-									   smgr->smgr_rlocator.locator.spcOid,
-									   smgr->smgr_rlocator.locator.dbOid,
-									   smgr->smgr_rlocator.locator.relNumber,
-									   smgr->smgr_rlocator.backend);
+	nblocks = 1;
+	if (mode == RBM_ZERO_ON_ERROR)
+		flags = READ_BUFFERS_ZERO_ON_ERROR;
+	else
+		flags = 0;
+	if (StartReadBuffers(bmr,
+						 &buffer,
+						 forkNum,
+						 blockNum,
+						 &nblocks,
+						 strategy,
+						 flags,
+						 &operation))
+		WaitReadBuffers(&operation);
+	Assert(nblocks == 1);		/* single block can't be short */
+
+	if (mode == RBM_ZERO_AND_CLEANUP_LOCK || mode == RBM_ZERO_AND_LOCK)
+		ZeroBuffer(buffer, mode);
+
+	return buffer;
+}
+
+static Buffer
+PrepareReadBuffer(BufferManagerRelation bmr,
+				  ForkNumber forkNum,
+				  BlockNumber blockNum,
+				  BufferAccessStrategy strategy,
+				  bool *foundPtr)
+{
+	BufferDesc *bufHdr;
+	bool		isLocalBuf;
+	IOContext	io_context;
+	IOObject	io_object;
+
+	Assert(blockNum != P_NEW);
 
+	Assert(bmr.smgr);
+
+	isLocalBuf = SmgrIsTemp(bmr.smgr);
 	if (isLocalBuf)
 	{
-		/*
-		 * We do not use a BufferAccessStrategy for I/O of temporary tables.
-		 * However, in some cases, the "strategy" may not be NULL, so we can't
-		 * rely on IOContextForStrategy() to set the right IOContext for us.
-		 * This may happen in cases like CREATE TEMPORARY TABLE AS...
-		 */
 		io_context = IOCONTEXT_NORMAL;
 		io_object = IOOBJECT_TEMP_RELATION;
-		bufHdr = LocalBufferAlloc(smgr, forkNum, blockNum, &found);
-		if (found)
-			pgBufferUsage.local_blks_hit++;
-		else if (mode == RBM_NORMAL || mode == RBM_NORMAL_NO_LOG ||
-				 mode == RBM_ZERO_ON_ERROR)
-			pgBufferUsage.local_blks_read++;
 	}
 	else
 	{
-		/*
-		 * lookup the buffer.  IO_IN_PROGRESS is set if the requested block is
-		 * not currently in memory.
-		 */
 		io_context = IOContextForStrategy(strategy);
 		io_object = IOOBJECT_RELATION;
-		bufHdr = BufferAlloc(smgr, relpersistence, forkNum, blockNum,
-							 strategy, &found, io_context);
-		if (found)
-			pgBufferUsage.shared_blks_hit++;
-		else if (mode == RBM_NORMAL || mode == RBM_NORMAL_NO_LOG ||
-				 mode == RBM_ZERO_ON_ERROR)
-			pgBufferUsage.shared_blks_read++;
 	}
 
-	/* At this point we do NOT hold any locks. */
+	TRACE_POSTGRESQL_BUFFER_READ_START(forkNum, blockNum,
+									   bmr.smgr->smgr_rlocator.locator.spcOid,
+									   bmr.smgr->smgr_rlocator.locator.dbOid,
+									   bmr.smgr->smgr_rlocator.locator.relNumber,
+									   bmr.smgr->smgr_rlocator.backend);
 
-	/* if it was already in the buffer pool, we're done */
-	if (found)
+	ResourceOwnerEnlarge(CurrentResourceOwner);
+	if (isLocalBuf)
+	{
+		bufHdr = LocalBufferAlloc(bmr.smgr, forkNum, blockNum, foundPtr);
+		if (*foundPtr)
+			pgBufferUsage.local_blks_hit++;
+	}
+	else
+	{
+		bufHdr = BufferAlloc(bmr.smgr, bmr.relpersistence, forkNum, blockNum,
+							 strategy, foundPtr, io_context);
+		if (*foundPtr)
+			pgBufferUsage.shared_blks_hit++;
+	}
+	if (bmr.rel)
+	{
+		/*
+		 * While pgBufferUsage's "read" counter isn't bumped unless we reach
+		 * WaitReadBuffers() (so, not for hits, and not for buffers that are
+		 * zeroed instead), the per-relation stats always count them.
+		 */
+		pgstat_count_buffer_read(bmr.rel);
+		if (*foundPtr)
+			pgstat_count_buffer_hit(bmr.rel);
+	}
+	if (*foundPtr)
 	{
-		/* Just need to update stats before we exit */
-		*hit = true;
 		VacuumPageHit++;
 		pgstat_count_io_op(io_object, io_context, IOOP_HIT);
-
 		if (VacuumCostActive)
 			VacuumCostBalance += VacuumCostPageHit;
 
 		TRACE_POSTGRESQL_BUFFER_READ_DONE(forkNum, blockNum,
-										  smgr->smgr_rlocator.locator.spcOid,
-										  smgr->smgr_rlocator.locator.dbOid,
-										  smgr->smgr_rlocator.locator.relNumber,
-										  smgr->smgr_rlocator.backend,
-										  found);
+										  bmr.smgr->smgr_rlocator.locator.spcOid,
+										  bmr.smgr->smgr_rlocator.locator.dbOid,
+										  bmr.smgr->smgr_rlocator.locator.relNumber,
+										  bmr.smgr->smgr_rlocator.backend,
+										  true);
+	}
 
-		/*
-		 * In RBM_ZERO_AND_LOCK mode the caller expects the page to be locked
-		 * on return.
-		 */
-		if (!isLocalBuf)
-		{
-			if (mode == RBM_ZERO_AND_LOCK)
-				LWLockAcquire(BufferDescriptorGetContentLock(bufHdr),
-							  LW_EXCLUSIVE);
-			else if (mode == RBM_ZERO_AND_CLEANUP_LOCK)
-				LockBufferForCleanup(BufferDescriptorGetBuffer(bufHdr));
-		}
+	return BufferDescriptorGetBuffer(bufHdr);
+}
 
-		return BufferDescriptorGetBuffer(bufHdr);
+/*
+ * Begin reading a range of blocks beginning at blockNum and extending for
+ * *nblocks.  On return, up to *nblocks pinned buffers holding those blocks
+ * are written into the buffers array, and *nblocks is updated to contain the
+ * actual number, which may be fewer than requested.
+ *
+ * If false is returned, no I/O is necessary and WaitReadBuffers() is not
+ * necessary.  If true is returned, one I/O has been started, and
+ * WaitReadBuffers() must be called with the same operation object before the
+ * buffers are accessed.  Along with the operation object, the caller-supplied
+ * array of buffers must remain valid until WaitReadBuffers() is called.
+ *
+ * Currently the I/O is only started with optional operating system advice,
+ * and the real I/O happens in WaitReadBuffers().  In future work, true I/O
+ * could be initiated here.
+ */
+bool
+StartReadBuffers(BufferManagerRelation bmr,
+				 Buffer *buffers,
+				 ForkNumber forkNum,
+				 BlockNumber blockNum,
+				 int *nblocks,
+				 BufferAccessStrategy strategy,
+				 int flags,
+				 ReadBuffersOperation *operation)
+{
+	int			actual_nblocks = *nblocks;
+
+	if (bmr.rel)
+	{
+		bmr.smgr = RelationGetSmgr(bmr.rel);
+		bmr.relpersistence = bmr.rel->rd_rel->relpersistence;
 	}
 
-	/*
-	 * if we have gotten to this point, we have allocated a buffer for the
-	 * page but its contents are not yet valid.  IO_IN_PROGRESS is set for it,
-	 * if it's a shared buffer.
-	 */
-	Assert(!(pg_atomic_read_u32(&bufHdr->state) & BM_VALID));	/* spinlock not needed */
+	operation->bmr = bmr;
+	operation->forknum = forkNum;
+	operation->blocknum = blockNum;
+	operation->buffers = buffers;
+	operation->nblocks = actual_nblocks;
+	operation->strategy = strategy;
+	operation->flags = flags;
 
-	bufBlock = isLocalBuf ? LocalBufHdrGetBlock(bufHdr) : BufHdrGetBlock(bufHdr);
+	operation->io_buffers_len = 0;
 
-	/*
-	 * Read in the page, unless the caller intends to overwrite it and just
-	 * wants us to allocate a buffer.
-	 */
-	if (mode == RBM_ZERO_AND_LOCK || mode == RBM_ZERO_AND_CLEANUP_LOCK)
-		MemSet((char *) bufBlock, 0, BLCKSZ);
-	else
+	for (int i = 0; i < actual_nblocks; ++i)
 	{
-		instr_time	io_start = pgstat_prepare_io_time(track_io_timing);
+		bool		found;
 
-		smgrread(smgr, forkNum, blockNum, bufBlock);
+		buffers[i] = PrepareReadBuffer(bmr,
+									   forkNum,
+									   blockNum + i,
+									   strategy,
+									   &found);
 
-		pgstat_count_io_op_time(io_object, io_context,
-								IOOP_READ, io_start, 1);
+		if (found)
+		{
+			/*
+			 * Terminate the read as soon as we get a hit.  It could be a
+			 * single buffer hit, or it could be a hit that follows a readable
+			 * range.  We don't want to create more than one readable range,
+			 * so we stop here.
+			 */
+			actual_nblocks = operation->nblocks = *nblocks = i + 1;
+		}
+		else
+		{
+			/* Extend the readable range to cover this block. */
+			operation->io_buffers_len++;
+		}
+	}
 
-		/* check for garbage data */
-		if (!PageIsVerifiedExtended((Page) bufBlock, blockNum,
-									PIV_LOG_WARNING | PIV_REPORT_STAT))
+	if (operation->io_buffers_len > 0)
+	{
+		if (flags & READ_BUFFERS_ISSUE_ADVICE)
 		{
-			if (mode == RBM_ZERO_ON_ERROR || zero_damaged_pages)
-			{
-				ereport(WARNING,
-						(errcode(ERRCODE_DATA_CORRUPTED),
-						 errmsg("invalid page in block %u of relation %s; zeroing out page",
-								blockNum,
-								relpath(smgr->smgr_rlocator, forkNum))));
-				MemSet((char *) bufBlock, 0, BLCKSZ);
-			}
-			else
-				ereport(ERROR,
-						(errcode(ERRCODE_DATA_CORRUPTED),
-						 errmsg("invalid page in block %u of relation %s",
-								blockNum,
-								relpath(smgr->smgr_rlocator, forkNum))));
+			/*
+			 * In theory we should only do this if PrepareReadBuffers() had to
+			 * allocate new buffers above.  That way, if two calls to
+			 * StartReadBuffers() were made for the same blocks before
+			 * WaitReadBuffers(), only the first would issue the advice.
+			 * That'd be a better simulation of true asynchronous I/O, which
+			 * would only start the I/O once, but isn't done here for
+			 * simplicity.  Note also that the following call might actually
+			 * issue two advice calls if we cross a segment boundary; in a
+			 * true asynchronous version we might choose to process only one
+			 * real I/O at a time in that case.
+			 */
+			smgrprefetch(bmr.smgr, forkNum, blockNum, operation->io_buffers_len);
 		}
+
+		/* Indicate that WaitReadBuffers() should be called. */
+		return true;
 	}
+	else
+	{
+		return false;
+	}
+}
 
-	/*
-	 * In RBM_ZERO_AND_LOCK / RBM_ZERO_AND_CLEANUP_LOCK mode, grab the buffer
-	 * content lock before marking the page as valid, to make sure that no
-	 * other backend sees the zeroed page before the caller has had a chance
-	 * to initialize it.
-	 *
-	 * Since no-one else can be looking at the page contents yet, there is no
-	 * difference between an exclusive lock and a cleanup-strength lock. (Note
-	 * that we cannot use LockBuffer() or LockBufferForCleanup() here, because
-	 * they assert that the buffer is already valid.)
-	 */
-	if ((mode == RBM_ZERO_AND_LOCK || mode == RBM_ZERO_AND_CLEANUP_LOCK) &&
-		!isLocalBuf)
+static inline bool
+WaitReadBuffersCanStartIO(Buffer buffer, bool nowait)
+{
+	if (BufferIsLocal(buffer))
 	{
-		LWLockAcquire(BufferDescriptorGetContentLock(bufHdr), LW_EXCLUSIVE);
+		BufferDesc *bufHdr = GetLocalBufferDescriptor(-buffer - 1);
+
+		return (pg_atomic_read_u32(&bufHdr->state) & BM_VALID) == 0;
 	}
+	else
+		return StartBufferIO(GetBufferDescriptor(buffer - 1), true, nowait);
+}
+
+void
+WaitReadBuffers(ReadBuffersOperation *operation)
+{
+	BufferManagerRelation bmr;
+	Buffer	   *buffers;
+	int			nblocks;
+	BlockNumber blocknum;
+	ForkNumber	forknum;
+	bool		isLocalBuf;
+	IOContext	io_context;
+	IOObject	io_object;
+
+	/*
+	 * Currently operations are only allowed to include a read of some range,
+	 * with an optional extra buffer that is already pinned at the end.  So
+	 * nblocks can be at most one more than io_buffers_len.
+	 */
+	Assert((operation->nblocks == operation->io_buffers_len) ||
+		   (operation->nblocks == operation->io_buffers_len + 1));
 
+	/* Find the range of the physical read we need to perform. */
+	nblocks = operation->io_buffers_len;
+	if (nblocks == 0)
+		return;					/* nothing to do */
+
+	buffers = &operation->buffers[0];
+	blocknum = operation->blocknum;
+	forknum = operation->forknum;
+	bmr = operation->bmr;
+
+	isLocalBuf = SmgrIsTemp(bmr.smgr);
 	if (isLocalBuf)
 	{
-		/* Only need to adjust flags */
-		uint32		buf_state = pg_atomic_read_u32(&bufHdr->state);
-
-		buf_state |= BM_VALID;
-		pg_atomic_unlocked_write_u32(&bufHdr->state, buf_state);
+		io_context = IOCONTEXT_NORMAL;
+		io_object = IOOBJECT_TEMP_RELATION;
 	}
 	else
 	{
-		/* Set BM_VALID, terminate IO, and wake up any waiters */
-		TerminateBufferIO(bufHdr, false, BM_VALID, true);
+		io_context = IOContextForStrategy(operation->strategy);
+		io_object = IOOBJECT_RELATION;
 	}
 
-	VacuumPageMiss++;
-	if (VacuumCostActive)
-		VacuumCostBalance += VacuumCostPageMiss;
+	/*
+	 * We count all these blocks as read by this backend.  This is traditional
+	 * behavior, but might turn out to be not true if we find that someone
+	 * else has beaten us and completed the read of some of these blocks.  In
+	 * that case the system globally double-counts, but we traditionally don't
+	 * count this as a "hit", and we don't have a separate counter for "miss,
+	 * but another backend completed the read".
+	 */
+	if (isLocalBuf)
+		pgBufferUsage.local_blks_read += nblocks;
+	else
+		pgBufferUsage.shared_blks_read += nblocks;
 
-	TRACE_POSTGRESQL_BUFFER_READ_DONE(forkNum, blockNum,
-									  smgr->smgr_rlocator.locator.spcOid,
-									  smgr->smgr_rlocator.locator.dbOid,
-									  smgr->smgr_rlocator.locator.relNumber,
-									  smgr->smgr_rlocator.backend,
-									  found);
+	for (int i = 0; i < nblocks; ++i)
+	{
+		int			io_buffers_len;
+		Buffer		io_buffers[MAX_BUFFERS_PER_TRANSFER];
+		void	   *io_pages[MAX_BUFFERS_PER_TRANSFER];
+		instr_time	io_start;
+		BlockNumber io_first_block;
 
-	return BufferDescriptorGetBuffer(bufHdr);
+		/*
+		 * Skip this block if someone else has already completed it.  If an
+		 * I/O is already in progress in another backend, this will wait for
+		 * the outcome: either done, or something went wrong and we will
+		 * retry.
+		 */
+		if (!WaitReadBuffersCanStartIO(buffers[i], false))
+		{
+			/*
+			 * Report this as a 'hit' for this backend, even though it must
+			 * have started out as a miss in PrepareReadBuffer().
+			 */
+			TRACE_POSTGRESQL_BUFFER_READ_DONE(forknum, blocknum + i,
+											  bmr.smgr->smgr_rlocator.locator.spcOid,
+											  bmr.smgr->smgr_rlocator.locator.dbOid,
+											  bmr.smgr->smgr_rlocator.locator.relNumber,
+											  bmr.smgr->smgr_rlocator.backend,
+											  true);
+			continue;
+		}
+
+		/* We found a buffer that we need to read in. */
+		io_buffers[0] = buffers[i];
+		io_pages[0] = BufferGetBlock(buffers[i]);
+		io_first_block = blocknum + i;
+		io_buffers_len = 1;
+
+		/*
+		 * How many neighboring-on-disk blocks can we scatter-read into
+		 * other buffers at the same time?  In this case we don't wait if we
+		 * see an I/O already in progress.  We already hold BM_IO_IN_PROGRESS
+		 * for the head block, so we should get on with that I/O as soon as
+		 * possible.  We'll come back to this block again, above.
+		 */
+		while ((i + 1) < nblocks &&
+			   WaitReadBuffersCanStartIO(buffers[i + 1], true))
+		{
+			/* Must be consecutive block numbers. */
+			Assert(BufferGetBlockNumber(buffers[i + 1]) ==
+				   BufferGetBlockNumber(buffers[i]) + 1);
+
+			io_buffers[io_buffers_len] = buffers[++i];
+			io_pages[io_buffers_len++] = BufferGetBlock(buffers[i]);
+		}
+
+		io_start = pgstat_prepare_io_time(track_io_timing);
+		smgrreadv(bmr.smgr, forknum, io_first_block, io_pages, io_buffers_len);
+		pgstat_count_io_op_time(io_object, io_context, IOOP_READ, io_start,
+								io_buffers_len);
+
+		/* Verify each block we read, and terminate the I/O. */
+		for (int j = 0; j < io_buffers_len; ++j)
+		{
+			BufferDesc *bufHdr;
+			Block		bufBlock;
+
+			if (isLocalBuf)
+			{
+				bufHdr = GetLocalBufferDescriptor(-io_buffers[j] - 1);
+				bufBlock = LocalBufHdrGetBlock(bufHdr);
+			}
+			else
+			{
+				bufHdr = GetBufferDescriptor(io_buffers[j] - 1);
+				bufBlock = BufHdrGetBlock(bufHdr);
+			}
+
+			/* check for garbage data */
+			if (!PageIsVerifiedExtended((Page) bufBlock, io_first_block + j,
+										PIV_LOG_WARNING | PIV_REPORT_STAT))
+			{
+				if ((operation->flags & READ_BUFFERS_ZERO_ON_ERROR) || zero_damaged_pages)
+				{
+					ereport(WARNING,
+							(errcode(ERRCODE_DATA_CORRUPTED),
+							 errmsg("invalid page in block %u of relation %s; zeroing out page",
+									io_first_block + j,
+									relpath(bmr.smgr->smgr_rlocator, forknum))));
+					memset(bufBlock, 0, BLCKSZ);
+				}
+				else
+					ereport(ERROR,
+							(errcode(ERRCODE_DATA_CORRUPTED),
+							 errmsg("invalid page in block %u of relation %s",
+									io_first_block + j,
+									relpath(bmr.smgr->smgr_rlocator, forknum))));
+			}
+
+			/* Terminate I/O and set BM_VALID. */
+			if (isLocalBuf)
+			{
+				uint32		buf_state = pg_atomic_read_u32(&bufHdr->state);
+
+				buf_state |= BM_VALID;
+				pg_atomic_unlocked_write_u32(&bufHdr->state, buf_state);
+			}
+			else
+			{
+				/* Set BM_VALID, terminate IO, and wake up any waiters */
+				TerminateBufferIO(bufHdr, false, BM_VALID, true);
+			}
+
+			/* Report I/Os as completing individually. */
+			TRACE_POSTGRESQL_BUFFER_READ_DONE(forknum, io_first_block + j,
+											  bmr.smgr->smgr_rlocator.locator.spcOid,
+											  bmr.smgr->smgr_rlocator.locator.dbOid,
+											  bmr.smgr->smgr_rlocator.locator.relNumber,
+											  bmr.smgr->smgr_rlocator.backend,
+											  false);
+		}
+
+		VacuumPageMiss += io_buffers_len;
+		if (VacuumCostActive)
+			VacuumCostBalance += VacuumCostPageMiss * io_buffers_len;
+	}
 }
 
 /*
- * BufferAlloc -- subroutine for ReadBuffer.  Handles lookup of a shared
- *		buffer.  If no buffer exists already, selects a replacement
- *		victim and evicts the old page, but does NOT read in new page.
+ * BufferAlloc -- subroutine for StartReadBuffers.  Handles lookup of a shared
+ *		buffer.  If no buffer exists already, selects a replacement victim and
+ *		evicts the old page, but does NOT read in new page.
  *
  * "strategy" can be a buffer replacement strategy object, or NULL for
  * the default strategy.  The selected buffer's usage_count is advanced when
@@ -1223,11 +1475,7 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
  *
  * The returned buffer is pinned and is already marked as holding the
  * desired page.  If it already did have the desired page, *foundPtr is
- * set true.  Otherwise, *foundPtr is set false and the buffer is marked
- * as IO_IN_PROGRESS; ReadBuffer will now need to do I/O to fill it.
- *
- * *foundPtr is actually redundant with the buffer's BM_VALID flag, but
- * we keep it for simplicity in ReadBuffer.
+ * set true.  Otherwise, *foundPtr is set false.
  *
  * io_context is passed as an output parameter to avoid calling
  * IOContextForStrategy() when there is a shared buffers hit and no IO
@@ -1286,19 +1534,10 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 		{
 			/*
 			 * We can only get here if (a) someone else is still reading in
-			 * the page, or (b) a previous read attempt failed.  We have to
-			 * wait for any active read attempt to finish, and then set up our
-			 * own read attempt if the page is still not BM_VALID.
-			 * StartBufferIO does it all.
+			 * the page, (b) a previous read attempt failed, or (c) someone
+			 * called StartReadBuffers() but not yet WaitReadBuffers().
 			 */
-			if (StartBufferIO(buf, true))
-			{
-				/*
-				 * If we get here, previous attempts to read the buffer must
-				 * have failed ... but we shall bravely try again.
-				 */
-				*foundPtr = false;
-			}
+			*foundPtr = false;
 		}
 
 		return buf;
@@ -1363,19 +1602,10 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 		{
 			/*
 			 * We can only get here if (a) someone else is still reading in
-			 * the page, or (b) a previous read attempt failed.  We have to
-			 * wait for any active read attempt to finish, and then set up our
-			 * own read attempt if the page is still not BM_VALID.
-			 * StartBufferIO does it all.
+			 * the page, (b) a previous read attempt failed, or (c) someone
+			 * called StartReadBuffers() but not yet WaitReadBuffers().
 			 */
-			if (StartBufferIO(existing_buf_hdr, true))
-			{
-				/*
-				 * If we get here, previous attempts to read the buffer must
-				 * have failed ... but we shall bravely try again.
-				 */
-				*foundPtr = false;
-			}
+			*foundPtr = false;
 		}
 
 		return existing_buf_hdr;
@@ -1407,15 +1637,9 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 	LWLockRelease(newPartitionLock);
 
 	/*
-	 * Buffer contents are currently invalid.  Try to obtain the right to
-	 * start I/O.  If StartBufferIO returns false, then someone else managed
-	 * to read it before we did, so there's nothing left for BufferAlloc() to
-	 * do.
+	 * Buffer contents are currently invalid.
 	 */
-	if (StartBufferIO(victim_buf_hdr, true))
-		*foundPtr = false;
-	else
-		*foundPtr = true;
+	*foundPtr = false;
 
 	return victim_buf_hdr;
 }
@@ -1769,7 +1993,7 @@ again:
  * pessimistic, but outside of toy-sized shared_buffers it should allow
  * sufficient pins.
  */
-static void
+void
 LimitAdditionalPins(uint32 *additional_pins)
 {
 	uint32		max_backends;
@@ -2034,7 +2258,7 @@ ExtendBufferedRelShared(BufferManagerRelation bmr,
 
 				buf_state &= ~BM_VALID;
 				UnlockBufHdr(existing_hdr, buf_state);
-			} while (!StartBufferIO(existing_hdr, true));
+			} while (!StartBufferIO(existing_hdr, true, false));
 		}
 		else
 		{
@@ -2057,7 +2281,7 @@ ExtendBufferedRelShared(BufferManagerRelation bmr,
 			LWLockRelease(partition_lock);
 
 			/* XXX: could combine the locked operations in it with the above */
-			StartBufferIO(victim_buf_hdr, true);
+			StartBufferIO(victim_buf_hdr, true, false);
 		}
 	}
 
@@ -2372,7 +2596,12 @@ PinBuffer(BufferDesc *buf, BufferAccessStrategy strategy)
 	else
 	{
 		/*
-		 * If we previously pinned the buffer, it must surely be valid.
+		 * If we previously pinned the buffer, it is likely to be valid, but
+		 * it may not be if StartReadBuffers() was called and
+		 * WaitReadBuffers() hasn't been called yet.  We'll check by loading
+		 * the flags without locking.  This is racy, but it's OK to return
+		 * false spuriously: when WaitReadBuffers() calls StartBufferIO(),
+		 * it'll see that it's now valid.
 		 *
 		 * Note: We deliberately avoid a Valgrind client request here.
 		 * Individual access methods can optionally superimpose buffer page
@@ -2381,7 +2610,7 @@ PinBuffer(BufferDesc *buf, BufferAccessStrategy strategy)
 		 * that the buffer page is legitimately non-accessible here.  We
 		 * cannot meddle with that.
 		 */
-		result = true;
+		result = (pg_atomic_read_u32(&buf->state) & BM_VALID) != 0;
 	}
 
 	ref->refcount++;
@@ -3449,7 +3678,7 @@ FlushBuffer(BufferDesc *buf, SMgrRelation reln, IOObject io_object,
 	 * someone else flushed the buffer before we could, so we need not do
 	 * anything.
 	 */
-	if (!StartBufferIO(buf, false))
+	if (!StartBufferIO(buf, false, false))
 		return;
 
 	/* Setup error traceback support for ereport() */
@@ -5184,9 +5413,15 @@ WaitIO(BufferDesc *buf)
  *
  * Returns true if we successfully marked the buffer as I/O busy,
  * false if someone else already did the work.
+ *
+ * If nowait is true, then we don't wait for an I/O to be finished by another
+ * backend.  In that case, false indicates either that the I/O was already
+ * finished, or is still in progress.  This is useful for callers that want to
+ * find out if they can perform the I/O as part of a larger operation, without
+ * waiting for the answer or distinguishing the reasons why not.
  */
 static bool
-StartBufferIO(BufferDesc *buf, bool forInput)
+StartBufferIO(BufferDesc *buf, bool forInput, bool nowait)
 {
 	uint32		buf_state;
 
@@ -5199,6 +5434,8 @@ StartBufferIO(BufferDesc *buf, bool forInput)
 		if (!(buf_state & BM_IO_IN_PROGRESS))
 			break;
 		UnlockBufHdr(buf, buf_state);
+		if (nowait)
+			return false;
 		WaitIO(buf);
 	}
 
diff --git a/src/backend/storage/buffer/localbuf.c b/src/backend/storage/buffer/localbuf.c
index fcfac335a57..985a2c7049c 100644
--- a/src/backend/storage/buffer/localbuf.c
+++ b/src/backend/storage/buffer/localbuf.c
@@ -108,10 +108,9 @@ PrefetchLocalBuffer(SMgrRelation smgr, ForkNumber forkNum,
  * LocalBufferAlloc -
  *	  Find or create a local buffer for the given page of the given relation.
  *
- * API is similar to bufmgr.c's BufferAlloc, except that we do not need
- * to do any locking since this is all local.   Also, IO_IN_PROGRESS
- * does not get set.  Lastly, we support only default access strategy
- * (hence, usage_count is always advanced).
+ * API is similar to bufmgr.c's BufferAlloc, except that we do not need to do
+ * any locking since this is all local.  We support only default access
+ * strategy (hence, usage_count is always advanced).
  */
 BufferDesc *
 LocalBufferAlloc(SMgrRelation smgr, ForkNumber forkNum, BlockNumber blockNum,
@@ -287,7 +286,7 @@ GetLocalVictimBuffer(void)
 }
 
 /* see LimitAdditionalPins() */
-static void
+void
 LimitAdditionalLocalPins(uint32 *additional_pins)
 {
 	uint32		max_pins;
@@ -297,9 +296,10 @@ LimitAdditionalLocalPins(uint32 *additional_pins)
 
 	/*
 	 * In contrast to LimitAdditionalPins() other backends don't play a role
-	 * here. We can allow up to NLocBuffer pins in total.
+	 * here. We can allow up to NLocBuffer pins in total, but it might not be
+	 * initialized yet so read num_temp_buffers.
 	 */
-	max_pins = (NLocBuffer - NLocalPinnedBuffers);
+	max_pins = (num_temp_buffers - NLocalPinnedBuffers);
 
 	if (*additional_pins >= max_pins)
 		*additional_pins = max_pins;
diff --git a/src/backend/storage/meson.build b/src/backend/storage/meson.build
index 40345bdca27..739d13293fb 100644
--- a/src/backend/storage/meson.build
+++ b/src/backend/storage/meson.build
@@ -1,5 +1,6 @@
 # Copyright (c) 2022-2024, PostgreSQL Global Development Group
 
+subdir('aio')
 subdir('buffer')
 subdir('file')
 subdir('freespace')
diff --git a/src/include/storage/bufmgr.h b/src/include/storage/bufmgr.h
index d51d46d3353..b57f71f97e3 100644
--- a/src/include/storage/bufmgr.h
+++ b/src/include/storage/bufmgr.h
@@ -14,6 +14,7 @@
 #ifndef BUFMGR_H
 #define BUFMGR_H
 
+#include "port/pg_iovec.h"
 #include "storage/block.h"
 #include "storage/buf.h"
 #include "storage/bufpage.h"
@@ -158,6 +159,11 @@ extern PGDLLIMPORT int32 *LocalRefCount;
 #define BUFFER_LOCK_SHARE		1
 #define BUFFER_LOCK_EXCLUSIVE	2
 
+/*
+ * Maximum number of buffers for multi-buffer I/O functions.  This is set to
+ * allow 128kB transfers, unless BLCKSZ and IOV_MAX imply a smaller maximum.
+ */
+#define MAX_BUFFERS_PER_TRANSFER Min(PG_IOV_MAX, (128 * 1024) / BLCKSZ)
 
 /*
  * prototypes for functions in bufmgr.c
@@ -177,6 +183,42 @@ extern Buffer ReadBufferWithoutRelcache(RelFileLocator rlocator,
 										ForkNumber forkNum, BlockNumber blockNum,
 										ReadBufferMode mode, BufferAccessStrategy strategy,
 										bool permanent);
+
+#define READ_BUFFERS_ZERO_ON_ERROR 0x01
+#define READ_BUFFERS_ISSUE_ADVICE 0x02
+
+/*
+ * Private state used by StartReadBuffers() and WaitReadBuffers().  Declared
+ * in public header only to allow inclusion in other structs, but contents
+ * should not be accessed.
+ */
+struct ReadBuffersOperation
+{
+	/* Parameters passed in to StartReadBuffers(). */
+	BufferManagerRelation bmr;
+	Buffer	   *buffers;
+	ForkNumber	forknum;
+	BlockNumber blocknum;
+	int			nblocks;
+	BufferAccessStrategy strategy;
+	int			flags;
+
+	/* Range of buffers, if we need to perform a read. */
+	int			io_buffers_len;
+};
+
+typedef struct ReadBuffersOperation ReadBuffersOperation;
+
+extern bool StartReadBuffers(BufferManagerRelation bmr,
+							 Buffer *buffers,
+							 ForkNumber forknum,
+							 BlockNumber blocknum,
+							 int *nblocks,
+							 BufferAccessStrategy strategy,
+							 int flags,
+							 ReadBuffersOperation *operation);
+extern void WaitReadBuffers(ReadBuffersOperation *operation);
+
 extern void ReleaseBuffer(Buffer buffer);
 extern void UnlockReleaseBuffer(Buffer buffer);
 extern bool BufferIsExclusiveLocked(Buffer buffer);
@@ -250,6 +292,9 @@ extern bool HoldingBufferPinThatDelaysRecovery(void);
 
 extern bool BgBufferSync(struct WritebackContext *wb_context);
 
+extern void LimitAdditionalPins(uint32 *additional_pins);
+extern void LimitAdditionalLocalPins(uint32 *additional_pins);
+
 /* in buf_init.c */
 extern void InitBufferPool(void);
 extern Size BufferShmemSize(void);
diff --git a/src/include/storage/streaming_read.h b/src/include/storage/streaming_read.h
new file mode 100644
index 00000000000..c4d3892bb26
--- /dev/null
+++ b/src/include/storage/streaming_read.h
@@ -0,0 +1,52 @@
+#ifndef STREAMING_READ_H
+#define STREAMING_READ_H
+
+#include "storage/bufmgr.h"
+#include "storage/fd.h"
+#include "storage/smgr.h"
+
+/* Default tuning, reasonable for many users. */
+#define PGSR_FLAG_DEFAULT 0x00
+
+/*
+ * I/O streams that are performing maintenance work on behalf of potentially
+ * many users.
+ */
+#define PGSR_FLAG_MAINTENANCE 0x01
+
+/*
+ * We usually avoid issuing prefetch advice automatically when sequential
+ * access is detected, but this flag explicitly disables it, for cases that
+ * might not be correctly detected.  Explicit advice is known to perform worse
+ * than letting the kernel (at least Linux) detect sequential access.
+ */
+#define PGSR_FLAG_SEQUENTIAL 0x02
+
+/*
+ * We usually ramp up from smaller reads to larger ones, to support users who
+ * don't know if it's worth reading lots of buffers yet.  This flag disables
+ * that, declaring ahead of time that we'll be reading all available buffers.
+ */
+#define PGSR_FLAG_FULL 0x04
+
+struct PgStreamingRead;
+typedef struct PgStreamingRead PgStreamingRead;
+
+/* Callback that returns the next block number to read. */
+typedef BlockNumber (*PgStreamingReadBufferCB) (PgStreamingRead *pgsr,
+												void *pgsr_private,
+												void *per_buffer_private);
+
+extern PgStreamingRead *pg_streaming_read_buffer_alloc(int flags,
+													   void *pgsr_private,
+													   size_t per_buffer_private_size,
+													   BufferAccessStrategy strategy,
+													   BufferManagerRelation bmr,
+													   ForkNumber forknum,
+													   PgStreamingReadBufferCB next_block_cb);
+
+extern void pg_streaming_read_prefetch(PgStreamingRead *pgsr);
+extern Buffer pg_streaming_read_buffer_get_next(PgStreamingRead *pgsr, void **per_buffer_private);
+extern void pg_streaming_read_free(PgStreamingRead *pgsr);
+
+#endif
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index d3a7f75b080..299c77ea69f 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2097,6 +2097,8 @@ PgStat_TableCounts
 PgStat_TableStatus
 PgStat_TableXactStatus
 PgStat_WalStats
+PgStreamingRead
+PgStreamingReadRange
 PgXmlErrorContext
 PgXmlStrictness
 Pg_finfo_record
@@ -2267,6 +2269,7 @@ ReInitializeDSMForeignScan_function
 ReScanForeignScan_function
 ReadBufPtrType
 ReadBufferMode
+ReadBuffersOperation
 ReadBytePtrType
 ReadExtraTocPtrType
 ReadFunc
-- 
2.40.1
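
For concreteness, here is a minimal caller sketch based only on the
declarations added by the patch above. It is not part of the patch:
read_block_range_sketch() is an invented name, the requested blocks are
assumed to exist, and buffer content locking and error handling are
omitted.

static void
read_block_range_sketch(Relation rel, BlockNumber first)
{
	Buffer		buffers[MAX_BUFFERS_PER_TRANSFER];
	int			nblocks = MAX_BUFFERS_PER_TRANSFER;
	ReadBuffersOperation operation;

	/*
	 * Pin up to nblocks buffers starting at "first".  On return, nblocks
	 * holds the actual count, which may be smaller if a cache hit
	 * terminated the readable range early.
	 */
	if (StartReadBuffers(BMR_REL(rel), buffers, MAIN_FORKNUM, first,
						 &nblocks, NULL, 0, &operation))
		WaitReadBuffers(&operation);	/* complete the pending read */

	for (int i = 0; i < nblocks; i++)
	{
		Page		page = BufferGetPage(buffers[i]);

		/* ... inspect "page" here (content lock omitted) ... */
		(void) page;
		ReleaseBuffer(buffers[i]);
	}
}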

v5-0004-Add-pg_streaming_read_reset.patch (text/x-diff; charset=us-ascii)
From a9bb20c34dd31379fd45222efd701ffacc9732c4 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 26 Feb 2024 15:33:39 -0500
Subject: [PATCH v5 4/5] Add pg_streaming_read_reset

For rescan, we want to reuse the streaming read object and simply
release the buffers that were pinned by the streaming read
infrastructure.
---
 src/backend/storage/aio/streaming_read.c | 18 ++++++++++++++++++
 src/include/storage/streaming_read.h     |  1 +
 2 files changed, 19 insertions(+)

diff --git a/src/backend/storage/aio/streaming_read.c b/src/backend/storage/aio/streaming_read.c
index e437abebe4c..9da863fd682 100644
--- a/src/backend/storage/aio/streaming_read.c
+++ b/src/backend/storage/aio/streaming_read.c
@@ -609,3 +609,21 @@ pg_streaming_read_free(PgStreamingRead *pgsr)
 
 	pfree(pgsr);
 }
+
+
+/*
+ * Reset a streaming read object by releasing all of the buffers. Note that
+ * max_ios is not recalculated, so any changes to maintenance_io_concurrency and
+ * effective_io_concurrency will have no effect.
+ */
+void
+pg_streaming_read_reset(PgStreamingRead *pgsr)
+{
+	Buffer		buffer;
+
+	/* Stop looking ahead, and unpin anything that wasn't consumed. */
+	pgsr->finished = true;
+	while ((buffer = pg_streaming_read_buffer_get_next(pgsr, NULL)) != InvalidBuffer)
+		ReleaseBuffer(buffer);
+	pgsr->finished = false;
+}
diff --git a/src/include/storage/streaming_read.h b/src/include/storage/streaming_read.h
index c4d3892bb26..63cef719e42 100644
--- a/src/include/storage/streaming_read.h
+++ b/src/include/storage/streaming_read.h
@@ -48,5 +48,6 @@ extern PgStreamingRead *pg_streaming_read_buffer_alloc(int flags,
 extern void pg_streaming_read_prefetch(PgStreamingRead *pgsr);
 extern Buffer pg_streaming_read_buffer_get_next(PgStreamingRead *pgsr, void **per_buffer_private);
 extern void pg_streaming_read_free(PgStreamingRead *pgsr);
+extern void pg_streaming_read_reset(PgStreamingRead *pgsr);
 
 #endif
-- 
2.40.1

v5-0005-Sequential-scans-and-TID-range-scans-stream-reads.patch (text/x-diff; charset=us-ascii)
From 6615e10588c800a8ab0d72587996610ebf26faef Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 26 Feb 2024 15:41:32 -0500
Subject: [PATCH v5 5/5] Sequential scans and TID range scans stream reads

Implementing streaming read support for heap sequential scans and TID
range scans includes three parts:

Allocate the streaming read object in heap_beginscan(). On rescan, reset
the streaming read by releasing all pinned buffers and resetting the
prefetch block.

Implement a callback that returns the next block for the streaming read
API to prefetch.

Invoke the streaming read API when a new page is needed and streaming
reads are enabled. When the scan direction changes, reset the streaming
read.

ci-os-only:
---
 src/backend/access/heap/heapam.c | 88 ++++++++++++++++++++++++++++----
 src/include/access/heapam.h      | 16 +++++-
 2 files changed, 93 insertions(+), 11 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index adde61fca60..9606d71f457 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -62,6 +62,7 @@
 #include "storage/predicate.h"
 #include "storage/procarray.h"
 #include "storage/standby.h"
+#include "storage/streaming_read.h"
 #include "utils/datum.h"
 #include "utils/inval.h"
 #include "utils/relcache.h"
@@ -224,6 +225,29 @@ static const int MultiXactStatusLock[MaxMultiXactStatus + 1] =
  * ----------------------------------------------------------------
  */
 
+static BlockNumber
+heap_scan_pgsr_next(PgStreamingRead *pgsr, void *pgsr_private,
+					  void *per_buffer_data)
+{
+	HeapScanDesc scan = (HeapScanDesc) pgsr_private;
+
+	/*
+	 * Hard-code ScanDirection to ForwardScanDirection since only forward
+	 * scans support streaming reads.
+	 */
+	if (unlikely(!scan->rs_inited))
+	{
+		scan->rs_prefetch_block = heapgettup_initial_block(scan, scan->rs_dir);
+		scan->rs_inited = true;
+	}
+	else
+		scan->rs_prefetch_block = heapgettup_advance_block(scan,
+														   scan->rs_prefetch_block,
+														   scan->rs_dir);
+
+	return scan->rs_prefetch_block;
+}
+
 /* ----------------
  *		initscan - scan code common to heap_beginscan and heap_rescan
  * ----------------
@@ -326,6 +350,13 @@ initscan(HeapScanDesc scan, ScanKey key, bool keep_startblock)
 	scan->rs_cbuf = InvalidBuffer;
 	scan->rs_cblock = InvalidBlockNumber;
 
+	/*
+	 * Initialize to ForwardScanDirection because it is most common and heap
+	 * scans usually must go forwards before going backward.
+	 */
+	scan->rs_dir = ForwardScanDirection;
+	scan->rs_prefetch_block = InvalidBlockNumber;
+
 	/* page-at-a-time fields are always invalid when not rs_inited */
 
 	/*
@@ -468,6 +499,8 @@ heapbuildvis(TableScanDesc sscan)
 static inline void
 heapfetchbuf(HeapScanDesc scan, ScanDirection dir)
 {
+	Assert(scan->rs_pgsr);
+
 	/* release previous scan buffer, if any */
 	if (BufferIsValid(scan->rs_cbuf))
 	{
@@ -482,19 +515,23 @@ heapfetchbuf(HeapScanDesc scan, ScanDirection dir)
 	 */
 	CHECK_FOR_INTERRUPTS();
 
-	if (unlikely(!scan->rs_inited))
+	/*
+	 * If the scan direction is changing, reset the prefetch block to the
+	 * current block. Otherwise, we will incorrectly prefetch the blocks
+	 * between the prefetch block and the current block again before
+	 * prefetching blocks in the new, correct scan direction.
+	 */
+	if (unlikely(scan->rs_dir != dir))
 	{
-		scan->rs_cblock = heapgettup_initial_block(scan, dir);
-		Assert(scan->rs_cblock != InvalidBlockNumber || !BufferIsValid(scan->rs_cbuf));
-		scan->rs_inited = true;
+		scan->rs_prefetch_block = scan->rs_cblock;
+		pg_streaming_read_reset(scan->rs_pgsr);
 	}
-	else
-		scan->rs_cblock = heapgettup_advance_block(scan, scan->rs_cblock, dir);
 
-	/* read block if valid */
-	if (BlockNumberIsValid(scan->rs_cblock))
-		scan->rs_cbuf = ReadBufferExtended(scan->rs_base.rs_rd, MAIN_FORKNUM,
-										   scan->rs_cblock, RBM_NORMAL, scan->rs_strategy);
+	scan->rs_dir = dir;
+
+	scan->rs_cbuf = pg_streaming_read_buffer_get_next(scan->rs_pgsr, NULL);
+	if (BufferIsValid(scan->rs_cbuf))
+		scan->rs_cblock = BufferGetBlockNumber(scan->rs_cbuf);
 }
 
 /*
@@ -823,6 +860,7 @@ continue_page:
 
 	scan->rs_cbuf = InvalidBuffer;
 	scan->rs_cblock = InvalidBlockNumber;
+	scan->rs_prefetch_block = InvalidBlockNumber;
 	tuple->t_data = NULL;
 	scan->rs_inited = false;
 }
@@ -913,6 +951,7 @@ continue_page:
 		ReleaseBuffer(scan->rs_cbuf);
 	scan->rs_cbuf = InvalidBuffer;
 	scan->rs_cblock = InvalidBlockNumber;
+	scan->rs_prefetch_block = InvalidBlockNumber;
 	tuple->t_data = NULL;
 	scan->rs_inited = false;
 }
@@ -995,6 +1034,8 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 	else
 		scan->rs_parallelworkerdata = NULL;
 
+	scan->rs_pgsr = NULL;
+
 	/*
 	 * we do this here instead of in initscan() because heap_rescan also calls
 	 * initscan() and we don't want to allocate memory again
@@ -1006,6 +1047,22 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 
 	initscan(scan, key, false);
 
+	/*
+	 * We do not know the scan direction yet. If the scan does not end up
+	 * being a forward scan, the streaming read object will be freed.
+	 */
+	if (scan->rs_base.rs_flags & SO_TYPE_SEQSCAN ||
+		scan->rs_base.rs_flags & SO_TYPE_TIDRANGESCAN)
+	{
+		scan->rs_pgsr = pg_streaming_read_buffer_alloc(PGSR_FLAG_SEQUENTIAL,
+													   scan,
+													   0,
+													   scan->rs_strategy,
+													   BMR_REL(scan->rs_base.rs_rd),
+													   MAIN_FORKNUM,
+													   heap_scan_pgsr_next);
+	}
+
 	return (TableScanDesc) scan;
 }
 
@@ -1044,6 +1101,14 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
 	 * reinitialize scan descriptor
 	 */
 	initscan(scan, key, true);
+
+	/*
+	 * The streaming read object is reset on rescan. This must be done after
+	 * initscan(), as some state referred to by pg_streaming_read_reset() is
+	 * reset in initscan().
+	 */
+	if (scan->rs_pgsr)
+		pg_streaming_read_reset(scan->rs_pgsr);
 }
 
 void
@@ -1059,6 +1124,9 @@ heap_endscan(TableScanDesc sscan)
 	if (BufferIsValid(scan->rs_cbuf))
 		ReleaseBuffer(scan->rs_cbuf);
 
+	if (scan->rs_pgsr)
+		pg_streaming_read_free(scan->rs_pgsr);
+
 	/*
 	 * decrement relation reference count and free scan descriptor storage
 	 */
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index e2b1b2a3ad9..a476ae62a8f 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -72,9 +72,23 @@ typedef struct HeapScanDescData
 	 */
 	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
 
-	/* these fields only used in page-at-a-time mode and for bitmap scans */
+	/* only used in page-at-a-time mode and for bitmap scans */
 	int			rs_cindex;		/* current tuple's index in vistuples */
 	int			rs_ntuples;		/* number of visible tuples on page */
+
+	/*
+	 * Fields used for streaming reads by sequential scans and TID range
+	 * scans. The streaming read object is allocated at the beginning of the
+	 * scan and reset on rescan or when the scan direction changes. The scan
+	 * direction is saved each time a new page is requested. If the scan
+	 * direction changes from one page to the next, the streaming read object
+	 * releases all previously pinned buffers and resets the prefetch block.
+	 */
+	BlockNumber rs_prefetch_block;
+	ScanDirection rs_dir;
+	struct PgStreamingRead *rs_pgsr;
+
+	/* only used in page-at-a-time mode and for bitmap scans */
 	OffsetNumber rs_vistuples[MaxHeapTuplesPerPage];	/* their offsets */
 }			HeapScanDescData;
 typedef struct HeapScanDescData *HeapScanDesc;
-- 
2.40.1
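
To show how the pieces above fit together outside of heapam.c, here is a
minimal streaming read user sketch following the API declared in
streaming_read.h earlier in this patchset. It is illustrative only:
MyScanState, my_next_block(), and scan_all_blocks_sketch() are invented
names, and per-buffer processing and locking are omitted.

typedef struct MyScanState
{
	BlockNumber next_block;		/* next block to hand to the stream */
	BlockNumber nblocks;		/* total number of blocks to scan */
} MyScanState;

/* Callback: return the next block to read, or InvalidBlockNumber when done. */
static BlockNumber
my_next_block(PgStreamingRead *pgsr, void *pgsr_private, void *per_buffer_data)
{
	MyScanState *state = (MyScanState *) pgsr_private;

	if (state->next_block >= state->nblocks)
		return InvalidBlockNumber;
	return state->next_block++;
}

static void
scan_all_blocks_sketch(Relation rel)
{
	MyScanState state = {0, RelationGetNumberOfBlocks(rel)};
	PgStreamingRead *pgsr;
	Buffer		buf;

	pgsr = pg_streaming_read_buffer_alloc(PGSR_FLAG_SEQUENTIAL, &state, 0,
										  NULL, BMR_REL(rel), MAIN_FORKNUM,
										  my_next_block);

	/* Consume pinned buffers until the callback is exhausted. */
	while ((buf = pg_streaming_read_buffer_get_next(pgsr, NULL)) != InvalidBuffer)
	{
		/* ... process the pinned (but unlocked) buffer here ... */
		ReleaseBuffer(buf);
	}

	/*
	 * A rescan could instead call pg_streaming_read_reset() and reset
	 * next_block, much as heap_rescan() does with rs_prefetch_block above.
	 */
	pg_streaming_read_free(pgsr);
}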

#11Melanie Plageman
melanieplageman@gmail.com
In reply to: Melanie Plageman (#10)
6 attachment(s)
Re: Streaming read-ready sequential scan code

On Fri, Mar 8, 2024 at 4:56 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

On Sat, Mar 02, 2024 at 06:07:48PM -0500, Melanie Plageman wrote:

On Wed, Feb 28, 2024 at 12:30 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

On Mon, Feb 26, 2024 at 03:56:57PM -0500, Melanie Plageman wrote:

On Mon, Feb 19, 2024 at 6:05 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

On Mon, Jan 29, 2024 at 4:17 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

There is an outstanding question about where to allocate the
PgStreamingRead object for sequential scans

I've written three alternative implementations of the actual streaming
read user for sequential scan which handle the question of where to
allocate the streaming read object and how to handle changing scan
direction in different ways.

Option A) https://github.com/melanieplageman/postgres/tree/seqscan_pgsr_initscan_allocation
- Allocates the streaming read object in initscan(). Since we do not
know the scan direction at this time, if the scan ends up not being a
forwards scan, the streaming read object must later be freed -- so
this will sometimes allocate a streaming read object it never uses.
- Only supports ForwardScanDirection and once the scan direction
changes, streaming is never supported again -- even if we return to
ForwardScanDirection
- Must maintain a "fallback" codepath that does not use the streaming read API

Attached is a version of this patch which implements a "reset"
function for the streaming read API which should be cheaper than the
full pg_streaming_read_free() on rescan. This can easily be ported to
work on any of my proposed implementations (A/B/C). I implemented it
on A as an example.

Attached is the latest version of this patchset -- rebased in light of
Thomas' updates to the streaming read API [1]. I have chosen the
approach I think we should go with. It is a hybrid of my previously
proposed approaches.

While investigating some performance concerns, Andres pointed out that
the members I add to HeapScanDescData in this patch push rs_cindex and
rs_ntuples to another cacheline and introduce a 4-byte hole. Attached
v4's HeapScanDescData is as well-packed as on master and its members
are reordered so that rs_cindex and rs_ntuples are back on the second
cacheline of the struct's data.
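
For readers not steeped in struct layout details, the issue is purely one
of member ordering. A hypothetical illustration (not the real
HeapScanDescData), assuming a typical 64-bit platform where pointers need
8-byte alignment: in a struct as large as HeapScanDescData, a wasted
8 bytes like this is enough to push hot members such as rs_cindex and
rs_ntuples across a 64-byte cacheline boundary.

struct bad_order
{
	uint32		a;				/* offset 0, 4 bytes */
	/* compiler inserts a 4-byte hole here to align the pointer */
	void	   *p;				/* offset 8, 8 bytes */
	uint32		b;				/* offset 16; sizeof() is padded to 24 */
};

struct good_order
{
	void	   *p;				/* offset 0, 8 bytes */
	uint32		a;				/* offset 8, 4 bytes */
	uint32		b;				/* offset 12; sizeof() is 16, no holes */
};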

I did some additional profiling and realized that dropping the
unlikely() hints from the places where rs_inited is checked frequently
was negatively impacting performance. v5 adds them back and also makes a
few other very minor cleanups.
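
For context, unlikely() is PostgreSQL's branch-prediction hint
(paraphrased below from src/include/c.h); on GCC-compatible compilers it
marks the rs_inited initialization branch as cold, which matters because
that test sits in the per-tuple path of heapgettup() and
heapgettup_pagemode().

#ifdef __GNUC__
#define unlikely(x) __builtin_expect((x) != 0, 0)
#else
#define unlikely(x) ((x) != 0)
#endif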

Note that this patch set includes a not-yet-released version of Thomas
Munro's Streaming Read API with new ramp-up logic, which seems to fix a
performance issue I saw with my test case when all of the sequential
scan's blocks are in shared buffers. Once he sends the official new
version, I will rebase this and point to his explanation in that thread.

Attached v6 has the version of the streaming read API mentioned here
[1]: /messages/by-id/CA+hUKGJTwrS7F=uJPx3SeigMiQiW+LJaOkjGyZdCntwyMR=uAw@mail.gmail.com
investigated in that thread by Andres, Bilal, and Thomas.

The one outstanding item for the sequential scan streaming read user
is deciding how the BAS_BULKREAD buffer access strategy should
interact with the streaming read infrastructure. We discussed this a bit
off-list, and it seems clear that the ring must be at least as large
as io_combine_limit. This should be no problem for BAS_BULKREAD
because its ring is 16 MB. The question is whether or not we need to
do anything right now to ensure there aren't adverse interactions
between io_combine_limit, max_ios, and the buffer access strategy ring
buffer size.
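
As a rough sanity check, using the constant from the version of the
bufmgr patch quoted earlier in this thread: MAX_BUFFERS_PER_TRANSFER is
Min(PG_IOV_MAX, 128kB / BLCKSZ), so with the default 8kB block size a
single combined read pins at most 128kB / 8kB = 16 buffers, or 128kB.
Under those assumptions, any ring that can hold at least io_combine_limit
buffers fits one combined read; the open question is how that interacts
with up to max_ios such reads being in flight against the same ring.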

- Melanie

[1]: /messages/by-id/CA+hUKGJTwrS7F=uJPx3SeigMiQiW+LJaOkjGyZdCntwyMR=uAw@mail.gmail.com

Attachments:

v6-0001-Split-heapgetpage-into-two-parts.patch (text/x-patch; charset=US-ASCII)
From bed26d391190f4411eccc4533d188e5dba6e8f72 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 27 Jan 2024 18:39:37 -0500
Subject: [PATCH v6 1/6] Split heapgetpage into two parts

heapgetpage(), a per-block utility function used in heap scans, read a
passed-in block into a buffer and then, if page-at-a-time processing was
enabled, pruned the page and built an array of its visible tuples. This
was used for sequential and sample scans.

Future commits will add support for streaming reads. The streaming read
API will read in the blocks specified by a callback, but any significant
per-page processing should be done synchronously on the buffer yielded
by the streaming read API. To support this, separate the logic in
heapgetpage() to read a block into a buffer from that which prunes the
page and builds an array of the visible tuples. The former is now
heapfetchbuf() and the latter is now heapbuildvis().

Future commits will push the logic for selecting the next block into
heapfetchbuf() in cases when streaming reads are not supported (such as
backwards sequential scans). Because this logic differs for sample scans
and sequential scans, inline the code to read the block into a buffer
for sample scans.

This has the added benefit of allowing for a bit of refactoring in
heapam_scan_sample_next_block(), including unpinning the previous buffer
before invoking the callback to select the next block.
---
 src/backend/access/heap/heapam.c         | 74 ++++++++++++++----------
 src/backend/access/heap/heapam_handler.c | 40 +++++++++----
 src/include/access/heapam.h              |  2 +-
 3 files changed, 72 insertions(+), 44 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 2f6527df0dc..4291f161cf9 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -360,17 +360,18 @@ heap_setscanlimits(TableScanDesc sscan, BlockNumber startBlk, BlockNumber numBlk
 }
 
 /*
- * heapgetpage - subroutine for heapgettup()
+ * heapbuildvis - Utility function for heap scans.
  *
- * This routine reads and pins the specified page of the relation.
- * In page-at-a-time mode it performs additional work, namely determining
- * which tuples on the page are visible.
+ * Given a page residing in a buffer saved in the scan descriptor, prune the
+ * page and determine which of its tuples are all visible, saving their offsets
+ * in an array in the scan descriptor.
  */
 void
-heapgetpage(TableScanDesc sscan, BlockNumber block)
+heapbuildvis(TableScanDesc sscan)
 {
 	HeapScanDesc scan = (HeapScanDesc) sscan;
-	Buffer		buffer;
+	Buffer		buffer = scan->rs_cbuf;
+	BlockNumber block = scan->rs_cblock;
 	Snapshot	snapshot;
 	Page		page;
 	int			lines;
@@ -378,31 +379,8 @@ heapgetpage(TableScanDesc sscan, BlockNumber block)
 	OffsetNumber lineoff;
 	bool		all_visible;
 
-	Assert(block < scan->rs_nblocks);
+	Assert(BufferGetBlockNumber(buffer) == block);
 
-	/* release previous scan buffer, if any */
-	if (BufferIsValid(scan->rs_cbuf))
-	{
-		ReleaseBuffer(scan->rs_cbuf);
-		scan->rs_cbuf = InvalidBuffer;
-	}
-
-	/*
-	 * Be sure to check for interrupts at least once per page.  Checks at
-	 * higher code levels won't be able to stop a seqscan that encounters many
-	 * pages' worth of consecutive dead tuples.
-	 */
-	CHECK_FOR_INTERRUPTS();
-
-	/* read page using selected strategy */
-	scan->rs_cbuf = ReadBufferExtended(scan->rs_base.rs_rd, MAIN_FORKNUM, block,
-									   RBM_NORMAL, scan->rs_strategy);
-	scan->rs_cblock = block;
-
-	if (!(scan->rs_base.rs_flags & SO_ALLOW_PAGEMODE))
-		return;
-
-	buffer = scan->rs_cbuf;
 	snapshot = scan->rs_base.rs_snapshot;
 
 	/*
@@ -475,6 +453,37 @@ heapgetpage(TableScanDesc sscan, BlockNumber block)
 	scan->rs_ntuples = ntup;
 }
 
+/*
+ * heapfetchbuf - subroutine for heapgettup()
+ *
+ * This routine reads the specified block of the relation into a buffer and
+ * returns with that pinned buffer saved in the scan descriptor.
+ */
+static inline void
+heapfetchbuf(HeapScanDesc scan, BlockNumber block)
+{
+	Assert(block < scan->rs_nblocks);
+
+	/* release previous scan buffer, if any */
+	if (BufferIsValid(scan->rs_cbuf))
+	{
+		ReleaseBuffer(scan->rs_cbuf);
+		scan->rs_cbuf = InvalidBuffer;
+	}
+
+	/*
+	 * Be sure to check for interrupts at least once per page.  Checks at
+	 * higher code levels won't be able to stop a seqscan that encounters many
+	 * pages' worth of consecutive dead tuples.
+	 */
+	CHECK_FOR_INTERRUPTS();
+
+	/* read page using selected strategy */
+	scan->rs_cbuf = ReadBufferExtended(scan->rs_base.rs_rd, MAIN_FORKNUM, block,
+									   RBM_NORMAL, scan->rs_strategy);
+	scan->rs_cblock = block;
+}
+
 /*
  * heapgettup_initial_block - return the first BlockNumber to scan
  *
@@ -748,7 +757,7 @@ heapgettup(HeapScanDesc scan,
 	 */
 	while (block != InvalidBlockNumber)
 	{
-		heapgetpage((TableScanDesc) scan, block);
+		heapfetchbuf(scan, block);
 		LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
 		page = heapgettup_start_page(scan, dir, &linesleft, &lineoff);
 continue_page:
@@ -869,7 +878,8 @@ heapgettup_pagemode(HeapScanDesc scan,
 	 */
 	while (block != InvalidBlockNumber)
 	{
-		heapgetpage((TableScanDesc) scan, block);
+		heapfetchbuf(scan, block);
+		heapbuildvis((TableScanDesc) scan);
 		page = BufferGetPage(scan->rs_cbuf);
 		linesleft = scan->rs_ntuples;
 		lineindex = ScanDirectionIsForward(dir) ? 0 : linesleft - 1;
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 6abfe36dec7..8784473b462 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2336,11 +2336,14 @@ heapam_scan_sample_next_block(TableScanDesc scan, SampleScanState *scanstate)
 	if (hscan->rs_nblocks == 0)
 		return false;
 
-	if (tsm->NextSampleBlock)
+	if (BufferIsValid(hscan->rs_cbuf))
 	{
-		blockno = tsm->NextSampleBlock(scanstate, hscan->rs_nblocks);
-		hscan->rs_cblock = blockno;
+		ReleaseBuffer(hscan->rs_cbuf);
+		hscan->rs_cbuf = InvalidBuffer;
 	}
+
+	if (tsm->NextSampleBlock)
+		blockno = tsm->NextSampleBlock(scanstate, hscan->rs_nblocks);
 	else
 	{
 		/* scanning table sequentially */
@@ -2382,20 +2385,35 @@ heapam_scan_sample_next_block(TableScanDesc scan, SampleScanState *scanstate)
 		}
 	}
 
-	if (!BlockNumberIsValid(blockno))
+	hscan->rs_cblock = blockno;
+
+	if (!BlockNumberIsValid(hscan->rs_cblock))
 	{
-		if (BufferIsValid(hscan->rs_cbuf))
-			ReleaseBuffer(hscan->rs_cbuf);
-		hscan->rs_cbuf = InvalidBuffer;
-		hscan->rs_cblock = InvalidBlockNumber;
 		hscan->rs_inited = false;
-
 		return false;
 	}
 
-	heapgetpage(scan, blockno);
-	hscan->rs_inited = true;
+	Assert(hscan->rs_cblock < hscan->rs_nblocks);
+
+	/*
+	 * We may scan multiple pages before finding tuples to yield or finishing
+	 * the scan. Since we want to check for interrupts at least once per page,
+	 * do so here.
+	 */
+	CHECK_FOR_INTERRUPTS();
+
+	/* Read page using selected strategy */
+	hscan->rs_cbuf = ReadBufferExtended(hscan->rs_base.rs_rd, MAIN_FORKNUM,
+										hscan->rs_cblock, RBM_NORMAL, hscan->rs_strategy);
 
+	/*
+	 * If pagemode is allowed, prune the page and build an array of visible
+	 * tuple offsets.
+	 */
+	if (hscan->rs_base.rs_flags & SO_ALLOW_PAGEMODE)
+		heapbuildvis(scan);
+
+	hscan->rs_inited = true;
 	return true;
 }
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index f1122453738..bacef18c0d3 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -254,7 +254,7 @@ extern TableScanDesc heap_beginscan(Relation relation, Snapshot snapshot,
 									uint32 flags);
 extern void heap_setscanlimits(TableScanDesc sscan, BlockNumber startBlk,
 							   BlockNumber numBlks);
-extern void heapgetpage(TableScanDesc sscan, BlockNumber block);
+extern void heapbuildvis(TableScanDesc sscan);
 extern void heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
 						bool allow_strat, bool allow_sync, bool allow_pagemode);
 extern void heap_endscan(TableScanDesc sscan);
-- 
2.40.1

v6-0002-Replace-blocks-with-buffers-in-heapgettup-control.patch (text/x-patch; charset=US-ASCII)
From 7e021d9f21292a939c3c3485ac44e17e0995127f Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 29 Jan 2024 11:50:01 -0500
Subject: [PATCH v6 2/6] Replace blocks with buffers in heapgettup control flow

Future commits will introduce the streaming read API and the sequential
scan streaming read API user. Streaming read API users implement a
callback which returns the next block to read. Sequential scans
previously looped through the blocks in the relation, synchronously
reading in a block and then processing it. An InvalidBlockNumber
returned by heapgettup_advance_block() meant that the relation was
exhausted and all blocks had been processed.

The streaming read API may exhaust the blocks in a relation (having read
all of them into buffers) before they have all been processed by the
sequential scan. As such, the sequential scan should continue processing
blocks until heapfetchbuf() returns InvalidBuffer.

Note that this commit does not implement the streaming read API user. It
simply restructures heapgettup() and heapgettup_pagemode() to use
buffers instead of blocks for control flow.

Not all sequential scans will support streaming reads. As such, this
code will remain for compatibility even after sequential scans support
streaming reads.
---
 src/backend/access/heap/heapam.c | 79 ++++++++++++++------------------
 1 file changed, 35 insertions(+), 44 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 4291f161cf9..c814aea9407 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -83,6 +83,9 @@ static Bitmapset *HeapDetermineColumnsInfo(Relation relation,
 static bool heap_acquire_tuplock(Relation relation, ItemPointer tid,
 								 LockTupleMode mode, LockWaitPolicy wait_policy,
 								 bool *have_tuple_lock);
+static inline BlockNumber heapgettup_advance_block(HeapScanDesc scan,
+												   BlockNumber block, ScanDirection dir);
+static inline BlockNumber heapgettup_initial_block(HeapScanDesc scan, ScanDirection dir);
 static void compute_new_xmax_infomask(TransactionId xmax, uint16 old_infomask,
 									  uint16 old_infomask2, TransactionId add_to_xmax,
 									  LockTupleMode mode, bool is_update,
@@ -456,14 +459,12 @@ heapbuildvis(TableScanDesc sscan)
 /*
  * heapfetchbuf - subroutine for heapgettup()
  *
- * This routine reads the specified block of the relation into a buffer and
- * returns with that pinned buffer saved in the scan descriptor.
+ * This routine reads the next block of the relation into a buffer and returns
+ * with that pinned buffer saved in the scan descriptor.
  */
 static inline void
-heapfetchbuf(HeapScanDesc scan, BlockNumber block)
+heapfetchbuf(HeapScanDesc scan, ScanDirection dir)
 {
-	Assert(block < scan->rs_nblocks);
-
 	/* release previous scan buffer, if any */
 	if (BufferIsValid(scan->rs_cbuf))
 	{
@@ -478,10 +479,19 @@ heapfetchbuf(HeapScanDesc scan, BlockNumber block)
 	 */
 	CHECK_FOR_INTERRUPTS();
 
-	/* read page using selected strategy */
-	scan->rs_cbuf = ReadBufferExtended(scan->rs_base.rs_rd, MAIN_FORKNUM, block,
-									   RBM_NORMAL, scan->rs_strategy);
-	scan->rs_cblock = block;
+	if (unlikely(!scan->rs_inited))
+	{
+		scan->rs_cblock = heapgettup_initial_block(scan, dir);
+		Assert(scan->rs_cblock != InvalidBlockNumber || !BufferIsValid(scan->rs_cbuf));
+		scan->rs_inited = true;
+	}
+	else
+		scan->rs_cblock = heapgettup_advance_block(scan, scan->rs_cblock, dir);
+
+	/* read block if valid */
+	if (BlockNumberIsValid(scan->rs_cblock))
+		scan->rs_cbuf = ReadBufferExtended(scan->rs_base.rs_rd, MAIN_FORKNUM,
+										   scan->rs_cblock, RBM_NORMAL, scan->rs_strategy);
 }
 
 /*
@@ -491,7 +501,7 @@ heapfetchbuf(HeapScanDesc scan, BlockNumber block)
  * occur with empty tables and in parallel scans when parallel workers get all
  * of the pages before we can get a chance to get our first page.
  */
-static BlockNumber
+BlockNumber
 heapgettup_initial_block(HeapScanDesc scan, ScanDirection dir)
 {
 	Assert(!scan->rs_inited);
@@ -631,7 +641,7 @@ heapgettup_continue_page(HeapScanDesc scan, ScanDirection dir, int *linesleft,
  * This also adjusts rs_numblocks when a limit has been imposed by
  * heap_setscanlimits().
  */
-static inline BlockNumber
+BlockNumber
 heapgettup_advance_block(HeapScanDesc scan, BlockNumber block, ScanDirection dir)
 {
 	if (ScanDirectionIsForward(dir))
@@ -729,23 +739,13 @@ heapgettup(HeapScanDesc scan,
 		   ScanKey key)
 {
 	HeapTuple	tuple = &(scan->rs_ctup);
-	BlockNumber block;
 	Page		page;
 	OffsetNumber lineoff;
 	int			linesleft;
 
-	if (unlikely(!scan->rs_inited))
-	{
-		block = heapgettup_initial_block(scan, dir);
-		/* ensure rs_cbuf is invalid when we get InvalidBlockNumber */
-		Assert(block != InvalidBlockNumber || !BufferIsValid(scan->rs_cbuf));
-		scan->rs_inited = true;
-	}
-	else
+	if (likely(scan->rs_inited))
 	{
 		/* continue from previously returned page/tuple */
-		block = scan->rs_cblock;
-
 		LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
 		page = heapgettup_continue_page(scan, dir, &linesleft, &lineoff);
 		goto continue_page;
@@ -755,9 +755,12 @@ heapgettup(HeapScanDesc scan,
 	 * advance the scan until we find a qualifying tuple or run out of stuff
 	 * to scan
 	 */
-	while (block != InvalidBlockNumber)
+	while (true)
 	{
-		heapfetchbuf(scan, block);
+		heapfetchbuf(scan, dir);
+		if (!BufferIsValid(scan->rs_cbuf))
+			break;
+		Assert(BufferGetBlockNumber(scan->rs_cbuf) == scan->rs_cblock);
 		LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
 		page = heapgettup_start_page(scan, dir, &linesleft, &lineoff);
 continue_page:
@@ -779,7 +782,7 @@ continue_page:
 
 			tuple->t_data = (HeapTupleHeader) PageGetItem(page, lpp);
 			tuple->t_len = ItemIdGetLength(lpp);
-			ItemPointerSet(&(tuple->t_self), block, lineoff);
+			ItemPointerSet(&(tuple->t_self), scan->rs_cblock, lineoff);
 
 			visible = HeapTupleSatisfiesVisibility(tuple,
 												   scan->rs_base.rs_snapshot,
@@ -809,9 +812,6 @@ continue_page:
 		 * it's time to move to the next.
 		 */
 		LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
-
-		/* get the BlockNumber to scan next */
-		block = heapgettup_advance_block(scan, block, dir);
 	}
 
 	/* end of scan */
@@ -844,22 +844,13 @@ heapgettup_pagemode(HeapScanDesc scan,
 					ScanKey key)
 {
 	HeapTuple	tuple = &(scan->rs_ctup);
-	BlockNumber block;
 	Page		page;
 	int			lineindex;
 	int			linesleft;
 
-	if (unlikely(!scan->rs_inited))
-	{
-		block = heapgettup_initial_block(scan, dir);
-		/* ensure rs_cbuf is invalid when we get InvalidBlockNumber */
-		Assert(block != InvalidBlockNumber || !BufferIsValid(scan->rs_cbuf));
-		scan->rs_inited = true;
-	}
-	else
+	if (likely(scan->rs_inited))
 	{
 		/* continue from previously returned page/tuple */
-		block = scan->rs_cblock;	/* current page */
 		page = BufferGetPage(scan->rs_cbuf);
 
 		lineindex = scan->rs_cindex + dir;
@@ -876,9 +867,12 @@ heapgettup_pagemode(HeapScanDesc scan,
 	 * advance the scan until we find a qualifying tuple or run out of stuff
 	 * to scan
 	 */
-	while (block != InvalidBlockNumber)
+	while (true)
 	{
-		heapfetchbuf(scan, block);
+		heapfetchbuf(scan, dir);
+		if (!BufferIsValid(scan->rs_cbuf))
+			break;
+		Assert(BufferGetBlockNumber(scan->rs_cbuf) == scan->rs_cblock);
 		heapbuildvis((TableScanDesc) scan);
 		page = BufferGetPage(scan->rs_cbuf);
 		linesleft = scan->rs_ntuples;
@@ -898,7 +892,7 @@ continue_page:
 
 			tuple->t_data = (HeapTupleHeader) PageGetItem(page, lpp);
 			tuple->t_len = ItemIdGetLength(lpp);
-			ItemPointerSet(&(tuple->t_self), block, lineoff);
+			ItemPointerSet(&(tuple->t_self), scan->rs_cblock, lineoff);
 
 			/* skip any tuples that don't match the scan key */
 			if (key != NULL &&
@@ -909,9 +903,6 @@ continue_page:
 			scan->rs_cindex = lineindex;
 			return;
 		}
-
-		/* get the BlockNumber to scan next */
-		block = heapgettup_advance_block(scan, block, dir);
 	}
 
 	/* end of scan */
-- 
2.40.1

v6-0004-Provide-API-for-streaming-relation-data.patchtext/x-patch; charset=US-ASCII; name=v6-0004-Provide-API-for-streaming-relation-data.patchDownload
From 78a9008b50d5398cce23cac695da884c41d0d720 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Tue, 27 Feb 2024 00:01:42 +1300
Subject: [PATCH v6 4/6] Provide API for streaming relation data.

Introduce an abstraction where relation data can be accessed as a
stream of buffers, with an implementation that is more efficient than
the equivalent sequence of ReadBuffer() calls.

Client code supplies a callback that can say which block number is
wanted next, and then consumes individual buffers one at a time from the
stream.  This division allows read_stream.c to build up calls to
StartReadBuffers() of up to io_combine_limit blocks, and to issue fadvise()
advice ahead of time in a systematic way when random access is detected.

This API is based on an idea from Andres Freund to pave the way for
asynchronous I/O in future work as required to support direct I/O.  The
goal is to have an abstraction that insulates client code from future
changes to the I/O subsystem.

An extended API may be necessary in future for more complicated cases
(for example recovery, whose LsnReadQueue device in xlogprefetcher.c is
a distant cousin of this code that should eventually be replaced by it),
but this basic API is sufficient for many common usage patterns
involving predictable access to a single relation fork.

Author: Thomas Munro <thomas.munro@gmail.com>
Author: Heikki Linnakangas <hlinnaka@iki.fi> (contributions)
Suggested-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA+hUKGJkOiOCa+mag4BF+zHo7qo=o9CFheB8=g6uT5TUm2gkvA@mail.gmail.com
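
An illustrative consumer of this API (not part of the patch), using the
hypothetical names ExampleScanState and example_next_block, might look
roughly like this:

    /* Hypothetical callback: return the next block number to read, or
     * InvalidBlockNumber to signal end-of-stream. */
    static BlockNumber
    example_next_block(ReadStream *stream, void *callback_private_data,
                       void *per_buffer_data)
    {
        ExampleScanState *state = callback_private_data;

        if (state->next_block >= state->nblocks)
            return InvalidBlockNumber;
        return state->next_block++;
    }

    ...

    /* Start a stream over the main fork, with no per-buffer data. */
    stream = read_stream_begin_relation(READ_STREAM_FULL,
                                        NULL,       /* no strategy */
                                        BMR_REL(rel),
                                        MAIN_FORKNUM,
                                        example_next_block,
                                        state,      /* callback_private_data */
                                        0);         /* per_buffer_data_size */

    /* Consume pinned buffers until the callback ends the stream. */
    while ((buf = read_stream_next_buffer(stream, NULL)) != InvalidBuffer)
    {
        /* ... examine the page in buf ... */
        ReleaseBuffer(buf);
    }
    read_stream_end(stream);
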
---
 src/backend/storage/Makefile          |   2 +-
 src/backend/storage/aio/Makefile      |  14 +
 src/backend/storage/aio/meson.build   |   5 +
 src/backend/storage/aio/read_stream.c | 733 ++++++++++++++++++++++++++
 src/backend/storage/buffer/bufmgr.c   |  21 +-
 src/backend/storage/meson.build       |   1 +
 src/include/storage/read_stream.h     |  62 +++
 src/tools/pgindent/typedefs.list      |   2 +
 8 files changed, 829 insertions(+), 11 deletions(-)
 create mode 100644 src/backend/storage/aio/Makefile
 create mode 100644 src/backend/storage/aio/meson.build
 create mode 100644 src/backend/storage/aio/read_stream.c
 create mode 100644 src/include/storage/read_stream.h

diff --git a/src/backend/storage/Makefile b/src/backend/storage/Makefile
index 8376cdfca20..eec03f6f2b4 100644
--- a/src/backend/storage/Makefile
+++ b/src/backend/storage/Makefile
@@ -8,6 +8,6 @@ subdir = src/backend/storage
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
-SUBDIRS     = buffer file freespace ipc large_object lmgr page smgr sync
+SUBDIRS     = aio buffer file freespace ipc large_object lmgr page smgr sync
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/storage/aio/Makefile b/src/backend/storage/aio/Makefile
new file mode 100644
index 00000000000..2f29a9ec4d1
--- /dev/null
+++ b/src/backend/storage/aio/Makefile
@@ -0,0 +1,14 @@
+#
+# Makefile for storage/aio
+#
+# src/backend/storage/aio/Makefile
+#
+
+subdir = src/backend/storage/aio
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = \
+	read_stream.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/storage/aio/meson.build b/src/backend/storage/aio/meson.build
new file mode 100644
index 00000000000..10e1aa3b20b
--- /dev/null
+++ b/src/backend/storage/aio/meson.build
@@ -0,0 +1,5 @@
+# Copyright (c) 2024, PostgreSQL Global Development Group
+
+backend_sources += files(
+  'read_stream.c',
+)
diff --git a/src/backend/storage/aio/read_stream.c b/src/backend/storage/aio/read_stream.c
new file mode 100644
index 00000000000..4e293e0df65
--- /dev/null
+++ b/src/backend/storage/aio/read_stream.c
@@ -0,0 +1,733 @@
+/*-------------------------------------------------------------------------
+ *
+ * read_stream.c
+ *	  Mechanism for accessing buffered relation data with look-ahead
+ *
+ * Code that needs to access relation data typically pins blocks one at a
+ * time, often in a predictable order that might be sequential or data-driven.
+ * Calling the simple ReadBuffer() function for each block is inefficient,
+ * because blocks that are not yet in the buffer pool require I/O operations
+ * that are small and might stall waiting for storage.  This mechanism looks
+ * into the future and calls StartReadBuffers() and WaitReadBuffers() to read
+ * neighboring blocks together and ahead of time, with an adaptive look-ahead
+ * distance.
+ *
+ * A user-provided callback generates a stream of block numbers that is used
+ * to form reads of up to io_combine_limit, by attempting to merge them with a
+ * pending read.  When that isn't possible, the existing pending read is sent
+ * to StartReadBuffers() so that a new one can begin to form.
+ *
+ * The algorithm for controlling the look-ahead distance tries to classify the
+ * stream into three ideal behaviors:
+ *
+ * A) No I/O is necessary, because the requested blocks are fully cached
+ * already.  There is no benefit to looking ahead more than one block, so
+ * distance is 1.  This is the default initial assumption.
+ *
+ * B) I/O is necessary, but fadvise is undesirable because the access is
+ * sequential, or impossible because direct I/O is enabled or the system
+ * doesn't support advice.  There is no benefit in looking ahead more than
+ * io_combine_limit, because in this case the only goal is larger read system
+ * calls.  Looking further ahead would pin many buffers and perform
+ * speculative work looking ahead for no benefit.
+ *
+ * C) I/O is necessary, it appears random, and this system supports fadvise.
+ * We'll look further ahead in order to reach the configured level of I/O
+ * concurrency.
+ *
+ * The distance increases rapidly and decays slowly, so that it moves towards
+ * those levels as different I/O patterns are discovered.  For example, a
+ * sequential scan of fully cached data doesn't bother looking ahead, but a
+ * sequential scan that hits a region of uncached blocks will start issuing
+ * increasingly wide read calls until it plateaus at io_combine_limit.
+ *
+ * The main data structure is a circular queue of buffers of size
+ * max_pinned_buffers plus some extra space for technical reasons, ready to be
+ * returned by read_stream_next_buffer().  Each buffer also has an optional
+ * variable sized object that is passed from the callback to the consumer of
+ * buffers.
+ *
+ * Parallel to the queue of buffers, there is a circular queue of in-progress
+ * I/Os that have been started with StartReadBuffers(), and for which
+ * WaitReadBuffers() must be called before returning the buffer.
+ *
+ * For example, if the callback returns block numbers 10, 42, 43, 60 in
+ * successive calls, then these data structures might appear as follows:
+ *
+ *                          buffers buf/data       ios
+ *
+ *                          +----+  +-----+       +--------+
+ *                          |    |  |     |  +----+ 42..44 | <- oldest_io_index
+ *                          +----+  +-----+  |    +--------+
+ *   oldest_buffer_index -> | 10 |  |  ?  |  | +--+ 60..60 |
+ *                          +----+  +-----+  | |  +--------+
+ *                          | 42 |  |  ?  |<-+ |  |        | <- next_io_index
+ *                          +----+  +-----+    |  +--------+
+ *                          | 43 |  |  ?  |    |  |        |
+ *                          +----+  +-----+    |  +--------+
+ *                          | 44 |  |  ?  |    |  |        |
+ *                          +----+  +-----+    |  +--------+
+ *                          | 60 |  |  ?  |<---+
+ *                          +----+  +-----+
+ *     next_buffer_index -> |    |  |     |
+ *                          +----+  +-----+
+ *
+ * In the example, 5 buffers are pinned, and the next buffer to be streamed to
+ * the client is block 10.  Block 10 was a hit and has no associated I/O, but
+ * the range 42..44 requires an I/O wait before its buffers are returned, as
+ * does block 60.
+ *
+ *
+ * Portions Copyright (c) 2024, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *	  src/backend/storage/aio/read_stream.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "catalog/pg_tablespace.h"
+#include "miscadmin.h"
+#include "storage/fd.h"
+#include "storage/smgr.h"
+#include "storage/read_stream.h"
+#include "utils/memdebug.h"
+#include "utils/rel.h"
+#include "utils/spccache.h"
+
+typedef struct InProgressIO
+{
+	int16		buffer_index;
+	ReadBuffersOperation op;
+} InProgressIO;
+
+/*
+ * State for managing a stream of reads.
+ */
+struct ReadStream
+{
+	int16		max_ios;
+	int16		ios_in_progress;
+	int16		queue_size;
+	int16		max_pinned_buffers;
+	int16		pinned_buffers;
+	int16		distance;
+	bool		advice_enabled;
+
+	/*
+	 * Sometimes we need to be able to 'unget' a block number to resolve a
+	 * flow control problem when I/Os are split.
+	 */
+	BlockNumber unget_blocknum;
+	bool		have_unget_blocknum;
+
+	/*
+	 * The callback that will tell us which block numbers to read, and an
+	 * opaque pointer that will be passed to it for its own purposes.
+	 */
+	ReadStreamBlockNumberCB callback;
+	void	   *callback_private_data;
+
+	/* Next expected block, for detecting sequential access. */
+	BlockNumber seq_blocknum;
+
+	/* The read operation we are currently preparing. */
+	BlockNumber pending_read_blocknum;
+	int16		pending_read_nblocks;
+
+	/* Space for buffers and optional per-buffer private data. */
+	size_t		per_buffer_data_size;
+	void	   *per_buffer_data;
+
+	/* Read operations that have been started but not waited for yet. */
+	InProgressIO *ios;
+	int16		oldest_io_index;
+	int16		next_io_index;
+
+	/* Circular queue of buffers. */
+	int16		oldest_buffer_index;	/* Next pinned buffer to return */
+	int16		next_buffer_index;	/* Index of next buffer to pin */
+	Buffer		buffers[FLEXIBLE_ARRAY_MEMBER];
+};
+
+/*
+ * Return a pointer to the per-buffer data by index.
+ */
+static inline void *
+get_per_buffer_data(ReadStream *stream, int16 buffer_index)
+{
+	return (char *) stream->per_buffer_data +
+		stream->per_buffer_data_size * buffer_index;
+}
+
+/*
+ * Ask the callback which block it would like us to read next, with a small
+ * buffer in front to allow read_stream_unget_block() to work.
+ */
+static inline BlockNumber
+read_stream_get_block(ReadStream *stream, void *per_buffer_data)
+{
+	if (!stream->have_unget_blocknum)
+		return stream->callback(stream,
+								stream->callback_private_data,
+								per_buffer_data);
+
+	/*
+	 * You can only unget one block, and next_buffer_index can't change across
+	 * a get, unget, get sequence, so the callback's per_buffer_data, if any,
+	 * is still present in the correct slot.  We just have to return the
+	 * previous block number.
+	 */
+	stream->have_unget_blocknum = false;
+	return stream->unget_blocknum;
+}
+
+/*
+ * In order to deal with short reads in StartReadBuffers(), we sometimes need
+ * to defer handling of a block until later.
+ */
+static inline void
+read_stream_unget_block(ReadStream *stream, BlockNumber blocknum)
+{
+	Assert(!stream->have_unget_blocknum);
+	stream->have_unget_blocknum = true;
+	stream->unget_blocknum = blocknum;
+}
+
+static void
+read_stream_start_pending_read(ReadStream *stream, bool suppress_advice)
+{
+	bool		need_wait;
+	int			nblocks;
+	int			flags;
+	int16		io_index;
+	int16		overflow;
+	int16		buffer_index;
+
+	/* This should only be called with a pending read. */
+	Assert(stream->pending_read_nblocks > 0);
+	Assert(stream->pending_read_nblocks <= io_combine_limit);
+
+	/* We had better not exceed the pin limit by starting this read. */
+	Assert(stream->pinned_buffers + stream->pending_read_nblocks <=
+		   stream->max_pinned_buffers);
+
+	/* We had better not be overwriting an existing pinned buffer. */
+	if (stream->pinned_buffers > 0)
+		Assert(stream->next_buffer_index != stream->oldest_buffer_index);
+	else
+		Assert(stream->next_buffer_index == stream->oldest_buffer_index);
+
+	/*
+	 * If advice hasn't been suppressed, this system supports it, and this
+	 * isn't a strictly sequential pattern, then we'll issue advice.
+	 */
+	if (!suppress_advice &&
+		stream->advice_enabled &&
+		stream->pending_read_blocknum != stream->seq_blocknum)
+		flags = READ_BUFFERS_ISSUE_ADVICE;
+	else
+		flags = 0;
+
+	/* We say how many blocks we want to read, but it may be fewer on return. */
+	buffer_index = stream->next_buffer_index;
+	io_index = stream->next_io_index;
+	nblocks = stream->pending_read_nblocks;
+	need_wait = StartReadBuffers(&stream->ios[io_index].op,
+								 &stream->buffers[buffer_index],
+								 stream->pending_read_blocknum,
+								 &nblocks,
+								 flags);
+	stream->pinned_buffers += nblocks;
+
+	/* Remember whether we need to wait before returning this buffer. */
+	if (!need_wait)
+	{
+		/* Look-ahead distance decays, no I/O necessary (behavior A). */
+		if (stream->distance > 1)
+			stream->distance--;
+	}
+	else
+	{
+		/*
+		 * Remember to call WaitReadBuffers() before returning head buffer.
+		 * Look-ahead distance will be adjusted after waiting.
+		 */
+		stream->ios[io_index].buffer_index = buffer_index;
+		if (++stream->next_io_index == stream->max_ios)
+			stream->next_io_index = 0;
+		Assert(stream->ios_in_progress < stream->max_ios);
+		stream->ios_in_progress++;
+		stream->seq_blocknum = stream->pending_read_blocknum + nblocks;
+	}
+
+	/*
+	 * We gave a contiguous range of buffer space to StartReadBuffers(), but
+	 * we want it to wrap around at queue_size.  Slide overflowing buffers to
+	 * the front of the array.
+	 */
+	overflow = (buffer_index + nblocks) - stream->queue_size;
+	if (overflow > 0)
+		memmove(&stream->buffers[0],
+				&stream->buffers[stream->queue_size],
+				sizeof(stream->buffers[0]) * overflow);
+
+	/* Compute location of start of next read, without using % operator. */
+	buffer_index += nblocks;
+	if (buffer_index >= stream->queue_size)
+		buffer_index -= stream->queue_size;
+	Assert(buffer_index >= 0 && buffer_index < stream->queue_size);
+	stream->next_buffer_index = buffer_index;
+
+	/* Adjust the pending read to cover the remaining portion, if any. */
+	stream->pending_read_blocknum += nblocks;
+	stream->pending_read_nblocks -= nblocks;
+}
+
+static void
+read_stream_look_ahead(ReadStream *stream, bool suppress_advice)
+{
+	while (stream->ios_in_progress < stream->max_ios &&
+		   stream->pinned_buffers + stream->pending_read_nblocks < stream->distance)
+	{
+		BlockNumber blocknum;
+		int16		buffer_index;
+		void	   *per_buffer_data;
+
+		if (stream->pending_read_nblocks == io_combine_limit)
+		{
+			read_stream_start_pending_read(stream, suppress_advice);
+			suppress_advice = false;
+			continue;
+		}
+
+		/*
+		 * See which block the callback wants next in the stream.  We need to
+		 * compute the index of the Nth block of the pending read including
+		 * wrap-around, but we don't want to use the expensive % operator.
+		 */
+		buffer_index = stream->next_buffer_index + stream->pending_read_nblocks;
+		if (buffer_index >= stream->queue_size)
+			buffer_index -= stream->queue_size;
+		Assert(buffer_index >= 0 && buffer_index < stream->queue_size);
+		per_buffer_data = get_per_buffer_data(stream, buffer_index);
+		blocknum = read_stream_get_block(stream, per_buffer_data);
+		if (blocknum == InvalidBlockNumber)
+		{
+			stream->distance = 0;
+			break;
+		}
+
+		/* Can we merge it with the pending read? */
+		if (stream->pending_read_nblocks > 0 &&
+			stream->pending_read_blocknum + stream->pending_read_nblocks == blocknum)
+		{
+			stream->pending_read_nblocks++;
+			continue;
+		}
+
+		/* We have to start the pending read before we can build another. */
+		if (stream->pending_read_nblocks > 0)
+		{
+			read_stream_start_pending_read(stream, suppress_advice);
+			suppress_advice = false;
+			if (stream->ios_in_progress == stream->max_ios)
+			{
+				/* And we've hit the limit.  Rewind, and stop here. */
+				read_stream_unget_block(stream, blocknum);
+				return;
+			}
+		}
+
+		/* This is the start of a new pending read. */
+		stream->pending_read_blocknum = blocknum;
+		stream->pending_read_nblocks = 1;
+	}
+
+	/*
+	 * Normally we don't start the pending read just because we've hit a
+	 * limit, preferring to give it another chance to grow to a larger size
+	 * once more buffers have been consumed.  However, in cases where that
+	 * can't possibly happen, we might as well start the read immediately.
+	 */
+	if (stream->pending_read_nblocks > 0 &&
+		(stream->distance == stream->pending_read_nblocks ||
+		 stream->distance == 0) &&
+		stream->ios_in_progress < stream->max_ios)
+		read_stream_start_pending_read(stream, suppress_advice);
+}
+
+/*
+ * Create a new streaming read object that can be used to perform the
+ * equivalent of a series of ReadBuffer() calls for one fork of one relation.
+ * Internally, it generates larger vectored reads where possible by looking
+ * ahead.  The callback should return block numbers or InvalidBlockNumber to
+ * signal end-of-stream, and if per_buffer_data_size is non-zero, it may also
+ * write extra data for each block into the space provided to it.  It will
+ * also receive callback_private_data for its own purposes.
+ */
+ReadStream *
+read_stream_begin_relation(int flags,
+						   BufferAccessStrategy strategy,
+						   BufferManagerRelation bmr,
+						   ForkNumber forknum,
+						   ReadStreamBlockNumberCB callback,
+						   void *callback_private_data,
+						   size_t per_buffer_data_size)
+{
+	ReadStream *stream;
+	size_t		size;
+	int16		queue_size;
+	int16		max_ios;
+	uint32		max_pinned_buffers;
+	Oid			tablespace_id;
+
+	/* Make sure our bmr's smgr and relpersistence are populated. */
+	if (bmr.smgr == NULL)
+	{
+		bmr.smgr = RelationGetSmgr(bmr.rel);
+		bmr.relpersistence = bmr.rel->rd_rel->relpersistence;
+	}
+
+	/*
+	 * Decide how many I/Os we will allow to run at the same time.  That
+	 * currently means issuing advice to the kernel for blocks we will soon read.
+	 * This number also affects how far we look ahead for opportunities to
+	 * start more I/Os.
+	 */
+	tablespace_id = bmr.smgr->smgr_rlocator.locator.spcOid;
+	if (!OidIsValid(MyDatabaseId) ||
+		(bmr.rel && IsCatalogRelation(bmr.rel)) ||
+		IsCatalogRelationOid(bmr.smgr->smgr_rlocator.locator.relNumber))
+	{
+		/*
+		 * Avoid circularity while trying to look up tablespace settings or
+		 * before spccache.c is ready.
+		 */
+		max_ios = effective_io_concurrency;
+	}
+	else if (flags & READ_STREAM_MAINTENANCE)
+		max_ios = get_tablespace_maintenance_io_concurrency(tablespace_id);
+	else
+		max_ios = get_tablespace_io_concurrency(tablespace_id);
+	max_ios = Min(max_ios, PG_INT16_MAX);
+
+	/*
+	 * Choose the maximum number of buffers we're prepared to pin.  We try to
+	 * pin fewer if we can, though.  We clamp it to at least io_combine_limit
+	 * so that we can have a chance to build up a full io_combine_limit sized
+	 * read, even when max_ios is zero.  Be careful not to allow int16 to
+	 * overflow (even though that's not possible with the current GUC range
+	 * limits), allowing also for the spare entry and the overflow space.
+	 */
+	max_pinned_buffers = Max(max_ios * 4, io_combine_limit);
+	max_pinned_buffers = Min(max_pinned_buffers,
+							 PG_INT16_MAX - io_combine_limit - 1);
+
+	/* Don't allow this backend to pin more than its share of buffers. */
+	if (SmgrIsTemp(bmr.smgr))
+		LimitAdditionalLocalPins(&max_pinned_buffers);
+	else
+		LimitAdditionalPins(&max_pinned_buffers);
+	Assert(max_pinned_buffers > 0);
+
+	/*
+	 * We need one extra entry for buffers and per-buffer data, because users
+	 * of per-buffer data have access to the object until the next call to
+	 * read_stream_next_buffer(), so we need a gap between the head and tail
+	 * of the queue so that we don't clobber it.
+	 */
+	queue_size = max_pinned_buffers + 1;
+
+	/*
+	 * Allocate the object, the buffers, the ios and per_buffer_data space in
+	 * one big chunk.  Though we have queue_size buffers, we want to be able
+	 * to assume that all the buffers for a single read are contiguous (i.e.
+	 * don't wrap around halfway through), so we allow temporary overflows of
+	 * up to the maximum possible read size by allocating an extra
+	 * io_combine_limit - 1 elements.
+	 */
+	size = offsetof(ReadStream, buffers);
+	size += sizeof(Buffer) * (queue_size + io_combine_limit - 1);
+	size += sizeof(InProgressIO) * Max(1, max_ios);
+	size += per_buffer_data_size * queue_size;
+	size += MAXIMUM_ALIGNOF * 2;
+	stream = (ReadStream *) palloc(size);
+	memset(stream, 0, offsetof(ReadStream, buffers));
+	stream->ios = (InProgressIO *)
+		MAXALIGN(&stream->buffers[queue_size + io_combine_limit - 1]);
+	if (per_buffer_data_size > 0)
+		stream->per_buffer_data = (void *)
+			MAXALIGN(&stream->ios[Max(1, max_ios)]);
+
+#ifdef USE_PREFETCH
+
+	/*
+	 * This system supports prefetching advice.  We can use it as long as
+	 * direct I/O isn't enabled, the caller hasn't promised sequential access
+	 * (overriding our detection heuristics), and max_ios hasn't been set to
+	 * zero.
+	 */
+	if ((io_direct_flags & IO_DIRECT_DATA) == 0 &&
+		(flags & READ_STREAM_SEQUENTIAL) == 0 &&
+		max_ios > 0)
+		stream->advice_enabled = true;
+#endif
+
+	/*
+	 * For now, max_ios = 0 is interpreted as max_ios = 1 with advice disabled
+	 * above.  If we had real asynchronous I/O we might need a slightly
+	 * different definition.
+	 */
+	if (max_ios == 0)
+		max_ios = 1;
+
+	stream->max_ios = max_ios;
+	stream->per_buffer_data_size = per_buffer_data_size;
+	stream->max_pinned_buffers = max_pinned_buffers;
+	stream->queue_size = queue_size;
+
+	if (!bmr.smgr)
+	{
+		bmr.smgr = RelationGetSmgr(bmr.rel);
+		bmr.relpersistence = bmr.rel->rd_rel->relpersistence;
+	}
+	stream->callback = callback;
+	stream->callback_private_data = callback_private_data;
+
+	/*
+	 * Skip the initial ramp-up phase if the caller says we're going to be
+	 * reading the whole relation.  This way we start out assuming we'll be
+	 * doing full io_combine_limit sized reads (behavior B).
+	 */
+	if (flags & READ_STREAM_FULL)
+		stream->distance = Min(max_pinned_buffers, io_combine_limit);
+	else
+		stream->distance = 1;
+
+	/*
+	 * Since we currently always access the same relation, we can
+	 * initialize parts of the ReadBuffersOperation objects and leave them
+	 * that way, to avoid wasting CPU cycles writing to them for each read.
+	 */
+	for (int i = 0; i < max_ios; ++i)
+	{
+		stream->ios[i].op.bmr = bmr;
+		stream->ios[i].op.forknum = forknum;
+		stream->ios[i].op.strategy = strategy;
+	}
+
+	return stream;
+}
+
+/*
+ * Pull one pinned buffer out of a stream created with
+ * read_stream_begin_relation().  Each call returns successive blocks in the
+ * order specified by the callback.  If per_buffer_data_size was set to a
+ * non-zero size, *per_buffer_data receives a pointer to the extra per-buffer
+ * data that the callback had a chance to populate, which remains valid until
+ * the next call to read_stream_next_buffer().  When the stream runs out of
+ * data, InvalidBuffer is returned.  The caller may decide to end the stream
+ * early at any time by calling read_stream_end().
+ */
+Buffer
+read_stream_next_buffer(ReadStream *stream, void **per_buffer_data)
+{
+	Buffer		buffer;
+	int16		oldest_buffer_index;
+
+	/*
+	 * A fast path for all-cached scans (behavior A).  This is the same as the
+	 * usual algorithm, but it is specialized for no I/O and no per-buffer
+	 * data, so we can skip the queue management code, stay in the same buffer
+	 * slot and use singular StartReadBuffer().
+	 */
+	if (likely(per_buffer_data == NULL &&
+			   stream->ios_in_progress == 0 &&
+			   stream->pinned_buffers == 1 &&
+			   stream->distance == 1))
+	{
+		BlockNumber next_blocknum;
+
+		/*
+		 * We have a pinned buffer that we need to serve up, but we also want
+		 * to probe the next one before we return, just in case we need to
+		 * start an I/O.  We can re-use the same buffer slot, and an arbitrary
+		 * I/O slot since they're all free.
+		 */
+		oldest_buffer_index = stream->oldest_buffer_index;
+		Assert((oldest_buffer_index + 1) % stream->queue_size ==
+			   stream->next_buffer_index);
+		buffer = stream->buffers[oldest_buffer_index];
+		Assert(buffer != InvalidBuffer);
+		Assert(stream->pending_read_nblocks <= 1);
+		if (unlikely(stream->pending_read_nblocks == 1))
+		{
+			next_blocknum = stream->pending_read_blocknum;
+			stream->pending_read_nblocks = 0;
+		}
+		else
+			next_blocknum = read_stream_get_block(stream, NULL);
+		if (unlikely(next_blocknum == InvalidBlockNumber))
+		{
+			/* End of stream. */
+			stream->distance = 0;
+			stream->next_buffer_index = oldest_buffer_index;
+			/* Pin transferred to caller. */
+			stream->pinned_buffers = 0;
+			return buffer;
+		}
+		/* Call the special single block version, which is marginally faster. */
+		if (unlikely(StartReadBuffer(&stream->ios[0].op,
+									 &stream->buffers[oldest_buffer_index],
+									 next_blocknum,
+									 stream->advice_enabled ?
+									 READ_BUFFERS_ISSUE_ADVICE : 0)))
+		{
+			/* I/O needed.  We'll take the general path next time. */
+			stream->oldest_io_index = 0;
+			stream->next_io_index = stream->max_ios > 1 ? 1 : 0;
+			stream->ios_in_progress = 1;
+			stream->ios[0].buffer_index = oldest_buffer_index;
+			stream->seq_blocknum = next_blocknum + 1;
+			/* Increase look ahead distance (move towards behavior B/C). */
+			stream->distance = Min(2, stream->max_pinned_buffers);
+		}
+		/* Pin transferred to caller, got another one, no net change. */
+		Assert(stream->pinned_buffers == 1);
+		return buffer;
+	}
+
+	if (stream->pinned_buffers == 0)
+	{
+		Assert(stream->oldest_buffer_index == stream->next_buffer_index);
+
+		/* End of stream reached?  */
+		if (stream->distance == 0)
+			return InvalidBuffer;
+
+		/*
+		 * The usual order of operations is that we look ahead at the bottom
+		 * of this function after potentially finishing an I/O and making
+		 * space for more, but if we're just starting up we'll need to crank
+		 * the handle to get started.
+		 */
+		read_stream_look_ahead(stream, true);
+
+		/* End of stream reached? */
+		if (stream->pinned_buffers == 0)
+		{
+			Assert(stream->distance == 0);
+			return InvalidBuffer;
+		}
+	}
+
+	/* Grab the oldest pinned buffer and associated per-buffer data. */
+	Assert(stream->pinned_buffers > 0);
+	oldest_buffer_index = stream->oldest_buffer_index;
+	Assert(oldest_buffer_index >= 0 &&
+		   oldest_buffer_index < stream->queue_size);
+	buffer = stream->buffers[oldest_buffer_index];
+	if (per_buffer_data)
+		*per_buffer_data = get_per_buffer_data(stream, oldest_buffer_index);
+
+	Assert(BufferIsValid(buffer));
+
+	/* Do we have to wait for an associated I/O first? */
+	if (stream->ios_in_progress > 0 &&
+		stream->ios[stream->oldest_io_index].buffer_index == oldest_buffer_index)
+	{
+		int16		io_index = stream->oldest_io_index;
+		int16		distance;
+
+		/* Sanity check that we still agree on the buffers. */
+		Assert(stream->ios[io_index].op.buffers ==
+			   &stream->buffers[oldest_buffer_index]);
+
+		WaitReadBuffers(&stream->ios[io_index].op);
+
+		Assert(stream->ios_in_progress > 0);
+		stream->ios_in_progress--;
+		if (++stream->oldest_io_index == stream->max_ios)
+			stream->oldest_io_index = 0;
+
+		if (stream->ios[io_index].op.flags & READ_BUFFERS_ISSUE_ADVICE)
+		{
+			/* Distance ramps up fast (behavior C). */
+			distance = stream->distance * 2;
+			distance = Min(distance, stream->max_pinned_buffers);
+			stream->distance = distance;
+		}
+		else
+		{
+			/* No advice; move towards io_combine_limit (behavior B). */
+			if (stream->distance > io_combine_limit)
+			{
+				stream->distance--;
+			}
+			else
+			{
+				distance = stream->distance * 2;
+				distance = Min(distance, io_combine_limit);
+				distance = Min(distance, stream->max_pinned_buffers);
+				stream->distance = distance;
+			}
+		}
+	}
+
+#ifdef CLOBBER_FREED_MEMORY
+	/* Clobber old buffer and per-buffer data for debugging purposes. */
+	stream->buffers[oldest_buffer_index] = InvalidBuffer;
+
+	/*
+	 * The caller will get access to the per-buffer data, until the next call.
+	 * We wipe the one before, which is never occupied because queue_size
+	 * allowed one extra element.  This will hopefully trip up client code
+	 * that is holding a dangling pointer to it.
+	 */
+	if (stream->per_buffer_data)
+		wipe_mem(get_per_buffer_data(stream,
+									 oldest_buffer_index == 0 ?
+									 stream->queue_size - 1 :
+									 oldest_buffer_index - 1),
+				 stream->per_buffer_data_size);
+#endif
+
+	/* Pin transferred to caller. */
+	Assert(stream->pinned_buffers > 0);
+	stream->pinned_buffers--;
+
+	/* Advance oldest buffer, with wrap-around. */
+	stream->oldest_buffer_index++;
+	if (stream->oldest_buffer_index == stream->queue_size)
+		stream->oldest_buffer_index = 0;
+
+	/* Prepare for the next call. */
+	read_stream_look_ahead(stream, false);
+
+	return buffer;
+}
+
+/*
+ * Release stream resources.
+ */
+void
+read_stream_end(ReadStream *stream)
+{
+	Buffer		buffer;
+
+	/* Stop looking ahead. */
+	stream->distance = 0;
+
+	/* Unpin anything that wasn't consumed. */
+	while ((buffer = read_stream_next_buffer(stream, NULL)) != InvalidBuffer)
+		ReleaseBuffer(buffer);
+
+	Assert(stream->pinned_buffers == 0);
+	Assert(stream->ios_in_progress == 0);
+
+	/* Release memory. */
+	pfree(stream);
+}
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 8df5f3b43da..577bcf6e5dd 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -1210,13 +1210,14 @@ StartReadBuffer(ReadBuffersOperation *operation,
  * Begin reading a range of blocks beginning at blockNum and extending for
  * *nblocks.  On return, up to *nblocks pinned buffers holding those blocks
  * are written into the buffers array, and *nblocks is updated to contain the
- * actual number, which may be fewer than requested.
+ * actual number, which may be fewer than requested.  Caller sets some of the
+ * members of operation; see struct definition.
  *
- * If false is returned, no I/O is necessary and WaitReadBuffers() is not
- * necessary.  If true is returned, one I/O has been started, and
- * WaitReadBuffers() must be called with the same operation object before the
- * buffers are accessed.  Along with the operation object, the caller-supplied
- * array of buffers must remain valid until WaitReadBuffers() is called.
+ * If false is returned, no I/O is necessary.  If true is returned, one I/O
+ * has been started, and WaitReadBuffers() must be called with the same
+ * operation object before the buffers are accessed.  Along with the operation
+ * object, the caller-supplied array of buffers must remain valid until
+ * WaitReadBuffers() is called.
  *
  * Currently the I/O is only started with optional operating system advice,
  * and the real I/O happens in WaitReadBuffers().  In future work, true I/O
@@ -2452,7 +2453,7 @@ MarkBufferDirty(Buffer buffer)
 	uint32		old_buf_state;
 
 	if (!BufferIsValid(buffer))
-		elog(ERROR, "bad buffer ID: %d", buffer);
+		elog(PANIC, "bad buffer ID: %d", buffer);
 
 	if (BufferIsLocal(buffer))
 	{
@@ -4824,7 +4825,7 @@ void
 ReleaseBuffer(Buffer buffer)
 {
 	if (!BufferIsValid(buffer))
-		elog(ERROR, "bad buffer ID: %d", buffer);
+		elog(PANIC, "bad buffer ID: %d", buffer);
 
 	if (BufferIsLocal(buffer))
 		UnpinLocalBuffer(buffer);
@@ -4891,7 +4892,7 @@ MarkBufferDirtyHint(Buffer buffer, bool buffer_std)
 	Page		page = BufferGetPage(buffer);
 
 	if (!BufferIsValid(buffer))
-		elog(ERROR, "bad buffer ID: %d", buffer);
+		elog(PANIC, "bad buffer ID: %d", buffer);
 
 	if (BufferIsLocal(buffer))
 	{
@@ -5963,7 +5964,7 @@ ResOwnerReleaseBufferPin(Datum res)
 
 	/* Like ReleaseBuffer, but don't call ResourceOwnerForgetBuffer */
 	if (!BufferIsValid(buffer))
-		elog(ERROR, "bad buffer ID: %d", buffer);
+		elog(PANIC, "bad buffer ID: %d", buffer);
 
 	if (BufferIsLocal(buffer))
 		UnpinLocalBufferNoOwner(buffer);
diff --git a/src/backend/storage/meson.build b/src/backend/storage/meson.build
index 40345bdca27..739d13293fb 100644
--- a/src/backend/storage/meson.build
+++ b/src/backend/storage/meson.build
@@ -1,5 +1,6 @@
 # Copyright (c) 2022-2024, PostgreSQL Global Development Group
 
+subdir('aio')
 subdir('buffer')
 subdir('file')
 subdir('freespace')
diff --git a/src/include/storage/read_stream.h b/src/include/storage/read_stream.h
new file mode 100644
index 00000000000..9e5fa2acf15
--- /dev/null
+++ b/src/include/storage/read_stream.h
@@ -0,0 +1,62 @@
+/*-------------------------------------------------------------------------
+ *
+ * read_stream.h
+ *	  Mechanism for accessing buffered relation data with look-ahead
+ *
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/storage/read_stream.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef READ_STREAM_H
+#define READ_STREAM_H
+
+#include "storage/bufmgr.h"
+
+/* Default tuning, reasonable for many users. */
+#define READ_STREAM_DEFAULT 0x00
+
+/*
+ * I/O streams that are performing maintenance work on behalf of potentially
+ * many users, and thus should be governed by maintenance_io_concurrency
+ * instead of effective_io_concurrency.  For example, VACUUM or CREATE INDEX.
+ */
+#define READ_STREAM_MAINTENANCE 0x01
+
+/*
+ * We usually avoid issuing prefetch advice automatically when sequential
+ * access is detected, but this flag explicitly disables it, for cases that
+ * might not be correctly detected.  Explicit advice is known to perform worse
+ * than letting the kernel (at least Linux) detect sequential access.
+ */
+#define READ_STREAM_SEQUENTIAL 0x02
+
+/*
+ * We usually ramp up from smaller reads to larger ones, to support users who
+ * don't know if it's worth reading lots of buffers yet.  This flag disables
+ * that, declaring ahead of time that we'll be reading all available buffers.
+ */
+#define READ_STREAM_FULL 0x04
+
+struct ReadStream;
+typedef struct ReadStream ReadStream;
+
+/* Callback that returns the next block number to read. */
+typedef BlockNumber (*ReadStreamBlockNumberCB) (ReadStream *stream,
+												void *callback_private_data,
+												void *per_buffer_data);
+
+extern ReadStream *read_stream_begin_relation(int flags,
+											  BufferAccessStrategy strategy,
+											  BufferManagerRelation bmr,
+											  ForkNumber forknum,
+											  ReadStreamBlockNumberCB callback,
+											  void *callback_private_data,
+											  size_t per_buffer_data_size);
+extern Buffer read_stream_next_buffer(ReadStream *stream, void **per_buffer_private);
+extern void read_stream_end(ReadStream *stream);
+
+#endif							/* READ_STREAM_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 97edd1388e9..82fa6c12970 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1214,6 +1214,7 @@ InjectionPointCacheEntry
 InjectionPointEntry
 InjectionPointSharedState
 InlineCodeBlock
+InProgressIO
 InsertStmt
 Instrumentation
 Int128AggState
@@ -2292,6 +2293,7 @@ ReadExtraTocPtrType
 ReadFunc
 ReadLocalXLogPageNoWaitPrivate
 ReadReplicationSlotCmd
+ReadStream
 ReassignOwnedStmt
 RecheckForeignScan_function
 RecordCacheArrayEntry
-- 
2.40.1

v6-0003-Provide-vectored-variant-of-ReadBuffer.patchtext/x-patch; charset=US-ASCII; name=v6-0003-Provide-vectored-variant-of-ReadBuffer.patchDownload
From 8b877bf126ebd9f1c5a896946b01120d7ece7ca5 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Mon, 26 Feb 2024 23:48:31 +1300
Subject: [PATCH v6 3/6] Provide vectored variant of ReadBuffer().

Break ReadBuffer() up into two steps: StartReadBuffers() and
WaitReadBuffers().  This has two advantages:

1.  Multiple consecutive blocks can be read with one system call.
2.  Advice (hints of future reads) can optionally be issued to the kernel.

The traditional ReadBuffer() function is now implemented in terms of
those functions, to avoid duplication.  For now we still only read a
block at a time so there is no change to generated system calls yet, but
later commits will provide infrastructure to help build up larger calls.

Callers should respect the new GUC io_combine_limit and the limit on
per-backend pins, which is now exposed as a public interface.

With some more infrastructure in later work, StartReadBuffers() could
be extended to start real asynchronous I/O instead of advice.

Author: Thomas Munro <thomas.munro@gmail.com>
Author: Andres Freund <andres@anarazel.de> (optimization tweaks)
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA+hUKGJkOiOCa+mag4BF+zHo7qo=o9CFheB8=g6uT5TUm2gkvA@mail.gmail.com
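
For illustration only (not part of the patch), a caller of the two-step
interface might look roughly like the sketch below, where rel, blkno and
nblocks_wanted are hypothetical caller-supplied values:

    ReadBuffersOperation op;
    Buffer      buffers[MAX_IO_COMBINE_LIMIT];
    int         nblocks = Min(nblocks_wanted, io_combine_limit);

    /* The caller fills in the relation, fork and strategy members. */
    op.bmr = BMR_REL(rel);
    op.forknum = MAIN_FORKNUM;
    op.strategy = NULL;

    /* Pin up to nblocks consecutive buffers starting at blkno; on return
     * nblocks holds the number actually pinned. */
    if (StartReadBuffers(&op, buffers, blkno, &nblocks, 0))
        WaitReadBuffers(&op);       /* complete the physical read */

    /* buffers[0 .. nblocks - 1] are now valid and pinned. */
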
---
 doc/src/sgml/config.sgml                      |  14 +
 src/backend/storage/buffer/bufmgr.c           | 700 ++++++++++++------
 src/backend/storage/buffer/localbuf.c         |  14 +-
 src/backend/utils/misc/guc_tables.c           |  14 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/storage/bufmgr.h                  |  41 +-
 src/tools/pgindent/typedefs.list              |   1 +
 7 files changed, 563 insertions(+), 222 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 5468637e2ef..f3736000ad2 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2719,6 +2719,20 @@ include_dir 'conf.d'
        </listitem>
       </varlistentry>
 
+      <varlistentry id="guc-io-combine-limit" xreflabel="io_combine_limit">
+       <term><varname>io_combine_limit</varname> (<type>integer</type>)
+       <indexterm>
+        <primary><varname>io_combine_limit</varname> configuration parameter</primary>
+       </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Controls the largest I/O size in operations that combine I/O.
+         The default is 128kB.
+        </para>
+       </listitem>
+      </varlistentry>
+
       <varlistentry id="guc-max-worker-processes" xreflabel="max_worker_processes">
        <term><varname>max_worker_processes</varname> (<type>integer</type>)
        <indexterm>
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index f0f8d4259c5..8df5f3b43da 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -19,6 +19,11 @@
  *		and pin it so that no one can destroy it while this process
  *		is using it.
  *
+ * StartReadBuffers() -- as above, but for multiple contiguous blocks in
+ *		two steps.
+ *
+ * WaitReadBuffers() -- second step of StartReadBuffers().
+ *
  * ReleaseBuffer() -- unpin a buffer
  *
  * MarkBufferDirty() -- mark a pinned buffer's contents as "dirty".
@@ -160,6 +165,9 @@ int			checkpoint_flush_after = DEFAULT_CHECKPOINT_FLUSH_AFTER;
 int			bgwriter_flush_after = DEFAULT_BGWRITER_FLUSH_AFTER;
 int			backend_flush_after = DEFAULT_BACKEND_FLUSH_AFTER;
 
+/* Limit on how many blocks should be handled in single I/O operations. */
+int			io_combine_limit = DEFAULT_IO_COMBINE_LIMIT;
+
 /* local state for LockBufferForCleanup */
 static BufferDesc *PinCountWaitBuf = NULL;
 
@@ -471,10 +479,9 @@ ForgetPrivateRefCountEntry(PrivateRefCountEntry *ref)
 )
 
 
-static Buffer ReadBuffer_common(SMgrRelation smgr, char relpersistence,
+static Buffer ReadBuffer_common(BufferManagerRelation bmr,
 								ForkNumber forkNum, BlockNumber blockNum,
-								ReadBufferMode mode, BufferAccessStrategy strategy,
-								bool *hit);
+								ReadBufferMode mode, BufferAccessStrategy strategy);
 static BlockNumber ExtendBufferedRelCommon(BufferManagerRelation bmr,
 										   ForkNumber fork,
 										   BufferAccessStrategy strategy,
@@ -500,7 +507,7 @@ static uint32 WaitBufHdrUnlocked(BufferDesc *buf);
 static int	SyncOneBuffer(int buf_id, bool skip_recently_used,
 						  WritebackContext *wb_context);
 static void WaitIO(BufferDesc *buf);
-static bool StartBufferIO(BufferDesc *buf, bool forInput);
+static bool StartBufferIO(BufferDesc *buf, bool forInput, bool nowait);
 static void TerminateBufferIO(BufferDesc *buf, bool clear_dirty,
 							  uint32 set_flag_bits, bool forget_owner);
 static void AbortBufferIO(Buffer buffer);
@@ -781,7 +788,6 @@ Buffer
 ReadBufferExtended(Relation reln, ForkNumber forkNum, BlockNumber blockNum,
 				   ReadBufferMode mode, BufferAccessStrategy strategy)
 {
-	bool		hit;
 	Buffer		buf;
 
 	/*
@@ -794,15 +800,9 @@ ReadBufferExtended(Relation reln, ForkNumber forkNum, BlockNumber blockNum,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("cannot access temporary tables of other sessions")));
 
-	/*
-	 * Read the buffer, and update pgstat counters to reflect a cache hit or
-	 * miss.
-	 */
-	pgstat_count_buffer_read(reln);
-	buf = ReadBuffer_common(RelationGetSmgr(reln), reln->rd_rel->relpersistence,
-							forkNum, blockNum, mode, strategy, &hit);
-	if (hit)
-		pgstat_count_buffer_hit(reln);
+	buf = ReadBuffer_common(BMR_REL(reln),
+							forkNum, blockNum, mode, strategy);
+
 	return buf;
 }
 
@@ -822,13 +822,12 @@ ReadBufferWithoutRelcache(RelFileLocator rlocator, ForkNumber forkNum,
 						  BlockNumber blockNum, ReadBufferMode mode,
 						  BufferAccessStrategy strategy, bool permanent)
 {
-	bool		hit;
-
 	SMgrRelation smgr = smgropen(rlocator, INVALID_PROC_NUMBER);
 
-	return ReadBuffer_common(smgr, permanent ? RELPERSISTENCE_PERMANENT :
-							 RELPERSISTENCE_UNLOGGED, forkNum, blockNum,
-							 mode, strategy, &hit);
+	return ReadBuffer_common(BMR_SMGR(smgr, permanent ? RELPERSISTENCE_PERMANENT :
+									  RELPERSISTENCE_UNLOGGED),
+							 forkNum, blockNum,
+							 mode, strategy);
 }
 
 /*
@@ -994,35 +993,146 @@ ExtendBufferedRelTo(BufferManagerRelation bmr,
 	 */
 	if (buffer == InvalidBuffer)
 	{
-		bool		hit;
-
 		Assert(extended_by == 0);
-		buffer = ReadBuffer_common(bmr.smgr, bmr.relpersistence,
-								   fork, extend_to - 1, mode, strategy,
-								   &hit);
+		buffer = ReadBuffer_common(bmr, fork, extend_to - 1, mode, strategy);
 	}
 
 	return buffer;
 }
 
 /*
- * ReadBuffer_common -- common logic for all ReadBuffer variants
- *
- * *hit is set to true if the request was satisfied from shared buffer cache.
+ * Zero a buffer and lock it, as part of the implementation of
+ * RBM_ZERO_AND_LOCK or RBM_ZERO_AND_CLEANUP_LOCK.  The buffer must be already
+ * pinned.  It does not have to be valid, but it is valid and locked on
+ * return.
  */
-static Buffer
-ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
-				  BlockNumber blockNum, ReadBufferMode mode,
-				  BufferAccessStrategy strategy, bool *hit)
+static void
+ZeroBuffer(Buffer buffer, ReadBufferMode mode)
 {
 	BufferDesc *bufHdr;
-	Block		bufBlock;
-	bool		found;
+	uint32		buf_state;
+
+	Assert(mode == RBM_ZERO_AND_LOCK || mode == RBM_ZERO_AND_CLEANUP_LOCK);
+
+	if (BufferIsLocal(buffer))
+		bufHdr = GetLocalBufferDescriptor(-buffer - 1);
+	else
+	{
+		bufHdr = GetBufferDescriptor(buffer - 1);
+		if (mode == RBM_ZERO_AND_LOCK)
+			LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
+		else
+			LockBufferForCleanup(buffer);
+	}
+
+	memset(BufferGetPage(buffer), 0, BLCKSZ);
+
+	if (BufferIsLocal(buffer))
+	{
+		buf_state = pg_atomic_read_u32(&bufHdr->state);
+		buf_state |= BM_VALID;
+		pg_atomic_unlocked_write_u32(&bufHdr->state, buf_state);
+	}
+	else
+	{
+		buf_state = LockBufHdr(bufHdr);
+		buf_state |= BM_VALID;
+		UnlockBufHdr(bufHdr, buf_state);
+	}
+}
+
+/*
+ * Pin a buffer for a given block.  *foundPtr is set to true if the block was
+ * already present, or false if more work is required to either read it in or
+ * zero it.
+ */
+static inline Buffer
+PinBufferForBlock(BufferManagerRelation bmr,
+				  ForkNumber forkNum,
+				  BlockNumber blockNum,
+				  BufferAccessStrategy strategy,
+				  bool *foundPtr)
+{
+	BufferDesc *bufHdr;
+	bool		isLocalBuf;
 	IOContext	io_context;
 	IOObject	io_object;
-	bool		isLocalBuf = SmgrIsTemp(smgr);
 
-	*hit = false;
+	Assert(blockNum != P_NEW);
+
+	Assert(bmr.smgr);
+
+	isLocalBuf = bmr.relpersistence == RELPERSISTENCE_TEMP;
+	if (isLocalBuf)
+	{
+		io_context = IOCONTEXT_NORMAL;
+		io_object = IOOBJECT_TEMP_RELATION;
+	}
+	else
+	{
+		io_context = IOContextForStrategy(strategy);
+		io_object = IOOBJECT_RELATION;
+	}
+
+	TRACE_POSTGRESQL_BUFFER_READ_START(forkNum, blockNum,
+									   bmr.smgr->smgr_rlocator.locator.spcOid,
+									   bmr.smgr->smgr_rlocator.locator.dbOid,
+									   bmr.smgr->smgr_rlocator.locator.relNumber,
+									   bmr.smgr->smgr_rlocator.backend);
+
+	if (isLocalBuf)
+	{
+		bufHdr = LocalBufferAlloc(bmr.smgr, forkNum, blockNum, foundPtr);
+		if (*foundPtr)
+			pgBufferUsage.local_blks_hit++;
+	}
+	else
+	{
+		bufHdr = BufferAlloc(bmr.smgr, bmr.relpersistence, forkNum, blockNum,
+							 strategy, foundPtr, io_context);
+		if (*foundPtr)
+			pgBufferUsage.shared_blks_hit++;
+	}
+	if (bmr.rel)
+	{
+		/*
+		 * While pgBufferUsage's "read" counter isn't bumped unless we reach
+		 * WaitReadBuffers() (so, not for hits, and not for buffers that are
+		 * zeroed instead), the per-relation stats always count them.
+		 */
+		pgstat_count_buffer_read(bmr.rel);
+		if (*foundPtr)
+			pgstat_count_buffer_hit(bmr.rel);
+	}
+	if (*foundPtr)
+	{
+		VacuumPageHit++;
+		pgstat_count_io_op(io_object, io_context, IOOP_HIT);
+		if (VacuumCostActive)
+			VacuumCostBalance += VacuumCostPageHit;
+
+		TRACE_POSTGRESQL_BUFFER_READ_DONE(forkNum, blockNum,
+										  bmr.smgr->smgr_rlocator.locator.spcOid,
+										  bmr.smgr->smgr_rlocator.locator.dbOid,
+										  bmr.smgr->smgr_rlocator.locator.relNumber,
+										  bmr.smgr->smgr_rlocator.backend,
+										  true);
+	}
+
+	return BufferDescriptorGetBuffer(bufHdr);
+}
+
+/*
+ * ReadBuffer_common -- common logic for all ReadBuffer variants
+ */
+static inline Buffer
+ReadBuffer_common(BufferManagerRelation bmr, ForkNumber forkNum,
+				  BlockNumber blockNum, ReadBufferMode mode,
+				  BufferAccessStrategy strategy)
+{
+	ReadBuffersOperation operation;
+	Buffer		buffer;
+	int			flags;
 
 	/*
 	 * Backward compatibility path, most code should use ExtendBufferedRel()
@@ -1041,181 +1151,358 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 		if (mode == RBM_ZERO_AND_LOCK || mode == RBM_ZERO_AND_CLEANUP_LOCK)
 			flags |= EB_LOCK_FIRST;
 
-		return ExtendBufferedRel(BMR_SMGR(smgr, relpersistence),
-								 forkNum, strategy, flags);
+		return ExtendBufferedRel(bmr, forkNum, strategy, flags);
 	}
 
-	TRACE_POSTGRESQL_BUFFER_READ_START(forkNum, blockNum,
-									   smgr->smgr_rlocator.locator.spcOid,
-									   smgr->smgr_rlocator.locator.dbOid,
-									   smgr->smgr_rlocator.locator.relNumber,
-									   smgr->smgr_rlocator.backend);
-
-	if (isLocalBuf)
+	if (unlikely(mode == RBM_ZERO_AND_CLEANUP_LOCK ||
+				 mode == RBM_ZERO_AND_LOCK))
 	{
-		/*
-		 * We do not use a BufferAccessStrategy for I/O of temporary tables.
-		 * However, in some cases, the "strategy" may not be NULL, so we can't
-		 * rely on IOContextForStrategy() to set the right IOContext for us.
-		 * This may happen in cases like CREATE TEMPORARY TABLE AS...
-		 */
-		io_context = IOCONTEXT_NORMAL;
-		io_object = IOOBJECT_TEMP_RELATION;
-		bufHdr = LocalBufferAlloc(smgr, forkNum, blockNum, &found);
-		if (found)
-			pgBufferUsage.local_blks_hit++;
-		else if (mode == RBM_NORMAL || mode == RBM_NORMAL_NO_LOG ||
-				 mode == RBM_ZERO_ON_ERROR)
-			pgBufferUsage.local_blks_read++;
+		bool		found;
+
+		if (bmr.smgr == NULL)
+		{
+			bmr.smgr = RelationGetSmgr(bmr.rel);
+			bmr.relpersistence = bmr.rel->rd_rel->relpersistence;
+		}
+
+		buffer = PinBufferForBlock(bmr, forkNum, blockNum, strategy, &found);
+		ZeroBuffer(buffer, mode);
+		return buffer;
 	}
+
+	if (mode == RBM_ZERO_ON_ERROR)
+		flags = READ_BUFFERS_ZERO_ON_ERROR;
 	else
+		flags = 0;
+	operation.bmr = bmr;
+	operation.forknum = forkNum;
+	operation.strategy = strategy;
+	if (StartReadBuffer(&operation,
+						&buffer,
+						blockNum,
+						flags))
+		WaitReadBuffers(&operation);
+
+	return buffer;
+}
+
+/*
+ * Single block version of StartReadBuffers().  This might save a few
+ * instructions when called from another translation unit, if the compiler
+ * inlines the code and specializes for nblocks == 1.
+ */
+bool
+StartReadBuffer(ReadBuffersOperation *operation,
+				Buffer *buffer,
+				BlockNumber blocknum,
+				int flags)
+{
+	int			nblocks = 1;
+	bool		result;
+
+	result = StartReadBuffers(operation, buffer, blocknum, &nblocks, flags);
+	Assert(nblocks == 1);		/* single block can't be short */
+
+	return result;
+}
+
+/*
+ * Begin reading a range of blocks beginning at blockNum and extending for
+ * *nblocks.  On return, up to *nblocks pinned buffers holding those blocks
+ * are written into the buffers array, and *nblocks is updated to contain the
+ * actual number, which may be fewer than requested.
+ *
+ * If false is returned, no I/O is necessary and WaitReadBuffers() is not
+ * necessary.  If true is returned, one I/O has been started, and
+ * WaitReadBuffers() must be called with the same operation object before the
+ * buffers are accessed.  Along with the operation object, the caller-supplied
+ * array of buffers must remain valid until WaitReadBuffers() is called.
+ *
+ * Currently the I/O is only started with optional operating system advice,
+ * and the real I/O happens in WaitReadBuffers().  In future work, true I/O
+ * could be initiated here.
+ */
+inline bool
+StartReadBuffers(ReadBuffersOperation *operation,
+				 Buffer *buffers,
+				 BlockNumber blockNum,
+				 int *nblocks,
+				 int flags)
+{
+	int			actual_nblocks = *nblocks;
+	int			io_buffers_len = 0;
+
+	Assert(*nblocks > 0);
+	Assert(*nblocks <= MAX_IO_COMBINE_LIMIT);
+
+	if (!operation->bmr.smgr)
 	{
-		/*
-		 * lookup the buffer.  IO_IN_PROGRESS is set if the requested block is
-		 * not currently in memory.
-		 */
-		io_context = IOContextForStrategy(strategy);
-		io_object = IOOBJECT_RELATION;
-		bufHdr = BufferAlloc(smgr, relpersistence, forkNum, blockNum,
-							 strategy, &found, io_context);
-		if (found)
-			pgBufferUsage.shared_blks_hit++;
-		else if (mode == RBM_NORMAL || mode == RBM_NORMAL_NO_LOG ||
-				 mode == RBM_ZERO_ON_ERROR)
-			pgBufferUsage.shared_blks_read++;
+		operation->bmr.smgr = RelationGetSmgr(operation->bmr.rel);
+		operation->bmr.relpersistence = operation->bmr.rel->rd_rel->relpersistence;
 	}
 
-	/* At this point we do NOT hold any locks. */
-
-	/* if it was already in the buffer pool, we're done */
-	if (found)
+	for (int i = 0; i < actual_nblocks; ++i)
 	{
-		/* Just need to update stats before we exit */
-		*hit = true;
-		VacuumPageHit++;
-		pgstat_count_io_op(io_object, io_context, IOOP_HIT);
+		bool		found;
 
-		if (VacuumCostActive)
-			VacuumCostBalance += VacuumCostPageHit;
+		buffers[i] = PinBufferForBlock(operation->bmr,
+									   operation->forknum,
+									   blockNum + i,
+									   operation->strategy,
+									   &found);
 
-		TRACE_POSTGRESQL_BUFFER_READ_DONE(forkNum, blockNum,
-										  smgr->smgr_rlocator.locator.spcOid,
-										  smgr->smgr_rlocator.locator.dbOid,
-										  smgr->smgr_rlocator.locator.relNumber,
-										  smgr->smgr_rlocator.backend,
-										  found);
+		if (found)
+		{
+			/*
+			 * Terminate the read as soon as we get a hit.  It could be a
+			 * single buffer hit, or it could be a hit that follows a readable
+			 * range.  We don't want to create more than one readable range,
+			 * so we stop here.
+			 */
+			actual_nblocks = i + 1;
+			break;
+		}
+		else
+		{
+			/* Extend the readable range to cover this block. */
+			io_buffers_len++;
+		}
+	}
+	*nblocks = actual_nblocks;
 
-		/*
-		 * In RBM_ZERO_AND_LOCK mode the caller expects the page to be locked
-		 * on return.
-		 */
-		if (!isLocalBuf)
+	if (io_buffers_len > 0)
+	{
+		/* Populate information needed for I/O. */
+		operation->buffers = buffers;
+		operation->blocknum = blockNum;
+		operation->flags = flags;
+		operation->nblocks = actual_nblocks;
+		operation->io_buffers_len = io_buffers_len;
+
+		if (flags & READ_BUFFERS_ISSUE_ADVICE)
 		{
-			if (mode == RBM_ZERO_AND_LOCK)
-				LWLockAcquire(BufferDescriptorGetContentLock(bufHdr),
-							  LW_EXCLUSIVE);
-			else if (mode == RBM_ZERO_AND_CLEANUP_LOCK)
-				LockBufferForCleanup(BufferDescriptorGetBuffer(bufHdr));
+			/*
+			 * In theory we should only do this if PinBufferForBlock() had to
+			 * allocate new buffers above.  That way, if two calls to
+			 * StartReadBuffers() were made for the same blocks before
+			 * WaitReadBuffers(), only the first would issue the advice.
+			 * That'd be a better simulation of true asynchronous I/O, which
+			 * would only start the I/O once, but isn't done here for
+			 * simplicity.  Note also that the following call might actually
+			 * issue two advice calls if we cross a segment boundary; in a
+			 * true asynchronous version we might choose to process only one
+			 * real I/O at a time in that case.
+			 */
+			smgrprefetch(operation->bmr.smgr,
+						 operation->forknum,
+						 blockNum,
+						 operation->io_buffers_len);
 		}
 
-		return BufferDescriptorGetBuffer(bufHdr);
+		/* Indicate that WaitReadBuffers() should be called. */
+		return true;
+	}
+	else
+	{
+		return false;
 	}
+}
+
+static inline bool
+WaitReadBuffersCanStartIO(Buffer buffer, bool nowait)
+{
+	if (BufferIsLocal(buffer))
+	{
+		BufferDesc *bufHdr = GetLocalBufferDescriptor(-buffer - 1);
+
+		return (pg_atomic_read_u32(&bufHdr->state) & BM_VALID) == 0;
+	}
+	else
+		return StartBufferIO(GetBufferDescriptor(buffer - 1), true, nowait);
+}
+
+void
+WaitReadBuffers(ReadBuffersOperation *operation)
+{
+	Buffer	   *buffers;
+	int			nblocks;
+	BlockNumber blocknum;
+	ForkNumber	forknum;
+	bool		isLocalBuf;
+	IOContext	io_context;
+	IOObject	io_object;
 
 	/*
-	 * if we have gotten to this point, we have allocated a buffer for the
-	 * page but its contents are not yet valid.  IO_IN_PROGRESS is set for it,
-	 * if it's a shared buffer.
+	 * Currently operations are only allowed to include a read of some range,
+	 * with an optional extra buffer that is already pinned at the end.  So
+	 * nblocks can be at most one more than io_buffers_len.
 	 */
-	Assert(!(pg_atomic_read_u32(&bufHdr->state) & BM_VALID));	/* spinlock not needed */
+	Assert((operation->nblocks == operation->io_buffers_len) ||
+		   (operation->nblocks == operation->io_buffers_len + 1));
 
-	bufBlock = isLocalBuf ? LocalBufHdrGetBlock(bufHdr) : BufHdrGetBlock(bufHdr);
+	/* Find the range of the physical read we need to perform. */
+	nblocks = operation->io_buffers_len;
+	if (nblocks == 0)
+		return;					/* nothing to do */
+
+	buffers = &operation->buffers[0];
+	blocknum = operation->blocknum;
+	forknum = operation->forknum;
+
+	isLocalBuf = operation->bmr.relpersistence == RELPERSISTENCE_TEMP;
+	if (isLocalBuf)
+	{
+		io_context = IOCONTEXT_NORMAL;
+		io_object = IOOBJECT_TEMP_RELATION;
+	}
+	else
+	{
+		io_context = IOContextForStrategy(operation->strategy);
+		io_object = IOOBJECT_RELATION;
+	}
 
 	/*
-	 * Read in the page, unless the caller intends to overwrite it and just
-	 * wants us to allocate a buffer.
+	 * We count all these blocks as read by this backend.  This is traditional
+	 * behavior, but might turn out to be not true if we find that someone
+	 * else has beaten us and completed the read of some of these blocks.  In
+	 * that case the system globally double-counts, but we traditionally don't
+	 * count this as a "hit", and we don't have a separate counter for "miss,
+	 * but another backend completed the read".
 	 */
-	if (mode == RBM_ZERO_AND_LOCK || mode == RBM_ZERO_AND_CLEANUP_LOCK)
-		MemSet((char *) bufBlock, 0, BLCKSZ);
+	if (isLocalBuf)
+		pgBufferUsage.local_blks_read += nblocks;
 	else
+		pgBufferUsage.shared_blks_read += nblocks;
+
+	for (int i = 0; i < nblocks; ++i)
 	{
-		instr_time	io_start = pgstat_prepare_io_time(track_io_timing);
+		int			io_buffers_len;
+		Buffer		io_buffers[MAX_IO_COMBINE_LIMIT];
+		void	   *io_pages[MAX_IO_COMBINE_LIMIT];
+		instr_time	io_start;
+		BlockNumber io_first_block;
+
+		/*
+		 * Skip this block if someone else has already completed it.  If an
+		 * I/O is already in progress in another backend, this will wait for
+		 * the outcome: either done, or something went wrong and we will
+		 * retry.
+		 */
+		if (!WaitReadBuffersCanStartIO(buffers[i], false))
+		{
+			/*
+			 * Report this as a 'hit' for this backend, even though it must
+			 * have started out as a miss in PinBufferForBlock().
+			 */
+			TRACE_POSTGRESQL_BUFFER_READ_DONE(forknum, blocknum + i,
+											  operation->bmr.smgr->smgr_rlocator.locator.spcOid,
+											  operation->bmr.smgr->smgr_rlocator.locator.dbOid,
+											  operation->bmr.smgr->smgr_rlocator.locator.relNumber,
+											  operation->bmr.smgr->smgr_rlocator.backend,
+											  true);
+			continue;
+		}
+
+		/* We found a buffer that we need to read in. */
+		io_buffers[0] = buffers[i];
+		io_pages[0] = BufferGetBlock(buffers[i]);
+		io_first_block = blocknum + i;
+		io_buffers_len = 1;
 
-		smgrread(smgr, forkNum, blockNum, bufBlock);
+		/*
+		 * How many neighboring-on-disk blocks can we scatter-read into
+		 * other buffers at the same time?  In this case we don't wait if we
+		 * see an I/O already in progress.  We already hold BM_IO_IN_PROGRESS
+		 * for the head block, so we should get on with that I/O as soon as
+		 * possible.  We'll come back to this block again, above.
+		 */
+		while ((i + 1) < nblocks &&
+			   WaitReadBuffersCanStartIO(buffers[i + 1], true))
+		{
+			/* Must be consecutive block numbers. */
+			Assert(BufferGetBlockNumber(buffers[i + 1]) ==
+				   BufferGetBlockNumber(buffers[i]) + 1);
+
+			io_buffers[io_buffers_len] = buffers[++i];
+			io_pages[io_buffers_len++] = BufferGetBlock(buffers[i]);
+		}
 
-		pgstat_count_io_op_time(io_object, io_context,
-								IOOP_READ, io_start, 1);
+		io_start = pgstat_prepare_io_time(track_io_timing);
+		smgrreadv(operation->bmr.smgr, forknum, io_first_block, io_pages, io_buffers_len);
+		pgstat_count_io_op_time(io_object, io_context, IOOP_READ, io_start,
+								io_buffers_len);
 
-		/* check for garbage data */
-		if (!PageIsVerifiedExtended((Page) bufBlock, blockNum,
-									PIV_LOG_WARNING | PIV_REPORT_STAT))
+		/* Verify each block we read, and terminate the I/O. */
+		for (int j = 0; j < io_buffers_len; ++j)
 		{
-			if (mode == RBM_ZERO_ON_ERROR || zero_damaged_pages)
+			BufferDesc *bufHdr;
+			Block		bufBlock;
+
+			if (isLocalBuf)
 			{
-				ereport(WARNING,
-						(errcode(ERRCODE_DATA_CORRUPTED),
-						 errmsg("invalid page in block %u of relation %s; zeroing out page",
-								blockNum,
-								relpath(smgr->smgr_rlocator, forkNum))));
-				MemSet((char *) bufBlock, 0, BLCKSZ);
+				bufHdr = GetLocalBufferDescriptor(-io_buffers[j] - 1);
+				bufBlock = LocalBufHdrGetBlock(bufHdr);
 			}
 			else
-				ereport(ERROR,
-						(errcode(ERRCODE_DATA_CORRUPTED),
-						 errmsg("invalid page in block %u of relation %s",
-								blockNum,
-								relpath(smgr->smgr_rlocator, forkNum))));
-		}
-	}
-
-	/*
-	 * In RBM_ZERO_AND_LOCK / RBM_ZERO_AND_CLEANUP_LOCK mode, grab the buffer
-	 * content lock before marking the page as valid, to make sure that no
-	 * other backend sees the zeroed page before the caller has had a chance
-	 * to initialize it.
-	 *
-	 * Since no-one else can be looking at the page contents yet, there is no
-	 * difference between an exclusive lock and a cleanup-strength lock. (Note
-	 * that we cannot use LockBuffer() or LockBufferForCleanup() here, because
-	 * they assert that the buffer is already valid.)
-	 */
-	if ((mode == RBM_ZERO_AND_LOCK || mode == RBM_ZERO_AND_CLEANUP_LOCK) &&
-		!isLocalBuf)
-	{
-		LWLockAcquire(BufferDescriptorGetContentLock(bufHdr), LW_EXCLUSIVE);
-	}
+			{
+				bufHdr = GetBufferDescriptor(io_buffers[j] - 1);
+				bufBlock = BufHdrGetBlock(bufHdr);
+			}
 
-	if (isLocalBuf)
-	{
-		/* Only need to adjust flags */
-		uint32		buf_state = pg_atomic_read_u32(&bufHdr->state);
+			/* check for garbage data */
+			if (!PageIsVerifiedExtended((Page) bufBlock, io_first_block + j,
+										PIV_LOG_WARNING | PIV_REPORT_STAT))
+			{
+				if ((operation->flags & READ_BUFFERS_ZERO_ON_ERROR) || zero_damaged_pages)
+				{
+					ereport(WARNING,
+							(errcode(ERRCODE_DATA_CORRUPTED),
+							 errmsg("invalid page in block %u of relation %s; zeroing out page",
+									io_first_block + j,
+									relpath(operation->bmr.smgr->smgr_rlocator, forknum))));
+					memset(bufBlock, 0, BLCKSZ);
+				}
+				else
+					ereport(ERROR,
+							(errcode(ERRCODE_DATA_CORRUPTED),
+							 errmsg("invalid page in block %u of relation %s",
+									io_first_block + j,
+									relpath(operation->bmr.smgr->smgr_rlocator, forknum))));
+			}
 
-		buf_state |= BM_VALID;
-		pg_atomic_unlocked_write_u32(&bufHdr->state, buf_state);
-	}
-	else
-	{
-		/* Set BM_VALID, terminate IO, and wake up any waiters */
-		TerminateBufferIO(bufHdr, false, BM_VALID, true);
-	}
+			/* Terminate I/O and set BM_VALID. */
+			if (isLocalBuf)
+			{
+				uint32		buf_state = pg_atomic_read_u32(&bufHdr->state);
 
-	VacuumPageMiss++;
-	if (VacuumCostActive)
-		VacuumCostBalance += VacuumCostPageMiss;
+				buf_state |= BM_VALID;
+				pg_atomic_unlocked_write_u32(&bufHdr->state, buf_state);
+			}
+			else
+			{
+				/* Set BM_VALID, terminate IO, and wake up any waiters */
+				TerminateBufferIO(bufHdr, false, BM_VALID, true);
+			}
 
-	TRACE_POSTGRESQL_BUFFER_READ_DONE(forkNum, blockNum,
-									  smgr->smgr_rlocator.locator.spcOid,
-									  smgr->smgr_rlocator.locator.dbOid,
-									  smgr->smgr_rlocator.locator.relNumber,
-									  smgr->smgr_rlocator.backend,
-									  found);
+			/* Report I/Os as completing individually. */
+			TRACE_POSTGRESQL_BUFFER_READ_DONE(forknum, io_first_block + j,
+											  operation->bmr.smgr->smgr_rlocator.locator.spcOid,
+											  operation->bmr.smgr->smgr_rlocator.locator.dbOid,
+											  operation->bmr.smgr->smgr_rlocator.locator.relNumber,
+											  operation->bmr.smgr->smgr_rlocator.backend,
+											  false);
+		}
 
-	return BufferDescriptorGetBuffer(bufHdr);
+		VacuumPageMiss += io_buffers_len;
+		if (VacuumCostActive)
+			VacuumCostBalance += VacuumCostPageMiss * io_buffers_len;
+	}
 }
 
 /*
- * BufferAlloc -- subroutine for ReadBuffer.  Handles lookup of a shared
- *		buffer.  If no buffer exists already, selects a replacement
- *		victim and evicts the old page, but does NOT read in new page.
+ * BufferAlloc -- subroutine for PinBufferForBlock.  Handles lookup of a shared
+ *		buffer.  If no buffer exists already, selects a replacement victim and
+ *		evicts the old page, but does NOT read in new page.
  *
  * "strategy" can be a buffer replacement strategy object, or NULL for
  * the default strategy.  The selected buffer's usage_count is advanced when
@@ -1223,11 +1510,7 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
  *
  * The returned buffer is pinned and is already marked as holding the
  * desired page.  If it already did have the desired page, *foundPtr is
- * set true.  Otherwise, *foundPtr is set false and the buffer is marked
- * as IO_IN_PROGRESS; ReadBuffer will now need to do I/O to fill it.
- *
- * *foundPtr is actually redundant with the buffer's BM_VALID flag, but
- * we keep it for simplicity in ReadBuffer.
+ * set true.  Otherwise, *foundPtr is set false.
  *
  * io_context is passed as an output parameter to avoid calling
  * IOContextForStrategy() when there is a shared buffers hit and no IO
@@ -1286,19 +1569,10 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 		{
 			/*
 			 * We can only get here if (a) someone else is still reading in
-			 * the page, or (b) a previous read attempt failed.  We have to
-			 * wait for any active read attempt to finish, and then set up our
-			 * own read attempt if the page is still not BM_VALID.
-			 * StartBufferIO does it all.
+			 * the page, (b) a previous read attempt failed, or (c) someone
+			 * called StartReadBuffers() but not yet WaitReadBuffers().
 			 */
-			if (StartBufferIO(buf, true))
-			{
-				/*
-				 * If we get here, previous attempts to read the buffer must
-				 * have failed ... but we shall bravely try again.
-				 */
-				*foundPtr = false;
-			}
+			*foundPtr = false;
 		}
 
 		return buf;
@@ -1363,19 +1637,10 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 		{
 			/*
 			 * We can only get here if (a) someone else is still reading in
-			 * the page, or (b) a previous read attempt failed.  We have to
-			 * wait for any active read attempt to finish, and then set up our
-			 * own read attempt if the page is still not BM_VALID.
-			 * StartBufferIO does it all.
+			 * the page, (b) a previous read attempt failed, or (c) someone
+			 * called StartReadBuffers() but not yet WaitReadBuffers().
 			 */
-			if (StartBufferIO(existing_buf_hdr, true))
-			{
-				/*
-				 * If we get here, previous attempts to read the buffer must
-				 * have failed ... but we shall bravely try again.
-				 */
-				*foundPtr = false;
-			}
+			*foundPtr = false;
 		}
 
 		return existing_buf_hdr;
@@ -1407,15 +1672,9 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 	LWLockRelease(newPartitionLock);
 
 	/*
-	 * Buffer contents are currently invalid.  Try to obtain the right to
-	 * start I/O.  If StartBufferIO returns false, then someone else managed
-	 * to read it before we did, so there's nothing left for BufferAlloc() to
-	 * do.
+	 * Buffer contents are currently invalid.
 	 */
-	if (StartBufferIO(victim_buf_hdr, true))
-		*foundPtr = false;
-	else
-		*foundPtr = true;
+	*foundPtr = false;
 
 	return victim_buf_hdr;
 }
@@ -1769,7 +2028,7 @@ again:
  * pessimistic, but outside of toy-sized shared_buffers it should allow
  * sufficient pins.
  */
-static void
+void
 LimitAdditionalPins(uint32 *additional_pins)
 {
 	uint32		max_backends;
@@ -2034,7 +2293,7 @@ ExtendBufferedRelShared(BufferManagerRelation bmr,
 
 				buf_state &= ~BM_VALID;
 				UnlockBufHdr(existing_hdr, buf_state);
-			} while (!StartBufferIO(existing_hdr, true));
+			} while (!StartBufferIO(existing_hdr, true, false));
 		}
 		else
 		{
@@ -2057,7 +2316,7 @@ ExtendBufferedRelShared(BufferManagerRelation bmr,
 			LWLockRelease(partition_lock);
 
 			/* XXX: could combine the locked operations in it with the above */
-			StartBufferIO(victim_buf_hdr, true);
+			StartBufferIO(victim_buf_hdr, true, false);
 		}
 	}
 
@@ -2372,7 +2631,12 @@ PinBuffer(BufferDesc *buf, BufferAccessStrategy strategy)
 	else
 	{
 		/*
-		 * If we previously pinned the buffer, it must surely be valid.
+		 * If we previously pinned the buffer, it is likely to be valid, but
+		 * it may not be if StartReadBuffers() was called and
+		 * WaitReadBuffers() hasn't been called yet.  We'll check by loading
+		 * the flags without locking.  This is racy, but it's OK to return
+		 * false spuriously: when WaitReadBuffers() calls StartBufferIO(),
+		 * it'll see that it's now valid.
 		 *
 		 * Note: We deliberately avoid a Valgrind client request here.
 		 * Individual access methods can optionally superimpose buffer page
@@ -2381,7 +2645,7 @@ PinBuffer(BufferDesc *buf, BufferAccessStrategy strategy)
 		 * that the buffer page is legitimately non-accessible here.  We
 		 * cannot meddle with that.
 		 */
-		result = true;
+		result = (pg_atomic_read_u32(&buf->state) & BM_VALID) != 0;
 	}
 
 	ref->refcount++;
@@ -3449,7 +3713,7 @@ FlushBuffer(BufferDesc *buf, SMgrRelation reln, IOObject io_object,
 	 * someone else flushed the buffer before we could, so we need not do
 	 * anything.
 	 */
-	if (!StartBufferIO(buf, false))
+	if (!StartBufferIO(buf, false, false))
 		return;
 
 	/* Setup error traceback support for ereport() */
@@ -5184,9 +5448,15 @@ WaitIO(BufferDesc *buf)
  *
  * Returns true if we successfully marked the buffer as I/O busy,
  * false if someone else already did the work.
+ *
+ * If nowait is true, then we don't wait for an I/O to be finished by another
+ * backend.  In that case, false indicates either that the I/O was already
+ * finished, or is still in progress.  This is useful for callers that want to
+ * find out if they can perform the I/O as part of a larger operation, without
+ * waiting for the answer or distinguishing the reasons why not.
  */
 static bool
-StartBufferIO(BufferDesc *buf, bool forInput)
+StartBufferIO(BufferDesc *buf, bool forInput, bool nowait)
 {
 	uint32		buf_state;
 
@@ -5199,6 +5469,8 @@ StartBufferIO(BufferDesc *buf, bool forInput)
 		if (!(buf_state & BM_IO_IN_PROGRESS))
 			break;
 		UnlockBufHdr(buf, buf_state);
+		if (nowait)
+			return false;
 		WaitIO(buf);
 	}
 
diff --git a/src/backend/storage/buffer/localbuf.c b/src/backend/storage/buffer/localbuf.c
index fcfac335a57..985a2c7049c 100644
--- a/src/backend/storage/buffer/localbuf.c
+++ b/src/backend/storage/buffer/localbuf.c
@@ -108,10 +108,9 @@ PrefetchLocalBuffer(SMgrRelation smgr, ForkNumber forkNum,
  * LocalBufferAlloc -
  *	  Find or create a local buffer for the given page of the given relation.
  *
- * API is similar to bufmgr.c's BufferAlloc, except that we do not need
- * to do any locking since this is all local.   Also, IO_IN_PROGRESS
- * does not get set.  Lastly, we support only default access strategy
- * (hence, usage_count is always advanced).
+ * API is similar to bufmgr.c's BufferAlloc, except that we do not need to do
+ * any locking since this is all local.  We support only default access
+ * strategy (hence, usage_count is always advanced).
  */
 BufferDesc *
 LocalBufferAlloc(SMgrRelation smgr, ForkNumber forkNum, BlockNumber blockNum,
@@ -287,7 +286,7 @@ GetLocalVictimBuffer(void)
 }
 
 /* see LimitAdditionalPins() */
-static void
+void
 LimitAdditionalLocalPins(uint32 *additional_pins)
 {
 	uint32		max_pins;
@@ -297,9 +296,10 @@ LimitAdditionalLocalPins(uint32 *additional_pins)
 
 	/*
 	 * In contrast to LimitAdditionalPins() other backends don't play a role
-	 * here. We can allow up to NLocBuffer pins in total.
+	 * here. We can allow up to NLocBuffer pins in total, but it might not be
+	 * initialized yet, so read num_temp_buffers.
 	 */
-	max_pins = (NLocBuffer - NLocalPinnedBuffers);
+	max_pins = (num_temp_buffers - NLocalPinnedBuffers);
 
 	if (*additional_pins >= max_pins)
 		*additional_pins = max_pins;
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index abd9029451f..313e393262f 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -3112,6 +3112,20 @@ struct config_int ConfigureNamesInt[] =
 		NULL
 	},
 
+	{
+		{"io_combine_limit",
+			PGC_USERSET,
+			RESOURCES_ASYNCHRONOUS,
+			gettext_noop("Limit on the size of data reads and writes."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&io_combine_limit,
+		DEFAULT_IO_COMBINE_LIMIT,
+		1, MAX_IO_COMBINE_LIMIT,
+		NULL, NULL, NULL
+	},
+
 	{
 		{"backend_flush_after", PGC_USERSET, RESOURCES_ASYNCHRONOUS,
 			gettext_noop("Number of pages after which previously performed writes are flushed to disk."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 2244ee52f79..7fa6d5a64c8 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -203,6 +203,7 @@
 #backend_flush_after = 0		# measured in pages, 0 disables
 #effective_io_concurrency = 1		# 1-1000; 0 disables prefetching
 #maintenance_io_concurrency = 10	# 1-1000; 0 disables prefetching
+#io_combine_limit = 128kB		# usually 1-32 blocks (depends on OS)
 #max_worker_processes = 8		# (change requires restart)
 #max_parallel_workers_per_gather = 2	# limited by max_parallel_workers
 #max_parallel_maintenance_workers = 2	# limited by max_parallel_workers
diff --git a/src/include/storage/bufmgr.h b/src/include/storage/bufmgr.h
index d51d46d3353..241f68c45e1 100644
--- a/src/include/storage/bufmgr.h
+++ b/src/include/storage/bufmgr.h
@@ -14,6 +14,7 @@
 #ifndef BUFMGR_H
 #define BUFMGR_H
 
+#include "port/pg_iovec.h"
 #include "storage/block.h"
 #include "storage/buf.h"
 #include "storage/bufpage.h"
@@ -133,6 +134,10 @@ extern PGDLLIMPORT bool track_io_timing;
 extern PGDLLIMPORT int effective_io_concurrency;
 extern PGDLLIMPORT int maintenance_io_concurrency;
 
+#define MAX_IO_COMBINE_LIMIT PG_IOV_MAX
+#define DEFAULT_IO_COMBINE_LIMIT Min(MAX_IO_COMBINE_LIMIT, (128 * 1024) / BLCKSZ)
+extern PGDLLIMPORT int io_combine_limit;
+
 extern PGDLLIMPORT int checkpoint_flush_after;
 extern PGDLLIMPORT int backend_flush_after;
 extern PGDLLIMPORT int bgwriter_flush_after;
@@ -158,7 +163,6 @@ extern PGDLLIMPORT int32 *LocalRefCount;
 #define BUFFER_LOCK_SHARE		1
 #define BUFFER_LOCK_EXCLUSIVE	2
 
-
 /*
  * prototypes for functions in bufmgr.c
  */
@@ -177,6 +181,38 @@ extern Buffer ReadBufferWithoutRelcache(RelFileLocator rlocator,
 										ForkNumber forkNum, BlockNumber blockNum,
 										ReadBufferMode mode, BufferAccessStrategy strategy,
 										bool permanent);
+
+#define READ_BUFFERS_ZERO_ON_ERROR 0x01
+#define READ_BUFFERS_ISSUE_ADVICE 0x02
+
+struct ReadBuffersOperation
+{
+	/* The following members should be set by the caller. */
+	BufferManagerRelation bmr;
+	ForkNumber	forknum;
+	BufferAccessStrategy strategy;
+
+	/* The following private members should not be accessed directly. */
+	Buffer	   *buffers;
+	BlockNumber blocknum;
+	int			flags;
+	int16		nblocks;
+	int16		io_buffers_len;
+};
+
+typedef struct ReadBuffersOperation ReadBuffersOperation;
+
+extern bool StartReadBuffer(ReadBuffersOperation *operation,
+							Buffer *buffer,
+							BlockNumber blocknum,
+							int flags);
+extern bool StartReadBuffers(ReadBuffersOperation *operation,
+							 Buffer *buffers,
+							 BlockNumber blocknum,
+							 int *nblocks,
+							 int flags);
+extern void WaitReadBuffers(ReadBuffersOperation *operation);
+
 extern void ReleaseBuffer(Buffer buffer);
 extern void UnlockReleaseBuffer(Buffer buffer);
 extern bool BufferIsExclusiveLocked(Buffer buffer);
@@ -250,6 +286,9 @@ extern bool HoldingBufferPinThatDelaysRecovery(void);
 
 extern bool BgBufferSync(struct WritebackContext *wb_context);
 
+extern void LimitAdditionalPins(uint32 *additional_pins);
+extern void LimitAdditionalLocalPins(uint32 *additional_pins);
+
 /* in buf_init.c */
 extern void InitBufferPool(void);
 extern Size BufferShmemSize(void);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index cfa9d5aaeac..97edd1388e9 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2286,6 +2286,7 @@ ReInitializeDSMForeignScan_function
 ReScanForeignScan_function
 ReadBufPtrType
 ReadBufferMode
+ReadBuffersOperation
 ReadBytePtrType
 ReadExtraTocPtrType
 ReadFunc
-- 
2.40.1

v6-0005-Add-read_stream_reset.patch (text/x-patch; charset=US-ASCII)
From 78b7bb5e30f7998b2701f5e967ec5d9c1849f7da Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 27 Mar 2024 16:46:42 -0400
Subject: [PATCH v6 5/6] Add read_stream_reset

For rescan, we want to reuse the existing ReadStream and simply release
the pinned buffers.
---
 src/backend/storage/aio/read_stream.c | 19 +++++++++++++++++++
 src/include/storage/read_stream.h     |  1 +
 2 files changed, 20 insertions(+)

diff --git a/src/backend/storage/aio/read_stream.c b/src/backend/storage/aio/read_stream.c
index 4e293e0df65..0d855475959 100644
--- a/src/backend/storage/aio/read_stream.c
+++ b/src/backend/storage/aio/read_stream.c
@@ -731,3 +731,22 @@ read_stream_end(ReadStream *stream)
 	/* Release memory. */
 	pfree(stream);
 }
+
+
+/*
+ * Reset a read stream by releasing all of the buffers.
+ */
+void
+read_stream_reset(ReadStream *stream)
+{
+	Buffer		buffer;
+
+	/* Stop looking ahead. */
+	stream->distance = 0;
+
+	/* Unpin anything that wasn't consumed. */
+	while ((buffer = read_stream_next_buffer(stream, NULL)) != InvalidBuffer)
+		ReleaseBuffer(buffer);
+
+	stream->distance = 1;
+}
diff --git a/src/include/storage/read_stream.h b/src/include/storage/read_stream.h
index 9e5fa2acf15..dc1733a1594 100644
--- a/src/include/storage/read_stream.h
+++ b/src/include/storage/read_stream.h
@@ -58,5 +58,6 @@ extern ReadStream *read_stream_begin_relation(int flags,
 											  size_t per_buffer_data_size);
 extern Buffer read_stream_next_buffer(ReadStream *stream, void **per_buffer_private);
 extern void read_stream_end(ReadStream *stream);
+extern void read_stream_reset(ReadStream *stream);
 
 #endif							/* READ_STREAM_H */
-- 
2.40.1
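
As a usage note (not part of the patch itself): the intended pattern is that a caller drains the stream with read_stream_next_buffer() and, on rescan, calls read_stream_reset() instead of tearing the stream down and rebuilding it. A minimal sketch, with made-up consume_stream()/rescan() helper names:

static void
consume_stream(ReadStream *stream)
{
    Buffer      buf;

    /* Pull pinned buffers out of the stream until it is exhausted. */
    while ((buf = read_stream_next_buffer(stream, NULL)) != InvalidBuffer)
    {
        /* ... examine the page in buf ... */
        ReleaseBuffer(buf);
    }
}

static void
rescan(ReadStream *stream)
{
    /*
     * Cheaper than read_stream_end() followed by another
     * read_stream_begin_relation(): this just unpins the look-ahead
     * buffers and resets the look-ahead distance to 1.
     */
    read_stream_reset(stream);
}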

v6-0006-Sequential-scans-and-TID-range-scans-stream-reads.patch (text/x-patch; charset=US-ASCII)
From f0c79f89574725460342fe55d56365bc2031510e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 27 Mar 2024 20:25:06 -0400
Subject: [PATCH v6 6/6] Sequential scans and TID range scans stream reads

Implementing streaming read support for heap sequential scans and TID
range scans includes three parts:

Allocate the read stream object in heap_beginscan(). On rescan, reset
the stream by releasing all pinned buffers and resetting the prefetch
block.

Implement a callback returning the next block to prefetch to the
read stream infrastructure.

Invoke the read stream API when a new page is needed. When the scan
direction changes, reset the stream.

ci-os-only:
---
 src/backend/access/heap/heapam.c | 88 ++++++++++++++++++++++++++++----
 src/include/access/heapam.h      | 15 ++++++
 2 files changed, 93 insertions(+), 10 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index c814aea9407..eee62b326e2 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -221,6 +221,25 @@ static const int MultiXactStatusLock[MaxMultiXactStatus + 1] =
  * ----------------------------------------------------------------
  */
 
+static BlockNumber
+heap_scan_stream_read_next(ReadStream *pgsr, void *private_data,
+						   void *per_buffer_data)
+{
+	HeapScanDesc scan = (HeapScanDesc) private_data;
+
+	if (unlikely(!scan->rs_inited))
+	{
+		scan->rs_prefetch_block = heapgettup_initial_block(scan, scan->rs_dir);
+		scan->rs_inited = true;
+	}
+	else
+		scan->rs_prefetch_block = heapgettup_advance_block(scan,
+														   scan->rs_prefetch_block,
+														   scan->rs_dir);
+
+	return scan->rs_prefetch_block;
+}
+
 /* ----------------
  *		initscan - scan code common to heap_beginscan and heap_rescan
  * ----------------
@@ -323,6 +342,13 @@ initscan(HeapScanDesc scan, ScanKey key, bool keep_startblock)
 	scan->rs_cbuf = InvalidBuffer;
 	scan->rs_cblock = InvalidBlockNumber;
 
+	/*
+	 * Initialize to ForwardScanDirection because it is most common and heap
+	 * scans usually must go forwards before going backward.
+	 */
+	scan->rs_dir = ForwardScanDirection;
+	scan->rs_prefetch_block = InvalidBlockNumber;
+
 	/* page-at-a-time fields are always invalid when not rs_inited */
 
 	/*
@@ -465,6 +491,8 @@ heapbuildvis(TableScanDesc sscan)
 static inline void
 heapfetchbuf(HeapScanDesc scan, ScanDirection dir)
 {
+	Assert(scan->rs_read_stream);
+
 	/* release previous scan buffer, if any */
 	if (BufferIsValid(scan->rs_cbuf))
 	{
@@ -479,19 +507,23 @@ heapfetchbuf(HeapScanDesc scan, ScanDirection dir)
 	 */
 	CHECK_FOR_INTERRUPTS();
 
-	if (unlikely(!scan->rs_inited))
+	/*
+	 * If the scan direction is changing, reset the prefetch block to the
+	 * current block. Otherwise, we will incorrectly prefetch the blocks
+	 * between the prefetch block and the current block again before
+	 * prefetching blocks in the new, correct scan direction.
+	 */
+	if (unlikely(scan->rs_dir != dir))
 	{
-		scan->rs_cblock = heapgettup_initial_block(scan, dir);
-		Assert(scan->rs_cblock != InvalidBlockNumber || !BufferIsValid(scan->rs_cbuf));
-		scan->rs_inited = true;
+		scan->rs_prefetch_block = scan->rs_cblock;
+		read_stream_reset(scan->rs_read_stream);
 	}
-	else
-		scan->rs_cblock = heapgettup_advance_block(scan, scan->rs_cblock, dir);
 
-	/* read block if valid */
-	if (BlockNumberIsValid(scan->rs_cblock))
-		scan->rs_cbuf = ReadBufferExtended(scan->rs_base.rs_rd, MAIN_FORKNUM,
-										   scan->rs_cblock, RBM_NORMAL, scan->rs_strategy);
+	scan->rs_dir = dir;
+
+	scan->rs_cbuf = read_stream_next_buffer(scan->rs_read_stream, NULL);
+	if (BufferIsValid(scan->rs_cbuf))
+		scan->rs_cblock = BufferGetBlockNumber(scan->rs_cbuf);
 }
 
 /*
@@ -820,6 +852,7 @@ continue_page:
 
 	scan->rs_cbuf = InvalidBuffer;
 	scan->rs_cblock = InvalidBlockNumber;
+	scan->rs_prefetch_block = InvalidBlockNumber;
 	tuple->t_data = NULL;
 	scan->rs_inited = false;
 }
@@ -910,6 +943,7 @@ continue_page:
 		ReleaseBuffer(scan->rs_cbuf);
 	scan->rs_cbuf = InvalidBuffer;
 	scan->rs_cblock = InvalidBlockNumber;
+	scan->rs_prefetch_block = InvalidBlockNumber;
 	tuple->t_data = NULL;
 	scan->rs_inited = false;
 }
@@ -983,6 +1017,8 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 	/* we only need to set this up once */
 	scan->rs_ctup.t_tableOid = RelationGetRelid(relation);
 
+	scan->rs_read_stream = NULL;
+
 	/*
 	 * Allocate memory to keep track of page allocation for parallel workers
 	 * when doing a parallel scan.
@@ -1003,6 +1039,24 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 
 	initscan(scan, key, false);
 
+	/*
+	 * We do not know the scan direction yet. If the scan does not end up
+	 * being a forward scan, the read stream will be freed. This is best done
+	 * after initscan().
+	 */
+	if (scan->rs_base.rs_flags & SO_TYPE_SEQSCAN ||
+		scan->rs_base.rs_flags & SO_TYPE_TIDRANGESCAN)
+	{
+		scan->rs_read_stream = read_stream_begin_relation(READ_STREAM_SEQUENTIAL,
+														  scan->rs_strategy,
+														  BMR_REL(scan->rs_base.rs_rd),
+														  MAIN_FORKNUM,
+														  heap_scan_stream_read_next,
+														  scan,
+														  0);
+	}
+
+
 	return (TableScanDesc) scan;
 }
 
@@ -1041,6 +1095,14 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
 	 * reinitialize scan descriptor
 	 */
 	initscan(scan, key, true);
+
+	/*
+	 * The read stream is reset on rescan. This must be done after initscan(),
+	 * as some state referred to by read_stream_reset() is reset in
+	 * initscan().
+	 */
+	if (scan->rs_read_stream)
+		read_stream_reset(scan->rs_read_stream);
 }
 
 void
@@ -1056,6 +1118,12 @@ heap_endscan(TableScanDesc sscan)
 	if (BufferIsValid(scan->rs_cbuf))
 		ReleaseBuffer(scan->rs_cbuf);
 
+	/*
+	 * Must free the read stream before freeing the BufferAccessStrategy
+	 */
+	if (scan->rs_read_stream)
+		read_stream_end(scan->rs_read_stream);
+
 	/*
 	 * decrement relation reference count and free scan descriptor storage
 	 */
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index bacef18c0d3..ae4577f1cb8 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -25,6 +25,7 @@
 #include "storage/bufpage.h"
 #include "storage/dsm.h"
 #include "storage/lockdefs.h"
+#include "storage/read_stream.h"
 #include "storage/shm_toc.h"
 #include "utils/relcache.h"
 #include "utils/snapshot.h"
@@ -66,6 +67,20 @@ typedef struct HeapScanDescData
 
 	HeapTupleData rs_ctup;		/* current tuple in scan, if any */
 
+	/* For scans that stream reads */
+	ReadStream *rs_read_stream;
+
+	/*
+	 * For sequential scans and TID range scans to stream reads. The read
+	 * stream is allocated at the beginning of the scan and reset on rescan or
+	 * when the scan direction changes. The scan direction is saved each time
+	 * a new page is requested. If the scan direction changes from one page to
+	 * the next, the read stream releases all previously pinned buffers and
+	 * resets the prefetch block.
+	 */
+	ScanDirection rs_dir;
+	BlockNumber rs_prefetch_block;
+
 	/*
 	 * For parallel scans to store page allocation data.  NULL when not
 	 * performing a parallel scan.
-- 
2.40.1

#12Melanie Plageman
melanieplageman@gmail.com
In reply to: Melanie Plageman (#11)
4 attachment(s)
Re: Streaming read-ready sequential scan code

On Wed, Mar 27, 2024 at 08:47:03PM -0400, Melanie Plageman wrote:

On Fri, Mar 8, 2024 at 4:56 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

On Sat, Mar 02, 2024 at 06:07:48PM -0500, Melanie Plageman wrote:

On Wed, Feb 28, 2024 at 12:30 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

On Mon, Feb 26, 2024 at 03:56:57PM -0500, Melanie Plageman wrote:

On Mon, Feb 19, 2024 at 6:05 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

On Mon, Jan 29, 2024 at 4:17 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

There is an outstanding question about where to allocate the
PgStreamingRead object for sequential scans

I've written three alternative implementations of the actual streaming
read user for sequential scan which handle the question of where to
allocate the streaming read object and how to handle changing scan
direction in different ways.

Option A) https://github.com/melanieplageman/postgres/tree/seqscan_pgsr_initscan_allocation
- Allocates the streaming read object in initscan(). Since we do not
know the scan direction at this time, if the scan ends up not being a
forwards scan, the streaming read object must later be freed -- so
this will sometimes allocate a streaming read object it never uses.
- Only supports ForwardScanDirection and once the scan direction
changes, streaming is never supported again -- even if we return to
ForwardScanDirection
- Must maintain a "fallback" codepath that does not use the streaming read API

Attached is a version of this patch which implements a "reset"
function for the streaming read API which should be cheaper than the
full pg_streaming_read_free() on rescan. This can easily be ported to
work on any of my proposed implementations (A/B/C). I implemented it
on A as an example.

Attached is the latest version of this patchset -- rebased in light of
Thomas' updates to the streaming read API [1]. I have chosen the
approach I think we should go with. It is a hybrid of my previously
proposed approaches.

While investigating some performance concerns, Andres pointed out that
the members I add to HeapScanDescData in this patch push rs_cindex and
rs_ntuples to another cacheline and introduce a 4-byte hole. Attached
v4's HeapScanDescData is as well-packed as on master and its members
are reordered so that rs_cindex and rs_ntuples are back on the second
cacheline of the struct's data.
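
To make the layout point concrete, here is a schematic (not the actual HeapScanDescData definition) of why the ordering matters: two adjacent 4-byte members fill one 8-byte slot, whereas a lone 4-byte member in front of an 8-byte-aligned pointer would force a 4-byte padding hole.

typedef struct PackingSketch
{
    ReadStream *rs_read_stream;     /* 8 bytes, 8-byte aligned */
    BlockNumber rs_prefetch_block;  /* 4 bytes */
    ScanDirection rs_dir;           /* 4 bytes: pairs with the member above */
    /* ... rs_cindex and rs_ntuples can then stay on the earlier cacheline ... */
} PackingSketch;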

I did some additional profiling and realized that dropping the
unlikely() from the places we check rs_inited frequently was negatively
impacting performance. v5 adds those back and also makes a few other
very minor cleanups.
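
For reference, the hinted branch looks like this in 0002 (condensed from the attached patch). rs_inited is false only on the first call per scan, so the initialization path is marked unlikely to keep the common path cheap:

    if (unlikely(!scan->rs_inited))
    {
        scan->rs_cblock = heapgettup_initial_block(scan, dir);
        scan->rs_inited = true;
    }
    else
        scan->rs_cblock = heapgettup_advance_block(scan, scan->rs_cblock, dir);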

Note that this patch set has a not-yet-released version of Thomas
Munro's Streaming Read API with new ramp-up logic that seems to fix a
performance issue I saw with my test case when all of the sequential
scan's blocks are in shared buffers. Once he sends the official new
version, I will rebase this and point to his explanation in that thread.

Attached v6 has the version of the streaming read API mentioned here
[1]. This resolved the fully-in-shared-buffers regressions
investigated in that thread by Andres, Bilal, and Thomas.

Attached v7 has version 14 of the streaming read API as well as a few
small tweaks to comments and code.

I noticed that 0001 in the set posed a small regression from master for
a sequential scan of a relation already in shared buffers. While
investigating this, I saw that heapfetchbuf() was still not being
inlined (compiled at -O2) and when I promoted heapfetchbuf() from static
inline to static pg_attribute_always_inline, most of the very small
regression I saw went away. I don't know if I squashed the issue
entirely, though.
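
Concretely, the change is only the attribute on the definition; a condensed view with the body elided ("inline" alone is just a suggestion that the compiler ignored here at -O2, while pg_attribute_always_inline forces inlining on compilers that support it):

static pg_attribute_always_inline void
heapfetchbuf(HeapScanDesc scan, ScanDirection dir)
{
    /* release the previous buffer, pick the next block, read it in ... */
}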

- Melanie

Attachments:

v7-0001-Split-heapgetpage-into-two-parts.patch (text/x-diff; charset=us-ascii)
From db6219d9ea689fdd0150aa0fbbeaa2ee11364aa8 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 27 Jan 2024 18:39:37 -0500
Subject: [PATCH v7 1/4] Split heapgetpage into two parts

heapgetpage(), a per-block utility function used in heap scans, read a
passed-in block into a buffer and then, if page-at-a-time processing was
enabled, pruned the page and built an array of its visible tuples. This
was used for sequential and sample scans.

Future commits will add support for streaming reads. The streaming read
API will read in the blocks specified by a callback, but any significant
per-page processing should be done synchronously on the buffer yielded
by the streaming read API. To support this, separate the logic in
heapgetpage() to read a block into a buffer from that which prunes the
page and builds an array of the visible tuples. The former is now
heapfetchbuf() and the latter is now heapbuildvis().

Future commits will push the logic for selecting the next block into
heapfetchbuf() in cases when streaming reads are not supported (such as
backwards sequential scans). Because this logic differs for sample scans
and sequential scans, inline the code to read the block into a buffer
for sample scans.

This has the added benefit of allowing for a bit of refactoring in
heapam_scan_sample_next_block(), including unpinning the previous buffer
before invoking the callback to select the next block.
---
 src/backend/access/heap/heapam.c         | 74 ++++++++++++++----------
 src/backend/access/heap/heapam_handler.c | 40 +++++++++----
 src/include/access/heapam.h              |  2 +-
 3 files changed, 72 insertions(+), 44 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index b661d9811eb..22ffa541ef7 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -360,17 +360,18 @@ heap_setscanlimits(TableScanDesc sscan, BlockNumber startBlk, BlockNumber numBlk
 }
 
 /*
- * heapgetpage - subroutine for heapgettup()
+ * heapbuildvis - Utility function for heap scans.
  *
- * This routine reads and pins the specified page of the relation.
- * In page-at-a-time mode it performs additional work, namely determining
- * which tuples on the page are visible.
+ * Given a page residing in a buffer saved in the scan descriptor, prune the
+ * page and determine which of its tuples are visible, saving their offsets
+ * in an array in the scan descriptor.
  */
 void
-heapgetpage(TableScanDesc sscan, BlockNumber block)
+heapbuildvis(TableScanDesc sscan)
 {
 	HeapScanDesc scan = (HeapScanDesc) sscan;
-	Buffer		buffer;
+	Buffer		buffer = scan->rs_cbuf;
+	BlockNumber block = scan->rs_cblock;
 	Snapshot	snapshot;
 	Page		page;
 	int			lines;
@@ -378,31 +379,8 @@ heapgetpage(TableScanDesc sscan, BlockNumber block)
 	OffsetNumber lineoff;
 	bool		all_visible;
 
-	Assert(block < scan->rs_nblocks);
+	Assert(BufferGetBlockNumber(buffer) == block);
 
-	/* release previous scan buffer, if any */
-	if (BufferIsValid(scan->rs_cbuf))
-	{
-		ReleaseBuffer(scan->rs_cbuf);
-		scan->rs_cbuf = InvalidBuffer;
-	}
-
-	/*
-	 * Be sure to check for interrupts at least once per page.  Checks at
-	 * higher code levels won't be able to stop a seqscan that encounters many
-	 * pages' worth of consecutive dead tuples.
-	 */
-	CHECK_FOR_INTERRUPTS();
-
-	/* read page using selected strategy */
-	scan->rs_cbuf = ReadBufferExtended(scan->rs_base.rs_rd, MAIN_FORKNUM, block,
-									   RBM_NORMAL, scan->rs_strategy);
-	scan->rs_cblock = block;
-
-	if (!(scan->rs_base.rs_flags & SO_ALLOW_PAGEMODE))
-		return;
-
-	buffer = scan->rs_cbuf;
 	snapshot = scan->rs_base.rs_snapshot;
 
 	/*
@@ -475,6 +453,37 @@ heapgetpage(TableScanDesc sscan, BlockNumber block)
 	scan->rs_ntuples = ntup;
 }
 
+/*
+ * heapfetchbuf - subroutine for heapgettup()
+ *
+ * This routine reads the specified block of the relation into a buffer and
+ * returns with that pinned buffer saved in the scan descriptor.
+ */
+static pg_attribute_always_inline void
+heapfetchbuf(HeapScanDesc scan, BlockNumber block)
+{
+	Assert(block < scan->rs_nblocks);
+
+	/* release previous scan buffer, if any */
+	if (BufferIsValid(scan->rs_cbuf))
+	{
+		ReleaseBuffer(scan->rs_cbuf);
+		scan->rs_cbuf = InvalidBuffer;
+	}
+
+	/*
+	 * Be sure to check for interrupts at least once per page.  Checks at
+	 * higher code levels won't be able to stop a seqscan that encounters many
+	 * pages' worth of consecutive dead tuples.
+	 */
+	CHECK_FOR_INTERRUPTS();
+
+	/* read page using selected strategy */
+	scan->rs_cbuf = ReadBufferExtended(scan->rs_base.rs_rd, MAIN_FORKNUM, block,
+									   RBM_NORMAL, scan->rs_strategy);
+	scan->rs_cblock = block;
+}
+
 /*
  * heapgettup_initial_block - return the first BlockNumber to scan
  *
@@ -748,7 +757,7 @@ heapgettup(HeapScanDesc scan,
 	 */
 	while (block != InvalidBlockNumber)
 	{
-		heapgetpage((TableScanDesc) scan, block);
+		heapfetchbuf(scan, block);
 		LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
 		page = heapgettup_start_page(scan, dir, &linesleft, &lineoff);
 continue_page:
@@ -869,7 +878,8 @@ heapgettup_pagemode(HeapScanDesc scan,
 	 */
 	while (block != InvalidBlockNumber)
 	{
-		heapgetpage((TableScanDesc) scan, block);
+		heapfetchbuf(scan, block);
+		heapbuildvis((TableScanDesc) scan);
 		page = BufferGetPage(scan->rs_cbuf);
 		linesleft = scan->rs_ntuples;
 		lineindex = ScanDirectionIsForward(dir) ? 0 : linesleft - 1;
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 41a4bb0981d..20210784c30 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2363,11 +2363,14 @@ heapam_scan_sample_next_block(TableScanDesc scan, SampleScanState *scanstate)
 	if (hscan->rs_nblocks == 0)
 		return false;
 
-	if (tsm->NextSampleBlock)
+	if (BufferIsValid(hscan->rs_cbuf))
 	{
-		blockno = tsm->NextSampleBlock(scanstate, hscan->rs_nblocks);
-		hscan->rs_cblock = blockno;
+		ReleaseBuffer(hscan->rs_cbuf);
+		hscan->rs_cbuf = InvalidBuffer;
 	}
+
+	if (tsm->NextSampleBlock)
+		blockno = tsm->NextSampleBlock(scanstate, hscan->rs_nblocks);
 	else
 	{
 		/* scanning table sequentially */
@@ -2409,20 +2412,35 @@ heapam_scan_sample_next_block(TableScanDesc scan, SampleScanState *scanstate)
 		}
 	}
 
-	if (!BlockNumberIsValid(blockno))
+	hscan->rs_cblock = blockno;
+
+	if (!BlockNumberIsValid(hscan->rs_cblock))
 	{
-		if (BufferIsValid(hscan->rs_cbuf))
-			ReleaseBuffer(hscan->rs_cbuf);
-		hscan->rs_cbuf = InvalidBuffer;
-		hscan->rs_cblock = InvalidBlockNumber;
 		hscan->rs_inited = false;
-
 		return false;
 	}
 
-	heapgetpage(scan, blockno);
-	hscan->rs_inited = true;
+	Assert(hscan->rs_cblock < hscan->rs_nblocks);
+
+	/*
+	 * We may scan multiple pages before finding tuples to yield or finishing
+	 * the scan. Since we want to check for interrupts at least once per page,
+	 * do so here.
+	 */
+	CHECK_FOR_INTERRUPTS();
+
+	/* Read page using selected strategy */
+	hscan->rs_cbuf = ReadBufferExtended(hscan->rs_base.rs_rd, MAIN_FORKNUM,
+										hscan->rs_cblock, RBM_NORMAL, hscan->rs_strategy);
 
+	/*
+	 * If pagemode is allowed, prune the page and build an array of visible
+	 * tuple offsets.
+	 */
+	if (hscan->rs_base.rs_flags & SO_ALLOW_PAGEMODE)
+		heapbuildvis(scan);
+
+	hscan->rs_inited = true;
 	return true;
 }
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 32a3fbce961..5a6d22a4732 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -254,7 +254,7 @@ extern TableScanDesc heap_beginscan(Relation relation, Snapshot snapshot,
 									uint32 flags);
 extern void heap_setscanlimits(TableScanDesc sscan, BlockNumber startBlk,
 							   BlockNumber numBlks);
-extern void heapgetpage(TableScanDesc sscan, BlockNumber block);
+extern void heapbuildvis(TableScanDesc sscan);
 extern void heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
 						bool allow_strat, bool allow_sync, bool allow_pagemode);
 extern void heap_endscan(TableScanDesc sscan);
-- 
2.40.1
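
In short, the division 0001 introduces looks like this at the call sites (condensed from the diff above; error handling and locking elided):

    /* Read the chosen block into scan->rs_cbuf and remember its number. */
    heapfetchbuf(scan, block);

    /* In page-at-a-time mode, prune and collect visible tuple offsets. */
    if (scan->rs_base.rs_flags & SO_ALLOW_PAGEMODE)
        heapbuildvis((TableScanDesc) scan);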

v7-0002-Replace-blocks-with-buffers-in-heapgettup-control.patch (text/x-diff; charset=us-ascii)
From ca7f60419aecdc07c7145d62e9d5033087598fbb Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 29 Jan 2024 11:50:01 -0500
Subject: [PATCH v7 2/4] Replace blocks with buffers in heapgettup control flow

Future commits will introduce the streaming read API and the sequential
scan streaming read API user. Streaming read API users implement a
callback which returns the next block to read. Sequential scans
previously looped through the blocks in the relation, synchronously
reading in a block and then processing it. An InvalidBlockNumber
returned by heapgettup_advance_block() meant that the relation was
exhausted and all blocks had been processed.

The streaming read API may exhaust the blocks in a relation (having read
all of them into buffers) before they have all been processed by the
sequential scan. As such, the sequential scan should continue processing
blocks until heapfetchbuf() returns InvalidBuffer.

Note that this commit does not implement the streaming read API user. It
simply restructures heapgettup() and heapgettup_pagemode() to use
buffers instead of blocks for control flow.

Not all sequential scans will support streaming reads. As such, this
code will remain for compatibility even after sequential scans support
streaming reads.
---
 src/backend/access/heap/heapam.c | 79 ++++++++++++++------------------
 1 file changed, 35 insertions(+), 44 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 22ffa541ef7..46645dc971f 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -83,6 +83,9 @@ static Bitmapset *HeapDetermineColumnsInfo(Relation relation,
 static bool heap_acquire_tuplock(Relation relation, ItemPointer tid,
 								 LockTupleMode mode, LockWaitPolicy wait_policy,
 								 bool *have_tuple_lock);
+static inline BlockNumber heapgettup_advance_block(HeapScanDesc scan,
+												   BlockNumber block, ScanDirection dir);
+static inline BlockNumber heapgettup_initial_block(HeapScanDesc scan, ScanDirection dir);
 static void compute_new_xmax_infomask(TransactionId xmax, uint16 old_infomask,
 									  uint16 old_infomask2, TransactionId add_to_xmax,
 									  LockTupleMode mode, bool is_update,
@@ -456,14 +459,12 @@ heapbuildvis(TableScanDesc sscan)
 /*
  * heapfetchbuf - subroutine for heapgettup()
  *
- * This routine reads the specified block of the relation into a buffer and
- * returns with that pinned buffer saved in the scan descriptor.
+ * This routine reads the next block of the relation into a buffer and returns
+ * with that pinned buffer saved in the scan descriptor.
  */
 static pg_attribute_always_inline void
-heapfetchbuf(HeapScanDesc scan, BlockNumber block)
+heapfetchbuf(HeapScanDesc scan, ScanDirection dir)
 {
-	Assert(block < scan->rs_nblocks);
-
 	/* release previous scan buffer, if any */
 	if (BufferIsValid(scan->rs_cbuf))
 	{
@@ -478,10 +479,19 @@ heapfetchbuf(HeapScanDesc scan, BlockNumber block)
 	 */
 	CHECK_FOR_INTERRUPTS();
 
-	/* read page using selected strategy */
-	scan->rs_cbuf = ReadBufferExtended(scan->rs_base.rs_rd, MAIN_FORKNUM, block,
-									   RBM_NORMAL, scan->rs_strategy);
-	scan->rs_cblock = block;
+	if (unlikely(!scan->rs_inited))
+	{
+		scan->rs_cblock = heapgettup_initial_block(scan, dir);
+		Assert(scan->rs_cblock != InvalidBlockNumber || !BufferIsValid(scan->rs_cbuf));
+		scan->rs_inited = true;
+	}
+	else
+		scan->rs_cblock = heapgettup_advance_block(scan, scan->rs_cblock, dir);
+
+	/* read block if valid */
+	if (BlockNumberIsValid(scan->rs_cblock))
+		scan->rs_cbuf = ReadBufferExtended(scan->rs_base.rs_rd, MAIN_FORKNUM,
+										   scan->rs_cblock, RBM_NORMAL, scan->rs_strategy);
 }
 
 /*
@@ -491,7 +501,7 @@ heapfetchbuf(HeapScanDesc scan, BlockNumber block)
  * occur with empty tables and in parallel scans when parallel workers get all
  * of the pages before we can get a chance to get our first page.
  */
-static BlockNumber
+BlockNumber
 heapgettup_initial_block(HeapScanDesc scan, ScanDirection dir)
 {
 	Assert(!scan->rs_inited);
@@ -631,7 +641,7 @@ heapgettup_continue_page(HeapScanDesc scan, ScanDirection dir, int *linesleft,
  * This also adjusts rs_numblocks when a limit has been imposed by
  * heap_setscanlimits().
  */
-static inline BlockNumber
+BlockNumber
 heapgettup_advance_block(HeapScanDesc scan, BlockNumber block, ScanDirection dir)
 {
 	if (ScanDirectionIsForward(dir))
@@ -729,23 +739,13 @@ heapgettup(HeapScanDesc scan,
 		   ScanKey key)
 {
 	HeapTuple	tuple = &(scan->rs_ctup);
-	BlockNumber block;
 	Page		page;
 	OffsetNumber lineoff;
 	int			linesleft;
 
-	if (unlikely(!scan->rs_inited))
-	{
-		block = heapgettup_initial_block(scan, dir);
-		/* ensure rs_cbuf is invalid when we get InvalidBlockNumber */
-		Assert(block != InvalidBlockNumber || !BufferIsValid(scan->rs_cbuf));
-		scan->rs_inited = true;
-	}
-	else
+	if (likely(scan->rs_inited))
 	{
 		/* continue from previously returned page/tuple */
-		block = scan->rs_cblock;
-
 		LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
 		page = heapgettup_continue_page(scan, dir, &linesleft, &lineoff);
 		goto continue_page;
@@ -755,9 +755,12 @@ heapgettup(HeapScanDesc scan,
 	 * advance the scan until we find a qualifying tuple or run out of stuff
 	 * to scan
 	 */
-	while (block != InvalidBlockNumber)
+	while (true)
 	{
-		heapfetchbuf(scan, block);
+		heapfetchbuf(scan, dir);
+		if (!BufferIsValid(scan->rs_cbuf))
+			break;
+		Assert(BufferGetBlockNumber(scan->rs_cbuf) == scan->rs_cblock);
 		LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
 		page = heapgettup_start_page(scan, dir, &linesleft, &lineoff);
 continue_page:
@@ -779,7 +782,7 @@ continue_page:
 
 			tuple->t_data = (HeapTupleHeader) PageGetItem(page, lpp);
 			tuple->t_len = ItemIdGetLength(lpp);
-			ItemPointerSet(&(tuple->t_self), block, lineoff);
+			ItemPointerSet(&(tuple->t_self), scan->rs_cblock, lineoff);
 
 			visible = HeapTupleSatisfiesVisibility(tuple,
 												   scan->rs_base.rs_snapshot,
@@ -809,9 +812,6 @@ continue_page:
 		 * it's time to move to the next.
 		 */
 		LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
-
-		/* get the BlockNumber to scan next */
-		block = heapgettup_advance_block(scan, block, dir);
 	}
 
 	/* end of scan */
@@ -844,22 +844,13 @@ heapgettup_pagemode(HeapScanDesc scan,
 					ScanKey key)
 {
 	HeapTuple	tuple = &(scan->rs_ctup);
-	BlockNumber block;
 	Page		page;
 	int			lineindex;
 	int			linesleft;
 
-	if (unlikely(!scan->rs_inited))
-	{
-		block = heapgettup_initial_block(scan, dir);
-		/* ensure rs_cbuf is invalid when we get InvalidBlockNumber */
-		Assert(block != InvalidBlockNumber || !BufferIsValid(scan->rs_cbuf));
-		scan->rs_inited = true;
-	}
-	else
+	if (likely(scan->rs_inited))
 	{
 		/* continue from previously returned page/tuple */
-		block = scan->rs_cblock;	/* current page */
 		page = BufferGetPage(scan->rs_cbuf);
 
 		lineindex = scan->rs_cindex + dir;
@@ -876,9 +867,12 @@ heapgettup_pagemode(HeapScanDesc scan,
 	 * advance the scan until we find a qualifying tuple or run out of stuff
 	 * to scan
 	 */
-	while (block != InvalidBlockNumber)
+	while (true)
 	{
-		heapfetchbuf(scan, block);
+		heapfetchbuf(scan, dir);
+		if (!BufferIsValid(scan->rs_cbuf))
+			break;
+		Assert(BufferGetBlockNumber(scan->rs_cbuf) == scan->rs_cblock);
 		heapbuildvis((TableScanDesc) scan);
 		page = BufferGetPage(scan->rs_cbuf);
 		linesleft = scan->rs_ntuples;
@@ -898,7 +892,7 @@ continue_page:
 
 			tuple->t_data = (HeapTupleHeader) PageGetItem(page, lpp);
 			tuple->t_len = ItemIdGetLength(lpp);
-			ItemPointerSet(&(tuple->t_self), block, lineoff);
+			ItemPointerSet(&(tuple->t_self), scan->rs_cblock, lineoff);
 
 			/* skip any tuples that don't match the scan key */
 			if (key != NULL &&
@@ -909,9 +903,6 @@ continue_page:
 			scan->rs_cindex = lineindex;
 			return;
 		}
-
-		/* get the BlockNumber to scan next */
-		block = heapgettup_advance_block(scan, block, dir);
 	}
 
 	/* end of scan */
-- 
2.40.1
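
The control-flow change in 0002 boils down to the following shape (condensed from the diff above): the loop no longer advances a block number itself, it asks heapfetchbuf() for the next pinned buffer and stops when none is returned:

    while (true)
    {
        heapfetchbuf(scan, dir);
        if (!BufferIsValid(scan->rs_cbuf))
            break;              /* no more buffers: the scan is done */

        /* ... lock the page, return qualifying tuples, unlock ... */
    }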

v7-0003-v14-Streaming-Read-API.patch (text/x-diff; charset=us-ascii)
From 4ad1c9212b00d7fcff1f2e73e27538e32e61440a Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Mon, 26 Feb 2024 23:48:31 +1300
Subject: [PATCH v7 3/4] v14 Streaming Read API

Part 1:

Provide vectored variant of ReadBuffer().

Break ReadBuffer() up into two steps: StartReadBuffers() and
WaitReadBuffers().  This has two advantages:

1.  Multiple consecutive blocks can be read with one system call.
2.  Advice (hints of future reads) can optionally be issued to the kernel.

The traditional ReadBuffer() function is now implemented in terms of
those functions, to avoid duplication.  For now we still only read a
block at a time so there is no change to generated system calls yet, but
later commits will provide infrastructure to help build up larger calls.

Callers should respect the new GUC io_combine_limit, and the limit on
per-backend pins which is now exposed as a public interface.

With some more infrastructure in later work, StartReadBuffers() could
be extended to start real asynchronous I/O instead of advice.

Part 2:

Provide API for streaming relation data.

Introduce an abstraction where relation data can be accessed as a
stream of buffers, with an implementation that is more efficient than
the equivalent sequence of ReadBuffer() calls.

Client code supplies a callback that can say which block number is
wanted next, and then consumes individual buffers one at a time from the
stream.  This division allows read_stream.c to build up large calls to
StartReadBuffers() up to io_combine_limit, and issue posix_fadvise()
advice ahead of time in a systematic way when random access is detected.

This API is based on an idea from Andres Freund to pave the way for
asynchronous I/O in future work as required to support direct I/O.  The
goal is to have an abstraction that insulates client code from future
changes to the I/O subsystem that would benefit from information about
future needs.

An extended API may be necessary in future for more complicated cases
(for example recovery, whose LsnReadQueue device in xlogprefetcher.c is
a distant cousin of this code and should eventually be replaced by
this), but this basic API is sufficient for many common usage patterns
involving predictable access to a single relation fork.
---
 doc/src/sgml/config.sgml                      |  14 +
 src/backend/storage/Makefile                  |   2 +-
 src/backend/storage/aio/Makefile              |  14 +
 src/backend/storage/aio/meson.build           |   5 +
 src/backend/storage/aio/read_stream.c         | 819 ++++++++++++++++++
 src/backend/storage/buffer/bufmgr.c           | 726 +++++++++++-----
 src/backend/storage/buffer/localbuf.c         |  14 +-
 src/backend/storage/meson.build               |   1 +
 src/backend/utils/misc/guc_tables.c           |  14 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/storage/bufmgr.h                  |  41 +-
 src/include/storage/read_stream.h             |  63 ++
 src/tools/pgindent/typedefs.list              |   3 +
 13 files changed, 1484 insertions(+), 233 deletions(-)
 create mode 100644 src/backend/storage/aio/Makefile
 create mode 100644 src/backend/storage/aio/meson.build
 create mode 100644 src/backend/storage/aio/read_stream.c
 create mode 100644 src/include/storage/read_stream.h

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index f65c17e5ae4..241a6079688 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2720,6 +2720,20 @@ include_dir 'conf.d'
        </listitem>
       </varlistentry>
 
+      <varlistentry id="guc-io-combine-limit" xreflabel="io_combine_limit">
+       <term><varname>io_combine_limit</varname> (<type>integer</type>)
+       <indexterm>
+        <primary><varname>io_combine_limit</varname> configuration parameter</primary>
+       </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Controls the largest I/O size in operations that combine I/O.
+         The default is 128kB.
+        </para>
+       </listitem>
+      </varlistentry>
+
       <varlistentry id="guc-max-worker-processes" xreflabel="max_worker_processes">
        <term><varname>max_worker_processes</varname> (<type>integer</type>)
        <indexterm>
diff --git a/src/backend/storage/Makefile b/src/backend/storage/Makefile
index 8376cdfca20..eec03f6f2b4 100644
--- a/src/backend/storage/Makefile
+++ b/src/backend/storage/Makefile
@@ -8,6 +8,6 @@ subdir = src/backend/storage
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
-SUBDIRS     = buffer file freespace ipc large_object lmgr page smgr sync
+SUBDIRS     = aio buffer file freespace ipc large_object lmgr page smgr sync
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/storage/aio/Makefile b/src/backend/storage/aio/Makefile
new file mode 100644
index 00000000000..2f29a9ec4d1
--- /dev/null
+++ b/src/backend/storage/aio/Makefile
@@ -0,0 +1,14 @@
+#
+# Makefile for storage/aio
+#
+# src/backend/storage/aio/Makefile
+#
+
+subdir = src/backend/storage/aio
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = \
+	read_stream.o
+
+include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/storage/aio/meson.build b/src/backend/storage/aio/meson.build
new file mode 100644
index 00000000000..10e1aa3b20b
--- /dev/null
+++ b/src/backend/storage/aio/meson.build
@@ -0,0 +1,5 @@
+# Copyright (c) 2024, PostgreSQL Global Development Group
+
+backend_sources += files(
+  'read_stream.c',
+)
diff --git a/src/backend/storage/aio/read_stream.c b/src/backend/storage/aio/read_stream.c
new file mode 100644
index 00000000000..f7e9dc1138b
--- /dev/null
+++ b/src/backend/storage/aio/read_stream.c
@@ -0,0 +1,819 @@
+/*-------------------------------------------------------------------------
+ *
+ * read_stream.c
+ *	  Mechanism for accessing buffered relation data with look-ahead
+ *
+ * Code that needs to access relation data typically pins blocks one at a
+ * time, often in a predictable order that might be sequential or data-driven.
+ * Calling the simple ReadBuffer() function for each block is inefficient,
+ * because blocks that are not yet in the buffer pool require I/O operations
+ * that are small and might stall waiting for storage.  This mechanism looks
+ * into the future and calls StartReadBuffers() and WaitReadBuffers() to read
+ * neighboring blocks together and ahead of time, with an adaptive look-ahead
+ * distance.
+ *
+ * A user-provided callback generates a stream of block numbers that is used
+ * to form reads of up to io_combine_limit, by attempting to merge them with a
+ * pending read.  When that isn't possible, the existing pending read is sent
+ * to StartReadBuffers() so that a new one can begin to form.
+ *
+ * The algorithm for controlling the look-ahead distance tries to classify the
+ * stream into three ideal behaviors:
+ *
+ * A) No I/O is necessary, because the requested blocks are fully cached
+ * already.  There is no benefit to looking ahead more than one block, so
+ * distance is 1.  This is the default initial assumption.
+ *
+ * B) I/O is necessary, but fadvise is undesirable because the access is
+ * sequential, or impossible because direct I/O is enabled or the system
+ * doesn't support advice.  There is no benefit in looking ahead more than
+ * io_combine_limit, because in this case the only goal is larger read system
+ * calls.  Looking further ahead would pin many buffers and perform
+ * speculative work looking ahead for no benefit.
+ *
+ * C) I/O is necessary, it appears random, and this system supports fadvise.
+ * We'll look further ahead in order to reach the configured level of I/O
+ * concurrency.
+ *
+ * The distance increases rapidly and decays slowly, so that it moves towards
+ * those levels as different I/O patterns are discovered.  For example, a
+ * sequential scan of fully cached data doesn't bother looking ahead, but a
+ * sequential scan that hits a region of uncached blocks will start issuing
+ * increasingly wide read calls until it plateaus at io_combine_limit.
+ *
+ * The main data structure is a circular queue of buffers of size
+ * max_pinned_buffers plus some extra space for technical reasons, ready to be
+ * returned by read_stream_next_buffer().  Each buffer also has an optional
+ * variable sized object that is passed from the callback to the consumer of
+ * buffers.
+ *
+ * Parallel to the queue of buffers, there is a circular queue of in-progress
+ * I/Os that have been started with StartReadBuffers(), and for which
+ * WaitReadBuffers() must be called before returning the buffer.
+ *
+ * For example, if the callback returns block numbers 10, 42, 43, 44, 60 in
+ * successive calls, then these data structures might appear as follows:
+ *
+ *                          buffers buf/data       ios
+ *
+ *                          +----+  +-----+       +--------+
+ *                          |    |  |     |  +----+ 42..44 | <- oldest_io_index
+ *                          +----+  +-----+  |    +--------+
+ *   oldest_buffer_index -> | 10 |  |  ?  |  | +--+ 60..60 |
+ *                          +----+  +-----+  | |  +--------+
+ *                          | 42 |  |  ?  |<-+ |  |        | <- next_io_index
+ *                          +----+  +-----+    |  +--------+
+ *                          | 43 |  |  ?  |    |  |        |
+ *                          +----+  +-----+    |  +--------+
+ *                          | 44 |  |  ?  |    |  |        |
+ *                          +----+  +-----+    |  +--------+
+ *                          | 60 |  |  ?  |<---+
+ *                          +----+  +-----+
+ *     next_buffer_index -> |    |  |     |
+ *                          +----+  +-----+
+ *
+ * In the example, 5 buffers are pinned, and the next buffer to be streamed to
+ * the client is block 10.  Block 10 was a hit and has no associated I/O, but
+ * the range 42..44 requires an I/O wait before its buffers are returned, as
+ * does block 60.
+ *
+ *
+ * Portions Copyright (c) 2024, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *	  src/backend/storage/aio/read_stream.c
+ *
+ *-------------------------------------------------------------------------
+ */
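To put illustrative numbers on the above (assuming effective_io_concurrency =
10, the default io_combine_limit of 16 blocks, and that LimitAdditionalPins()
doesn't clamp the result): max_pinned_buffers = Max(10 * 4, 16) = 40.  A
stream that keeps finding uncached blocks with advice enabled (behavior C)
ramps its distance 1 -> 2 -> 4 -> 8 -> 16 -> 32 -> 40, while one that cannot
use advice (behavior B) plateaus at 1 -> 2 -> 4 -> 8 -> 16.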
+#include "postgres.h"
+
+#include "catalog/pg_tablespace.h"
+#include "miscadmin.h"
+#include "storage/fd.h"
+#include "storage/smgr.h"
+#include "storage/read_stream.h"
+#include "utils/memdebug.h"
+#include "utils/rel.h"
+#include "utils/spccache.h"
+
+typedef struct InProgressIO
+{
+	int16		buffer_index;
+	ReadBuffersOperation op;
+} InProgressIO;
+
+/*
+ * State for managing a stream of reads.
+ */
+struct ReadStream
+{
+	int16		max_ios;
+	int16		ios_in_progress;
+	int16		queue_size;
+	int16		max_pinned_buffers;
+	int16		pinned_buffers;
+	int16		distance;
+	bool		advice_enabled;
+
+	/*
+	 * Small buffer of block numbers, useful for 'ungetting' to resolve flow
+	 * control problems when I/Os are split.  Also useful for batch-loading
+	 * block numbers in the fast path.
+	 */
+	BlockNumber blocknums[16];
+	int16		blocknums_count;
+	int16		blocknums_next;
+
+	/*
+	 * The callback that will tell us which block numbers to read, and an
+	 * opaque pointer that will be passed to it for its own purposes.
+	 */
+	ReadStreamBlockNumberCB callback;
+	void	   *callback_private_data;
+
+	/* Next expected block, for detecting sequential access. */
+	BlockNumber seq_blocknum;
+
+	/* The read operation we are currently preparing. */
+	BlockNumber pending_read_blocknum;
+	int16		pending_read_nblocks;
+
+	/* Space for buffers and optional per-buffer private data. */
+	size_t		per_buffer_data_size;
+	void	   *per_buffer_data;
+
+	/* Read operations that have been started but not waited for yet. */
+	InProgressIO *ios;
+	int16		oldest_io_index;
+	int16		next_io_index;
+
+	bool		fast_path;
+
+	/* Circular queue of buffers. */
+	int16		oldest_buffer_index;	/* Next pinned buffer to return */
+	int16		next_buffer_index;	/* Index of next buffer to pin */
+	Buffer		buffers[FLEXIBLE_ARRAY_MEMBER];
+};
+
+/*
+ * Return a pointer to the per-buffer data by index.
+ */
+static inline void *
+get_per_buffer_data(ReadStream *stream, int16 buffer_index)
+{
+	return (char *) stream->per_buffer_data +
+		stream->per_buffer_data_size * buffer_index;
+}
+
+/*
+ * Ask the callback which block it would like us to read next, with a small
+ * buffer in front to allow read_stream_unget_block() to work and to allow the
+ * fast path to work in batches.
+ */
+static inline BlockNumber
+read_stream_get_block(ReadStream *stream, void *per_buffer_data)
+{
+	if (stream->blocknums_next < stream->blocknums_count)
+		return stream->blocknums[stream->blocknums_next++];
+
+	/*
+	 * We only bother to fetch one at a time here (but see the fast path which
+	 * uses more).
+	 */
+	return stream->callback(stream,
+							stream->callback_private_data,
+							per_buffer_data);
+}
+
+/*
+ * In order to deal with short reads in StartReadBuffers(), we sometimes need
+ * to defer handling of a block until later.
+ */
+static inline void
+read_stream_unget_block(ReadStream *stream, BlockNumber blocknum)
+{
+	if (stream->blocknums_next == stream->blocknums_count)
+	{
+		/* Never initialized or entirely consumed.  Re-initialize. */
+		stream->blocknums[0] = blocknum;
+		stream->blocknums_count = 1;
+		stream->blocknums_next = 0;
+	}
+	else
+	{
+		/* Must be the last value returned from the blocknums array. */
+		Assert(stream->blocknums_next > 0);
+		stream->blocknums_next--;
+		Assert(stream->blocknums[stream->blocknums_next] == blocknum);
+	}
+}
+
+#ifndef READ_STREAM_DISABLE_FAST_PATH
+static void
+read_stream_fill_blocknums(ReadStream *stream)
+{
+	BlockNumber blocknum;
+	int			i = 0;
+
+	do
+	{
+		blocknum = stream->callback(stream,
+									stream->callback_private_data,
+									NULL);
+		stream->blocknums[i++] = blocknum;
+	} while (i < lengthof(stream->blocknums) &&
+			 blocknum != InvalidBlockNumber);
+	stream->blocknums_count = i;
+	stream->blocknums_next = 0;
+}
+#endif
+
+static void
+read_stream_start_pending_read(ReadStream *stream, bool suppress_advice)
+{
+	bool		need_wait;
+	int			nblocks;
+	int			flags;
+	int16		io_index;
+	int16		overflow;
+	int16		buffer_index;
+
+	/* This should only be called with a pending read. */
+	Assert(stream->pending_read_nblocks > 0);
+	Assert(stream->pending_read_nblocks <= io_combine_limit);
+
+	/* We had better not exceed the pin limit by starting this read. */
+	Assert(stream->pinned_buffers + stream->pending_read_nblocks <=
+		   stream->max_pinned_buffers);
+
+	/* We had better not be overwriting an existing pinned buffer. */
+	if (stream->pinned_buffers > 0)
+		Assert(stream->next_buffer_index != stream->oldest_buffer_index);
+	else
+		Assert(stream->next_buffer_index == stream->oldest_buffer_index);
+
+	/*
+	 * If advice hasn't been suppressed, this system supports it, and this
+	 * isn't a strictly sequential pattern, then we'll issue advice.
+	 */
+	if (!suppress_advice &&
+		stream->advice_enabled &&
+		stream->pending_read_blocknum != stream->seq_blocknum)
+		flags = READ_BUFFERS_ISSUE_ADVICE;
+	else
+		flags = 0;
+
+	/* We say how many blocks we want to read, but it may be smaller on return. */
+	buffer_index = stream->next_buffer_index;
+	io_index = stream->next_io_index;
+	nblocks = stream->pending_read_nblocks;
+	need_wait = StartReadBuffers(&stream->ios[io_index].op,
+								 &stream->buffers[buffer_index],
+								 stream->pending_read_blocknum,
+								 &nblocks,
+								 flags);
+	stream->pinned_buffers += nblocks;
+
+	/* Remember whether we need to wait before returning this buffer. */
+	if (!need_wait)
+	{
+		/* Look-ahead distance decays, no I/O necessary (behavior A). */
+		if (stream->distance > 1)
+			stream->distance--;
+	}
+	else
+	{
+		/*
+		 * Remember to call WaitReadBuffers() before returning head buffer.
+		 * Look-ahead distance will be adjusted after waiting.
+		 */
+		stream->ios[io_index].buffer_index = buffer_index;
+		if (++stream->next_io_index == stream->max_ios)
+			stream->next_io_index = 0;
+		Assert(stream->ios_in_progress < stream->max_ios);
+		stream->ios_in_progress++;
+		stream->seq_blocknum = stream->pending_read_blocknum + nblocks;
+	}
+
+	/*
+	 * We gave a contiguous range of buffer space to StartReadBuffers(), but
+	 * we want it to wrap around at queue_size.  Slide overflowing buffers to
+	 * the front of the array.
+	 */
+	overflow = (buffer_index + nblocks) - stream->queue_size;
+	if (overflow > 0)
+		memmove(&stream->buffers[0],
+				&stream->buffers[stream->queue_size],
+				sizeof(stream->buffers[0]) * overflow);
+
+	/* Compute location of start of next read, without using % operator. */
+	buffer_index += nblocks;
+	if (buffer_index >= stream->queue_size)
+		buffer_index -= stream->queue_size;
+	Assert(buffer_index >= 0 && buffer_index < stream->queue_size);
+	stream->next_buffer_index = buffer_index;
+
+	/* Adjust the pending read to cover the remaining portion, if any. */
+	stream->pending_read_blocknum += nblocks;
+	stream->pending_read_nblocks -= nblocks;
+}
+
+static void
+read_stream_look_ahead(ReadStream *stream, bool suppress_advice)
+{
+	while (stream->ios_in_progress < stream->max_ios &&
+		   stream->pinned_buffers + stream->pending_read_nblocks < stream->distance)
+	{
+		BlockNumber blocknum;
+		int16		buffer_index;
+		void	   *per_buffer_data;
+
+		if (stream->pending_read_nblocks == io_combine_limit)
+		{
+			read_stream_start_pending_read(stream, suppress_advice);
+			suppress_advice = false;
+			continue;
+		}
+
+		/*
+		 * See which block the callback wants next in the stream.  We need to
+		 * compute the index of the Nth block of the pending read including
+		 * wrap-around, but we don't want to use the expensive % operator.
+		 */
+		buffer_index = stream->next_buffer_index + stream->pending_read_nblocks;
+		if (buffer_index >= stream->queue_size)
+			buffer_index -= stream->queue_size;
+		Assert(buffer_index >= 0 && buffer_index < stream->queue_size);
+		per_buffer_data = get_per_buffer_data(stream, buffer_index);
+		blocknum = read_stream_get_block(stream, per_buffer_data);
+		if (blocknum == InvalidBlockNumber)
+		{
+			/* End of stream. */
+			stream->distance = 0;
+			break;
+		}
+
+		/* Can we merge it with the pending read? */
+		if (stream->pending_read_nblocks > 0 &&
+			stream->pending_read_blocknum + stream->pending_read_nblocks == blocknum)
+		{
+			stream->pending_read_nblocks++;
+			continue;
+		}
+
+		/* We have to start the pending read before we can build another. */
+		if (stream->pending_read_nblocks > 0)
+		{
+			read_stream_start_pending_read(stream, suppress_advice);
+			suppress_advice = false;
+			if (stream->ios_in_progress == stream->max_ios)
+			{
+				/* And we've hit the limit.  Rewind, and stop here. */
+				read_stream_unget_block(stream, blocknum);
+				return;
+			}
+		}
+
+		/* This is the start of a new pending read. */
+		stream->pending_read_blocknum = blocknum;
+		stream->pending_read_nblocks = 1;
+	}
+
+	/*
+	 * We don't start the pending read just because we've hit the distance
+	 * limit, preferring to give it another chance to grow to full
+	 * io_combine_limit size once more buffers have been consumed.  However,
+	 * if we've already reached io_combine_limit, or we've reached the
+	 * distance limit and there isn't anything pinned yet, or the callback has
+	 * signaled end-of-stream, we start the read immediately.
+	 */
+	if (stream->pending_read_nblocks > 0 &&
+		(stream->pending_read_nblocks == io_combine_limit ||
+		 (stream->pending_read_nblocks == stream->distance &&
+		  stream->pinned_buffers == 0) ||
+		 stream->distance == 0) &&
+		stream->ios_in_progress < stream->max_ios)
+		read_stream_start_pending_read(stream, suppress_advice);
+}
+
+/*
+ * Create a new read stream object that can be used to perform the equivalent
+ * of a series of ReadBuffer() calls for one fork of one relation.
+ * Internally, it generates larger vectored reads where possible by looking
+ * ahead.  The callback should return block numbers or InvalidBlockNumber to
+ * signal end-of-stream, and if per_buffer_data_size is non-zero, it may also
+ * write extra data for each block into the space provided to it.  It will
+ * also receive callback_private_data for its own purposes.
+ */
+ReadStream *
+read_stream_begin_relation(int flags,
+						   BufferAccessStrategy strategy,
+						   BufferManagerRelation bmr,
+						   ForkNumber forknum,
+						   ReadStreamBlockNumberCB callback,
+						   void *callback_private_data,
+						   size_t per_buffer_data_size)
+{
+	ReadStream *stream;
+	size_t		size;
+	int16		queue_size;
+	int16		max_ios;
+	uint32		max_pinned_buffers;
+	Oid			tablespace_id;
+
+	/* Make sure our bmr's smgr and persistent are populated. */
+	if (bmr.smgr == NULL)
+	{
+		bmr.smgr = RelationGetSmgr(bmr.rel);
+		bmr.relpersistence = bmr.rel->rd_rel->relpersistence;
+	}
+
+	/*
+	 * Decide how many I/Os we will allow to run at the same time.  That
+	 * currently means advice to the kernel to tell it that we will soon read.
+	 * This number also affects how far we look ahead for opportunities to
+	 * start more I/Os.
+	 */
+	tablespace_id = bmr.smgr->smgr_rlocator.locator.spcOid;
+	if (!OidIsValid(MyDatabaseId) ||
+		(bmr.rel && IsCatalogRelation(bmr.rel)) ||
+		IsCatalogRelationOid(bmr.smgr->smgr_rlocator.locator.relNumber))
+	{
+		/*
+		 * Avoid circularity while trying to look up tablespace settings or
+		 * before spccache.c is ready.
+		 */
+		max_ios = effective_io_concurrency;
+	}
+	else if (flags & READ_STREAM_MAINTENANCE)
+		max_ios = get_tablespace_maintenance_io_concurrency(tablespace_id);
+	else
+		max_ios = get_tablespace_io_concurrency(tablespace_id);
+	max_ios = Min(max_ios, PG_INT16_MAX);
+
+	/*
+	 * Choose the maximum number of buffers we're prepared to pin.  We try to
+	 * pin fewer if we can, though.  We clamp it to at least io_combine_limit
+	 * so that we can have a chance to build up a full io_combine_limit sized
+	 * read, even when max_ios is zero.  Be careful not to allow int16 to
+	 * overflow (even though that's not possible with the current GUC range
+	 * limits), allowing also for the spare entry and the overflow space.
+	 */
+	max_pinned_buffers = Max(max_ios * 4, io_combine_limit);
+	max_pinned_buffers = Min(max_pinned_buffers,
+							 PG_INT16_MAX - io_combine_limit - 1);
+
+	/* Don't allow this backend to pin more than its share of buffers. */
+	if (SmgrIsTemp(bmr.smgr))
+		LimitAdditionalLocalPins(&max_pinned_buffers);
+	else
+		LimitAdditionalPins(&max_pinned_buffers);
+	Assert(max_pinned_buffers > 0);
+
+	/*
+	 * We need one extra entry for buffers and per-buffer data, because users
+	 * of per-buffer data have access to the object until the next call to
+	 * read_stream_next_buffer(), so we need a gap between the head and tail
+	 * of the queue so that we don't clobber it.
+	 */
+	queue_size = max_pinned_buffers + 1;
+
+	/*
+	 * Allocate the object, the buffers, the ios and per_buffer_data space in
+	 * one big chunk.  Though we have queue_size buffers, we want to be able
+	 * to assume that all the buffers for a single read are contiguous (i.e.
+	 * don't wrap around halfway through), so we allow temporary overflows of
+	 * up to the maximum possible read size by allocating an extra
+	 * io_combine_limit - 1 elements.
+	 */
+	size = offsetof(ReadStream, buffers);
+	size += sizeof(Buffer) * (queue_size + io_combine_limit - 1);
+	size += sizeof(InProgressIO) * Max(1, max_ios);
+	size += per_buffer_data_size * queue_size;
+	size += MAXIMUM_ALIGNOF * 2;
+	stream = (ReadStream *) palloc(size);
+	memset(stream, 0, offsetof(ReadStream, buffers));
+	stream->ios = (InProgressIO *)
+		MAXALIGN(&stream->buffers[queue_size + io_combine_limit - 1]);
+	if (per_buffer_data_size > 0)
+		stream->per_buffer_data = (void *)
+			MAXALIGN(&stream->ios[Max(1, max_ios)]);
+
+#ifdef USE_PREFETCH
+
+	/*
+	 * This system supports prefetching advice.  We can use it as long as
+	 * direct I/O isn't enabled, the caller hasn't promised sequential access
+	 * (overriding our detection heuristics), and max_ios hasn't been set to
+	 * zero.
+	 */
+	if ((io_direct_flags & IO_DIRECT_DATA) == 0 &&
+		(flags & READ_STREAM_SEQUENTIAL) == 0 &&
+		max_ios > 0)
+		stream->advice_enabled = true;
+#endif
+
+	/*
+	 * For now, max_ios = 0 is interpreted as max_ios = 1 with advice disabled
+	 * above.  If we had real asynchronous I/O we might need a slightly
+	 * different definition.
+	 */
+	if (max_ios == 0)
+		max_ios = 1;
+
+	stream->max_ios = max_ios;
+	stream->per_buffer_data_size = per_buffer_data_size;
+	stream->max_pinned_buffers = max_pinned_buffers;
+	stream->queue_size = queue_size;
+	stream->callback = callback;
+	stream->callback_private_data = callback_private_data;
+
+	/*
+	 * Skip the initial ramp-up phase if the caller says we're going to be
+	 * reading the whole relation.  This way we start out assuming we'll be
+	 * doing full io_combine_limit sized reads (behavior B).
+	 */
+	if (flags & READ_STREAM_FULL)
+		stream->distance = Min(max_pinned_buffers, io_combine_limit);
+	else
+		stream->distance = 1;
+
+	/*
+	 * Since we currently always access the same relation, we can
+	 * initialize parts of the ReadBuffersOperation objects and leave them
+	 * that way, to avoid wasting CPU cycles writing to them for each read.
+	 */
+	if (!bmr.smgr)
+	{
+		bmr.smgr = RelationGetSmgr(bmr.rel);
+		bmr.relpersistence = bmr.rel->rd_rel->relpersistence;
+	}
+	for (int i = 0; i < max_ios; ++i)
+	{
+		stream->ios[i].op.bmr = bmr;
+		stream->ios[i].op.forknum = forknum;
+		stream->ios[i].op.strategy = strategy;
+	}
+
+	return stream;
+}
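
To illustrate the per-buffer data facility described above (reusing the
hypothetical MyScanState from the sketch in the commit message, and with
MyBlockInfo likewise made up for this example): when per_buffer_data_size is
non-zero, the callback can record a small payload per block that the consumer
later receives together with the buffer.

    typedef struct MyBlockInfo
    {
        bool    all_visible;    /* example payload chosen by the callback */
    } MyBlockInfo;

    static BlockNumber
    my_block_cb(ReadStream *stream, void *callback_private_data,
                void *per_buffer_data)
    {
        MyScanState *scan = (MyScanState *) callback_private_data;
        MyBlockInfo *info = (MyBlockInfo *) per_buffer_data;

        if (scan->next_block >= scan->nblocks)
            return InvalidBlockNumber;
        info->all_visible = false;  /* stand-in for something computed here */
        return scan->next_block++;
    }

    stream = read_stream_begin_relation(0, NULL, BMR_REL(rel), MAIN_FORKNUM,
                                        my_block_cb, &scan_state,
                                        sizeof(MyBlockInfo));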
+
+/*
+ * Pull one pinned buffer out of a stream.  Each call returns successive
+ * blocks in the order specified by the callback.  If per_buffer_data_size was
+ * set to a non-zero size, *per_buffer_data receives a pointer to the extra
+ * per-buffer data that the callback had a chance to populate, which remains
+ * valid until the next call to read_stream_next_buffer().  When the stream
+ * runs out of data, InvalidBuffer is returned.  The caller may decide to end
+ * the stream early at any time by calling read_stream_end().
+ */
+Buffer
+read_stream_next_buffer(ReadStream *stream, void **per_buffer_data)
+{
+	Buffer		buffer;
+	int16		oldest_buffer_index;
+
+#ifndef READ_STREAM_DISABLE_FAST_PATH
+
+	/*
+	 * A fast path for all-cached scans (behavior A).  This is the same as the
+	 * usual algorithm, but it is specialized for no I/O and no per-buffer
+	 * data, so we can skip the queue management code, stay in the same buffer
+	 * slot and use singular StartReadBuffer().
+	 */
+	if (likely(stream->fast_path))
+	{
+		BlockNumber next_blocknum;
+		bool		need_wait;
+
+		/* Fast path assumptions. */
+		Assert(stream->ios_in_progress == 0);
+		Assert(stream->pinned_buffers == 1);
+		Assert(stream->distance == 1);
+		Assert(stream->pending_read_nblocks == 1);
+		Assert(stream->per_buffer_data_size == 0);
+
+		/* We're going to return the buffer we pinned last time. */
+		oldest_buffer_index = stream->oldest_buffer_index;
+		Assert((oldest_buffer_index + 1) % stream->queue_size ==
+			   stream->next_buffer_index);
+		buffer = stream->buffers[oldest_buffer_index];
+		Assert(buffer != InvalidBuffer);
+
+		/*
+		 * Pin a buffer for the next call.  Same buffer entry, and arbitrary
+		 * I/O entry (they're all free).
+		 */
+		need_wait = StartReadBuffer(&stream->ios[0].op,
+									&stream->buffers[oldest_buffer_index],
+									stream->pending_read_blocknum,
+									stream->advice_enabled ?
+									READ_BUFFERS_ISSUE_ADVICE : 0);
+
+		/* Choose the block the next call will pin. */
+		if (unlikely(stream->blocknums_next == stream->blocknums_count))
+			read_stream_fill_blocknums(stream);
+		next_blocknum = stream->blocknums[stream->blocknums_next++];
+
+		/*
+		 * Fast return if the next call doesn't require I/O for the buffer we
+		 * just pinned, and we have a block number to give it as a pending
+		 * read.
+		 */
+		if (likely(!need_wait && next_blocknum != InvalidBlockNumber))
+		{
+			stream->pending_read_blocknum = next_blocknum;
+			return buffer;
+		}
+
+		/*
+		 * For anything more complex, set up some more state and take the slow
+		 * path next time.
+		 */
+		stream->fast_path = false;
+
+		if (need_wait)
+		{
+			/* Next call must wait for I/O for the newly pinned buffer. */
+			stream->oldest_io_index = 0;
+			stream->next_io_index = stream->max_ios > 1 ? 1 : 0;
+			stream->ios_in_progress = 1;
+			stream->ios[0].buffer_index = oldest_buffer_index;
+			stream->seq_blocknum = next_blocknum + 1;
+		}
+		if (next_blocknum == InvalidBlockNumber)
+		{
+			/* Next call hits end of stream and can't pin anything more. */
+			stream->distance = 0;
+			stream->pending_read_nblocks = 0;
+		}
+		else
+		{
+			/* Set up the pending read. */
+			stream->pending_read_blocknum = next_blocknum;
+		}
+		return buffer;
+	}
+#endif
+
+	if (unlikely(stream->pinned_buffers == 0))
+	{
+		Assert(stream->oldest_buffer_index == stream->next_buffer_index);
+
+		/* End of stream reached?  */
+		if (stream->distance == 0)
+			return InvalidBuffer;
+
+		/*
+		 * The usual order of operations is that we look ahead at the bottom
+		 * of this function after potentially finishing an I/O and making
+		 * space for more, but if we're just starting up we'll need to crank
+		 * the handle to get started.
+		 */
+		read_stream_look_ahead(stream, true);
+
+		/* End of stream reached? */
+		if (stream->pinned_buffers == 0)
+		{
+			Assert(stream->distance == 0);
+			return InvalidBuffer;
+		}
+	}
+
+	/* Grab the oldest pinned buffer and associated per-buffer data. */
+	Assert(stream->pinned_buffers > 0);
+	oldest_buffer_index = stream->oldest_buffer_index;
+	Assert(oldest_buffer_index >= 0 &&
+		   oldest_buffer_index < stream->queue_size);
+	buffer = stream->buffers[oldest_buffer_index];
+	if (per_buffer_data)
+		*per_buffer_data = get_per_buffer_data(stream, oldest_buffer_index);
+
+	Assert(BufferIsValid(buffer));
+
+	/* Do we have to wait for an associated I/O first? */
+	if (stream->ios_in_progress > 0 &&
+		stream->ios[stream->oldest_io_index].buffer_index == oldest_buffer_index)
+	{
+		int16		io_index = stream->oldest_io_index;
+		int16		distance;
+
+		/* Sanity check that we still agree on the buffers. */
+		Assert(stream->ios[io_index].op.buffers ==
+			   &stream->buffers[oldest_buffer_index]);
+
+		WaitReadBuffers(&stream->ios[io_index].op);
+
+		Assert(stream->ios_in_progress > 0);
+		stream->ios_in_progress--;
+		if (++stream->oldest_io_index == stream->max_ios)
+			stream->oldest_io_index = 0;
+
+		if (stream->ios[io_index].op.flags & READ_BUFFERS_ISSUE_ADVICE)
+		{
+			/* Distance ramps up fast (behavior C). */
+			distance = stream->distance * 2;
+			distance = Min(distance, stream->max_pinned_buffers);
+			stream->distance = distance;
+		}
+		else
+		{
+			/* No advice; move towards io_combine_limit (behavior B). */
+			if (stream->distance > io_combine_limit)
+			{
+				stream->distance--;
+			}
+			else
+			{
+				distance = stream->distance * 2;
+				distance = Min(distance, io_combine_limit);
+				distance = Min(distance, stream->max_pinned_buffers);
+				stream->distance = distance;
+			}
+		}
+	}
+
+#ifdef CLOBBER_FREED_MEMORY
+	/* Clobber old buffer and per-buffer data for debugging purposes. */
+	stream->buffers[oldest_buffer_index] = InvalidBuffer;
+
+	/*
+	 * The caller will get access to the per-buffer data, until the next call.
+	 * We wipe the one before, which is never occupied because queue_size
+	 * allowed one extra element.  This will hopefully trip up client code
+	 * that is holding a dangling pointer to it.
+	 */
+	if (stream->per_buffer_data)
+		wipe_mem(get_per_buffer_data(stream,
+									 oldest_buffer_index == 0 ?
+									 stream->queue_size - 1 :
+									 oldest_buffer_index - 1),
+				 stream->per_buffer_data_size);
+#endif
+
+	/* Pin transferred to caller. */
+	Assert(stream->pinned_buffers > 0);
+	stream->pinned_buffers--;
+
+	/* Advance oldest buffer, with wrap-around. */
+	stream->oldest_buffer_index++;
+	if (stream->oldest_buffer_index == stream->queue_size)
+		stream->oldest_buffer_index = 0;
+
+	/* Prepare for the next call. */
+	read_stream_look_ahead(stream, false);
+
+#ifndef READ_STREAM_DISABLE_FAST_PATH
+	/* See if we can take the fast path for all-cached scans next time. */
+	if (stream->ios_in_progress == 0 &&
+		stream->pinned_buffers == 1 &&
+		stream->distance == 1 &&
+		stream->pending_read_nblocks == 1 &&
+		stream->per_buffer_data_size == 0)
+	{
+		stream->fast_path = true;
+	}
+	else
+	{
+		stream->fast_path = false;
+	}
+#endif
+
+	return buffer;
+}
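
Continuing the hypothetical per-buffer data sketch from
read_stream_begin_relation(): the consumer passes the address of a pointer
and gets back the payload the callback wrote for this block, valid only until
the next call, as described above.

    MyBlockInfo *info;
    Buffer       buf;

    while ((buf = read_stream_next_buffer(stream, (void **) &info)) !=
           InvalidBuffer)
    {
        if (info->all_visible)
        {
            /* ... use the payload the callback stored for this block ... */
        }
        ReleaseBuffer(buf);
    }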
+
+/*
+ * Reset a read stream by releasing any queued up buffers, allowing the stream
+ * to be used again for different blocks.  This can be used to clear an
+ * end-of-stream condition and start again, or to throw away blocks that were
+ * speculatively read and read some different blocks instead.
+ */
+void
+read_stream_reset(ReadStream *stream)
+{
+	Buffer		buffer;
+
+	/* Stop looking ahead. */
+	stream->distance = 0;
+
+	/* Unpin anything that wasn't consumed. */
+	while ((buffer = read_stream_next_buffer(stream, NULL)) != InvalidBuffer)
+		ReleaseBuffer(buffer);
+
+	Assert(stream->pinned_buffers == 0);
+	Assert(stream->ios_in_progress == 0);
+
+	/* Start off assuming data is cached. */
+	stream->distance = 1;
+}
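
For example, a hypothetical rescan could throw away whatever the stream has
queued up and then reposition its own callback state; subsequent
read_stream_next_buffer() calls will consult the callback again.

    read_stream_reset(stream);      /* unpin queued buffers, clear end-of-stream */
    scan_state.next_block = 0;      /* reposition the callback's private state */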
+
+/*
+ * Release and free a read stream.
+ */
+void
+read_stream_end(ReadStream *stream)
+{
+	read_stream_reset(stream);
+	pfree(stream);
+}
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index f0f8d4259c5..70b19238b78 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -19,6 +19,11 @@
  *		and pin it so that no one can destroy it while this process
  *		is using it.
  *
+ * StartReadBuffers() -- as above, but for multiple contiguous blocks in
+ *		two steps.
+ *
+ * WaitReadBuffers() -- second step of StartReadBuffers().
+ *
  * ReleaseBuffer() -- unpin a buffer
  *
  * MarkBufferDirty() -- mark a pinned buffer's contents as "dirty".
@@ -152,6 +157,13 @@ int			effective_io_concurrency = DEFAULT_EFFECTIVE_IO_CONCURRENCY;
  */
 int			maintenance_io_concurrency = DEFAULT_MAINTENANCE_IO_CONCURRENCY;
 
+/*
+ * Limit on how many blocks should be handled in single I/O operations.
+ * StartReadBuffers() callers should respect it, as should other operations
+ * that call smgr APIs directly.
+ */
+int			io_combine_limit = DEFAULT_IO_COMBINE_LIMIT;
+
 /*
  * GUC variables about triggering kernel writeback for buffers written; OS
  * dependent defaults are set via the GUC mechanism.
@@ -471,10 +483,9 @@ ForgetPrivateRefCountEntry(PrivateRefCountEntry *ref)
 )
 
 
-static Buffer ReadBuffer_common(SMgrRelation smgr, char relpersistence,
+static Buffer ReadBuffer_common(BufferManagerRelation *bmr,
 								ForkNumber forkNum, BlockNumber blockNum,
-								ReadBufferMode mode, BufferAccessStrategy strategy,
-								bool *hit);
+								ReadBufferMode mode, BufferAccessStrategy strategy);
 static BlockNumber ExtendBufferedRelCommon(BufferManagerRelation bmr,
 										   ForkNumber fork,
 										   BufferAccessStrategy strategy,
@@ -500,18 +511,18 @@ static uint32 WaitBufHdrUnlocked(BufferDesc *buf);
 static int	SyncOneBuffer(int buf_id, bool skip_recently_used,
 						  WritebackContext *wb_context);
 static void WaitIO(BufferDesc *buf);
-static bool StartBufferIO(BufferDesc *buf, bool forInput);
+static bool StartBufferIO(BufferDesc *buf, bool forInput, bool nowait);
 static void TerminateBufferIO(BufferDesc *buf, bool clear_dirty,
 							  uint32 set_flag_bits, bool forget_owner);
 static void AbortBufferIO(Buffer buffer);
 static void shared_buffer_write_error_callback(void *arg);
 static void local_buffer_write_error_callback(void *arg);
-static BufferDesc *BufferAlloc(SMgrRelation smgr,
-							   char relpersistence,
-							   ForkNumber forkNum,
-							   BlockNumber blockNum,
-							   BufferAccessStrategy strategy,
-							   bool *foundPtr, IOContext io_context);
+static inline BufferDesc *BufferAlloc(SMgrRelation smgr,
+									  char relpersistence,
+									  ForkNumber forkNum,
+									  BlockNumber blockNum,
+									  BufferAccessStrategy strategy,
+									  bool *foundPtr, IOContext io_context);
 static Buffer GetVictimBuffer(BufferAccessStrategy strategy, IOContext io_context);
 static void FlushBuffer(BufferDesc *buf, SMgrRelation reln,
 						IOObject io_object, IOContext io_context);
@@ -781,8 +792,8 @@ Buffer
 ReadBufferExtended(Relation reln, ForkNumber forkNum, BlockNumber blockNum,
 				   ReadBufferMode mode, BufferAccessStrategy strategy)
 {
-	bool		hit;
 	Buffer		buf;
+	BufferManagerRelation bmr;
 
 	/*
 	 * Reject attempts to read non-local temporary relations; we would be
@@ -794,15 +805,12 @@ ReadBufferExtended(Relation reln, ForkNumber forkNum, BlockNumber blockNum,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("cannot access temporary tables of other sessions")));
 
-	/*
-	 * Read the buffer, and update pgstat counters to reflect a cache hit or
-	 * miss.
-	 */
-	pgstat_count_buffer_read(reln);
-	buf = ReadBuffer_common(RelationGetSmgr(reln), reln->rd_rel->relpersistence,
-							forkNum, blockNum, mode, strategy, &hit);
-	if (hit)
-		pgstat_count_buffer_hit(reln);
+	bmr.rel = reln;
+	bmr.smgr = RelationGetSmgr(reln);
+	bmr.relpersistence = reln->rd_rel->relpersistence;
+	buf = ReadBuffer_common(&bmr,
+							forkNum, blockNum, mode, strategy);
+
 	return buf;
 }
 
@@ -822,13 +830,12 @@ ReadBufferWithoutRelcache(RelFileLocator rlocator, ForkNumber forkNum,
 						  BlockNumber blockNum, ReadBufferMode mode,
 						  BufferAccessStrategy strategy, bool permanent)
 {
-	bool		hit;
-
 	SMgrRelation smgr = smgropen(rlocator, INVALID_PROC_NUMBER);
 
-	return ReadBuffer_common(smgr, permanent ? RELPERSISTENCE_PERMANENT :
-							 RELPERSISTENCE_UNLOGGED, forkNum, blockNum,
-							 mode, strategy, &hit);
+	return ReadBuffer_common(&BMR_SMGR(smgr, permanent ? RELPERSISTENCE_PERMANENT :
+									   RELPERSISTENCE_UNLOGGED),
+							 forkNum, blockNum,
+							 mode, strategy);
 }
 
 /*
@@ -875,7 +882,7 @@ ExtendBufferedRelBy(BufferManagerRelation bmr,
 					Buffer *buffers,
 					uint32 *extended_by)
 {
-	Assert((bmr.rel != NULL) != (bmr.smgr != NULL));
+	/* Assert((bmr.rel != NULL) != (bmr.smgr != NULL)); */
 	Assert(bmr.smgr == NULL || bmr.relpersistence != 0);
 	Assert(extend_by > 0);
 
@@ -911,7 +918,7 @@ ExtendBufferedRelTo(BufferManagerRelation bmr,
 	Buffer		buffer = InvalidBuffer;
 	Buffer		buffers[64];
 
-	Assert((bmr.rel != NULL) != (bmr.smgr != NULL));
+	/* Assert((bmr.rel != NULL) != (bmr.smgr != NULL)); */
 	Assert(bmr.smgr == NULL || bmr.relpersistence != 0);
 	Assert(extend_to != InvalidBlockNumber && extend_to > 0);
 
@@ -994,35 +1001,149 @@ ExtendBufferedRelTo(BufferManagerRelation bmr,
 	 */
 	if (buffer == InvalidBuffer)
 	{
-		bool		hit;
-
 		Assert(extended_by == 0);
-		buffer = ReadBuffer_common(bmr.smgr, bmr.relpersistence,
-								   fork, extend_to - 1, mode, strategy,
-								   &hit);
+		buffer = ReadBuffer_common(&bmr, fork, extend_to - 1, mode, strategy);
 	}
 
 	return buffer;
 }
 
 /*
- * ReadBuffer_common -- common logic for all ReadBuffer variants
- *
- * *hit is set to true if the request was satisfied from shared buffer cache.
+ * Zero a buffer and lock it, as part of the implementation of
+ * RBM_ZERO_AND_LOCK or RBM_ZERO_AND_CLEANUP_LOCK.  The buffer must be already
+ * pinned.  It does not have to be valid, but it is valid and locked on
+ * return.
  */
-static Buffer
-ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
-				  BlockNumber blockNum, ReadBufferMode mode,
-				  BufferAccessStrategy strategy, bool *hit)
+static void
+ZeroBuffer(Buffer buffer, ReadBufferMode mode)
 {
 	BufferDesc *bufHdr;
-	Block		bufBlock;
-	bool		found;
+	uint32		buf_state;
+
+	Assert(mode == RBM_ZERO_AND_LOCK || mode == RBM_ZERO_AND_CLEANUP_LOCK);
+
+	if (BufferIsLocal(buffer))
+		bufHdr = GetLocalBufferDescriptor(-buffer - 1);
+	else
+	{
+		bufHdr = GetBufferDescriptor(buffer - 1);
+		if (mode == RBM_ZERO_AND_LOCK)
+			LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
+		else
+			LockBufferForCleanup(buffer);
+	}
+
+	memset(BufferGetPage(buffer), 0, BLCKSZ);
+
+	if (BufferIsLocal(buffer))
+	{
+		buf_state = pg_atomic_read_u32(&bufHdr->state);
+		buf_state |= BM_VALID;
+		pg_atomic_unlocked_write_u32(&bufHdr->state, buf_state);
+	}
+	else
+	{
+		buf_state = LockBufHdr(bufHdr);
+		buf_state |= BM_VALID;
+		UnlockBufHdr(bufHdr, buf_state);
+	}
+}
+
+/*
+ * Pin a buffer for a given block.  *foundPtr is set to true if the block was
+ * already present, or false if more work is required to either read it in or
+ * zero it.
+ */
+static inline Buffer
+PinBufferForBlock(BufferManagerRelation *bmr,
+				  ForkNumber forkNum,
+				  BlockNumber blockNum,
+				  BufferAccessStrategy strategy,
+				  bool *foundPtr)
+{
+	BufferDesc *bufHdr;
+	bool		isLocalBuf;
 	IOContext	io_context;
 	IOObject	io_object;
-	bool		isLocalBuf = SmgrIsTemp(smgr);
 
-	*hit = false;
+	Assert(blockNum != P_NEW);
+
+	Assert(bmr->smgr);
+
+	isLocalBuf = bmr->relpersistence == RELPERSISTENCE_TEMP;
+	if (isLocalBuf)
+	{
+		io_context = IOCONTEXT_NORMAL;
+		io_object = IOOBJECT_TEMP_RELATION;
+	}
+	else
+	{
+		io_context = IOContextForStrategy(strategy);
+		io_object = IOOBJECT_RELATION;
+	}
+
+	TRACE_POSTGRESQL_BUFFER_READ_START(forkNum, blockNum,
+									   bmr->smgr->smgr_rlocator.locator.spcOid,
+									   bmr->smgr->smgr_rlocator.locator.dbOid,
+									   bmr->smgr->smgr_rlocator.locator.relNumber,
+									   bmr->smgr->smgr_rlocator.backend);
+
+	if (isLocalBuf)
+	{
+		bufHdr = LocalBufferAlloc(bmr->smgr, forkNum, blockNum, foundPtr);
+		if (*foundPtr)
+			pgBufferUsage.local_blks_hit++;
+	}
+	else
+	{
+		bufHdr = BufferAlloc(bmr->smgr, bmr->relpersistence, forkNum, blockNum,
+							 strategy, foundPtr, io_context);
+		if (*foundPtr)
+			pgBufferUsage.shared_blks_hit++;
+	}
+	if (bmr->rel)
+	{
+		/*
+		 * While pgBufferUsage's "read" counter isn't bumped unless we reach
+		 * WaitReadBuffers() (so, not for hits, and not for buffers that are
+		 * zeroed instead), the per-relation stats always count them.
+		 */
+		pgstat_count_buffer_read(bmr->rel);
+		if (*foundPtr)
+			pgstat_count_buffer_hit(bmr->rel);
+	}
+	if (*foundPtr)
+	{
+		VacuumPageHit++;
+		pgstat_count_io_op(io_object, io_context, IOOP_HIT);
+		if (VacuumCostActive)
+			VacuumCostBalance += VacuumCostPageHit;
+
+		TRACE_POSTGRESQL_BUFFER_READ_DONE(forkNum, blockNum,
+										  bmr->smgr->smgr_rlocator.locator.spcOid,
+										  bmr->smgr->smgr_rlocator.locator.dbOid,
+										  bmr->smgr->smgr_rlocator.locator.relNumber,
+										  bmr->smgr->smgr_rlocator.backend,
+										  true);
+	}
+
+	return BufferDescriptorGetBuffer(bufHdr);
+}
+
+/*
+ * ReadBuffer_common -- common logic for all ReadBuffer variants
+ */
+static pg_attribute_always_inline Buffer
+ReadBuffer_common(BufferManagerRelation *bmr, ForkNumber forkNum,
+				  BlockNumber blockNum, ReadBufferMode mode,
+				  BufferAccessStrategy strategy)
+{
+	ReadBuffersOperation operation;
+	Buffer		buffer;
+	int			flags;
+
+	/* Caller must make sure smgr is initialized. */
+	Assert(bmr->smgr != NULL);
 
 	/*
 	 * Backward compatibility path, most code should use ExtendBufferedRel()
@@ -1041,181 +1162,353 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 		if (mode == RBM_ZERO_AND_LOCK || mode == RBM_ZERO_AND_CLEANUP_LOCK)
 			flags |= EB_LOCK_FIRST;
 
-		return ExtendBufferedRel(BMR_SMGR(smgr, relpersistence),
-								 forkNum, strategy, flags);
+		return ExtendBufferedRel(*bmr, forkNum, strategy, flags);
 	}
 
-	TRACE_POSTGRESQL_BUFFER_READ_START(forkNum, blockNum,
-									   smgr->smgr_rlocator.locator.spcOid,
-									   smgr->smgr_rlocator.locator.dbOid,
-									   smgr->smgr_rlocator.locator.relNumber,
-									   smgr->smgr_rlocator.backend);
-
-	if (isLocalBuf)
+	if (unlikely(mode == RBM_ZERO_AND_CLEANUP_LOCK ||
+				 mode == RBM_ZERO_AND_LOCK))
 	{
-		/*
-		 * We do not use a BufferAccessStrategy for I/O of temporary tables.
-		 * However, in some cases, the "strategy" may not be NULL, so we can't
-		 * rely on IOContextForStrategy() to set the right IOContext for us.
-		 * This may happen in cases like CREATE TEMPORARY TABLE AS...
-		 */
-		io_context = IOCONTEXT_NORMAL;
-		io_object = IOOBJECT_TEMP_RELATION;
-		bufHdr = LocalBufferAlloc(smgr, forkNum, blockNum, &found);
-		if (found)
-			pgBufferUsage.local_blks_hit++;
-		else if (mode == RBM_NORMAL || mode == RBM_NORMAL_NO_LOG ||
-				 mode == RBM_ZERO_ON_ERROR)
-			pgBufferUsage.local_blks_read++;
+		bool		found;
+
+#if 0
+		if (bmr->smgr == NULL)
+		{
+			bmr->smgr = RelationGetSmgr(bmr->rel);
+			bmr->relpersistence = bmr->rel->rd_rel->relpersistence;
+		}
+#endif
+
+		buffer = PinBufferForBlock(bmr, forkNum, blockNum, strategy, &found);
+		ZeroBuffer(buffer, mode);
+		return buffer;
 	}
+
+	if (mode == RBM_ZERO_ON_ERROR)
+		flags = READ_BUFFERS_ZERO_ON_ERROR;
 	else
-	{
-		/*
-		 * lookup the buffer.  IO_IN_PROGRESS is set if the requested block is
-		 * not currently in memory.
-		 */
-		io_context = IOContextForStrategy(strategy);
-		io_object = IOOBJECT_RELATION;
-		bufHdr = BufferAlloc(smgr, relpersistence, forkNum, blockNum,
-							 strategy, &found, io_context);
-		if (found)
-			pgBufferUsage.shared_blks_hit++;
-		else if (mode == RBM_NORMAL || mode == RBM_NORMAL_NO_LOG ||
-				 mode == RBM_ZERO_ON_ERROR)
-			pgBufferUsage.shared_blks_read++;
-	}
+		flags = 0;
+	operation.bmr = *bmr;
+	operation.forknum = forkNum;
+	operation.strategy = strategy;
+	if (StartReadBuffer(&operation,
+						&buffer,
+						blockNum,
+						flags))
+		WaitReadBuffers(&operation);
+
+	return buffer;
+}
+
+/*
+ * Single block version of StartReadBuffers().  This might save a few
+ * instructions when called from another translation unit, if the compiler
+ * inlines the code and specializes for nblocks == 1.
+ */
+bool
+StartReadBuffer(ReadBuffersOperation *operation,
+				Buffer *buffer,
+				BlockNumber blocknum,
+				int flags)
+{
+	int			nblocks = 1;
+	bool		result;
+
+	result = StartReadBuffers(operation, buffer, blocknum, &nblocks, flags);
+	Assert(nblocks == 1);		/* single block can't be short */
+
+	return result;
+}
+
+/*
+ * Begin reading a range of blocks beginning at blockNum and extending for
+ * *nblocks.  On return, up to *nblocks pinned buffers holding those blocks
+ * are written into the buffers array, and *nblocks is updated to contain the
+ * actual number, which may be fewer than requested.  Caller sets some of the
+ * members of operation; see struct definition.
+ *
+ * If false is returned, no I/O is necessary.  If true is returned, one I/O
+ * has been started, and WaitReadBuffers() must be called with the same
+ * operation object before the buffers are accessed.  Along with the operation
+ * object, the caller-supplied array of buffers must remain valid until
+ * WaitReadBuffers() is called.
+ *
+ * Currently the I/O is only started with optional operating system advice if
+ * requested by the caller with READ_BUFFERS_ISSUE_ADVICE, and the real I/O
+ * happens synchronously in WaitReadBuffers().  In future work, true I/O could
+ * be initiated here.
+ *
+ * Make sure that StartReadBuffer() inlines this function with an attribute.
+ */
+pg_attribute_always_inline bool
+StartReadBuffers(ReadBuffersOperation *operation,
+				 Buffer *buffers,
+				 BlockNumber blockNum,
+				 int *nblocks,
+				 int flags)
+{
+	int			actual_nblocks = *nblocks;
+	int			io_buffers_len = 0;
 
-	/* At this point we do NOT hold any locks. */
+	Assert(*nblocks > 0);
+	Assert(*nblocks <= MAX_IO_COMBINE_LIMIT);
 
-	/* if it was already in the buffer pool, we're done */
-	if (found)
+	for (int i = 0; i < actual_nblocks; ++i)
 	{
-		/* Just need to update stats before we exit */
-		*hit = true;
-		VacuumPageHit++;
-		pgstat_count_io_op(io_object, io_context, IOOP_HIT);
+		bool		found;
 
-		if (VacuumCostActive)
-			VacuumCostBalance += VacuumCostPageHit;
+		buffers[i] = PinBufferForBlock(&operation->bmr,
+									   operation->forknum,
+									   blockNum + i,
+									   operation->strategy,
+									   &found);
 
-		TRACE_POSTGRESQL_BUFFER_READ_DONE(forkNum, blockNum,
-										  smgr->smgr_rlocator.locator.spcOid,
-										  smgr->smgr_rlocator.locator.dbOid,
-										  smgr->smgr_rlocator.locator.relNumber,
-										  smgr->smgr_rlocator.backend,
-										  found);
+		if (found)
+		{
+			/*
+			 * Terminate the read as soon as we get a hit.  It could be a
+			 * single buffer hit, or it could be a hit that follows a readable
+			 * range.  We don't want to create more than one readable range,
+			 * so we stop here.
+			 */
+			actual_nblocks = i + 1;
+			break;
+		}
+		else
+		{
+			/* Extend the readable range to cover this block. */
+			io_buffers_len++;
+		}
+	}
+	*nblocks = actual_nblocks;
+
+	if (likely(io_buffers_len == 0))
+		return false;
+
+	/* Populate information needed for I/O. */
+	operation->buffers = buffers;
+	operation->blocknum = blockNum;
+	operation->flags = flags;
+	operation->nblocks = actual_nblocks;
+	operation->io_buffers_len = io_buffers_len;
 
+	if (flags & READ_BUFFERS_ISSUE_ADVICE)
+	{
 		/*
-		 * In RBM_ZERO_AND_LOCK mode the caller expects the page to be locked
-		 * on return.
+		 * In theory we should only do this if PinBufferForBlock() had to
+		 * allocate new buffers above.  That way, if two calls to
+		 * StartReadBuffers() were made for the same blocks before
+		 * WaitReadBuffers(), only the first would issue the advice. That'd be
+		 * a better simulation of true asynchronous I/O, which would only
+		 * start the I/O once, but isn't done here for simplicity.  Note also
+		 * that the following call might actually issue two advice calls if we
+		 * cross a segment boundary; in a true asynchronous version we might
+		 * choose to process only one real I/O at a time in that case.
 		 */
-		if (!isLocalBuf)
-		{
-			if (mode == RBM_ZERO_AND_LOCK)
-				LWLockAcquire(BufferDescriptorGetContentLock(bufHdr),
-							  LW_EXCLUSIVE);
-			else if (mode == RBM_ZERO_AND_CLEANUP_LOCK)
-				LockBufferForCleanup(BufferDescriptorGetBuffer(bufHdr));
-		}
+		smgrprefetch(operation->bmr.smgr,
+					 operation->forknum,
+					 blockNum,
+					 operation->io_buffers_len);
+	}
 
-		return BufferDescriptorGetBuffer(bufHdr);
+	/* Indicate that WaitReadBuffers() should be called. */
+	return true;
+}
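
A minimal sketch of the two-step protocol for a caller using this interface
directly rather than through read_stream.c (rel, forknum and blocknum are
assumed to be in scope; the assignments follow the "caller sets some of the
members" contract above):

    ReadBuffersOperation operation;
    Buffer      buffers[4];
    int         nblocks = 4;

    operation.bmr = BMR_REL(rel);
    operation.forknum = forknum;
    operation.strategy = NULL;

    if (StartReadBuffers(&operation, buffers, blocknum, &nblocks, 0))
        WaitReadBuffers(&operation);
    /* nblocks now says how many consecutive blocks were actually pinned. */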
+
+static inline bool
+WaitReadBuffersCanStartIO(Buffer buffer, bool nowait)
+{
+	if (BufferIsLocal(buffer))
+	{
+		BufferDesc *bufHdr = GetLocalBufferDescriptor(-buffer - 1);
+
+		return (pg_atomic_read_u32(&bufHdr->state) & BM_VALID) == 0;
 	}
+	else
+		return StartBufferIO(GetBufferDescriptor(buffer - 1), true, nowait);
+}
+
+void
+WaitReadBuffers(ReadBuffersOperation *operation)
+{
+	Buffer	   *buffers;
+	int			nblocks;
+	BlockNumber blocknum;
+	ForkNumber	forknum;
+	bool		isLocalBuf;
+	IOContext	io_context;
+	IOObject	io_object;
 
 	/*
-	 * if we have gotten to this point, we have allocated a buffer for the
-	 * page but its contents are not yet valid.  IO_IN_PROGRESS is set for it,
-	 * if it's a shared buffer.
+	 * Currently operations are only allowed to include a read of some range,
+	 * with an optional extra buffer that is already pinned at the end.  So
+	 * nblocks can be at most one more than io_buffers_len.
 	 */
-	Assert(!(pg_atomic_read_u32(&bufHdr->state) & BM_VALID));	/* spinlock not needed */
+	Assert((operation->nblocks == operation->io_buffers_len) ||
+		   (operation->nblocks == operation->io_buffers_len + 1));
 
-	bufBlock = isLocalBuf ? LocalBufHdrGetBlock(bufHdr) : BufHdrGetBlock(bufHdr);
+	/* Find the range of the physical read we need to perform. */
+	nblocks = operation->io_buffers_len;
+	if (nblocks == 0)
+		return;					/* nothing to do */
+
+	buffers = &operation->buffers[0];
+	blocknum = operation->blocknum;
+	forknum = operation->forknum;
+
+	isLocalBuf = operation->bmr.relpersistence == RELPERSISTENCE_TEMP;
+	if (isLocalBuf)
+	{
+		io_context = IOCONTEXT_NORMAL;
+		io_object = IOOBJECT_TEMP_RELATION;
+	}
+	else
+	{
+		io_context = IOContextForStrategy(operation->strategy);
+		io_object = IOOBJECT_RELATION;
+	}
 
 	/*
-	 * Read in the page, unless the caller intends to overwrite it and just
-	 * wants us to allocate a buffer.
+	 * We count all these blocks as read by this backend.  This is traditional
+	 * behavior, but might turn out to be not true if we find that someone
+	 * else has beaten us and completed the read of some of these blocks.  In
+	 * that case the system globally double-counts, but we traditionally don't
+	 * count this as a "hit", and we don't have a separate counter for "miss,
+	 * but another backend completed the read".
 	 */
-	if (mode == RBM_ZERO_AND_LOCK || mode == RBM_ZERO_AND_CLEANUP_LOCK)
-		MemSet((char *) bufBlock, 0, BLCKSZ);
+	if (isLocalBuf)
+		pgBufferUsage.local_blks_read += nblocks;
 	else
+		pgBufferUsage.shared_blks_read += nblocks;
+
+	for (int i = 0; i < nblocks; ++i)
 	{
-		instr_time	io_start = pgstat_prepare_io_time(track_io_timing);
+		int			io_buffers_len;
+		Buffer		io_buffers[MAX_IO_COMBINE_LIMIT];
+		void	   *io_pages[MAX_IO_COMBINE_LIMIT];
+		instr_time	io_start;
+		BlockNumber io_first_block;
+
+		/*
+		 * Skip this block if someone else has already completed it.  If an
+		 * I/O is already in progress in another backend, this will wait for
+		 * the outcome: either done, or something went wrong and we will
+		 * retry.
+		 */
+		if (!WaitReadBuffersCanStartIO(buffers[i], false))
+		{
+			/*
+			 * Report this as a 'hit' for this backend, even though it must
+			 * have started out as a miss in PinBufferForBlock().
+			 */
+			TRACE_POSTGRESQL_BUFFER_READ_DONE(forknum, blocknum + i,
+											  operation->bmr.smgr->smgr_rlocator.locator.spcOid,
+											  operation->bmr.smgr->smgr_rlocator.locator.dbOid,
+											  operation->bmr.smgr->smgr_rlocator.locator.relNumber,
+											  operation->bmr.smgr->smgr_rlocator.backend,
+											  true);
+			continue;
+		}
+
+		/* We found a buffer that we need to read in. */
+		io_buffers[0] = buffers[i];
+		io_pages[0] = BufferGetBlock(buffers[i]);
+		io_first_block = blocknum + i;
+		io_buffers_len = 1;
 
-		smgrread(smgr, forkNum, blockNum, bufBlock);
+		/*
+		 * How many neighboring-on-disk blocks can we scatter-read into
+		 * other buffers at the same time?  In this case we don't wait if we
+		 * see an I/O already in progress.  We already hold BM_IO_IN_PROGRESS
+		 * for the head block, so we should get on with that I/O as soon as
+		 * possible.  We'll come back to this block again, above.
+		 */
+		while ((i + 1) < nblocks &&
+			   WaitReadBuffersCanStartIO(buffers[i + 1], true))
+		{
+			/* Must be consecutive block numbers. */
+			Assert(BufferGetBlockNumber(buffers[i + 1]) ==
+				   BufferGetBlockNumber(buffers[i]) + 1);
+
+			io_buffers[io_buffers_len] = buffers[++i];
+			io_pages[io_buffers_len++] = BufferGetBlock(buffers[i]);
+		}
 
-		pgstat_count_io_op_time(io_object, io_context,
-								IOOP_READ, io_start, 1);
+		io_start = pgstat_prepare_io_time(track_io_timing);
+		smgrreadv(operation->bmr.smgr, forknum, io_first_block, io_pages, io_buffers_len);
+		pgstat_count_io_op_time(io_object, io_context, IOOP_READ, io_start,
+								io_buffers_len);
 
-		/* check for garbage data */
-		if (!PageIsVerifiedExtended((Page) bufBlock, blockNum,
-									PIV_LOG_WARNING | PIV_REPORT_STAT))
+		/* Verify each block we read, and terminate the I/O. */
+		for (int j = 0; j < io_buffers_len; ++j)
 		{
-			if (mode == RBM_ZERO_ON_ERROR || zero_damaged_pages)
+			BufferDesc *bufHdr;
+			Block		bufBlock;
+
+			if (isLocalBuf)
 			{
-				ereport(WARNING,
-						(errcode(ERRCODE_DATA_CORRUPTED),
-						 errmsg("invalid page in block %u of relation %s; zeroing out page",
-								blockNum,
-								relpath(smgr->smgr_rlocator, forkNum))));
-				MemSet((char *) bufBlock, 0, BLCKSZ);
+				bufHdr = GetLocalBufferDescriptor(-io_buffers[j] - 1);
+				bufBlock = LocalBufHdrGetBlock(bufHdr);
 			}
 			else
-				ereport(ERROR,
-						(errcode(ERRCODE_DATA_CORRUPTED),
-						 errmsg("invalid page in block %u of relation %s",
-								blockNum,
-								relpath(smgr->smgr_rlocator, forkNum))));
-		}
-	}
-
-	/*
-	 * In RBM_ZERO_AND_LOCK / RBM_ZERO_AND_CLEANUP_LOCK mode, grab the buffer
-	 * content lock before marking the page as valid, to make sure that no
-	 * other backend sees the zeroed page before the caller has had a chance
-	 * to initialize it.
-	 *
-	 * Since no-one else can be looking at the page contents yet, there is no
-	 * difference between an exclusive lock and a cleanup-strength lock. (Note
-	 * that we cannot use LockBuffer() or LockBufferForCleanup() here, because
-	 * they assert that the buffer is already valid.)
-	 */
-	if ((mode == RBM_ZERO_AND_LOCK || mode == RBM_ZERO_AND_CLEANUP_LOCK) &&
-		!isLocalBuf)
-	{
-		LWLockAcquire(BufferDescriptorGetContentLock(bufHdr), LW_EXCLUSIVE);
-	}
+			{
+				bufHdr = GetBufferDescriptor(io_buffers[j] - 1);
+				bufBlock = BufHdrGetBlock(bufHdr);
+			}
 
-	if (isLocalBuf)
-	{
-		/* Only need to adjust flags */
-		uint32		buf_state = pg_atomic_read_u32(&bufHdr->state);
+			/* check for garbage data */
+			if (!PageIsVerifiedExtended((Page) bufBlock, io_first_block + j,
+										PIV_LOG_WARNING | PIV_REPORT_STAT))
+			{
+				if ((operation->flags & READ_BUFFERS_ZERO_ON_ERROR) || zero_damaged_pages)
+				{
+					ereport(WARNING,
+							(errcode(ERRCODE_DATA_CORRUPTED),
+							 errmsg("invalid page in block %u of relation %s; zeroing out page",
+									io_first_block + j,
+									relpath(operation->bmr.smgr->smgr_rlocator, forknum))));
+					memset(bufBlock, 0, BLCKSZ);
+				}
+				else
+					ereport(ERROR,
+							(errcode(ERRCODE_DATA_CORRUPTED),
+							 errmsg("invalid page in block %u of relation %s",
+									io_first_block + j,
+									relpath(operation->bmr.smgr->smgr_rlocator, forknum))));
+			}
 
-		buf_state |= BM_VALID;
-		pg_atomic_unlocked_write_u32(&bufHdr->state, buf_state);
-	}
-	else
-	{
-		/* Set BM_VALID, terminate IO, and wake up any waiters */
-		TerminateBufferIO(bufHdr, false, BM_VALID, true);
-	}
+			/* Terminate I/O and set BM_VALID. */
+			if (isLocalBuf)
+			{
+				uint32		buf_state = pg_atomic_read_u32(&bufHdr->state);
 
-	VacuumPageMiss++;
-	if (VacuumCostActive)
-		VacuumCostBalance += VacuumCostPageMiss;
+				buf_state |= BM_VALID;
+				pg_atomic_unlocked_write_u32(&bufHdr->state, buf_state);
+			}
+			else
+			{
+				/* Set BM_VALID, terminate IO, and wake up any waiters */
+				TerminateBufferIO(bufHdr, false, BM_VALID, true);
+			}
 
-	TRACE_POSTGRESQL_BUFFER_READ_DONE(forkNum, blockNum,
-									  smgr->smgr_rlocator.locator.spcOid,
-									  smgr->smgr_rlocator.locator.dbOid,
-									  smgr->smgr_rlocator.locator.relNumber,
-									  smgr->smgr_rlocator.backend,
-									  found);
+			/* Report I/Os as completing individually. */
+			TRACE_POSTGRESQL_BUFFER_READ_DONE(forknum, io_first_block + j,
+											  operation->bmr.smgr->smgr_rlocator.locator.spcOid,
+											  operation->bmr.smgr->smgr_rlocator.locator.dbOid,
+											  operation->bmr.smgr->smgr_rlocator.locator.relNumber,
+											  operation->bmr.smgr->smgr_rlocator.backend,
+											  false);
+		}
 
-	return BufferDescriptorGetBuffer(bufHdr);
+		VacuumPageMiss += io_buffers_len;
+		if (VacuumCostActive)
+			VacuumCostBalance += VacuumCostPageMiss * io_buffers_len;
+	}
 }
 
 /*
- * BufferAlloc -- subroutine for ReadBuffer.  Handles lookup of a shared
- *		buffer.  If no buffer exists already, selects a replacement
- *		victim and evicts the old page, but does NOT read in new page.
+ * BufferAlloc -- subroutine for PinBufferForBlock.  Handles lookup of a shared
+ *		buffer.  If no buffer exists already, selects a replacement victim and
+ *		evicts the old page, but does NOT read in new page.
  *
  * "strategy" can be a buffer replacement strategy object, or NULL for
  * the default strategy.  The selected buffer's usage_count is advanced when
@@ -1223,11 +1516,7 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
  *
  * The returned buffer is pinned and is already marked as holding the
  * desired page.  If it already did have the desired page, *foundPtr is
- * set true.  Otherwise, *foundPtr is set false and the buffer is marked
- * as IO_IN_PROGRESS; ReadBuffer will now need to do I/O to fill it.
- *
- * *foundPtr is actually redundant with the buffer's BM_VALID flag, but
- * we keep it for simplicity in ReadBuffer.
+ * set true.  Otherwise, *foundPtr is set false.
  *
  * io_context is passed as an output parameter to avoid calling
  * IOContextForStrategy() when there is a shared buffers hit and no IO
@@ -1235,7 +1524,7 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
  *
  * No locks are held either at entry or exit.
  */
-static BufferDesc *
+static inline BufferDesc *
 BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 			BlockNumber blockNum,
 			BufferAccessStrategy strategy,
@@ -1286,19 +1575,10 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 		{
 			/*
 			 * We can only get here if (a) someone else is still reading in
-			 * the page, or (b) a previous read attempt failed.  We have to
-			 * wait for any active read attempt to finish, and then set up our
-			 * own read attempt if the page is still not BM_VALID.
-			 * StartBufferIO does it all.
+			 * the page, (b) a previous read attempt failed, or (c) someone
+			 * called StartReadBuffers() but not yet WaitReadBuffers().
 			 */
-			if (StartBufferIO(buf, true))
-			{
-				/*
-				 * If we get here, previous attempts to read the buffer must
-				 * have failed ... but we shall bravely try again.
-				 */
-				*foundPtr = false;
-			}
+			*foundPtr = false;
 		}
 
 		return buf;
@@ -1363,19 +1643,10 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 		{
 			/*
 			 * We can only get here if (a) someone else is still reading in
-			 * the page, or (b) a previous read attempt failed.  We have to
-			 * wait for any active read attempt to finish, and then set up our
-			 * own read attempt if the page is still not BM_VALID.
-			 * StartBufferIO does it all.
+			 * the page, (b) a previous read attempt failed, or (c) someone
+			 * called StartReadBuffers() but not yet WaitReadBuffers().
 			 */
-			if (StartBufferIO(existing_buf_hdr, true))
-			{
-				/*
-				 * If we get here, previous attempts to read the buffer must
-				 * have failed ... but we shall bravely try again.
-				 */
-				*foundPtr = false;
-			}
+			*foundPtr = false;
 		}
 
 		return existing_buf_hdr;
@@ -1407,15 +1678,9 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 	LWLockRelease(newPartitionLock);
 
 	/*
-	 * Buffer contents are currently invalid.  Try to obtain the right to
-	 * start I/O.  If StartBufferIO returns false, then someone else managed
-	 * to read it before we did, so there's nothing left for BufferAlloc() to
-	 * do.
+	 * Buffer contents are currently invalid.
 	 */
-	if (StartBufferIO(victim_buf_hdr, true))
-		*foundPtr = false;
-	else
-		*foundPtr = true;
+	*foundPtr = false;
 
 	return victim_buf_hdr;
 }
@@ -1769,7 +2034,7 @@ again:
  * pessimistic, but outside of toy-sized shared_buffers it should allow
  * sufficient pins.
  */
-static void
+void
 LimitAdditionalPins(uint32 *additional_pins)
 {
 	uint32		max_backends;
@@ -2034,7 +2299,7 @@ ExtendBufferedRelShared(BufferManagerRelation bmr,
 
 				buf_state &= ~BM_VALID;
 				UnlockBufHdr(existing_hdr, buf_state);
-			} while (!StartBufferIO(existing_hdr, true));
+			} while (!StartBufferIO(existing_hdr, true, false));
 		}
 		else
 		{
@@ -2057,7 +2322,7 @@ ExtendBufferedRelShared(BufferManagerRelation bmr,
 			LWLockRelease(partition_lock);
 
 			/* XXX: could combine the locked operations in it with the above */
-			StartBufferIO(victim_buf_hdr, true);
+			StartBufferIO(victim_buf_hdr, true, false);
 		}
 	}
 
@@ -2372,7 +2637,12 @@ PinBuffer(BufferDesc *buf, BufferAccessStrategy strategy)
 	else
 	{
 		/*
-		 * If we previously pinned the buffer, it must surely be valid.
+		 * If we previously pinned the buffer, it is likely to be valid, but
+		 * it may not be if StartReadBuffers() was called and
+		 * WaitReadBuffers() hasn't been called yet.  We'll check by loading
+		 * the flags without locking.  This is racy, but it's OK to return
+		 * false spuriously: when WaitReadBuffers() calls StartBufferIO(),
+		 * it'll see that it's now valid.
 		 *
 		 * Note: We deliberately avoid a Valgrind client request here.
 		 * Individual access methods can optionally superimpose buffer page
@@ -2381,7 +2651,7 @@ PinBuffer(BufferDesc *buf, BufferAccessStrategy strategy)
 		 * that the buffer page is legitimately non-accessible here.  We
 		 * cannot meddle with that.
 		 */
-		result = true;
+		result = (pg_atomic_read_u32(&buf->state) & BM_VALID) != 0;
 	}
 
 	ref->refcount++;
@@ -3449,7 +3719,7 @@ FlushBuffer(BufferDesc *buf, SMgrRelation reln, IOObject io_object,
 	 * someone else flushed the buffer before we could, so we need not do
 	 * anything.
 	 */
-	if (!StartBufferIO(buf, false))
+	if (!StartBufferIO(buf, false, false))
 		return;
 
 	/* Setup error traceback support for ereport() */
@@ -5184,9 +5454,15 @@ WaitIO(BufferDesc *buf)
  *
  * Returns true if we successfully marked the buffer as I/O busy,
  * false if someone else already did the work.
+ *
+ * If nowait is true, then we don't wait for an I/O to be finished by another
+ * backend.  In that case, false indicates either that the I/O was already
+ * finished, or is still in progress.  This is useful for callers that want to
+ * find out if they can perform the I/O as part of a larger operation, without
+ * waiting for the answer or distinguishing the reasons why not.
  */
 static bool
-StartBufferIO(BufferDesc *buf, bool forInput)
+StartBufferIO(BufferDesc *buf, bool forInput, bool nowait)
 {
 	uint32		buf_state;
 
@@ -5199,6 +5475,8 @@ StartBufferIO(BufferDesc *buf, bool forInput)
 		if (!(buf_state & BM_IO_IN_PROGRESS))
 			break;
 		UnlockBufHdr(buf, buf_state);
+		if (nowait)
+			return false;
 		WaitIO(buf);
 	}
 
diff --git a/src/backend/storage/buffer/localbuf.c b/src/backend/storage/buffer/localbuf.c
index fcfac335a57..985a2c7049c 100644
--- a/src/backend/storage/buffer/localbuf.c
+++ b/src/backend/storage/buffer/localbuf.c
@@ -108,10 +108,9 @@ PrefetchLocalBuffer(SMgrRelation smgr, ForkNumber forkNum,
  * LocalBufferAlloc -
  *	  Find or create a local buffer for the given page of the given relation.
  *
- * API is similar to bufmgr.c's BufferAlloc, except that we do not need
- * to do any locking since this is all local.   Also, IO_IN_PROGRESS
- * does not get set.  Lastly, we support only default access strategy
- * (hence, usage_count is always advanced).
+ * API is similar to bufmgr.c's BufferAlloc, except that we do not need to do
+ * any locking since this is all local.  We support only default access
+ * strategy (hence, usage_count is always advanced).
  */
 BufferDesc *
 LocalBufferAlloc(SMgrRelation smgr, ForkNumber forkNum, BlockNumber blockNum,
@@ -287,7 +286,7 @@ GetLocalVictimBuffer(void)
 }
 
 /* see LimitAdditionalPins() */
-static void
+void
 LimitAdditionalLocalPins(uint32 *additional_pins)
 {
 	uint32		max_pins;
@@ -297,9 +296,10 @@ LimitAdditionalLocalPins(uint32 *additional_pins)
 
 	/*
 	 * In contrast to LimitAdditionalPins() other backends don't play a role
-	 * here. We can allow up to NLocBuffer pins in total.
+	 * here. We can allow up to NLocBuffer pins in total, but it might not be
+	 * initialized yet so read num_temp_buffers.
 	 */
-	max_pins = (NLocBuffer - NLocalPinnedBuffers);
+	max_pins = (num_temp_buffers - NLocalPinnedBuffers);
 
 	if (*additional_pins >= max_pins)
 		*additional_pins = max_pins;
diff --git a/src/backend/storage/meson.build b/src/backend/storage/meson.build
index 40345bdca27..739d13293fb 100644
--- a/src/backend/storage/meson.build
+++ b/src/backend/storage/meson.build
@@ -1,5 +1,6 @@
 # Copyright (c) 2022-2024, PostgreSQL Global Development Group
 
+subdir('aio')
 subdir('buffer')
 subdir('file')
 subdir('freespace')
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 92fcd5fa4d5..c12784cbec8 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -3129,6 +3129,20 @@ struct config_int ConfigureNamesInt[] =
 		NULL
 	},
 
+	{
+		{"io_combine_limit",
+			PGC_USERSET,
+			RESOURCES_ASYNCHRONOUS,
+			gettext_noop("Limit on the size of data reads and writes."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&io_combine_limit,
+		DEFAULT_IO_COMBINE_LIMIT,
+		1, MAX_IO_COMBINE_LIMIT,
+		NULL, NULL, NULL
+	},
+
 	{
 		{"backend_flush_after", PGC_USERSET, RESOURCES_ASYNCHRONOUS,
 			gettext_noop("Number of pages after which previously performed writes are flushed to disk."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index adcc0257f91..baecde28410 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -203,6 +203,7 @@
 #backend_flush_after = 0		# measured in pages, 0 disables
 #effective_io_concurrency = 1		# 1-1000; 0 disables prefetching
 #maintenance_io_concurrency = 10	# 1-1000; 0 disables prefetching
+#io_combine_limit = 128kB		# usually 1-32 blocks (depends on OS)
 #max_worker_processes = 8		# (change requires restart)
 #max_parallel_workers_per_gather = 2	# limited by max_parallel_workers
 #max_parallel_maintenance_workers = 2	# limited by max_parallel_workers
diff --git a/src/include/storage/bufmgr.h b/src/include/storage/bufmgr.h
index d51d46d3353..241f68c45e1 100644
--- a/src/include/storage/bufmgr.h
+++ b/src/include/storage/bufmgr.h
@@ -14,6 +14,7 @@
 #ifndef BUFMGR_H
 #define BUFMGR_H
 
+#include "port/pg_iovec.h"
 #include "storage/block.h"
 #include "storage/buf.h"
 #include "storage/bufpage.h"
@@ -133,6 +134,10 @@ extern PGDLLIMPORT bool track_io_timing;
 extern PGDLLIMPORT int effective_io_concurrency;
 extern PGDLLIMPORT int maintenance_io_concurrency;
 
+#define MAX_IO_COMBINE_LIMIT PG_IOV_MAX
+#define DEFAULT_IO_COMBINE_LIMIT Min(MAX_IO_COMBINE_LIMIT, (128 * 1024) / BLCKSZ)
+extern PGDLLIMPORT int io_combine_limit;
+
 extern PGDLLIMPORT int checkpoint_flush_after;
 extern PGDLLIMPORT int backend_flush_after;
 extern PGDLLIMPORT int bgwriter_flush_after;
@@ -158,7 +163,6 @@ extern PGDLLIMPORT int32 *LocalRefCount;
 #define BUFFER_LOCK_SHARE		1
 #define BUFFER_LOCK_EXCLUSIVE	2
 
-
 /*
  * prototypes for functions in bufmgr.c
  */
@@ -177,6 +181,38 @@ extern Buffer ReadBufferWithoutRelcache(RelFileLocator rlocator,
 										ForkNumber forkNum, BlockNumber blockNum,
 										ReadBufferMode mode, BufferAccessStrategy strategy,
 										bool permanent);
+
+#define READ_BUFFERS_ZERO_ON_ERROR 0x01
+#define READ_BUFFERS_ISSUE_ADVICE 0x02
+
+struct ReadBuffersOperation
+{
+	/* The following members should be set by the caller. */
+	BufferManagerRelation bmr;
+	ForkNumber	forknum;
+	BufferAccessStrategy strategy;
+
+	/* The following private members should not be accessed directly. */
+	Buffer	   *buffers;
+	BlockNumber blocknum;
+	int			flags;
+	int16		nblocks;
+	int16		io_buffers_len;
+};
+
+typedef struct ReadBuffersOperation ReadBuffersOperation;
+
+extern bool StartReadBuffer(ReadBuffersOperation *operation,
+							Buffer *buffer,
+							BlockNumber blocknum,
+							int flags);
+extern bool StartReadBuffers(ReadBuffersOperation *operation,
+							 Buffer *buffers,
+							 BlockNumber blocknum,
+							 int *nblocks,
+							 int flags);
+extern void WaitReadBuffers(ReadBuffersOperation *operation);
+
 extern void ReleaseBuffer(Buffer buffer);
 extern void UnlockReleaseBuffer(Buffer buffer);
 extern bool BufferIsExclusiveLocked(Buffer buffer);
@@ -250,6 +286,9 @@ extern bool HoldingBufferPinThatDelaysRecovery(void);
 
 extern bool BgBufferSync(struct WritebackContext *wb_context);
 
+extern void LimitAdditionalPins(uint32 *additional_pins);
+extern void LimitAdditionalLocalPins(uint32 *additional_pins);
+
 /* in buf_init.c */
 extern void InitBufferPool(void);
 extern Size BufferShmemSize(void);
diff --git a/src/include/storage/read_stream.h b/src/include/storage/read_stream.h
new file mode 100644
index 00000000000..f5dbc087b0b
--- /dev/null
+++ b/src/include/storage/read_stream.h
@@ -0,0 +1,63 @@
+/*-------------------------------------------------------------------------
+ *
+ * read_stream.h
+ *	  Mechanism for accessing buffered relation data with look-ahead
+ *
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/storage/read_stream.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef READ_STREAM_H
+#define READ_STREAM_H
+
+#include "storage/bufmgr.h"
+
+/* Default tuning, reasonable for many users. */
+#define READ_STREAM_DEFAULT 0x00
+
+/*
+ * I/O streams that are performing maintenance work on behalf of potentially
+ * many users, and thus should be governed by maintenance_io_concurrency
+ * instead of effective_io_concurrency.  For example, VACUUM or CREATE INDEX.
+ */
+#define READ_STREAM_MAINTENANCE 0x01
+
+/*
+ * We usually avoid issuing prefetch advice automatically when sequential
+ * access is detected, but this flag explicitly disables it, for cases that
+ * might not be correctly detected.  Explicit advice is known to perform worse
+ * than letting the kernel (at least Linux) detect sequential access.
+ */
+#define READ_STREAM_SEQUENTIAL 0x02
+
+/*
+ * We usually ramp up from smaller reads to larger ones, to support users who
+ * don't know if it's worth reading lots of buffers yet.  This flag disables
+ * that, declaring ahead of time that we'll be reading all available buffers.
+ */
+#define READ_STREAM_FULL 0x04
+
+struct ReadStream;
+typedef struct ReadStream ReadStream;
+
+/* Callback that returns the next block number to read. */
+typedef BlockNumber (*ReadStreamBlockNumberCB) (ReadStream *stream,
+												void *callback_private_data,
+												void *per_buffer_data);
+
+extern ReadStream *read_stream_begin_relation(int flags,
+											  BufferAccessStrategy strategy,
+											  BufferManagerRelation bmr,
+											  ForkNumber forknum,
+											  ReadStreamBlockNumberCB callback,
+											  void *callback_private_data,
+											  size_t per_buffer_data_size);
+extern Buffer read_stream_next_buffer(ReadStream *stream, void **per_buffer_private);
+extern void read_stream_reset(ReadStream *stream);
+extern void read_stream_end(ReadStream *stream);
+
+#endif							/* READ_STREAM_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 9add48f9924..1590dc6fddc 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1215,6 +1215,7 @@ InjectionPointCacheEntry
 InjectionPointEntry
 InjectionPointSharedState
 InlineCodeBlock
+InProgressIO
 InsertStmt
 Instrumentation
 Int128AggState
@@ -2288,11 +2289,13 @@ ReInitializeDSMForeignScan_function
 ReScanForeignScan_function
 ReadBufPtrType
 ReadBufferMode
+ReadBuffersOperation
 ReadBytePtrType
 ReadExtraTocPtrType
 ReadFunc
 ReadLocalXLogPageNoWaitPrivate
 ReadReplicationSlotCmd
+ReadStream
 ReassignOwnedStmt
 RecheckForeignScan_function
 RecordCacheArrayEntry
-- 
2.40.1

v7-0004-Sequential-scans-and-TID-range-scans-stream-reads.patchtext/x-diff; charset=us-asciiDownload
From dda58b9b556001a336b2ddaa0d516abce4b10f3f Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 27 Mar 2024 20:25:06 -0400
Subject: [PATCH v7 4/4] Sequential scans and TID range scans stream reads

Implementing streaming read support for heap sequential scans and TID
range scans includes three parts:

Allocate the read stream object in heap_beginscan(). On rescan, reset
the stream by releasing all pinned buffers and resetting the prefetch
block.

Implement a callback returning the next block to prefetch to the
read stream infrastructure.

Invoke the read stream API when a new page is needed. When the scan
direction changes, reset the stream.
---
 src/backend/access/heap/heapam.c | 90 ++++++++++++++++++++++++++++----
 src/include/access/heapam.h      | 15 ++++++
 2 files changed, 95 insertions(+), 10 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 46645dc971f..6b550308e58 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -221,6 +221,25 @@ static const int MultiXactStatusLock[MaxMultiXactStatus + 1] =
  * ----------------------------------------------------------------
  */
 
+static BlockNumber
+heap_scan_stream_read_next(ReadStream *pgsr, void *private_data,
+						   void *per_buffer_data)
+{
+	HeapScanDesc scan = (HeapScanDesc) private_data;
+
+	if (unlikely(!scan->rs_inited))
+	{
+		scan->rs_prefetch_block = heapgettup_initial_block(scan, scan->rs_dir);
+		scan->rs_inited = true;
+	}
+	else
+		scan->rs_prefetch_block = heapgettup_advance_block(scan,
+														   scan->rs_prefetch_block,
+														   scan->rs_dir);
+
+	return scan->rs_prefetch_block;
+}
+
 /* ----------------
  *		initscan - scan code common to heap_beginscan and heap_rescan
  * ----------------
@@ -323,6 +342,13 @@ initscan(HeapScanDesc scan, ScanKey key, bool keep_startblock)
 	scan->rs_cbuf = InvalidBuffer;
 	scan->rs_cblock = InvalidBlockNumber;
 
+	/*
+	 * Initialize to ForwardScanDirection because it is most common and heap
+	 * scans usually must go forwards before going backward.
+	 */
+	scan->rs_dir = ForwardScanDirection;
+	scan->rs_prefetch_block = InvalidBlockNumber;
+
 	/* page-at-a-time fields are always invalid when not rs_inited */
 
 	/*
@@ -465,6 +491,8 @@ heapbuildvis(TableScanDesc sscan)
 static pg_attribute_always_inline void
 heapfetchbuf(HeapScanDesc scan, ScanDirection dir)
 {
+	Assert(scan->rs_read_stream);
+
 	/* release previous scan buffer, if any */
 	if (BufferIsValid(scan->rs_cbuf))
 	{
@@ -479,19 +507,23 @@ heapfetchbuf(HeapScanDesc scan, ScanDirection dir)
 	 */
 	CHECK_FOR_INTERRUPTS();
 
-	if (unlikely(!scan->rs_inited))
+	/*
+	 * If the scan direction is changing, reset the prefetch block to the
+	 * current block. Otherwise, we will incorrectly prefetch the blocks
+	 * between the prefetch block and the current block again before
+	 * prefetching blocks in the new, correct scan direction.
+	 */
+	if (unlikely(scan->rs_dir != dir))
 	{
-		scan->rs_cblock = heapgettup_initial_block(scan, dir);
-		Assert(scan->rs_cblock != InvalidBlockNumber || !BufferIsValid(scan->rs_cbuf));
-		scan->rs_inited = true;
+		scan->rs_prefetch_block = scan->rs_cblock;
+		read_stream_reset(scan->rs_read_stream);
 	}
-	else
-		scan->rs_cblock = heapgettup_advance_block(scan, scan->rs_cblock, dir);
 
-	/* read block if valid */
-	if (BlockNumberIsValid(scan->rs_cblock))
-		scan->rs_cbuf = ReadBufferExtended(scan->rs_base.rs_rd, MAIN_FORKNUM,
-										   scan->rs_cblock, RBM_NORMAL, scan->rs_strategy);
+	scan->rs_dir = dir;
+
+	scan->rs_cbuf = read_stream_next_buffer(scan->rs_read_stream, NULL);
+	if (BufferIsValid(scan->rs_cbuf))
+		scan->rs_cblock = BufferGetBlockNumber(scan->rs_cbuf);
 }
 
 /*
@@ -820,6 +852,7 @@ continue_page:
 
 	scan->rs_cbuf = InvalidBuffer;
 	scan->rs_cblock = InvalidBlockNumber;
+	scan->rs_prefetch_block = InvalidBlockNumber;
 	tuple->t_data = NULL;
 	scan->rs_inited = false;
 }
@@ -910,6 +943,7 @@ continue_page:
 		ReleaseBuffer(scan->rs_cbuf);
 	scan->rs_cbuf = InvalidBuffer;
 	scan->rs_cblock = InvalidBlockNumber;
+	scan->rs_prefetch_block = InvalidBlockNumber;
 	tuple->t_data = NULL;
 	scan->rs_inited = false;
 }
@@ -1003,6 +1037,28 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 
 	initscan(scan, key, false);
 
+	scan->rs_read_stream = NULL;
+
+	/*
+	 * For sequential scans and TID range scans, we will set up a read stream.
+	 * We do not know the scan direction yet. If the scan does not end up
+	 * being a forward scan, the read stream will be freed. This should be
+	 * done after initscan() because initscan() allocates the
+	 * BufferAccessStrategy object.
+	 */
+	if (scan->rs_base.rs_flags & SO_TYPE_SEQSCAN ||
+		scan->rs_base.rs_flags & SO_TYPE_TIDRANGESCAN)
+	{
+		scan->rs_read_stream = read_stream_begin_relation(READ_STREAM_SEQUENTIAL,
+														  scan->rs_strategy,
+														  BMR_REL(scan->rs_base.rs_rd),
+														  MAIN_FORKNUM,
+														  heap_scan_stream_read_next,
+														  scan,
+														  0);
+	}
+
+
 	return (TableScanDesc) scan;
 }
 
@@ -1037,6 +1093,14 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
 	if (BufferIsValid(scan->rs_cbuf))
 		ReleaseBuffer(scan->rs_cbuf);
 
+	/*
+	 * The read stream is reset on rescan. This must be done before
+	 * initscan(), as some state referred to by read_stream_reset() is reset
+	 * in initscan().
+	 */
+	if (scan->rs_read_stream)
+		read_stream_reset(scan->rs_read_stream);
+
 	/*
 	 * reinitialize scan descriptor
 	 */
@@ -1056,6 +1120,12 @@ heap_endscan(TableScanDesc sscan)
 	if (BufferIsValid(scan->rs_cbuf))
 		ReleaseBuffer(scan->rs_cbuf);
 
+	/*
+	 * Must free the read stream before freeing the BufferAccessStrategy.
+	 */
+	if (scan->rs_read_stream)
+		read_stream_end(scan->rs_read_stream);
+
 	/*
 	 * decrement relation reference count and free scan descriptor storage
 	 */
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 5a6d22a4732..3ec78672587 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -25,6 +25,7 @@
 #include "storage/bufpage.h"
 #include "storage/dsm.h"
 #include "storage/lockdefs.h"
+#include "storage/read_stream.h"
 #include "storage/shm_toc.h"
 #include "utils/relcache.h"
 #include "utils/snapshot.h"
@@ -66,6 +67,20 @@ typedef struct HeapScanDescData
 
 	HeapTupleData rs_ctup;		/* current tuple in scan, if any */
 
+	/* For scans that stream reads */
+	ReadStream *rs_read_stream;
+
+	/*
+	 * For sequential scans and TID range scans to stream reads. The read
+	 * stream is allocated at the beginning of the scan and reset on rescan or
+	 * when the scan direction changes. The scan direction is saved each time
+	 * a new page is requested. If the scan direction changes from one page to
+	 * the next, the read stream releases all previously pinned buffers and
+	 * resets the prefetch block.
+	 */
+	ScanDirection rs_dir;
+	BlockNumber rs_prefetch_block;
+
 	/*
 	 * For parallel scans to store page allocation data.  NULL when not
 	 * performing a parallel scan.
-- 
2.40.1

#13Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Melanie Plageman (#12)
Re: Streaming read-ready sequential scan code

On 01/04/2024 22:58, Melanie Plageman wrote:

Attached v7 has version 14 of the streaming read API as well as a few
small tweaks to comments and code.

I saw benchmarks in this thread to show that there's no regression when
the data is in cache, but I didn't see any benchmarks demonstrating the
benefit of this. So I ran this quick test:

-- create a ~1 GB table with only 1 row per page.
CREATE TABLE giga (i int, filler text) with (fillfactor=10);
insert into giga select g, repeat('x', 900) from generate_series(1,
140000) g;
vacuum freeze giga;

\timing on
select count(*) from giga;

The SELECT takes about 390 ms on 'master', and 230 ms with the patch.

This is pretty much the best case for this patch; real-world gains will
be much smaller. Nevertheless, nice speedup!

--
Heikki Linnakangas
Neon (https://neon.tech)

#14Melanie Plageman
melanieplageman@gmail.com
In reply to: Heikki Linnakangas (#13)
3 attachment(s)
Re: Streaming read-ready sequential scan code

On Tue, Apr 2, 2024 at 1:10 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

On 01/04/2024 22:58, Melanie Plageman wrote:

Attached v7 has version 14 of the streaming read API as well as a few
small tweaks to comments and code.

I saw benchmarks in this thread to show that there's no regression when
the data is in cache, but I didn't see any benchmarks demonstrating the
benefit of this. So I ran this quick test:

Good point! It would be good to show why we would actually want this
patch. Attached v8 is rebased over current master (which now has the
streaming read API).

On the topic of BAS_BULKREAD buffer access strategy, I think the least
we could do is add an assert like this to read_stream_begin_relation()
after calculating max_pinned_buffers.

Assert(GetAccessStrategyBufferCount(strategy) > max_pinned_buffers);

Perhaps we should do more? I think with a ring size of 16 MB, large
SELECTs are safe for now. But I know future developers will make
changes and it would be good not to assume they will understand that
pinning more buffers than the size of the ring effectively invalidates
the ring.
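
Just to make that concrete, a minimal sketch of how the check could look
inside read_stream_begin_relation(), guarded for the no-strategy case
(the placement and the NULL guard are my assumption, not tested code):

    if (strategy != NULL)
        Assert(GetAccessStrategyBufferCount(strategy) > max_pinned_buffers);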

- Melanie

Attachments:

v8-0003-Sequential-scans-and-TID-range-scans-stream-reads.patchtext/x-patch; charset=US-ASCII; name=v8-0003-Sequential-scans-and-TID-range-scans-stream-reads.patchDownload
From cfccafec650a77c53b1d78180b52db31742181ff Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 27 Mar 2024 20:25:06 -0400
Subject: [PATCH v8 3/3] Sequential scans and TID range scans stream reads

Implementing streaming read support for heap sequential scans and TID
range scans includes three parts:

Allocate the read stream object in heap_beginscan(). On rescan, reset
the stream by releasing all pinned buffers and resetting the prefetch
block.

Implement a callback returning the next block to prefetch to the
read stream infrastructure.

Invoke the read stream API when a new page is needed. When the scan
direction changes, reset the stream.
---
 src/backend/access/heap/heapam.c | 94 ++++++++++++++++++++++++++++----
 src/include/access/heapam.h      | 15 +++++
 2 files changed, 97 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 6b26f5bf8af..3546f637c13 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -221,6 +221,25 @@ static const int MultiXactStatusLock[MaxMultiXactStatus + 1] =
  * ----------------------------------------------------------------
  */
 
+static BlockNumber
+heap_scan_stream_read_next(ReadStream *pgsr, void *private_data,
+						   void *per_buffer_data)
+{
+	HeapScanDesc scan = (HeapScanDesc) private_data;
+
+	if (unlikely(!scan->rs_inited))
+	{
+		scan->rs_prefetch_block = heapgettup_initial_block(scan, scan->rs_dir);
+		scan->rs_inited = true;
+	}
+	else
+		scan->rs_prefetch_block = heapgettup_advance_block(scan,
+														   scan->rs_prefetch_block,
+														   scan->rs_dir);
+
+	return scan->rs_prefetch_block;
+}
+
 /* ----------------
  *		initscan - scan code common to heap_beginscan and heap_rescan
  * ----------------
@@ -323,6 +342,13 @@ initscan(HeapScanDesc scan, ScanKey key, bool keep_startblock)
 	scan->rs_cbuf = InvalidBuffer;
 	scan->rs_cblock = InvalidBlockNumber;
 
+	/*
+	 * Initialize to ForwardScanDirection because it is most common and heap
+	 * scans usually must go forwards before going backward.
+	 */
+	scan->rs_dir = ForwardScanDirection;
+	scan->rs_prefetch_block = InvalidBlockNumber;
+
 	/* page-at-a-time fields are always invalid when not rs_inited */
 
 	/*
@@ -459,12 +485,14 @@ heapbuildvis(TableScanDesc sscan)
 /*
  * heapfetchbuf - subroutine for heapgettup()
  *
- * This routine reads the next block of the relation into a buffer and returns
- * with that pinned buffer saved in the scan descriptor.
+ * This routine gets the next block of the relation from the read stream
+ * and saves that pinned buffer in the scan descriptor.
  */
 static inline void
 heapfetchbuf(HeapScanDesc scan, ScanDirection dir)
 {
+	Assert(scan->rs_read_stream);
+
 	/* release previous scan buffer, if any */
 	if (BufferIsValid(scan->rs_cbuf))
 	{
@@ -479,19 +507,23 @@ heapfetchbuf(HeapScanDesc scan, ScanDirection dir)
 	 */
 	CHECK_FOR_INTERRUPTS();
 
-	if (unlikely(!scan->rs_inited))
+	/*
+	 * If the scan direction is changing, reset the prefetch block to the
+	 * current block. Otherwise, we will incorrectly prefetch the blocks
+	 * between the prefetch block and the current block again before
+	 * prefetching blocks in the new, correct scan direction.
+	 */
+	if (unlikely(scan->rs_dir != dir))
 	{
-		scan->rs_cblock = heapgettup_initial_block(scan, dir);
-		Assert(scan->rs_cblock != InvalidBlockNumber || !BufferIsValid(scan->rs_cbuf));
-		scan->rs_inited = true;
+		scan->rs_prefetch_block = scan->rs_cblock;
+		read_stream_reset(scan->rs_read_stream);
 	}
-	else
-		scan->rs_cblock = heapgettup_advance_block(scan, scan->rs_cblock, dir);
 
-	/* read block if valid */
-	if (BlockNumberIsValid(scan->rs_cblock))
-		scan->rs_cbuf = ReadBufferExtended(scan->rs_base.rs_rd, MAIN_FORKNUM,
-										   scan->rs_cblock, RBM_NORMAL, scan->rs_strategy);
+	scan->rs_dir = dir;
+
+	scan->rs_cbuf = read_stream_next_buffer(scan->rs_read_stream, NULL);
+	if (BufferIsValid(scan->rs_cbuf))
+		scan->rs_cblock = BufferGetBlockNumber(scan->rs_cbuf);
 }
 
 /*
@@ -820,6 +852,7 @@ continue_page:
 
 	scan->rs_cbuf = InvalidBuffer;
 	scan->rs_cblock = InvalidBlockNumber;
+	scan->rs_prefetch_block = InvalidBlockNumber;
 	tuple->t_data = NULL;
 	scan->rs_inited = false;
 }
@@ -910,6 +943,7 @@ continue_page:
 		ReleaseBuffer(scan->rs_cbuf);
 	scan->rs_cbuf = InvalidBuffer;
 	scan->rs_cblock = InvalidBlockNumber;
+	scan->rs_prefetch_block = InvalidBlockNumber;
 	tuple->t_data = NULL;
 	scan->rs_inited = false;
 }
@@ -1003,6 +1037,28 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 
 	initscan(scan, key, false);
 
+	scan->rs_read_stream = NULL;
+
+	/*
+	 * For sequential scans and TID range scans, we will set up a read stream.
+	 * We do not know the scan direction yet. If the scan does not end up
+	 * being a forward scan, the read stream will be freed. This should be
+	 * done after initscan() because initscan() allocates the
+	 * BufferAccessStrategy object.
+	 */
+	if (scan->rs_base.rs_flags & SO_TYPE_SEQSCAN ||
+		scan->rs_base.rs_flags & SO_TYPE_TIDRANGESCAN)
+	{
+		scan->rs_read_stream = read_stream_begin_relation(READ_STREAM_SEQUENTIAL,
+														  scan->rs_strategy,
+														  scan->rs_base.rs_rd,
+														  MAIN_FORKNUM,
+														  heap_scan_stream_read_next,
+														  scan,
+														  0);
+	}
+
+
 	return (TableScanDesc) scan;
 }
 
@@ -1037,6 +1093,14 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
 	if (BufferIsValid(scan->rs_cbuf))
 		ReleaseBuffer(scan->rs_cbuf);
 
+	/*
+	 * The read stream is reset on rescan. This must be done before
+	 * initscan(), as some state referred to by read_stream_reset() is reset
+	 * in initscan().
+	 */
+	if (scan->rs_read_stream)
+		read_stream_reset(scan->rs_read_stream);
+
 	/*
 	 * reinitialize scan descriptor
 	 */
@@ -1056,6 +1120,12 @@ heap_endscan(TableScanDesc sscan)
 	if (BufferIsValid(scan->rs_cbuf))
 		ReleaseBuffer(scan->rs_cbuf);
 
+	/*
+	 * Must free the read stream before freeing the BufferAccessStrategy.
+	 */
+	if (scan->rs_read_stream)
+		read_stream_end(scan->rs_read_stream);
+
 	/*
 	 * decrement relation reference count and free scan descriptor storage
 	 */
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 4d324c78e5b..41d32d5d95d 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -25,6 +25,7 @@
 #include "storage/bufpage.h"
 #include "storage/dsm.h"
 #include "storage/lockdefs.h"
+#include "storage/read_stream.h"
 #include "storage/shm_toc.h"
 #include "utils/relcache.h"
 #include "utils/snapshot.h"
@@ -70,6 +71,20 @@ typedef struct HeapScanDescData
 
 	HeapTupleData rs_ctup;		/* current tuple in scan, if any */
 
+	/* For scans that stream reads */
+	ReadStream *rs_read_stream;
+
+	/*
+	 * For sequential scans and TID range scans to stream reads. The read
+	 * stream is allocated at the beginning of the scan and reset on rescan or
+	 * when the scan direction changes. The scan direction is saved each time
+	 * a new page is requested. If the scan direction changes from one page to
+	 * the next, the read stream releases all previously pinned buffers and
+	 * resets the prefetch block.
+	 */
+	ScanDirection rs_dir;
+	BlockNumber rs_prefetch_block;
+
 	/*
 	 * For parallel scans to store page allocation data.  NULL when not
 	 * performing a parallel scan.
-- 
2.40.1

v8-0002-Replace-blocks-with-buffers-in-heapgettup-control.patchtext/x-patch; charset=US-ASCII; name=v8-0002-Replace-blocks-with-buffers-in-heapgettup-control.patchDownload
From 7b474f3b4fad7fcf092bae20e67cb020846be30f Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 29 Jan 2024 11:50:01 -0500
Subject: [PATCH v8 2/3] Replace blocks with buffers in heapgettup control flow

Future commits will introduce the sequential scan streaming read user
which will implement a callback returning the next block to read.
Sequential scans previously looped through the blocks in the relation,
synchronously reading in a block and then processing it. An
InvalidBlockNumber returned by heapgettup_advance_block() meant that the
relation was exhausted and all blocks had been processed.

The streaming read API may exhaust the blocks in a relation (having read
all of them into buffers) before they have all been processed by the
sequential scan. As such, the sequential scan should continue processing
blocks until heapfetchbuf() returns InvalidBuffer.

Note that this commit does not implement the streaming read API user. It
simply restructures heapgettup() and heapgettup_pagemode() to use
buffers instead of blocks for control flow.
---
 src/backend/access/heap/heapam.c | 79 ++++++++++++++------------------
 1 file changed, 35 insertions(+), 44 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index bf2b9b19e72..6b26f5bf8af 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -83,6 +83,9 @@ static Bitmapset *HeapDetermineColumnsInfo(Relation relation,
 static bool heap_acquire_tuplock(Relation relation, ItemPointer tid,
 								 LockTupleMode mode, LockWaitPolicy wait_policy,
 								 bool *have_tuple_lock);
+static inline BlockNumber heapgettup_advance_block(HeapScanDesc scan,
+												   BlockNumber block, ScanDirection dir);
+static inline BlockNumber heapgettup_initial_block(HeapScanDesc scan, ScanDirection dir);
 static void compute_new_xmax_infomask(TransactionId xmax, uint16 old_infomask,
 									  uint16 old_infomask2, TransactionId add_to_xmax,
 									  LockTupleMode mode, bool is_update,
@@ -456,14 +459,12 @@ heapbuildvis(TableScanDesc sscan)
 /*
  * heapfetchbuf - subroutine for heapgettup()
  *
- * This routine reads the specified block of the relation into a buffer and
- * returns with that pinned buffer saved in the scan descriptor.
+ * This routine reads the next block of the relation into a buffer and returns
+ * with that pinned buffer saved in the scan descriptor.
  */
 static inline void
-heapfetchbuf(HeapScanDesc scan, BlockNumber block)
+heapfetchbuf(HeapScanDesc scan, ScanDirection dir)
 {
-	Assert(block < scan->rs_nblocks);
-
 	/* release previous scan buffer, if any */
 	if (BufferIsValid(scan->rs_cbuf))
 	{
@@ -478,10 +479,19 @@ heapfetchbuf(HeapScanDesc scan, BlockNumber block)
 	 */
 	CHECK_FOR_INTERRUPTS();
 
-	/* read page using selected strategy */
-	scan->rs_cbuf = ReadBufferExtended(scan->rs_base.rs_rd, MAIN_FORKNUM, block,
-									   RBM_NORMAL, scan->rs_strategy);
-	scan->rs_cblock = block;
+	if (unlikely(!scan->rs_inited))
+	{
+		scan->rs_cblock = heapgettup_initial_block(scan, dir);
+		Assert(scan->rs_cblock != InvalidBlockNumber || !BufferIsValid(scan->rs_cbuf));
+		scan->rs_inited = true;
+	}
+	else
+		scan->rs_cblock = heapgettup_advance_block(scan, scan->rs_cblock, dir);
+
+	/* read block if valid */
+	if (BlockNumberIsValid(scan->rs_cblock))
+		scan->rs_cbuf = ReadBufferExtended(scan->rs_base.rs_rd, MAIN_FORKNUM,
+										   scan->rs_cblock, RBM_NORMAL, scan->rs_strategy);
 }
 
 /*
@@ -491,7 +501,7 @@ heapfetchbuf(HeapScanDesc scan, BlockNumber block)
  * occur with empty tables and in parallel scans when parallel workers get all
  * of the pages before we can get a chance to get our first page.
  */
-static BlockNumber
+BlockNumber
 heapgettup_initial_block(HeapScanDesc scan, ScanDirection dir)
 {
 	Assert(!scan->rs_inited);
@@ -631,7 +641,7 @@ heapgettup_continue_page(HeapScanDesc scan, ScanDirection dir, int *linesleft,
  * This also adjusts rs_numblocks when a limit has been imposed by
  * heap_setscanlimits().
  */
-static inline BlockNumber
+BlockNumber
 heapgettup_advance_block(HeapScanDesc scan, BlockNumber block, ScanDirection dir)
 {
 	if (ScanDirectionIsForward(dir))
@@ -729,23 +739,13 @@ heapgettup(HeapScanDesc scan,
 		   ScanKey key)
 {
 	HeapTuple	tuple = &(scan->rs_ctup);
-	BlockNumber block;
 	Page		page;
 	OffsetNumber lineoff;
 	int			linesleft;
 
-	if (unlikely(!scan->rs_inited))
-	{
-		block = heapgettup_initial_block(scan, dir);
-		/* ensure rs_cbuf is invalid when we get InvalidBlockNumber */
-		Assert(block != InvalidBlockNumber || !BufferIsValid(scan->rs_cbuf));
-		scan->rs_inited = true;
-	}
-	else
+	if (likely(scan->rs_inited))
 	{
 		/* continue from previously returned page/tuple */
-		block = scan->rs_cblock;
-
 		LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
 		page = heapgettup_continue_page(scan, dir, &linesleft, &lineoff);
 		goto continue_page;
@@ -755,9 +755,12 @@ heapgettup(HeapScanDesc scan,
 	 * advance the scan until we find a qualifying tuple or run out of stuff
 	 * to scan
 	 */
-	while (block != InvalidBlockNumber)
+	while (true)
 	{
-		heapfetchbuf(scan, block);
+		heapfetchbuf(scan, dir);
+		if (!BufferIsValid(scan->rs_cbuf))
+			break;
+		Assert(BufferGetBlockNumber(scan->rs_cbuf) == scan->rs_cblock);
 		LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
 		page = heapgettup_start_page(scan, dir, &linesleft, &lineoff);
 continue_page:
@@ -779,7 +782,7 @@ continue_page:
 
 			tuple->t_data = (HeapTupleHeader) PageGetItem(page, lpp);
 			tuple->t_len = ItemIdGetLength(lpp);
-			ItemPointerSet(&(tuple->t_self), block, lineoff);
+			ItemPointerSet(&(tuple->t_self), scan->rs_cblock, lineoff);
 
 			visible = HeapTupleSatisfiesVisibility(tuple,
 												   scan->rs_base.rs_snapshot,
@@ -809,9 +812,6 @@ continue_page:
 		 * it's time to move to the next.
 		 */
 		LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
-
-		/* get the BlockNumber to scan next */
-		block = heapgettup_advance_block(scan, block, dir);
 	}
 
 	/* end of scan */
@@ -844,22 +844,13 @@ heapgettup_pagemode(HeapScanDesc scan,
 					ScanKey key)
 {
 	HeapTuple	tuple = &(scan->rs_ctup);
-	BlockNumber block;
 	Page		page;
 	int			lineindex;
 	int			linesleft;
 
-	if (unlikely(!scan->rs_inited))
-	{
-		block = heapgettup_initial_block(scan, dir);
-		/* ensure rs_cbuf is invalid when we get InvalidBlockNumber */
-		Assert(block != InvalidBlockNumber || !BufferIsValid(scan->rs_cbuf));
-		scan->rs_inited = true;
-	}
-	else
+	if (likely(scan->rs_inited))
 	{
 		/* continue from previously returned page/tuple */
-		block = scan->rs_cblock;	/* current page */
 		page = BufferGetPage(scan->rs_cbuf);
 
 		lineindex = scan->rs_cindex + dir;
@@ -876,9 +867,12 @@ heapgettup_pagemode(HeapScanDesc scan,
 	 * advance the scan until we find a qualifying tuple or run out of stuff
 	 * to scan
 	 */
-	while (block != InvalidBlockNumber)
+	while (true)
 	{
-		heapfetchbuf(scan, block);
+		heapfetchbuf(scan, dir);
+		if (!BufferIsValid(scan->rs_cbuf))
+			break;
+		Assert(BufferGetBlockNumber(scan->rs_cbuf) == scan->rs_cblock);
 		heapbuildvis((TableScanDesc) scan);
 		page = BufferGetPage(scan->rs_cbuf);
 		linesleft = scan->rs_ntuples;
@@ -898,7 +892,7 @@ continue_page:
 
 			tuple->t_data = (HeapTupleHeader) PageGetItem(page, lpp);
 			tuple->t_len = ItemIdGetLength(lpp);
-			ItemPointerSet(&(tuple->t_self), block, lineoff);
+			ItemPointerSet(&(tuple->t_self), scan->rs_cblock, lineoff);
 
 			/* skip any tuples that don't match the scan key */
 			if (key != NULL &&
@@ -909,9 +903,6 @@ continue_page:
 			scan->rs_cindex = lineindex;
 			return;
 		}
-
-		/* get the BlockNumber to scan next */
-		block = heapgettup_advance_block(scan, block, dir);
 	}
 
 	/* end of scan */
-- 
2.40.1

v8-0001-Split-heapgetpage-into-two-parts.patchtext/x-patch; charset=US-ASCII; name=v8-0001-Split-heapgetpage-into-two-parts.patchDownload
From 2fb76f871d25a9a6967d700a5dccbe28473c0fc2 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 27 Jan 2024 18:39:37 -0500
Subject: [PATCH v8 1/3] Split heapgetpage into two parts

heapgetpage(), a per-block utility function used in heap scans, read a
passed-in block into a buffer and then, if page-at-a-time processing was
enabled, pruned the page and built an array of its visible tuples. This
was used for sequential and sample scans.

Future commits will add support for read streams. The streaming read API
will read in the blocks specified by a callback, but any significant
per-page processing should be done synchronously on the buffer yielded
by the streaming read API. To support this, separate the logic in
heapgetpage() to read a block into a buffer from that which prunes the
page and builds an array of the visible tuples. The former is now
heapfetchbuf() and the latter is now heapbuildvis().

Future commits will push the logic for selecting the next block into
heapfetchbuf() in cases when streaming reads are not supported (such as
backwards sequential scans). Because this logic differs for sample scans
and sequential scans, inline the code to read the block into a buffer
for sample scans.

This has the added benefit of allowing for a bit of refactoring in
heapam_scan_sample_next_block(), including unpinning the previous buffer
before invoking the callback to select the next block.
---
 src/backend/access/heap/heapam.c         | 74 ++++++++++++++----------
 src/backend/access/heap/heapam_handler.c | 40 +++++++++----
 src/include/access/heapam.h              |  2 +-
 3 files changed, 72 insertions(+), 44 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index a9d5b109a5e..bf2b9b19e72 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -360,17 +360,18 @@ heap_setscanlimits(TableScanDesc sscan, BlockNumber startBlk, BlockNumber numBlk
 }
 
 /*
- * heapgetpage - subroutine for heapgettup()
+ * heapbuildvis - Utility function for heap scans.
  *
- * This routine reads and pins the specified page of the relation.
- * In page-at-a-time mode it performs additional work, namely determining
- * which tuples on the page are visible.
+ * Given a page residing in a buffer saved in the scan descriptor, prune the
+ * page and determine which of its tuples are all visible, saving their offsets
+ * in an array in the scan descriptor.
  */
 void
-heapgetpage(TableScanDesc sscan, BlockNumber block)
+heapbuildvis(TableScanDesc sscan)
 {
 	HeapScanDesc scan = (HeapScanDesc) sscan;
-	Buffer		buffer;
+	Buffer		buffer = scan->rs_cbuf;
+	BlockNumber block = scan->rs_cblock;
 	Snapshot	snapshot;
 	Page		page;
 	int			lines;
@@ -378,31 +379,8 @@ heapgetpage(TableScanDesc sscan, BlockNumber block)
 	OffsetNumber lineoff;
 	bool		all_visible;
 
-	Assert(block < scan->rs_nblocks);
+	Assert(BufferGetBlockNumber(buffer) == block);
 
-	/* release previous scan buffer, if any */
-	if (BufferIsValid(scan->rs_cbuf))
-	{
-		ReleaseBuffer(scan->rs_cbuf);
-		scan->rs_cbuf = InvalidBuffer;
-	}
-
-	/*
-	 * Be sure to check for interrupts at least once per page.  Checks at
-	 * higher code levels won't be able to stop a seqscan that encounters many
-	 * pages' worth of consecutive dead tuples.
-	 */
-	CHECK_FOR_INTERRUPTS();
-
-	/* read page using selected strategy */
-	scan->rs_cbuf = ReadBufferExtended(scan->rs_base.rs_rd, MAIN_FORKNUM, block,
-									   RBM_NORMAL, scan->rs_strategy);
-	scan->rs_cblock = block;
-
-	if (!(scan->rs_base.rs_flags & SO_ALLOW_PAGEMODE))
-		return;
-
-	buffer = scan->rs_cbuf;
 	snapshot = scan->rs_base.rs_snapshot;
 
 	/*
@@ -475,6 +453,37 @@ heapgetpage(TableScanDesc sscan, BlockNumber block)
 	scan->rs_ntuples = ntup;
 }
 
+/*
+ * heapfetchbuf - subroutine for heapgettup()
+ *
+ * This routine reads the specified block of the relation into a buffer and
+ * returns with that pinned buffer saved in the scan descriptor.
+ */
+static inline void
+heapfetchbuf(HeapScanDesc scan, BlockNumber block)
+{
+	Assert(block < scan->rs_nblocks);
+
+	/* release previous scan buffer, if any */
+	if (BufferIsValid(scan->rs_cbuf))
+	{
+		ReleaseBuffer(scan->rs_cbuf);
+		scan->rs_cbuf = InvalidBuffer;
+	}
+
+	/*
+	 * Be sure to check for interrupts at least once per page.  Checks at
+	 * higher code levels won't be able to stop a seqscan that encounters many
+	 * pages' worth of consecutive dead tuples.
+	 */
+	CHECK_FOR_INTERRUPTS();
+
+	/* read page using selected strategy */
+	scan->rs_cbuf = ReadBufferExtended(scan->rs_base.rs_rd, MAIN_FORKNUM, block,
+									   RBM_NORMAL, scan->rs_strategy);
+	scan->rs_cblock = block;
+}
+
 /*
  * heapgettup_initial_block - return the first BlockNumber to scan
  *
@@ -748,7 +757,7 @@ heapgettup(HeapScanDesc scan,
 	 */
 	while (block != InvalidBlockNumber)
 	{
-		heapgetpage((TableScanDesc) scan, block);
+		heapfetchbuf(scan, block);
 		LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
 		page = heapgettup_start_page(scan, dir, &linesleft, &lineoff);
 continue_page:
@@ -869,7 +878,8 @@ heapgettup_pagemode(HeapScanDesc scan,
 	 */
 	while (block != InvalidBlockNumber)
 	{
-		heapgetpage((TableScanDesc) scan, block);
+		heapfetchbuf(scan, block);
+		heapbuildvis((TableScanDesc) scan);
 		page = BufferGetPage(scan->rs_cbuf);
 		linesleft = scan->rs_ntuples;
 		lineindex = ScanDirectionIsForward(dir) ? 0 : linesleft - 1;
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 0952d4a98eb..f4f670e9b24 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2352,11 +2352,14 @@ heapam_scan_sample_next_block(TableScanDesc scan, SampleScanState *scanstate)
 	if (hscan->rs_nblocks == 0)
 		return false;
 
-	if (tsm->NextSampleBlock)
+	if (BufferIsValid(hscan->rs_cbuf))
 	{
-		blockno = tsm->NextSampleBlock(scanstate, hscan->rs_nblocks);
-		hscan->rs_cblock = blockno;
+		ReleaseBuffer(hscan->rs_cbuf);
+		hscan->rs_cbuf = InvalidBuffer;
 	}
+
+	if (tsm->NextSampleBlock)
+		blockno = tsm->NextSampleBlock(scanstate, hscan->rs_nblocks);
 	else
 	{
 		/* scanning table sequentially */
@@ -2398,20 +2401,35 @@ heapam_scan_sample_next_block(TableScanDesc scan, SampleScanState *scanstate)
 		}
 	}
 
-	if (!BlockNumberIsValid(blockno))
+	hscan->rs_cblock = blockno;
+
+	if (!BlockNumberIsValid(hscan->rs_cblock))
 	{
-		if (BufferIsValid(hscan->rs_cbuf))
-			ReleaseBuffer(hscan->rs_cbuf);
-		hscan->rs_cbuf = InvalidBuffer;
-		hscan->rs_cblock = InvalidBlockNumber;
 		hscan->rs_inited = false;
-
 		return false;
 	}
 
-	heapgetpage(scan, blockno);
-	hscan->rs_inited = true;
+	Assert(hscan->rs_cblock < hscan->rs_nblocks);
+
+	/*
+	 * We may scan multiple pages before finding tuples to yield or finishing
+	 * the scan. Since we want to check for interrupts at least once per page,
+	 * do so here.
+	 */
+	CHECK_FOR_INTERRUPTS();
+
+	/* Read page using selected strategy */
+	hscan->rs_cbuf = ReadBufferExtended(hscan->rs_base.rs_rd, MAIN_FORKNUM,
+										hscan->rs_cblock, RBM_NORMAL, hscan->rs_strategy);
 
+	/*
+	 * If pagemode is allowed, prune the page and build an array of visible
+	 * tuple offsets.
+	 */
+	if (hscan->rs_base.rs_flags & SO_ALLOW_PAGEMODE)
+		heapbuildvis(scan);
+
+	hscan->rs_inited = true;
 	return true;
 }
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index a307fb5f245..4d324c78e5b 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -267,7 +267,7 @@ extern TableScanDesc heap_beginscan(Relation relation, Snapshot snapshot,
 									uint32 flags);
 extern void heap_setscanlimits(TableScanDesc sscan, BlockNumber startBlk,
 							   BlockNumber numBlks);
-extern void heapgetpage(TableScanDesc sscan, BlockNumber block);
+extern void heapbuildvis(TableScanDesc sscan);
 extern void heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
 						bool allow_strat, bool allow_sync, bool allow_pagemode);
 extern void heap_endscan(TableScanDesc sscan);
-- 
2.40.1

#15Thomas Munro
thomas.munro@gmail.com
In reply to: Melanie Plageman (#14)
Re: Streaming read-ready sequential scan code

On Thu, Apr 4, 2024 at 6:03 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:

On Tue, Apr 2, 2024 at 1:10 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

On 01/04/2024 22:58, Melanie Plageman wrote:

Attached v7 has version 14 of the streaming read API as well as a few
small tweaks to comments and code.

I saw benchmarks in this thread to show that there's no regression when
the data is in cache, but I didn't see any benchmarks demonstrating the
benefit of this. So I ran this quick test:

Good point! It would be good to show why we would actually want this
patch. Attached v8 is rebased over current master (which now has the
streaming read API).

Anecdotally by all reports I've seen so far, all-in-cache seems to be
consistently a touch faster than master if anything, for streaming seq
scan and streaming ANALYZE. That's great, but it doesn't seem to be
due to intentional changes. No efficiency is coming from batching
anything. Perhaps it has to do with CPU pipelining effects: though
it's doing the same work as ReadBuffer()-when-it-gets-a-hit, the work
itself is cut up into stages in a kind of pipeline:
read_stream_next_buffer() chooses page n + 2, pins page n + 1 and
returns page n each time you call it, so maybe we get more CPU
parallelism due to spreading the data dependencies out? (Makes me
wonder what happens if you insert a memory prefetch for the page
header into that production line, and if there are more opportunities
to spread dependencies out eg hashing the buffer tag ahead of time.)
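
To make that staggering concrete, each call looks very roughly like this
(pseudo-code with made-up names, not the actual read_stream.c internals):

    /* Each call advances three overlapping stages by one step. */
    Buffer
    stream_next(Stream *stream)
    {
        choose_block(stream, stream->next + 2);   /* pick page n + 2 */
        pin_block(stream, stream->next + 1);      /* pin page n + 1 */
        return stream->buffers[stream->next++];   /* return page n */
    }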

On the topic of BAS_BULKREAD buffer access strategy, I think the least
we could do is add an assert like this to read_stream_begin_relation()
after calculating max_pinned_buffers.

Assert(GetAccessStrategyBufferCount(strategy) > max_pinned_buffers);

Perhaps we should do more? I think with a ring size of 16 MB, large
SELECTs are safe for now. But I know future developers will make
changes and it would be good not to assume they will understand that
pinning more buffers than the size of the ring effectively invalidates
the ring.

Yeah I think we should clamp max_pinned_buffers if we see a strategy.
What do you think about:

if (strategy)
{
    int     strategy_buffers = GetAccessStrategyBufferCount(strategy);

    max_pinned_buffers = Min(strategy_buffers / 2, max_pinned_buffers);
}

I just don't know where to get that '2'. The idea would be to
hopefully never actually be constrained by it in practice, except in
tiny/toy setups, so we can't go too wrong with our number '2' there.

Then we should increase the default ring sizes for BAS_BULKREAD and
BAS_VACUUM to make the previous statement true. The size of main
memory and L2 cache have increased dramatically since those strategies
were invented. I think we should at least double them, and more
likely quadruple them. I realise you already made them configurable
per command in commit 1cbbee033, but I mean the hard coded default 256
in freelist.c. It's not only to get more space for our prefetching
plans, it's also to give the system more chance of flushing WAL
asynchronously/in some other backend before you crash into dirty data;
as you discovered, prefetching accidentally makes that effect worse
in a BAS_VACUUM strategy, by taking away space that is effectively
deferring WAL flushes, so I'd at least like to double the size if
we use "/ 2" above.

#16Melanie Plageman
melanieplageman@gmail.com
In reply to: Thomas Munro (#15)
Re: Streaming read-ready sequential scan code

On Wed, Apr 3, 2024 at 4:28 PM Thomas Munro <thomas.munro@gmail.com> wrote:

On Thu, Apr 4, 2024 at 6:03 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:

On the topic of BAS_BULKREAD buffer access strategy, I think the least
we could do is add an assert like this to read_stream_begin_relation()
after calculating max_pinned_buffers.

Assert(GetAccessStrategyBufferCount(strategy) > max_pinned_buffers);

Perhaps we should do more? I think with a ring size of 16 MB, large
SELECTs are safe for now. But I know future developers will make
changes and it would be good not to assume they will understand that
pinning more buffers than the size of the ring effectively invalidates
the ring.

Yeah I think we should clamp max_pinned_buffers if we see a strategy.
What do you think about:

if (strategy)
{
    int     strategy_buffers = GetAccessStrategyBufferCount(strategy);

    max_pinned_buffers = Min(strategy_buffers / 2, max_pinned_buffers);
}

I just don't know where to get that '2'. The idea would be to
hopefully never actually be constrained by it in practice, except in
tiny/toy setups, so we can't go too wrong with our number '2' there.

Yea, I don't actually understand why not just use strategy_buffers - 1
or something. 1/2 seems like a big limiting factor for those
strategies with small rings.

I don't really think it will come up for SELECT queries since they
rely on readahead and not prefetching.
It does seem like it could easily come up for analyze.

But I am on board with the idea of clamping for sequential scan/TID
range scan. For vacuum, we might have to think harder. If the user
specifies buffer_usage_limit and io_combine_limit and they are never
reaching IOs of io_combine_limit because of their buffer_usage_limit
value, then we should probably inform them.
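
Something along these lines, purely as a sketch (the exact condition and
the message wording are placeholders, not a concrete proposal):

    /* Hypothetical: the strategy ring can never hold one full combined read. */
    if (strategy != NULL &&
        GetAccessStrategyBufferCount(strategy) < io_combine_limit)
        ereport(WARNING,
                (errmsg("buffer_usage_limit is too small to ever reach io_combine_limit")));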

- Melanie

#17Andres Freund
andres@anarazel.de
In reply to: Thomas Munro (#15)
Re: Streaming read-ready sequential scan code

Hi,

On 2024-04-04 09:27:35 +1300, Thomas Munro wrote:

Anecdotally by all reports I've seen so far, all-in-cache seems to be
consistently a touch faster than master if anything, for streaming seq
scan and streaming ANALYZE. That's great, but it doesn't seem to be
due to intentional changes. No efficiency is coming from batching
anything. Perhaps it has to do with CPU pipelining effects: though
it's doing the same work as ReadBuffer()-when-it-gets-a-hit, the work
itself is cut up into stages in a kind of pipeline:
read_stream_next_buffer() chooses page n + 2, pins page n + 1 and
returns page n each time you call it, so maybe we get more CPU
parallelism due to spreading the data dependencies out?

Another theory is that it's due to the plain ReadBuffer() path needing to do
RelationGetSmgr(reln) on every call, whereas the streaming read path doesn't
need to.

On the topic of BAS_BULKREAD buffer access strategy, I think the least
we could do is add an assert like this to read_stream_begin_relation()
after calculating max_pinned_buffers.

Assert(GetAccessStrategyBufferCount(strategy) > max_pinned_buffers);

Perhaps we should do more? I think with a ring size of 16 MB, large
SELECTs are safe for now. But I know future developers will make
changes and it would be good not to assume they will understand that
pinning more buffers than the size of the ring effectively invalidates
the ring.

Yeah I think we should clamp max_pinned_buffers if we see a strategy.
What do you think about:

if (strategy)
{
    int     strategy_buffers = GetAccessStrategyBufferCount(strategy);

    max_pinned_buffers = Min(strategy_buffers / 2, max_pinned_buffers);
}

I just don't know where to get that '2'. The idea would be to
hopefully never actually be constrained by it in practice, except in
tiny/toy setups, so we can't go too wrong with our number '2' there.

The / 2 is to avoid causing unnecessarily frequent WAL flushes, right? If so,
should we apply that only if the strategy doesn't use the
StrategyRejectBuffer() logic?

I think it's fine to add a handwavy justification for the 2, saying that we
want to balance readahead distance and reducing WAL write frequency, and that
at some point more sophisticated logic will be needed.
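
For example, something along these lines over the clamp would do for now
(sketch only; the comment wording is obviously up for debate):

    if (strategy)
    {
        int     strategy_buffers = GetAccessStrategyBufferCount(strategy);

        /*
         * Crude balance between look-ahead distance and WAL flush
         * frequency: cap pins at half of the strategy ring so that the
         * other half keeps deferring WAL flushes.  More sophisticated
         * logic can replace this later.
         */
        max_pinned_buffers = Min(strategy_buffers / 2, max_pinned_buffers);
    }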

Then we should increase the default ring sizes for BAS_BULKREAD and
BAS_VACUUM to make the previous statement true.

I'm not sure it's right tying them together. The concerns for both are fairly
different.

The size of main memory and L2 cache have increased dramatically since those
strategies were invented. I think we should at least double them, and more
likely quadruple them. I realise you already made them configurable per
command in commit 1cbbee033, but I mean the hard coded default 256 in
freelist.c. It's not only to get more space for our prefetching plans, it's
also to give the system more chance of flushing WAL asynchronously/in some
other backend before you crash into dirty data; as you discovered,
prefetching accidentally makes that effect worse in a BAS_VACUUM strategy,
by taking away space that is effectively deferring WAL flushes, so I'd at
least like to double the size if we use "/ 2" above.

I think for VACUUM we should probably go a bit further. There's no comparable
L1/L2 issue, because the per-buffer processing + WAL insertion is a lot more
expensive, compared to a seqscan. I'd go for at least 4x-8x.

Greetings,

Andres Freund

#18David Rowley
dgrowleyml@gmail.com
In reply to: Melanie Plageman (#14)
2 attachment(s)
Re: Streaming read-ready sequential scan code

On Thu, 4 Apr 2024 at 06:03, Melanie Plageman <melanieplageman@gmail.com> wrote:

Attached v8 is rebased over current master (which now has the
streaming read API).

I've looked at the v8-0001 patch.

I wasn't too keen on heapbuildvis() as a function name for an external
function. Since it also does pruning work, it seemed weird to make it
sound like it only did visibility work. Per our offline discussion
about names, I've changed it to what you suggested, which is
heap_page_prep().

Aside from that, there was an outdated comment: "In pageatatime mode,
heapgetpage() already did visibility checks," which was no longer true
as that's done in heapbuildvis() (now heap_page_prep()).

I also did a round of comment adjustments as there were a few things I
didn't like, e.g:

+ * heapfetchbuf - subroutine for heapgettup()

This is also used in heapgettup_pagemode(), so I thought it was a bad
idea for a function to list the places it thinks it's being called from. I
also thought it redundant to write "This routine" in the function head
comment. I think "this routine" is implied by the context. I ended up
with:

/*
* heapfetchbuf - read and pin the given MAIN_FORKNUM block number.
*
* Read the specified block of the scan relation into a buffer and pin that
* buffer before saving it in the scan descriptor.
*/

I'm happy with your changes to heapam_scan_sample_next_block(). I did
adjust the comment above CHECK_FOR_INTERRUPTS() so it was effectively
the same as the seqscan version, just with "seqscan" swapped for
"sample scan".

The only other thing I adjusted there was to use "blockno" in some
places where you were using hscan->rs_cblock. These all come after
the "hscan->rs_cblock = blockno;" line. My thinking is that this makes
it more likely we avoid re-reading the value from the struct when the
compiler can't prove the two values are still equivalent. Even if the
compiler is happy today, it would only take a code change that passes
hscan to some external function for the compiler to become uncertain
whether that function has adjusted rs_cblock; it would then take the
safe option of pulling the value out of the struct again, which is a
little more expensive as it requires some maths to figure out the
offset.
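
To illustrate with a tiny standalone sketch (made-up types, nothing from
heapam): the local variable can stay in a register across an opaque call,
while the struct member has to be reloaded:

#include <stdbool.h>
#include <stdint.h>

typedef uint32_t BlockNumber;

typedef struct ScanState
{
    BlockNumber rs_cblock;
    BlockNumber rs_nblocks;
} ScanState;

/* defined elsewhere; the compiler cannot see whether it writes to *scan */
extern void opaque_call(ScanState *scan);

bool
advance(ScanState *scan, BlockNumber blockno)
{
    scan->rs_cblock = blockno;

    opaque_call(scan);

    /*
     * "blockno" can stay in a register here; reading scan->rs_cblock
     * instead would force a reload from memory, because the compiler
     * must assume opaque_call() may have changed it.
     */
    return blockno < scan->rs_nblocks;
}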

I've attached v9-0001 and a delta of just my changes from v8.

David

Attachments:

v9-0001-Split-heapgetpage-into-two-parts.patchtext/plain; charset=US-ASCII; name=v9-0001-Split-heapgetpage-into-two-parts.patchDownload
From 90bfc63097c556d0921d8f9165731fb07ec04167 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 27 Jan 2024 18:39:37 -0500
Subject: [PATCH v9] Split heapgetpage into two parts

heapgetpage(), a per-block utility function used in heap scans, read a
passed-in block into a buffer and then, if page-at-a-time processing was
enabled, pruned the page and built an array of its visible tuples. This
was used for sequential and sample scans.

Future commits will add support for read streams. The streaming read API
will read in the blocks specified by a callback, but any significant
per-page processing should be done synchronously on the buffer yielded
by the streaming read API. To support this, separate the logic in
heapgetpage() to read a block into a buffer from that which prunes the
page and builds an array of the visible tuples. The former is now
heapfetchbuf() and the latter is now heapbuildvis().

Future commits will push the logic for selecting the next block into
heapfetchbuf() in cases when streaming reads are not supported (such as
backwards sequential scans). Because this logic differs for sample scans
and sequential scans, inline the code to read the block into a buffer
for sample scans.

This has the added benefit of allowing for a bit of refactoring in
heapam_scan_sample_next_block(), including unpinning the previous buffer
before invoking the callback to select the next block.
---
 src/backend/access/heap/heapam.c         | 78 ++++++++++++++----------
 src/backend/access/heap/heapam_handler.c | 38 ++++++++----
 src/include/access/heapam.h              |  2 +-
 3 files changed, 74 insertions(+), 44 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index a9d5b109a5..6524fc44a5 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -360,17 +360,17 @@ heap_setscanlimits(TableScanDesc sscan, BlockNumber startBlk, BlockNumber numBlk
 }
 
 /*
- * heapgetpage - subroutine for heapgettup()
+ * heap_page_prep - Prepare the current scan page to be scanned in pagemode
  *
- * This routine reads and pins the specified page of the relation.
- * In page-at-a-time mode it performs additional work, namely determining
- * which tuples on the page are visible.
+ * Preparation currently consists of 1. prune the scan's rs_cbuf page, and 2.
+ * fill the rs_vistuples array with the OffsetNumbers of visible tuples.
  */
 void
-heapgetpage(TableScanDesc sscan, BlockNumber block)
+heap_page_prep(TableScanDesc sscan)
 {
 	HeapScanDesc scan = (HeapScanDesc) sscan;
-	Buffer		buffer;
+	Buffer		buffer = scan->rs_cbuf;
+	BlockNumber block = scan->rs_cblock;
 	Snapshot	snapshot;
 	Page		page;
 	int			lines;
@@ -378,31 +378,10 @@ heapgetpage(TableScanDesc sscan, BlockNumber block)
 	OffsetNumber lineoff;
 	bool		all_visible;
 
-	Assert(block < scan->rs_nblocks);
-
-	/* release previous scan buffer, if any */
-	if (BufferIsValid(scan->rs_cbuf))
-	{
-		ReleaseBuffer(scan->rs_cbuf);
-		scan->rs_cbuf = InvalidBuffer;
-	}
-
-	/*
-	 * Be sure to check for interrupts at least once per page.  Checks at
-	 * higher code levels won't be able to stop a seqscan that encounters many
-	 * pages' worth of consecutive dead tuples.
-	 */
-	CHECK_FOR_INTERRUPTS();
-
-	/* read page using selected strategy */
-	scan->rs_cbuf = ReadBufferExtended(scan->rs_base.rs_rd, MAIN_FORKNUM, block,
-									   RBM_NORMAL, scan->rs_strategy);
-	scan->rs_cblock = block;
-
-	if (!(scan->rs_base.rs_flags & SO_ALLOW_PAGEMODE))
-		return;
+	Assert(BufferGetBlockNumber(buffer) == block);
 
-	buffer = scan->rs_cbuf;
+	/* ensure we're not accidentally being used when not in pagemode */
+	Assert(scan->rs_base.rs_flags & SO_ALLOW_PAGEMODE);
 	snapshot = scan->rs_base.rs_snapshot;
 
 	/*
@@ -475,6 +454,37 @@ heapgetpage(TableScanDesc sscan, BlockNumber block)
 	scan->rs_ntuples = ntup;
 }
 
+/*
+ * heapfetchbuf - read and pin the given MAIN_FORKNUM block number.
+ *
+ * Read the specified block of the scan relation into a buffer and pin that
+ * buffer before saving it in the scan descriptor.
+ */
+static inline void
+heapfetchbuf(HeapScanDesc scan, BlockNumber block)
+{
+	Assert(block < scan->rs_nblocks);
+
+	/* release previous scan buffer, if any */
+	if (BufferIsValid(scan->rs_cbuf))
+	{
+		ReleaseBuffer(scan->rs_cbuf);
+		scan->rs_cbuf = InvalidBuffer;
+	}
+
+	/*
+	 * Be sure to check for interrupts at least once per page.  Checks at
+	 * higher code levels won't be able to stop a seqscan that encounters many
+	 * pages' worth of consecutive dead tuples.
+	 */
+	CHECK_FOR_INTERRUPTS();
+
+	/* read page using selected strategy */
+	scan->rs_cbuf = ReadBufferExtended(scan->rs_base.rs_rd, MAIN_FORKNUM, block,
+									   RBM_NORMAL, scan->rs_strategy);
+	scan->rs_cblock = block;
+}
+
 /*
  * heapgettup_initial_block - return the first BlockNumber to scan
  *
@@ -748,7 +758,7 @@ heapgettup(HeapScanDesc scan,
 	 */
 	while (block != InvalidBlockNumber)
 	{
-		heapgetpage((TableScanDesc) scan, block);
+		heapfetchbuf(scan, block);
 		LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
 		page = heapgettup_start_page(scan, dir, &linesleft, &lineoff);
 continue_page:
@@ -869,7 +879,11 @@ heapgettup_pagemode(HeapScanDesc scan,
 	 */
 	while (block != InvalidBlockNumber)
 	{
-		heapgetpage((TableScanDesc) scan, block);
+		/* read the page */
+		heapfetchbuf(scan, block);
+
+		/* prune the page and determine visible tuple offsets */
+		heap_page_prep((TableScanDesc) scan);
 		page = BufferGetPage(scan->rs_cbuf);
 		linesleft = scan->rs_ntuples;
 		lineindex = ScanDirectionIsForward(dir) ? 0 : linesleft - 1;
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 0952d4a98e..a4451c24b4 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2352,11 +2352,15 @@ heapam_scan_sample_next_block(TableScanDesc scan, SampleScanState *scanstate)
 	if (hscan->rs_nblocks == 0)
 		return false;
 
-	if (tsm->NextSampleBlock)
+	/* release previous scan buffer, if any */
+	if (BufferIsValid(hscan->rs_cbuf))
 	{
-		blockno = tsm->NextSampleBlock(scanstate, hscan->rs_nblocks);
-		hscan->rs_cblock = blockno;
+		ReleaseBuffer(hscan->rs_cbuf);
+		hscan->rs_cbuf = InvalidBuffer;
 	}
+
+	if (tsm->NextSampleBlock)
+		blockno = tsm->NextSampleBlock(scanstate, hscan->rs_nblocks);
 	else
 	{
 		/* scanning table sequentially */
@@ -2398,20 +2402,32 @@ heapam_scan_sample_next_block(TableScanDesc scan, SampleScanState *scanstate)
 		}
 	}
 
+	hscan->rs_cblock = blockno;
+
 	if (!BlockNumberIsValid(blockno))
 	{
-		if (BufferIsValid(hscan->rs_cbuf))
-			ReleaseBuffer(hscan->rs_cbuf);
-		hscan->rs_cbuf = InvalidBuffer;
-		hscan->rs_cblock = InvalidBlockNumber;
 		hscan->rs_inited = false;
-
 		return false;
 	}
 
-	heapgetpage(scan, blockno);
-	hscan->rs_inited = true;
+	Assert(hscan->rs_cblock < hscan->rs_nblocks);
+
+	/*
+	 * Be sure to check for interrupts at least once per page.  Checks at
+	 * higher code levels won't be able to stop a sample scan that encounters
+	 * many pages' worth of consecutive dead tuples.
+	 */
+	CHECK_FOR_INTERRUPTS();
+
+	/* Read page using selected strategy */
+	hscan->rs_cbuf = ReadBufferExtended(hscan->rs_base.rs_rd, MAIN_FORKNUM,
+										blockno, RBM_NORMAL, hscan->rs_strategy);
 
+	/* in pagemode, prune the page and determine visible tuple offsets */
+	if (hscan->rs_base.rs_flags & SO_ALLOW_PAGEMODE)
+		heap_page_prep(scan);
+
+	hscan->rs_inited = true;
 	return true;
 }
 
@@ -2572,7 +2588,7 @@ SampleHeapTupleVisible(TableScanDesc scan, Buffer buffer,
 	if (scan->rs_flags & SO_ALLOW_PAGEMODE)
 	{
 		/*
-		 * In pageatatime mode, heapgetpage() already did visibility checks,
+		 * In pageatatime mode, heap_page_prep() already did visibility checks,
 		 * so just look at the info it left in rs_vistuples[].
 		 *
 		 * We use a binary search over the known-sorted array.  Note: we could
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index a307fb5f24..e8a211cf1b 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -267,7 +267,7 @@ extern TableScanDesc heap_beginscan(Relation relation, Snapshot snapshot,
 									uint32 flags);
 extern void heap_setscanlimits(TableScanDesc sscan, BlockNumber startBlk,
 							   BlockNumber numBlks);
-extern void heapgetpage(TableScanDesc sscan, BlockNumber block);
+extern void heap_page_prep(TableScanDesc sscan);
 extern void heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
 						bool allow_strat, bool allow_sync, bool allow_pagemode);
 extern void heap_endscan(TableScanDesc sscan);
-- 
2.40.1.windows.1

v8_to_v9-0001_delta.patch.txttext/plain; charset=US-ASCII; name=v8_to_v9-0001_delta.patch.txtDownload
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index bf2b9b19e7..6524fc44a5 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -360,14 +360,13 @@ heap_setscanlimits(TableScanDesc sscan, BlockNumber startBlk, BlockNumber numBlk
 }
 
 /*
- * heapbuildvis - Utility function for heap scans.
+ * heap_page_prep - Prepare the current scan page to be scanned in pagemode
  *
- * Given a page residing in a buffer saved in the scan descriptor, prune the
- * page and determine which of its tuples are all visible, saving their offsets
- * in an array in the scan descriptor.
+ * Preparation currently consists of 1. prune the scan's rs_cbuf page, and 2.
+ * fill the rs_vistuples array with the OffsetNumbers of visible tuples.
  */
 void
-heapbuildvis(TableScanDesc sscan)
+heap_page_prep(TableScanDesc sscan)
 {
 	HeapScanDesc scan = (HeapScanDesc) sscan;
 	Buffer		buffer = scan->rs_cbuf;
@@ -381,6 +380,8 @@ heapbuildvis(TableScanDesc sscan)
 
 	Assert(BufferGetBlockNumber(buffer) == block);
 
+	/* ensure we're not accidentally being used when not in pagemode */
+	Assert(scan->rs_base.rs_flags & SO_ALLOW_PAGEMODE);
 	snapshot = scan->rs_base.rs_snapshot;
 
 	/*
@@ -454,10 +455,10 @@ heapbuildvis(TableScanDesc sscan)
 }
 
 /*
- * heapfetchbuf - subroutine for heapgettup()
+ * heapfetchbuf - read and pin the given MAIN_FORKNUM block number.
  *
- * This routine reads the specified block of the relation into a buffer and
- * returns with that pinned buffer saved in the scan descriptor.
+ * Read the specified block of the scan relation into a buffer and pin that
+ * buffer before saving it in the scan descriptor.
  */
 static inline void
 heapfetchbuf(HeapScanDesc scan, BlockNumber block)
@@ -878,8 +879,11 @@ heapgettup_pagemode(HeapScanDesc scan,
 	 */
 	while (block != InvalidBlockNumber)
 	{
+		/* read the page */
 		heapfetchbuf(scan, block);
-		heapbuildvis((TableScanDesc) scan);
+
+		/* prune the page and determine visible tuple offsets */
+		heap_page_prep((TableScanDesc) scan);
 		page = BufferGetPage(scan->rs_cbuf);
 		linesleft = scan->rs_ntuples;
 		lineindex = ScanDirectionIsForward(dir) ? 0 : linesleft - 1;
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index f4f670e9b2..a4451c24b4 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2352,6 +2352,7 @@ heapam_scan_sample_next_block(TableScanDesc scan, SampleScanState *scanstate)
 	if (hscan->rs_nblocks == 0)
 		return false;
 
+	/* release previous scan buffer, if any */
 	if (BufferIsValid(hscan->rs_cbuf))
 	{
 		ReleaseBuffer(hscan->rs_cbuf);
@@ -2403,7 +2404,7 @@ heapam_scan_sample_next_block(TableScanDesc scan, SampleScanState *scanstate)
 
 	hscan->rs_cblock = blockno;
 
-	if (!BlockNumberIsValid(hscan->rs_cblock))
+	if (!BlockNumberIsValid(blockno))
 	{
 		hscan->rs_inited = false;
 		return false;
@@ -2412,22 +2413,19 @@ heapam_scan_sample_next_block(TableScanDesc scan, SampleScanState *scanstate)
 	Assert(hscan->rs_cblock < hscan->rs_nblocks);
 
 	/*
-	 * We may scan multiple pages before finding tuples to yield or finishing
-	 * the scan. Since we want to check for interrupts at least once per page,
-	 * do so here.
+	 * Be sure to check for interrupts at least once per page.  Checks at
+	 * higher code levels won't be able to stop a sample scan that encounters
+	 * many pages' worth of consecutive dead tuples.
 	 */
 	CHECK_FOR_INTERRUPTS();
 
 	/* Read page using selected strategy */
 	hscan->rs_cbuf = ReadBufferExtended(hscan->rs_base.rs_rd, MAIN_FORKNUM,
-										hscan->rs_cblock, RBM_NORMAL, hscan->rs_strategy);
+										blockno, RBM_NORMAL, hscan->rs_strategy);
 
-	/*
-	 * If pagemode is allowed, prune the page and build an array of visible
-	 * tuple offsets.
-	 */
+	/* in pagemode, prune the page and determine visible tuple offsets */
 	if (hscan->rs_base.rs_flags & SO_ALLOW_PAGEMODE)
-		heapbuildvis(scan);
+		heap_page_prep(scan);
 
 	hscan->rs_inited = true;
 	return true;
@@ -2590,7 +2588,7 @@ SampleHeapTupleVisible(TableScanDesc scan, Buffer buffer,
 	if (scan->rs_flags & SO_ALLOW_PAGEMODE)
 	{
 		/*
-		 * In pageatatime mode, heapgetpage() already did visibility checks,
+		 * In pageatatime mode, heap_page_prep() already did visibility checks,
 		 * so just look at the info it left in rs_vistuples[].
 		 *
 		 * We use a binary search over the known-sorted array.  Note: we could
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 4d324c78e5..e8a211cf1b 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -267,7 +267,7 @@ extern TableScanDesc heap_beginscan(Relation relation, Snapshot snapshot,
 									uint32 flags);
 extern void heap_setscanlimits(TableScanDesc sscan, BlockNumber startBlk,
 							   BlockNumber numBlks);
-extern void heapbuildvis(TableScanDesc sscan);
+extern void heap_page_prep(TableScanDesc sscan);
 extern void heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
 						bool allow_strat, bool allow_sync, bool allow_pagemode);
 extern void heap_endscan(TableScanDesc sscan);
#19Melanie Plageman
melanieplageman@gmail.com
In reply to: David Rowley (#18)
Re: Streaming read-ready sequential scan code

On Wed, Apr 3, 2024 at 9:08 PM David Rowley <dgrowleyml@gmail.com> wrote:

On Thu, 4 Apr 2024 at 06:03, Melanie Plageman <melanieplageman@gmail.com> wrote:

Attached v8 is rebased over current master (which now has the
streaming read API).

I've looked at the v8-0001 patch.

Thanks for taking a look!

I wasn't too keen on heapbuildvis() as a function name for an external
function. Since it also does pruning work, it seemed weird to make it
sound like it only did visibility work. Per our offline discussion
about names, I've changed it to what you suggested which is
heap_page_prep().

Looking at it in the code, I am wondering if we should call
heap_page_prep() heap_scan_page_prep(). Just wondering if it is clear
that it is prepping a page to be scanned. You choose whatever you
think is best.

Aside from that, there was an outdated comment: "In pageatatime mode,
heapgetpage() already did visibility checks," which was no longer true
as that's done in heapbuildvis() (now heap_page_prep()).

I also did a round of comment adjustments as there were a few things I
didn't like, e.g:

+ * heapfetchbuf - subroutine for heapgettup()

This is also used in heapgettup_pagemode(), so I thought it was a bad
idea for a function to list the places it thinks it's being called from. I
also thought it redundant to write "This routine" in the function head
comment. I think "this routine" is implied by the context. I ended up
with:

/*
* heapfetchbuf - read and pin the given MAIN_FORKNUM block number.
*
* Read the specified block of the scan relation into a buffer and pin that
* buffer before saving it in the scan descriptor.
*/

I'm happy with your changes to heapam_scan_sample_next_block(). I did
adjust the comment above CHECK_FOR_INTERRUPTS() so it was effectively
the same as the seqscan version, just with "seqscan" swapped for
"sample scan".

That all is fine with me.

The only other thing I adjusted there was to use "blockno" in some
places where you were using hscan->rs_cblock. These all come after
the "hscan->rs_cblock = blockno;" line. My thinking is that this makes
it more likely we avoid re-reading the value from the struct when the
compiler can't prove the two values are still equivalent. Even if the
compiler is happy today, it would only take a code change that passes
hscan to some external function for the compiler to become uncertain
whether that function has adjusted rs_cblock; it would then take the
safe option of pulling the value out of the struct again, which is a
little more expensive as it requires some maths to figure out the
offset.

I've attached v9-0001 and a delta of just my changes from v8.

All sounds good and LGTM

- Melanie

#20David Rowley
dgrowleyml@gmail.com
In reply to: Melanie Plageman (#19)
Re: Streaming read-ready sequential scan code

On Thu, 4 Apr 2024 at 14:38, Melanie Plageman <melanieplageman@gmail.com> wrote:

Looking at it in the code, I am wondering if we should call
heap_page_prep() heap_scan_page_prep(). Just wondering if it is clear
that it is prepping a page to be scanned. You choose whatever you
think is best.

I ended up calling it heap_prepare_pagescan() as I started to think
prep/prepare should come first. I don't think it's perfect as the
intended meaning is heap_prepare_page_for_scanning_in_pagemode(), but
that's obviously too long.

I've pushed the v9-0001 with that rename done.

David

#21David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#20)
1 attachment(s)
Re: Streaming read-ready sequential scan code

On Thu, 4 Apr 2024 at 16:45, David Rowley <dgrowleyml@gmail.com> wrote:

I've pushed the v9-0001 with that rename done.

I've now just pushed the 0002 patch with some revisions:

1. The function declarations you added for heapgettup_advance_block()
and heapgettup_initial_block() didn't match the properties of their
definitions. You'd declared both of these static inline but neither
of these were.
2. I felt inclined to rename heapfetchbuf() to heapfetchnextbuf() as
that's effectively what it does with v8-0002, however, that's just too
many words all shoved together, so I renamed it to
heap_fetch_next_buffer().
3. I changed heapgettup_initial_block() to pg_noinline as it both
makes more sense to have this out of line and it also fixed a small
performance regression.

Looks like I also failed to grep for all the remaining instances of
"heapgetpage" in 44086b097. Those are now fixed by 3a4a3537a.

I also rebased the 0003 patch which I've attached as a raw patch.

FWIW, using Heikki's test in [1]/messages/by-id/3b0f3701-addd-4629-9257-cf28e1a6e6a1@iki.fi with a pg_prewarm each time after
restarting the instance. No parallel aggregate.

drowley@amd3990x:~$ cat bench.sql
select count(*) from giga;

drowley@amd3990x:~$ pgbench -n -f bench.sql -T 120 postgres | grep latency

44086b097~1
latency average = 34.323 ms
latency average = 34.332 ms

44086b097
latency average = 34.811 ms
latency average = 34.862 ms

3a4a3537a
latency average = 34.497 ms
latency average = 34.538 ms

3a4a3537a + read_stream_for_seqscans.patch
latency average = 40.923 ms
latency average = 41.415 ms

i.e. no meaningful change from the refactor, but a regression with the
read stream patch on a cached workload that changes pages often without
doing much work in between.

I'm happy to run further benchmarks, but for the remainder of the
committer responsibility for the next patches, I'm going to leave that
to Thomas.

David

[1]: /messages/by-id/3b0f3701-addd-4629-9257-cf28e1a6e6a1@iki.fi

Attachments:

read_stream_for_seqscans.patchtext/plain; charset=US-ASCII; name=read_stream_for_seqscans.patchDownload
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index dada2ecd1e..f7946a39fd 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -223,6 +223,25 @@ static const int MultiXactStatusLock[MaxMultiXactStatus + 1] =
  * ----------------------------------------------------------------
  */
 
+static BlockNumber
+heap_scan_stream_read_next(ReadStream *pgsr, void *private_data,
+						   void *per_buffer_data)
+{
+	HeapScanDesc scan = (HeapScanDesc) private_data;
+
+	if (unlikely(!scan->rs_inited))
+	{
+		scan->rs_prefetch_block = heapgettup_initial_block(scan, scan->rs_dir);
+		scan->rs_inited = true;
+	}
+	else
+		scan->rs_prefetch_block = heapgettup_advance_block(scan,
+														   scan->rs_prefetch_block,
+														   scan->rs_dir);
+
+	return scan->rs_prefetch_block;
+}
+
 /* ----------------
  *		initscan - scan code common to heap_beginscan and heap_rescan
  * ----------------
@@ -325,6 +344,13 @@ initscan(HeapScanDesc scan, ScanKey key, bool keep_startblock)
 	scan->rs_cbuf = InvalidBuffer;
 	scan->rs_cblock = InvalidBlockNumber;
 
+	/*
+	 * Initialize to ForwardScanDirection because it is most common and heap
+	 * scans usually must go forwards before going backward.
+	 */
+	scan->rs_dir = ForwardScanDirection;
+	scan->rs_prefetch_block = InvalidBlockNumber;
+
 	/* page-at-a-time fields are always invalid when not rs_inited */
 
 	/*
@@ -462,12 +488,14 @@ heap_prepare_pagescan(TableScanDesc sscan)
 /*
  * heap_fetch_next_buffer - read and pin the next block from MAIN_FORKNUM.
  *
- * Read the next block of the scan relation into a buffer and pin that buffer
- * before saving it in the scan descriptor.
+ * Read the next block of the scan relation from the read stream and pin that
+ * buffer before saving it in the scan descriptor.
  */
 static inline void
 heap_fetch_next_buffer(HeapScanDesc scan, ScanDirection dir)
 {
+	Assert(scan->rs_read_stream);
+
 	/* release previous scan buffer, if any */
 	if (BufferIsValid(scan->rs_cbuf))
 	{
@@ -482,25 +510,23 @@ heap_fetch_next_buffer(HeapScanDesc scan, ScanDirection dir)
 	 */
 	CHECK_FOR_INTERRUPTS();
 
-	if (unlikely(!scan->rs_inited))
+	/*
+	 * If the scan direction is changing, reset the prefetch block to the
+	 * current block. Otherwise, we will incorrectly prefetch the blocks
+	 * between the prefetch block and the current block again before
+	 * prefetching blocks in the new, correct scan direction.
+	 */
+	if (unlikely(scan->rs_dir != dir))
 	{
-		scan->rs_cblock = heapgettup_initial_block(scan, dir);
+		scan->rs_prefetch_block = scan->rs_cblock;
+		read_stream_reset(scan->rs_read_stream);
+	}
 
-		/* ensure rs_cbuf is invalid when we get InvalidBlockNumber */
-		Assert(scan->rs_cblock != InvalidBlockNumber ||
-			   !BufferIsValid(scan->rs_cbuf));
+	scan->rs_dir = dir;
 
-		scan->rs_inited = true;
-	}
-	else
-		scan->rs_cblock = heapgettup_advance_block(scan, scan->rs_cblock,
-												   dir);
-
-	/* read block if valid */
-	if (BlockNumberIsValid(scan->rs_cblock))
-		scan->rs_cbuf = ReadBufferExtended(scan->rs_base.rs_rd, MAIN_FORKNUM,
-										   scan->rs_cblock, RBM_NORMAL,
-										   scan->rs_strategy);
+	scan->rs_cbuf = read_stream_next_buffer(scan->rs_read_stream, NULL);
+	if (BufferIsValid(scan->rs_cbuf))
+		scan->rs_cblock = BufferGetBlockNumber(scan->rs_cbuf);
 }
 
 /*
@@ -833,6 +859,7 @@ continue_page:
 
 	scan->rs_cbuf = InvalidBuffer;
 	scan->rs_cblock = InvalidBlockNumber;
+	scan->rs_prefetch_block = InvalidBlockNumber;
 	tuple->t_data = NULL;
 	scan->rs_inited = false;
 }
@@ -928,6 +955,7 @@ continue_page:
 		ReleaseBuffer(scan->rs_cbuf);
 	scan->rs_cbuf = InvalidBuffer;
 	scan->rs_cblock = InvalidBlockNumber;
+	scan->rs_prefetch_block = InvalidBlockNumber;
 	tuple->t_data = NULL;
 	scan->rs_inited = false;
 }
@@ -1021,6 +1049,28 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 
 	initscan(scan, key, false);
 
+	scan->rs_read_stream = NULL;
+
+	/*
+	 * For sequential scans and TID range scans, we will set up a read stream.
+	 * We do not know the scan direction yet. If the scan does not end up
+	 * being a forward scan, the read stream will be freed. This should be
+	 * done after initscan() because initscan() allocates the
+	 * BufferAccessStrategy object.
+	 */
+	if (scan->rs_base.rs_flags & SO_TYPE_SEQSCAN ||
+		scan->rs_base.rs_flags & SO_TYPE_TIDRANGESCAN)
+	{
+		scan->rs_read_stream = read_stream_begin_relation(READ_STREAM_SEQUENTIAL,
+														  scan->rs_strategy,
+														  scan->rs_base.rs_rd,
+														  MAIN_FORKNUM,
+														  heap_scan_stream_read_next,
+														  scan,
+														  0);
+	}
+
+
 	return (TableScanDesc) scan;
 }
 
@@ -1055,6 +1105,14 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
 	if (BufferIsValid(scan->rs_cbuf))
 		ReleaseBuffer(scan->rs_cbuf);
 
+	/*
+	 * The read stream is reset on rescan. This must be done before
+	 * initscan(), as some state referred to by read_stream_reset() is reset
+	 * in initscan().
+	 */
+	if (scan->rs_read_stream)
+		read_stream_reset(scan->rs_read_stream);
+
 	/*
 	 * reinitialize scan descriptor
 	 */
@@ -1074,6 +1132,12 @@ heap_endscan(TableScanDesc sscan)
 	if (BufferIsValid(scan->rs_cbuf))
 		ReleaseBuffer(scan->rs_cbuf);
 
+	/*
+	 * Must free the read stream before freeing the BufferAccessStrategy.
+	 */
+	if (scan->rs_read_stream)
+		read_stream_end(scan->rs_read_stream);
+
 	/*
 	 * decrement relation reference count and free scan descriptor storage
 	 */
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 2765efc4e5..332a7faa8d 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -25,6 +25,7 @@
 #include "storage/bufpage.h"
 #include "storage/dsm.h"
 #include "storage/lockdefs.h"
+#include "storage/read_stream.h"
 #include "storage/shm_toc.h"
 #include "utils/relcache.h"
 #include "utils/snapshot.h"
@@ -70,6 +71,20 @@ typedef struct HeapScanDescData
 
 	HeapTupleData rs_ctup;		/* current tuple in scan, if any */
 
+	/* For scans that stream reads */
+	ReadStream *rs_read_stream;
+
+	/*
+	 * For sequential scans and TID range scans to stream reads. The read
+	 * stream is allocated at the beginning of the scan and reset on rescan or
+	 * when the scan direction changes. The scan direction is saved each time
+	 * a new page is requested. If the scan direction changes from one page to
+	 * the next, the read stream releases all previously pinned buffers and
+	 * resets the prefetch block.
+	 */
+	ScanDirection rs_dir;
+	BlockNumber rs_prefetch_block;
+
 	/*
 	 * For parallel scans to store page allocation data.  NULL when not
 	 * performing a parallel scan.
#22Thomas Munro
thomas.munro@gmail.com
In reply to: Andres Freund (#17)
2 attachment(s)
Re: Streaming read-ready sequential scan code

On Thu, Apr 4, 2024 at 11:13 AM Andres Freund <andres@anarazel.de> wrote:

The / 2 is to avoid causing unnecessarily frequent WAL flushes, right? If so,
should we apply that only if the strategy doesn't use the
StrategyRejectBuffer() logic?

Hmm, I don't really know, but that sounds plausible. What do you
think about the attached?

I think for VACUUM we should probably go a bit further. There's no comparable
L1/L2 issue, because the per-buffer processing + WAL insertion is a lot more
expensive, compared to a seqscan. I'd go or at lest 4x-8x.

Alright what about this?

Attachments:

0001-Allow-BufferAccessStrategy-to-limit-pin-count.txttext/plain; charset=US-ASCII; name=0001-Allow-BufferAccessStrategy-to-limit-pin-count.txtDownload
From 6dea2983abf8d608c34e02351d70694de99f25f2 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Thu, 4 Apr 2024 20:31:26 +1300
Subject: [PATCH 1/2] Allow BufferAccessStrategy to limit pin count.

When pinning extra buffers to look ahead, users of a strategy are in
danger of pinning a lot of the buffers in the ring, or even more than
the ring size.  For some strategies, that means "escaping" from the
ring, and in others it means forcing dirty data to disk very frequently
with associated WAL flushing.  Since external code has no insight into
any of that, allow individual strategy types to expose a clamp that
should be applied when deciding how many buffers to pin at once.

Discussion: https://postgr.es/m/CAAKRu_aJXnqsyZt6HwFLnxYEBgE17oypkxbKbT1t1geE_wvH2Q%40mail.gmail.com
---
 src/backend/storage/aio/read_stream.c |  5 ++++
 src/backend/storage/buffer/freelist.c | 35 +++++++++++++++++++++++++++
 src/include/storage/bufmgr.h          |  1 +
 3 files changed, 41 insertions(+)

diff --git a/src/backend/storage/aio/read_stream.c b/src/backend/storage/aio/read_stream.c
index 4f21262ff5..eab87f6f73 100644
--- a/src/backend/storage/aio/read_stream.c
+++ b/src/backend/storage/aio/read_stream.c
@@ -419,6 +419,7 @@ read_stream_begin_relation(int flags,
 	size_t		size;
 	int16		queue_size;
 	int16		max_ios;
+	int			strategy_pin_limit;
 	uint32		max_pinned_buffers;
 	Oid			tablespace_id;
 	SMgrRelation smgr;
@@ -460,6 +461,10 @@ read_stream_begin_relation(int flags,
 	max_pinned_buffers = Min(max_pinned_buffers,
 							 PG_INT16_MAX - io_combine_limit - 1);
 
+	/* Give the strategy a chance to limit the number of buffers we pin. */
+	strategy_pin_limit = GetAccessStrategyPinLimit(strategy);
+	max_pinned_buffers = Min(strategy_pin_limit, max_pinned_buffers);
+
 	/* Don't allow this backend to pin more than its share of buffers. */
 	if (SmgrIsTemp(smgr))
 		LimitAdditionalLocalPins(&max_pinned_buffers);
diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c
index 3611357fa3..c69590d6d8 100644
--- a/src/backend/storage/buffer/freelist.c
+++ b/src/backend/storage/buffer/freelist.c
@@ -629,6 +629,41 @@ GetAccessStrategyBufferCount(BufferAccessStrategy strategy)
 	return strategy->nbuffers;
 }
 
+/*
+ * GetAccessStrategyPinLimit -- get cap of number of buffers that can be pinned
+ *
+ * Strategies can specify the maximum number of buffers that a user should pin
+ * at once when performing look-ahead.  Callers should combine this number with
+ * other relevant limits and take the minimum.
+ */
+int
+GetAccessStrategyPinLimit(BufferAccessStrategy strategy)
+{
+	if (strategy == NULL)
+		return NBuffers;
+
+	switch (strategy->btype)
+	{
+		case BAS_BULKREAD:
+
+			/*
+			 * Since BAS_BULKREAD uses StrategyRejectBuffer(), dirty buffers
+			 * shouldn't be a problem and the caller is free to pin up to the
+			 * entire ring at once.
+			 */
+			return strategy->nbuffers;
+
+		default:
+
+			/*
+			 * Tell caller not to pin more than half the buffers in the ring.
+			 * This is a trade-off between look ahead distance and deferring
+			 * writeback and associated WAL traffic.
+			 */
+			return strategy->nbuffers / 2;
+	}
+}
+
 /*
  * FreeAccessStrategy -- release a BufferAccessStrategy object
  *
diff --git a/src/include/storage/bufmgr.h b/src/include/storage/bufmgr.h
index f380f9d9a6..07ba1a6050 100644
--- a/src/include/storage/bufmgr.h
+++ b/src/include/storage/bufmgr.h
@@ -318,6 +318,7 @@ extern BufferAccessStrategy GetAccessStrategy(BufferAccessStrategyType btype);
 extern BufferAccessStrategy GetAccessStrategyWithSize(BufferAccessStrategyType btype,
 													  int ring_size_kb);
 extern int	GetAccessStrategyBufferCount(BufferAccessStrategy strategy);
+extern int	GetAccessStrategyPinLimit(BufferAccessStrategy strategy);
 
 extern void FreeAccessStrategy(BufferAccessStrategy strategy);
 
-- 
2.39.3 (Apple Git-146)

0002-Increase-default-vacuum_buffer_usage_limit-to-2MB.txttext/plain; charset=US-ASCII; name=0002-Increase-default-vacuum_buffer_usage_limit-to-2MB.txtDownload
From 1488b8f56b9aa342ab3164135afffe9805e562c6 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Thu, 4 Apr 2024 21:11:06 +1300
Subject: [PATCH 2/2] Increase default vacuum_buffer_usage_limit to 2MB.

The BAS_VACUUM ring size has been 256kB since commit d526575f.  Commit
1cbbee03 made it configurable but retained the traditional default.
The correct default size has been debated for years, but 256kB is
certainly very small.  VACUUM soon needs to write back data it dirtied
only 32 blocks ago, which usually requires flushing the WAL.  New
experiments in prefetching pages for VACUUM exacerbated the problem by
crashing into dirty data even sooner.  Let's make the default 2MB.
That's 1.5% of the default toy buffer pool size, and 0.2% of 1GB, which
would be considered a small shared_buffers setting for a real system
these days.  Users are still free to set the GUC to a different value.

Discussion: https://postgr.es/m/20240403221257.md4gfki3z75cdyf6%40awork3.anarazel.de
Discussion: https://postgre.es/m/CA%2BhUKGLY4Q4ZY4f1rvnFtv6%2BPkjNf8MejdPkcju3Qii9DYqqcQ%40mail.gmail.com
---
 doc/src/sgml/config.sgml                      | 2 +-
 src/backend/storage/buffer/freelist.c         | 2 +-
 src/backend/utils/misc/guc_tables.c           | 2 +-
 src/backend/utils/misc/postgresql.conf.sample | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 624518e0b0..d8e1282e12 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1961,7 +1961,7 @@ include_dir 'conf.d'
         valid sizes range from <literal>128 kB</literal> to
         <literal>16 GB</literal>.  If the specified size would exceed 1/8 the
         size of <varname>shared_buffers</varname>, the size is silently capped
-        to that value.  The default value is <literal>256 kB</literal>.  If
+        to that value.  The default value is <literal>2MB</literal>.  If
         this value is specified without units, it is taken as kilobytes.  This
         parameter can be set at any time.  It can be overridden for
         <xref linkend="sql-vacuum"/> and <xref linkend="sql-analyze"/>
diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c
index c69590d6d8..65a6b3b357 100644
--- a/src/backend/storage/buffer/freelist.c
+++ b/src/backend/storage/buffer/freelist.c
@@ -561,7 +561,7 @@ GetAccessStrategy(BufferAccessStrategyType btype)
 			ring_size_kb = 16 * 1024;
 			break;
 		case BAS_VACUUM:
-			ring_size_kb = 256;
+			ring_size_kb = 2048;
 			break;
 
 		default:
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index c12784cbec..7d4e4387cf 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -2275,7 +2275,7 @@ struct config_int ConfigureNamesInt[] =
 			GUC_UNIT_KB
 		},
 		&VacuumBufferUsageLimit,
-		256, 0, MAX_BAS_VAC_RING_SIZE_KB,
+		2048, 0, MAX_BAS_VAC_RING_SIZE_KB,
 		check_vacuum_buffer_usage_limit, NULL, NULL
 	},
 
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index baecde2841..2166ea4a87 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -166,7 +166,7 @@
 					#   mmap
 					# (change requires restart)
 #min_dynamic_shared_memory = 0MB	# (change requires restart)
-#vacuum_buffer_usage_limit = 256kB	# size of vacuum and analyze buffer access strategy ring;
+#vacuum_buffer_usage_limit = 2MB	# size of vacuum and analyze buffer access strategy ring;
 					# 0 to disable vacuum buffer access strategy;
 					# range 128kB to 16GB
 
-- 
2.39.3 (Apple Git-146)

#23Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#22)
2 attachment(s)
Re: Streaming read-ready sequential scan code

On Thu, Apr 4, 2024 at 10:31 PM Thomas Munro <thomas.munro@gmail.com> wrote:

Alright what about this?

Forgot to git add a change, new version.

Attachments:

v2-0001-Allow-BufferAccessStrategy-to-limit-pin-count.txttext/plain; charset=US-ASCII; name=v2-0001-Allow-BufferAccessStrategy-to-limit-pin-count.txtDownload
From 6dea2983abf8d608c34e02351d70694de99f25f2 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Thu, 4 Apr 2024 20:31:26 +1300
Subject: [PATCH v2 1/2] Allow BufferAccessStrategy to limit pin count.

When pinning extra buffers to look ahead, users of a strategy are in
danger of pinning a lot of the buffers in the ring, or even more than
the ring size.  For some strategies, that means "escaping" from the
ring, and in others it means forcing dirty data to disk very frequently
with associated WAL flushing.  Since external code has no insight into
any of that, allow individual strategy types to expose a clamp that
should be applied when deciding how many buffers to pin at once.

Discussion: https://postgr.es/m/CAAKRu_aJXnqsyZt6HwFLnxYEBgE17oypkxbKbT1t1geE_wvH2Q%40mail.gmail.com
---
 src/backend/storage/aio/read_stream.c |  5 ++++
 src/backend/storage/buffer/freelist.c | 35 +++++++++++++++++++++++++++
 src/include/storage/bufmgr.h          |  1 +
 3 files changed, 41 insertions(+)

diff --git a/src/backend/storage/aio/read_stream.c b/src/backend/storage/aio/read_stream.c
index 4f21262ff5..eab87f6f73 100644
--- a/src/backend/storage/aio/read_stream.c
+++ b/src/backend/storage/aio/read_stream.c
@@ -419,6 +419,7 @@ read_stream_begin_relation(int flags,
 	size_t		size;
 	int16		queue_size;
 	int16		max_ios;
+	int			strategy_pin_limit;
 	uint32		max_pinned_buffers;
 	Oid			tablespace_id;
 	SMgrRelation smgr;
@@ -460,6 +461,10 @@ read_stream_begin_relation(int flags,
 	max_pinned_buffers = Min(max_pinned_buffers,
 							 PG_INT16_MAX - io_combine_limit - 1);
 
+	/* Give the strategy a chance to limit the number of buffers we pin. */
+	strategy_pin_limit = GetAccessStrategyPinLimit(strategy);
+	max_pinned_buffers = Min(strategy_pin_limit, max_pinned_buffers);
+
 	/* Don't allow this backend to pin more than its share of buffers. */
 	if (SmgrIsTemp(smgr))
 		LimitAdditionalLocalPins(&max_pinned_buffers);
diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c
index 3611357fa3..c69590d6d8 100644
--- a/src/backend/storage/buffer/freelist.c
+++ b/src/backend/storage/buffer/freelist.c
@@ -629,6 +629,41 @@ GetAccessStrategyBufferCount(BufferAccessStrategy strategy)
 	return strategy->nbuffers;
 }
 
+/*
+ * GetAccessStrategyPinLimit -- get cap of number of buffers that can be pinned
+ *
+ * Strategies can specify the maximum number of buffers that a user should pin
+ * at once when performing look-ahead.  Callers should combine this number with
+ * other relevant limits and take the minimum.
+ */
+int
+GetAccessStrategyPinLimit(BufferAccessStrategy strategy)
+{
+	if (strategy == NULL)
+		return NBuffers;
+
+	switch (strategy->btype)
+	{
+		case BAS_BULKREAD:
+
+			/*
+			 * Since BAS_BULKREAD uses StrategyRejectBuffer(), dirty buffers
+			 * shouldn't be a problem and the caller is free to pin up to the
+			 * entire ring at once.
+			 */
+			return strategy->nbuffers;
+
+		default:
+
+			/*
+			 * Tell caller not to pin more than half the buffers in the ring.
+			 * This is a trade-off between look ahead distance and deferring
+			 * writeback and associated WAL traffic.
+			 */
+			return strategy->nbuffers / 2;
+	}
+}
+
 /*
  * FreeAccessStrategy -- release a BufferAccessStrategy object
  *
diff --git a/src/include/storage/bufmgr.h b/src/include/storage/bufmgr.h
index f380f9d9a6..07ba1a6050 100644
--- a/src/include/storage/bufmgr.h
+++ b/src/include/storage/bufmgr.h
@@ -318,6 +318,7 @@ extern BufferAccessStrategy GetAccessStrategy(BufferAccessStrategyType btype);
 extern BufferAccessStrategy GetAccessStrategyWithSize(BufferAccessStrategyType btype,
 													  int ring_size_kb);
 extern int	GetAccessStrategyBufferCount(BufferAccessStrategy strategy);
+extern int	GetAccessStrategyPinLimit(BufferAccessStrategy strategy);
 
 extern void FreeAccessStrategy(BufferAccessStrategy strategy);
 
-- 
2.39.3 (Apple Git-146)

v2-0002-Increase-default-vacuum_buffer_usage_limit-to-2MB.txttext/plain; charset=US-ASCII; name=v2-0002-Increase-default-vacuum_buffer_usage_limit-to-2MB.txtDownload
From e610bc78a2e3ecee50bd897e35084469d00bbac5 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Thu, 4 Apr 2024 21:11:06 +1300
Subject: [PATCH v2 2/2] Increase default vacuum_buffer_usage_limit to 2MB.

The BAS_VACUUM ring size has been 256kB since commit d526575f.  Commit
1cbbee03 made it configurable but retained the traditional default.
The correct default size has been debated for years, but 256kB is
certainly very small.  VACUUM soon needs to write back data it dirtied
only 32 blocks ago, which usually requires flushing the WAL.  New
experiments in prefetching pages for VACUUM exacerbated the problem by
crashing into dirty data even sooner.  Let's make the default 2MB.
That's 1.5% of the default toy buffer pool size, and 0.2% of 1GB, which
would be considered a small shared_buffers setting for a real system
these days.  Users are still free to set the GUC to a different value.

Discussion: https://postgr.es/m/20240403221257.md4gfki3z75cdyf6%40awork3.anarazel.de
Discussion: https://postgre.es/m/CA%2BhUKGLY4Q4ZY4f1rvnFtv6%2BPkjNf8MejdPkcju3Qii9DYqqcQ%40mail.gmail.com
---
 doc/src/sgml/config.sgml                      | 2 +-
 src/backend/storage/buffer/freelist.c         | 2 +-
 src/backend/utils/init/globals.c              | 2 +-
 src/backend/utils/misc/guc_tables.c           | 2 +-
 src/backend/utils/misc/postgresql.conf.sample | 2 +-
 5 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 624518e0b0..d8e1282e12 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1961,7 +1961,7 @@ include_dir 'conf.d'
         valid sizes range from <literal>128 kB</literal> to
         <literal>16 GB</literal>.  If the specified size would exceed 1/8 the
         size of <varname>shared_buffers</varname>, the size is silently capped
-        to that value.  The default value is <literal>256 kB</literal>.  If
+        to that value.  The default value is <literal>2MB</literal>.  If
         this value is specified without units, it is taken as kilobytes.  This
         parameter can be set at any time.  It can be overridden for
         <xref linkend="sql-vacuum"/> and <xref linkend="sql-analyze"/>
diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c
index c69590d6d8..65a6b3b357 100644
--- a/src/backend/storage/buffer/freelist.c
+++ b/src/backend/storage/buffer/freelist.c
@@ -561,7 +561,7 @@ GetAccessStrategy(BufferAccessStrategyType btype)
 			ring_size_kb = 16 * 1024;
 			break;
 		case BAS_VACUUM:
-			ring_size_kb = 256;
+			ring_size_kb = 2048;
 			break;
 
 		default:
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index 3e38bb1311..cc61937eef 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -143,7 +143,7 @@ int			max_parallel_workers = 8;
 int			MaxBackends = 0;
 
 /* GUC parameters for vacuum */
-int			VacuumBufferUsageLimit = 256;
+int			VacuumBufferUsageLimit = 2048;
 
 int			VacuumCostPageHit = 1;
 int			VacuumCostPageMiss = 2;
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index c12784cbec..7d4e4387cf 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -2275,7 +2275,7 @@ struct config_int ConfigureNamesInt[] =
 			GUC_UNIT_KB
 		},
 		&VacuumBufferUsageLimit,
-		256, 0, MAX_BAS_VAC_RING_SIZE_KB,
+		2048, 0, MAX_BAS_VAC_RING_SIZE_KB,
 		check_vacuum_buffer_usage_limit, NULL, NULL
 	},
 
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index baecde2841..2166ea4a87 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -166,7 +166,7 @@
 					#   mmap
 					# (change requires restart)
 #min_dynamic_shared_memory = 0MB	# (change requires restart)
-#vacuum_buffer_usage_limit = 256kB	# size of vacuum and analyze buffer access strategy ring;
+#vacuum_buffer_usage_limit = 2MB	# size of vacuum and analyze buffer access strategy ring;
 					# 0 to disable vacuum buffer access strategy;
 					# range 128kB to 16GB
 
-- 
2.39.3 (Apple Git-146)

#24Thomas Munro
thomas.munro@gmail.com
In reply to: David Rowley (#21)
Re: Streaming read-ready sequential scan code

On Thu, Apr 4, 2024 at 8:02 PM David Rowley <dgrowleyml@gmail.com> wrote:

3a4a3537a
latency average = 34.497 ms
latency average = 34.538 ms

3a4a3537a + read_stream_for_seqscans.patch
latency average = 40.923 ms
latency average = 41.415 ms

i.e. no meaningful change from the refactor, but a regression with the
read stream patch on a cached workload that changes pages often without
doing much work in between.

I ran Heikki's test except I ran the "insert" 4 times to get a table
of 4376MB according to \d+. On my random cloud ARM server (SB=8GB,
huge pages, parallelism disabled), I see a speedup of 1290ms -> 1046ms
when the data is in the Linux cache and PG is not prewarmed, roughly as he
reported. Good.

If I pg_prewarm first, I see that slowdown 260ms -> 294ms. Trying
things out to see what works, I got that down to 243ms (ie beat
master) by inserting a memory prefetch:

--- a/src/backend/storage/aio/read_stream.c
+++ b/src/backend/storage/aio/read_stream.c
@@ -757,6 +757,8 @@ read_stream_next_buffer(ReadStream *stream, void
**per_buffer_data)
        /* Prepare for the next call. */
        read_stream_look_ahead(stream, false);

+ __builtin_prefetch(BufferGetPage(stream->buffers[stream->oldest_buffer_index]));

Maybe that's a solution to a different problem that just happens to
more than make up the difference in this case, and it may be
questionable whether that cache line will survive long enough to help
you, but this one-tuple-per-page test likes it... Hmm, to get a more
realistic table than the one-tuple-per-page one, I tried doubling a
tenk1 table until it reached 2759MB and then I got a post-prewarm
regression 702ms -> 721ms, and again I can beat master by memory
prefetching: 689ms.

Annoyingly, with the same table I see no difference in the actual
pg_prewarm('giga') time: around 155ms for both. pg_prewarm is able to
use the 'fast path' I made pretty much just to be able to minimise
regression in that (probably stupid) all-cached test that doesn't even
look at the page contents. Unfortunately seq scan can't use it because
it has per-buffer data, which is one of the things the fast path can't
handle (because of a space management issue). Maybe I should try to
find a way to fix that.

I'm happy to run further benchmarks, but for the remainder of the
committer responsibility for the next patches, I'm going to leave that
to Thomas.

Thanks!

#25Melanie Plageman
melanieplageman@gmail.com
In reply to: David Rowley (#21)
Re: Streaming read-ready sequential scan code

On Thu, Apr 4, 2024 at 3:02 AM David Rowley <dgrowleyml@gmail.com> wrote:

On Thu, 4 Apr 2024 at 16:45, David Rowley <dgrowleyml@gmail.com> wrote:

I've pushed the v9-0001 with that rename done.

I've now just pushed the 0002 patch with some revisions:

Thanks!

1. The function declarations you added for heapgettup_advance_block()
and heapgettup_initial_block() didn't match the properties of their
definitions. You'd declared both of these static inline but neither
of these were.

Ah, they needed to be defined static but I intentionally left off the
inline in the definition and only put it in the forward declaration
because I thought that was correct. Anyway, I'm fine with how you did
it in the end.

2. I felt inclined to rename heapfetchbuf() to heapfetchnextbuf() as
that's effectively what it does with v8-0002, however, that's just too
many words all shoved together, so I renamed it to
heap_fetch_next_buffer().

Sounds good.

3. I changed heapgettup_initial_block() to pg_noinline as it both
makes more sense to have this out of line and it also fixed a small
performance regression.

Ah, I guess it is in the unlikely path. I often forget that noinline
exists. It's interesting that that made a noticeable difference since
it is a pretty short function. Thanks for taking such a close look!

Looks like I also failed to grep for all the remaining instances of
"heapgetpage" in 44086b097. Those are now fixed by 3a4a3537a.

I also rebased the 0003 patch which I've attached as a raw patch.

Thanks!

FWIW, using Heikki's test in [1] with a pg_prewarm each time after
restarting the instance. No parallel aggregate.

drowley@amd3990x:~$ cat bench.sql
select count(*) from giga;

drowley@amd3990x:~$ pgbench -n -f bench.sql -T 120 postgres | grep latency

44086b097~1
latency average = 34.323 ms
latency average = 34.332 ms

44086b097
latency average = 34.811 ms
latency average = 34.862 ms

3a4a3537a
latency average = 34.497 ms
latency average = 34.538 ms

3a4a3537a + read_stream_for_seqscans.patch
latency average = 40.923 ms
latency average = 41.415 ms

i.e. no meaningful change from the refactor, but a regression with the
read stream patch on a cached workload that changes pages often without
doing much work in between.

Cool. Thanks for testing this out. Sounds like Thomas did some
analysis of how to resolve this for the streaming read user, and I
will do some too.

- Melanie

#26Melanie Plageman
melanieplageman@gmail.com
In reply to: Thomas Munro (#24)
Re: Streaming read-ready sequential scan code

On Thu, Apr 4, 2024 at 7:45 AM Thomas Munro <thomas.munro@gmail.com> wrote:

On Thu, Apr 4, 2024 at 8:02 PM David Rowley <dgrowleyml@gmail.com> wrote:

3a4a3537a
latency average = 34.497 ms
latency average = 34.538 ms

3a4a3537a + read_stream_for_seqscans.patch
latency average = 40.923 ms
latency average = 41.415 ms

i.e. no meaningful change from the refactor, but a regression with the
read stream patch on a cached workload that changes pages often without
doing much work in between.

I ran Heikki's test except I ran the "insert" 4 times to get a table
of 4376MB according to \d+. On my random cloud ARM server (SB=8GB,
huge pages, parallelism disabled), I see a speedup of 1290ms -> 1046ms
when the data is in the Linux cache and PG is not prewarmed, roughly as he
reported. Good.

If I pg_prewarm first, I see that slowdown 260ms -> 294ms. Trying
things out to see what works, I got that down to 243ms (ie beat
master) by inserting a memory prefetch:

--- a/src/backend/storage/aio/read_stream.c
+++ b/src/backend/storage/aio/read_stream.c
@@ -757,6 +757,8 @@ read_stream_next_buffer(ReadStream *stream, void
**per_buffer_data)
/* Prepare for the next call. */
read_stream_look_ahead(stream, false);

+ __builtin_prefetch(BufferGetPage(stream->buffers[stream->oldest_buffer_index]));

Maybe that's a solution to a different problem that just happens to
more than make up the difference in this case, and it may be
questionable whether that cache line will survive long enough to help
you, but this one-tuple-per-page test likes it... Hmm, to get a more
realistic table than the one-tuple-per-page one, I tried doubling a
tenk1 table until it reached 2759MB and then I got a post-prewarm
regression 702ms -> 721ms, and again I can beat master by memory
prefetching: 689ms.

Annoyingly, with the same table I see no difference in the actual
pg_prewarm('giga') time: around 155ms for both. pg_prewarm is able to
use the 'fast path' I made pretty much just to be able to minimise
regression in that (probably stupid) all-cached test that doesn't even
look at the page contents. Unfortunately seq scan can't use it because
it has per-buffer data, which is one of the things the fast path can't
handle (because of a space management issue). Maybe I should try to
find a way to fix that.

So, sequential scan does not have per-buffer data. I did some logging
and the reason most fully-in-SB sequential scans don't use the fast
path is because read_stream->pending_read_nblocks is always 0.

When read_stream->distance stays 1 (expected for all-in-SB, as it
is initialized to 1 and we don't want distance to ramp up),
read_stream_look_ahead() never increments
read_stream->pending_read_nblocks because it sets it to 1 the first
time it is called and then the conditions of the while loop are not
met again:

while (stream->ios_in_progress < stream->max_ios &&
       stream->pinned_buffers + stream->pending_read_nblocks <
       stream->distance)

distance is 1, pending_read_nblocks is 1, thus we only loop once and
don't increment pending_read_nblocks.
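
To make that concrete, here is a toy standalone model (plain C, not the
real read_stream.c code) of just that loop: with distance stuck at 1, the
body runs exactly once, so the pending read can never grow beyond a single
block.

#include <stdio.h>

int
main(void)
{
    int         distance = 1;   /* stays 1 for an all-in-SB scan */
    int         max_ios = 16;
    int         ios_in_progress = 0;
    int         pinned_buffers = 0;
    int         pending_read_nblocks = 0;
    int         iterations = 0;

    while (ios_in_progress < max_ios &&
           pinned_buffers + pending_read_nblocks < distance)
    {
        /* first pass sets the pending read to a single block ... */
        pending_read_nblocks = 1;
        iterations++;
    }

    /* ... and then 0 + 1 < 1 is false, so we never accumulate more */
    printf("iterations = %d, pending_read_nblocks = %d\n",
           iterations, pending_read_nblocks);
    return 0;
}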

pg_prewarm is only able to use the fast path because it passes
READ_STREAM_FULL and thus initializes read_stream->distance to a
higher initial value.

I added some logging to see if any of the sequential scans in the
regression suite used the fast path. The one example I see of the fast
path being used is a temp table IO stats test in
src/test/regress/sql/stats.sql. I didn't check exactly what conditions
led it to do this. But we probably want seq scans which are all in SB
to use the fast path.

- Melanie

#27Andres Freund
andres@anarazel.de
In reply to: Thomas Munro (#23)
Re: Streaming read-ready sequential scan code

Hi,

On 2024-04-04 22:37:39 +1300, Thomas Munro wrote:

On Thu, Apr 4, 2024 at 10:31 PM Thomas Munro <thomas.munro@gmail.com> wrote:

Alright what about this?

I think it's probably worth adding a bit more of the commit message to the
function comment. Yes, there's a bit in one of the return branches, but that's
not what you're going to look at when just checking what the function does.

From e610bc78a2e3ecee50bd897e35084469d00bbac5 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Thu, 4 Apr 2024 21:11:06 +1300
Subject: [PATCH v2 2/2] Increase default vacuum_buffer_usage_limit to 2MB.

The BAS_VACUUM ring size has been 256kB since commit d526575f. Commit
1cbbee03 made it configurable but retained the traditional default.
The correct default size has been debated for years, but 256kB is
certainly very small. VACUUM soon needs to write back data it dirtied
only 32 blocks ago, which usually requires flushing the WAL. New
experiments in prefetching pages for VACUUM exacerbated the problem by
crashing into dirty data even sooner. Let's make the default 2MB.
That's 1.5% of the default toy buffer pool size, and 0.2% of 1GB, which
would be a considered a small shared_buffers setting for a real system
these days. Users are still free to set the GUC to a different value.

+1. Independent of any other changes, this improves the default vacuum
performance substantially. We might want to dynamically size the default at
some point - but we probably should overhaul the infrastructure first...

Greetings,

Andres Freund

#28Thomas Munro
thomas.munro@gmail.com
In reply to: Melanie Plageman (#26)
Re: Streaming read-ready sequential scan code

On Fri, Apr 5, 2024 at 4:20 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:

So, sequential scan does not have per-buffer data. I did some logging
and the reason most fully-in-SB sequential scans don't use the fast
path is because read_stream->pending_read_nblocks is always 0.

Hnghghghgh... right, sorry I guessed the wrong reason, it turns out
that I made a fast path just a little too specialised for pg_prewarm.
Working on it...

#29Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#28)
4 attachment(s)
Re: Streaming read-ready sequential scan code

Yeah, I plead benchmarking myopia, sorry. The fastpath as committed
is only reached when distance goes 2->1, as pg_prewarm does. Oops.
With the attached minor rearrangement, it works fine. I also poked
some more at that memory prefetcher. Here are the numbers I got on a
desktop system (Intel i9-9900 @ 3.1GHz, Linux 6.1, turbo disabled,
cpufreq governor=performance, 2MB huge pages, SB=8GB, consumer NVMe,
GCC -O3).

create table t (i int, filler text) with (fillfactor=10);
insert into t
select g, repeat('x', 900) from generate_series(1, 560000) g;
vacuum freeze t;
set max_parallel_workers_per_gather = 0;

select count(*) from t;

cold = must be read from actual disk (Linux drop_caches)
warm = read from linux page cache
hot = already in pg cache via pg_prewarm

                                  cold    warm   hot
master                            2479ms  886ms  200ms
seqscan                           2498ms  716ms  211ms   <-- regression
seqscan + fastpath                2493ms  711ms  200ms   <-- fixed, I think?
seqscan + memprefetch             2499ms  716ms  182ms
seqscan + fastpath + memprefetch  2505ms  710ms  170ms   <-- \O/

Cold has no difference. That's just my disk demonstrating Linux RA at
128kB (default); random I/O is obviously a more interesting story.
It's consistently a smidgen faster with Linux RA set to 2MB (as in
blockdev --setra 4096 /dev/nvmeXXX), and I believe this effect
probably also increases on fancier faster storage than what I have on
hand:

                                  cold
master                            1775ms
seqscan + fastpath + memprefetch  1700ms

Warm is faster as expected (fewer system calls schlepping data
kernel->userspace).

The interesting column is hot. The 200ms->211ms regression is due to
the extra bookkeeping in the slow path. The rejiggered fastpath code
fixes it for me, or maybe sometimes shows an extra 1ms. Phew. Can
you reproduce that?

The memory prefetching trick, on top of that, seems to be a good
optimisation so far. Note that it's not an entirely independent
trick; it's something we can only do now that we can see into the
future. It's the next level up of prefetching, worth doing around
60ns before you need the data, I guess. Who knows how thrashed the
cache might be before the caller gets around to accessing that page,
but there doesn't seem to be much of a cost or downside to this bet.
We know there are many more opportunities like that[1], but I don't
want to second-guess the AM here; I'm just betting that the caller
is going to look at the header.
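
Concretely, the bet is that the caller's first touch of a newly
returned buffer is its page header, e.g. something like this
hypothetical first access in a page-at-a-time scan:

    Page         page = BufferGetPage(scan->rs_cbuf);
    OffsetNumber maxoff = PageGetMaxOffsetNumber(page); /* reads the header
                                                         * cache line that
                                                         * pg_prefetch_mem()
                                                         * just warmed */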

Unfortunately there seems to be a subtle bug hiding somewhere in here,
visible on macOS on CI. Looking into that, going to find my Mac...

[1]: /messages/by-id/CAApHDvpTRx7hqFZGiZJ=d9JN4h1tzJ2=xt7bM-9XRmpVj63psQ@mail.gmail.com

Attachments:

v10-0001-Use-streaming-I-O-in-heapam-sequential-scan.patchtext/x-patch; charset=US-ASCII; name=v10-0001-Use-streaming-I-O-in-heapam-sequential-scan.patchDownload
From 74b8cde45a8babcec7b52b06bdb8ea046a0a966f Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Fri, 5 Apr 2024 13:32:14 +1300
Subject: [PATCH v10 1/4] Use streaming I/O in heapam sequential scan.

<commit message goes here>
---
 src/backend/access/heap/heapam.c | 100 +++++++++++++++++++++++++------
 src/include/access/heapam.h      |  15 +++++
 2 files changed, 97 insertions(+), 18 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index dada2ecd1e3..f7946a39fd9 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -223,6 +223,25 @@ static const int MultiXactStatusLock[MaxMultiXactStatus + 1] =
  * ----------------------------------------------------------------
  */
 
+static BlockNumber
+heap_scan_stream_read_next(ReadStream *pgsr, void *private_data,
+						   void *per_buffer_data)
+{
+	HeapScanDesc scan = (HeapScanDesc) private_data;
+
+	if (unlikely(!scan->rs_inited))
+	{
+		scan->rs_prefetch_block = heapgettup_initial_block(scan, scan->rs_dir);
+		scan->rs_inited = true;
+	}
+	else
+		scan->rs_prefetch_block = heapgettup_advance_block(scan,
+														   scan->rs_prefetch_block,
+														   scan->rs_dir);
+
+	return scan->rs_prefetch_block;
+}
+
 /* ----------------
  *		initscan - scan code common to heap_beginscan and heap_rescan
  * ----------------
@@ -325,6 +344,13 @@ initscan(HeapScanDesc scan, ScanKey key, bool keep_startblock)
 	scan->rs_cbuf = InvalidBuffer;
 	scan->rs_cblock = InvalidBlockNumber;
 
+	/*
+	 * Initialize to ForwardScanDirection because it is most common and heap
+	 * scans usually must go forwards before going backward.
+	 */
+	scan->rs_dir = ForwardScanDirection;
+	scan->rs_prefetch_block = InvalidBlockNumber;
+
 	/* page-at-a-time fields are always invalid when not rs_inited */
 
 	/*
@@ -462,12 +488,14 @@ heap_prepare_pagescan(TableScanDesc sscan)
 /*
  * heap_fetch_next_buffer - read and pin the next block from MAIN_FORKNUM.
  *
- * Read the next block of the scan relation into a buffer and pin that buffer
- * before saving it in the scan descriptor.
+ * Read the next block of the scan relation from the read stream and pin that
+ * buffer before saving it in the scan descriptor.
  */
 static inline void
 heap_fetch_next_buffer(HeapScanDesc scan, ScanDirection dir)
 {
+	Assert(scan->rs_read_stream);
+
 	/* release previous scan buffer, if any */
 	if (BufferIsValid(scan->rs_cbuf))
 	{
@@ -482,25 +510,23 @@ heap_fetch_next_buffer(HeapScanDesc scan, ScanDirection dir)
 	 */
 	CHECK_FOR_INTERRUPTS();
 
-	if (unlikely(!scan->rs_inited))
+	/*
+	 * If the scan direction is changing, reset the prefetch block to the
+	 * current block. Otherwise, we will incorrectly prefetch the blocks
+	 * between the prefetch block and the current block again before
+	 * prefetching blocks in the new, correct scan direction.
+	 */
+	if (unlikely(scan->rs_dir != dir))
 	{
-		scan->rs_cblock = heapgettup_initial_block(scan, dir);
+		scan->rs_prefetch_block = scan->rs_cblock;
+		read_stream_reset(scan->rs_read_stream);
+	}
 
-		/* ensure rs_cbuf is invalid when we get InvalidBlockNumber */
-		Assert(scan->rs_cblock != InvalidBlockNumber ||
-			   !BufferIsValid(scan->rs_cbuf));
+	scan->rs_dir = dir;
 
-		scan->rs_inited = true;
-	}
-	else
-		scan->rs_cblock = heapgettup_advance_block(scan, scan->rs_cblock,
-												   dir);
-
-	/* read block if valid */
-	if (BlockNumberIsValid(scan->rs_cblock))
-		scan->rs_cbuf = ReadBufferExtended(scan->rs_base.rs_rd, MAIN_FORKNUM,
-										   scan->rs_cblock, RBM_NORMAL,
-										   scan->rs_strategy);
+	scan->rs_cbuf = read_stream_next_buffer(scan->rs_read_stream, NULL);
+	if (BufferIsValid(scan->rs_cbuf))
+		scan->rs_cblock = BufferGetBlockNumber(scan->rs_cbuf);
 }
 
 /*
@@ -833,6 +859,7 @@ continue_page:
 
 	scan->rs_cbuf = InvalidBuffer;
 	scan->rs_cblock = InvalidBlockNumber;
+	scan->rs_prefetch_block = InvalidBlockNumber;
 	tuple->t_data = NULL;
 	scan->rs_inited = false;
 }
@@ -928,6 +955,7 @@ continue_page:
 		ReleaseBuffer(scan->rs_cbuf);
 	scan->rs_cbuf = InvalidBuffer;
 	scan->rs_cblock = InvalidBlockNumber;
+	scan->rs_prefetch_block = InvalidBlockNumber;
 	tuple->t_data = NULL;
 	scan->rs_inited = false;
 }
@@ -1021,6 +1049,28 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 
 	initscan(scan, key, false);
 
+	scan->rs_read_stream = NULL;
+
+	/*
+	 * For sequential scans and TID range scans, we will set up a read stream.
+	 * We do not know the scan direction yet. If the scan does not end up
+	 * being a forward scan, the read stream will be freed. This should be
+	 * done after initscan() because initscan() allocates the
+	 * BufferAccessStrategy object.
+	 */
+	if (scan->rs_base.rs_flags & SO_TYPE_SEQSCAN ||
+		scan->rs_base.rs_flags & SO_TYPE_TIDRANGESCAN)
+	{
+		scan->rs_read_stream = read_stream_begin_relation(READ_STREAM_SEQUENTIAL,
+														  scan->rs_strategy,
+														  scan->rs_base.rs_rd,
+														  MAIN_FORKNUM,
+														  heap_scan_stream_read_next,
+														  scan,
+														  0);
+	}
+
+
 	return (TableScanDesc) scan;
 }
 
@@ -1055,6 +1105,14 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
 	if (BufferIsValid(scan->rs_cbuf))
 		ReleaseBuffer(scan->rs_cbuf);
 
+	/*
+	 * The read stream is reset on rescan. This must be done before
+	 * initscan(), as some state referred to by read_stream_reset() is reset
+	 * in initscan().
+	 */
+	if (scan->rs_read_stream)
+		read_stream_reset(scan->rs_read_stream);
+
 	/*
 	 * reinitialize scan descriptor
 	 */
@@ -1074,6 +1132,12 @@ heap_endscan(TableScanDesc sscan)
 	if (BufferIsValid(scan->rs_cbuf))
 		ReleaseBuffer(scan->rs_cbuf);
 
+	/*
+	 * Must free the read stream before freeing the BufferAccessStrategy.
+	 */
+	if (scan->rs_read_stream)
+		read_stream_end(scan->rs_read_stream);
+
 	/*
 	 * decrement relation reference count and free scan descriptor storage
 	 */
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 2765efc4e5e..332a7faa8d1 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -25,6 +25,7 @@
 #include "storage/bufpage.h"
 #include "storage/dsm.h"
 #include "storage/lockdefs.h"
+#include "storage/read_stream.h"
 #include "storage/shm_toc.h"
 #include "utils/relcache.h"
 #include "utils/snapshot.h"
@@ -70,6 +71,20 @@ typedef struct HeapScanDescData
 
 	HeapTupleData rs_ctup;		/* current tuple in scan, if any */
 
+	/* For scans that stream reads */
+	ReadStream *rs_read_stream;
+
+	/*
+	 * For sequential scans and TID range scans to stream reads. The read
+	 * stream is allocated at the beginning of the scan and reset on rescan or
+	 * when the scan direction changes. The scan direction is saved each time
+	 * a new page is requested. If the scan direction changes from one page to
+	 * the next, the read stream releases all previously pinned buffers and
+	 * resets the prefetch block.
+	 */
+	ScanDirection rs_dir;
+	BlockNumber rs_prefetch_block;
+
 	/*
 	 * For parallel scans to store page allocation data.  NULL when not
 	 * performing a parallel scan.
-- 
2.39.2

v10-0002-Improve-read_stream.c-s-fast-path.patchtext/x-patch; charset=US-ASCII; name=v10-0002-Improve-read_stream.c-s-fast-path.patchDownload
From 108a3880ca1ff949f53fffeeec4e722a2b13dd57 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Fri, 5 Apr 2024 12:08:24 +1300
Subject: [PATCH v10 2/4] Improve read_stream.c's fast path.

Unfortunately the "fast path" for cached scans that don't do any I/O was
coded in a way that could be used by pg_prewarm, but not the proposed
sequential scan patch which starts with a low initial distance due to
flags used to initialize the stream.  Refactor to fix that.
---
 src/backend/storage/aio/read_stream.c | 70 +++++++++++----------------
 1 file changed, 27 insertions(+), 43 deletions(-)

diff --git a/src/backend/storage/aio/read_stream.c b/src/backend/storage/aio/read_stream.c
index 4f21262ff5e..80b8be8cc7e 100644
--- a/src/backend/storage/aio/read_stream.c
+++ b/src/backend/storage/aio/read_stream.c
@@ -578,13 +578,12 @@ read_stream_next_buffer(ReadStream *stream, void **per_buffer_data)
 	if (likely(stream->fast_path))
 	{
 		BlockNumber next_blocknum;
-		bool		need_wait;
 
 		/* Fast path assumptions. */
 		Assert(stream->ios_in_progress == 0);
 		Assert(stream->pinned_buffers == 1);
 		Assert(stream->distance == 1);
-		Assert(stream->pending_read_nblocks == 1);
+		Assert(stream->pending_read_nblocks == 0);
 		Assert(stream->per_buffer_data_size == 0);
 
 		/* We're going to return the buffer we pinned last time. */
@@ -594,40 +593,29 @@ read_stream_next_buffer(ReadStream *stream, void **per_buffer_data)
 		buffer = stream->buffers[oldest_buffer_index];
 		Assert(buffer != InvalidBuffer);
 
-		/*
-		 * Pin a buffer for the next call.  Same buffer entry, and arbitrary
-		 * I/O entry (they're all free).
-		 */
-		need_wait = StartReadBuffer(&stream->ios[0].op,
-									&stream->buffers[oldest_buffer_index],
-									stream->pending_read_blocknum,
-									stream->advice_enabled ?
-									READ_BUFFERS_ISSUE_ADVICE : 0);
-
-		/* Choose the block the next call will pin. */
+		/* Choose the next block to pin. */
 		if (unlikely(stream->blocknums_next == stream->blocknums_count))
 			read_stream_fill_blocknums(stream);
 		next_blocknum = stream->blocknums[stream->blocknums_next++];
 
-		/*
-		 * Fast return if the next call doesn't require I/O for the buffer we
-		 * just pinned, and we have a block number to give it as a pending
-		 * read.
-		 */
-		if (likely(!need_wait && next_blocknum != InvalidBlockNumber))
+		if (likely(next_blocknum != InvalidBlockNumber))
 		{
-			stream->pending_read_blocknum = next_blocknum;
-			return buffer;
-		}
-
-		/*
-		 * For anything more complex, set up some more state and take the slow
-		 * path next time.
-		 */
-		stream->fast_path = false;
+			/*
+			 * Pin a buffer for the next call.  Same buffer entry, and
+			 * arbitrary I/O entry (they're all free).  We don't have to
+			 * adjust pinned_buffers because we're transferring one to caller
+			 * but pinning one more.
+			 */
+			if (likely(!StartReadBuffer(&stream->ios[0].op,
+										&stream->buffers[oldest_buffer_index],
+										next_blocknum,
+										stream->advice_enabled ?
+										READ_BUFFERS_ISSUE_ADVICE : 0)))
+			{
+				/* Fast return. */
+				return buffer;
+			}
 
-		if (need_wait)
-		{
 			/* Next call must wait for I/O for the newly pinned buffer. */
 			stream->oldest_io_index = 0;
 			stream->next_io_index = stream->max_ios > 1 ? 1 : 0;
@@ -635,17 +623,15 @@ read_stream_next_buffer(ReadStream *stream, void **per_buffer_data)
 			stream->ios[0].buffer_index = oldest_buffer_index;
 			stream->seq_blocknum = next_blocknum + 1;
 		}
-		if (next_blocknum == InvalidBlockNumber)
-		{
-			/* Next call hits end of stream and can't pin anything more. */
-			stream->distance = 0;
-			stream->pending_read_nblocks = 0;
-		}
 		else
 		{
-			/* Set up the pending read. */
-			stream->pending_read_blocknum = next_blocknum;
+			/* No more blocks, end of stream. */
+			stream->distance = 0;
+			stream->oldest_buffer_index = stream->next_buffer_index;
+			stream->pinned_buffers = 0;
 		}
+
+		stream->fast_path = false;
 		return buffer;
 	}
 #endif
@@ -762,15 +748,11 @@ read_stream_next_buffer(ReadStream *stream, void **per_buffer_data)
 	if (stream->ios_in_progress == 0 &&
 		stream->pinned_buffers == 1 &&
 		stream->distance == 1 &&
-		stream->pending_read_nblocks == 1 &&
+		stream->pending_read_nblocks == 0 &&
 		stream->per_buffer_data_size == 0)
 	{
 		stream->fast_path = true;
 	}
-	else
-	{
-		stream->fast_path = false;
-	}
 #endif
 
 	return buffer;
@@ -789,6 +771,7 @@ read_stream_reset(ReadStream *stream)
 
 	/* Stop looking ahead. */
 	stream->distance = 0;
+	stream->fast_path = false;
 
 	/* Unpin anything that wasn't consumed. */
 	while ((buffer = read_stream_next_buffer(stream, NULL)) != InvalidBuffer)
@@ -799,6 +782,7 @@ read_stream_reset(ReadStream *stream)
 
 	/* Start off assuming data is cached. */
 	stream->distance = 1;
+	stream->fast_path = false;
 }
 
 /*
-- 
2.39.2

v10-0003-Add-pg_prefetch_mem-macro-to-load-cache-lines.patchtext/x-patch; charset=US-ASCII; name=v10-0003-Add-pg_prefetch_mem-macro-to-load-cache-lines.patchDownload
From 5022674f03046a5815e6a863d696ac7db2814a10 Mon Sep 17 00:00:00 2001
From: David Rowley <dgrowley@gmail.com>
Date: Mon, 10 Jul 2023 11:22:34 +0200
Subject: [PATCH v10 3/4] Add pg_prefetch_mem() macro to load cache lines.

Initially mapping to GCC, Clang and MSVC builtins.

Discussion: https://postgr.es/m/CAEepm%3D2y9HM9QP%2BHhRZdQ3pU6FShSMyu%3DV1uHXhQ5gG-dketHg%40mail.gmail.com
---
 config/c-compiler.m4       | 17 ++++++++++++++++
 configure                  | 40 ++++++++++++++++++++++++++++++++++++++
 configure.ac               |  3 +++
 meson.build                |  1 +
 src/include/c.h            |  8 ++++++++
 src/include/pg_config.h.in |  3 +++
 6 files changed, 72 insertions(+)

diff --git a/config/c-compiler.m4 b/config/c-compiler.m4
index 3268a780bb0..4cc02f97601 100644
--- a/config/c-compiler.m4
+++ b/config/c-compiler.m4
@@ -355,6 +355,23 @@ AC_DEFINE_UNQUOTED(AS_TR_CPP([HAVE$1]), 1,
                    [Define to 1 if your compiler understands $1.])
 fi])# PGAC_CHECK_BUILTIN_FUNC
 
+# PGAC_CHECK_BUILTIN_VOID_FUNC
+# -----------------------
+# Variant for void functions.
+AC_DEFUN([PGAC_CHECK_BUILTIN_VOID_FUNC],
+[AC_CACHE_CHECK(for $1, pgac_cv$1,
+[AC_LINK_IFELSE([AC_LANG_PROGRAM([
+void
+call$1($2)
+{
+    $1(x);
+}], [])],
+[pgac_cv$1=yes],
+[pgac_cv$1=no])])
+if test x"${pgac_cv$1}" = xyes ; then
+AC_DEFINE_UNQUOTED(AS_TR_CPP([HAVE$1]), 1,
+                   [Define to 1 if your compiler understands $1.])
+fi])# PGAC_CHECK_BUILTIN_VOID_FUNC
 
 
 # PGAC_CHECK_BUILTIN_FUNC_PTR
diff --git a/configure b/configure
index 36feeafbb23..79b78c33ddc 100755
--- a/configure
+++ b/configure
@@ -15543,6 +15543,46 @@ _ACEOF
 
 fi
 
+# Can we use a built-in to prefetch memory?
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for __builtin_prefetch" >&5
+$as_echo_n "checking for __builtin_prefetch... " >&6; }
+if ${pgac_cv__builtin_prefetch+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+void
+call__builtin_prefetch(void *x)
+{
+    __builtin_prefetch(x);
+}
+int
+main ()
+{
+
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_link "$LINENO"; then :
+  pgac_cv__builtin_prefetch=yes
+else
+  pgac_cv__builtin_prefetch=no
+fi
+rm -f core conftest.err conftest.$ac_objext \
+    conftest$ac_exeext conftest.$ac_ext
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $pgac_cv__builtin_prefetch" >&5
+$as_echo "$pgac_cv__builtin_prefetch" >&6; }
+if test x"${pgac_cv__builtin_prefetch}" = xyes ; then
+
+cat >>confdefs.h <<_ACEOF
+#define HAVE__BUILTIN_PREFETCH 1
+_ACEOF
+
+fi
+
 # We require 64-bit fseeko() to be available, but run this check anyway
 # in case it finds that _LARGEFILE_SOURCE has to be #define'd for that.
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking for _LARGEFILE_SOURCE value needed for large files" >&5
diff --git a/configure.ac b/configure.ac
index 57f734879e1..6cd5441ed32 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1781,6 +1781,9 @@ PGAC_CHECK_BUILTIN_FUNC([__builtin_popcount], [unsigned int x])
 # so it needs a different test function.
 PGAC_CHECK_BUILTIN_FUNC_PTR([__builtin_frame_address], [0])
 
+# Can we use a built-in to prefetch memory?
+PGAC_CHECK_BUILTIN_VOID_FUNC([__builtin_prefetch], [void *x])
+
 # We require 64-bit fseeko() to be available, but run this check anyway
 # in case it finds that _LARGEFILE_SOURCE has to be #define'd for that.
 AC_FUNC_FSEEKO
diff --git a/meson.build b/meson.build
index 87437960bc3..210fd4c1cd7 100644
--- a/meson.build
+++ b/meson.build
@@ -1707,6 +1707,7 @@ builtins = [
   'constant_p',
   'frame_address',
   'popcount',
+  'prefetch',
   'unreachable',
 ]
 
diff --git a/src/include/c.h b/src/include/c.h
index cf37e02fe1f..0b7aa3e2924 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -412,6 +412,14 @@ typedef void (*pg_funcptr_t) (void);
 #define HAVE_PRAGMA_GCC_SYSTEM_HEADER	1
 #endif
 
+/* Do we have support for prefetching memory? */
+#if defined(HAVE__BUILTIN_PREFETCH)
+#define pg_prefetch_mem(a) __builtin_prefetch(a)
+#elif defined(_MSC_VER)
+#define pg_prefetch_mem(a) _m_prefetch(a)
+#else
+#define pg_prefetch_mem(a)
+#endif
 
 /* ----------------------------------------------------------------
  *				Section 2:	bool, true, false
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index 591e1ca3df6..085abf4c84e 100644
--- a/src/include/pg_config.h.in
+++ b/src/include/pg_config.h.in
@@ -543,6 +543,9 @@
 /* Define to 1 if your compiler understands __builtin_popcount. */
 #undef HAVE__BUILTIN_POPCOUNT
 
+/* Define to 1 if your compiler understands __builtin_prefetch. */
+#undef HAVE__BUILTIN_PREFETCH
+
 /* Define to 1 if your compiler understands __builtin_types_compatible_p. */
 #undef HAVE__BUILTIN_TYPES_COMPATIBLE_P
 
-- 
2.39.2

v10-0004-Prefetch-page-header-memory-when-streaming-relat.patchtext/x-patch; charset=US-ASCII; name=v10-0004-Prefetch-page-header-memory-when-streaming-relat.patchDownload
From 43fcdbcdaee406b1d0eae71bd4a2b117d3acf416 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Fri, 5 Apr 2024 15:06:32 +1300
Subject: [PATCH v10 4/4] Prefetch page header memory when streaming relations.

read_stream.c can always see at least one page ahead of the one the
caller is accessing.  Take the opportunity to prefetch the cache line
that holds the next page's header.  For some scans, that can generate a
decent speedup, though real world results will depend on how much work
the CPU does before it gets around to accessing the next page.
---
 src/backend/storage/aio/read_stream.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/src/backend/storage/aio/read_stream.c b/src/backend/storage/aio/read_stream.c
index 80b8be8cc7e..35ee1fc1043 100644
--- a/src/backend/storage/aio/read_stream.c
+++ b/src/backend/storage/aio/read_stream.c
@@ -612,7 +612,8 @@ read_stream_next_buffer(ReadStream *stream, void **per_buffer_data)
 										stream->advice_enabled ?
 										READ_BUFFERS_ISSUE_ADVICE : 0)))
 			{
-				/* Fast return. */
+				/* Predict caller will soon access next page's header. */
+				pg_prefetch_mem(BufferGetPage(stream->buffers[oldest_buffer_index]));
 				return buffer;
 			}
 
@@ -743,6 +744,10 @@ read_stream_next_buffer(ReadStream *stream, void **per_buffer_data)
 	/* Prepare for the next call. */
 	read_stream_look_ahead(stream, false);
 
+	/* Predict caller will soon access next page's header. */
+	if (stream->pinned_buffers > 0)
+		pg_prefetch_mem(BufferGetPage(stream->buffers[stream->oldest_buffer_index]));
+
 #ifndef READ_STREAM_DISABLE_FAST_PATH
 	/* See if we can take the fast path for all-cached scans next time. */
 	if (stream->ios_in_progress == 0 &&
-- 
2.39.2

#30Melanie Plageman
melanieplageman@gmail.com
In reply to: Andres Freund (#27)
Re: Streaming read-ready sequential scan code

On Thu, Apr 4, 2024 at 12:39 PM Andres Freund <andres@anarazel.de> wrote:

On 2024-04-04 22:37:39 +1300, Thomas Munro wrote:

On Thu, Apr 4, 2024 at 10:31 PM Thomas Munro <thomas.munro@gmail.com> wrote:

Alright what about this?

I think it's probably worth adding a bit more of the commit message to the
function comment. Yes, there's a bit in one of the return branches, but that's
not what you're going to look at when just checking what the function does.

Agreed about the comment. I kept thinking that BAS_BULKREAD should
maybe return nbuffers - 1, but I couldn't convince myself why.
Otherwise v2-0001-Allow-BufferAccessStrategy-to-limit-pin-count LGTM.
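
For illustration, the kind of strategy-based limit being discussed
might look roughly like the following (names hypothetical; the
pin-count patch has the real shape):

    /* hypothetical sketch of a per-strategy pin limit */
    static int
    strategy_pin_limit(BufferAccessStrategy strategy)
    {
        if (strategy == NULL)
            return NBuffers;    /* no strategy, no extra cap */

        /* don't let look-ahead pins monopolize or escape the ring */
        return strategy->nbuffers;
    }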

- Melanie

#31Melanie Plageman
melanieplageman@gmail.com
In reply to: Thomas Munro (#29)
4 attachment(s)
Re: Streaming read-ready sequential scan code

On Fri, Apr 5, 2024 at 12:15 AM Thomas Munro <thomas.munro@gmail.com> wrote:

Yeah, I plead benchmarking myopia, sorry. The fastpath as committed
is only reached when distance goes 2->1, as pg_prewarm does. Oops.
With the attached minor rearrangement, it works fine. I also poked
some more at that memory prefetcher. Here are the numbers I got on a
desktop system (Intel i9-9900 @ 3.1GHz, Linux 6.1, turbo disabled,
cpufreq governor=performance, 2MB huge pages, SB=8GB, consumer NVMe,
GCC -O3).

create table t (i int, filler text) with (fillfactor=10);
insert into t
select g, repeat('x', 900) from generate_series(1, 560000) g;
vacuum freeze t;
set max_parallel_workers_per_gather = 0;

select count(*) from t;

cold = must be read from actual disk (Linux drop_caches)
warm = read from linux page cache
hot = already in pg cache via pg_prewarm

                                  cold    warm   hot
master                            2479ms  886ms  200ms
seqscan                           2498ms  716ms  211ms   <-- regression
seqscan + fastpath                2493ms  711ms  200ms   <-- fixed, I think?
seqscan + memprefetch             2499ms  716ms  182ms
seqscan + fastpath + memprefetch  2505ms  710ms  170ms   <-- \O/

Cold has no difference. That's just my disk demonstrating Linux RA at
128kB (default); random I/O is obviously a more interesting story.
It's consistently a smidgen faster with Linux RA set to 2MB (as in
blockdev --setra 4096 /dev/nvmeXXX), and I believe this effect
probably also increases on fancier faster storage than what I have on
hand:

                                  cold
master                            1775ms
seqscan + fastpath + memprefetch  1700ms

Warm is faster as expected (fewer system calls schlepping data
kernel->userspace).

The interesting column is hot. The 200ms->211ms regression is due to
the extra bookkeeping in the slow path. The rejiggered fastpath code
fixes it for me, or maybe sometimes shows an extra 1ms. Phew. Can
you reproduce that?

I am able to reproduce the fast path solving the issue using Heikki's
example here [1] but in shared buffers (hot).

master: 25 ms
stream read: 29 ms
stream read + fast path: 25 ms

I haven't looked into or reviewed the memory prefetching part.

While reviewing 0002, I realized that I don't quite see how
read_stream_get_block() will be used in the fastpath -- which it
claims in its comments.
read_stream_next_buffer() is the only caller of
read_stream_look_ahead()->read_stream_get_block(), and if fast_path is
true, read_stream_next_buffer() always returns before calling
read_stream_look_ahead(). Maybe I am missing something. I see
fast_path uses read_stream_fill_blocknums() to invoke the callback.

Oh, and why does the READ_STREAM_DISABLE_FAST_PATH macro exist?

Otherwise 0002 looks good to me.

I haven't reviewed 0003 or 0004. I attached a new version (v11)
because I noticed an outdated comment in my seq scan streaming read
user patch (0001). The other patches in the set are untouched from
your versions, apart from adding author/reviewer info to the commit
message for 0002.

- Melanie

[1]: /messages/by-id/3b0f3701-addd-4629-9257-cf28e1a6e6a1@iki.fi

Attachments:

v11-0003-Add-pg_prefetch_mem-macro-to-load-cache-lines.patchtext/x-patch; charset=US-ASCII; name=v11-0003-Add-pg_prefetch_mem-macro-to-load-cache-lines.patchDownload
From acbd7172f10857d49d4b5d4afc4efab704b33486 Mon Sep 17 00:00:00 2001
From: David Rowley <dgrowley@gmail.com>
Date: Mon, 10 Jul 2023 11:22:34 +0200
Subject: [PATCH v11 3/4] Add pg_prefetch_mem() macro to load cache lines.

Initially mapping to GCC, Clang and MSVC builtins.

Discussion: https://postgr.es/m/CAEepm%3D2y9HM9QP%2BHhRZdQ3pU6FShSMyu%3DV1uHXhQ5gG-dketHg%40mail.gmail.com
---
 config/c-compiler.m4       | 17 ++++++++++++++++
 configure                  | 40 ++++++++++++++++++++++++++++++++++++++
 configure.ac               |  3 +++
 meson.build                |  1 +
 src/include/c.h            |  8 ++++++++
 src/include/pg_config.h.in |  3 +++
 6 files changed, 72 insertions(+)

diff --git a/config/c-compiler.m4 b/config/c-compiler.m4
index 3268a780bb0..4cc02f97601 100644
--- a/config/c-compiler.m4
+++ b/config/c-compiler.m4
@@ -355,6 +355,23 @@ AC_DEFINE_UNQUOTED(AS_TR_CPP([HAVE$1]), 1,
                    [Define to 1 if your compiler understands $1.])
 fi])# PGAC_CHECK_BUILTIN_FUNC
 
+# PGAC_CHECK_BUILTIN_VOID_FUNC
+# -----------------------
+# Variant for void functions.
+AC_DEFUN([PGAC_CHECK_BUILTIN_VOID_FUNC],
+[AC_CACHE_CHECK(for $1, pgac_cv$1,
+[AC_LINK_IFELSE([AC_LANG_PROGRAM([
+void
+call$1($2)
+{
+    $1(x);
+}], [])],
+[pgac_cv$1=yes],
+[pgac_cv$1=no])])
+if test x"${pgac_cv$1}" = xyes ; then
+AC_DEFINE_UNQUOTED(AS_TR_CPP([HAVE$1]), 1,
+                   [Define to 1 if your compiler understands $1.])
+fi])# PGAC_CHECK_BUILTIN_VOID_FUNC
 
 
 # PGAC_CHECK_BUILTIN_FUNC_PTR
diff --git a/configure b/configure
index 36feeafbb23..79b78c33ddc 100755
--- a/configure
+++ b/configure
@@ -15543,6 +15543,46 @@ _ACEOF
 
 fi
 
+# Can we use a built-in to prefetch memory?
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for __builtin_prefetch" >&5
+$as_echo_n "checking for __builtin_prefetch... " >&6; }
+if ${pgac_cv__builtin_prefetch+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+void
+call__builtin_prefetch(void *x)
+{
+    __builtin_prefetch(x);
+}
+int
+main ()
+{
+
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_link "$LINENO"; then :
+  pgac_cv__builtin_prefetch=yes
+else
+  pgac_cv__builtin_prefetch=no
+fi
+rm -f core conftest.err conftest.$ac_objext \
+    conftest$ac_exeext conftest.$ac_ext
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $pgac_cv__builtin_prefetch" >&5
+$as_echo "$pgac_cv__builtin_prefetch" >&6; }
+if test x"${pgac_cv__builtin_prefetch}" = xyes ; then
+
+cat >>confdefs.h <<_ACEOF
+#define HAVE__BUILTIN_PREFETCH 1
+_ACEOF
+
+fi
+
 # We require 64-bit fseeko() to be available, but run this check anyway
 # in case it finds that _LARGEFILE_SOURCE has to be #define'd for that.
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking for _LARGEFILE_SOURCE value needed for large files" >&5
diff --git a/configure.ac b/configure.ac
index 57f734879e1..6cd5441ed32 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1781,6 +1781,9 @@ PGAC_CHECK_BUILTIN_FUNC([__builtin_popcount], [unsigned int x])
 # so it needs a different test function.
 PGAC_CHECK_BUILTIN_FUNC_PTR([__builtin_frame_address], [0])
 
+# Can we use a built-in to prefetch memory?
+PGAC_CHECK_BUILTIN_VOID_FUNC([__builtin_prefetch], [void *x])
+
 # We require 64-bit fseeko() to be available, but run this check anyway
 # in case it finds that _LARGEFILE_SOURCE has to be #define'd for that.
 AC_FUNC_FSEEKO
diff --git a/meson.build b/meson.build
index 87437960bc3..210fd4c1cd7 100644
--- a/meson.build
+++ b/meson.build
@@ -1707,6 +1707,7 @@ builtins = [
   'constant_p',
   'frame_address',
   'popcount',
+  'prefetch',
   'unreachable',
 ]
 
diff --git a/src/include/c.h b/src/include/c.h
index cf37e02fe1f..0b7aa3e2924 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -412,6 +412,14 @@ typedef void (*pg_funcptr_t) (void);
 #define HAVE_PRAGMA_GCC_SYSTEM_HEADER	1
 #endif
 
+/* Do we have support for prefetching memory? */
+#if defined(HAVE__BUILTIN_PREFETCH)
+#define pg_prefetch_mem(a) __builtin_prefetch(a)
+#elif defined(_MSC_VER)
+#define pg_prefetch_mem(a) _m_prefetch(a)
+#else
+#define pg_prefetch_mem(a)
+#endif
 
 /* ----------------------------------------------------------------
  *				Section 2:	bool, true, false
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index 591e1ca3df6..085abf4c84e 100644
--- a/src/include/pg_config.h.in
+++ b/src/include/pg_config.h.in
@@ -543,6 +543,9 @@
 /* Define to 1 if your compiler understands __builtin_popcount. */
 #undef HAVE__BUILTIN_POPCOUNT
 
+/* Define to 1 if your compiler understands __builtin_prefetch. */
+#undef HAVE__BUILTIN_PREFETCH
+
 /* Define to 1 if your compiler understands __builtin_types_compatible_p. */
 #undef HAVE__BUILTIN_TYPES_COMPATIBLE_P
 
-- 
2.40.1

v11-0002-Improve-read_stream.c-s-fast-path.patchtext/x-patch; charset=US-ASCII; name=v11-0002-Improve-read_stream.c-s-fast-path.patchDownload
From 8c1cd34f7be9651b1fbe69e3d351c79cbf5b202a Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Fri, 5 Apr 2024 12:08:24 +1300
Subject: [PATCH v11 2/4] Improve read_stream.c's fast path.

Unfortunately the "fast path" for cached scans that don't do any I/O was
coded in a way that could be used by pg_prewarm, but not the proposed
sequential scan patch which starts with a low initial distance due to
flags used to initialize the stream.  Refactor to fix that.

Author: Thomas Munro
Reviewed-by: Melanie Plageman
Discussion: https://postgr.es/m/flat/CAAKRu_YtXJiYKQvb5JsA2SkwrsizYLugs4sSOZh3EAjKUg%3DgEQ%40mail.gmail.com
---
 src/backend/storage/aio/read_stream.c | 70 +++++++++++----------------
 1 file changed, 27 insertions(+), 43 deletions(-)

diff --git a/src/backend/storage/aio/read_stream.c b/src/backend/storage/aio/read_stream.c
index 4f21262ff5e..80b8be8cc7e 100644
--- a/src/backend/storage/aio/read_stream.c
+++ b/src/backend/storage/aio/read_stream.c
@@ -578,13 +578,12 @@ read_stream_next_buffer(ReadStream *stream, void **per_buffer_data)
 	if (likely(stream->fast_path))
 	{
 		BlockNumber next_blocknum;
-		bool		need_wait;
 
 		/* Fast path assumptions. */
 		Assert(stream->ios_in_progress == 0);
 		Assert(stream->pinned_buffers == 1);
 		Assert(stream->distance == 1);
-		Assert(stream->pending_read_nblocks == 1);
+		Assert(stream->pending_read_nblocks == 0);
 		Assert(stream->per_buffer_data_size == 0);
 
 		/* We're going to return the buffer we pinned last time. */
@@ -594,40 +593,29 @@ read_stream_next_buffer(ReadStream *stream, void **per_buffer_data)
 		buffer = stream->buffers[oldest_buffer_index];
 		Assert(buffer != InvalidBuffer);
 
-		/*
-		 * Pin a buffer for the next call.  Same buffer entry, and arbitrary
-		 * I/O entry (they're all free).
-		 */
-		need_wait = StartReadBuffer(&stream->ios[0].op,
-									&stream->buffers[oldest_buffer_index],
-									stream->pending_read_blocknum,
-									stream->advice_enabled ?
-									READ_BUFFERS_ISSUE_ADVICE : 0);
-
-		/* Choose the block the next call will pin. */
+		/* Choose the next block to pin. */
 		if (unlikely(stream->blocknums_next == stream->blocknums_count))
 			read_stream_fill_blocknums(stream);
 		next_blocknum = stream->blocknums[stream->blocknums_next++];
 
-		/*
-		 * Fast return if the next call doesn't require I/O for the buffer we
-		 * just pinned, and we have a block number to give it as a pending
-		 * read.
-		 */
-		if (likely(!need_wait && next_blocknum != InvalidBlockNumber))
+		if (likely(next_blocknum != InvalidBlockNumber))
 		{
-			stream->pending_read_blocknum = next_blocknum;
-			return buffer;
-		}
-
-		/*
-		 * For anything more complex, set up some more state and take the slow
-		 * path next time.
-		 */
-		stream->fast_path = false;
+			/*
+			 * Pin a buffer for the next call.  Same buffer entry, and
+			 * arbitrary I/O entry (they're all free).  We don't have to
+			 * adjust pinned_buffers because we're transferring one to caller
+			 * but pinning one more.
+			 */
+			if (likely(!StartReadBuffer(&stream->ios[0].op,
+										&stream->buffers[oldest_buffer_index],
+										next_blocknum,
+										stream->advice_enabled ?
+										READ_BUFFERS_ISSUE_ADVICE : 0)))
+			{
+				/* Fast return. */
+				return buffer;
+			}
 
-		if (need_wait)
-		{
 			/* Next call must wait for I/O for the newly pinned buffer. */
 			stream->oldest_io_index = 0;
 			stream->next_io_index = stream->max_ios > 1 ? 1 : 0;
@@ -635,17 +623,15 @@ read_stream_next_buffer(ReadStream *stream, void **per_buffer_data)
 			stream->ios[0].buffer_index = oldest_buffer_index;
 			stream->seq_blocknum = next_blocknum + 1;
 		}
-		if (next_blocknum == InvalidBlockNumber)
-		{
-			/* Next call hits end of stream and can't pin anything more. */
-			stream->distance = 0;
-			stream->pending_read_nblocks = 0;
-		}
 		else
 		{
-			/* Set up the pending read. */
-			stream->pending_read_blocknum = next_blocknum;
+			/* No more blocks, end of stream. */
+			stream->distance = 0;
+			stream->oldest_buffer_index = stream->next_buffer_index;
+			stream->pinned_buffers = 0;
 		}
+
+		stream->fast_path = false;
 		return buffer;
 	}
 #endif
@@ -762,15 +748,11 @@ read_stream_next_buffer(ReadStream *stream, void **per_buffer_data)
 	if (stream->ios_in_progress == 0 &&
 		stream->pinned_buffers == 1 &&
 		stream->distance == 1 &&
-		stream->pending_read_nblocks == 1 &&
+		stream->pending_read_nblocks == 0 &&
 		stream->per_buffer_data_size == 0)
 	{
 		stream->fast_path = true;
 	}
-	else
-	{
-		stream->fast_path = false;
-	}
 #endif
 
 	return buffer;
@@ -789,6 +771,7 @@ read_stream_reset(ReadStream *stream)
 
 	/* Stop looking ahead. */
 	stream->distance = 0;
+	stream->fast_path = false;
 
 	/* Unpin anything that wasn't consumed. */
 	while ((buffer = read_stream_next_buffer(stream, NULL)) != InvalidBuffer)
@@ -799,6 +782,7 @@ read_stream_reset(ReadStream *stream)
 
 	/* Start off assuming data is cached. */
 	stream->distance = 1;
+	stream->fast_path = false;
 }
 
 /*
-- 
2.40.1

v11-0001-Use-streaming-IO-in-heapam-sequential-and-TID-ra.patchtext/x-patch; charset=US-ASCII; name=v11-0001-Use-streaming-IO-in-heapam-sequential-and-TID-ra.patchDownload
From 47eb8392b8cf75a4e21dc50414af4857a5eade56 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Fri, 5 Apr 2024 13:32:14 +1300
Subject: [PATCH v11 1/4] Use streaming IO in heapam sequential and TID range
 scans

Instead of calling ReadBuffer() for each block heap sequential scans and
TID range scans now use the streaming read API introduced in b5a9b18cd0.

Author: Melanie Plageman
Reviewed-by: Thomas Munro
Discussion: https://postgr.es/m/flat/CAAKRu_YtXJiYKQvb5JsA2SkwrsizYLugs4sSOZh3EAjKUg%3DgEQ%40mail.gmail.com
---
 src/backend/access/heap/heapam.c | 98 ++++++++++++++++++++++++++------
 src/include/access/heapam.h      | 15 +++++
 2 files changed, 95 insertions(+), 18 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index dada2ecd1e3..bafe023bce8 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -223,6 +223,25 @@ static const int MultiXactStatusLock[MaxMultiXactStatus + 1] =
  * ----------------------------------------------------------------
  */
 
+static BlockNumber
+heap_scan_stream_read_next(ReadStream *pgsr, void *private_data,
+						   void *per_buffer_data)
+{
+	HeapScanDesc scan = (HeapScanDesc) private_data;
+
+	if (unlikely(!scan->rs_inited))
+	{
+		scan->rs_prefetch_block = heapgettup_initial_block(scan, scan->rs_dir);
+		scan->rs_inited = true;
+	}
+	else
+		scan->rs_prefetch_block = heapgettup_advance_block(scan,
+														   scan->rs_prefetch_block,
+														   scan->rs_dir);
+
+	return scan->rs_prefetch_block;
+}
+
 /* ----------------
  *		initscan - scan code common to heap_beginscan and heap_rescan
  * ----------------
@@ -325,6 +344,13 @@ initscan(HeapScanDesc scan, ScanKey key, bool keep_startblock)
 	scan->rs_cbuf = InvalidBuffer;
 	scan->rs_cblock = InvalidBlockNumber;
 
+	/*
+	 * Initialize to ForwardScanDirection because it is most common and heap
+	 * scans usually must go forwards before going backward.
+	 */
+	scan->rs_dir = ForwardScanDirection;
+	scan->rs_prefetch_block = InvalidBlockNumber;
+
 	/* page-at-a-time fields are always invalid when not rs_inited */
 
 	/*
@@ -462,12 +488,14 @@ heap_prepare_pagescan(TableScanDesc sscan)
 /*
  * heap_fetch_next_buffer - read and pin the next block from MAIN_FORKNUM.
  *
- * Read the next block of the scan relation into a buffer and pin that buffer
- * before saving it in the scan descriptor.
+ * Read the next block of the scan relation from the read stream and pin that
+ * buffer before saving it in the scan descriptor.
  */
 static inline void
 heap_fetch_next_buffer(HeapScanDesc scan, ScanDirection dir)
 {
+	Assert(scan->rs_read_stream);
+
 	/* release previous scan buffer, if any */
 	if (BufferIsValid(scan->rs_cbuf))
 	{
@@ -482,25 +510,23 @@ heap_fetch_next_buffer(HeapScanDesc scan, ScanDirection dir)
 	 */
 	CHECK_FOR_INTERRUPTS();
 
-	if (unlikely(!scan->rs_inited))
+	/*
+	 * If the scan direction is changing, reset the prefetch block to the
+	 * current block. Otherwise, we will incorrectly prefetch the blocks
+	 * between the prefetch block and the current block again before
+	 * prefetching blocks in the new, correct scan direction.
+	 */
+	if (unlikely(scan->rs_dir != dir))
 	{
-		scan->rs_cblock = heapgettup_initial_block(scan, dir);
+		scan->rs_prefetch_block = scan->rs_cblock;
+		read_stream_reset(scan->rs_read_stream);
+	}
 
-		/* ensure rs_cbuf is invalid when we get InvalidBlockNumber */
-		Assert(scan->rs_cblock != InvalidBlockNumber ||
-			   !BufferIsValid(scan->rs_cbuf));
+	scan->rs_dir = dir;
 
-		scan->rs_inited = true;
-	}
-	else
-		scan->rs_cblock = heapgettup_advance_block(scan, scan->rs_cblock,
-												   dir);
-
-	/* read block if valid */
-	if (BlockNumberIsValid(scan->rs_cblock))
-		scan->rs_cbuf = ReadBufferExtended(scan->rs_base.rs_rd, MAIN_FORKNUM,
-										   scan->rs_cblock, RBM_NORMAL,
-										   scan->rs_strategy);
+	scan->rs_cbuf = read_stream_next_buffer(scan->rs_read_stream, NULL);
+	if (BufferIsValid(scan->rs_cbuf))
+		scan->rs_cblock = BufferGetBlockNumber(scan->rs_cbuf);
 }
 
 /*
@@ -833,6 +859,7 @@ continue_page:
 
 	scan->rs_cbuf = InvalidBuffer;
 	scan->rs_cblock = InvalidBlockNumber;
+	scan->rs_prefetch_block = InvalidBlockNumber;
 	tuple->t_data = NULL;
 	scan->rs_inited = false;
 }
@@ -928,6 +955,7 @@ continue_page:
 		ReleaseBuffer(scan->rs_cbuf);
 	scan->rs_cbuf = InvalidBuffer;
 	scan->rs_cblock = InvalidBlockNumber;
+	scan->rs_prefetch_block = InvalidBlockNumber;
 	tuple->t_data = NULL;
 	scan->rs_inited = false;
 }
@@ -1021,6 +1049,26 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 
 	initscan(scan, key, false);
 
+	scan->rs_read_stream = NULL;
+
+	/*
+	 * Set up a read stream for sequential scans and TID range scans. This
+	 * should be done after initscan() because initscan() allocates the
+	 * BufferAccessStrategy object passed to the streaming read API.
+	 */
+	if (scan->rs_base.rs_flags & SO_TYPE_SEQSCAN ||
+		scan->rs_base.rs_flags & SO_TYPE_TIDRANGESCAN)
+	{
+		scan->rs_read_stream = read_stream_begin_relation(READ_STREAM_SEQUENTIAL,
+														  scan->rs_strategy,
+														  scan->rs_base.rs_rd,
+														  MAIN_FORKNUM,
+														  heap_scan_stream_read_next,
+														  scan,
+														  0);
+	}
+
+
 	return (TableScanDesc) scan;
 }
 
@@ -1055,6 +1103,14 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
 	if (BufferIsValid(scan->rs_cbuf))
 		ReleaseBuffer(scan->rs_cbuf);
 
+	/*
+	 * The read stream is reset on rescan. This must be done before
+	 * initscan(), as some state referred to by read_stream_reset() is reset
+	 * in initscan().
+	 */
+	if (scan->rs_read_stream)
+		read_stream_reset(scan->rs_read_stream);
+
 	/*
 	 * reinitialize scan descriptor
 	 */
@@ -1074,6 +1130,12 @@ heap_endscan(TableScanDesc sscan)
 	if (BufferIsValid(scan->rs_cbuf))
 		ReleaseBuffer(scan->rs_cbuf);
 
+	/*
+	 * Must free the read stream before freeing the BufferAccessStrategy.
+	 */
+	if (scan->rs_read_stream)
+		read_stream_end(scan->rs_read_stream);
+
 	/*
 	 * decrement relation reference count and free scan descriptor storage
 	 */
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 2765efc4e5e..332a7faa8d1 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -25,6 +25,7 @@
 #include "storage/bufpage.h"
 #include "storage/dsm.h"
 #include "storage/lockdefs.h"
+#include "storage/read_stream.h"
 #include "storage/shm_toc.h"
 #include "utils/relcache.h"
 #include "utils/snapshot.h"
@@ -70,6 +71,20 @@ typedef struct HeapScanDescData
 
 	HeapTupleData rs_ctup;		/* current tuple in scan, if any */
 
+	/* For scans that stream reads */
+	ReadStream *rs_read_stream;
+
+	/*
+	 * For sequential scans and TID range scans to stream reads. The read
+	 * stream is allocated at the beginning of the scan and reset on rescan or
+	 * when the scan direction changes. The scan direction is saved each time
+	 * a new page is requested. If the scan direction changes from one page to
+	 * the next, the read stream releases all previously pinned buffers and
+	 * resets the prefetch block.
+	 */
+	ScanDirection rs_dir;
+	BlockNumber rs_prefetch_block;
+
 	/*
 	 * For parallel scans to store page allocation data.  NULL when not
 	 * performing a parallel scan.
-- 
2.40.1

v11-0004-Prefetch-page-header-memory-when-streaming-relat.patchtext/x-patch; charset=US-ASCII; name=v11-0004-Prefetch-page-header-memory-when-streaming-relat.patchDownload
From 3840f869b7720c1458cadb99cec0d54e72aef9a9 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Fri, 5 Apr 2024 15:06:32 +1300
Subject: [PATCH v11 4/4] Prefetch page header memory when streaming relations.

read_stream.c can always see at least one page ahead of the one the
caller is accessing.  Take the opportunity to prefetch the cache line
that holds the next page's header.  For some scans, that can generate a
decent speedup, though real world results will depend on how much work
the CPU does before it gets around to accessing the next page.
---
 src/backend/storage/aio/read_stream.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/src/backend/storage/aio/read_stream.c b/src/backend/storage/aio/read_stream.c
index 80b8be8cc7e..35ee1fc1043 100644
--- a/src/backend/storage/aio/read_stream.c
+++ b/src/backend/storage/aio/read_stream.c
@@ -612,7 +612,8 @@ read_stream_next_buffer(ReadStream *stream, void **per_buffer_data)
 										stream->advice_enabled ?
 										READ_BUFFERS_ISSUE_ADVICE : 0)))
 			{
-				/* Fast return. */
+				/* Predict caller will soon access next page's header. */
+				pg_prefetch_mem(BufferGetPage(stream->buffers[oldest_buffer_index]));
 				return buffer;
 			}
 
@@ -743,6 +744,10 @@ read_stream_next_buffer(ReadStream *stream, void **per_buffer_data)
 	/* Prepare for the next call. */
 	read_stream_look_ahead(stream, false);
 
+	/* Predict caller will soon access next page's header. */
+	if (stream->pinned_buffers > 0)
+		pg_prefetch_mem(BufferGetPage(stream->buffers[stream->oldest_buffer_index]));
+
 #ifndef READ_STREAM_DISABLE_FAST_PATH
 	/* See if we can take the fast path for all-cached scans next time. */
 	if (stream->ios_in_progress == 0 &&
-- 
2.40.1

#32Thomas Munro
thomas.munro@gmail.com
In reply to: Melanie Plageman (#31)
6 attachment(s)
Re: Streaming read-ready sequential scan code

On Sat, Apr 6, 2024 at 6:55 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:

On Fri, Apr 5, 2024 at 12:15 AM Thomas Munro <thomas.munro@gmail.com> wrote:

The interesting column is hot. The 200ms->211ms regression is due to
the extra bookkeeping in the slow path. The rejiggered fastpath code
fixes it for me, or maybe sometimes shows an extra 1ms. Phew. Can
you reproduce that?

I am able to reproduce the fast path solving the issue using Heikki's
example here [1] but in shared buffers (hot).

master: 25 ms
stream read: 29 ms
stream read + fast path: 25 ms

Great, thanks.

I haven't looked into or reviewed the memory prefetching part.

While reviewing 0002, I realized that I don't quite see how
read_stream_get_block() will be used in the fastpath -- which it
claims in its comments.

Comments improved.

Oh, and why does the READ_STREAM_DISABLE_FAST_PATH macro exist?

Just for testing purposes. Behaviour should be identical to external
observers either way; it's just a hand-rolled specialisation for
certain parameters, and it's useful to be able to verify that and
measure the speedup. I think it's OK to leave a bit of
developer/testing scaffolding in the tree if it's likely to be useful
again, and especially if, as in this case, it doesn't create any dead
code. (Perhaps in later work we might find the right way to get the
compiler to do the specialisation? It's so simple though.)
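
For anyone wanting to force the slow path everywhere while testing,
it is just a compile-time switch; defining the macro near the top of
read_stream.c (before the #ifndef checks) is enough, e.g.:

    /* build-time scaffolding: skip the specialised all-cached fast path */
    #define READ_STREAM_DISABLE_FAST_PATH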

The occasional CI problem I mentioned turned out to be
read_stream_reset() remembering a little too much state across
rescans. Fixed.

Thanks both for the feedback on the ring buffer tweaks. Comments
updated. Here is the full stack of patches I would like to commit
very soon, though I may leave the memory prefetching part out for a
bit longer to see if I can find any downside.

Attachments:

v12-0001-Improve-read_stream.c-s-fast-path.patchtext/x-patch; charset=US-ASCII; name=v12-0001-Improve-read_stream.c-s-fast-path.patchDownload
From 54efd755040507b55672092907d53b4db30f5a06 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Fri, 5 Apr 2024 12:08:24 +1300
Subject: [PATCH v12 1/6] Improve read_stream.c's fast path.

Unfortunately the "fast path" for cached scans that don't do any I/O was
accidentally coded in a way that could only be triggered by pg_prewarm's
usage patter.  We really need it to work for the streaming sequential
scan patch.  Refactor.

Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKGKXZALJ%3D6aArUsXRJzBm%3Dqvc4AWp7%3DiJNXJQqpbRLnD_w%40mail.gmail.com
---
 src/backend/storage/aio/read_stream.c | 75 +++++++++++----------------
 1 file changed, 31 insertions(+), 44 deletions(-)

diff --git a/src/backend/storage/aio/read_stream.c b/src/backend/storage/aio/read_stream.c
index 4f21262ff5e..b9e11a28312 100644
--- a/src/backend/storage/aio/read_stream.c
+++ b/src/backend/storage/aio/read_stream.c
@@ -169,7 +169,7 @@ get_per_buffer_data(ReadStream *stream, int16 buffer_index)
 /*
  * Ask the callback which block it would like us to read next, with a small
  * buffer in front to allow read_stream_unget_block() to work and to allow the
- * fast path to work in batches.
+ * fast path to skip this function and work directly from the array.
  */
 static inline BlockNumber
 read_stream_get_block(ReadStream *stream, void *per_buffer_data)
@@ -578,13 +578,12 @@ read_stream_next_buffer(ReadStream *stream, void **per_buffer_data)
 	if (likely(stream->fast_path))
 	{
 		BlockNumber next_blocknum;
-		bool		need_wait;
 
 		/* Fast path assumptions. */
 		Assert(stream->ios_in_progress == 0);
 		Assert(stream->pinned_buffers == 1);
 		Assert(stream->distance == 1);
-		Assert(stream->pending_read_nblocks == 1);
+		Assert(stream->pending_read_nblocks == 0);
 		Assert(stream->per_buffer_data_size == 0);
 
 		/* We're going to return the buffer we pinned last time. */
@@ -594,40 +593,29 @@ read_stream_next_buffer(ReadStream *stream, void **per_buffer_data)
 		buffer = stream->buffers[oldest_buffer_index];
 		Assert(buffer != InvalidBuffer);
 
-		/*
-		 * Pin a buffer for the next call.  Same buffer entry, and arbitrary
-		 * I/O entry (they're all free).
-		 */
-		need_wait = StartReadBuffer(&stream->ios[0].op,
-									&stream->buffers[oldest_buffer_index],
-									stream->pending_read_blocknum,
-									stream->advice_enabled ?
-									READ_BUFFERS_ISSUE_ADVICE : 0);
-
-		/* Choose the block the next call will pin. */
+		/* Choose the next block to pin. */
 		if (unlikely(stream->blocknums_next == stream->blocknums_count))
 			read_stream_fill_blocknums(stream);
 		next_blocknum = stream->blocknums[stream->blocknums_next++];
 
-		/*
-		 * Fast return if the next call doesn't require I/O for the buffer we
-		 * just pinned, and we have a block number to give it as a pending
-		 * read.
-		 */
-		if (likely(!need_wait && next_blocknum != InvalidBlockNumber))
+		if (likely(next_blocknum != InvalidBlockNumber))
 		{
-			stream->pending_read_blocknum = next_blocknum;
-			return buffer;
-		}
-
-		/*
-		 * For anything more complex, set up some more state and take the slow
-		 * path next time.
-		 */
-		stream->fast_path = false;
+			/*
+			 * Pin a buffer for the next call.  Same buffer entry, and
+			 * arbitrary I/O entry (they're all free).  We don't have to
+			 * adjust pinned_buffers because we're transferring one to caller
+			 * but pinning one more.
+			 */
+			if (likely(!StartReadBuffer(&stream->ios[0].op,
+										&stream->buffers[oldest_buffer_index],
+										next_blocknum,
+										stream->advice_enabled ?
+										READ_BUFFERS_ISSUE_ADVICE : 0)))
+			{
+				/* Fast return. */
+				return buffer;
+			}
 
-		if (need_wait)
-		{
 			/* Next call must wait for I/O for the newly pinned buffer. */
 			stream->oldest_io_index = 0;
 			stream->next_io_index = stream->max_ios > 1 ? 1 : 0;
@@ -635,17 +623,15 @@ read_stream_next_buffer(ReadStream *stream, void **per_buffer_data)
 			stream->ios[0].buffer_index = oldest_buffer_index;
 			stream->seq_blocknum = next_blocknum + 1;
 		}
-		if (next_blocknum == InvalidBlockNumber)
-		{
-			/* Next call hits end of stream and can't pin anything more. */
-			stream->distance = 0;
-			stream->pending_read_nblocks = 0;
-		}
 		else
 		{
-			/* Set up the pending read. */
-			stream->pending_read_blocknum = next_blocknum;
+			/* No more blocks, end of stream. */
+			stream->distance = 0;
+			stream->oldest_buffer_index = stream->next_buffer_index;
+			stream->pinned_buffers = 0;
 		}
+
+		stream->fast_path = false;
 		return buffer;
 	}
 #endif
@@ -762,15 +748,11 @@ read_stream_next_buffer(ReadStream *stream, void **per_buffer_data)
 	if (stream->ios_in_progress == 0 &&
 		stream->pinned_buffers == 1 &&
 		stream->distance == 1 &&
-		stream->pending_read_nblocks == 1 &&
+		stream->pending_read_nblocks == 0 &&
 		stream->per_buffer_data_size == 0)
 	{
 		stream->fast_path = true;
 	}
-	else
-	{
-		stream->fast_path = false;
-	}
 #endif
 
 	return buffer;
@@ -790,6 +772,11 @@ read_stream_reset(ReadStream *stream)
 	/* Stop looking ahead. */
 	stream->distance = 0;
 
+	/* Forget buffered block numbers and fast path state. */
+	stream->blocknums_next = 0;
+	stream->blocknums_count = 0;
+	stream->fast_path = false;
+
 	/* Unpin anything that wasn't consumed. */
 	while ((buffer = read_stream_next_buffer(stream, NULL)) != InvalidBuffer)
 		ReleaseBuffer(buffer);
-- 
2.44.0

v12-0002-Add-pg_prefetch_mem-macro-to-load-cache-lines.patchtext/x-patch; charset=US-ASCII; name=v12-0002-Add-pg_prefetch_mem-macro-to-load-cache-lines.patchDownload
From 2e5146cb4aa4d5307836129ed6ef74076ecb9b74 Mon Sep 17 00:00:00 2001
From: David Rowley <dgrowley@gmail.com>
Date: Mon, 10 Jul 2023 11:22:34 +0200
Subject: [PATCH v12 2/6] Add pg_prefetch_mem() macro to load cache lines.

Initially mapping to GCC/Clang and MSVC builtins.

Discussion: https://postgr.es/m/CA%2BhUKGKXZALJ%3D6aArUsXRJzBm%3Dqvc4AWp7%3DiJNXJQqpbRLnD_w%40mail.gmail.com
Discussion: https://postgr.es/m/CAEepm%3D2y9HM9QP%2BHhRZdQ3pU6FShSMyu%3DV1uHXhQ5gG-dketHg%40mail.gmail.com
---
 config/c-compiler.m4       | 17 ++++++++++++++++
 configure                  | 40 ++++++++++++++++++++++++++++++++++++++
 configure.ac               |  3 +++
 meson.build                |  1 +
 src/include/c.h            |  8 ++++++++
 src/include/pg_config.h.in |  3 +++
 6 files changed, 72 insertions(+)

diff --git a/config/c-compiler.m4 b/config/c-compiler.m4
index 3268a780bb0..4cc02f97601 100644
--- a/config/c-compiler.m4
+++ b/config/c-compiler.m4
@@ -355,6 +355,23 @@ AC_DEFINE_UNQUOTED(AS_TR_CPP([HAVE$1]), 1,
                    [Define to 1 if your compiler understands $1.])
 fi])# PGAC_CHECK_BUILTIN_FUNC
 
+# PGAC_CHECK_BUILTIN_VOID_FUNC
+# -----------------------
+# Variant for void functions.
+AC_DEFUN([PGAC_CHECK_BUILTIN_VOID_FUNC],
+[AC_CACHE_CHECK(for $1, pgac_cv$1,
+[AC_LINK_IFELSE([AC_LANG_PROGRAM([
+void
+call$1($2)
+{
+    $1(x);
+}], [])],
+[pgac_cv$1=yes],
+[pgac_cv$1=no])])
+if test x"${pgac_cv$1}" = xyes ; then
+AC_DEFINE_UNQUOTED(AS_TR_CPP([HAVE$1]), 1,
+                   [Define to 1 if your compiler understands $1.])
+fi])# PGAC_CHECK_BUILTIN_VOID_FUNC
 
 
 # PGAC_CHECK_BUILTIN_FUNC_PTR
diff --git a/configure b/configure
index 36feeafbb23..79b78c33ddc 100755
--- a/configure
+++ b/configure
@@ -15543,6 +15543,46 @@ _ACEOF
 
 fi
 
+# Can we use a built-in to prefetch memory?
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for __builtin_prefetch" >&5
+$as_echo_n "checking for __builtin_prefetch... " >&6; }
+if ${pgac_cv__builtin_prefetch+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+void
+call__builtin_prefetch(void *x)
+{
+    __builtin_prefetch(x);
+}
+int
+main ()
+{
+
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_link "$LINENO"; then :
+  pgac_cv__builtin_prefetch=yes
+else
+  pgac_cv__builtin_prefetch=no
+fi
+rm -f core conftest.err conftest.$ac_objext \
+    conftest$ac_exeext conftest.$ac_ext
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $pgac_cv__builtin_prefetch" >&5
+$as_echo "$pgac_cv__builtin_prefetch" >&6; }
+if test x"${pgac_cv__builtin_prefetch}" = xyes ; then
+
+cat >>confdefs.h <<_ACEOF
+#define HAVE__BUILTIN_PREFETCH 1
+_ACEOF
+
+fi
+
 # We require 64-bit fseeko() to be available, but run this check anyway
 # in case it finds that _LARGEFILE_SOURCE has to be #define'd for that.
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking for _LARGEFILE_SOURCE value needed for large files" >&5
diff --git a/configure.ac b/configure.ac
index 57f734879e1..6cd5441ed32 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1781,6 +1781,9 @@ PGAC_CHECK_BUILTIN_FUNC([__builtin_popcount], [unsigned int x])
 # so it needs a different test function.
 PGAC_CHECK_BUILTIN_FUNC_PTR([__builtin_frame_address], [0])
 
+# Can we use a built-in to prefetch memory?
+PGAC_CHECK_BUILTIN_VOID_FUNC([__builtin_prefetch], [void *x])
+
 # We require 64-bit fseeko() to be available, but run this check anyway
 # in case it finds that _LARGEFILE_SOURCE has to be #define'd for that.
 AC_FUNC_FSEEKO
diff --git a/meson.build b/meson.build
index 87437960bc3..210fd4c1cd7 100644
--- a/meson.build
+++ b/meson.build
@@ -1707,6 +1707,7 @@ builtins = [
   'constant_p',
   'frame_address',
   'popcount',
+  'prefetch',
   'unreachable',
 ]
 
diff --git a/src/include/c.h b/src/include/c.h
index cf37e02fe1f..0b7aa3e2924 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -412,6 +412,14 @@ typedef void (*pg_funcptr_t) (void);
 #define HAVE_PRAGMA_GCC_SYSTEM_HEADER	1
 #endif
 
+/* Do we have support for prefetching memory? */
+#if defined(HAVE__BUILTIN_PREFETCH)
+#define pg_prefetch_mem(a) __builtin_prefetch(a)
+#elif defined(_MSC_VER)
+#define pg_prefetch_mem(a) _m_prefetch(a)
+#else
+#define pg_prefetch_mem(a)
+#endif
 
 /* ----------------------------------------------------------------
  *				Section 2:	bool, true, false
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index 591e1ca3df6..085abf4c84e 100644
--- a/src/include/pg_config.h.in
+++ b/src/include/pg_config.h.in
@@ -543,6 +543,9 @@
 /* Define to 1 if your compiler understands __builtin_popcount. */
 #undef HAVE__BUILTIN_POPCOUNT
 
+/* Define to 1 if your compiler understands __builtin_prefetch. */
+#undef HAVE__BUILTIN_PREFETCH
+
 /* Define to 1 if your compiler understands __builtin_types_compatible_p. */
 #undef HAVE__BUILTIN_TYPES_COMPATIBLE_P
 
-- 
2.44.0

v12-0003-Prefetch-page-header-memory-when-streaming-relat.patchtext/x-patch; charset=US-ASCII; name=v12-0003-Prefetch-page-header-memory-when-streaming-relat.patchDownload
From 0e9d86010581a50bfe71644a8f8eac087aba4293 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Fri, 5 Apr 2024 15:06:32 +1300
Subject: [PATCH v12 3/6] Prefetch page header memory when streaming relations.

read_stream.c can always see at least one page ahead of the one the
caller is accessing.  Take the opportunity to prefetch the cache line
that holds the next page's header.  For some scans, that can generate a
decent speedup, though real world results will depend on how much work
the CPU actually does before it gets around to accessing the next page.

Discussion: https://postgr.es/m/CA%2BhUKGKXZALJ%3D6aArUsXRJzBm%3Dqvc4AWp7%3DiJNXJQqpbRLnD_w%40mail.gmail.com
---
 src/backend/storage/aio/read_stream.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/src/backend/storage/aio/read_stream.c b/src/backend/storage/aio/read_stream.c
index b9e11a28312..e5b64238f21 100644
--- a/src/backend/storage/aio/read_stream.c
+++ b/src/backend/storage/aio/read_stream.c
@@ -612,7 +612,8 @@ read_stream_next_buffer(ReadStream *stream, void **per_buffer_data)
 										stream->advice_enabled ?
 										READ_BUFFERS_ISSUE_ADVICE : 0)))
 			{
-				/* Fast return. */
+				/* Predict caller will soon access next page's header. */
+				pg_prefetch_mem(BufferGetPage(stream->buffers[oldest_buffer_index]));
 				return buffer;
 			}
 
@@ -743,6 +744,10 @@ read_stream_next_buffer(ReadStream *stream, void **per_buffer_data)
 	/* Prepare for the next call. */
 	read_stream_look_ahead(stream, false);
 
+	/* Predict caller will soon access next page's header. */
+	if (stream->pinned_buffers > 0)
+		pg_prefetch_mem(BufferGetPage(stream->buffers[stream->oldest_buffer_index]));
+
 #ifndef READ_STREAM_DISABLE_FAST_PATH
 	/* See if we can take the fast path for all-cached scans next time. */
 	if (stream->ios_in_progress == 0 &&
-- 
2.44.0

v12-0004-Allow-BufferAccessStrategy-to-limit-pin-count.patchtext/x-patch; charset=US-ASCII; name=v12-0004-Allow-BufferAccessStrategy-to-limit-pin-count.patchDownload
From 1d88127b78b11e7a1fcb9d4e5a21227ada32d262 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Thu, 4 Apr 2024 20:31:26 +1300
Subject: [PATCH v12 4/6] Allow BufferAccessStrategy to limit pin count.

When pinning extra buffers to look ahead, users of a strategy are in
danger of pinning a lot of the buffers in the ring, or even more than
the ring size.  For some strategies, that means "escaping" from the
ring, and in others it means forcing dirty data to disk very frequently
with associated WAL flushing.  Since external code has no insight into
any of that, allow individual strategy types to expose a clamp that
should be applied when deciding how many buffers to pin at once.

Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/CAAKRu_aJXnqsyZt6HwFLnxYEBgE17oypkxbKbT1t1geE_wvH2Q%40mail.gmail.com
---
 src/backend/storage/aio/read_stream.c |  5 ++++
 src/backend/storage/buffer/freelist.c | 42 +++++++++++++++++++++++++++
 src/include/storage/bufmgr.h          |  1 +
 3 files changed, 48 insertions(+)

diff --git a/src/backend/storage/aio/read_stream.c b/src/backend/storage/aio/read_stream.c
index e5b64238f21..164049f1474 100644
--- a/src/backend/storage/aio/read_stream.c
+++ b/src/backend/storage/aio/read_stream.c
@@ -419,6 +419,7 @@ read_stream_begin_relation(int flags,
 	size_t		size;
 	int16		queue_size;
 	int16		max_ios;
+	int			strategy_pin_limit;
 	uint32		max_pinned_buffers;
 	Oid			tablespace_id;
 	SMgrRelation smgr;
@@ -460,6 +461,10 @@ read_stream_begin_relation(int flags,
 	max_pinned_buffers = Min(max_pinned_buffers,
 							 PG_INT16_MAX - io_combine_limit - 1);
 
+	/* Give the strategy a chance to limit the number of buffers we pin. */
+	strategy_pin_limit = GetAccessStrategyPinLimit(strategy);
+	max_pinned_buffers = Min(strategy_pin_limit, max_pinned_buffers);
+
 	/* Don't allow this backend to pin more than its share of buffers. */
 	if (SmgrIsTemp(smgr))
 		LimitAdditionalLocalPins(&max_pinned_buffers);
diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c
index 3611357fa30..d8096d2ee4f 100644
--- a/src/backend/storage/buffer/freelist.c
+++ b/src/backend/storage/buffer/freelist.c
@@ -629,6 +629,48 @@ GetAccessStrategyBufferCount(BufferAccessStrategy strategy)
 	return strategy->nbuffers;
 }
 
+/*
+ * GetAccessStrategyPinLimit -- get cap of number of buffers that should be pinned
+ *
+ * When pinning extra buffers to look ahead, users of a ring-based strategy are
+ * in danger of pinning too much of the ring at once while performing look-ahead.
+ * For some strategies, that means "escaping" from the ring, and in others it
+ * means forcing dirty data to disk very frequently with associated WAL
+ * flushing.  Since external code has no insight into any of that, allow
+ * individual strategy types to expose a clamp that should be applied when
+ * deciding on a maximum number of buffers to pin at once.
+ *
+ * Callers should combine this number with other relevant limits and take the
+ * minimum.
+ */
+int
+GetAccessStrategyPinLimit(BufferAccessStrategy strategy)
+{
+	if (strategy == NULL)
+		return NBuffers;
+
+	switch (strategy->btype)
+	{
+		case BAS_BULKREAD:
+
+			/*
+			 * Since BAS_BULKREAD uses StrategyRejectBuffer(), dirty buffers
+			 * shouldn't be a problem and the caller is free to pin up to the
+			 * entire ring at once.
+			 */
+			return strategy->nbuffers;
+
+		default:
+
+			/*
+			 * Tell call not to pin more than half the buffers in the ring.
+			 * This is a trade-off between look ahead distance and deferring
+			 * writeback and associated WAL traffic.
+			 */
+			return strategy->nbuffers / 2;
+	}
+}
+
 /*
  * FreeAccessStrategy -- release a BufferAccessStrategy object
  *
diff --git a/src/include/storage/bufmgr.h b/src/include/storage/bufmgr.h
index f380f9d9a6c..07ba1a60502 100644
--- a/src/include/storage/bufmgr.h
+++ b/src/include/storage/bufmgr.h
@@ -318,6 +318,7 @@ extern BufferAccessStrategy GetAccessStrategy(BufferAccessStrategyType btype);
 extern BufferAccessStrategy GetAccessStrategyWithSize(BufferAccessStrategyType btype,
 													  int ring_size_kb);
 extern int	GetAccessStrategyBufferCount(BufferAccessStrategy strategy);
+extern int	GetAccessStrategyPinLimit(BufferAccessStrategy strategy);
 
 extern void FreeAccessStrategy(BufferAccessStrategy strategy);
 
-- 
2.44.0

v12-0005-Increase-default-vacuum_buffer_usage_limit-to-2M.patchtext/x-patch; charset=US-ASCII; name=v12-0005-Increase-default-vacuum_buffer_usage_limit-to-2M.patchDownload
From d51ebdc7b6137f7d9ae9a48f2b806768b9d91dea Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Thu, 4 Apr 2024 21:11:06 +1300
Subject: [PATCH v12 5/6] Increase default vacuum_buffer_usage_limit to 2MB.

The BAS_VACUUM ring size has been 256kB since commit d526575f introduced
the mechanism 17 years ago.  Commit 1cbbee03 recently made it
configurable but retained the traditional default.  The correct default
size has been debated for years, but 256kB is certainly very small.
VACUUM soon needs to write back data it dirtied only 32 blocks ago,
which usually requires flushing the WAL.  New experiments in prefetching
pages for VACUUM exacerbated the problem by crashing into dirty data
even sooner.  Let's make the default 2MB.  That's 1.6% of the default
toy buffer pool size, and 0.2% of 1GB, which would be considered a
small shared_buffers setting for a real system these days.  Users are
still free to set the GUC to a different value.

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20240403221257.md4gfki3z75cdyf6%40awork3.anarazel.de
Discussion: https://postgr.es/m/CA%2BhUKGLY4Q4ZY4f1rvnFtv6%2BPkjNf8MejdPkcju3Qii9DYqqcQ%40mail.gmail.com
---
 doc/src/sgml/config.sgml                      | 2 +-
 src/backend/storage/buffer/freelist.c         | 2 +-
 src/backend/utils/init/globals.c              | 2 +-
 src/backend/utils/misc/guc_tables.c           | 2 +-
 src/backend/utils/misc/postgresql.conf.sample | 2 +-
 5 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 624518e0b01..d8e1282e128 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1961,7 +1961,7 @@ include_dir 'conf.d'
         valid sizes range from <literal>128 kB</literal> to
         <literal>16 GB</literal>.  If the specified size would exceed 1/8 the
         size of <varname>shared_buffers</varname>, the size is silently capped
-        to that value.  The default value is <literal>256 kB</literal>.  If
+        to that value.  The default value is <literal>2MB</literal>.  If
         this value is specified without units, it is taken as kilobytes.  This
         parameter can be set at any time.  It can be overridden for
         <xref linkend="sql-vacuum"/> and <xref linkend="sql-analyze"/>
diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c
index d8096d2ee4f..314b5f73ddc 100644
--- a/src/backend/storage/buffer/freelist.c
+++ b/src/backend/storage/buffer/freelist.c
@@ -561,7 +561,7 @@ GetAccessStrategy(BufferAccessStrategyType btype)
 			ring_size_kb = 16 * 1024;
 			break;
 		case BAS_VACUUM:
-			ring_size_kb = 256;
+			ring_size_kb = 2048;
 			break;
 
 		default:
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index 3e38bb1311d..cc61937eef7 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -143,7 +143,7 @@ int			max_parallel_workers = 8;
 int			MaxBackends = 0;
 
 /* GUC parameters for vacuum */
-int			VacuumBufferUsageLimit = 256;
+int			VacuumBufferUsageLimit = 2048;
 
 int			VacuumCostPageHit = 1;
 int			VacuumCostPageMiss = 2;
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index c12784cbec8..7d4e4387cf5 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -2275,7 +2275,7 @@ struct config_int ConfigureNamesInt[] =
 			GUC_UNIT_KB
 		},
 		&VacuumBufferUsageLimit,
-		256, 0, MAX_BAS_VAC_RING_SIZE_KB,
+		2048, 0, MAX_BAS_VAC_RING_SIZE_KB,
 		check_vacuum_buffer_usage_limit, NULL, NULL
 	},
 
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index baecde28410..2166ea4a87a 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -166,7 +166,7 @@
 					#   mmap
 					# (change requires restart)
 #min_dynamic_shared_memory = 0MB	# (change requires restart)
-#vacuum_buffer_usage_limit = 256kB	# size of vacuum and analyze buffer access strategy ring;
+#vacuum_buffer_usage_limit = 2MB	# size of vacuum and analyze buffer access strategy ring;
 					# 0 to disable vacuum buffer access strategy;
 					# range 128kB to 16GB
 
-- 
2.44.0

v12-0006-Use-streaming-IO-in-heapam-sequential-and-TID-ra.patchtext/x-patch; charset=US-ASCII; name=v12-0006-Use-streaming-IO-in-heapam-sequential-and-TID-ra.patchDownload
From f41335bf8f35a966767c1e9c3b578478ca4159db Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Fri, 5 Apr 2024 13:32:14 +1300
Subject: [PATCH v12 6/6] Use streaming IO in heapam sequential and TID range
 scans

Instead of calling ReadBuffer() for each block heap sequential scans and
TID range scans now use the streaming read API introduced in b5a9b18cd0.

Author: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/flat/CAAKRu_YtXJiYKQvb5JsA2SkwrsizYLugs4sSOZh3EAjKUg%3DgEQ%40mail.gmail.com
---
 src/backend/access/heap/heapam.c | 98 ++++++++++++++++++++++++++------
 src/include/access/heapam.h      | 15 +++++
 2 files changed, 95 insertions(+), 18 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index dada2ecd1e3..bafe023bce8 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -223,6 +223,25 @@ static const int MultiXactStatusLock[MaxMultiXactStatus + 1] =
  * ----------------------------------------------------------------
  */
 
+static BlockNumber
+heap_scan_stream_read_next(ReadStream *pgsr, void *private_data,
+						   void *per_buffer_data)
+{
+	HeapScanDesc scan = (HeapScanDesc) private_data;
+
+	if (unlikely(!scan->rs_inited))
+	{
+		scan->rs_prefetch_block = heapgettup_initial_block(scan, scan->rs_dir);
+		scan->rs_inited = true;
+	}
+	else
+		scan->rs_prefetch_block = heapgettup_advance_block(scan,
+														   scan->rs_prefetch_block,
+														   scan->rs_dir);
+
+	return scan->rs_prefetch_block;
+}
+
 /* ----------------
  *		initscan - scan code common to heap_beginscan and heap_rescan
  * ----------------
@@ -325,6 +344,13 @@ initscan(HeapScanDesc scan, ScanKey key, bool keep_startblock)
 	scan->rs_cbuf = InvalidBuffer;
 	scan->rs_cblock = InvalidBlockNumber;
 
+	/*
+	 * Initialize to ForwardScanDirection because it is most common and heap
+	 * scans usually must go forwards before going backward.
+	 */
+	scan->rs_dir = ForwardScanDirection;
+	scan->rs_prefetch_block = InvalidBlockNumber;
+
 	/* page-at-a-time fields are always invalid when not rs_inited */
 
 	/*
@@ -462,12 +488,14 @@ heap_prepare_pagescan(TableScanDesc sscan)
 /*
  * heap_fetch_next_buffer - read and pin the next block from MAIN_FORKNUM.
  *
- * Read the next block of the scan relation into a buffer and pin that buffer
- * before saving it in the scan descriptor.
+ * Read the next block of the scan relation from the read stream and pin that
+ * buffer before saving it in the scan descriptor.
  */
 static inline void
 heap_fetch_next_buffer(HeapScanDesc scan, ScanDirection dir)
 {
+	Assert(scan->rs_read_stream);
+
 	/* release previous scan buffer, if any */
 	if (BufferIsValid(scan->rs_cbuf))
 	{
@@ -482,25 +510,23 @@ heap_fetch_next_buffer(HeapScanDesc scan, ScanDirection dir)
 	 */
 	CHECK_FOR_INTERRUPTS();
 
-	if (unlikely(!scan->rs_inited))
+	/*
+	 * If the scan direction is changing, reset the prefetch block to the
+	 * current block. Otherwise, we will incorrectly prefetch the blocks
+	 * between the prefetch block and the current block again before
+	 * prefetching blocks in the new, correct scan direction.
+	 */
+	if (unlikely(scan->rs_dir != dir))
 	{
-		scan->rs_cblock = heapgettup_initial_block(scan, dir);
+		scan->rs_prefetch_block = scan->rs_cblock;
+		read_stream_reset(scan->rs_read_stream);
+	}
 
-		/* ensure rs_cbuf is invalid when we get InvalidBlockNumber */
-		Assert(scan->rs_cblock != InvalidBlockNumber ||
-			   !BufferIsValid(scan->rs_cbuf));
+	scan->rs_dir = dir;
 
-		scan->rs_inited = true;
-	}
-	else
-		scan->rs_cblock = heapgettup_advance_block(scan, scan->rs_cblock,
-												   dir);
-
-	/* read block if valid */
-	if (BlockNumberIsValid(scan->rs_cblock))
-		scan->rs_cbuf = ReadBufferExtended(scan->rs_base.rs_rd, MAIN_FORKNUM,
-										   scan->rs_cblock, RBM_NORMAL,
-										   scan->rs_strategy);
+	scan->rs_cbuf = read_stream_next_buffer(scan->rs_read_stream, NULL);
+	if (BufferIsValid(scan->rs_cbuf))
+		scan->rs_cblock = BufferGetBlockNumber(scan->rs_cbuf);
 }
 
 /*
@@ -833,6 +859,7 @@ continue_page:
 
 	scan->rs_cbuf = InvalidBuffer;
 	scan->rs_cblock = InvalidBlockNumber;
+	scan->rs_prefetch_block = InvalidBlockNumber;
 	tuple->t_data = NULL;
 	scan->rs_inited = false;
 }
@@ -928,6 +955,7 @@ continue_page:
 		ReleaseBuffer(scan->rs_cbuf);
 	scan->rs_cbuf = InvalidBuffer;
 	scan->rs_cblock = InvalidBlockNumber;
+	scan->rs_prefetch_block = InvalidBlockNumber;
 	tuple->t_data = NULL;
 	scan->rs_inited = false;
 }
@@ -1021,6 +1049,26 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 
 	initscan(scan, key, false);
 
+	scan->rs_read_stream = NULL;
+
+	/*
+	 * Set up a read stream for sequential scans and TID range scans. This
+	 * should be done after initscan() because initscan() allocates the
+	 * BufferAccessStrategy object passed to the streaming read API.
+	 */
+	if (scan->rs_base.rs_flags & SO_TYPE_SEQSCAN ||
+		scan->rs_base.rs_flags & SO_TYPE_TIDRANGESCAN)
+	{
+		scan->rs_read_stream = read_stream_begin_relation(READ_STREAM_SEQUENTIAL,
+														  scan->rs_strategy,
+														  scan->rs_base.rs_rd,
+														  MAIN_FORKNUM,
+														  heap_scan_stream_read_next,
+														  scan,
+														  0);
+	}
+
+
 	return (TableScanDesc) scan;
 }
 
@@ -1055,6 +1103,14 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
 	if (BufferIsValid(scan->rs_cbuf))
 		ReleaseBuffer(scan->rs_cbuf);
 
+	/*
+	 * The read stream is reset on rescan. This must be done before
+	 * initscan(), as some state referred to by read_stream_reset() is reset
+	 * in initscan().
+	 */
+	if (scan->rs_read_stream)
+		read_stream_reset(scan->rs_read_stream);
+
 	/*
 	 * reinitialize scan descriptor
 	 */
@@ -1074,6 +1130,12 @@ heap_endscan(TableScanDesc sscan)
 	if (BufferIsValid(scan->rs_cbuf))
 		ReleaseBuffer(scan->rs_cbuf);
 
+	/*
+	 * Must free the read stream before freeing the BufferAccessStrategy.
+	 */
+	if (scan->rs_read_stream)
+		read_stream_end(scan->rs_read_stream);
+
 	/*
 	 * decrement relation reference count and free scan descriptor storage
 	 */
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 2765efc4e5e..332a7faa8d1 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -25,6 +25,7 @@
 #include "storage/bufpage.h"
 #include "storage/dsm.h"
 #include "storage/lockdefs.h"
+#include "storage/read_stream.h"
 #include "storage/shm_toc.h"
 #include "utils/relcache.h"
 #include "utils/snapshot.h"
@@ -70,6 +71,20 @@ typedef struct HeapScanDescData
 
 	HeapTupleData rs_ctup;		/* current tuple in scan, if any */
 
+	/* For scans that stream reads */
+	ReadStream *rs_read_stream;
+
+	/*
+	 * For sequential scans and TID range scans to stream reads. The read
+	 * stream is allocated at the beginning of the scan and reset on rescan or
+	 * when the scan direction changes. The scan direction is saved each time
+	 * a new page is requested. If the scan direction changes from one page to
+	 * the next, the read stream releases all previously pinned buffers and
+	 * resets the prefetch block.
+	 */
+	ScanDirection rs_dir;
+	BlockNumber rs_prefetch_block;
+
 	/*
 	 * For parallel scans to store page allocation data.  NULL when not
 	 * performing a parallel scan.
-- 
2.44.0

#33Melanie Plageman
melanieplageman@gmail.com
In reply to: Thomas Munro (#32)
Re: Streaming read-ready sequential scan code

On Fri, Apr 5, 2024 at 7:28 PM Thomas Munro <thomas.munro@gmail.com> wrote:

On Sat, Apr 6, 2024 at 6:55 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:

I haven't looked into or reviewed the memory prefetching part.

While reviewing 0002, I realized that I don't quite see how
read_stream_get_block() will be used in the fastpath -- which it
claims in its comments.

Comments improved.

Ah, makes sense now.

The occasional CI problem I mentioned turned out to be
read_stream_reset() remembering a little too much state across
rescans. Fixed.

Nice investigative work figuring this out.

Thanks both for the feedback on the ring buffer tweaks. Comments
updated. Here is the full stack of patches I would like to commit
very soon, though I may leave the memory prefetching part out for a
bit longer to see if I can find any downside.

0001 LGTM. I did not review 0002 or 0003.

0004 looks good except for one comment typo:

/*
* Tell call not to pin more than half the buffers in the ring.
* This is a trade-off between look ahead distance and deferring
* writeback and associated WAL traffic.
*/

call -> caller

0006, I noticed the commit message is missing an important comma:

Instead of calling ReadBuffer() for each block heap sequential scans and
TID range scans now use the streaming read API introduced in b5a9b18cd0.

should be

Instead of calling ReadBuffer() for each block, heap sequential scans and
TID range scans now use the streaming read API introduced in b5a9b18cd0.

- Melanie

#34Thomas Munro
thomas.munro@gmail.com
In reply to: Melanie Plageman (#33)
1 attachment(s)
Re: Streaming read-ready sequential scan code

I found a bug in read_stream.c that could be hit with Melanie's
streaming seq scan patch with parallelism enabled and certain buffer
pool conditions. Short version: there is an edge case where an "if"
needed to be a "while", or we could lose a few blocks. Here's the fix
for that, longer explanation in commit message.

Attachments:

0001-Fix-bug-in-read_stream.c.patchtext/x-patch; charset=US-ASCII; name=0001-Fix-bug-in-read_stream.c.patchDownload
From 43cef2d58141ba048e9349b0027afff148be5553 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Sun, 7 Apr 2024 12:36:44 +1200
Subject: [PATCH] Fix bug in read_stream.c.

When we determine that a wanted block can't be combined with the current
pending read, it's time to start that pending read to get it out of the
way.  An "if" in that code path should have been a "while", because it
might take more than one go to get that job done.  Otherwise the
remaining part of a partially started read could be clobbered and we
could lose some blocks.  This was only broken for smaller ranges, as the
more common case of io_combine_limit-sized ranges is handled earlier in
the code and knows how to loop.

Discovered while testing parallel sequential scans of partially cached
tables.  They have a ramp-down phase with ever smaller ranges of
contiguous blocks, to be fair to parallel workers as the work runs out.

Defect in commit b5a9b18c.
---
 src/backend/storage/aio/read_stream.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/storage/aio/read_stream.c b/src/backend/storage/aio/read_stream.c
index 9a70a81f7ae..f54dacdd914 100644
--- a/src/backend/storage/aio/read_stream.c
+++ b/src/backend/storage/aio/read_stream.c
@@ -363,7 +363,7 @@ read_stream_look_ahead(ReadStream *stream, bool suppress_advice)
 		}
 
 		/* We have to start the pending read before we can build another. */
-		if (stream->pending_read_nblocks > 0)
+		while (stream->pending_read_nblocks > 0)
 		{
 			read_stream_start_pending_read(stream, suppress_advice);
 			suppress_advice = false;
-- 
2.44.0

#35Melanie Plageman
melanieplageman@gmail.com
In reply to: Thomas Munro (#34)
2 attachment(s)
Re: Streaming read-ready sequential scan code

On Sat, Apr 6, 2024 at 9:25 PM Thomas Munro <thomas.munro@gmail.com> wrote:

I found a bug in read_stream.c that could be hit with Melanie's
streaming seq scan patch with parallelism enabled and certain buffer
pool conditions. Short version: there is an edge case where an "if"
needed to be a "while", or we could lose a few blocks. Here's the fix
for that, longer explanation in commit message.

Attached v13 0001 is your fix and 0002 is a new version of the
sequential scan streaming read user. Off-list Andres mentioned that I
really ought to separate the parallel and serial sequential scan users
into two different callbacks. I've done that in the attached. It
actually makes the code used by the callbacks nicer and more readable
anyway (putting aside performance). I was able to measure a small
performance difference as well.

I've also added a few comments and improved existing comments.

- Melanie

Attachments:

v13-0002-Use-streaming-IO-in-heapam-sequential-and-TID-ra.patchtext/x-patch; charset=US-ASCII; name=v13-0002-Use-streaming-IO-in-heapam-sequential-and-TID-ra.patchDownload
From eded321df22bf472f147bd8f94b596d465355c70 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Fri, 5 Apr 2024 13:32:14 +1300
Subject: [PATCH v13 2/2] Use streaming IO in heapam sequential and TID range
 scans

Instead of calling ReadBuffer() for each block, heap sequential scans
and TID range scans now use the streaming read API introduced in
b5a9b18cd0.

Author: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Andres Freund
Discussion: https://postgr.es/m/flat/CAAKRu_YtXJiYKQvb5JsA2SkwrsizYLugs4sSOZh3EAjKUg%3DgEQ%40mail.gmail.com
---
 src/backend/access/heap/heapam.c | 234 +++++++++++++++++++++----------
 src/include/access/heapam.h      |  15 ++
 2 files changed, 176 insertions(+), 73 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 01bb2f4cc16..9d10d42b69b 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -223,6 +223,66 @@ static const int MultiXactStatusLock[MaxMultiXactStatus + 1] =
  * ----------------------------------------------------------------
  */
 
+/*
+ * Streaming read API callback for parallel sequential scans. Returns the next
+ * block the caller wants from the read stream or InvalidBlockNumber when done.
+ */
+static BlockNumber
+heap_scan_stream_read_next_parallel(ReadStream *pgsr, void *private_data,
+									void *per_buffer_data)
+{
+	HeapScanDesc scan = (HeapScanDesc) private_data;
+
+	Assert(ScanDirectionIsForward(scan->rs_dir));
+	Assert(scan->rs_base.rs_parallel);
+
+	if (unlikely(!scan->rs_inited))
+	{
+		/* parallel scan */
+		table_block_parallelscan_startblock_init(scan->rs_base.rs_rd,
+												 scan->rs_parallelworkerdata,
+												 (ParallelBlockTableScanDesc) scan->rs_base.rs_parallel);
+
+		/* may return InvalidBlockNumber if there are no more blocks */
+		scan->rs_prefetch_block = table_block_parallelscan_nextpage(scan->rs_base.rs_rd,
+																	scan->rs_parallelworkerdata,
+																	(ParallelBlockTableScanDesc) scan->rs_base.rs_parallel);
+		scan->rs_inited = true;
+	}
+	else
+	{
+		scan->rs_prefetch_block = table_block_parallelscan_nextpage(scan->rs_base.rs_rd,
+																	scan->rs_parallelworkerdata, (ParallelBlockTableScanDesc)
+																	scan->rs_base.rs_parallel);
+	}
+
+	return scan->rs_prefetch_block;
+}
+
+/*
+ * Streaming read API callback for serial sequential and TID range scans.
+ * Returns the next block the caller wants from the read stream or
+ * InvalidBlockNumber when done.
+ */
+static BlockNumber
+heap_scan_stream_read_next_serial(ReadStream *pgsr, void *private_data,
+								  void *per_buffer_data)
+{
+	HeapScanDesc scan = (HeapScanDesc) private_data;
+
+	if (unlikely(!scan->rs_inited))
+	{
+		scan->rs_prefetch_block = heapgettup_initial_block(scan, scan->rs_dir);
+		scan->rs_inited = true;
+	}
+	else
+		scan->rs_prefetch_block = heapgettup_advance_block(scan,
+														   scan->rs_prefetch_block,
+														   scan->rs_dir);
+
+	return scan->rs_prefetch_block;
+}
+
 /* ----------------
  *		initscan - scan code common to heap_beginscan and heap_rescan
  * ----------------
@@ -325,6 +385,13 @@ initscan(HeapScanDesc scan, ScanKey key, bool keep_startblock)
 	scan->rs_cbuf = InvalidBuffer;
 	scan->rs_cblock = InvalidBlockNumber;
 
+	/*
+	 * Initialize to ForwardScanDirection because it is most common and
+	 * because heap scans go forward before going backward (e.g. CURSORs).
+	 */
+	scan->rs_dir = ForwardScanDirection;
+	scan->rs_prefetch_block = InvalidBlockNumber;
+
 	/* page-at-a-time fields are always invalid when not rs_inited */
 
 	/*
@@ -462,12 +529,14 @@ heap_prepare_pagescan(TableScanDesc sscan)
 /*
  * heap_fetch_next_buffer - read and pin the next block from MAIN_FORKNUM.
  *
- * Read the next block of the scan relation into a buffer and pin that buffer
- * before saving it in the scan descriptor.
+ * Read the next block of the scan relation from the read stream and pin that
+ * buffer before saving it in the scan descriptor.
  */
 static inline void
 heap_fetch_next_buffer(HeapScanDesc scan, ScanDirection dir)
 {
+	Assert(scan->rs_read_stream);
+
 	/* release previous scan buffer, if any */
 	if (BufferIsValid(scan->rs_cbuf))
 	{
@@ -482,25 +551,23 @@ heap_fetch_next_buffer(HeapScanDesc scan, ScanDirection dir)
 	 */
 	CHECK_FOR_INTERRUPTS();
 
-	if (unlikely(!scan->rs_inited))
+	/*
+	 * If the scan direction is changing, reset the prefetch block to the
+	 * current block. Otherwise, we will incorrectly prefetch the blocks
+	 * between the prefetch block and the current block again before
+	 * prefetching blocks in the new, correct scan direction.
+	 */
+	if (unlikely(scan->rs_dir != dir))
 	{
-		scan->rs_cblock = heapgettup_initial_block(scan, dir);
+		scan->rs_prefetch_block = scan->rs_cblock;
+		read_stream_reset(scan->rs_read_stream);
+	}
 
-		/* ensure rs_cbuf is invalid when we get InvalidBlockNumber */
-		Assert(scan->rs_cblock != InvalidBlockNumber ||
-			   !BufferIsValid(scan->rs_cbuf));
+	scan->rs_dir = dir;
 
-		scan->rs_inited = true;
-	}
-	else
-		scan->rs_cblock = heapgettup_advance_block(scan, scan->rs_cblock,
-												   dir);
-
-	/* read block if valid */
-	if (BlockNumberIsValid(scan->rs_cblock))
-		scan->rs_cbuf = ReadBufferExtended(scan->rs_base.rs_rd, MAIN_FORKNUM,
-										   scan->rs_cblock, RBM_NORMAL,
-										   scan->rs_strategy);
+	scan->rs_cbuf = read_stream_next_buffer(scan->rs_read_stream, NULL);
+	if (BufferIsValid(scan->rs_cbuf))
+		scan->rs_cblock = BufferGetBlockNumber(scan->rs_cbuf);
 }
 
 /*
@@ -514,6 +581,7 @@ static pg_noinline BlockNumber
 heapgettup_initial_block(HeapScanDesc scan, ScanDirection dir)
 {
 	Assert(!scan->rs_inited);
+	Assert(scan->rs_base.rs_parallel == NULL);
 
 	/* When there are no pages to scan, return InvalidBlockNumber */
 	if (scan->rs_nblocks == 0 || scan->rs_numblocks == 0)
@@ -521,27 +589,10 @@ heapgettup_initial_block(HeapScanDesc scan, ScanDirection dir)
 
 	if (ScanDirectionIsForward(dir))
 	{
-		/* serial scan */
-		if (scan->rs_base.rs_parallel == NULL)
-			return scan->rs_startblock;
-		else
-		{
-			/* parallel scan */
-			table_block_parallelscan_startblock_init(scan->rs_base.rs_rd,
-													 scan->rs_parallelworkerdata,
-													 (ParallelBlockTableScanDesc) scan->rs_base.rs_parallel);
-
-			/* may return InvalidBlockNumber if there are no more blocks */
-			return table_block_parallelscan_nextpage(scan->rs_base.rs_rd,
-													 scan->rs_parallelworkerdata,
-													 (ParallelBlockTableScanDesc) scan->rs_base.rs_parallel);
-		}
+		return scan->rs_startblock;
 	}
 	else
 	{
-		/* backward parallel scan not supported */
-		Assert(scan->rs_base.rs_parallel == NULL);
-
 		/*
 		 * Disable reporting to syncscan logic in a backwards scan; it's not
 		 * very likely anyone else is doing the same thing at the same time,
@@ -653,50 +704,43 @@ heapgettup_continue_page(HeapScanDesc scan, ScanDirection dir, int *linesleft,
 static inline BlockNumber
 heapgettup_advance_block(HeapScanDesc scan, BlockNumber block, ScanDirection dir)
 {
-	if (ScanDirectionIsForward(dir))
+	Assert(scan->rs_base.rs_parallel == NULL);
+
+	if (likely(ScanDirectionIsForward(dir)))
 	{
-		if (scan->rs_base.rs_parallel == NULL)
-		{
-			block++;
+		block++;
 
-			/* wrap back to the start of the heap */
-			if (block >= scan->rs_nblocks)
-				block = 0;
+		/* wrap back to the start of the heap */
+		if (block >= scan->rs_nblocks)
+			block = 0;
 
-			/*
-			 * Report our new scan position for synchronization purposes. We
-			 * don't do that when moving backwards, however. That would just
-			 * mess up any other forward-moving scanners.
-			 *
-			 * Note: we do this before checking for end of scan so that the
-			 * final state of the position hint is back at the start of the
-			 * rel.  That's not strictly necessary, but otherwise when you run
-			 * the same query multiple times the starting position would shift
-			 * a little bit backwards on every invocation, which is confusing.
-			 * We don't guarantee any specific ordering in general, though.
-			 */
-			if (scan->rs_base.rs_flags & SO_ALLOW_SYNC)
-				ss_report_location(scan->rs_base.rs_rd, block);
-
-			/* we're done if we're back at where we started */
-			if (block == scan->rs_startblock)
-				return InvalidBlockNumber;
+		/*
+		 * Report our new scan position for synchronization purposes. We don't
+		 * do that when moving backwards, however. That would just mess up any
+		 * other forward-moving scanners.
+		 *
+		 * Note: we do this before checking for end of scan so that the final
+		 * state of the position hint is back at the start of the rel.  That's
+		 * not strictly necessary, but otherwise when you run the same query
+		 * multiple times the starting position would shift a little bit
+		 * backwards on every invocation, which is confusing. We don't
+		 * guarantee any specific ordering in general, though.
+		 */
+		if (scan->rs_base.rs_flags & SO_ALLOW_SYNC)
+			ss_report_location(scan->rs_base.rs_rd, block);
 
-			/* check if the limit imposed by heap_setscanlimits() is met */
-			if (scan->rs_numblocks != InvalidBlockNumber)
-			{
-				if (--scan->rs_numblocks == 0)
-					return InvalidBlockNumber;
-			}
+		/* we're done if we're back at where we started */
+		if (block == scan->rs_startblock)
+			return InvalidBlockNumber;
 
-			return block;
-		}
-		else
+		/* check if the limit imposed by heap_setscanlimits() is met */
+		if (scan->rs_numblocks != InvalidBlockNumber)
 		{
-			return table_block_parallelscan_nextpage(scan->rs_base.rs_rd,
-													 scan->rs_parallelworkerdata, (ParallelBlockTableScanDesc)
-													 scan->rs_base.rs_parallel);
+			if (--scan->rs_numblocks == 0)
+				return InvalidBlockNumber;
 		}
+
+		return block;
 	}
 	else
 	{
@@ -833,6 +877,7 @@ continue_page:
 
 	scan->rs_cbuf = InvalidBuffer;
 	scan->rs_cblock = InvalidBlockNumber;
+	scan->rs_prefetch_block = InvalidBlockNumber;
 	tuple->t_data = NULL;
 	scan->rs_inited = false;
 }
@@ -928,6 +973,7 @@ continue_page:
 		ReleaseBuffer(scan->rs_cbuf);
 	scan->rs_cbuf = InvalidBuffer;
 	scan->rs_cblock = InvalidBlockNumber;
+	scan->rs_prefetch_block = InvalidBlockNumber;
 	tuple->t_data = NULL;
 	scan->rs_inited = false;
 }
@@ -1023,6 +1069,34 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 
 	initscan(scan, key, false);
 
+	scan->rs_read_stream = NULL;
+
+	/*
+	 * Set up a read stream for sequential scans and TID range scans. This
+	 * should be done after initscan() because initscan() allocates the
+	 * BufferAccessStrategy object passed to the streaming read API.
+	 */
+	if (scan->rs_base.rs_flags & SO_TYPE_SEQSCAN ||
+		scan->rs_base.rs_flags & SO_TYPE_TIDRANGESCAN)
+	{
+		BlockNumber (*cb) (ReadStream *pgsr, void *private_data,
+						   void *per_buffer_data);
+
+		if (scan->rs_base.rs_parallel)
+			cb = heap_scan_stream_read_next_parallel;
+		else
+			cb = heap_scan_stream_read_next_serial;
+
+		scan->rs_read_stream = read_stream_begin_relation(READ_STREAM_SEQUENTIAL,
+														  scan->rs_strategy,
+														  scan->rs_base.rs_rd,
+														  MAIN_FORKNUM,
+														  cb,
+														  scan,
+														  0);
+	}
+
+
 	return (TableScanDesc) scan;
 }
 
@@ -1065,6 +1139,14 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
 
 	Assert(scan->rs_empty_tuples_pending == 0);
 
+	/*
+	 * The read stream is reset on rescan. This must be done before
+	 * initscan(), as some state referred to by read_stream_reset() is reset
+	 * in initscan().
+	 */
+	if (scan->rs_read_stream)
+		read_stream_reset(scan->rs_read_stream);
+
 	/*
 	 * reinitialize scan descriptor
 	 */
@@ -1089,6 +1171,12 @@ heap_endscan(TableScanDesc sscan)
 
 	Assert(scan->rs_empty_tuples_pending == 0);
 
+	/*
+	 * Must free the read stream before freeing the BufferAccessStrategy.
+	 */
+	if (scan->rs_read_stream)
+		read_stream_end(scan->rs_read_stream);
+
 	/*
 	 * decrement relation reference count and free scan descriptor storage
 	 */
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 750ea30852e..48936826bcc 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -25,6 +25,7 @@
 #include "storage/bufpage.h"
 #include "storage/dsm.h"
 #include "storage/lockdefs.h"
+#include "storage/read_stream.h"
 #include "storage/shm_toc.h"
 #include "utils/relcache.h"
 #include "utils/snapshot.h"
@@ -70,6 +71,20 @@ typedef struct HeapScanDescData
 
 	HeapTupleData rs_ctup;		/* current tuple in scan, if any */
 
+	/* For scans that stream reads */
+	ReadStream *rs_read_stream;
+
+	/*
+	 * For sequential scans and TID range scans to stream reads. The read
+	 * stream is allocated at the beginning of the scan and reset on rescan or
+	 * when the scan direction changes. The scan direction is saved each time
+	 * a new page is requested. If the scan direction changes from one page to
+	 * the next, the read stream releases all previously pinned buffers and
+	 * resets the prefetch block.
+	 */
+	ScanDirection rs_dir;
+	BlockNumber rs_prefetch_block;
+
 	/*
 	 * For parallel scans to store page allocation data.  NULL when not
 	 * performing a parallel scan.
-- 
2.40.1

v13-0001-Fix-bug-in-read_stream.c.patchtext/x-patch; charset=US-ASCII; name=v13-0001-Fix-bug-in-read_stream.c.patchDownload
From d3d7d4d09d804a8a8d00ecea080de5d63d21159c Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Sun, 7 Apr 2024 12:36:44 +1200
Subject: [PATCH v13 1/2] Fix bug in read_stream.c.

When we determine that a wanted block can't be combined with the current
pending read, it's time to start that pending read to get it out of the
way.  An "if" in that code path should have been a "while", because it
might take more than one go to get that job done.  Otherwise the
remaining part of a partially started read could be clobbered and we
could lose some blocks.  This was only broken for smaller ranges, as the
more common case of io_combine_limit-sized ranges is handled earlier in
the code and knows how to loop.

Discovered while testing parallel sequential scans of partially cached
tables.  They have a ramp-down phase with ever smaller ranges of
contiguous blocks, to be fair to parallel workers as the work runs out.

Defect in commit b5a9b18c.
---
 src/backend/storage/aio/read_stream.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/storage/aio/read_stream.c b/src/backend/storage/aio/read_stream.c
index 9a70a81f7ae..f54dacdd914 100644
--- a/src/backend/storage/aio/read_stream.c
+++ b/src/backend/storage/aio/read_stream.c
@@ -363,7 +363,7 @@ read_stream_look_ahead(ReadStream *stream, bool suppress_advice)
 		}
 
 		/* We have to start the pending read before we can build another. */
-		if (stream->pending_read_nblocks > 0)
+		while (stream->pending_read_nblocks > 0)
 		{
 			read_stream_start_pending_read(stream, suppress_advice);
 			suppress_advice = false;
-- 
2.40.1

#36Thomas Munro
thomas.munro@gmail.com
In reply to: Melanie Plageman (#35)
Re: Streaming read-ready sequential scan code

On Sun, Apr 7, 2024 at 1:34 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

Attached v13 0001 is your fix and 0002 is a new version of the
sequential scan streaming read user. Off-list Andres mentioned that I
really ought to separate the parallel and serial sequential scan users
into two different callbacks. I've done that in the attached. It
actually makes the code used by the callbacks nicer and more readable
anyway (putting aside performance). I was able to measure a small
performance difference as well.

Thanks. I changed a couple of very trivial things before pushing.

+        BlockNumber (*cb) (ReadStream *pgsr, void *private_data,
+                           void *per_buffer_data);

This type has a friendly name: ReadStreamBlockNumberCB.
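
For illustration, that tidy-up amounts to roughly the following (a sketch
only; the exact form that was pushed may differ slightly):

ReadStreamBlockNumberCB cb;

if (scan->rs_base.rs_parallel)
	cb = heap_scan_stream_read_next_parallel;
else
	cb = heap_scan_stream_read_next_serial;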

+ scan->rs_read_stream =
read_stream_begin_relation(READ_STREAM_SEQUENTIAL,

I've been on the fence about that flag for sequential scan... Some
days I want to consider changing to READ_STREAM_DEFAULT and relying on
our "anti-heuristics" to suppress advice, which would work out the
same in most cases but might occasionally win big. It might also
hurt, I dunno, so I suspect we'd have to make it better first, which
my patch in [1] was a first swing at, but I haven't researched that
enough. So, kept this way!

- * Read the next block of the scan relation into a buffer and pin that buffer
- * before saving it in the scan descriptor.
+ * Read the next block of the scan relation from the read stream and pin that
+ * buffer before saving it in the scan descriptor.

Changed to:

* Read the next block of the scan relation from the read stream and save it
* in the scan descriptor. It is already pinned.

+static BlockNumber
+heap_scan_stream_read_next_parallel(ReadStream *pgsr, void *private_data,
+                                    void *per_buffer_data)

Changed argument names to match the function pointer type definition,
"stream" and "callback_private_data".

BTW looking at the branching in read-stream user patches that have an
initialisation step like yours, I wonder if it might ever make sense
to be able to change the callback on the fly from inside the callback,
so that you finish up with a branchless one doing most of the work. I
have no idea if it's worth it...

[1]: /messages/by-id/CA+hUKGLLFvou5rx5FDhm-Pc9r4STQTFFmrx6SUV+vk8fwMbreA@mail.gmail.com

#37Andres Freund
andres@anarazel.de
In reply to: Thomas Munro (#36)
Re: Streaming read-ready sequential scan code

Hi,

On 2024-04-08 09:36:59 +1200, Thomas Munro wrote:

I've been on the fence about that flag for sequential scan... Some
days I want to consider changing to READ_STREAM_DEFAULT and relying on
our "anti-heuristics" to suppress advice, which would work out the
same in most cases but might occasionally win big.

Agreed, it's pretty easy to end up with a fairly "fragmented" set of a
relation's buffers in s_b. OTOH, there might not be any need for the
heuristic if we actually trigger reads asynchronously.

BTW looking at the branching in read-stream user patches that have an
initialisation step like yours, I wonder if it might every make sense
to be able to change the callback on the fly from inside the callback,
so that you finish up with a branchless one doing most of the work. I
have no idea if it's worth it...

I was wondering about that too, I dislike those branches. But instead of
changing the callback, it seems like a better design would be to have another
dedicated callback for that? There already is a dedicated branch for the
"just starting up" path in read_stream_next_buffer(), so it'd be pretty much
free to call another callback there.
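
To sketch the shape of that idea (hypothetical only: read_stream has no
such startup hook today, and the function names below are invented for
illustration), the v13 serial heap scan callback could then lose its
rs_inited branch:

/* Hypothetical: called once by the stream to get the first block. */
static BlockNumber
heap_scan_stream_first_block(ReadStream *stream, void *callback_private_data,
							 void *per_buffer_data)
{
	HeapScanDesc scan = (HeapScanDesc) callback_private_data;

	scan->rs_prefetch_block = heapgettup_initial_block(scan, scan->rs_dir);
	scan->rs_inited = true;
	return scan->rs_prefetch_block;
}

/* Steady-state callback: no rs_inited branch needed any more. */
static BlockNumber
heap_scan_stream_next_block(ReadStream *stream, void *callback_private_data,
							void *per_buffer_data)
{
	HeapScanDesc scan = (HeapScanDesc) callback_private_data;

	scan->rs_prefetch_block = heapgettup_advance_block(scan,
													   scan->rs_prefetch_block,
													   scan->rs_dir);
	return scan->rs_prefetch_block;
}

The stream would invoke the first function exactly once at startup and the
second one for every later block.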

Greetings,

Andres Freund

#38Alexander Lakhin
exclusion@gmail.com
In reply to: Andres Freund (#37)
1 attachment(s)
Re: Streaming read-ready sequential scan code

Hello,

I decided to compare v17 vs v16 performance (as I did the last year [1])
and discovered that v17 loses to v16 in the pg_tpcds (s64da_tpcds)
benchmark, query15 (and several others, but I focused on this one):
Best pg-src-master--.* worse than pg-src-16--.* by 52.2 percents (229.84 > 151.03): pg_tpcds.query15
Average pg-src-master--.* worse than pg-src-16--.* by 53.4 percents (234.20 > 152.64): pg_tpcds.query15
Please look at the full html report attached in case you're interested.

(I used my pg-mark tool to measure/analyze performance, but I believe the
same results can be seen without it.)

`git bisect` for this performance degradation pointed at b7b0f3f27...

[1]: /messages/by-id/b32bed1b-0746-9b20-1472-4bdc9ca66d52@gmail.com

Best regards,
Alexander

Attachments:

benchmark-results.htmltext/html; charset=UTF-8; name=benchmark-results.htmlDownload
#39Thomas Munro
thomas.munro@gmail.com
In reply to: Alexander Lakhin (#38)
Re: Streaming read-ready sequential scan code

On Sat, May 18, 2024 at 1:00 AM Alexander Lakhin <exclusion@gmail.com> wrote:

I decided to compare v17 vs v16 performance (as I did the last year [1])
and discovered that v17 loses to v16 in the pg_tpcds (s64da_tpcds)
benchmark, query15 (and several others, but I focused on this one):
Best pg-src-master--.* worse than pg-src-16--.* by 52.2 percents (229.84 > 151.03): pg_tpcds.query15
Average pg-src-master--.* worse than pg-src-16--.* by 53.4 percents (234.20 > 152.64): pg_tpcds.query15
Please look at the full html report attached in case you're interested.

(I used my pg-mark tool to measure/analyze performance, but I believe the
same results can be seen without it.)

Will investigate, but if it's easy for you to rerun, does it help if
you increase Linux readahead, eg blockdev --setra setting?

#40Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#39)
Re: Streaming read-ready sequential scan code

On Sat, May 18, 2024 at 8:09 AM Thomas Munro <thomas.munro@gmail.com> wrote:

On Sat, May 18, 2024 at 1:00 AM Alexander Lakhin <exclusion@gmail.com> wrote:

I decided to compare v17 vs v16 performance (as I did the last year [1])
and discovered that v17 loses to v16 in the pg_tpcds (s64da_tpcds)
benchmark, query15 (and several others, but I focused on this one):
Best pg-src-master--.* worse than pg-src-16--.* by 52.2 percents (229.84 > 151.03): pg_tpcds.query15
Average pg-src-master--.* worse than pg-src-16--.* by 53.4 percents (234.20 > 152.64): pg_tpcds.query15
Please look at the full html report attached in case you're interested.

(I used my pg-mark tool to measure/analyze performance, but I believe the
same results can be seen without it.)

Will investigate, but if it's easy for you to rerun, does it help if
you increase Linux readahead, eg blockdev --setra setting?

Andres happened to have TPC-DS handy, and reproduced that regression
in q15. We tried some stuff and figured out that it requires
parallel_leader_participation=on, ie that this looks like some kind of
parallel fairness and/or timing problem. It seems to be a question of
which worker finishes up processing matching rows, and the leader gets
a ~10ms head start but may be a little more greedy with the new
streaming code. He tried reordering the table contents and then saw
17 beat 16. So for q15, initial indications are that this isn't a
fundamental regression, it's just a test that is sensitive to some
arbitrary conditions.

I'll try to figure out some more details about that, ie is it being
too greedy on small-ish tables, and generally I do wonder about the
interactions between the heuristics and batching working at different
levels (OS, seq scan, read stream, hence my earlier ra question which
is likely a red herring) and how there might be unintended
consequences/interference patterns, but this particular case seems
more data dependent.

#41Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#40)
Re: Streaming read-ready sequential scan code

On Sat, May 18, 2024 at 11:30 AM Thomas Munro <thomas.munro@gmail.com> wrote:

Andres happened to have TPC-DS handy, and reproduced that regression
in q15. We tried some stuff and figured out that it requires
parallel_leader_participation=on, ie that this looks like some kind of
parallel fairness and/or timing problem. It seems to be a question of
which worker finishes up processing matching rows, and the leader gets
a ~10ms head start but may be a little more greedy with the new
streaming code. He tried reordering the table contents and then saw
17 beat 16. So for q15, initial indications are that this isn't a
fundamental regression, it's just a test that is sensitive to some
arbitrary conditions.

I'll try to figure out some more details about that, ie is it being
too greedy on small-ish tables,

After more debugging, we learned a lot more things...

1. That query produces spectacularly bad estimates, so we finish up
having to increase the number of buckets in a parallel hash join many
times. That is quite interesting, but unrelated to new code.
2. Parallel hash join is quite slow at negotiating an increase in the
number of hash buckets, if all of the input tuples are being filtered
out by quals, because of the choice of where workers check for
PHJ_GROWTH_NEED_MORE_BUCKETS. That could be improved quite easily I
think. I have put that on my todo list 'cause that's also my code,
but it's not a new issue it's just one that is now highlighted...
3. This bit of read_stream.c is exacerbating unfairness in the
underlying scan, so that 1 and 2 come together and produce a nasty
slowdown, which goes away if you change it like so:

-       BlockNumber blocknums[16];
+       BlockNumber blocknums[1];

I will follow up after some more study.

#42Alexander Lakhin
exclusion@gmail.com
In reply to: Thomas Munro (#41)
Re: Streaming read-ready sequential scan code

Hello Thomas,

18.05.2024 07:47, Thomas Munro wrote:

After more debugging, we learned a lot more things...

1. That query produces spectacularly bad estimates, so we finish up
having to increase the number of buckets in a parallel hash join many
times. That is quite interesting, but unrelated to new code.
2. Parallel hash join is quite slow at negotiating an increase in the
number of hash buckets, if all of the input tuples are being filtered
out by quals, because of the choice of where workers check for
PHJ_GROWTH_NEED_MORE_BUCKETS. That could be improved quite easily I
think. I have put that on my todo list 'cause that's also my code,
but it's not a new issue it's just one that is now highlighted...
3. This bit of read_stream.c is exacerbating unfairness in the
underlying scan, so that 1 and 2 come together and produce a nasty
slowdown, which goes away if you change it like so:

-       BlockNumber blocknums[16];
+       BlockNumber blocknums[1];

I will follow up after some more study.

Thank you for the information!

Unfortunately, I can't see significant differences in my environment with
parallel_leader_participation=off.

With blocknums[1], timing is changed, but the effect is not persistent.
10 query15 executions in a row, b7b0f3f27:
277.932 ms
281.805 ms
278.335 ms
281.565 ms
284.167 ms
283.171 ms
281.165 ms
281.615 ms
285.394 ms
277.301 ms

b7b0f3f27~1:
159.789 ms
165.407 ms
160.893 ms
159.343 ms
160.936 ms
161.577 ms
161.637 ms
163.421 ms
163.143 ms
167.109 ms

b7b0f3f27 + blocknums[1]:
164.133 ms
280.920 ms
160.748 ms
163.182 ms
161.709 ms
161.998 ms
161.239 ms
276.256 ms
161.601 ms
160.384 ms

I placed PGDATA on tmpfs to rule out any blockdev specifics (increasing
blockdev ra from 256 to 4096 didn't help me with PGDATA on NVME either.)

Best regards,
Alexander

#43Thomas Munro
thomas.munro@gmail.com
In reply to: Alexander Lakhin (#42)
Re: Streaming read-ready sequential scan code

On Sun, May 19, 2024 at 7:00 AM Alexander Lakhin <exclusion@gmail.com> wrote:

With blocknums[1], timing is changed, but the effect is not persistent.
10 query15 executions in a row, b7b0f3f27:
277.932 ms
281.805 ms
278.335 ms
281.565 ms
284.167 ms
283.171 ms
281.165 ms
281.615 ms
285.394 ms
277.301 ms

The bad time 10/10.

b7b0f3f27~1:
159.789 ms
165.407 ms
160.893 ms
159.343 ms
160.936 ms
161.577 ms
161.637 ms
163.421 ms
163.143 ms
167.109 ms

The good time 10/10.

b7b0f3f27 + blocknums[1]:
164.133 ms
280.920 ms
160.748 ms
163.182 ms
161.709 ms
161.998 ms
161.239 ms
276.256 ms
161.601 ms
160.384 ms

The good time 8/10, the bad time 2/10.

Thanks for checking! I bet all branches can show that flip/flop
instability in these adverse conditions, depending on random
scheduling details. I will start a new thread with a patch for the
root cause of that, ie problem #2 (this will need back-patching), and
post a fix for #3 (v17 blocknums[N] tweak affecting
fairness/likelihood, which was probably basically a bit of ill-advised
premature optimisation) here in a few days.

#44Melanie Plageman
melanieplageman@gmail.com
In reply to: Thomas Munro (#41)
Re: Streaming read-ready sequential scan code

Thank you to all of you for looking into this.

On Sat, May 18, 2024 at 12:47 AM Thomas Munro <thomas.munro@gmail.com> wrote:

On Sat, May 18, 2024 at 11:30 AM Thomas Munro <thomas.munro@gmail.com> wrote:

Andres happened to have TPC-DS handy, and reproduced that regression
in q15. We tried some stuff and figured out that it requires
parallel_leader_participation=on, ie that this looks like some kind of
parallel fairness and/or timing problem. It seems to be a question of
which worker finishes up processing matching rows, and the leader gets
a ~10ms head start but may be a little more greedy with the new
streaming code. He tried reordering the table contents and then saw
17 beat 16. So for q15, initial indications are that this isn't a
fundamental regression, it's just a test that is sensitive to some
arbitrary conditions.

I'll try to figure out some more details about that, ie is it being
too greedy on small-ish tables,

After more debugging, we learned a lot more things...

1. That query produces spectacularly bad estimates, so we finish up
having to increase the number of buckets in a parallel hash join many
times. That is quite interesting, but unrelated to new code.
2. Parallel hash join is quite slow at negotiating an increase in the
number of hash bucket, if all of the input tuples are being filtered
out by quals, because of the choice of where workers check for
PHJ_GROWTH_NEED_MORE_BUCKETS. That could be improved quite easily I
think. I have put that on my todo list 'cause that's also my code,
but it's not a new issue it's just one that is now highlighted...
3. This bit of read_stream.c is exacerbating unfairness in the
underlying scan, so that 1 and 2 come together and produce a nasty
slowdown, which goes away if you change it like so:

-       BlockNumber blocknums[16];
+       BlockNumber blocknums[1];

I will follow up after some more study.

So, if you are seeing the slow-down mostly go away by reducing
blocknums array size, does the regression only appear when the scan
data is fully in shared buffers? Or is this blocknums' other use
(dealing with short reads)?

Is your theory that one worker ends up reading 16 blocks that should
have been distributed across multiple workers?

- Melanie

#45Thomas Munro
thomas.munro@gmail.com
In reply to: Melanie Plageman (#44)
Re: Streaming read-ready sequential scan code

On Tue, May 21, 2024 at 9:11 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:

So, if you are seeing the slow-down mostly go away by reducing
blocknums array size, does the regression only appear when the scan
data is fully in shared buffers? Or is this blocknums' other use
(dealing with short reads)?

That must be true (that blocknums array is normally only "filled" in
the "fast path", where all buffers are found in cache).

Is your theory that one worker ends up reading 16 blocks that should
have been distributed across multiple workers?

Yes, it just jiggles the odds around a bit, introducing a bit of extra
unfairness by calling the callback in a tighter loop to build a little
batch, revealing a pre-existing problem.
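
To make the mechanism behind #3 concrete, the all-cached fast path
effectively does something like the following (a conceptual sketch, not
the real read_stream.c code): it fills a small batch of block numbers by
calling the callback in a tight loop, and for a parallel seq scan each of
those calls is another allocation from the shared block allocator, so one
worker can swallow up to 16 allocations back-to-back right when the
allocator is trying to ramp down to small chunks near the end of the scan.

/*
 * Conceptual sketch of the blocknums[16] batching, not the real code.
 * Each loop iteration is one call into the scan's next-block callback
 * (table_block_parallelscan_nextpage() for a parallel seq scan).
 */
BlockNumber blocknums[16];
int			nblocks = 0;

while (nblocks < lengthof(blocknums))
{
	BlockNumber blkno = callback(stream, callback_private_data, NULL);

	if (blkno == InvalidBlockNumber)
		break;					/* this worker has no more blocks */
	blocknums[nblocks++] = blkno;
}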

The mistake in PHJ (problem #2 above) is that, once a worker decides
it would like all workers to stop inserting so it can increase the
number of buckets, it sets a flag to ask them to do that, and waits
for them to see it, but if there is a worker filtering all tuples out,
it never checks the "growth" flag. So it scans all the way to the end
while the other guy waits. Normally it checks that flag when it is
time to allocate a new chunk of memory, which seemed to make sense to
me at the time: if we've hit the needs-more-buckets (or
needs-more-batches) logic, then surely workers are inserting tuples
and will soon allocate a new chunk! But, of course, here is the edge
case where that isn't true: we had bad estimates so hash table too
small (problem #1), we got lots of tuples clustered over a few heap
pages and decided to expand the hash table, but right at that moment,
matching tuples ran out so somebody had to finish the whole scan
without ever checking the flag (problem #2), and that someone happened
to have all the rest of the pages because we made the lookahead a bit
less fair (problem #3). Nice confluence of problems. I expect #2 and
#3 to be easy to fix, and I didn't look at the estimation problem #1
at all (perhaps a stats puzzle designed by the TPC to trip us up?).
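
A simplified sketch of the control flow behind problem #2 (hedged: this
is not the real nodeHash.c/nodeHashjoin.c code, and build_loop,
growth_requested and the other names are invented; only
PHJ_GROWTH_NEED_MORE_BUCKETS comes from the actual code):

/*
 * Illustration only -- not the real PHJ build loop.  The growth flag is
 * consulted only on the insert path (in the real code, only when a new
 * memory chunk is allocated), so a worker whose quals reject every tuple
 * scans to the end without ever noticing the request to stop.
 */
#include <stdbool.h>
#include <stdio.h>

static bool growth_requested;   /* stands in for PHJ_GROWTH_NEED_MORE_BUCKETS */

static void
build_loop(int nblocks, bool any_tuple_matches)
{
    for (int blk = 0; blk < nblocks; blk++)
    {
        if (!any_tuple_matches)
            continue;           /* tuple filtered out: flag never checked */

        if (growth_requested)
        {
            printf("pausing at block %d to help resize buckets\n", blk);
            growth_requested = false;
        }
        /* ... insert the tuple into the hash table ... */
    }

    printf("finished scan; growth flag still set: %s\n",
           growth_requested ? "yes" : "no");
}

int
main(void)
{
    growth_requested = true;    /* another worker asked everyone to pause */
    build_loop(1000, false);    /* this worker's quals filter everything out */
    return 0;
}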

#46Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#45)
1 attachment(s)
Re: Streaming read-ready sequential scan code

Here's a really simple way to see the new unfairness at the end of a
parallel scan:

drop table if exists t;
create table t (i int);
insert into t select generate_series(1, 100000);
alter table t set (parallel_workers = 2);
set parallel_setup_cost = 0;
set parallel_leader_participation = off;
explain (analyze, buffers, verbose) select count(*) from t;

On my machine, unpatched master shows:

Worker 0: actual time=0.036..12.452 rows=51076 loops=1
Buffers: shared hit=226
Worker 1: actual time=0.037..12.003 rows=48924 loops=1
Buffers: shared hit=217

The attached patch, which I'd like to push, is effectively what
Alexander tested (blocknums[16] -> blocknums[1]). There's no point in
using an array of size 1, so I've turned it into a simple variable and
deleted the relevant comments. My machine shows:

Worker 0: actual time=0.038..12.115 rows=49946 loops=1
Buffers: shared hit=221
Worker 1: actual time=0.038..12.109 rows=50054 loops=1
Buffers: shared hit=222

That difference may not seem huge, but other pre-existing things are
going pathologically wrong in the reported query that magnify it (see
my earlier analysis). It's an interesting problem that will require
more study (my earlier analysis missed a detail that I'll write about
separately), but it doesn't seem to be new or have easy fixes, so that
will have to be for later work.

Attachments:

0001-Fix-unfairness-in-all-cached-parallel-seq-scan.patchtext/x-patch; charset=US-ASCII; name=0001-Fix-unfairness-in-all-cached-parallel-seq-scan.patchDownload
From 07ff31ad30bf9e383e42336e28143852e3793c5b Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Tue, 27 Aug 2024 15:11:53 +1200
Subject: [PATCH] Fix unfairness in all-cached parallel seq scan.

Commit b5a9b18c introduced block streaming infrastructure with a special
fast path for all-cached blocks, and commit b7b0f3f2 connected the
infrastructure up to sequential scans.  One of the fast path
optimizations had an unintended consequence: it interfered with the
underlying parallel sequential scan's block allocator, which has its own
ramp-up and ramp-down algorithm.  A scan of a small all-cached table
could give more blocks to one worker.  In some plans (probably already
very bad plans, such as the one reported by Alexander), the unfairness
could be magnified.

Now all-cached scans will call the next-block-number callback just once
each time it wants a new block, instead of trying to buffer 16 block
numbers at once.

Back-patch to 17.

Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/63a63690-dd92-c809-0b47-af05459e95d1%40gmail.com
---
 src/backend/storage/aio/read_stream.c | 82 ++++++++-------------------
 1 file changed, 24 insertions(+), 58 deletions(-)

diff --git a/src/backend/storage/aio/read_stream.c b/src/backend/storage/aio/read_stream.c
index a83c18c2a4b..57d9e93c001 100644
--- a/src/backend/storage/aio/read_stream.c
+++ b/src/backend/storage/aio/read_stream.c
@@ -117,13 +117,10 @@ struct ReadStream
 	bool		advice_enabled;
 
 	/*
-	 * Small buffer of block numbers, useful for 'ungetting' to resolve flow
-	 * control problems when I/Os are split.  Also useful for batch-loading
-	 * block numbers in the fast path.
+	 * One-block buffer to support 'ungetting' a block number, to resolve flow
+	 * control problems when I/Os are split.
 	 */
-	BlockNumber blocknums[16];
-	int16		blocknums_count;
-	int16		blocknums_next;
+	BlockNumber buffered_blocknum;
 
 	/*
 	 * The callback that will tell us which block numbers to read, and an
@@ -167,68 +164,39 @@ get_per_buffer_data(ReadStream *stream, int16 buffer_index)
 }
 
 /*
- * Ask the callback which block it would like us to read next, with a small
- * buffer in front to allow read_stream_unget_block() to work and to allow the
- * fast path to skip this function and work directly from the array.
+ * Ask the callback which block it would like us to read next, with a one-block
+ * buffer in front to allow read_stream_unget_block() to work.
  */
 static inline BlockNumber
 read_stream_get_block(ReadStream *stream, void *per_buffer_data)
 {
-	if (stream->blocknums_next < stream->blocknums_count)
-		return stream->blocknums[stream->blocknums_next++];
+	BlockNumber blocknum;
 
-	/*
-	 * We only bother to fetch one at a time here (but see the fast path which
-	 * uses more).
-	 */
-	return stream->callback(stream,
-							stream->callback_private_data,
-							per_buffer_data);
+	blocknum = stream->buffered_blocknum;
+	if (blocknum != InvalidBlockNumber)
+		stream->buffered_blocknum = InvalidBlockNumber;
+	else
+		blocknum = stream->callback(stream,
+									stream->callback_private_data,
+									per_buffer_data);
+
+	return blocknum;
 }
 
 /*
  * In order to deal with short reads in StartReadBuffers(), we sometimes need
- * to defer handling of a block until later.
+ * to defer handling of a block number we've already received from the callback
+ * until later.
  */
 static inline void
 read_stream_unget_block(ReadStream *stream, BlockNumber blocknum)
 {
-	if (stream->blocknums_next == stream->blocknums_count)
-	{
-		/* Never initialized or entirely consumed.  Re-initialize. */
-		stream->blocknums[0] = blocknum;
-		stream->blocknums_count = 1;
-		stream->blocknums_next = 0;
-	}
-	else
-	{
-		/* Must be the last value return from blocknums array. */
-		Assert(stream->blocknums_next > 0);
-		stream->blocknums_next--;
-		Assert(stream->blocknums[stream->blocknums_next] == blocknum);
-	}
+	/* We shouldn't ever unget more than one block. */
+	Assert(stream->buffered_blocknum == InvalidBlockNumber);
+	Assert(blocknum != InvalidBlockNumber);
+	stream->buffered_blocknum = blocknum;
 }
 
-#ifndef READ_STREAM_DISABLE_FAST_PATH
-static void
-read_stream_fill_blocknums(ReadStream *stream)
-{
-	BlockNumber blocknum;
-	int			i = 0;
-
-	do
-	{
-		blocknum = stream->callback(stream,
-									stream->callback_private_data,
-									NULL);
-		stream->blocknums[i++] = blocknum;
-	} while (i < lengthof(stream->blocknums) &&
-			 blocknum != InvalidBlockNumber);
-	stream->blocknums_count = i;
-	stream->blocknums_next = 0;
-}
-#endif
-
 static void
 read_stream_start_pending_read(ReadStream *stream, bool suppress_advice)
 {
@@ -530,6 +498,7 @@ read_stream_begin_impl(int flags,
 	stream->queue_size = queue_size;
 	stream->callback = callback;
 	stream->callback_private_data = callback_private_data;
+	stream->buffered_blocknum = InvalidBlockNumber;
 
 	/*
 	 * Skip the initial ramp-up phase if the caller says we're going to be
@@ -649,9 +618,7 @@ read_stream_next_buffer(ReadStream *stream, void **per_buffer_data)
 		Assert(buffer != InvalidBuffer);
 
 		/* Choose the next block to pin. */
-		if (unlikely(stream->blocknums_next == stream->blocknums_count))
-			read_stream_fill_blocknums(stream);
-		next_blocknum = stream->blocknums[stream->blocknums_next++];
+		next_blocknum = read_stream_get_block(stream, NULL);
 
 		if (likely(next_blocknum != InvalidBlockNumber))
 		{
@@ -828,8 +795,7 @@ read_stream_reset(ReadStream *stream)
 	stream->distance = 0;
 
 	/* Forget buffered block numbers and fast path state. */
-	stream->blocknums_next = 0;
-	stream->blocknums_count = 0;
+	stream->buffered_blocknum = InvalidBlockNumber;
 	stream->fast_path = false;
 
 	/* Unpin anything that wasn't consumed. */
-- 
2.46.0

#47Alexander Lakhin
exclusion@gmail.com
In reply to: Thomas Munro (#46)
Re: Streaming read-ready sequential scan code

Hello Thomas,

27.08.2024 09:52, Thomas Munro wrote:

Here's a really simple way to see the new unfairness at the end of a
parallel scan:

drop table if exists t;
create table t (i int);
insert into t select generate_series(1, 100000);
alter table t set (parallel_workers = 2);
set parallel_setup_cost = 0;
set parallel_leader_participation = off;
explain (analyze, buffers, verbose) select count(*) from t;

On my machine, unpatched master shows:

Worker 0: actual time=0.036..12.452 rows=51076 loops=1
Buffers: shared hit=226
Worker 1: actual time=0.037..12.003 rows=48924 loops=1
Buffers: shared hit=217

The attached patch, which I'd like to push, is effectively what
Alexander tested (blocknums[16] -> blocknums[1]). There's no point in
using an array of size 1, so I've turned it into a simple variable and
deleted the relevant comments. My machine shows:

Worker 0: actual time=0.038..12.115 rows=49946 loops=1
Buffers: shared hit=221
Worker 1: actual time=0.038..12.109 rows=50054 loops=1
Buffers: shared hit=222

That difference may not seem huge, but other pre-existing things are
going pathologically wrong in the reported query that magnify it (see
my earlier analysis). It's an interesting problem that will require
more study (my earlier analysis missed a detail that I'll write about
separately), but it doesn't seem to be new or have easy fixes, so that
will have to be for later work.

I've tried your query and could not get stable results, unfortunately.
The following script:
rm -rf "$PGDATA"; initdb -D "$PGDATA" >initdb.log 2>&1

pg_ctl -s -l server.log start

cat << EOF | psql | grep 'Parallel Seq Scan' -A10 | grep 'Worker ' -A1
create table t (i int);
insert into t select generate_series(1, 100000);
alter table t set (parallel_workers = 2);
set parallel_setup_cost = 0;
set parallel_leader_participation = off;
explain (analyze, buffers, verbose) select count(*) from t;
EOF

pg_ctl -s stop

gives me unstable numbers on unpatched master:
                     Worker 0:  actual time=0.024..5.814 rows=51076 loops=1
                       Buffers: shared hit=226
                     Worker 1:  actual time=0.023..5.614 rows=48924 loops=1
                       Buffers: shared hit=217
---
                     Worker 0:  actual time=0.027..5.130 rows=36612 loops=1
                       Buffers: shared hit=162
                     Worker 1:  actual time=0.013..5.605 rows=63388 loops=1
                       Buffers: shared hit=281
---
                     Worker 0:  actual time=0.025..5.447 rows=47460 loops=1
                       Buffers: shared hit=210
                     Worker 1:  actual time=0.019..5.688 rows=52540 loops=1
                       Buffers: shared hit=233

and also with the patch applied:
                     Worker 0:  actual time=0.012..4.486 rows=55478 loops=1
                       Buffers: shared hit=246
                     Worker 1:  actual time=0.014..4.430 rows=44522 loops=1
                       Buffers: shared hit=197
---
                     Worker 0:  actual time=0.013..4.269 rows=55822 loops=1
                       Buffers: shared hit=247
                     Worker 1:  actual time=0.017..4.238 rows=44178 loops=1
                       Buffers: shared hit=196
---
                     Worker 0:  actual time=0.014..4.974 rows=50624 loops=1
                       Buffers: shared hit=224
                     Worker 1:  actual time=0.016..4.932 rows=49376 loops=1
                       Buffers: shared hit=219
---
                     Worker 0:  actual time=0.012..5.459 rows=65648 loops=1
                       Buffers: shared hit=291
                     Worker 1:  actual time=0.022..5.109 rows=34352 loops=1
                       Buffers: shared hit=152

Please correct me if I'm doing something wrong.

Best regards,
Alexander

#48Thomas Munro
thomas.munro@gmail.com
In reply to: Alexander Lakhin (#47)
Re: Streaming read-ready sequential scan code

On Wed, Aug 28, 2024 at 1:00 AM Alexander Lakhin <exclusion@gmail.com> wrote:

gives me unstable numbers on unpatched master:
Buffers: shared hit=226
Buffers: shared hit=217

Buffers: shared hit=162
Buffers: shared hit=281

Buffers: shared hit=210
Buffers: shared hit=233

and also with the patch applied:
Buffers: shared hit=246
Buffers: shared hit=197

Buffers: shared hit=247
Buffers: shared hit=196

Buffers: shared hit=224
Buffers: shared hit=219

Buffers: shared hit=291
Buffers: shared hit=152

Please correct me if I'm doing something wrong.

Huh. I can reproduce what I showed with pretty low variance, on my
FreeBSD, Linux and macOS systems. I included
parallel_leader_participation=off so that the workers would hopefully
start as closely together in time as possible, and hopefully allow
only a block or so of variation in the outcome. If I do:

create or replace function f(i int) returns int language plpgsql
parallel safe as $$
begin
raise notice '% pid %', clock_timestamp(), pg_backend_pid();
return i;
end;
$$;

then

postgres=# explain (analyze) select f(i) from t limit 1;
NOTICE: 2024-08-28 16:41:32.845747+12 pid 47019
NOTICE: 2024-08-28 16:41:32.845746+12 pid 47018

shows start times differ by only a few microseconds at most, often 0.
I wonder if your system is more variable at the beginning of execution?
Maybe you're on a multi-socket system and forking/startup times vary
in some way I can't see on small systems or something like that?

#49Alexander Lakhin
exclusion@gmail.com
In reply to: Thomas Munro (#48)
Re: Streaming read-ready sequential scan code

28.08.2024 07:45, Thomas Munro wrote:

Huh. I can reproduce what I showed with pretty low variance, on my
FreeBSD, Linux and macOS systems. I included
parallel_leader_participation=off so that the workers would hopefully
start as closely together in time as possible, and hopefully allow
only a block or so of variation in the outcome. If I do:

create or replace function f(i int) returns int language plpgsql
parallel safe as $$
begin
raise notice '% pid %', clock_timestamp(), pg_backend_pid();
return i;
end;
$$;

then

postgres=# explain (analyze) select f(i) from t limit 1;
NOTICE: 2024-08-28 16:41:32.845747+12 pid 47019
NOTICE: 2024-08-28 16:41:32.845746+12 pid 47018

shows start times differ by only a few microseconds at most, often 0.
I wonder if your system is more variable at the beginning of execution?
Maybe you're on a multi-socket system and forking/startup times vary
in some way I can't see on small systems or something like that?

I use a single-socket system with an AMD Ryzen 5900X, running Ubuntu 20.04.
With no active background tasks running, I'm seeing:
NOTICE:  2024-08-28 08:17:36.917162+00 pid 320103
NOTICE:  2024-08-28 08:17:36.917163+00 pid 320102

NOTICE:  2024-08-28 08:17:40.592333+00 pid 320143
NOTICE:  2024-08-28 08:17:40.592645+00 pid 320144

With log_min_messages = DEBUG3, I get:
NOTICE:  2024-08-28 08:41:59.309364+00 pid 338263
NOTICE:  2024-08-28 08:41:59.310079+00 pid 338264

And the following messages in the server.log:
2024-08-28 08:41:59.304 UTC [338251] DEBUG:  starting background worker process "parallel worker for PID 338262"
2024-08-28 08:41:59.304 UTC [338251] DEBUG:  starting background worker process "parallel worker for PID 338262"
2024-08-28 08:41:59.305 UTC [338263] DEBUG:  InitPostgres
2024-08-28 08:41:59.305 UTC [338264] DEBUG:  InitPostgres
2024-08-28 08:41:59.309 UTC [338263] NOTICE:  2024-08-28 08:41:59.309364+00 pid 338263
2024-08-28 08:41:59.309 UTC [338263] CONTEXT:  PL/pgSQL function f(integer) line 3 at RAISE
2024-08-28 08:41:59.309 UTC [338262] NOTICE:  2024-08-28 08:41:59.309364+00 pid 338263
2024-08-28 08:41:59.309 UTC [338262] CONTEXT:  PL/pgSQL function f(integer) line 3 at RAISE
    parallel worker
2024-08-28 08:41:59.309 UTC [338263] DEBUG:  shmem_exit(0): 5 before_shmem_exit callbacks to make
2024-08-28 08:41:59.309 UTC [338263] DEBUG:  shmem_exit(0): 6 on_shmem_exit callbacks to make
2024-08-28 08:41:59.309 UTC [338263] DEBUG:  proc_exit(0): 1 callbacks to make
2024-08-28 08:41:59.309 UTC [338263] DEBUG:  exit(0)
2024-08-28 08:41:59.309 UTC [338263] DEBUG:  shmem_exit(-1): 0 before_shmem_exit callbacks to make
2024-08-28 08:41:59.309 UTC [338263] DEBUG:  shmem_exit(-1): 0 on_shmem_exit callbacks to make
2024-08-28 08:41:59.309 UTC [338263] DEBUG:  proc_exit(-1): 0 callbacks to make
2024-08-28 08:41:59.310 UTC [338264] NOTICE:  2024-08-28 08:41:59.310079+00 pid 338264
2024-08-28 08:41:59.310 UTC [338264] CONTEXT:  PL/pgSQL function f(integer) line 3 at RAISE

It looks like the two parallel workers were started simultaneously, but
then the second one lagged behind...

Best regards,
Alexander

#50Thomas Munro
thomas.munro@gmail.com
In reply to: Alexander Lakhin (#49)
Re: Streaming read-ready sequential scan code

On Wed, Aug 28, 2024 at 9:00 PM Alexander Lakhin <exclusion@gmail.com> wrote:

2024-08-28 08:41:59.304 UTC [338251] DEBUG: starting background worker process "parallel worker for PID 338262"
2024-08-28 08:41:59.304 UTC [338251] DEBUG: starting background worker process "parallel worker for PID 338262"

2024-08-28 08:41:59.305 UTC [338263] DEBUG: InitPostgres
2024-08-28 08:41:59.305 UTC [338264] DEBUG: InitPostgres

2024-08-28 08:41:59.309 UTC [338263] NOTICE: 2024-08-28 08:41:59.309364+00 pid 338263
2024-08-28 08:41:59.310 UTC [338264] NOTICE: 2024-08-28 08:41:59.310079+00 pid 338264

It looks like the two parallel workers were started simultaneously, but
then the second one lagged behind...

Yeah. That's quite interesting, and must destabilise that
simple-minded demo. I'm curious to know exactly what contention is
causing that (about 3/4 of a millisecond that I don't see and now I
want to know what it's waiting for), but it's a very crude test
lacking timer resolution in the earlier messages, and it's an
unrelated topic and a distraction. Perhaps it explains why you saw
two different behaviours in Q15 with the patch and I didn't, though.
Really it shouldn't be so sensitive to such variations, it's obviously
a terrible plan, and TPC-DS needs a planner hacker mega-brain applied
to it; I'm going to try to nerd-snipe one...

#51Alexander Lakhin
exclusion@gmail.com
In reply to: Thomas Munro (#50)
Re: Streaming read-ready sequential scan code

Hello Thomas,

29.08.2024 01:16, Thomas Munro wrote:

Yeah. That's quite interesting, and must destabilise that
simple-minded demo. I'm curious to know exactly what contention is
causing that (about 3/4 of a millisecond that I don't see and now I
want to know what it's waiting for), but it's a very crude test
lacking timer resolution in the earlier messages, and it's an
unrelated topic and a distraction. Perhaps it explains why you saw
two different behaviours in Q15 with the patch and I didn't, though.
Really it shouldn't be so sensitive to such variations, it's obviously
a terrible plan, and TPC-DS needs a planner hacker mega-brain applied
to it; I'm going to try to nerd-snipe one...

I looked at two perf profiles of such out-of-sync processes and found no
extra calls whatsoever in the slow one; it just has the number of perf
samples increased proportionally. It made me suspect CPU frequency
scaling... Indeed, with the "performance" governor set and the boost mode
disabled, I'm now seeing much more stable numbers (I do this tuning before
running performance tests, but I had forgotten about that when I ran your
test, my bad).

I'm sorry for the noise and the distraction.

Still, now I can confirm your results. Without the patch, two parallel
workers gave "Buffers: shared hit=217 / Buffers: shared hit=226" 10 times
out of 10. And with the patch, I've got
"shared hit=219 / shared hit=224" 3 times,
"shared hit=220 / shared hit=223" 4 times,
"shared hit=221 / shared hit=222" 3 times of 10.

On b7b0f3f27~1, my results are:
"shared hit=219 / shared hit=224": 2
"shared hit=220 / shared hit=223": 3
"shared hit=221 / shared hit=222": 4
"shared hit=218 / shared hit=225": 1

Best regards,
Alexander

#52Thomas Munro
thomas.munro@gmail.com
In reply to: Alexander Lakhin (#51)
Re: Streaming read-ready sequential scan code

On Fri, Aug 30, 2024 at 1:00 AM Alexander Lakhin <exclusion@gmail.com> wrote:

I looked at two perf profiles of such out-of-sync processes and found no
extra calls whatsoever in the slow one; it just has the number of perf
samples increased proportionally. It made me suspect CPU frequency
scaling... Indeed, with the "performance" governor set and the boost mode
disabled, I'm now seeing much more stable numbers (I do this tuning before
running performance tests, but I had forgotten about that when I ran your
test, my bad).

Aha, mystery solved.

I have pushed the fix. Thanks!