BitmapHeapScan streaming read user and prelim refactoring

Started by Melanie Plageman about 2 years ago · 248 messages
#1Melanie Plageman
melanieplageman@gmail.com

Hi,

Attached is a patch set which refactors BitmapHeapScan such that it
can use the streaming read API [1]. It also resolves the long-standing
FIXME in the BitmapHeapScan code suggesting that the skip fetch
optimization should be pushed into the table AMs. Additionally, it
moves table scan initialization to after the index scan and bitmap
initialization.

patches 0001-0002 are assorted cleanup needed later in the set.
patch 0003 moves the table scan initialization to after bitmap creation
patch 0004 is, I think, a bug fix. see [2].
patches 0005-0006 push the skip fetch optimization into the table AMs
patches 0007-0009 change the control flow of BitmapHeapNext() to match
that required by the streaming read API
patch 0010 is the streaming read code not yet in master
patch 0011 is the actual bitmapheapscan streaming read user.

patches 0001-0009 apply on top of master but 0010 and 0011 must be
applied on top of a commit before a 21d9c3ee4ef74e2 (until a rebased
version of the streaming read API is on the mailing list).

The caveat is that these patches introduce breaking changes to two
table AM functions for bitmapheapscan: table_scan_bitmap_next_block()
and table_scan_bitmap_next_tuple().

A TBMIterateResult used to be threaded through both of these functions
and used in BitmapHeapNext(). This patch set removes all references to
TBMIterateResults from BitmapHeapNext. Because the streaming read API
requires the callback to specify the next block, BitmapHeapNext() can
no longer pass a TBMIterateResult to table_scan_bitmap_next_block().

More subtly, table_scan_bitmap_next_block() used to return false if
there were no more visible tuples on the page or if the block that was
requested was not valid. With these changes,
table_scan_bitmap_next_block() will only return false when the bitmap
has been exhausted and the scan can end. In order to use the streaming
read API, the user must be able to request the blocks it needs without
requiring synchronous feedback per block. Thus, this table AM function
must change its meaning.
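To make the contract change concrete, here is a standalone sketch (toy types and invented names, not the actual table AM signatures):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a bitmap heap scan descriptor; all names invented. */
typedef struct FakeScan
{
	int			next;		/* index of the next bitmap entry */
	int			nentries;	/* number of entries in the "bitmap" */
	int			nblocks;	/* relation size as of scan start */
	int			blocks[8];	/* block numbers the iterator would yield */
} FakeScan;

/*
 * New semantics: return false only when the bitmap is exhausted.
 * Out-of-range (concurrently added) blocks are skipped internally, and a
 * page with zero visible tuples still yields true -- next_tuple simply
 * returns no tuples and control advances to the next block.
 */
static bool
fake_scan_bitmap_next_block(FakeScan *scan, int *blockno)
{
	while (scan->next < scan->nentries)
	{
		int			block = scan->blocks[scan->next++];

		if (block >= scan->nblocks)
			continue;			/* past scan-start end of relation: skip */

		*blockno = block;
		return true;
	}
	return false;				/* bitmap exhausted: the scan can end */
}
```

Under the old contract the caller had to interpret a false return ambiguously; here false has exactly one meaning, which is what lets the streaming read callback request blocks without per-block feedback.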

I think the way the patches are split up could be improved. I will
think more about this. There are also probably a few mistakes with
which comments are updated in which patches in the set.

- Melanie

[1]: /messages/by-id/CA+hUKGJkOiOCa+mag4BF+zHo7qo=o9CFheB8=g6uT5TUm2gkvA@mail.gmail.com
[2]: /messages/by-id/CAAKRu_bxrXeZ2rCnY8LyeC2Ls88KpjWrQ+opUrXDRXdcfwFZGA@mail.gmail.com

Attachments:

v1-0003-BitmapHeapScan-begin-scan-after-bitmap-setup.patch (text/x-patch, +30 −18)
v1-0002-BitmapHeapScan-set-can_skip_fetch-later.patch (text/x-patch, +11 −11)
v1-0001-Remove-table_scan_bitmap_next_tuple-parameter-tbm.patch (text/x-patch, +1 −10)
v1-0005-Update-BitmapAdjustPrefetchIterator-parameter-typ.patch (text/x-patch, +4 −5)
v1-0004-BitmapPrefetch-use-prefetch-block-recheck-for-ski.patch (text/x-patch, +2 −3)
v1-0006-Push-BitmapHeapScan-skip-fetch-optimization-into-.patch (text/x-patch, +74 −91)
v1-0008-Reduce-scope-of-BitmapHeapScan-tbmiterator-local-.patch (text/x-patch, +8 −12)
v1-0009-Make-table_scan_bitmap_next_block-async-friendly.patch (text/x-patch, +121 −117)
v1-0007-BitmapHeapScan-scan-desc-counts-lossy-and-exact-p.patch (text/x-patch, +28 −6)
v1-0010-Streaming-Read-API.patch (text/x-patch, +986 −239)
v1-0011-BitmapHeapScan-uses-streaming-read-API.patch (text/x-patch, +178 −445)
#2Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Melanie Plageman (#1)
Re: BitmapHeapScan streaming read user and prelim refactoring

On Feb 13, 2024, at 3:11 PM, Melanie Plageman <melanieplageman@gmail.com> wrote:

Thanks for the patch...

Attached is a patch set which refactors BitmapHeapScan such that it
can use the streaming read API [1]. It also resolves the long-standing
FIXME in the BitmapHeapScan code suggesting that the skip fetch
optimization should be pushed into the table AMs. Additionally, it
moves table scan initialization to after the index scan and bitmap
initialization.

patches 0001-0002 are assorted cleanup needed later in the set.
patch 0003 moves the table scan initialization to after bitmap creation
patch 0004 is, I think, a bug fix. see [2].
patches 0005-0006 push the skip fetch optimization into the table AMs
patches 0007-0009 change the control flow of BitmapHeapNext() to match
that required by the streaming read API
patch 0010 is the streaming read code not yet in master
patch 0011 is the actual bitmapheapscan streaming read user.

patches 0001-0009 apply on top of master but 0010 and 0011 must be
applied on top of a commit before a 21d9c3ee4ef74e2 (until a rebased
version of the streaming read API is on the mailing list).

I followed your lead and applied them to 6a8ffe812d194ba6f4f26791b6388a4837d17d6c. `make check` worked fine, though I expect you know that already.

The caveat is that these patches introduce breaking changes to two
table AM functions for bitmapheapscan: table_scan_bitmap_next_block()
and table_scan_bitmap_next_tuple().

You might want an independent perspective on how much of a hassle those breaking changes are, so I took a stab at that. Having written a custom proprietary TAM for postgresql 15 here at EDB, and having ported it and released it for postgresql 16, I thought I'd try porting it to the above commit with your patches. Even without your patches, I already see breaking changes coming from commit f691f5b80a85c66d715b4340ffabb503eb19393e, which creates a similar amount of breakage for me as your patches do. Dealing with the combined breakage might amount to a day of work, including testing, half of which I think I've already finished. In other words, it doesn't seem like a big deal.

Were postgresql 17 shaping up to be compatible with TAMs written for 16, your patch would change that qualitatively, but since things are already incompatible, I think you're in the clear.

A TBMIterateResult used to be threaded through both of these functions
and used in BitmapHeapNext(). This patch set removes all references to
TBMIterateResults from BitmapHeapNext. Because the streaming read API
requires the callback to specify the next block, BitmapHeapNext() can
no longer pass a TBMIterateResult to table_scan_bitmap_next_block().

More subtly, table_scan_bitmap_next_block() used to return false if
there were no more visible tuples on the page or if the block that was
requested was not valid. With these changes,
table_scan_bitmap_next_block() will only return false when the bitmap
has been exhausted and the scan can end. In order to use the streaming
read API, the user must be able to request the blocks it needs without
requiring synchronous feedback per block. Thus, this table AM function
must change its meaning.

I think the way the patches are split up could be improved. I will
think more about this. There are also probably a few mistakes with
which comments are updated in which patches in the set.

I look forward to the next version of the patch set. Thanks again for working on this.


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#3Melanie Plageman
melanieplageman@gmail.com
In reply to: Mark Dilger (#2)
Re: BitmapHeapScan streaming read user and prelim refactoring

On Tue, Feb 13, 2024 at 11:34 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:

On Feb 13, 2024, at 3:11 PM, Melanie Plageman <melanieplageman@gmail.com> wrote:

Thanks for the patch...

Attached is a patch set which refactors BitmapHeapScan such that it
can use the streaming read API [1]. It also resolves the long-standing
FIXME in the BitmapHeapScan code suggesting that the skip fetch
optimization should be pushed into the table AMs. Additionally, it
moves table scan initialization to after the index scan and bitmap
initialization.

patches 0001-0002 are assorted cleanup needed later in the set.
patch 0003 moves the table scan initialization to after bitmap creation
patch 0004 is, I think, a bug fix. see [2].
patches 0005-0006 push the skip fetch optimization into the table AMs
patches 0007-0009 change the control flow of BitmapHeapNext() to match
that required by the streaming read API
patch 0010 is the streaming read code not yet in master
patch 0011 is the actual bitmapheapscan streaming read user.

patches 0001-0009 apply on top of master but 0010 and 0011 must be
applied on top of a commit before a 21d9c3ee4ef74e2 (until a rebased
version of the streaming read API is on the mailing list).

I followed your lead and applied them to 6a8ffe812d194ba6f4f26791b6388a4837d17d6c. `make check` worked fine, though I expect you know that already.

Thanks for taking a look!

The caveat is that these patches introduce breaking changes to two
table AM functions for bitmapheapscan: table_scan_bitmap_next_block()
and table_scan_bitmap_next_tuple().

You might want an independent perspective on how much of a hassle those breaking changes are, so I took a stab at that. Having written a custom proprietary TAM for postgresql 15 here at EDB, and having ported it and released it for postgresql 16, I thought I'd try porting it to the above commit with your patches. Even without your patches, I already see breaking changes coming from commit f691f5b80a85c66d715b4340ffabb503eb19393e, which creates a similar amount of breakage for me as your patches do. Dealing with the combined breakage might amount to a day of work, including testing, half of which I think I've already finished. In other words, it doesn't seem like a big deal.

Were postgresql 17 shaping up to be compatible with TAMs written for 16, your patch would change that qualitatively, but since things are already incompatible, I think you're in the clear.

Oh, good to know! I'm very happy to have the perspective of a table AM
author. Just curious, did your table AM implement
table_scan_bitmap_next_block() and table_scan_bitmap_next_tuple(),
and, if so, did you use the TBMIterateResult? Since it is not used in
BitmapHeapNext() in my version, table AMs would have to change how
they use TBMIterateResults anyway. But I assume they could add it to a
table AM specific scan descriptor if they want access to a
TBMIterateResult of their own making in both
table_scan_bitmap_next_block() and next_tuple()?
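For illustration, such an AM-specific descriptor could look roughly like this (struct and field names are invented here, and TBMIterateResult is stubbed so the sketch is self-contained):

```c
#include <assert.h>
#include <stddef.h>

/* Stubs standing in for the real PostgreSQL types. */
typedef struct TableScanDescData
{
	int			rs_flags;
} TableScanDescData;

typedef struct TBMIterateResult
{
	unsigned	blockno;
	int			ntuples;
} TBMIterateResult;

/*
 * Hypothetical AM-specific scan descriptor: embed the base descriptor
 * first (so a pointer to it can be cast to TableScanDescData *) and keep
 * a private TBMIterateResult that the AM's next_block callback fills and
 * its next_tuple callback consumes.
 */
typedef struct MyAMScanDescData
{
	TableScanDescData rs_base;	/* must be first for casting */
	TBMIterateResult tbmres;	/* AM-private, threaded between callbacks */
} MyAMScanDescData;
```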

- Melanie

#4Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Melanie Plageman (#3)
Re: BitmapHeapScan streaming read user and prelim refactoring

On Feb 14, 2024, at 6:47 AM, Melanie Plageman <melanieplageman@gmail.com> wrote:

Just curious, did your table AM implement
table_scan_bitmap_next_block() and table_scan_bitmap_next_tuple(),
and, if so, did you use the TBMIterateResult? Since it is not used in
BitmapHeapNext() in my version, table AMs would have to change how
they use TBMIterateResults anyway. But I assume they could add it to a
table AM specific scan descriptor if they want access to a
TBMIterateResult of their own making in both
table_scan_bitmap_next_block() and next_tuple()?

My table AM does implement those two functions and does use the TBMIterateResult *tbmres argument, yes. I would deal with the issue in very much the same way that your patches modify heapam. I don't really have any additional comments about that.


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#5Andres Freund
andres@anarazel.de
In reply to: Melanie Plageman (#1)
Re: BitmapHeapScan streaming read user and prelim refactoring

Hi,

On 2024-02-13 18:11:25 -0500, Melanie Plageman wrote:

Attached is a patch set which refactors BitmapHeapScan such that it
can use the streaming read API [1]. It also resolves the long-standing
FIXME in the BitmapHeapScan code suggesting that the skip fetch
optimization should be pushed into the table AMs. Additionally, it
moves table scan initialization to after the index scan and bitmap
initialization.

Thanks for working on this! While I have some quibbles with details, I think
this is quite a bit of progress in the right direction.

patches 0001-0002 are assorted cleanup needed later in the set.
patch 0003 moves the table scan initialization to after bitmap creation
patch 0004 is, I think, a bug fix. see [2].

I'd not quite call it a bugfix, it's not like it leads to wrong
behaviour. Seems more like an optimization. But whatever :)

The caveat is that these patches introduce breaking changes to two
table AM functions for bitmapheapscan: table_scan_bitmap_next_block()
and table_scan_bitmap_next_tuple().

That's to be expected, I don't think it's worth worrying about. Right now a
bunch of TAMs can't implement bitmap scans, this goes a fair bit towards
allowing that...

From d6dd6eb21dcfbc41208f87d1d81ffe3960130889 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 12 Feb 2024 18:50:29 -0500
Subject: [PATCH v1 03/11] BitmapHeapScan begin scan after bitmap setup

There is no reason for table_beginscan_bm() to begin the actual scan of
the underlying table in ExecInitBitmapHeapScan(). We can begin the
underlying table scan after the index scan has been completed and the
bitmap built.

The one use of the scan descriptor during initialization was
ExecBitmapHeapInitializeWorker(), which set the scan descriptor snapshot
with one from an array in the parallel state. This overwrote the
snapshot set in table_beginscan_bm().

By saving that worker snapshot as a member in the BitmapHeapScanState
during initialization, it can be restored in table_beginscan_bm() after
returning from the table AM specific begin scan function.

I don't understand what the point of passing two different snapshots to
table_beginscan_bm() is. What does that even mean? Why can't we just use the
correct snapshot initially?

From a3f62e4299663d418531ae61bb16ea39f0836fac Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 12 Feb 2024 19:03:24 -0500
Subject: [PATCH v1 04/11] BitmapPrefetch use prefetch block recheck for skip
fetch

Previously BitmapPrefetch() used the recheck flag for the current block
to determine whether or not it could skip prefetching the proposed
prefetch block. It makes more sense for it to use the recheck flag from
the TBMIterateResult for the prefetch block instead.

I'd mention the commit that introduced the current logic and link to the
the thread that you started about this.

From d56be7741765d93002649ef912ef4b8256a5b9af Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 12 Feb 2024 19:04:48 -0500
Subject: [PATCH v1 05/11] Update BitmapAdjustPrefetchIterator parameter type
to BlockNumber

BitmapAdjustPrefetchIterator() only used the blockno member of the
passed in TBMIterateResult to ensure that the prefetch iterator and
regular iterator stay in sync. Pass it the BlockNumber only. This will
allow us to move away from using the TBMIterateResult outside of table
AM specific code.

Hm - I'm not convinced this is a good direction - doesn't that arguably
*increase* TAM awareness? Perhaps it doesn't make much sense to use bitmap
heap scans in a TAM without blocks, but still.

From 202b16d3a381210e8dbee69e68a8310be8ee11d2 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 12 Feb 2024 20:15:05 -0500
Subject: [PATCH v1 06/11] Push BitmapHeapScan skip fetch optimization into
table AM

This resolves the long-standing FIXME in BitmapHeapNext() which said that
the optimization to skip fetching blocks of the underlying table when
none of the column data was needed should be pushed into the table AM
specific code.

Long-standing? Sure, it's old enough to walk, but we have FIXMEs that are old
enough to drink, at least in some countries. :)

The table AM agnostic functions for prefetching still need to know if
skipping fetching is permitted for this scan. However, this dependency
will be removed when that prefetching code is removed in favor of the
upcoming streaming read API.

---
src/backend/access/heap/heapam.c | 10 +++
src/backend/access/heap/heapam_handler.c | 29 +++++++
src/backend/executor/nodeBitmapHeapscan.c | 100 ++++++----------------
src/include/access/heapam.h | 2 +
src/include/access/tableam.h | 17 ++--
src/include/nodes/execnodes.h | 6 --
6 files changed, 74 insertions(+), 90 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 707460a5364..7aae1ecf0a9 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -955,6 +955,8 @@ heap_beginscan(Relation relation, Snapshot snapshot,
scan->rs_base.rs_flags = flags;
scan->rs_base.rs_parallel = parallel_scan;
scan->rs_strategy = NULL;	/* set in initscan */
+	scan->vmbuffer = InvalidBuffer;
+	scan->empty_tuples = 0;

These don't follow the existing naming pattern for HeapScanDescData. While I
explicitly dislike the practice of adding prefixes to struct members, I don't
think mixing conventions within a single struct improves things.

I also think it'd be good to note in comments that the vm buffer currently is
only used for bitmap heap scans, otherwise one might think they'd also be used
for normal scans, where we don't need them, because of the page level flag.

Also, perhaps worth renaming "empty_tuples" to something indicating that it's
the number of empty tuples to be returned later? num_empty_tuples_pending or
such? Or the current "return_empty_tuples".

@@ -1043,6 +1045,10 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
if (BufferIsValid(scan->rs_cbuf))
ReleaseBuffer(scan->rs_cbuf);

+	if (BufferIsValid(scan->vmbuffer))
+		ReleaseBuffer(scan->vmbuffer);
+	scan->vmbuffer = InvalidBuffer;

It does not matter one iota here, but personally I prefer moving the write
inside the if, as dirtying the cacheline after we just figured out we don't
need to write seems unnecessary.

diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 9372b49bfaa..c0fb06c9688 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -108,6 +108,7 @@ BitmapHeapNext(BitmapHeapScanState *node)
*/
if (!node->initialized)
{
+		bool can_skip_fetch;
/*
* We can potentially skip fetching heap pages if we do not need any
* columns of the table, either for checking non-indexable quals or

Pretty sure pgindent will move this around.

+++ b/src/include/access/tableam.h
@@ -62,6 +62,7 @@ typedef enum ScanOptions

/* unregister snapshot at scan end? */
SO_TEMP_SNAPSHOT = 1 << 9,
+ SO_CAN_SKIP_FETCH = 1 << 10,
} ScanOptions;

Would be nice to add a comment explaining what this flag means.
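Something along these lines, perhaps (the comment wording here is just a suggestion):

```c
#include <assert.h>

typedef enum ScanOptions
{
	/* unregister snapshot at scan end? */
	SO_TEMP_SNAPSHOT = 1 << 9,

	/*
	 * Bitmap heap scan: the scan may skip fetching heap pages that are
	 * all-visible when no column data is needed and no recheck is
	 * required.
	 */
	SO_CAN_SKIP_FETCH = 1 << 10,
} ScanOptions;
```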

From 500c84019b982a1e6c8b8dd40240c8510d83c287 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 13 Feb 2024 10:05:04 -0500
Subject: [PATCH v1 07/11] BitmapHeapScan scan desc counts lossy and exact
pages

Future commits will remove the TBMIterateResult from BitmapHeapNext(),
pushing it into the table AM-specific code. So we will have to keep
track of the number of lossy and exact pages in the scan descriptor.
Doing this change to lossy/exact page counting in a separate commit just
simplifies the diff.

---
src/backend/access/heap/heapam.c | 2 ++
src/backend/access/heap/heapam_handler.c | 9 +++++++++
src/backend/executor/nodeBitmapHeapscan.c | 18 +++++++++++++-----
src/include/access/relscan.h | 4 ++++
4 files changed, 28 insertions(+), 5 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 7aae1ecf0a9..88b4aad5820 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -957,6 +957,8 @@ heap_beginscan(Relation relation, Snapshot snapshot,
scan->rs_strategy = NULL;	/* set in initscan */
scan->vmbuffer = InvalidBuffer;
scan->empty_tuples = 0;
+	scan->rs_base.lossy_pages = 0;
+	scan->rs_base.exact_pages = 0;
/*
* Disable page-at-a-time mode if it's not a MVCC-safe snapshot.
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index baba09c87c0..6e85ef7a946 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2242,6 +2242,15 @@ heapam_scan_bitmap_next_block(TableScanDesc scan,
Assert(ntup <= MaxHeapTuplesPerPage);
hscan->rs_ntuples = ntup;
+	/* Only count exact and lossy pages with visible tuples */
+	if (ntup > 0)
+	{
+		if (tbmres->ntuples >= 0)
+			scan->exact_pages++;
+		else
+			scan->lossy_pages++;
+	}
+
return ntup > 0;
}
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index c0fb06c9688..19d115de06f 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -53,6 +53,8 @@
#include "utils/spccache.h"
static TupleTableSlot *BitmapHeapNext(BitmapHeapScanState *node);
+static inline void BitmapAccumCounters(BitmapHeapScanState *node,
+									   TableScanDesc scan);
static inline void BitmapDoneInitializingSharedState(ParallelBitmapHeapState *pstate);
static inline void BitmapAdjustPrefetchIterator(BitmapHeapScanState *node,
BlockNumber blockno);
@@ -234,11 +236,6 @@ BitmapHeapNext(BitmapHeapScanState *node)
continue;
}

- if (tbmres->ntuples >= 0)
- node->exact_pages++;
- else
- node->lossy_pages++;
-
/* Adjust the prefetch target */
BitmapAdjustPrefetchTarget(node);
}
@@ -315,9 +312,20 @@ BitmapHeapNext(BitmapHeapScanState *node)
/*
* if we get here it means we are at the end of the scan..
*/
+ BitmapAccumCounters(node, scan);
return ExecClearTuple(slot);
}

+static inline void
+BitmapAccumCounters(BitmapHeapScanState *node,
+					TableScanDesc scan)
+{
+	node->exact_pages += scan->exact_pages;
+	scan->exact_pages = 0;
+	node->lossy_pages += scan->lossy_pages;
+	scan->lossy_pages = 0;
+}
+

I don't think this is quite right - you're calling BitmapAccumCounters() only
when the scan doesn't return anything anymore, but there's no guarantee
that'll ever be reached. E.g. a bitmap heap scan below a limit node. I think
this needs to be in a) ExecEndBitmapHeapScan() b) ExecReScanBitmapHeapScan()

/*
*	BitmapDoneInitializingSharedState - Shared state is initialized
*
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index 521043304ab..b74e08dd745 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -40,6 +40,10 @@ typedef struct TableScanDescData
ItemPointerData rs_mintid;
ItemPointerData rs_maxtid;
+	/* Only used for Bitmap table scans */
+	long		exact_pages;
+	long		lossy_pages;
+
/*
* Information about type and behaviour of the scan, a bitmask of members
* of the ScanOptions enum (see tableam.h).

I wonder if this really is the best place for the data to be accumulated. This
requires the accounting to be implemented in each AM, which doesn't obviously
seem required. Why can't the accounting continue to live in
nodeBitmapHeapscan.c, to be done after each table_scan_bitmap_next_block()
call?

From 555743e4bc885609d20768f7f2990c6ba69b13a9 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 13 Feb 2024 10:57:07 -0500
Subject: [PATCH v1 09/11] Make table_scan_bitmap_next_block() async friendly

table_scan_bitmap_next_block() previously returned false if we did not
wish to call table_scan_bitmap_next_tuple() on the tuples on the page.
This could happen when there were no visible tuples on the page or, due
to concurrent activity on the table, the block returned by the iterator
is past the known end of the table.

This sounds a bit like the block is actually past the end of the table,
but in reality this happens if the block is past the end of the table as it
was when the scan was started. Somehow that feels significant, but I don't
really know why I think that.

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 88b4aad5820..d8569373987 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -959,6 +959,8 @@ heap_beginscan(Relation relation, Snapshot snapshot,
scan->empty_tuples = 0;
scan->rs_base.lossy_pages = 0;
scan->rs_base.exact_pages = 0;
+	scan->rs_base.shared_tbmiterator = NULL;
+	scan->rs_base.tbmiterator = NULL;

/*
* Disable page-at-a-time mode if it's not a MVCC-safe snapshot.
@@ -1051,6 +1053,18 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
ReleaseBuffer(scan->vmbuffer);
scan->vmbuffer = InvalidBuffer;

+	if (scan->rs_base.rs_flags & SO_TYPE_BITMAPSCAN)
+	{
+		if (scan->rs_base.shared_tbmiterator)
+			tbm_end_shared_iterate(scan->rs_base.shared_tbmiterator);
+
+		if (scan->rs_base.tbmiterator)
+			tbm_end_iterate(scan->rs_base.tbmiterator);
+	}
+
+	scan->rs_base.shared_tbmiterator = NULL;
+	scan->rs_base.tbmiterator = NULL;
+
/*
* reinitialize scan descriptor
*/

If every AM would need to implement this, perhaps this shouldn't be done here,
but in generic code?

--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2114,17 +2114,49 @@ heapam_estimate_rel_size(Relation rel, int32 *attr_widths,
static bool
heapam_scan_bitmap_next_block(TableScanDesc scan,
-							  TBMIterateResult *tbmres)
+							  bool *recheck, BlockNumber *blockno)
{
HeapScanDesc hscan = (HeapScanDesc) scan;
-	BlockNumber block = tbmres->blockno;
+	BlockNumber block;
Buffer		buffer;
Snapshot	snapshot;
int			ntup;
+	TBMIterateResult *tbmres;

hscan->rs_cindex = 0;
hscan->rs_ntuples = 0;

+	*blockno = InvalidBlockNumber;
+	*recheck = true;
+
+	do
+	{
+		if (scan->shared_tbmiterator)
+			tbmres = tbm_shared_iterate(scan->shared_tbmiterator);
+		else
+			tbmres = tbm_iterate(scan->tbmiterator);
+
+		if (tbmres == NULL)
+		{
+			/* no more entries in the bitmap */
+			Assert(hscan->empty_tuples == 0);
+			return false;
+		}
+
+		/*
+		 * Ignore any claimed entries past what we think is the end of the
+		 * relation. It may have been extended after the start of our scan (we
+		 * only hold an AccessShareLock, and it could be inserts from this
+		 * backend).  We don't take this optimization in SERIALIZABLE
+		 * isolation though, as we need to examine all invisible tuples
+		 * reachable by the index.
+		 */
+	} while (!IsolationIsSerializable() && tbmres->blockno >= hscan->rs_nblocks);

Hm. Isn't it a problem that we have no CHECK_FOR_INTERRUPTS() in this loop?
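One way to address that, modeled standalone (the macro is stubbed to count invocations; in the real code it would be the actual CHECK_FOR_INTERRUPTS()):

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in that just counts calls; the real macro services interrupts. */
static int	interrupt_checks = 0;
#define CHECK_FOR_INTERRUPTS() (interrupt_checks++)

/* Toy iterator: yields block numbers from an array, -1 when exhausted. */
static int
next_bitmap_block(const int *blocks, int nentries, int *pos)
{
	return (*pos < nentries) ? blocks[(*pos)++] : -1;
}

/*
 * The skip loop with an interrupt check on every iteration, so a bitmap
 * full of out-of-range blocks cannot starve interrupt processing.
 */
static bool
advance_to_valid_block(const int *blocks, int nentries, int *pos,
					   int rs_nblocks, int *blockno)
{
	for (;;)
	{
		int			block;

		CHECK_FOR_INTERRUPTS();

		block = next_bitmap_block(blocks, nentries, pos);
		if (block < 0)
			return false;		/* bitmap exhausted */
		if (block >= rs_nblocks)
			continue;			/* past end of relation: skip */

		*blockno = block;
		return true;
	}
}
```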

@@ -2251,7 +2274,14 @@ heapam_scan_bitmap_next_block(TableScanDesc scan,
scan->lossy_pages++;
}

-	return ntup > 0;
+	/*
+	 * Return true to indicate that a valid block was found and the bitmap is
+	 * not exhausted. If there are no visible tuples on this page,
+	 * hscan->rs_ntuples will be 0 and heapam_scan_bitmap_next_tuple() will
+	 * return false returning control to this function to advance to the next
+	 * block in the bitmap.
+	 */
+	return true;
}

Why can't we fetch the next block immediately?

@@ -201,46 +197,23 @@ BitmapHeapNext(BitmapHeapScanState *node)
can_skip_fetch);
}

-		node->tbmiterator = tbmiterator;
-		node->shared_tbmiterator = shared_tbmiterator;
+		scan->tbmiterator = tbmiterator;
+		scan->shared_tbmiterator = shared_tbmiterator;

It seems a bit odd that this code modifies the scan descriptor, instead of
passing the iterator, or perhaps better the bitmap itself, to
table_beginscan_bm()?
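A hypothetical shape for that alternative, compressed to a standalone sketch (the signature and types here are invented, not a proposal for the exact API):

```c
#include <assert.h>

/* Stubs for the real types. */
typedef struct TIDBitmap
{
	int			dummy;
} TIDBitmap;

typedef struct TableScanDescData
{
	TIDBitmap  *bitmap;
} TableScanDescData;

/*
 * Hand the bitmap to the AM at beginscan time instead of poking
 * iterators into the descriptor afterwards; the AM then decides how
 * and when to iterate it.
 */
static TableScanDescData *
table_beginscan_bm(TIDBitmap *bitmap, TableScanDescData *storage)
{
	storage->bitmap = bitmap;
	return storage;
}
```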

diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index b74e08dd745..bf7ee044268 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -16,6 +16,7 @@

#include "access/htup_details.h"
#include "access/itup.h"
+#include "nodes/tidbitmap.h"

I'd like to avoid exposing this to everything including relscan.h. I think we
could just forward declare the structs and use them here to avoid that?
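i.e., something like the following, compressed to the relevant lines (stub definitions stand in for the real tidbitmap structs so the sketch is self-contained):

```c
#include <assert.h>

/*
 * In relscan.h: forward-declare the iterator structs rather than
 * including nodes/tidbitmap.h, so every file that pulls in relscan.h
 * does not also pull in the tidbitmap definitions.
 */
struct TBMIterator;
struct TBMSharedIterator;

typedef struct TableScanDescData
{
	/* ... existing members elided ... */
	struct TBMIterator *tbmiterator;
	struct TBMSharedIterator *shared_tbmiterator;
} TableScanDescData;

/*
 * Only files that actually dereference the iterators include
 * tidbitmap.h; these stubs stand in for those real definitions here.
 */
struct TBMIterator
{
	int			dummy;
};

struct TBMSharedIterator
{
	int			dummy;
};
```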

From aac60985d6bc70bfedf77a77ee3c512da87bfcb1 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 13 Feb 2024 14:27:57 -0500
Subject: [PATCH v1 11/11] BitmapHeapScan uses streaming read API

Remove all of the code to do prefetching from BitmapHeapScan code and
rely on the streaming read API prefetching. Heap table AM implements a
streaming read callback which uses the iterator to get the next valid
block that needs to be fetched for the streaming read API.
---
src/backend/access/gin/ginget.c | 15 +-
src/backend/access/gin/ginscan.c | 7 +
src/backend/access/heap/heapam.c | 71 +++++
src/backend/access/heap/heapam_handler.c | 78 +++--
src/backend/executor/nodeBitmapHeapscan.c | 328 +---------------------
src/backend/nodes/tidbitmap.c | 80 +++---
src/include/access/heapam.h | 2 +
src/include/access/tableam.h | 14 +-
src/include/nodes/execnodes.h | 19 --
src/include/nodes/tidbitmap.h | 8 +-
10 files changed, 178 insertions(+), 444 deletions(-)

diff --git a/src/backend/access/gin/ginget.c b/src/backend/access/gin/ginget.c
index 0b4f2ebadb6..3ce28078a6f 100644
--- a/src/backend/access/gin/ginget.c
+++ b/src/backend/access/gin/ginget.c
@@ -373,7 +373,10 @@ restartScanEntry:
if (entry->matchBitmap)
{
if (entry->matchIterator)
+				{
tbm_end_iterate(entry->matchIterator);
+					pfree(entry->matchResult);
+				}
entry->matchIterator = NULL;
tbm_free(entry->matchBitmap);
entry->matchBitmap = NULL;
@@ -386,6 +389,7 @@ restartScanEntry:
if (entry->matchBitmap && !tbm_is_empty(entry->matchBitmap))
{
entry->matchIterator = tbm_begin_iterate(entry->matchBitmap);
+			entry->matchResult = palloc0(TBM_ITERATE_RESULT_SIZE);

Do we actually have to use palloc0? TBM_ITERATE_RESULT_SIZE ain't small, so
zeroing all of it isn't free.

+static BlockNumber bitmapheap_pgsr_next_single(PgStreamingRead *pgsr, void *pgsr_private,
+							void *per_buffer_data);

Is it correct to have _single in the name here? Aren't we also using for
parallel scans?

+static BlockNumber
+bitmapheap_pgsr_next_single(PgStreamingRead *pgsr, void *pgsr_private,
+							void *per_buffer_data)
+{
+	TBMIterateResult *tbmres = per_buffer_data;
+	HeapScanDesc hdesc = (HeapScanDesc) pgsr_private;
+
+	for (;;)
+	{
+		if (hdesc->rs_base.shared_tbmiterator)
+			tbm_shared_iterate(hdesc->rs_base.shared_tbmiterator, tbmres);
+		else
+			tbm_iterate(hdesc->rs_base.tbmiterator, tbmres);
+
+		/* no more entries in the bitmap */
+		if (!BlockNumberIsValid(tbmres->blockno))
+			return InvalidBlockNumber;
+
+		/*
+		 * Ignore any claimed entries past what we think is the end of the
+		 * relation. It may have been extended after the start of our scan (we
+		 * only hold an AccessShareLock, and it could be inserts from this
+		 * backend).  We don't take this optimization in SERIALIZABLE
+		 * isolation though, as we need to examine all invisible tuples
+		 * reachable by the index.
+		 */
+		if (!IsolationIsSerializable() && tbmres->blockno >= hdesc->rs_nblocks)
+			continue;
+
+
+		if (hdesc->rs_base.rs_flags & SO_CAN_SKIP_FETCH &&
+			!tbmres->recheck &&
+			VM_ALL_VISIBLE(hdesc->rs_base.rs_rd, tbmres->blockno, &hdesc->vmbuffer))
+		{
+			hdesc->empty_tuples += tbmres->ntuples;
+			continue;
+		}
+
+		return tbmres->blockno;
+	}
+
+	/* not reachable */
+	Assert(false);
+}

Need to check for interrupts somewhere here.

@@ -124,15 +119,6 @@ BitmapHeapNext(BitmapHeapScanState *node)

There's still a comment in BitmapHeapNext talking about prefetching with two
iterators etc. That seems outdated now.

/*
* tbm_iterate - scan through next page of a TIDBitmap
*
- * Returns a TBMIterateResult representing one page, or NULL if there are
- * no more pages to scan.  Pages are guaranteed to be delivered in numerical
- * order.  If result->ntuples < 0, then the bitmap is "lossy" and failed to
- * remember the exact tuples to look at on this page --- the caller must
- * examine all tuples on the page and check if they meet the intended
- * condition.  If result->recheck is true, only the indicated tuples need
- * be examined, but the condition must be rechecked anyway.  (For ease of
- * testing, recheck is always set true when ntuples < 0.)
+ * Caller must pass in a TBMIterateResult to be filled.
+ *
+ * Pages are guaranteed to be delivered in numerical order.  tbmres->blockno is
+ * set to InvalidBlockNumber when there are no more pages to scan. If
+ * tbmres->ntuples < 0, then the bitmap is "lossy" and failed to remember the
+ * exact tuples to look at on this page --- the caller must examine all tuples
+ * on the page and check if they meet the intended condition.  If
+ * tbmres->recheck is true, only the indicated tuples need be examined, but the
+ * condition must be rechecked anyway.  (For ease of testing, recheck is always
+ * set true when ntuples < 0.)
*/
-TBMIterateResult *
-tbm_iterate(TBMIterator *iterator)
+void
+tbm_iterate(TBMIterator *iterator, TBMIterateResult *tbmres)

Hm - it seems a tad odd that we later have to find out if the scan is done
iterating by checking if blockno is valid, when tbm_iterate already knew. But
I guess the code would be a bit uglier if we needed the result of
tbm_[shared_]iterate(), due to the two functions.
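
To make the shape of that API change concrete for readers skimming the
archive, here is a toy sketch in plain C (not PostgreSQL code; ToyIterator,
toy_iterate, toy_count_pages, and the InvalidBlockNumber constant are
invented stand-ins): the iterator fills a caller-supplied result and signals
exhaustion in-band through a sentinel blockno, instead of returning a pointer
that is NULL at end-of-scan.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t BlockNumber;
#define InvalidBlockNumber ((BlockNumber) 0xFFFFFFFF)

typedef struct ToyIterator
{
	BlockNumber next;			/* next block to hand out */
	BlockNumber nblocks;		/* one past the last block */
} ToyIterator;

typedef struct ToyIterateResult
{
	BlockNumber blockno;		/* set to InvalidBlockNumber when exhausted */
} ToyIterateResult;

/* New-style API: caller supplies the result; exhaustion is signalled in-band. */
static void
toy_iterate(ToyIterator *it, ToyIterateResult *res)
{
	if (it->next >= it->nblocks)
		res->blockno = InvalidBlockNumber;
	else
		res->blockno = it->next++;
}

/* Caller-side loop: it must rediscover exhaustion by re-testing blockno. */
static int
toy_count_pages(ToyIterator *it)
{
	ToyIterateResult res;
	int			count = 0;

	for (;;)
	{
		toy_iterate(it, &res);
		if (res.blockno == InvalidBlockNumber)
			break;
		count++;
	}
	return count;
}
```

The mild wart pointed out above is visible in toy_count_pages: toy_iterate
already knows the scan is done, yet the caller rediscovers that by testing
blockno.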

Right now ExecEndBitmapHeapScan() frees the tbm before it does table_endscan()
- which seems problematic, as heap_endscan() will do stuff like
tbm_end_iterate(), which imo shouldn't be called after the tbm has been freed,
even if that works today.

It seems a bit confusing that your changes seem to treat
BitmapHeapScanState->initialized as separate from ->scan, even though afaict
scan should be NULL iff initialized is false and vice versa.

Independent of your patches, but brr, it's ugly that
BitmapShouldInitializeSharedState() blocks.

Greetings,

Andres Freund

#6 Melanie Plageman
melanieplageman@gmail.com
In reply to: Andres Freund (#5)
Re: BitmapHeapScan streaming read user and prelim refactoring

Thank you so much for this thorough review!!!!

On Wed, Feb 14, 2024 at 2:42 PM Andres Freund <andres@anarazel.de> wrote:

On 2024-02-13 18:11:25 -0500, Melanie Plageman wrote:

From d6dd6eb21dcfbc41208f87d1d81ffe3960130889 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 12 Feb 2024 18:50:29 -0500
Subject: [PATCH v1 03/11] BitmapHeapScan begin scan after bitmap setup

There is no reason for table_beginscan_bm() to begin the actual scan of
the underlying table in ExecInitBitmapHeapScan(). We can begin the
underlying table scan after the index scan has been completed and the
bitmap built.

The one use of the scan descriptor during initialization was
ExecBitmapHeapInitializeWorker(), which set the scan descriptor snapshot
with one from an array in the parallel state. This overwrote the
snapshot set in table_beginscan_bm().

By saving that worker snapshot as a member in the BitmapHeapScanState
during initialization, it can be restored in table_beginscan_bm() after
returning from the table AM specific begin scan function.

I don't understand what the point of passing two different snapshots to
table_beginscan_bm() is. What does that even mean? Why can't we just use the
correct snapshot initially?

Indeed. Honestly, it was an unlabeled TODO for me. I wasn't quite sure
how to get the same behavior as in master. Fixed in attached v2.

Now the parallel worker still restores and registers that snapshot in
ExecBitmapHeapInitializeWorker() and then saves it in the
BitmapHeapScanState. We then pass SO_TEMP_SNAPSHOT as an extra flag
(to set rs_flags) to table_beginscan_bm() if there is a parallel
worker snapshot saved in the BitmapHeapScanState.

From a3f62e4299663d418531ae61bb16ea39f0836fac Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 12 Feb 2024 19:03:24 -0500
Subject: [PATCH v1 04/11] BitmapPrefetch use prefetch block recheck for skip
fetch

Previously BitmapPrefetch() used the recheck flag for the current block
to determine whether or not it could skip prefetching the proposed
prefetch block. It makes more sense for it to use the recheck flag from
the TBMIterateResult for the prefetch block instead.

I'd mention the commit that introduced the current logic and link to the
the thread that you started about this.

Done

From d56be7741765d93002649ef912ef4b8256a5b9af Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 12 Feb 2024 19:04:48 -0500
Subject: [PATCH v1 05/11] Update BitmapAdjustPrefetchIterator parameter type
to BlockNumber

BitmapAdjustPrefetchIterator() only used the blockno member of the
passed in TBMIterateResult to ensure that the prefetch iterator and
regular iterator stay in sync. Pass it the BlockNumber only. This will
allow us to move away from using the TBMIterateResult outside of table
AM specific code.

Hm - I'm not convinced this is a good direction - doesn't that arguably
*increase* TAM awareness? Perhaps it doesn't make much sense to use bitmap
heap scans in a TAM without blocks, but still.

This is removed in later commits and is an intermediate state to try
and move the TBMIterateResult out of BitmapHeapNext(). I can find
another way to achieve this if it is important.

From 202b16d3a381210e8dbee69e68a8310be8ee11d2 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 12 Feb 2024 20:15:05 -0500
Subject: [PATCH v1 06/11] Push BitmapHeapScan skip fetch optimization into
table AM

This resolves the long-standing FIXME in BitmapHeapNext() which said that
the optimization to skip fetching blocks of the underlying table when
none of the column data was needed should be pushed into the table AM
specific code.

Long-standing? Sure, it's old enough to walk, but we have FIXMEs that are old
enough to drink, at least in some countries. :)

;) I've updated the commit message. Though it is longstanding in that
it predates Melanie + Postgres.

The table AM agnostic functions for prefetching still need to know if
skipping fetching is permitted for this scan. However, this dependency
will be removed when that prefetching code is removed in favor of the
upcoming streaming read API.

---
src/backend/access/heap/heapam.c | 10 +++
src/backend/access/heap/heapam_handler.c | 29 +++++++
src/backend/executor/nodeBitmapHeapscan.c | 100 ++++++----------------
src/include/access/heapam.h | 2 +
src/include/access/tableam.h | 17 ++--
src/include/nodes/execnodes.h | 6 --
6 files changed, 74 insertions(+), 90 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 707460a5364..7aae1ecf0a9 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -955,6 +955,8 @@ heap_beginscan(Relation relation, Snapshot snapshot,
scan->rs_base.rs_flags = flags;
scan->rs_base.rs_parallel = parallel_scan;
scan->rs_strategy = NULL;       /* set in initscan */
+     scan->vmbuffer = InvalidBuffer;
+     scan->empty_tuples = 0;

These don't follow the existing naming pattern for HeapScanDescData. While I
explicitly dislike the practice of adding prefixes to struct members, I don't
think mixing conventions within a single struct improves things.

I've updated the names. What does rs even stand for?

I also think it'd be good to note in comments that the vm buffer currently is
only used for bitmap heap scans, otherwise one might think they'd also be used
for normal scans, where we don't need them, because of the page level flag.

Done.

Also, perhaps worth renaming "empty_tuples" to something indicating that it's
the number of empty tuples to be returned later? num_empty_tuples_pending or
such? Or the current "return_empty_tuples".

Done.

@@ -1043,6 +1045,10 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
if (BufferIsValid(scan->rs_cbuf))
ReleaseBuffer(scan->rs_cbuf);

+     if (BufferIsValid(scan->vmbuffer))
+             ReleaseBuffer(scan->vmbuffer);
+     scan->vmbuffer = InvalidBuffer;

It does not matter one iota here, but personally I prefer moving the write
inside the if, as dirtying the cacheline after we just figured out the buffer
is already invalid seems unnecessary.
I've now followed this convention throughout my patchset in the places
where I noticed it.

diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 9372b49bfaa..c0fb06c9688 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -108,6 +108,7 @@ BitmapHeapNext(BitmapHeapScanState *node)
*/
if (!node->initialized)
{
+             bool can_skip_fetch;
/*
* We can potentially skip fetching heap pages if we do not need any
* columns of the table, either for checking non-indexable quals or

Pretty sure pgindent will move this around.

This is gone now, but I have pgindented all the commits so it
shouldn't be a problem again.

+++ b/src/include/access/tableam.h
@@ -62,6 +62,7 @@ typedef enum ScanOptions

/* unregister snapshot at scan end? */
SO_TEMP_SNAPSHOT = 1 << 9,
+ SO_CAN_SKIP_FETCH = 1 << 10,
} ScanOptions;

Would be nice to add a comment explaining what this flag means.

Done.

From 500c84019b982a1e6c8b8dd40240c8510d83c287 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 13 Feb 2024 10:05:04 -0500
Subject: [PATCH v1 07/11] BitmapHeapScan scan desc counts lossy and exact
pages

Future commits will remove the TBMIterateResult from BitmapHeapNext(),
pushing it into the table AM-specific code. So we will have to keep
track of the number of lossy and exact pages in the scan descriptor.
Doing this change to lossy/exact page counting in a separate commit just
simplifies the diff.

---
src/backend/access/heap/heapam.c | 2 ++
src/backend/access/heap/heapam_handler.c | 9 +++++++++
src/backend/executor/nodeBitmapHeapscan.c | 18 +++++++++++++-----
src/include/access/relscan.h | 4 ++++
4 files changed, 28 insertions(+), 5 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 7aae1ecf0a9..88b4aad5820 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -957,6 +957,8 @@ heap_beginscan(Relation relation, Snapshot snapshot,
scan->rs_strategy = NULL;       /* set in initscan */
scan->vmbuffer = InvalidBuffer;
scan->empty_tuples = 0;
+     scan->rs_base.lossy_pages = 0;
+     scan->rs_base.exact_pages = 0;
/*
* Disable page-at-a-time mode if it's not a MVCC-safe snapshot.
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index baba09c87c0..6e85ef7a946 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2242,6 +2242,15 @@ heapam_scan_bitmap_next_block(TableScanDesc scan,
Assert(ntup <= MaxHeapTuplesPerPage);
hscan->rs_ntuples = ntup;
+     /* Only count exact and lossy pages with visible tuples */
+     if (ntup > 0)
+     {
+             if (tbmres->ntuples >= 0)
+                     scan->exact_pages++;
+             else
+                     scan->lossy_pages++;
+     }
+
return ntup > 0;
}
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index c0fb06c9688..19d115de06f 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -53,6 +53,8 @@
#include "utils/spccache.h"
static TupleTableSlot *BitmapHeapNext(BitmapHeapScanState *node);
+static inline void BitmapAccumCounters(BitmapHeapScanState *node,
+                                                                        TableScanDesc scan);
static inline void BitmapDoneInitializingSharedState(ParallelBitmapHeapState *pstate);
static inline void BitmapAdjustPrefetchIterator(BitmapHeapScanState *node,
BlockNumber blockno);
@@ -234,11 +236,6 @@ BitmapHeapNext(BitmapHeapScanState *node)
continue;
}
-                     if (tbmres->ntuples >= 0)
-                             node->exact_pages++;
-                     else
-                             node->lossy_pages++;
-
/* Adjust the prefetch target */
BitmapAdjustPrefetchTarget(node);
}
@@ -315,9 +312,20 @@ BitmapHeapNext(BitmapHeapScanState *node)
/*
* if we get here it means we are at the end of the scan..
*/
+     BitmapAccumCounters(node, scan);
return ExecClearTuple(slot);
}
+static inline void
+BitmapAccumCounters(BitmapHeapScanState *node,
+                                     TableScanDesc scan)
+{
+     node->exact_pages += scan->exact_pages;
+     scan->exact_pages = 0;
+     node->lossy_pages += scan->lossy_pages;
+     scan->lossy_pages = 0;
+}
+

I don't think this is quite right - you're calling BitmapAccumCounters() only
when the scan doesn't return anything anymore, but there's no guarantee
that'll ever be reached. E.g. a bitmap heap scan below a limit node. I think
this needs to be in a) ExecEndBitmapHeapScan() b) ExecReScanBitmapHeapScan()

The scan descriptor isn't available in ExecEnd/ReScanBitmapHeapScan().
So, if we count in the scan descriptor we can't accumulate into the
BitmapHeapScanState there. The reason to count in the scan descriptor
is that it is in the table AM where we know if we have a lossy or
exact page -- and we only have the scan descriptor not the
BitmapHeapScanState in the table AM.

I added a call to BitmapAccumCounters before the tuple is returned for
correctness in this version (not ideal, I realize). See below for
thoughts about what we could do instead.

/*
*   BitmapDoneInitializingSharedState - Shared state is initialized
*
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index 521043304ab..b74e08dd745 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -40,6 +40,10 @@ typedef struct TableScanDescData
ItemPointerData rs_mintid;
ItemPointerData rs_maxtid;
+     /* Only used for Bitmap table scans */
+     long            exact_pages;
+     long            lossy_pages;
+
/*
* Information about type and behaviour of the scan, a bitmask of members
* of the ScanOptions enum (see tableam.h).

I wonder if this really is the best place for the data to be accumulated. This
requires the accounting to be implemented in each AM, which doesn't obviously
seem required. Why can't the accounting continue to live in
nodeBitmapHeapscan.c, to be done after each table_scan_bitmap_next_block()
call?

Yes, I would really prefer not to do it in the table AM. But, we only
count exact and lossy pages for which at least one or more tuples were
visible (change this and you'll see tests fail). So, we need to decide
if we are going to increment the counters somewhere where we have
access to that information. In the case of heap, that is really only
once I have the value of ntup in heapam_scan_bitmap_next_block(). To
get that information back out to BitmapHeapNext(), I considered adding
another parameter to heapam_scan_bitmap_next_block() -- maybe an enum
like this:

/*
* BitmapHeapScans's bitmaps can choose to store per page information in a
* lossy or exact way. Exact pages in the bitmap have the individual tuple
* offsets that need to be visited while lossy pages in the bitmap have only the
* block number of the page.
*/
typedef enum BitmapBlockResolution
{
BITMAP_BLOCK_NO_VISIBLE,
BITMAP_BLOCK_LOSSY,
BITMAP_BLOCK_EXACT,
} BitmapBlockResolution;

which we then use to increment the counter. But while I was writing
this code, I found myself narrating in the comment that the reason
this had to be set inside of the table AM is that only the table AM
knows if it wants to count the block as lossy, exact, or not count it.
So, that made me question if it really should be in the
BitmapHeapScanState.
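
In toy form, the alternative sketched above might look like this (only the
BitmapBlockResolution enum comes from the mail; toy_classify_block and
toy_count are invented stand-ins for the AM side and the executor side):

```c
#include <assert.h>

typedef enum BitmapBlockResolution
{
	BITMAP_BLOCK_NO_VISIBLE,
	BITMAP_BLOCK_LOSSY,
	BITMAP_BLOCK_EXACT,
} BitmapBlockResolution;

/* Toy stand-in for the table AM side: only it knows ntup and whether the
 * bitmap page was lossy (bitmap_ntuples < 0, as in TBMIterateResult). */
static BitmapBlockResolution
toy_classify_block(int ntup, int bitmap_ntuples)
{
	if (ntup == 0)
		return BITMAP_BLOCK_NO_VISIBLE;
	return bitmap_ntuples >= 0 ? BITMAP_BLOCK_EXACT : BITMAP_BLOCK_LOSSY;
}

/* Executor-side counting, which this design would keep out of the AM:
 * pages with no visible tuples are counted nowhere, matching the rule
 * that only pages with at least one visible tuple are counted. */
static void
toy_count(BitmapBlockResolution r, long *exact_pages, long *lossy_pages)
{
	if (r == BITMAP_BLOCK_EXACT)
		(*exact_pages)++;
	else if (r == BITMAP_BLOCK_LOSSY)
		(*lossy_pages)++;
}
```

The split shows the trade-off being debated: the classification must happen
where ntup is known, even if the counters themselves live in the executor.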

I also explored passing the table scan descriptor to
show_tidbitmap_info() -- but that had its own problems.

From 555743e4bc885609d20768f7f2990c6ba69b13a9 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 13 Feb 2024 10:57:07 -0500
Subject: [PATCH v1 09/11] Make table_scan_bitmap_next_block() async friendly

table_scan_bitmap_next_block() previously returned false if we did not
wish to call table_scan_bitmap_next_tuple() on the tuples on the page.
This could happen when there were no visible tuples on the page or, due
to concurrent activity on the table, the block returned by the iterator
is past the known end of the table.

This sounds a bit like the block is actually past the end of the table,
but in reality this happens if the block is past the end of the table as it
was when the scan was started. Somehow that feels significant, but I don't
really know why I think that.

I have tried to update the commit message to make it clearer. I was
actually wondering: now that we do table_beginscan_bm() in
BitmapHeapNext() instead of ExecInitBitmapHeapScan(), have we reduced
or eliminated the opportunity for this to be true? initscan() sets
rs_nblocks and that now happens much later.

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 88b4aad5820..d8569373987 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -959,6 +959,8 @@ heap_beginscan(Relation relation, Snapshot snapshot,
scan->empty_tuples = 0;
scan->rs_base.lossy_pages = 0;
scan->rs_base.exact_pages = 0;
+     scan->rs_base.shared_tbmiterator = NULL;
+     scan->rs_base.tbmiterator = NULL;

/*
* Disable page-at-a-time mode if it's not a MVCC-safe snapshot.
@@ -1051,6 +1053,18 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
ReleaseBuffer(scan->vmbuffer);
scan->vmbuffer = InvalidBuffer;

+     if (scan->rs_base.rs_flags & SO_TYPE_BITMAPSCAN)
+     {
+             if (scan->rs_base.shared_tbmiterator)
+                     tbm_end_shared_iterate(scan->rs_base.shared_tbmiterator);
+
+             if (scan->rs_base.tbmiterator)
+                     tbm_end_iterate(scan->rs_base.tbmiterator);
+     }
+
+     scan->rs_base.shared_tbmiterator = NULL;
+     scan->rs_base.tbmiterator = NULL;
+
/*
* reinitialize scan descriptor
*/

If every AM would need to implement this, perhaps this shouldn't be done here,
but in generic code?

Fixed.

--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2114,17 +2114,49 @@ heapam_estimate_rel_size(Relation rel, int32 *attr_widths,
static bool
heapam_scan_bitmap_next_block(TableScanDesc scan,
-                                                       TBMIterateResult *tbmres)
+                                                       bool *recheck, BlockNumber *blockno)
{
HeapScanDesc hscan = (HeapScanDesc) scan;
-     BlockNumber block = tbmres->blockno;
+     BlockNumber block;
Buffer          buffer;
Snapshot        snapshot;
int                     ntup;
+     TBMIterateResult *tbmres;

hscan->rs_cindex = 0;
hscan->rs_ntuples = 0;

+     *blockno = InvalidBlockNumber;
+     *recheck = true;
+
+     do
+     {
+             if (scan->shared_tbmiterator)
+                     tbmres = tbm_shared_iterate(scan->shared_tbmiterator);
+             else
+                     tbmres = tbm_iterate(scan->tbmiterator);
+
+             if (tbmres == NULL)
+             {
+                     /* no more entries in the bitmap */
+                     Assert(hscan->empty_tuples == 0);
+                     return false;
+             }
+
+             /*
+              * Ignore any claimed entries past what we think is the end of the
+              * relation. It may have been extended after the start of our scan (we
+              * only hold an AccessShareLock, and it could be inserts from this
+              * backend).  We don't take this optimization in SERIALIZABLE
+              * isolation though, as we need to examine all invisible tuples
+              * reachable by the index.
+              */
+     } while (!IsolationIsSerializable() && tbmres->blockno >= hscan->rs_nblocks);

Hm. Isn't it a problem that we have no CHECK_FOR_INTERRUPTS() in this loop?

Yes. fixed.

@@ -2251,7 +2274,14 @@ heapam_scan_bitmap_next_block(TableScanDesc scan,
scan->lossy_pages++;
}

-     return ntup > 0;
+     /*
+      * Return true to indicate that a valid block was found and the bitmap is
+      * not exhausted. If there are no visible tuples on this page,
+      * hscan->rs_ntuples will be 0 and heapam_scan_bitmap_next_tuple() will
+      * return false returning control to this function to advance to the next
+      * block in the bitmap.
+      */
+     return true;
}

Why can't we fetch the next block immediately?

We don't know that we want another block until we've gone through this
page and seen there were no visible tuples, so we'd somehow have to
jump back up to the top of the function to get the next block -- which
is basically what is happening in my revised control flow. We call
heapam_scan_bitmap_next_tuple() and rs_ntuples is 0, so we end up
calling heapam_scan_bitmap_next_block() right away.
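
As a rough illustration of that control flow, here is a toy sketch in plain C
(ToyScan and the toy_* functions are invented, not the actual table AM
callbacks): next_tuple returns false immediately on an empty page, so the
driver cycles straight back to next_block.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NBLOCKS 3

typedef struct ToyScan
{
	const int  *ntuples;		/* visible-tuple count per block; 0 = empty */
	int			cur_block;		/* -1 before the first next_block call */
	int			cur_tuple;		/* tuples already returned from cur block */
} ToyScan;

/* Returns true while the bitmap has blocks left, even ones with no tuples. */
static bool
toy_next_block(ToyScan *scan)
{
	if (scan->cur_block + 1 >= TOY_NBLOCKS)
		return false;
	scan->cur_block++;
	scan->cur_tuple = 0;
	return true;
}

/* Returns false once the current block is exhausted (immediately if empty). */
static bool
toy_next_tuple(ToyScan *scan)
{
	if (scan->cur_tuple >= scan->ntuples[scan->cur_block])
		return false;
	scan->cur_tuple++;
	return true;
}

/* Driver shaped like the revised BitmapHeapNext(): an empty page makes the
 * inner loop exit at once, so control returns to toy_next_block right away. */
static int
toy_scan_all(ToyScan *scan)
{
	int			tuples = 0;

	while (toy_next_block(scan))
		while (toy_next_tuple(scan))
			tuples++;
	return tuples;
}
```

With block counts {2, 0, 3}, the middle page yields no tuples, so the inner
loop falls through immediately and the driver advances to the next block.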

@@ -201,46 +197,23 @@ BitmapHeapNext(BitmapHeapScanState *node)
can_skip_fetch);
}

-             node->tbmiterator = tbmiterator;
-             node->shared_tbmiterator = shared_tbmiterator;
+             scan->tbmiterator = tbmiterator;
+             scan->shared_tbmiterator = shared_tbmiterator;

It seems a bit odd that this code modifies the scan descriptor, instead of
passing the iterator, or perhaps better the bitmap itself, to
table_beginscan_bm()?

On rescan we actually will have initialized = false and make new
iterators but have the old scan descriptor. So, we need to be able to
set the iterator in the scan to the new iterator.

diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index b74e08dd745..bf7ee044268 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -16,6 +16,7 @@

#include "access/htup_details.h"
#include "access/itup.h"
+#include "nodes/tidbitmap.h"

I'd like to avoid exposing this to everything including relscan.h. I think we
could just forward declare the structs and use them here to avoid that?

Done

From aac60985d6bc70bfedf77a77ee3c512da87bfcb1 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 13 Feb 2024 14:27:57 -0500
Subject: [PATCH v1 11/11] BitmapHeapScan uses streaming read API

Remove all of the code to do prefetching from BitmapHeapScan code and
rely on the streaming read API prefetching. Heap table AM implements a
streaming read callback which uses the iterator to get the next valid
block that needs to be fetched for the streaming read API.
---
src/backend/access/gin/ginget.c | 15 +-
src/backend/access/gin/ginscan.c | 7 +
src/backend/access/heap/heapam.c | 71 +++++
src/backend/access/heap/heapam_handler.c | 78 +++--
src/backend/executor/nodeBitmapHeapscan.c | 328 +---------------------
src/backend/nodes/tidbitmap.c | 80 +++---
src/include/access/heapam.h | 2 +
src/include/access/tableam.h | 14 +-
src/include/nodes/execnodes.h | 19 --
src/include/nodes/tidbitmap.h | 8 +-
10 files changed, 178 insertions(+), 444 deletions(-)

diff --git a/src/backend/access/gin/ginget.c b/src/backend/access/gin/ginget.c
index 0b4f2ebadb6..3ce28078a6f 100644
--- a/src/backend/access/gin/ginget.c
+++ b/src/backend/access/gin/ginget.c
@@ -373,7 +373,10 @@ restartScanEntry:
if (entry->matchBitmap)
{
if (entry->matchIterator)
+                             {
tbm_end_iterate(entry->matchIterator);
+                                     pfree(entry->matchResult);
+                             }
entry->matchIterator = NULL;
tbm_free(entry->matchBitmap);
entry->matchBitmap = NULL;
@@ -386,6 +389,7 @@ restartScanEntry:
if (entry->matchBitmap && !tbm_is_empty(entry->matchBitmap))
{
entry->matchIterator = tbm_begin_iterate(entry->matchBitmap);
+                     entry->matchResult = palloc0(TBM_ITERATE_RESULT_SIZE);

Do we actually have to use palloc0? TBM_ITERATE_RESULT_SIZE ain't small, so
zeroing all of it isn't free.

Tests actually did fail when I didn't use palloc0.

This code is different now though. There are a few new patches in v2
that 1) make the offsets array in the TBMIterateResult fixed size and
then this makes it possible to 2) make matchResult an inline member of
the GinScanEntry. I have a TODO in the code asking if setting blockno
in the TBMIterateResult to InvalidBlockNumber is sufficient
"resetting".

+static BlockNumber bitmapheap_pgsr_next_single(PgStreamingRead *pgsr, void *pgsr_private,
+                                                     void *per_buffer_data);

Is it correct to have _single in the name here? Aren't we also using it for
parallel scans?

Right. I had a separate parallel version and then deleted it. This is now fixed.

+static BlockNumber
+bitmapheap_pgsr_next_single(PgStreamingRead *pgsr, void *pgsr_private,
+                                                     void *per_buffer_data)
+{
+     TBMIterateResult *tbmres = per_buffer_data;
+     HeapScanDesc hdesc = (HeapScanDesc) pgsr_private;
+
+     for (;;)
+     {
+             if (hdesc->rs_base.shared_tbmiterator)
+                     tbm_shared_iterate(hdesc->rs_base.shared_tbmiterator, tbmres);
+             else
+                     tbm_iterate(hdesc->rs_base.tbmiterator, tbmres);
+
+             /* no more entries in the bitmap */
+             if (!BlockNumberIsValid(tbmres->blockno))
+                     return InvalidBlockNumber;
+
+             /*
+              * Ignore any claimed entries past what we think is the end of the
+              * relation. It may have been extended after the start of our scan (we
+              * only hold an AccessShareLock, and it could be inserts from this
+              * backend).  We don't take this optimization in SERIALIZABLE
+              * isolation though, as we need to examine all invisible tuples
+              * reachable by the index.
+              */
+             if (!IsolationIsSerializable() && tbmres->blockno >= hdesc->rs_nblocks)
+                     continue;
+
+
+             if (hdesc->rs_base.rs_flags & SO_CAN_SKIP_FETCH &&
+                     !tbmres->recheck &&
+                     VM_ALL_VISIBLE(hdesc->rs_base.rs_rd, tbmres->blockno, &hdesc->vmbuffer))
+             {
+                     hdesc->empty_tuples += tbmres->ntuples;
+                     continue;
+             }
+
+             return tbmres->blockno;
+     }
+
+     /* not reachable */
+     Assert(false);
+}

Need to check for interrupts somewhere here.

Done.

@@ -124,15 +119,6 @@ BitmapHeapNext(BitmapHeapScanState *node)

There's still a comment in BitmapHeapNext talking about prefetching with two
iterators etc. That seems outdated now.

Fixed.

/*
* tbm_iterate - scan through next page of a TIDBitmap
*
- * Returns a TBMIterateResult representing one page, or NULL if there are
- * no more pages to scan.  Pages are guaranteed to be delivered in numerical
- * order.  If result->ntuples < 0, then the bitmap is "lossy" and failed to
- * remember the exact tuples to look at on this page --- the caller must
- * examine all tuples on the page and check if they meet the intended
- * condition.  If result->recheck is true, only the indicated tuples need
- * be examined, but the condition must be rechecked anyway.  (For ease of
- * testing, recheck is always set true when ntuples < 0.)
+ * Caller must pass in a TBMIterateResult to be filled.
+ *
+ * Pages are guaranteed to be delivered in numerical order.  tbmres->blockno is
+ * set to InvalidBlockNumber when there are no more pages to scan. If
+ * tbmres->ntuples < 0, then the bitmap is "lossy" and failed to remember the
+ * exact tuples to look at on this page --- the caller must examine all tuples
+ * on the page and check if they meet the intended condition.  If
+ * tbmres->recheck is true, only the indicated tuples need be examined, but the
+ * condition must be rechecked anyway.  (For ease of testing, recheck is always
+ * set true when ntuples < 0.)
*/
-TBMIterateResult *
-tbm_iterate(TBMIterator *iterator)
+void
+tbm_iterate(TBMIterator *iterator, TBMIterateResult *tbmres)

Hm - it seems a tad odd that we later have to find out if the scan is done
iterating by checking if blockno is valid, when tbm_iterate already knew. But
I guess the code would be a bit uglier if we needed the result of
tbm_[shared_]iterate(), due to the two functions.

Yes.

Right now ExecEndBitmapHeapScan() frees the tbm before it does table_endscan()
- which seems problematic, as heap_endscan() will do stuff like
tbm_end_iterate(), which imo shouldn't be called after the tbm has been freed,
even if that works today.

I've flipped the order -- I end the scan then free the bitmap.

It seems a bit confusing that your changes seem to treat
BitmapHeapScanState->initialized as separate from ->scan, even though afaict
scan should be NULL iff initialized is false and vice versa.

I thought so too, but it seems on rescan that the node->initialized is
set to false but the scan is reused. So, we want to only make a new
scan descriptor if it is truly the beginning of a new scan.
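
A minimal model of that rescan behavior, with invented names (ToyNodeState,
toy_first_next, toy_rescan) rather than the real executor structures: the
initialized flag goes back to false on rescan, but the descriptor allocated
on the first pass survives and only its iterator is replaced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct ToyScanDesc
{
	int			iterator_generation;	/* stands in for the tbm iterator */
} ToyScanDesc;

typedef struct ToyNodeState
{
	bool		initialized;
	ToyScanDesc *scan;			/* NULL only before the very first scan */
} ToyNodeState;

/* First call allocates the descriptor; later passes reuse it and only
 * install a fresh "iterator", mirroring the behavior described above. */
static void
toy_first_next(ToyNodeState *node, int *generation)
{
	if (!node->initialized)
	{
		if (node->scan == NULL)
			node->scan = calloc(1, sizeof(ToyScanDesc));
		node->scan->iterator_generation = ++(*generation);
		node->initialized = true;
	}
}

static void
toy_rescan(ToyNodeState *node)
{
	/* Descriptor survives; only the iterator will be rebuilt. */
	node->initialized = false;
}
```

The interesting state is right after toy_rescan: initialized is false while
scan is still non-NULL, which is why the two cannot be treated as equivalent.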

- Melanie

Attachments:

v2-0001-Remove-table_scan_bitmap_next_tuple-parameter-tbm.patch (+1 -10)
v2-0002-BitmapHeapScan-set-can_skip_fetch-later.patch (+11 -11)
v2-0003-BitmapHeapScan-begin-scan-after-bitmap-setup.patch (+38 -29)
v2-0004-BitmapPrefetch-use-prefetch-block-recheck-for-ski.patch (+2 -3)
v2-0005-Update-BitmapAdjustPrefetchIterator-parameter-typ.patch (+4 -5)
v2-0006-Push-BitmapHeapScan-skip-fetch-optimization-into-.patch (+95 -98)
v2-0007-BitmapHeapScan-scan-desc-counts-lossy-and-exact-p.patch (+32 -7)
v2-0008-Reduce-scope-of-BitmapHeapScan-tbmiterator-local-.patch (+9 -12)
v2-0009-Make-table_scan_bitmap_next_block-async-friendly.patch (+145 -125)
v2-0010-Hard-code-TBMIterateResult-offsets-array-size.patch (+16 -24)
v2-0011-Separate-TBM-Shared-Iterator-and-TBMIterateResult.patch (+107 -88)
v2-0012-Streaming-Read-API.patch (+986 -239)
v2-0013-BitmapHeapScan-uses-streaming-read-API.patch (+117 -420)
#7 Melanie Plageman
melanieplageman@gmail.com
In reply to: Melanie Plageman (#6)
Re: BitmapHeapScan streaming read user and prelim refactoring

In the attached v3, I've reordered the commits, updated some errant
comments, and improved the commit messages.

I've also made some updates to the TIDBitmap API that seem like a
clarity improvement to the API in general. These also reduce the diff
for GIN when separating the TBMIterateResult from the
TBM[Shared]Iterator. And these TIDBitmap API changes are now all in
their own commits (previously those were in the same commit as adding
the BitmapHeapScan streaming read user).

The three outstanding issues I see in the patch set are:
1) the lossy and exact page counters issue described in my previous
email
2) the TODO in the TIDBitmap API changes about being sure that setting
TBMIterateResult->blockno to InvalidBlockNumber is sufficient for
indicating an invalid TBMIterateResult (and an exhausted bitmap)
3) the streaming read API is not committed yet, so the last two patches
are not "done"

- Melanie

Attachments:

v3-0001-BitmapHeapScan-begin-scan-after-bitmap-creation.patch (+42/-29)
v3-0002-BitmapHeapScan-set-can_skip_fetch-later.patch (+11/-11)
v3-0003-Push-BitmapHeapScan-skip-fetch-optimization-into-.patch (+94/-93)
v3-0004-BitmapPrefetch-use-prefetch-block-recheck-for-ski.patch (+2/-9)
v3-0005-Update-BitmapAdjustPrefetchIterator-parameter-typ.patch (+4/-5)
v3-0006-BitmapHeapScan-scan-desc-counts-lossy-and-exact-p.patch (+32/-7)
v3-0007-Reduce-scope-of-BitmapHeapScan-tbmiterator-local-.patch (+9/-12)
v3-0008-Remove-table_scan_bitmap_next_tuple-parameter-tbm.patch (+2/-14)
v3-0009-Make-table_scan_bitmap_next_block-async-friendly.patch (+150/-129)
v3-0010-Hard-code-TBMIterateResult-offsets-array-size.patch (+17/-25)
v3-0011-Separate-TBM-Shared-Iterator-and-TBMIterateResult.patch (+107/-88)
v3-0012-Streaming-Read-API.patch (+986/-239)
v3-0013-BitmapHeapScan-uses-streaming-read-API.patch (+117/-422)
#8Melanie Plageman
melanieplageman@gmail.com
In reply to: Melanie Plageman (#7)
Re: BitmapHeapScan streaming read user and prelim refactoring

On Fri, Feb 16, 2024 at 12:35:59PM -0500, Melanie Plageman wrote:

In the attached v3, I've reordered the commits, updated some errant
comments, and improved the commit messages.

I've also made some updates to the TIDBitmap API that seem like a
clarity improvement to the API in general. These also reduce the diff
for GIN when separating the TBMIterateResult from the
TBM[Shared]Iterator. And these TIDBitmap API changes are now all in
their own commits (previously those were in the same commit as adding
the BitmapHeapScan streaming read user).

The three outstanding issues I see in the patch set are:
1) the lossy and exact page counters issue described in my previous
email

I've resolved this. I added a new patch to the set which starts counting
even pages with no visible tuples toward lossy and exact pages. After an
off-list conversation with Andres, it seems that this omission in master
may not have been intentional.

Once we have only two types of pages to differentiate between (lossy and
exact [no longer have to care about "has no visible tuples"]), it is
easy enough to pass a "lossy" boolean parameter to
table_scan_bitmap_next_block(). I've done this in the attached v4.

- Melanie

Attachments:

v4-0001-BitmapHeapScan-begin-scan-after-bitmap-creation.patch (+42/-29)
v4-0002-BitmapHeapScan-set-can_skip_fetch-later.patch (+11/-11)
v4-0003-Push-BitmapHeapScan-skip-fetch-optimization-into-.patch (+94/-93)
v4-0004-BitmapPrefetch-use-prefetch-block-recheck-for-ski.patch (+2/-9)
v4-0005-Update-BitmapAdjustPrefetchIterator-parameter-typ.patch (+4/-5)
v4-0006-EXPLAIN-Bitmap-table-scan-also-count-no-visible-t.patch (+13/-7)
v4-0007-table_scan_bitmap_next_block-returns-lossy-or-exa.patch (+19/-11)
v4-0008-Reduce-scope-of-BitmapHeapScan-tbmiterator-local-.patch (+9/-12)
v4-0009-Remove-table_scan_bitmap_next_tuple-parameter-tbm.patch (+2/-14)
v4-0010-Make-table_scan_bitmap_next_block-async-friendly.patch (+168/-143)
v4-0011-Hard-code-TBMIterateResult-offsets-array-size.patch (+17/-25)
v4-0012-Separate-TBM-Shared-Iterator-and-TBMIterateResult.patch (+107/-88)
v4-0013-Streaming-Read-API.patch (+986/-239)
v4-0014-BitmapHeapScan-uses-streaming-read-API.patch (+117/-421)
#9Melanie Plageman
melanieplageman@gmail.com
In reply to: Melanie Plageman (#8)
Re: BitmapHeapScan streaming read user and prelim refactoring

On Mon, Feb 26, 2024 at 08:50:28PM -0500, Melanie Plageman wrote:

On Fri, Feb 16, 2024 at 12:35:59PM -0500, Melanie Plageman wrote:

In the attached v3, I've reordered the commits, updated some errant
comments, and improved the commit messages.

I've also made some updates to the TIDBitmap API that seem like a
clarity improvement to the API in general. These also reduce the diff
for GIN when separating the TBMIterateResult from the
TBM[Shared]Iterator. And these TIDBitmap API changes are now all in
their own commits (previously those were in the same commit as adding
the BitmapHeapScan streaming read user).

The three outstanding issues I see in the patch set are:
1) the lossy and exact page counters issue described in my previous
email

I've resolved this. I added a new patch to the set which starts counting
even pages with no visible tuples toward lossy and exact pages. After an
off-list conversation with Andres, it seems that this omission in master
may not have been intentional.

Once we have only two types of pages to differentiate between (lossy and
exact [no longer have to care about "has no visible tuples"]), it is
easy enough to pass a "lossy" boolean parameter to
table_scan_bitmap_next_block(). I've done this in the attached v4.

Thomas posted a new version of the Streaming Read API [1], so here is a
rebased v5. This should make it easier to review as it can be applied on
top of master.

- Melanie

[1]: /messages/by-id/CA+hUKGJtLyxcAEvLhVUhgD4fMQkOu3PDaj8Qb9SR_UsmzgsBpQ@mail.gmail.com

Attachments:

v5-0001-BitmapHeapScan-begin-scan-after-bitmap-creation.patch (+42/-29)
v5-0002-BitmapHeapScan-set-can_skip_fetch-later.patch (+11/-11)
v5-0003-Push-BitmapHeapScan-skip-fetch-optimization-into-.patch (+94/-93)
v5-0004-BitmapPrefetch-use-prefetch-block-recheck-for-ski.patch (+2/-9)
v5-0005-Update-BitmapAdjustPrefetchIterator-parameter-typ.patch (+4/-5)
v5-0006-EXPLAIN-Bitmap-table-scan-also-count-no-visible-t.patch (+13/-7)
v5-0007-table_scan_bitmap_next_block-returns-lossy-or-exa.patch (+19/-11)
v5-0008-Reduce-scope-of-BitmapHeapScan-tbmiterator-local-.patch (+9/-12)
v5-0009-Remove-table_scan_bitmap_next_tuple-parameter-tbm.patch (+2/-14)
v5-0010-Make-table_scan_bitmap_next_block-async-friendly.patch (+168/-143)
v5-0011-Hard-code-TBMIterateResult-offsets-array-size.patch (+17/-25)
v5-0012-Separate-TBM-Shared-Iterator-and-TBMIterateResult.patch (+107/-88)
v5-0013-Streaming-Read-API.patch (+1218/-212)
v5-0014-BitmapHeapScan-uses-streaming-read-API.patch (+117/-421)
#10Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Melanie Plageman (#9)
Re: BitmapHeapScan streaming read user and prelim refactoring

Hi,

I haven't looked at the code very closely yet, but I decided to do some
basic benchmarks to see if/how this refactoring affects behavior.

Attached is a simple .sh script that

1) creates a table with one of a couple basic data distributions
(uniform, linear, ...), with an index on top

2) runs a simple query with a where condition matching a known fraction
of the table (0 - 100%), and measures duration

3) the query is forced to use bitmapscan by disabling other options

4) there's a couple parameters the script varies (work_mem, parallel
workers, ...), the script drops caches etc.

5) I only have results for a table with 1M rows, which is ~320MB, so not
huge. I'm running this for a larger data set, but that will take time.

I did this on my two "usual" machines - i5 and xeon. Both have flash
storage, although i5 is SATA and xeon has NVMe. I won't share the raw
results, because the CSV is like 5MB - ping me off-list if you need the
file, ofc.

Attached is PDF summarizing the results as a pivot table, with results
for "master" and "patched" builds. The interesting bit is the last
column, which shows whether the patch makes it faster (green) or slower
(red).

The results seem pretty mixed, on both machines. If you focus on the
uncached results (pages 4 and 8-9), there's both runs that are much
faster (by a factor of 2-5x) and slower (similar factor).

Of course, these results are with forced bitmap scans, so the question
is if those regressions even matter - maybe we'd use a different scan
type, making these changes less severe. So I logged "optimal plan" for
each run, tracking the scan type the optimizer would really pick without
all the enable_* GUCs. And the -optimal.pdf shows only results for the
runs where the optimal plan uses the bitmap scan. And yes, while the
impact of the changes (in either direction) is reduced, it's still very
much there.

What's a bit surprising to me is that these regressions affect runs with
effective_io_concurrency=0 in particular, which traditionally meant not
doing any prefetching / async stuff. I've perceived the patch mostly as
refactoring, so have not really expected such massive impact on these cases.

So I wonder if the refactoring means that we're actually doing some
amount of prefetching even with e_i_c=0. I'm not sure that'd be great, I
assume people have valid reasons to disable prefetching ...

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

bitmapscan-results.pdf
bitmapscan-results-optimal.pdf
bitmapscan-test.sh
#11Melanie Plageman
melanieplageman@gmail.com
In reply to: Tomas Vondra (#10)
Re: BitmapHeapScan streaming read user and prelim refactoring

On Wed, Feb 28, 2024 at 8:22 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

Hi,

I haven't looked at the code very closely yet, but I decided to do some
basic benchmarks to see if/how this refactoring affects behavior.

Attached is a simple .sh script that

1) creates a table with one of a couple basic data distributions
(uniform, linear, ...), with an index on top

2) runs a simple query with a where condition matching a known fraction
of the table (0 - 100%), and measures duration

3) the query is forced to use bitmapscan by disabling other options

4) there's a couple parameters the script varies (work_mem, parallel
workers, ...), the script drops caches etc.

5) I only have results for table with 1M rows, which is ~320MB, so not
huge. I'm running this for larger data set, but that will take time.

I did this on my two "usual" machines - i5 and xeon. Both have flash
storage, although i5 is SATA and xeon has NVMe. I won't share the raw
results, because the CSV is like 5MB - ping me off-list if you need the
file, ofc.

I haven't looked at your results in detail yet. I plan to dig into
this more later today. But, I was wondering if it was easy for you to
run the shorter tests on just the commits before the last
https://github.com/melanieplageman/postgres/tree/bhs_pgsr
i.e. patches 0001-0013. Patch 0014 implements the streaming read user
and removes all of the existing prefetch code. I would be interested
to know if the behavior with just the preliminary refactoring differs
at all.

- Melanie

#12Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Melanie Plageman (#11)
Re: BitmapHeapScan streaming read user and prelim refactoring

On 2/28/24 15:38, Melanie Plageman wrote:

On Wed, Feb 28, 2024 at 8:22 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

Hi,

I haven't looked at the code very closely yet, but I decided to do some
basic benchmarks to see if/how this refactoring affects behavior.

Attached is a simple .sh script that

1) creates a table with one of a couple basic data distributions
(uniform, linear, ...), with an index on top

2) runs a simple query with a where condition matching a known fraction
of the table (0 - 100%), and measures duration

3) the query is forced to use bitmapscan by disabling other options

4) there's a couple parameters the script varies (work_mem, parallel
workers, ...), the script drops caches etc.

5) I only have results for table with 1M rows, which is ~320MB, so not
huge. I'm running this for larger data set, but that will take time.

I did this on my two "usual" machines - i5 and xeon. Both have flash
storage, although i5 is SATA and xeon has NVMe. I won't share the raw
results, because the CSV is like 5MB - ping me off-list if you need the
file, ofc.

I haven't looked at your results in detail yet. I plan to dig into
this more later today. But, I was wondering if it was easy for you to
run the shorter tests on just the commits before the last
https://github.com/melanieplageman/postgres/tree/bhs_pgsr
i.e. patches 0001-0013. Patch 0014 implements the streaming read user
and removes all of the existing prefetch code. I would be interested
to know if the behavior with just the preliminary refactoring differs
at all.

Sure, I can do that. It'll take a couple hours to get the results, I'll
share them when I have them.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#13Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Tomas Vondra (#12)
Re: BitmapHeapScan streaming read user and prelim refactoring

On 2/28/24 15:56, Tomas Vondra wrote:

...

Sure, I can do that. It'll take a couple hours to get the results, I'll
share them when I have them.

Here are the results with only patches 0001 - 0012 applied (i.e. without
the patch introducing the streaming read API, and the patch switching
the bitmap heap scan to use it).

The changes in performance don't disappear entirely, but the scale is
certainly much smaller - both in the complete results for all runs, and
for the "optimal" runs that would actually pick bitmapscan.

FWIW I'm not implying the patch must 100% maintain the current behavior,
or anything like that. At this point I'm more motivated to understand if
this change in behavior is expected and/or what this means for users.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

bitmapscan-refactoring-results-optimal.pdf
bitmapscan-refactoring-results.pdf
#14Melanie Plageman
melanieplageman@gmail.com
In reply to: Tomas Vondra (#13)
Re: BitmapHeapScan streaming read user and prelim refactoring

On Wed, Feb 28, 2024 at 2:23 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 2/28/24 15:56, Tomas Vondra wrote:

...

Sure, I can do that. It'll take a couple hours to get the results, I'll
share them when I have them.

Here are the results with only patches 0001 - 0012 applied (i.e. without
the patch introducing the streaming read API, and the patch switching
the bitmap heap scan to use it).

The changes in performance don't disappear entirely, but the scale is
certainly much smaller - both in the complete results for all runs, and
for the "optimal" runs that would actually pick bitmapscan.

Hmm. I'm trying to think how my refactor could have had this impact.
It seems like all the most notable regressions are with 4 parallel
workers. What do the numeric column labels mean across the top
(2,4,8,16...) -- are they related to "matches"? And if so, what does
that mean?

- Melanie

#15Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Melanie Plageman (#14)
Re: BitmapHeapScan streaming read user and prelim refactoring

On 2/28/24 21:06, Melanie Plageman wrote:

On Wed, Feb 28, 2024 at 2:23 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 2/28/24 15:56, Tomas Vondra wrote:

...

Sure, I can do that. It'll take a couple hours to get the results, I'll
share them when I have them.

Here are the results with only patches 0001 - 0012 applied (i.e. without
the patch introducing the streaming read API, and the patch switching
the bitmap heap scan to use it).

The changes in performance don't disappear entirely, but the scale is
certainly much smaller - both in the complete results for all runs, and
for the "optimal" runs that would actually pick bitmapscan.

Hmm. I'm trying to think how my refactor could have had this impact.
It seems like all the most notable regressions are with 4 parallel
workers. What do the numeric column labels mean across the top
(2,4,8,16...) -- are they related to "matches"? And if so, what does
that mean?

That's the number of distinct values matched by the query, which should
be an approximation of the number of matching rows. The number of
distinct values in the data set differs by data set, but for 1M rows
it's roughly like this:

uniform: 10k
linear: 10k
cyclic: 100

So for example matches=128 means ~1% of rows for uniform/linear, and
100% for cyclic data sets.

As for the possible cause, I think it's clear most of the difference
comes from the last patch that actually switches bitmap heap scan to the
streaming read API. That's mostly expected/understandable, although we
probably need to look into the regressions or cases with e_i_c=0.

To analyze the 0001-0012 patches, maybe it'd be helpful to run tests for
individual patches. I can try doing that tomorrow. It'll have to be a
limited set of tests, to reduce the time, but might tell us whether it's
due to a single patch or multiple patches.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#16Melanie Plageman
melanieplageman@gmail.com
In reply to: Tomas Vondra (#15)
Re: BitmapHeapScan streaming read user and prelim refactoring

On Wed, Feb 28, 2024 at 6:17 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 2/28/24 21:06, Melanie Plageman wrote:

On Wed, Feb 28, 2024 at 2:23 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 2/28/24 15:56, Tomas Vondra wrote:

...

Sure, I can do that. It'll take a couple hours to get the results, I'll
share them when I have them.

Here are the results with only patches 0001 - 0012 applied (i.e. without
the patch introducing the streaming read API, and the patch switching
the bitmap heap scan to use it).

The changes in performance don't disappear entirely, but the scale is
certainly much smaller - both in the complete results for all runs, and
for the "optimal" runs that would actually pick bitmapscan.

Hmm. I'm trying to think how my refactor could have had this impact.
It seems like all the most notable regressions are with 4 parallel
workers. What do the numeric column labels mean across the top
(2,4,8,16...) -- are they related to "matches"? And if so, what does
that mean?

That's the number of distinct values matched by the query, which should
be an approximation of the number of matching rows. The number of
distinct values in the data set differs by data set, but for 1M rows
it's roughly like this:

uniform: 10k
linear: 10k
cyclic: 100

So for example matches=128 means ~1% of rows for uniform/linear, and
100% for cyclic data sets.

Ah, thank you for the explanation. I also looked at your script after
having sent this email and saw that it is clear in your script what
"matches" is.

As for the possible cause, I think it's clear most of the difference
comes from the last patch that actually switches bitmap heap scan to the
streaming read API. That's mostly expected/understandable, although we
probably need to look into the regressions or cases with e_i_c=0.

Right, I'm mostly surprised about the regressions for patches 0001-0012.

Re eic 0: Thomas Munro and I chatted off-list, and you bring up a
great point about eic 0. In old bitmapheapscan code eic 0 basically
disabled prefetching but with the streaming read API, it will still
issue fadvises when eic is 0. That is an easy one line fix. Thomas
prefers to fix it by always avoiding an fadvise for the last buffer in
a range before issuing a read (since we are about to read it anyway,
best not fadvise it too). This will fix eic 0 and also cut one system
call from each invocation of the streaming read machinery.

To analyze the 0001-0012 patches, maybe it'd be helpful to run tests for
individual patches. I can try doing that tomorrow. It'll have to be a
limited set of tests, to reduce the time, but might tell us whether it's
due to a single patch or multiple patches.

Yes, tomorrow I planned to start trying to repro some of the "red"
cases myself. Any one of the commits could cause a slight regression
but a 3.5x regression is quite surprising, so I might focus on trying
to repro that locally and then narrow down which patch causes it.

For the non-cached regressions, perhaps the commit to use the correct
recheck flag (0004) when prefetching could be the culprit. And for the
cached regressions, my money is on the commit which changes the whole
control flow of BitmapHeapNext() and the next_block() and next_tuple()
functions (0010).

- Melanie

#17Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Melanie Plageman (#16)
Re: BitmapHeapScan streaming read user and prelim refactoring

On 2/29/24 00:40, Melanie Plageman wrote:

On Wed, Feb 28, 2024 at 6:17 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 2/28/24 21:06, Melanie Plageman wrote:

On Wed, Feb 28, 2024 at 2:23 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 2/28/24 15:56, Tomas Vondra wrote:

...

Sure, I can do that. It'll take a couple hours to get the results, I'll
share them when I have them.

Here are the results with only patches 0001 - 0012 applied (i.e. without
the patch introducing the streaming read API, and the patch switching
the bitmap heap scan to use it).

The changes in performance don't disappear entirely, but the scale is
certainly much smaller - both in the complete results for all runs, and
for the "optimal" runs that would actually pick bitmapscan.

Hmm. I'm trying to think how my refactor could have had this impact.
It seems like all the most notable regressions are with 4 parallel
workers. What do the numeric column labels mean across the top
(2,4,8,16...) -- are they related to "matches"? And if so, what does
that mean?

That's the number of distinct values matched by the query, which should
be an approximation of the number of matching rows. The number of
distinct values in the data set differs by data set, but for 1M rows
it's roughly like this:

uniform: 10k
linear: 10k
cyclic: 100

So for example matches=128 means ~1% of rows for uniform/linear, and
100% for cyclic data sets.

Ah, thank you for the explanation. I also looked at your script after
having sent this email and saw that it is clear in your script what
"matches" is.

As for the possible cause, I think it's clear most of the difference
comes from the last patch that actually switches bitmap heap scan to the
streaming read API. That's mostly expected/understandable, although we
probably need to look into the regressions or cases with e_i_c=0.

Right, I'm mostly surprised about the regressions for patches 0001-0012.

Re eic 0: Thomas Munro and I chatted off-list, and you bring up a
great point about eic 0. In old bitmapheapscan code eic 0 basically
disabled prefetching but with the streaming read API, it will still
issue fadvises when eic is 0. That is an easy one line fix. Thomas
prefers to fix it by always avoiding an fadvise for the last buffer in
a range before issuing a read (since we are about to read it anyway,
best not fadvise it too). This will fix eic 0 and also cut one system
call from each invocation of the streaming read machinery.

To analyze the 0001-0012 patches, maybe it'd be helpful to run tests for
individual patches. I can try doing that tomorrow. It'll have to be a
limited set of tests, to reduce the time, but might tell us whether it's
due to a single patch or multiple patches.

Yes, tomorrow I planned to start trying to repro some of the "red"
cases myself. Any one of the commits could cause a slight regression
but a 3.5x regression is quite surprising, so I might focus on trying
to repro that locally and then narrow down which patch causes it.

For the non-cached regressions, perhaps the commit to use the correct
recheck flag (0004) when prefetching could be the culprit. And for the
cached regressions, my money is on the commit which changes the whole
control flow of BitmapHeapNext() and the next_block() and next_tuple()
functions (0010).

I do have some partial results, comparing the patches. I only ran one of
the more affected workloads (cyclic) on the xeon, attached is a PDF
comparing master and the 0001-0014 patches. The percentages are timing
vs. the preceding patch (green - faster, red - slower).

This suggests only patches 0010 and 0014 affect performance, the rest is
just noise. I'll see if I can do more runs and get data from the other
machine (seems it's more significant on old SATA SSDs).

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

patch-comparison.pdf
#18Melanie Plageman
melanieplageman@gmail.com
In reply to: Tomas Vondra (#17)
Re: BitmapHeapScan streaming read user and prelim refactoring

On Thu, Feb 29, 2024 at 7:54 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 2/29/24 00:40, Melanie Plageman wrote:

On Wed, Feb 28, 2024 at 6:17 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 2/28/24 21:06, Melanie Plageman wrote:

On Wed, Feb 28, 2024 at 2:23 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 2/28/24 15:56, Tomas Vondra wrote:

...

Sure, I can do that. It'll take a couple hours to get the results, I'll
share them when I have them.

Here are the results with only patches 0001 - 0012 applied (i.e. without
the patch introducing the streaming read API, and the patch switching
the bitmap heap scan to use it).

The changes in performance don't disappear entirely, but the scale is
certainly much smaller - both in the complete results for all runs, and
for the "optimal" runs that would actually pick bitmapscan.

Hmm. I'm trying to think how my refactor could have had this impact.
It seems like all the most notable regressions are with 4 parallel
workers. What do the numeric column labels mean across the top
(2,4,8,16...) -- are they related to "matches"? And if so, what does
that mean?

That's the number of distinct values matched by the query, which should
be an approximation of the number of matching rows. The number of
distinct values in the data set differs by data set, but for 1M rows
it's roughly like this:

uniform: 10k
linear: 10k
cyclic: 100

So for example matches=128 means ~1% of rows for uniform/linear, and
100% for cyclic data sets.

Ah, thank you for the explanation. I also looked at your script after
having sent this email and saw that it is clear in your script what
"matches" is.

As for the possible cause, I think it's clear most of the difference
comes from the last patch that actually switches bitmap heap scan to the
streaming read API. That's mostly expected/understandable, although we
probably need to look into the regressions or cases with e_i_c=0.

Right, I'm mostly surprised about the regressions for patches 0001-0012.

Re eic 0: Thomas Munro and I chatted off-list, and you bring up a
great point about eic 0. In old bitmapheapscan code eic 0 basically
disabled prefetching but with the streaming read API, it will still
issue fadvises when eic is 0. That is an easy one line fix. Thomas
prefers to fix it by always avoiding an fadvise for the last buffer in
a range before issuing a read (since we are about to read it anyway,
best not fadvise it too). This will fix eic 0 and also cut one system
call from each invocation of the streaming read machinery.

To analyze the 0001-0012 patches, maybe it'd be helpful to run tests for
individual patches. I can try doing that tomorrow. It'll have to be a
limited set of tests, to reduce the time, but might tell us whether it's
due to a single patch or multiple patches.

Yes, tomorrow I planned to start trying to repro some of the "red"
cases myself. Any one of the commits could cause a slight regression
but a 3.5x regression is quite surprising, so I might focus on trying
to repro that locally and then narrow down which patch causes it.

For the non-cached regressions, perhaps the commit to use the correct
recheck flag (0004) when prefetching could be the culprit. And for the
cached regressions, my money is on the commit which changes the whole
control flow of BitmapHeapNext() and the next_block() and next_tuple()
functions (0010).

I do have some partial results, comparing the patches. I only ran one of
the more affected workloads (cyclic) on the xeon, attached is a PDF
comparing master and the 0001-0014 patches. The percentages are timing
vs. the preceding patch (green - faster, red - slower).

Just confirming: the results are for uncached?

- Melanie

#19Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Melanie Plageman (#18)
Re: BitmapHeapScan streaming read user and prelim refactoring

On 2/29/24 22:19, Melanie Plageman wrote:

On Thu, Feb 29, 2024 at 7:54 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 2/29/24 00:40, Melanie Plageman wrote:

On Wed, Feb 28, 2024 at 6:17 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 2/28/24 21:06, Melanie Plageman wrote:

On Wed, Feb 28, 2024 at 2:23 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 2/28/24 15:56, Tomas Vondra wrote:

...

Sure, I can do that. It'll take a couple hours to get the results, I'll
share them when I have them.

Here are the results with only patches 0001 - 0012 applied (i.e. without
the patch introducing the streaming read API, and the patch switching
the bitmap heap scan to use it).

The changes in performance don't disappear entirely, but the scale is
certainly much smaller - both in the complete results for all runs, and
for the "optimal" runs that would actually pick bitmapscan.

Hmm. I'm trying to think how my refactor could have had this impact.
It seems like all the most notable regressions are with 4 parallel
workers. What do the numeric column labels mean across the top
(2,4,8,16...) -- are they related to "matches"? And if so, what does
that mean?

That's the number of distinct values matched by the query, which should
be an approximation of the number of matching rows. The number of
distinct values in the data set differs by data set, but for 1M rows
it's roughly like this:

uniform: 10k
linear: 10k
cyclic: 100

So for example matches=128 means ~1% of rows for uniform/linear, and
100% for cyclic data sets.
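[Editorial note: the arithmetic above can be sketched as follows, assuming selectivity is simply matches divided by the distinct-value count, capped at 100%. The names here are illustrative, not from the benchmark scripts.]

```python
# Sketch of the selectivity arithmetic described above; the distinct-value
# counts are the approximate figures quoted for the 1M-row data sets.
DISTINCT_VALUES = {"uniform": 10_000, "linear": 10_000, "cyclic": 100}

def selectivity(matches: int, dataset: str) -> float:
    """Approximate fraction of rows matched, capped at 100%."""
    return min(matches / DISTINCT_VALUES[dataset], 1.0)

print(f"uniform: {selectivity(128, 'uniform'):.2%}")  # ~1% of rows
print(f"cyclic:  {selectivity(128, 'cyclic'):.2%}")   # 100% of rows
```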

Ah, thank you for the explanation. I also looked at your script after
having sent this email and saw that it is clear in your script what
"matches" is.

As for the possible cause, I think it's clear most of the difference
comes from the last patch that actually switches bitmap heap scan to the
streaming read API. That's mostly expected/understandable, although we
probably need to look into the regressions or cases with e_i_c=0.

Right, I'm mostly surprised about the regressions for patches 0001-0012.

Re eic 0: Thomas Munro and I chatted off-list, and you bring up a
great point about eic 0. In the old bitmapheapscan code, eic 0 basically
disabled prefetching, but with the streaming read API we will still
issue fadvises when eic is 0. That is an easy one-line fix. Thomas
prefers to fix it by always avoiding an fadvise for the last buffer in
a range before issuing a read (since we are about to read it anyway,
best not fadvise it too). This will fix eic 0 and also cut one system
call from each invocation of the streaming read machinery.
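[Editorial note: the fix Thomas describes might look roughly like this — a minimal sketch with illustrative names (prefetch_range, fadvise), not the actual streaming read API.]

```python
def prefetch_range(first_block: int, nblocks: int, fadvise) -> None:
    """Advise the kernel about every block in the range except the last.

    The last block is about to be read synchronously anyway, so advising
    it would only waste a system call. With effective_io_concurrency = 0
    each range is a single block, so no advice is issued at all -- which
    restores the old "eic 0 disables prefetching" behavior.
    """
    for blkno in range(first_block, first_block + nblocks - 1):
        fadvise(blkno)

advised = []
prefetch_range(10, 4, advised.append)
print(advised)  # [10, 11, 12] -- the last block (13) is read, not advised
```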

To analyze the 0001-0012 patches, maybe it'd be helpful to run tests for
individual patches. I can try doing that tomorrow. It'll have to be a
limited set of tests, to reduce the time, but might tell us whether it's
due to a single patch or multiple patches.

Yes, tomorrow I planned to start trying to repro some of the "red"
cases myself. Any one of the commits could cause a slight regression
but a 3.5x regression is quite surprising, so I might focus on trying
to repro that locally and then narrow down which patch causes it.

For the non-cached regressions, perhaps the commit to use the correct
recheck flag (0004) when prefetching could be the culprit. And for the
cached regressions, my money is on the commit which changes the whole
control flow of BitmapHeapNext() and the next_block() and next_tuple()
functions (0010).

I do have some partial results, comparing the patches. I only ran one of
the more affected workloads (cyclic) on the xeon, attached is a PDF
comparing master and the 0001-0014 patches. The percentages are timing
vs. the preceding patch (green - faster, red - slower).

Just confirming: the results are for uncached?

Yes, cyclic data set, uncached case. I picked this because it seemed
like one of the most affected cases. Do you want me to test some other
cases too?

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#20 Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Tomas Vondra (#19)
Re: BitmapHeapScan streaming read user and prelim refactoring

On 2/29/24 23:44, Tomas Vondra wrote:

...

I do have some partial results, comparing the patches. I only ran one of
the more affected workloads (cyclic) on the xeon, attached is a PDF
comparing master and the 0001-0014 patches. The percentages are timing
vs. the preceding patch (green - faster, red - slower).

Just confirming: the results are for uncached?

Yes, cyclic data set, uncached case. I picked this because it seemed
like one of the most affected cases. Do you want me to test some other
cases too?

BTW I decided to look at the data from a slightly different angle and
compare the behavior with increasing effective_io_concurrency. Attached
are charts for three "uncached" cases:

* uniform, work_mem=4MB, workers_per_gather=0
* linear-fuzz, work_mem=4MB, workers_per_gather=0
* uniform, work_mem=4MB, workers_per_gather=4

Each page has charts for master and patched build (with all patches). I
think there's a pretty obvious difference in how increasing e_i_c
affects the two builds:

1) On master there's a clear difference between the eic=0 and eic=1
cases, but on the patched build there's literally no difference - for
example the "uniform" distribution is clearly not great for prefetching,
yet eic=0 regresses to the poor eic=1 behavior.

Note: This is where the "red bands" in the charts come from.

2) For some reason, prefetching with eic>1 performs much better with
the patches, except at very low selectivity values (close to 0%).
Not sure why this is happening - either the overhead is much lower
(which would matter on these "adversarial" data distributions, but how
could that be when fadvise is not free), or it ends up not doing any
prefetching (but then what about (1)?).

3) I'm not sure about the linear-fuzz case; the only explanation I have
is that we're able to skip almost all of the prefetches (and read-ahead
likely works pretty well here).

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

prefetch-charts.pdf (application/pdf)