Batch TIDs lookup in ambulkdelete

Started by Masahiko Sawada · 9 months ago · 13 messages
#1 Masahiko Sawada
sawada.mshk@gmail.com
3 attachment(s)

Hi all,

During index bulk-deletion in lazy vacuum, we currently check the
deletability of each index tuple individually using the
vac_tid_reaped() function. The attached proof-of-concept patches
propose batching multiple TID lookups for deletability checks to
reduce overhead. This optimization aims to minimize redundant function
calls and repeated TidStore entry retrievals for TIDs on the same
page. I have conducted benchmarks across several scenarios to evaluate
the performance impact.

# Case-1 (btree tuples are regular tuples and dead TIDs are concentrated):

create unlogged table test (c int) with (autovacuum_enabled = off);
insert into test select generate_series(1, ${NROWS});
create index on test (c);
delete from test where c < ${NROWS} * 0.3;

# Case-2 (btree tuples are regular tuples and dead TIDs are sparse):

create unlogged table test (c int) with (autovacuum_enabled = off);
insert into test select generate_series(1, ${NROWS});
create index on test (c);
delete from test where random() < 0.3;

# Case-3 (btree tuples are deduplicated tuples):

create unlogged table test (c int) with (autovacuum_enabled = off);
insert into test select c % 1000 from generate_series(1, ${NROWS}) c;
create index on test (c);
select pg_relation_size('test') / 8192 as relpages \gset
delete from test where (ctid::text::point)[0] < ((:'relpages')::int * 0.3);

# Case-4 (btree tuples are deduplicated tuples and table is clustered):

create unlogged table test (c int) with (autovacuum_enabled = off);
insert into test select c % 1000 from generate_series(1, ${NROWS}) c;
create index on test (c);
cluster test using test_c_idx;
select pg_relation_size('test') / 8192 as relpages \gset
delete from test where (ctid::text::point)[0] < ((:'relpages')::int * 0.3);

# Case-5 (creating btree index on UUID column)

create unlogged table test (c uuid) with (autovacuum_enabled = off);
insert into test select uuidv4() from generate_series(1, ${NROWS}) c;
create index on test (c);
select pg_relation_size('test') / 8192 as relpages \gset
delete from test where (ctid::text::point)[0] < ((:'relpages')::int * 0.3);

Here are the results (NROWS = 50000000):

        HEAD      PATCHED   DIFF
case-1: 3,021 ms  2,818 ms  93.29%
case-2: 5,697 ms  5,545 ms  97.34%
case-3: 2,833 ms  2,790 ms  98.48%
case-4: 2,564 ms  2,279 ms  88.86%
case-5: 4,657 ms  4,706 ms  101.04%

I've measured a 6-11% improvement in btree bulk-deletion.

Here is a summary of each attached patch:

0001: Introduce TidStoreIsMemberMulti(), which performs the IsMember
check for multiple TIDs in one function call. If the given TIDs are
sorted (at least by block number), we can avoid repeated radix tree
lookups for the same page entry.

0002: Convert IndexBulkDeleteCallback() to a batched operation.

0003: Use batch TIDs lookup in btree index bulk-deletion.

In patch 0003, we implement batch TID lookups both for each
deduplicated index tuple and for all the remaining regular index tuples
on a page, which seems to be the most straightforward approach. Further
optimizations are possible, such as performing a single batch TID
lookup for all index tuples on a page, but these could introduce
additional overhead from sorting and re-sorting TIDs. Moreover,
considering that radix tree lookups are relatively inexpensive, the
benefit of sorting TIDs before calling TidStoreIsMemberMulti() might be
minimal. Nevertheless, these potential optimizations warrant further
evaluation to determine their actual impact on performance.

Also, the patch adds batch TIDs lookup support only for btree indexes,
but we can potentially improve other index AMs too.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachments:

v1-0002-Convert-IndexBulkDeleteCallback-to-process-TIDs-i.patch (application/x-patch)
From e693adff50f09fe791c87a444950b193b642aabb Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Mon, 21 Apr 2025 22:41:33 -0700
Subject: [PATCH v1 2/3] Convert IndexBulkDeleteCallback to process TIDs in
 batch.

Author:
Reviewed-by:
Discussion: https://postgr.es/m/
Backpatch-through:
---
 contrib/bloom/blvacuum.c              |  4 +++-
 src/backend/access/gin/ginvacuum.c    |  3 ++-
 src/backend/access/gist/gistvacuum.c  |  3 ++-
 src/backend/access/hash/hash.c        |  3 ++-
 src/backend/access/nbtree/nbtree.c    |  4 ++--
 src/backend/access/spgist/spgvacuum.c | 13 ++++++++-----
 src/backend/catalog/index.c           | 21 ++++++++++++++-------
 src/backend/commands/vacuum.c         |  8 ++++----
 src/include/access/genam.h            |  3 ++-
 9 files changed, 39 insertions(+), 23 deletions(-)

diff --git a/contrib/bloom/blvacuum.c b/contrib/bloom/blvacuum.c
index 86b15a75f6f..93089456855 100644
--- a/contrib/bloom/blvacuum.c
+++ b/contrib/bloom/blvacuum.c
@@ -83,8 +83,10 @@ blbulkdelete(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
 									OffsetNumberNext(BloomPageGetMaxOffset(page)));
 		while (itup < itupEnd)
 		{
+			bool	dead;
+
 			/* Do we have to delete this tuple? */
-			if (callback(&itup->heapPtr, callback_state))
+			if (callback(&itup->heapPtr, 1, &dead, callback_state) > 0)
 			{
 				/* Yes; adjust count of tuples that will be left on page */
 				BloomPageGetOpaque(page)->maxoff--;
diff --git a/src/backend/access/gin/ginvacuum.c b/src/backend/access/gin/ginvacuum.c
index fbbe3a6dd70..336f100261b 100644
--- a/src/backend/access/gin/ginvacuum.c
+++ b/src/backend/access/gin/ginvacuum.c
@@ -56,7 +56,8 @@ ginVacuumItemPointers(GinVacuumState *gvs, ItemPointerData *items,
 	 */
 	for (i = 0; i < nitem; i++)
 	{
-		if (gvs->callback(items + i, gvs->callback_state))
+		bool	deletable;
+		if (gvs->callback(items + i, 1, &deletable, gvs->callback_state) > 0)
 		{
 			gvs->result->tuples_removed += 1;
 			if (!tmpitems)
diff --git a/src/backend/access/gist/gistvacuum.c b/src/backend/access/gist/gistvacuum.c
index 6a359c98c60..0abefb7b664 100644
--- a/src/backend/access/gist/gistvacuum.c
+++ b/src/backend/access/gist/gistvacuum.c
@@ -383,8 +383,9 @@ restart:
 			{
 				ItemId		iid = PageGetItemId(page, off);
 				IndexTuple	idxtuple = (IndexTuple) PageGetItem(page, iid);
+				bool	deletable;
 
-				if (callback(&(idxtuple->t_tid), callback_state))
+				if (callback(&(idxtuple->t_tid), 1, &deletable, callback_state) > 0)
 					todelete[ntodelete++] = off;
 			}
 		}
diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index 53061c819fb..e0fffadf698 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -734,6 +734,7 @@ hashbucketcleanup(Relation rel, Bucket cur_bucket, Buffer bucket_buf,
 			IndexTuple	itup;
 			Bucket		bucket;
 			bool		kill_tuple = false;
+			bool		dead;
 
 			itup = (IndexTuple) PageGetItem(page,
 											PageGetItemId(page, offno));
@@ -743,7 +744,7 @@ hashbucketcleanup(Relation rel, Bucket cur_bucket, Buffer bucket_buf,
 			 * To remove the dead tuples, we strictly want to rely on results
 			 * of callback function.  refer btvacuumpage for detailed reason.
 			 */
-			if (callback && callback(htup, callback_state))
+			if (callback && callback(htup, 1, &dead, callback_state) > 0)
 			{
 				kill_tuple = true;
 				if (tuples_removed)
diff --git a/src/backend/access/nbtree/nbtree.c b/src/backend/access/nbtree/nbtree.c
index accc7fe8bbe..821e16d5691 100644
--- a/src/backend/access/nbtree/nbtree.c
+++ b/src/backend/access/nbtree/nbtree.c
@@ -1486,7 +1486,7 @@ backtrack:
 				if (!BTreeTupleIsPosting(itup))
 				{
 					/* Regular tuple, standard table TID representation */
-					if (callback(&itup->t_tid, callback_state))
+					if (callback(&itup->t_tid, 1, NULL, callback_state) > 0)
 					{
 						deletable[ndeletable++] = offnum;
 						nhtidsdead++;
@@ -1670,7 +1670,7 @@ btreevacuumposting(BTVacState *vstate, IndexTuple posting,
 
 	for (int i = 0; i < nitem; i++)
 	{
-		if (!vstate->callback(items + i, vstate->callback_state))
+		if (vstate->callback(items + i, 1, NULL, vstate->callback_state) == 0)
 		{
 			/* Live table TID */
 			live++;
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index 81171f35451..a9a61fa152e 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -153,9 +153,11 @@ vacuumLeafPage(spgBulkDeleteState *bds, Relation index, Buffer buffer,
 										   PageGetItemId(page, i));
 		if (lt->tupstate == SPGIST_LIVE)
 		{
+			bool	dead;
+
 			Assert(ItemPointerIsValid(&lt->heapPtr));
 
-			if (bds->callback(&lt->heapPtr, bds->callback_state))
+			if (bds->callback(&lt->heapPtr, 1, &dead, bds->callback_state) > 0)
 			{
 				bds->stats->tuples_removed += 1;
 				deletable[i] = true;
@@ -425,9 +427,10 @@ vacuumLeafRoot(spgBulkDeleteState *bds, Relation index, Buffer buffer)
 										   PageGetItemId(page, i));
 		if (lt->tupstate == SPGIST_LIVE)
 		{
+			bool	dead;
 			Assert(ItemPointerIsValid(&lt->heapPtr));
 
-			if (bds->callback(&lt->heapPtr, bds->callback_state))
+			if (bds->callback(&lt->heapPtr, 1, &dead, bds->callback_state) > 0)
 			{
 				bds->stats->tuples_removed += 1;
 				toDelete[xlrec.nDelete] = i;
@@ -965,10 +968,10 @@ spgbulkdelete(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
 }
 
 /* Dummy callback to delete no tuples during spgvacuumcleanup */
-static bool
-dummy_callback(ItemPointer itemptr, void *state)
+static int
+dummy_callback(ItemPointer itemptrs, int nitem, bool *deletable, void *state)
 {
-	return false;
+	return 0;
 }
 
 /*
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 739a92bdcc1..fc40317728f 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -126,7 +126,8 @@ static void index_update_stats(Relation rel,
 static void IndexCheckExclusion(Relation heapRelation,
 								Relation indexRelation,
 								IndexInfo *indexInfo);
-static bool validate_index_callback(ItemPointer itemptr, void *opaque);
+static int validate_index_callback(ItemPointer itemptrs, int nitem, bool *deletable,
+								   void *opaque);
 static bool ReindexIsCurrentlyProcessingIndex(Oid indexOid);
 static void SetReindexProcessing(Oid heapOid, Oid indexOid);
 static void ResetReindexProcessing(void);
@@ -3479,15 +3480,21 @@ validate_index(Oid heapId, Oid indexId, Snapshot snapshot)
 /*
  * validate_index_callback - bulkdelete callback to collect the index TIDs
  */
-static bool
-validate_index_callback(ItemPointer itemptr, void *opaque)
+static int
+validate_index_callback(ItemPointer itemptrs, int nitem, bool *deletable,
+						void *opaque)
 {
 	ValidateIndexState *state = (ValidateIndexState *) opaque;
-	int64		encoded = itemptr_encode(itemptr);
 
-	tuplesort_putdatum(state->tuplesort, Int64GetDatum(encoded), false);
-	state->itups += 1;
-	return false;				/* never actually delete anything */
+	for (int i = 0; i < nitem; i++)
+	{
+		int64		encoded = itemptr_encode(&(itemptrs[i]));
+
+		tuplesort_putdatum(state->tuplesort, Int64GetDatum(encoded), false);
+	}
+	state->itups += nitem;
+
+	return 0;				/* never actually delete anything */
 }
 
 /*
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 33a33bf6b1c..cc8d458ea89 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -127,7 +127,7 @@ static bool vacuum_rel(Oid relid, RangeVar *relation, VacuumParams *params,
 					   BufferAccessStrategy bstrategy);
 static double compute_parallel_delay(void);
 static VacOptValue get_vacoptval_from_boolean(DefElem *def);
-static bool vac_tid_reaped(ItemPointer itemptr, void *state);
+static int vac_tid_reaped(ItemPointer itemptrs, int nitem, bool *deletable, void *state);
 
 /*
  * GUC check function to ensure GUC value specified is within the allowable
@@ -2654,10 +2654,10 @@ vac_cleanup_one_index(IndexVacuumInfo *ivinfo, IndexBulkDeleteResult *istat)
  *
  *		This has the right signature to be an IndexBulkDeleteCallback.
  */
-static bool
-vac_tid_reaped(ItemPointer itemptr, void *state)
+static int
+vac_tid_reaped(ItemPointer itemptrs, int nitem, bool *deletable, void *state)
 {
 	TidStore   *dead_items = (TidStore *) state;
 
-	return TidStoreIsMember(dead_items, itemptr);
+	return TidStoreIsMemberMulti(dead_items, itemptrs, nitem, deletable);
 }
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 5b2ab181b5f..6f559a1209d 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -107,7 +107,8 @@ typedef struct IndexBulkDeleteResult
 } IndexBulkDeleteResult;
 
 /* Typedef for callback function to determine if a tuple is bulk-deletable */
-typedef bool (*IndexBulkDeleteCallback) (ItemPointer itemptr, void *state);
+typedef int (*IndexBulkDeleteCallback) (ItemPointer itemptr, int nitem,
+										bool *deletable, void *state);
 
 /* struct definitions appear in relscan.h */
 typedef struct IndexScanDescData *IndexScanDesc;
-- 
2.43.5

v1-0003-Use-batch-TIDs-lookup-in-btree-index-bulk-deletio.patch (application/x-patch)
From ba4e7dc1f06f3dceab896b6ee4bf68ff23db1c09 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Tue, 22 Apr 2025 12:25:11 -0700
Subject: [PATCH v1 3/3] Use batch TIDs lookup in btree index bulk-deletion.

TIDs in the posting list are sorted, but the TIDs gathered from
regular index tuples are not.

Author:
Reviewed-by:
Discussion: https://postgr.es/m/
Backpatch-through:
---
 src/backend/access/nbtree/nbtree.c | 107 +++++++++++++++++++----------
 1 file changed, 71 insertions(+), 36 deletions(-)

diff --git a/src/backend/access/nbtree/nbtree.c b/src/backend/access/nbtree/nbtree.c
index 821e16d5691..c92efecc076 100644
--- a/src/backend/access/nbtree/nbtree.c
+++ b/src/backend/access/nbtree/nbtree.c
@@ -23,6 +23,7 @@
 #include "access/stratnum.h"
 #include "commands/progress.h"
 #include "commands/vacuum.h"
+#include "common/int.h"
 #include "nodes/execnodes.h"
 #include "pgstat.h"
 #include "storage/bulk_write.h"
@@ -1309,6 +1310,13 @@ btvacuumscan(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
 		IndexFreeSpaceMapVacuum(rel);
 }
 
+/* qsort comparator for sorting OffsetNumbers */
+static int
+cmpOffsetNumbers(const void *a, const void *b)
+{
+	return pg_cmp_u16(*(const OffsetNumber *) a, *(const OffsetNumber *) b);
+}
+
 /*
  * btvacuumpage --- VACUUM one page
  *
@@ -1472,6 +1480,10 @@ backtrack:
 		nhtidslive = 0;
 		if (callback)
 		{
+			OffsetNumber workbuf_offs[MaxIndexTuplesPerPage];
+			ItemPointerData workbuf_htids[MaxIndexTuplesPerPage];
+			int			workbuf_nitem = 0;
+
 			/* btbulkdelete callback tells us what to delete (or update) */
 			for (offnum = minoff;
 				 offnum <= maxoff;
@@ -1485,14 +1497,13 @@ backtrack:
 				Assert(!BTreeTupleIsPivot(itup));
 				if (!BTreeTupleIsPosting(itup))
 				{
-					/* Regular tuple, standard table TID representation */
-					if (callback(&itup->t_tid, 1, NULL, callback_state) > 0)
-					{
-						deletable[ndeletable++] = offnum;
-						nhtidsdead++;
-					}
-					else
-						nhtidslive++;
+					/*
+					 * Regular tuple, standard table TID representation. Will
+					 * verify them as a whole later.
+					 */
+					workbuf_offs[workbuf_nitem] = offnum;
+					workbuf_htids[workbuf_nitem] = itup->t_tid;
+					workbuf_nitem++;
 				}
 				else
 				{
@@ -1539,6 +1550,38 @@ backtrack:
 					nhtidslive += nremaining;
 				}
 			}
+
+			if (workbuf_nitem > 0)
+			{
+				bool		workbuf_deletable[MaxIndexTuplesPerPage];
+				bool		need_sort;
+				int			ndels;
+
+				/*
+				 * We will sort the deletable array if there are existing
+				 * offsets as it should be sorted in ascending order (see
+				 * _bt_delitems_vacuum()).
+				 */
+				need_sort = (ndeletable > 0);
+
+				ndels = callback(workbuf_htids, workbuf_nitem, workbuf_deletable,
+								 callback_state);
+				if (ndels > 0)
+				{
+					for (int i = 0; i < workbuf_nitem; i++)
+					{
+						if (workbuf_deletable[i])
+							deletable[ndeletable++] = workbuf_offs[i];
+					}
+
+					if (need_sort)
+						qsort(deletable, ndeletable, sizeof(OffsetNumber), cmpOffsetNumbers);
+
+					nhtidsdead += ndels;
+				}
+
+				nhtidslive += workbuf_nitem - ndels;
+			}
 		}
 
 		/*
@@ -1663,43 +1706,35 @@ static BTVacuumPosting
 btreevacuumposting(BTVacState *vstate, IndexTuple posting,
 				   OffsetNumber updatedoffset, int *nremaining)
 {
-	int			live = 0;
+	int			ndeletable;
 	int			nitem = BTreeTupleGetNPosting(posting);
 	ItemPointer items = BTreeTupleGetPosting(posting);
+	bool		deletable[MaxIndexTuplesPerPage];
 	BTVacuumPosting vacposting = NULL;
 
-	for (int i = 0; i < nitem; i++)
+	ndeletable = vstate->callback(items, nitem, deletable, vstate->callback_state);
+
+	/* All items are live */
+	if (ndeletable == 0)
 	{
-		if (vstate->callback(items + i, 1, NULL, vstate->callback_state) == 0)
-		{
-			/* Live table TID */
-			live++;
-		}
-		else if (vacposting == NULL)
-		{
-			/*
-			 * First dead table TID encountered.
-			 *
-			 * It's now clear that we need to delete one or more dead table
-			 * TIDs, so start maintaining metadata describing how to update
-			 * existing posting list tuple.
-			 */
-			vacposting = palloc(offsetof(BTVacuumPostingData, deletetids) +
-								nitem * sizeof(uint16));
+		*nremaining = nitem;
+		return NULL;
+	}
 
-			vacposting->itup = posting;
-			vacposting->updatedoffset = updatedoffset;
-			vacposting->ndeletedtids = 0;
-			vacposting->deletetids[vacposting->ndeletedtids++] = i;
-		}
-		else
-		{
-			/* Second or subsequent dead table TID */
+	vacposting = palloc(offsetof(BTVacuumPostingData, deletetids) +
+						ndeletable * sizeof(uint16));
+	vacposting->itup = posting;
+	vacposting->updatedoffset = updatedoffset;
+	vacposting->ndeletedtids = 0;
+
+	for (int i = 0; i < nitem; i++)
+	{
+		if (deletable[i])
 			vacposting->deletetids[vacposting->ndeletedtids++] = i;
-		}
 	}
+	Assert(ndeletable == vacposting->ndeletedtids);
 
-	*nremaining = live;
+	*nremaining = nitem - ndeletable;
 	return vacposting;
 }
 
-- 
2.43.5

v1-0001-tidstore.c-introduce-TidStoreIsMemberMulti.patch (application/x-patch)
From 236764cdfc211e04e9404ee8a38bd0d4fac9b823 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Mon, 21 Apr 2025 22:40:57 -0700
Subject: [PATCH v1 1/3] tidstore.c: introduce TidStoreIsMemberMulti.

Author:
Reviewed-by:
Discussion: https://postgr.es/m/
Backpatch-through:
---
 src/backend/access/common/tidstore.c | 63 +++++++++++++++++++++++-----
 src/include/access/tidstore.h        |  2 +
 2 files changed, 54 insertions(+), 11 deletions(-)

diff --git a/src/backend/access/common/tidstore.c b/src/backend/access/common/tidstore.c
index 5bd75fb499c..c3ee9db30cc 100644
--- a/src/backend/access/common/tidstore.c
+++ b/src/backend/access/common/tidstore.c
@@ -416,20 +416,11 @@ TidStoreSetBlockOffsets(TidStore *ts, BlockNumber blkno, OffsetNumber *offsets,
 		local_ts_set(ts->tree.local, blkno, page);
 }
 
-/* Return true if the given TID is present in the TidStore */
-bool
-TidStoreIsMember(TidStore *ts, ItemPointer tid)
+static pg_attribute_always_inline int
+tidstore_is_member_page(BlocktableEntry *page, OffsetNumber off)
 {
 	int			wordnum;
 	int			bitnum;
-	BlocktableEntry *page;
-	BlockNumber blk = ItemPointerGetBlockNumber(tid);
-	OffsetNumber off = ItemPointerGetOffsetNumber(tid);
-
-	if (TidStoreIsShared(ts))
-		page = shared_ts_find(ts->tree.shared, blk);
-	else
-		page = local_ts_find(ts->tree.local, blk);
 
 	/* no entry for the blk */
 	if (page == NULL)
@@ -458,6 +449,56 @@ TidStoreIsMember(TidStore *ts, ItemPointer tid)
 	}
 }
 
+/* Return true if the given TID is present in the TidStore */
+bool
+TidStoreIsMember(TidStore *ts, ItemPointer tid)
+{
+	BlocktableEntry *page;
+	BlockNumber blk = ItemPointerGetBlockNumber(tid);
+	OffsetNumber off = ItemPointerGetOffsetNumber(tid);
+
+	if (TidStoreIsShared(ts))
+		page = shared_ts_find(ts->tree.shared, blk);
+	else
+		page = local_ts_find(ts->tree.local, blk);
+
+	return tidstore_is_member_page(page, off);
+}
+
+/*
+ * Batched operation of TidStoreIsMember().
+ */
+int
+TidStoreIsMemberMulti(TidStore *ts, ItemPointer tids, int ntids, bool *ismembers)
+{
+	BlocktableEntry *page = NULL;
+	BlockNumber last_blk = InvalidBlockNumber;
+	int		nmembers = 0;
+
+	for (int i = 0; i < ntids; i++)
+	{
+		ItemPointer	tid = &(tids[i]);
+		BlockNumber	blk = ItemPointerGetBlockNumber(tid);
+		OffsetNumber off = ItemPointerGetOffsetNumber(tid);
+
+		if (blk != last_blk)
+		{
+			if (TidStoreIsShared(ts))
+				page = shared_ts_find(ts->tree.shared, blk);
+			else
+				page = local_ts_find(ts->tree.local, blk);
+		}
+
+		ismembers[i] = tidstore_is_member_page(page, off);
+		if (ismembers[i])
+			nmembers++;
+
+		last_blk = blk;
+	}
+
+	return nmembers;
+}
+
 /*
  * Prepare to iterate through a TidStore.
  *
diff --git a/src/include/access/tidstore.h b/src/include/access/tidstore.h
index 041091df278..c3545d3d0fd 100644
--- a/src/include/access/tidstore.h
+++ b/src/include/access/tidstore.h
@@ -41,6 +41,8 @@ extern void TidStoreDestroy(TidStore *ts);
 extern void TidStoreSetBlockOffsets(TidStore *ts, BlockNumber blkno, OffsetNumber *offsets,
 									int num_offsets);
 extern bool TidStoreIsMember(TidStore *ts, ItemPointer tid);
+extern int TidStoreIsMemberMulti(TidStore *ts, ItemPointer tids, int ntids,
+								 bool *ismembers);
 extern TidStoreIter *TidStoreBeginIterate(TidStore *ts);
 extern TidStoreIterResult *TidStoreIterateNext(TidStoreIter *iter);
 extern int	TidStoreGetBlockOffsets(TidStoreIterResult *result,
-- 
2.43.5

#2 Matheus Alcantara
matheusssilv97@gmail.com
In reply to: Masahiko Sawada (#1)
Re: Batch TIDs lookup in ambulkdelete

Hi,

On 30/04/25 19:36, Masahiko Sawada wrote:

Here are the summary of each attached patch:

0001: Introduce TidStoreIsMemberMulti() where we can do IsMember check
for multiple TIDs in one function call. If the given TIDs are sorted
(at least in block number), we can save radix tree lookup for the same
page entry.

0002: Convert IndexBulkDeleteCallback() to batched operation.

0003: Use batch TIDs lookup in btree index bulk-deletion.

In patch 0003, we implement batch TID lookups for both each
deduplicated index tuple and remaining all regular index tuples, which
seems to be the most straightforward approach. While further
optimizations are possible, such as performing batch TID lookups for
all index tuples on a single page, these could introduce additional
overhead from sorting and re-sorting TIDs. Moreover, considering that
radix tree lookups are relatively inexpensive, the benefits of sorting
TIDs and using TidStoreIsMemberMulti() might be minimal. Nevertheless,
these potential optimizations warrant further evaluation to determine
their actual impact on performance.

Also, the patch includes the batch TIDs lookup support only for btree
indexes but we potentially can improve other index AMs too.

The code looks good and also +1 for the idea. I just have some small
points:
- Maybe it would be good to mention somewhere that the
IndexBulkDeleteCallback() callback returns the number of TIDs
found to be members of the TidStore?
- The vac_tid_reaped() docs may need to be updated?

I also executed meson tests for each patch individually and the 0002
patch is not passing on my machine (MacOs).

Ok: 39
Expected Fail: 0
Fail: 271
Unexpected Pass: 0
Skipped: 22
Timeout: 0

One behaviour that I found by executing the 0002 tests is that it may be
leaking some shared memory segments. I noticed that because, after
executing the tests, I tried to re-execute based on master and all tests
were failing with the "Failed system call was shmget(key=97530599,
size=56, 03600)" error. I also checked the shared memory segments using
"ipcs -m" and it returns some segments that are not returned when I
execute the tests on master (after cleaning up the leaked memory
segments); this also doesn't occur when executing based on 0001 or 0003.

~/d/p/batch-tids-lookup-ambulkdelete ❯❯❯ ipcs -m
IPC status from <running system> as of Tue May 13 18:19:14 -03 2025
T ID KEY MODE OWNER GROUP
Shared Memory:
m 18087936 0x05f873bf --rw------- matheus staff
m 15925250 0x05f966fe --rw------- matheus staff
m 24248325 0x05f9677e --rw------- matheus staff
....

Note that the 0003 patch doesn't have this issue, so in the end I don't
think we will have a problem with this, but it may be good to mention
that although the patches are separate, there is a dependency between
them, which may cause issues on the buildfarm?

--
Matheus Alcantara

#3 John Naylor
johncnaylorls@gmail.com
In reply to: Masahiko Sawada (#1)
Re: Batch TIDs lookup in ambulkdelete

On Thu, May 1, 2025 at 5:36 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

        HEAD      PATCHED   DIFF
case-1: 3,021 ms  2,818 ms  93.29%
case-2: 5,697 ms  5,545 ms  97.34%
case-3: 2,833 ms  2,790 ms  98.48%
case-4: 2,564 ms  2,279 ms  88.86%
case-5: 4,657 ms  4,706 ms  101.04%

1 and 4 look significant -- do the other cases have reproducible
differences or is it just noise?

Here are the summary of each attached patch:

0001: Introduce TidStoreIsMemberMulti() where we can do IsMember check
for multiple TIDs in one function call. If the given TIDs are sorted
(at least in block number), we can save radix tree lookup for the same
page entry.

My only comment is that TidStoreIsMember() is now unused in core (or
maybe just the tests?). It seems like we could just change the API for
it rather than introduce a new function?

0003: Use batch TIDs lookup in btree index bulk-deletion.

In patch 0003, we implement batch TID lookups for both each
deduplicated index tuple and remaining all regular index tuples, which
seems to be the most straightforward approach.

Seems like a good approach. btvacuumpage() needs to sort if there is a
mix of posting tuples and regular index tuples. Was that covered by
any of the tests above?

While further
optimizations are possible, such as performing batch TID lookups for
all index tuples on a single page, these could introduce additional
overhead from sorting and re-sorting TIDs. Moreover, considering that
radix tree lookups are relatively inexpensive, the benefits of sorting
TIDs and using TidStoreIsMemberMulti() might be minimal. Nevertheless,
these potential optimizations warrant further evaluation to determine
their actual impact on performance.

My guess is that always sorting by TID and then back by index tuple
offset is too much overhead to be worth it, but I'm not sure.

--
John Naylor
Amazon Web Services

#4 Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Matheus Alcantara (#2)
4 attachment(s)
Re: Batch TIDs lookup in ambulkdelete

On Tue, May 13, 2025 at 2:26 PM Matheus Alcantara
<matheusssilv97@gmail.com> wrote:

Hi,

On 30/04/25 19:36, Masahiko Sawada wrote:

Here are the summary of each attached patch:

0001: Introduce TidStoreIsMemberMulti() where we can do IsMember check
for multiple TIDs in one function call. If the given TIDs are sorted
(at least in block number), we can save radix tree lookup for the same
page entry.

0002: Convert IndexBulkDeleteCallback() to batched operation.

0003: Use batch TIDs lookup in btree index bulk-deletion.

In patch 0003, we implement batch TID lookups for both each
deduplicated index tuple and remaining all regular index tuples, which
seems to be the most straightforward approach. While further
optimizations are possible, such as performing batch TID lookups for
all index tuples on a single page, these could introduce additional
overhead from sorting and re-sorting TIDs. Moreover, considering that
radix tree lookups are relatively inexpensive, the benefits of sorting
TIDs and using TidStoreIsMemberMulti() might be minimal. Nevertheless,
these potential optimizations warrant further evaluation to determine
their actual impact on performance.

Also, the patch includes the batch TIDs lookup support only for btree
indexes but we potentially can improve other index AMs too.

The code looks good and also +1 for the idea. I just have some small
points:
- Maybe it would be good to mention somewhere that
IndexBulkDeleteCallback() callback returns the number of tids
members found on TidStore?
- The vac_tid_reaped() docs may need to be updated?

Thank you for looking at the patches. I agree with the above comments.

I also executed meson tests for each patch individually and the 0002
patch is not passing on my machine (MacOs).

Ok: 39
Expected Fail: 0
Fail: 271
Unexpected Pass: 0
Skipped: 22
Timeout: 0

One behaviour that I found by executing the 0002 tests is that it may be
leaking some shared memory segments. I notice that because after
executing the tests I tried to re-execute based on master and all tests
were failing with the "Failed system call was shmget(key=97530599,
size=56, 03600)" error. I also checked the shared memory segments using
"ipcs -m" and it returns some segments which is don't returned when I
execute the tests on master (after cleaning the leaked memory segments)
and it also doesn't occur when executing based on 0001 or 0003.

~/d/p/batch-tids-lookup-ambulkdelete ❯❯❯ ipcs -m
IPC status from <running system> as of Tue May 13 18:19:14 -03 2025
T ID KEY MODE OWNER GROUP
Shared Memory:
m 18087936 0x05f873bf --rw------- matheus staff
m 15925250 0x05f966fe --rw------- matheus staff
m 24248325 0x05f9677e --rw------- matheus staff
....

Note that the the 0003 patch don't have this issue so at the end we will
not have problem with this I think, but it maybe be good to mention that
although the patches are separate, there is a dependency between them,
which may cause issues on buildfarm?

Thank you for the report. With the 0001 and 0002 patches, I got a
SEGV. I've fixed this issue in the attached updated version of the
patches. I've confirmed that the patches pass the CI tests, but I'm not
sure whether it fixes the shared memory segment leak problem you
reported. The attached patches incorporate the comments[1] from John as
well.

BTW I found that the constant 'maxblkno' in test_tidstore.sql actually
equals InvalidBlockNumber, not MaxBlockNumber. I don't think it makes
sense for TidStore to use InvalidBlockNumber as a key. The attached
0001 patch fixes it. I think we can fix it separately on HEAD as well
as in the back branches.

Regards,

[1]: /messages/by-id/CANWCAZbiJcwSBCczLfbfiPe1mET+V9PjTZv5VvUBwarLvx1Hfg@mail.gmail.com

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachments:

v2-0004-Use-batch-TIDs-lookup-in-btree-index-bulk-deletio.patch (application/octet-stream)
From d5fb9a861b77710c7de91b20d5b18066cba1ca73 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Tue, 22 Apr 2025 12:25:11 -0700
Subject: [PATCH v2 4/4] Use batch TIDs lookup in btree index bulk-deletion.

TIDs in the posting list are sorted, but the TIDs gathered from
regular index tuples are not.

Author:
Reviewed-by:
Discussion: https://postgr.es/m/
Backpatch-through:
---
 src/backend/access/nbtree/nbtree.c | 109 ++++++++++++++++++-----------
 1 file changed, 70 insertions(+), 39 deletions(-)

diff --git a/src/backend/access/nbtree/nbtree.c b/src/backend/access/nbtree/nbtree.c
index 342dad0ef91..6df855874eb 100644
--- a/src/backend/access/nbtree/nbtree.c
+++ b/src/backend/access/nbtree/nbtree.c
@@ -23,6 +23,7 @@
 #include "access/stratnum.h"
 #include "commands/progress.h"
 #include "commands/vacuum.h"
+#include "common/int.h"
 #include "nodes/execnodes.h"
 #include "pgstat.h"
 #include "storage/bulk_write.h"
@@ -1310,6 +1311,13 @@ btvacuumscan(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
 		IndexFreeSpaceMapVacuum(rel);
 }
 
+/* qsort comparator for sorting OffsetNumbers */
+static int
+cmpOffsetNumbers(const void *a, const void *b)
+{
+	return pg_cmp_u16(*(const OffsetNumber *) a, *(const OffsetNumber *) b);
+}
+
 /*
  * btvacuumpage --- VACUUM one page
  *
@@ -1473,6 +1481,10 @@ backtrack:
 		nhtidslive = 0;
 		if (callback)
 		{
+			OffsetNumber workbuf_offs[MaxIndexTuplesPerPage];
+			ItemPointerData workbuf_htids[MaxIndexTuplesPerPage];
+			int			workbuf_nitem = 0;
+
 			/* btbulkdelete callback tells us what to delete (or update) */
 			for (offnum = minoff;
 				 offnum <= maxoff;
@@ -1486,16 +1498,13 @@ backtrack:
 				Assert(!BTreeTupleIsPivot(itup));
 				if (!BTreeTupleIsPosting(itup))
 				{
-					bool		dead;
-
-					/* Regular tuple, standard table TID representation */
-					if (callback(&itup->t_tid, 1, &dead, callback_state) > 0)
-					{
-						deletable[ndeletable++] = offnum;
-						nhtidsdead++;
-					}
-					else
-						nhtidslive++;
+					/*
+					 * Regular tuple, standard table TID representation. Will
+					 * verify them as a whole later.
+					 */
+					workbuf_offs[workbuf_nitem] = offnum;
+					workbuf_htids[workbuf_nitem] = itup->t_tid;
+					workbuf_nitem++;
 				}
 				else
 				{
@@ -1542,6 +1551,38 @@ backtrack:
 					nhtidslive += nremaining;
 				}
 			}
+
+			if (workbuf_nitem > 0)
+			{
+				bool		workbuf_deletable[MaxIndexTuplesPerPage];
+				bool		need_sort;
+				int			ndels;
+
+				/*
+				 * We will sort the deletable array if there are existing
+				 * offsets as it should be sorted in ascending order (see
+				 * _bt_delitems_vacuum()).
+				 */
+				need_sort = (ndeletable > 0);
+
+				ndels = callback(workbuf_htids, workbuf_nitem, workbuf_deletable,
+								 callback_state);
+				if (ndels > 0)
+				{
+					for (int i = 0; i < workbuf_nitem; i++)
+					{
+						if (workbuf_deletable[i])
+							deletable[ndeletable++] = workbuf_offs[i];
+					}
+
+					if (need_sort)
+						qsort(deletable, ndeletable, sizeof(OffsetNumber), cmpOffsetNumbers);
+
+					nhtidsdead += ndels;
+				}
+
+				nhtidslive += workbuf_nitem - ndels;
+			}
 		}
 
 		/*
@@ -1666,45 +1707,35 @@ static BTVacuumPosting
 btreevacuumposting(BTVacState *vstate, IndexTuple posting,
 				   OffsetNumber updatedoffset, int *nremaining)
 {
-	int			live = 0;
+	int			ndeletable;
 	int			nitem = BTreeTupleGetNPosting(posting);
 	ItemPointer items = BTreeTupleGetPosting(posting);
+	bool		deletable[MaxIndexTuplesPerPage];
 	BTVacuumPosting vacposting = NULL;
 
-	for (int i = 0; i < nitem; i++)
+	ndeletable = vstate->callback(items, nitem, deletable, vstate->callback_state);
+
+	/* All items are live */
+	if (ndeletable == 0)
 	{
-		bool		dead;
+		*nremaining = nitem;
+		return NULL;
+	}
 
-		if (vstate->callback(items + i, 1, &dead, vstate->callback_state) == 0)
-		{
-			/* Live table TID */
-			live++;
-		}
-		else if (vacposting == NULL)
-		{
-			/*
-			 * First dead table TID encountered.
-			 *
-			 * It's now clear that we need to delete one or more dead table
-			 * TIDs, so start maintaining metadata describing how to update
-			 * existing posting list tuple.
-			 */
-			vacposting = palloc(offsetof(BTVacuumPostingData, deletetids) +
-								nitem * sizeof(uint16));
+	vacposting = palloc(offsetof(BTVacuumPostingData, deletetids) +
+						ndeletable * sizeof(uint16));
+	vacposting->itup = posting;
+	vacposting->updatedoffset = updatedoffset;
+	vacposting->ndeletedtids = 0;
 
-			vacposting->itup = posting;
-			vacposting->updatedoffset = updatedoffset;
-			vacposting->ndeletedtids = 0;
-			vacposting->deletetids[vacposting->ndeletedtids++] = i;
-		}
-		else
-		{
-			/* Second or subsequent dead table TID */
+	for (int i = 0; i < nitem; i++)
+	{
+		if (deletable[i])
 			vacposting->deletetids[vacposting->ndeletedtids++] = i;
-		}
 	}
+	Assert(ndeletable == vacposting->ndeletedtids);
 
-	*nremaining = live;
+	*nremaining = nitem - ndeletable;
 	return vacposting;
 }
 
-- 
2.43.5

v2-0001-Fix-maxblkno-constant-to-use-MaxBlockNumber-in-te.patch (application/octet-stream) Download
From 288cffa984e5a5a4350becd6ef91709bb61d78c3 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Thu, 5 Jun 2025 16:28:06 -0700
Subject: [PATCH v2 1/4] Fix maxblkno constant to use MaxBlockNumber in
 test_tidstore.sql.

---
 src/test/modules/test_tidstore/expected/test_tidstore.out | 2 +-
 src/test/modules/test_tidstore/sql/test_tidstore.sql      | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/test/modules/test_tidstore/expected/test_tidstore.out b/src/test/modules/test_tidstore/expected/test_tidstore.out
index cbcacfd26e1..13993bbfe7b 100644
--- a/src/test/modules/test_tidstore/expected/test_tidstore.out
+++ b/src/test/modules/test_tidstore/expected/test_tidstore.out
@@ -2,7 +2,7 @@ CREATE EXTENSION test_tidstore;
 -- To hide the output of do_set_block_offsets()
 CREATE TEMP TABLE hideblocks(blockno bigint);
 -- Constant values used in the tests.
-\set maxblkno 4294967295
+\set maxblkno 0xFFFFFFFE
 -- The maximum number of heap tuples (MaxHeapTuplesPerPage) in 8kB block is 291.
 -- We use a higher number to test tidstore.
 \set maxoffset 512
diff --git a/src/test/modules/test_tidstore/sql/test_tidstore.sql b/src/test/modules/test_tidstore/sql/test_tidstore.sql
index a29e4ec1c55..86770ba316f 100644
--- a/src/test/modules/test_tidstore/sql/test_tidstore.sql
+++ b/src/test/modules/test_tidstore/sql/test_tidstore.sql
@@ -4,7 +4,7 @@ CREATE EXTENSION test_tidstore;
 CREATE TEMP TABLE hideblocks(blockno bigint);
 
 -- Constant values used in the tests.
-\set maxblkno 4294967295
+\set maxblkno 0xFFFFFFFE
 -- The maximum number of heap tuples (MaxHeapTuplesPerPage) in 8kB block is 291.
 -- We use a higher number to test tidstore.
 \set maxoffset 512
-- 
2.43.5

v2-0002-tidstore.c-introduce-TidStoreIsMemberMulti.patch (application/octet-stream) Download
From 5e5a089bc14eb29df8b16da8ebec38344f7fad1c Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Mon, 21 Apr 2025 22:40:57 -0700
Subject: [PATCH v2 2/4] tidstore.c: introduce TidStoreIsMemberMulti.

Author:
Reviewed-by:
Discussion: https://postgr.es/m/
Backpatch-through:
---
 src/backend/access/common/tidstore.c          | 62 +++++++++++++------
 src/backend/commands/vacuum.c                 |  3 +-
 src/include/access/tidstore.h                 |  3 +-
 .../modules/test_tidstore/test_tidstore.c     | 11 +++-
 4 files changed, 57 insertions(+), 22 deletions(-)

diff --git a/src/backend/access/common/tidstore.c b/src/backend/access/common/tidstore.c
index 5bd75fb499c..270ef260630 100644
--- a/src/backend/access/common/tidstore.c
+++ b/src/backend/access/common/tidstore.c
@@ -416,20 +416,11 @@ TidStoreSetBlockOffsets(TidStore *ts, BlockNumber blkno, OffsetNumber *offsets,
 		local_ts_set(ts->tree.local, blkno, page);
 }
 
-/* Return true if the given TID is present in the TidStore */
-bool
-TidStoreIsMember(TidStore *ts, ItemPointer tid)
+static pg_attribute_always_inline int
+tidstore_is_member_page(BlocktableEntry *page, OffsetNumber off)
 {
 	int			wordnum;
 	int			bitnum;
-	BlocktableEntry *page;
-	BlockNumber blk = ItemPointerGetBlockNumber(tid);
-	OffsetNumber off = ItemPointerGetOffsetNumber(tid);
-
-	if (TidStoreIsShared(ts))
-		page = shared_ts_find(ts->tree.shared, blk);
-	else
-		page = local_ts_find(ts->tree.local, blk);
 
 	/* no entry for the blk */
 	if (page == NULL)
@@ -443,19 +434,54 @@ TidStoreIsMember(TidStore *ts, ItemPointer tid)
 			if (page->header.full_offsets[i] == off)
 				return true;
 		}
+
 		return false;
 	}
-	else
+
+	wordnum = WORDNUM(off);
+	bitnum = BITNUM(off);
+
+	/* no bitmap for the off */
+	if (wordnum >= page->header.nwords)
+		return false;
+
+	return (page->words[wordnum] & ((bitmapword) 1 << bitnum)) != 0;
+}
+
+/*
+ * Check if the given TIDs are present in the TidStore. Return the number
+ * of TIDs found in the TidStore. *ismember must have enough space for
+ * up to 'ntids' elements.
+ */
+int
+TidStoreIsMemberMulti(TidStore *ts, ItemPointer tids, int ntids, bool *ismembers)
+{
+	BlocktableEntry *page = NULL;
+	BlockNumber last_blk = InvalidBlockNumber;
+	int			nmembers = 0;
+
+	for (int i = 0; i < ntids; i++)
 	{
-		wordnum = WORDNUM(off);
-		bitnum = BITNUM(off);
+		ItemPointer tid = &(tids[i]);
+		BlockNumber blk = ItemPointerGetBlockNumber(tid);
+		OffsetNumber off = ItemPointerGetOffsetNumber(tid);
 
-		/* no bitmap for the off */
-		if (wordnum >= page->header.nwords)
-			return false;
+		if (blk != last_blk)
+		{
+			if (TidStoreIsShared(ts))
+				page = shared_ts_find(ts->tree.shared, blk);
+			else
+				page = local_ts_find(ts->tree.local, blk);
+		}
 
-		return (page->words[wordnum] & ((bitmapword) 1 << bitnum)) != 0;
+		ismembers[i] = tidstore_is_member_page(page, off);
+		if (ismembers[i])
+			nmembers++;
+
+		last_blk = blk;
 	}
+
+	return nmembers;
 }
 
 /*
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 33a33bf6b1c..e47b61772b1 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -2658,6 +2658,7 @@ static bool
 vac_tid_reaped(ItemPointer itemptr, void *state)
 {
 	TidStore   *dead_items = (TidStore *) state;
+	bool		isdead;
 
-	return TidStoreIsMember(dead_items, itemptr);
+	return TidStoreIsMemberMulti(dead_items, itemptr, 1, &isdead) > 0;
 }
diff --git a/src/include/access/tidstore.h b/src/include/access/tidstore.h
index 041091df278..b6137e78780 100644
--- a/src/include/access/tidstore.h
+++ b/src/include/access/tidstore.h
@@ -40,7 +40,8 @@ extern void TidStoreUnlock(TidStore *ts);
 extern void TidStoreDestroy(TidStore *ts);
 extern void TidStoreSetBlockOffsets(TidStore *ts, BlockNumber blkno, OffsetNumber *offsets,
 									int num_offsets);
-extern bool TidStoreIsMember(TidStore *ts, ItemPointer tid);
+extern int	TidStoreIsMemberMulti(TidStore *ts, ItemPointer tids, int ntids,
+								  bool *ismembers);
 extern TidStoreIter *TidStoreBeginIterate(TidStore *ts);
 extern TidStoreIterResult *TidStoreIterateNext(TidStoreIter *iter);
 extern int	TidStoreGetBlockOffsets(TidStoreIterResult *result,
diff --git a/src/test/modules/test_tidstore/test_tidstore.c b/src/test/modules/test_tidstore/test_tidstore.c
index eb16e0fbfa6..277d437e237 100644
--- a/src/test/modules/test_tidstore/test_tidstore.c
+++ b/src/test/modules/test_tidstore/test_tidstore.c
@@ -222,16 +222,22 @@ check_set_block_offsets(PG_FUNCTION_ARGS)
 	TidStoreIterResult *iter_result;
 	int			num_iter_tids = 0;
 	int			num_lookup_tids = 0;
+	bool	   *ismembers;
 	BlockNumber prevblkno = 0;
 
 	check_tidstore_available();
 
 	/* lookup each member in the verification array */
+	ismembers = palloc(sizeof(bool) * items.num_tids);
+	TidStoreIsMemberMulti(tidstore, items.insert_tids, items.num_tids, ismembers);
 	for (int i = 0; i < items.num_tids; i++)
-		if (!TidStoreIsMember(tidstore, &items.insert_tids[i]))
+	{
+		if (!ismembers[i])
 			elog(ERROR, "missing TID with block %u, offset %u",
 				 ItemPointerGetBlockNumber(&items.insert_tids[i]),
 				 ItemPointerGetOffsetNumber(&items.insert_tids[i]));
+	}
+	pfree(ismembers);
 
 	/*
 	 * Lookup all possible TIDs for each distinct block in the verification
@@ -248,11 +254,12 @@ check_set_block_offsets(PG_FUNCTION_ARGS)
 		for (OffsetNumber offset = FirstOffsetNumber; offset < MaxOffsetNumber; offset++)
 		{
 			ItemPointerData tid;
+			bool		ismember;
 
 			ItemPointerSet(&tid, blkno, offset);
 
 			TidStoreLockShare(tidstore);
-			if (TidStoreIsMember(tidstore, &tid))
+			if (TidStoreIsMemberMulti(tidstore, &tid, 1, &ismember) > 0)
 				ItemPointerSet(&items.lookup_tids[num_lookup_tids++], blkno, offset);
 			TidStoreUnlock(tidstore);
 		}
-- 
2.43.5

v2-0003-Convert-IndexBulkDeleteCallback-to-process-TIDs-i.patch (application/octet-stream) Download
From 4d5eb7a2dc80542cf8c04fecfdce90fa6f553265 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Mon, 21 Apr 2025 22:41:33 -0700
Subject: [PATCH v2 3/4] Convert IndexBulkDeleteCallback to process TIDs in
 batch.

Author:
Reviewed-by:
Discussion: https://postgr.es/m/
Backpatch-through:
---
 contrib/bloom/blvacuum.c              |  4 +++-
 src/backend/access/gin/ginvacuum.c    |  3 ++-
 src/backend/access/gist/gistvacuum.c  |  3 ++-
 src/backend/access/hash/hash.c        |  3 ++-
 src/backend/access/nbtree/nbtree.c    |  8 ++++++--
 src/backend/access/spgist/spgvacuum.c | 13 ++++++++-----
 src/backend/catalog/index.c           | 21 ++++++++++++++-------
 src/backend/commands/vacuum.c         | 11 +++++------
 src/include/access/genam.h            |  8 ++++++--
 9 files changed, 48 insertions(+), 26 deletions(-)

diff --git a/contrib/bloom/blvacuum.c b/contrib/bloom/blvacuum.c
index 86b15a75f6f..93089456855 100644
--- a/contrib/bloom/blvacuum.c
+++ b/contrib/bloom/blvacuum.c
@@ -83,8 +83,10 @@ blbulkdelete(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
 									OffsetNumberNext(BloomPageGetMaxOffset(page)));
 		while (itup < itupEnd)
 		{
+			bool	dead;
+
 			/* Do we have to delete this tuple? */
-			if (callback(&itup->heapPtr, callback_state))
+			if (callback(&itup->heapPtr, 1, &dead, callback_state) > 0)
 			{
 				/* Yes; adjust count of tuples that will be left on page */
 				BloomPageGetOpaque(page)->maxoff--;
diff --git a/src/backend/access/gin/ginvacuum.c b/src/backend/access/gin/ginvacuum.c
index fbbe3a6dd70..336f100261b 100644
--- a/src/backend/access/gin/ginvacuum.c
+++ b/src/backend/access/gin/ginvacuum.c
@@ -56,7 +56,8 @@ ginVacuumItemPointers(GinVacuumState *gvs, ItemPointerData *items,
 	 */
 	for (i = 0; i < nitem; i++)
 	{
-		if (gvs->callback(items + i, gvs->callback_state))
+		bool	deletable;
+		if (gvs->callback(items + i, 1, &deletable, gvs->callback_state) > 0)
 		{
 			gvs->result->tuples_removed += 1;
 			if (!tmpitems)
diff --git a/src/backend/access/gist/gistvacuum.c b/src/backend/access/gist/gistvacuum.c
index dca236b6e57..ab2dfa7fa86 100644
--- a/src/backend/access/gist/gistvacuum.c
+++ b/src/backend/access/gist/gistvacuum.c
@@ -384,8 +384,9 @@ restart:
 			{
 				ItemId		iid = PageGetItemId(page, off);
 				IndexTuple	idxtuple = (IndexTuple) PageGetItem(page, iid);
+				bool	deletable;
 
-				if (callback(&(idxtuple->t_tid), callback_state))
+				if (callback(&(idxtuple->t_tid), 1, &deletable, callback_state) > 0)
 					todelete[ntodelete++] = off;
 			}
 		}
diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index 53061c819fb..e0fffadf698 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -734,6 +734,7 @@ hashbucketcleanup(Relation rel, Bucket cur_bucket, Buffer bucket_buf,
 			IndexTuple	itup;
 			Bucket		bucket;
 			bool		kill_tuple = false;
+			bool		dead;
 
 			itup = (IndexTuple) PageGetItem(page,
 											PageGetItemId(page, offno));
@@ -743,7 +744,7 @@ hashbucketcleanup(Relation rel, Bucket cur_bucket, Buffer bucket_buf,
 			 * To remove the dead tuples, we strictly want to rely on results
 			 * of callback function.  refer btvacuumpage for detailed reason.
 			 */
-			if (callback && callback(htup, callback_state))
+			if (callback && callback(htup, 1, &dead, callback_state) > 0)
 			{
 				kill_tuple = true;
 				if (tuples_removed)
diff --git a/src/backend/access/nbtree/nbtree.c b/src/backend/access/nbtree/nbtree.c
index 765659887af..342dad0ef91 100644
--- a/src/backend/access/nbtree/nbtree.c
+++ b/src/backend/access/nbtree/nbtree.c
@@ -1486,8 +1486,10 @@ backtrack:
 				Assert(!BTreeTupleIsPivot(itup));
 				if (!BTreeTupleIsPosting(itup))
 				{
+					bool		dead;
+
 					/* Regular tuple, standard table TID representation */
-					if (callback(&itup->t_tid, callback_state))
+					if (callback(&itup->t_tid, 1, &dead, callback_state) > 0)
 					{
 						deletable[ndeletable++] = offnum;
 						nhtidsdead++;
@@ -1671,7 +1673,9 @@ btreevacuumposting(BTVacState *vstate, IndexTuple posting,
 
 	for (int i = 0; i < nitem; i++)
 	{
-		if (!vstate->callback(items + i, vstate->callback_state))
+		bool		dead;
+
+		if (vstate->callback(items + i, 1, &dead, vstate->callback_state) == 0)
 		{
 			/* Live table TID */
 			live++;
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index 2678f7ab782..1af1dd8ff01 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -153,9 +153,11 @@ vacuumLeafPage(spgBulkDeleteState *bds, Relation index, Buffer buffer,
 										   PageGetItemId(page, i));
 		if (lt->tupstate == SPGIST_LIVE)
 		{
+			bool	dead;
+
 			Assert(ItemPointerIsValid(&lt->heapPtr));
 
-			if (bds->callback(&lt->heapPtr, bds->callback_state))
+			if (bds->callback(&lt->heapPtr, 1, &dead, bds->callback_state) > 0)
 			{
 				bds->stats->tuples_removed += 1;
 				deletable[i] = true;
@@ -425,9 +427,10 @@ vacuumLeafRoot(spgBulkDeleteState *bds, Relation index, Buffer buffer)
 										   PageGetItemId(page, i));
 		if (lt->tupstate == SPGIST_LIVE)
 		{
+			bool	dead;
 			Assert(ItemPointerIsValid(&lt->heapPtr));
 
-			if (bds->callback(&lt->heapPtr, bds->callback_state))
+			if (bds->callback(&lt->heapPtr, 1, &dead, bds->callback_state) > 0)
 			{
 				bds->stats->tuples_removed += 1;
 				toDelete[xlrec.nDelete] = i;
@@ -966,10 +969,10 @@ spgbulkdelete(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
 }
 
 /* Dummy callback to delete no tuples during spgvacuumcleanup */
-static bool
-dummy_callback(ItemPointer itemptr, void *state)
+static int
+dummy_callback(ItemPointer itemptrs, int nitem, bool *deletable, void *state)
 {
-	return false;
+	return 0;
 }
 
 /*
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 739a92bdcc1..fc40317728f 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -126,7 +126,8 @@ static void index_update_stats(Relation rel,
 static void IndexCheckExclusion(Relation heapRelation,
 								Relation indexRelation,
 								IndexInfo *indexInfo);
-static bool validate_index_callback(ItemPointer itemptr, void *opaque);
+static int validate_index_callback(ItemPointer itemptrs, int nitem, bool *deletable,
+								   void *opaque);
 static bool ReindexIsCurrentlyProcessingIndex(Oid indexOid);
 static void SetReindexProcessing(Oid heapOid, Oid indexOid);
 static void ResetReindexProcessing(void);
@@ -3479,15 +3480,21 @@ validate_index(Oid heapId, Oid indexId, Snapshot snapshot)
 /*
  * validate_index_callback - bulkdelete callback to collect the index TIDs
  */
-static bool
-validate_index_callback(ItemPointer itemptr, void *opaque)
+static int
+validate_index_callback(ItemPointer itemptrs, int nitem, bool *deletable,
+						void *opaque)
 {
 	ValidateIndexState *state = (ValidateIndexState *) opaque;
-	int64		encoded = itemptr_encode(itemptr);
 
-	tuplesort_putdatum(state->tuplesort, Int64GetDatum(encoded), false);
-	state->itups += 1;
-	return false;				/* never actually delete anything */
+	for (int i = 0; i < nitem; i++)
+	{
+		int64		encoded = itemptr_encode(&(itemptrs[i]));
+
+		tuplesort_putdatum(state->tuplesort, Int64GetDatum(encoded), false);
+	}
+	state->itups += nitem;
+
+	return 0;				/* never actually delete anything */
 }
 
 /*
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index e47b61772b1..f93ea87f1c3 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -127,7 +127,7 @@ static bool vacuum_rel(Oid relid, RangeVar *relation, VacuumParams *params,
 					   BufferAccessStrategy bstrategy);
 static double compute_parallel_delay(void);
 static VacOptValue get_vacoptval_from_boolean(DefElem *def);
-static bool vac_tid_reaped(ItemPointer itemptr, void *state);
+static int vac_tid_reaped(ItemPointer itemptrs, int nitem, bool *deletable, void *state);
 
 /*
  * GUC check function to ensure GUC value specified is within the allowable
@@ -2650,15 +2650,14 @@ vac_cleanup_one_index(IndexVacuumInfo *ivinfo, IndexBulkDeleteResult *istat)
 }
 
 /*
- *	vac_tid_reaped() -- is a particular tid deletable?
+ *	vac_tid_reaped() -- are the given TIDs deletable?
  *
  *		This has the right signature to be an IndexBulkDeleteCallback.
  */
-static bool
-vac_tid_reaped(ItemPointer itemptr, void *state)
+static int
+vac_tid_reaped(ItemPointer itemptrs, int nitem, bool *deletable, void *state)
 {
 	TidStore   *dead_items = (TidStore *) state;
-	bool		isdead;
 
-	return TidStoreIsMemberMulti(dead_items, itemptr, 1, &isdead) > 0;
+	return TidStoreIsMemberMulti(dead_items, itemptrs, nitem, deletable);
 }
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 5b2ab181b5f..f96619cf658 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -106,8 +106,12 @@ typedef struct IndexBulkDeleteResult
 	BlockNumber pages_free;		/* # pages available for reuse */
 } IndexBulkDeleteResult;
 
-/* Typedef for callback function to determine if a tuple is bulk-deletable */
-typedef bool (*IndexBulkDeleteCallback) (ItemPointer itemptr, void *state);
+/*
+ * Typedef for callback function to determine if tuples are bulk-deletable.
+ * The function returns the number of deletable tuples.
+ */
+typedef int (*IndexBulkDeleteCallback) (ItemPointer itemptr, int nitem,
+										bool *deletable, void *state);
 
 /* struct definitions appear in relscan.h */
 typedef struct IndexScanDescData *IndexScanDesc;
-- 
2.43.5

#5Masahiko Sawada
sawada.mshk@gmail.com
In reply to: John Naylor (#3)
Re: Batch TIDs lookup in ambulkdelete

On Sun, Jun 1, 2025 at 11:01 PM John Naylor <johncnaylorls@gmail.com> wrote:

On Thu, May 1, 2025 at 5:36 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

         HEAD      PATCHED   DIFF
case-1:  3,021 ms  2,818 ms  93.29%
case-2:  5,697 ms  5,545 ms  97.34%
case-3:  2,833 ms  2,790 ms  98.48%
case-4:  2,564 ms  2,279 ms  88.86%
case-5:  4,657 ms  4,706 ms  101.04%

1 and 4 look significant -- do the other cases have reproducible
differences or is it just noise?

These results are the average of 3 executions, so the differences are
reproducible.

Here are the summary of each attached patch:

0001: Introduce TidStoreIsMemberMulti(), where we can do an IsMember
check for multiple TIDs in one function call. If the given TIDs are
sorted (at least by block number), we can save radix tree lookups for
the same page entry.

My only comment is that TidStoreIsMember() is now unused in core (or
maybe just the tests?). It seems like we could just change the API for
it rather than introduce a new function?

Good point, changed in the latest patch I posted[1].

0003: Use batch TIDs lookup in btree index bulk-deletion.

In patch 0003, we implement batch TID lookups for both each
deduplicated index tuple and remaining all regular index tuples, which
seems to be the most straightforward approach.

Seems like a good approach. btvacuumpage() needs to sort if there is a
mix of posting tuples and regular index tuples. Was that covered by
any of the tests above?

Good point, I think this case was not covered. I've measured the
performance with the following queries:

# Case-6:
create unlogged table test (c int) with (autovacuum_enabled = off);
insert into test select (2 * c - 1) from generate_series(1, ${NROWS}) c;
insert into test select c from generate_series(1, ${NROWS}) c;
create index on test (c);
select pg_relation_size('test') / 8192 as relpages \gset
delete from test where c < (${NROWS} * 0.3)::int;
vacuum test;

And here is the result:

         HEAD      PATCHED   DIFF
case-6:  3,320 ms  3,617 ms  108.94%

I'll consider how to deal with overheads of sorting TIDs.

While further
optimizations are possible, such as performing batch TID lookups for
all index tuples on a single page, these could introduce additional
overhead from sorting and re-sorting TIDs. Moreover, considering that
radix tree lookups are relatively inexpensive, the benefits of sorting
TIDs and using TidStoreIsMemberMulti() might be minimal. Nevertheless,
these potential optimizations warrant further evaluation to determine
their actual impact on performance.

My guess is that always sorting by TID and than back by index tuple
offset is too much overhead to be worth it, but I'm not sure.

Agreed. Given the above test results, it's unlikely that always sorting
the array would help.

Regards,

[1]: /messages/by-id/CAD21AoAv55DhJ+19zaemx-_eO7z+u4gtFmeADsMBFqtHhyUySQ@mail.gmail.com

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#6Peter Geoghegan
In reply to: Masahiko Sawada (#5)
Re: Batch TIDs lookup in ambulkdelete

On Fri, Jun 6, 2025 at 6:58 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Agreed. Given the above test results, it's unlikely always sorting the
array helps speedups.

Did you try specializing the sort? In my experience, it makes a big difference.

--
Peter Geoghegan

#7Junwang Zhao
zhjwpku@gmail.com
In reply to: Masahiko Sawada (#4)
Re: Batch TIDs lookup in ambulkdelete

Hi Masahiko,

On Sat, Jun 7, 2025 at 5:34 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, May 13, 2025 at 2:26 PM Matheus Alcantara
<matheusssilv97@gmail.com> wrote:

Hi,

On 30/04/25 19:36, Masahiko Sawada wrote:

Here are the summary of each attached patch:

0001: Introduce TidStoreIsMemberMulti(), where we can do an IsMember
check for multiple TIDs in one function call. If the given TIDs are
sorted (at least by block number), we can save radix tree lookups for
the same page entry.

0002: Convert IndexBulkDeleteCallback() to a batched operation.

0003: Use batch TIDs lookup in btree index bulk-deletion.

In patch 0003, we implement batch TID lookups for both each
deduplicated index tuple and remaining all regular index tuples, which
seems to be the most straightforward approach. While further
optimizations are possible, such as performing batch TID lookups for
all index tuples on a single page, these could introduce additional
overhead from sorting and re-sorting TIDs. Moreover, considering that
radix tree lookups are relatively inexpensive, the benefits of sorting
TIDs and using TidStoreIsMemberMulti() might be minimal. Nevertheless,
these potential optimizations warrant further evaluation to determine
their actual impact on performance.

Also, the patch includes the batch TIDs lookup support only for btree
indexes but we potentially can improve other index AMs too.

The code looks good and also +1 for the idea. I just have some small
points:
- Maybe it would be good to mention somewhere that
IndexBulkDeleteCallback() callback returns the number of tids
members found on TidStore?
- The vac_tid_reaped() docs may need to be updated?

Thank you for looking at the patches. I agree with the above comments.

I also executed meson tests for each patch individually and the 0002
patch is not passing on my machine (MacOs).

Ok: 39
Expected Fail: 0
Fail: 271
Unexpected Pass: 0
Skipped: 22
Timeout: 0

One behaviour that I found by executing the 0002 tests is that it may be
leaking some shared memory segments. I noticed that because after
executing the tests I tried to re-execute based on master and all tests
were failing with the "Failed system call was shmget(key=97530599,
size=56, 03600)" error. I also checked the shared memory segments using
"ipcs -m" and it returns some segments that are not returned when I
execute the tests on master (after cleaning the leaked memory segments),
and it also doesn't occur when executing based on 0001 or 0003.

~/d/p/batch-tids-lookup-ambulkdelete ❯❯❯ ipcs -m
IPC status from <running system> as of Tue May 13 18:19:14 -03 2025
T ID KEY MODE OWNER GROUP
Shared Memory:
m 18087936 0x05f873bf --rw------- matheus staff
m 15925250 0x05f966fe --rw------- matheus staff
m 24248325 0x05f9677e --rw------- matheus staff
....

Note that the 0003 patch doesn't have this issue, so in the end I think
we will not have a problem with this, but it may be good to mention that
although the patches are separate, there is a dependency between them,
which may cause issues on the buildfarm?

Thank you for the report. With the 0001 and 0002 patches, I got a
SEGV. I've fixed this issue in the attached updated version patches.
I've confirmed that the patches pass CI tests but I'm not sure it
fixes the shared memory segment leak problem you reported. The
attached patches incorporated the comments[1] from John as well.

BTW I found that the constant 'maxblkno' in test_tidstore.sql actually
equals InvalidBlockNumber, not MaxBlockNumber. It doesn't make sense
for TidStore to use InvalidBlockNumber as a key. The attached 0001
patch fixes it. I think we can fix it separately on HEAD as well as in
the back branches.

Regards,

[1] /messages/by-id/CANWCAZbiJcwSBCczLfbfiPe1mET+V9PjTZv5VvUBwarLvx1Hfg@mail.gmail.com

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

+ /*
+ * We will sort the deletable array if there are existing
+ * offsets as it should be sorted in ascending order (see
+ * _bt_delitems_vacuum()).
+ */
+ need_sort = (ndeletable > 0);
+
+ ndels = callback(workbuf_htids, workbuf_nitem, workbuf_deletable,
+ callback_state);
+ if (ndels > 0)
+ {
+ for (int i = 0; i < workbuf_nitem; i++)
+ {
+ if (workbuf_deletable[i])
+ deletable[ndeletable++] = workbuf_offs[i];
+ }
+
+ if (need_sort)
+ qsort(deletable, ndeletable, sizeof(OffsetNumber), cmpOffsetNumbers);
+
+ nhtidsdead += ndels;
+ }

I think the need_sort should be calculated after the callback? Maybe just:

if (ndeletable > 1)
qsort(deletable, ndeletable, sizeof(OffsetNumber), cmpOffsetNumbers);

There is no need to sort when ndeletable is 1, hence comparing against 1 here.

--
Regards
Junwang Zhao

#8Junwang Zhao
zhjwpku@gmail.com
In reply to: Junwang Zhao (#7)
Re: Batch TIDs lookup in ambulkdelete

On Sat, Jun 7, 2025 at 8:46 PM Junwang Zhao <zhjwpku@gmail.com> wrote:

Hi Masahiko,

On Sat, Jun 7, 2025 at 5:34 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, May 13, 2025 at 2:26 PM Matheus Alcantara
<matheusssilv97@gmail.com> wrote:

Hi,

On 30/04/25 19:36, Masahiko Sawada wrote:

Here are the summary of each attached patch:

0001: Introduce TIdStoreIsMemberMulti() where we can do IsMember check
for multiple TIDs in one function call. If the given TIDs are sorted
(at least in block number), we can save radix tree lookup for the same
page entry.

0002: Convert IndexBUlkDeleteCallback() to batched operation.

0003: Use batch TIDs lookup in btree index bulk-deletion.

In patch 0003, we implement batch TID lookups for both each
deduplicated index tuple and remaining all regular index tuples, which
seems to be the most straightforward approach. While further
optimizations are possible, such as performing batch TID lookups for
all index tuples on a single page, these could introduce additional
overhead from sorting and re-sorting TIDs. Moreover, considering that
radix tree lookups are relatively inexpensive, the benefits of sorting
TIDs and using TidStoreIsMemberMulti() might be minimal. Nevertheless,
these potential optimizations warrant further evaluation to determine
their actual impact on performance.

Also, the patch includes the batch TIDs lookup support only for btree
indexes but we potentially can improve other index AMs too.

The code looks good and also +1 for the idea. I just have some small
points:
- Maybe it would be good to mention somewhere that
IndexBulkDeleteCallback() callback returns the number of tids
members found on TidStore?
- The vac_tid_reaped() docs may need to be updated?

Thank you for looking at the patches. I agree with the above comments.

I also executed meson tests for each patch individually and the 0002
patch is not passing on my machine (MacOs).

Ok: 39
Expected Fail: 0
Fail: 271
Unexpected Pass: 0
Skipped: 22
Timeout: 0

One behaviour that I found while executing the 0002 tests is that it
may be leaking some shared memory segments. I noticed that because
after executing the tests I tried to re-execute them based on master
and all tests were failing with the "Failed system call was
shmget(key=97530599, size=56, 03600)" error. I also checked the shared
memory segments using "ipcs -m" and it returns some segments that are
not returned when I execute the tests on master (after cleaning up the
leaked memory segments); it also doesn't occur when executing based on
0001 or 0003.

~/d/p/batch-tids-lookup-ambulkdelete ❯❯❯ ipcs -m
IPC status from <running system> as of Tue May 13 18:19:14 -03 2025
T ID KEY MODE OWNER GROUP
Shared Memory:
m 18087936 0x05f873bf --rw------- matheus staff
m 15925250 0x05f966fe --rw------- matheus staff
m 24248325 0x05f9677e --rw------- matheus staff
....

Note that the 0003 patch doesn't have this issue, so in the end we
should not have a problem with this, but it may be good to mention
that although the patches are separate, there is a dependency between
them, which may cause issues on the buildfarm?

Thank you for the report. With the 0001 and 0002 patches, I got a
SEGV. I've fixed this issue in the attached updated version patches.
I've confirmed that the patches pass CI tests but I'm not sure it
fixes the shared memory segment leak problem you reported. The
attached patches incorporated the comments[1] from John as well.

BTW I found that the constant 'maxblkno' in test_tidstore.sql actually
equals InvalidBlockNumber, not MaxBlockNumber. I don't think it makes
sense for TidStore to use InvalidBlockNumber as a key. The attached
0001 patch fixes it. I think we can fix it separately on HEAD as well
as in the back branches.
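For context, the two constants differ by one, which is the source of the confusion: the old `\set maxblkno 4294967295` value is InvalidBlockNumber rather than the largest valid block number. A minimal check, with the definitions reproduced from src/include/storage/block.h:

```c
#include <stdint.h>

typedef uint32_t BlockNumber;

/* As defined in src/include/storage/block.h */
#define InvalidBlockNumber ((BlockNumber) 0xFFFFFFFF)
#define MaxBlockNumber     ((BlockNumber) 0xFFFFFFFE)
```

So the test's 4294967295 was exercising a block number that can never belong to a real relation, and the fix switches it to 0xFFFFFFFE.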

Regards,

[1] /messages/by-id/CANWCAZbiJcwSBCczLfbfiPe1mET+V9PjTZv5VvUBwarLvx1Hfg@mail.gmail.com

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

+ /*
+ * We will sort the deletable array if there are existing
+ * offsets as it should be sorted in ascending order (see
+ * _bt_delitems_vacuum()).
+ */
+ need_sort = (ndeletable > 0);
+
+ ndels = callback(workbuf_htids, workbuf_nitem, workbuf_deletable,
+ callback_state);
+ if (ndels > 0)
+ {
+ for (int i = 0; i < workbuf_nitem; i++)
+ {
+ if (workbuf_deletable[i])
+ deletable[ndeletable++] = workbuf_offs[i];
+ }
+
+ if (need_sort)
+ qsort(deletable, ndeletable, sizeof(OffsetNumber), cmpOffsetNumbers);
+
+ nhtidsdead += ndels;
+ }

I think the need_sort should be calculated after the callback? Maybe just:

if (ndeletable > 1)
qsort(deletable, ndeletable, sizeof(OffsetNumber), cmpOffsetNumbers);

I think there is no need to sort when ndeletable is 1, hence the comparison against 1 here.

After further study of the patch, I realized I misunderstood the logic
here; qsort is only needed when there are other existing items in
deletable before the callback, so the code LGTM ;)

--
Regards
Junwang Zhao

--
Regards
Junwang Zhao

#9John Naylor
johncnaylorls@gmail.com
In reply to: Masahiko Sawada (#4)
Re: Batch TIDs lookup in ambulkdelete

On Sat, Jun 7, 2025 at 4:34 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

BTW I found that the constant 'maxblkno' in test_tidstore.sql actually
equals InvalidBlockNumber, not MaxBlockNumber. I don't think it makes
sense for TidStore to use InvalidBlockNumber as a key. The attached
0001 patch fixes it. I think we can fix it separately on HEAD as well
as in the back branches.

I don't see a bug here, so I don't see the need for a backpatch -- the
block numbers in the tests are just numbers, they don't refer to
actual relations. I understand the desire to make it closer to
reality, but it seems cosmetic.

--
John Naylor
Amazon Web Services

#10Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Peter Geoghegan (#6)
4 attachment(s)
Re: Batch TIDs lookup in ambulkdelete

On Fri, Jun 6, 2025 at 4:28 PM Peter Geoghegan <pg@bowt.ie> wrote:

On Fri, Jun 6, 2025 at 6:58 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Agreed. Given the above test results, it's unlikely always sorting the
array helps speedups.

Did you try specializing the sort? In my experience, it makes a big difference.

Thank you for the suggestion. I've done the same performance test with
the suggestion. The difference seems to be within an acceptable range.

HEAD PATCHED DIFF
case-6: 3,320 ms 3,374 ms 101.63%

I've attached the patches I used for this evaluation.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachments:

v3-0001-Fix-maxblkno-constant-to-use-MaxBlockNumber-in-te.patch (application/octet-stream)
From 288cffa984e5a5a4350becd6ef91709bb61d78c3 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Thu, 5 Jun 2025 16:28:06 -0700
Subject: [PATCH v3 1/4] Fix maxblkno constant to use MaxBlockNumber in
 test_tidstore.sql.

---
 src/test/modules/test_tidstore/expected/test_tidstore.out | 2 +-
 src/test/modules/test_tidstore/sql/test_tidstore.sql      | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/test/modules/test_tidstore/expected/test_tidstore.out b/src/test/modules/test_tidstore/expected/test_tidstore.out
index cbcacfd26e1..13993bbfe7b 100644
--- a/src/test/modules/test_tidstore/expected/test_tidstore.out
+++ b/src/test/modules/test_tidstore/expected/test_tidstore.out
@@ -2,7 +2,7 @@ CREATE EXTENSION test_tidstore;
 -- To hide the output of do_set_block_offsets()
 CREATE TEMP TABLE hideblocks(blockno bigint);
 -- Constant values used in the tests.
-\set maxblkno 4294967295
+\set maxblkno 0xFFFFFFFE
 -- The maximum number of heap tuples (MaxHeapTuplesPerPage) in 8kB block is 291.
 -- We use a higher number to test tidstore.
 \set maxoffset 512
diff --git a/src/test/modules/test_tidstore/sql/test_tidstore.sql b/src/test/modules/test_tidstore/sql/test_tidstore.sql
index a29e4ec1c55..86770ba316f 100644
--- a/src/test/modules/test_tidstore/sql/test_tidstore.sql
+++ b/src/test/modules/test_tidstore/sql/test_tidstore.sql
@@ -4,7 +4,7 @@ CREATE EXTENSION test_tidstore;
 CREATE TEMP TABLE hideblocks(blockno bigint);
 
 -- Constant values used in the tests.
-\set maxblkno 4294967295
+\set maxblkno 0xFFFFFFFE
 -- The maximum number of heap tuples (MaxHeapTuplesPerPage) in 8kB block is 291.
 -- We use a higher number to test tidstore.
 \set maxoffset 512
-- 
2.43.5

v3-0003-Convert-IndexBulkDeleteCallback-to-process-TIDs-i.patch (application/octet-stream)
From 4d5eb7a2dc80542cf8c04fecfdce90fa6f553265 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Mon, 21 Apr 2025 22:41:33 -0700
Subject: [PATCH v3 3/4] Convert IndexBulkDeleteCallback to process TIDs in
 batch.

Author:
Reviewed-by:
Discussion: https://postgr.es/m/
Backpatch-through:
---
 contrib/bloom/blvacuum.c              |  4 +++-
 src/backend/access/gin/ginvacuum.c    |  3 ++-
 src/backend/access/gist/gistvacuum.c  |  3 ++-
 src/backend/access/hash/hash.c        |  3 ++-
 src/backend/access/nbtree/nbtree.c    |  8 ++++++--
 src/backend/access/spgist/spgvacuum.c | 13 ++++++++-----
 src/backend/catalog/index.c           | 21 ++++++++++++++-------
 src/backend/commands/vacuum.c         | 11 +++++------
 src/include/access/genam.h            |  8 ++++++--
 9 files changed, 48 insertions(+), 26 deletions(-)

diff --git a/contrib/bloom/blvacuum.c b/contrib/bloom/blvacuum.c
index 86b15a75f6f..93089456855 100644
--- a/contrib/bloom/blvacuum.c
+++ b/contrib/bloom/blvacuum.c
@@ -83,8 +83,10 @@ blbulkdelete(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
 									OffsetNumberNext(BloomPageGetMaxOffset(page)));
 		while (itup < itupEnd)
 		{
+			bool	dead;
+
 			/* Do we have to delete this tuple? */
-			if (callback(&itup->heapPtr, callback_state))
+			if (callback(&itup->heapPtr, 1, &dead, callback_state) > 0)
 			{
 				/* Yes; adjust count of tuples that will be left on page */
 				BloomPageGetOpaque(page)->maxoff--;
diff --git a/src/backend/access/gin/ginvacuum.c b/src/backend/access/gin/ginvacuum.c
index fbbe3a6dd70..336f100261b 100644
--- a/src/backend/access/gin/ginvacuum.c
+++ b/src/backend/access/gin/ginvacuum.c
@@ -56,7 +56,8 @@ ginVacuumItemPointers(GinVacuumState *gvs, ItemPointerData *items,
 	 */
 	for (i = 0; i < nitem; i++)
 	{
-		if (gvs->callback(items + i, gvs->callback_state))
+		bool	deletable;
+		if (gvs->callback(items + i, 1, &deletable, gvs->callback_state) > 0)
 		{
 			gvs->result->tuples_removed += 1;
 			if (!tmpitems)
diff --git a/src/backend/access/gist/gistvacuum.c b/src/backend/access/gist/gistvacuum.c
index dca236b6e57..ab2dfa7fa86 100644
--- a/src/backend/access/gist/gistvacuum.c
+++ b/src/backend/access/gist/gistvacuum.c
@@ -384,8 +384,9 @@ restart:
 			{
 				ItemId		iid = PageGetItemId(page, off);
 				IndexTuple	idxtuple = (IndexTuple) PageGetItem(page, iid);
+				bool	deletable;
 
-				if (callback(&(idxtuple->t_tid), callback_state))
+				if (callback(&(idxtuple->t_tid), 1, &deletable, callback_state) > 0)
 					todelete[ntodelete++] = off;
 			}
 		}
diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index 53061c819fb..e0fffadf698 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -734,6 +734,7 @@ hashbucketcleanup(Relation rel, Bucket cur_bucket, Buffer bucket_buf,
 			IndexTuple	itup;
 			Bucket		bucket;
 			bool		kill_tuple = false;
+			bool		dead;
 
 			itup = (IndexTuple) PageGetItem(page,
 											PageGetItemId(page, offno));
@@ -743,7 +744,7 @@ hashbucketcleanup(Relation rel, Bucket cur_bucket, Buffer bucket_buf,
 			 * To remove the dead tuples, we strictly want to rely on results
 			 * of callback function.  refer btvacuumpage for detailed reason.
 			 */
-			if (callback && callback(htup, callback_state))
+			if (callback && callback(htup, 1, &dead, callback_state) > 0)
 			{
 				kill_tuple = true;
 				if (tuples_removed)
diff --git a/src/backend/access/nbtree/nbtree.c b/src/backend/access/nbtree/nbtree.c
index 765659887af..342dad0ef91 100644
--- a/src/backend/access/nbtree/nbtree.c
+++ b/src/backend/access/nbtree/nbtree.c
@@ -1486,8 +1486,10 @@ backtrack:
 				Assert(!BTreeTupleIsPivot(itup));
 				if (!BTreeTupleIsPosting(itup))
 				{
+					bool		dead;
+
 					/* Regular tuple, standard table TID representation */
-					if (callback(&itup->t_tid, callback_state))
+					if (callback(&itup->t_tid, 1, &dead, callback_state) > 0)
 					{
 						deletable[ndeletable++] = offnum;
 						nhtidsdead++;
@@ -1671,7 +1673,9 @@ btreevacuumposting(BTVacState *vstate, IndexTuple posting,
 
 	for (int i = 0; i < nitem; i++)
 	{
-		if (!vstate->callback(items + i, vstate->callback_state))
+		bool		dead;
+
+		if (vstate->callback(items + i, 1, &dead, vstate->callback_state) == 0)
 		{
 			/* Live table TID */
 			live++;
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index 2678f7ab782..1af1dd8ff01 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -153,9 +153,11 @@ vacuumLeafPage(spgBulkDeleteState *bds, Relation index, Buffer buffer,
 										   PageGetItemId(page, i));
 		if (lt->tupstate == SPGIST_LIVE)
 		{
+			bool	dead;
+
 			Assert(ItemPointerIsValid(&lt->heapPtr));
 
-			if (bds->callback(&lt->heapPtr, bds->callback_state))
+			if (bds->callback(&lt->heapPtr, 1, &dead, bds->callback_state) > 0)
 			{
 				bds->stats->tuples_removed += 1;
 				deletable[i] = true;
@@ -425,9 +427,10 @@ vacuumLeafRoot(spgBulkDeleteState *bds, Relation index, Buffer buffer)
 										   PageGetItemId(page, i));
 		if (lt->tupstate == SPGIST_LIVE)
 		{
+			bool	dead;
 			Assert(ItemPointerIsValid(&lt->heapPtr));
 
-			if (bds->callback(&lt->heapPtr, bds->callback_state))
+			if (bds->callback(&lt->heapPtr, 1, &dead, bds->callback_state) > 0)
 			{
 				bds->stats->tuples_removed += 1;
 				toDelete[xlrec.nDelete] = i;
@@ -966,10 +969,10 @@ spgbulkdelete(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
 }
 
 /* Dummy callback to delete no tuples during spgvacuumcleanup */
-static bool
-dummy_callback(ItemPointer itemptr, void *state)
+static int
+dummy_callback(ItemPointer itemptrs, int nitem, bool *deletable, void *state)
 {
-	return false;
+	return 0;
 }
 
 /*
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 739a92bdcc1..fc40317728f 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -126,7 +126,8 @@ static void index_update_stats(Relation rel,
 static void IndexCheckExclusion(Relation heapRelation,
 								Relation indexRelation,
 								IndexInfo *indexInfo);
-static bool validate_index_callback(ItemPointer itemptr, void *opaque);
+static int validate_index_callback(ItemPointer itemptrs, int nitem, bool *deletable,
+								   void *opaque);
 static bool ReindexIsCurrentlyProcessingIndex(Oid indexOid);
 static void SetReindexProcessing(Oid heapOid, Oid indexOid);
 static void ResetReindexProcessing(void);
@@ -3479,15 +3480,21 @@ validate_index(Oid heapId, Oid indexId, Snapshot snapshot)
 /*
  * validate_index_callback - bulkdelete callback to collect the index TIDs
  */
-static bool
-validate_index_callback(ItemPointer itemptr, void *opaque)
+static int
+validate_index_callback(ItemPointer itemptrs, int nitem, bool *deletable,
+						void *opaque)
 {
 	ValidateIndexState *state = (ValidateIndexState *) opaque;
-	int64		encoded = itemptr_encode(itemptr);
 
-	tuplesort_putdatum(state->tuplesort, Int64GetDatum(encoded), false);
-	state->itups += 1;
-	return false;				/* never actually delete anything */
+	for (int i = 0; i < nitem; i++)
+	{
+		int64		encoded = itemptr_encode(&(itemptrs[i]));
+
+		tuplesort_putdatum(state->tuplesort, Int64GetDatum(encoded), false);
+	}
+	state->itups += nitem;
+
+	return 0;				/* never actually delete anything */
 }
 
 /*
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index e47b61772b1..f93ea87f1c3 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -127,7 +127,7 @@ static bool vacuum_rel(Oid relid, RangeVar *relation, VacuumParams *params,
 					   BufferAccessStrategy bstrategy);
 static double compute_parallel_delay(void);
 static VacOptValue get_vacoptval_from_boolean(DefElem *def);
-static bool vac_tid_reaped(ItemPointer itemptr, void *state);
+static int vac_tid_reaped(ItemPointer itemptrs, int nitem, bool *deletable, void *state);
 
 /*
  * GUC check function to ensure GUC value specified is within the allowable
@@ -2650,15 +2650,14 @@ vac_cleanup_one_index(IndexVacuumInfo *ivinfo, IndexBulkDeleteResult *istat)
 }
 
 /*
- *	vac_tid_reaped() -- is a particular tid deletable?
+ *	vac_tid_reaped() -- are the given TIDs deletable?
  *
  *		This has the right signature to be an IndexBulkDeleteCallback.
  */
-static bool
-vac_tid_reaped(ItemPointer itemptr, void *state)
+static int
+vac_tid_reaped(ItemPointer itemptrs, int nitem, bool *deletable, void *state)
 {
 	TidStore   *dead_items = (TidStore *) state;
-	bool		isdead;
 
-	return TidStoreIsMemberMulti(dead_items, itemptr, 1, &isdead) > 0;
+	return TidStoreIsMemberMulti(dead_items, itemptrs, nitem, deletable);
 }
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 5b2ab181b5f..f96619cf658 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -106,8 +106,12 @@ typedef struct IndexBulkDeleteResult
 	BlockNumber pages_free;		/* # pages available for reuse */
 } IndexBulkDeleteResult;
 
-/* Typedef for callback function to determine if a tuple is bulk-deletable */
-typedef bool (*IndexBulkDeleteCallback) (ItemPointer itemptr, void *state);
+/*
+ * Typedef for callback function to determine if tuples are bulk-deletable.
+ * The function returns the number of deletable tuples.
+ */
+typedef int (*IndexBulkDeleteCallback) (ItemPointer itemptr, int nitem,
+										bool *deletable, void *state);
 
 /* struct definitions appear in relscan.h */
 typedef struct IndexScanDescData *IndexScanDesc;
-- 
2.43.5

v3-0002-tidstore.c-introduce-TidStoreIsMemberMulti.patch (application/octet-stream)
From 5e5a089bc14eb29df8b16da8ebec38344f7fad1c Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Mon, 21 Apr 2025 22:40:57 -0700
Subject: [PATCH v3 2/4] tidstore.c: introduce TidStoreIsMemberMulti.

Author:
Reviewed-by:
Discussion: https://postgr.es/m/
Backpatch-through:
---
 src/backend/access/common/tidstore.c          | 62 +++++++++++++------
 src/backend/commands/vacuum.c                 |  3 +-
 src/include/access/tidstore.h                 |  3 +-
 .../modules/test_tidstore/test_tidstore.c     | 11 +++-
 4 files changed, 57 insertions(+), 22 deletions(-)

diff --git a/src/backend/access/common/tidstore.c b/src/backend/access/common/tidstore.c
index 5bd75fb499c..270ef260630 100644
--- a/src/backend/access/common/tidstore.c
+++ b/src/backend/access/common/tidstore.c
@@ -416,20 +416,11 @@ TidStoreSetBlockOffsets(TidStore *ts, BlockNumber blkno, OffsetNumber *offsets,
 		local_ts_set(ts->tree.local, blkno, page);
 }
 
-/* Return true if the given TID is present in the TidStore */
-bool
-TidStoreIsMember(TidStore *ts, ItemPointer tid)
+static pg_attribute_always_inline int
+tidstore_is_member_page(BlocktableEntry *page, OffsetNumber off)
 {
 	int			wordnum;
 	int			bitnum;
-	BlocktableEntry *page;
-	BlockNumber blk = ItemPointerGetBlockNumber(tid);
-	OffsetNumber off = ItemPointerGetOffsetNumber(tid);
-
-	if (TidStoreIsShared(ts))
-		page = shared_ts_find(ts->tree.shared, blk);
-	else
-		page = local_ts_find(ts->tree.local, blk);
 
 	/* no entry for the blk */
 	if (page == NULL)
@@ -443,19 +434,54 @@ TidStoreIsMember(TidStore *ts, ItemPointer tid)
 			if (page->header.full_offsets[i] == off)
 				return true;
 		}
+
 		return false;
 	}
-	else
+
+	wordnum = WORDNUM(off);
+	bitnum = BITNUM(off);
+
+	/* no bitmap for the off */
+	if (wordnum >= page->header.nwords)
+		return false;
+
+	return (page->words[wordnum] & ((bitmapword) 1 << bitnum)) != 0;
+}
+
+/*
+ * Check if the given TIDs are present in the TidStore. Return the number
+ * of TIDs found in the TidStore. *ismember must have enough space for
+ * up to 'ntids' elements.
+ */
+int
+TidStoreIsMemberMulti(TidStore *ts, ItemPointer tids, int ntids, bool *ismembers)
+{
+	BlocktableEntry *page = NULL;
+	BlockNumber last_blk = InvalidBlockNumber;
+	int			nmembers = 0;
+
+	for (int i = 0; i < ntids; i++)
 	{
-		wordnum = WORDNUM(off);
-		bitnum = BITNUM(off);
+		ItemPointer tid = &(tids[i]);
+		BlockNumber blk = ItemPointerGetBlockNumber(tid);
+		OffsetNumber off = ItemPointerGetOffsetNumber(tid);
 
-		/* no bitmap for the off */
-		if (wordnum >= page->header.nwords)
-			return false;
+		if (blk != last_blk)
+		{
+			if (TidStoreIsShared(ts))
+				page = shared_ts_find(ts->tree.shared, blk);
+			else
+				page = local_ts_find(ts->tree.local, blk);
+		}
 
-		return (page->words[wordnum] & ((bitmapword) 1 << bitnum)) != 0;
+		ismembers[i] = tidstore_is_member_page(page, off);
+		if (ismembers[i])
+			nmembers++;
+
+		last_blk = blk;
 	}
+
+	return nmembers;
 }
 
 /*
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 33a33bf6b1c..e47b61772b1 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -2658,6 +2658,7 @@ static bool
 vac_tid_reaped(ItemPointer itemptr, void *state)
 {
 	TidStore   *dead_items = (TidStore *) state;
+	bool		isdead;
 
-	return TidStoreIsMember(dead_items, itemptr);
+	return TidStoreIsMemberMulti(dead_items, itemptr, 1, &isdead) > 0;
 }
diff --git a/src/include/access/tidstore.h b/src/include/access/tidstore.h
index 041091df278..b6137e78780 100644
--- a/src/include/access/tidstore.h
+++ b/src/include/access/tidstore.h
@@ -40,7 +40,8 @@ extern void TidStoreUnlock(TidStore *ts);
 extern void TidStoreDestroy(TidStore *ts);
 extern void TidStoreSetBlockOffsets(TidStore *ts, BlockNumber blkno, OffsetNumber *offsets,
 									int num_offsets);
-extern bool TidStoreIsMember(TidStore *ts, ItemPointer tid);
+extern int	TidStoreIsMemberMulti(TidStore *ts, ItemPointer tids, int ntids,
+								  bool *ismembers);
 extern TidStoreIter *TidStoreBeginIterate(TidStore *ts);
 extern TidStoreIterResult *TidStoreIterateNext(TidStoreIter *iter);
 extern int	TidStoreGetBlockOffsets(TidStoreIterResult *result,
diff --git a/src/test/modules/test_tidstore/test_tidstore.c b/src/test/modules/test_tidstore/test_tidstore.c
index eb16e0fbfa6..277d437e237 100644
--- a/src/test/modules/test_tidstore/test_tidstore.c
+++ b/src/test/modules/test_tidstore/test_tidstore.c
@@ -222,16 +222,22 @@ check_set_block_offsets(PG_FUNCTION_ARGS)
 	TidStoreIterResult *iter_result;
 	int			num_iter_tids = 0;
 	int			num_lookup_tids = 0;
+	bool	   *ismembers;
 	BlockNumber prevblkno = 0;
 
 	check_tidstore_available();
 
 	/* lookup each member in the verification array */
+	ismembers = palloc(sizeof(bool) * items.num_tids);
+	TidStoreIsMemberMulti(tidstore, items.insert_tids, items.num_tids, ismembers);
 	for (int i = 0; i < items.num_tids; i++)
-		if (!TidStoreIsMember(tidstore, &items.insert_tids[i]))
+	{
+		if (!ismembers[i])
 			elog(ERROR, "missing TID with block %u, offset %u",
 				 ItemPointerGetBlockNumber(&items.insert_tids[i]),
 				 ItemPointerGetOffsetNumber(&items.insert_tids[i]));
+	}
+	pfree(ismembers);
 
 	/*
 	 * Lookup all possible TIDs for each distinct block in the verification
@@ -248,11 +254,12 @@ check_set_block_offsets(PG_FUNCTION_ARGS)
 		for (OffsetNumber offset = FirstOffsetNumber; offset < MaxOffsetNumber; offset++)
 		{
 			ItemPointerData tid;
+			bool		ismember;
 
 			ItemPointerSet(&tid, blkno, offset);
 
 			TidStoreLockShare(tidstore);
-			if (TidStoreIsMember(tidstore, &tid))
+			if (TidStoreIsMemberMulti(tidstore, &tid, 1, &ismember) > 0)
 				ItemPointerSet(&items.lookup_tids[num_lookup_tids++], blkno, offset);
 			TidStoreUnlock(tidstore);
 		}
-- 
2.43.5

v3-0004-Use-batch-TIDs-lookup-in-btree-index-bulk-deletio.patch (application/octet-stream)
From 5216df17cfdba975ab3264ad484dc5d0830a9188 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Tue, 22 Apr 2025 12:25:11 -0700
Subject: [PATCH v3 4/4] Use batch TIDs lookup in btree index bulk-deletion.

TIDs in the postlist are sorted. But TIDs of the gathered regular
index tuples are not sorted.

Author:
Reviewed-by:
Discussion: https://postgr.es/m/
Backpatch-through:
---
 src/backend/access/nbtree/nbtree.c | 116 +++++++++++++++++++----------
 1 file changed, 77 insertions(+), 39 deletions(-)

diff --git a/src/backend/access/nbtree/nbtree.c b/src/backend/access/nbtree/nbtree.c
index 342dad0ef91..a459c700446 100644
--- a/src/backend/access/nbtree/nbtree.c
+++ b/src/backend/access/nbtree/nbtree.c
@@ -23,6 +23,7 @@
 #include "access/stratnum.h"
 #include "commands/progress.h"
 #include "commands/vacuum.h"
+#include "common/int.h"
 #include "nodes/execnodes.h"
 #include "pgstat.h"
 #include "storage/bulk_write.h"
@@ -1310,6 +1311,20 @@ btvacuumscan(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
 		IndexFreeSpaceMapVacuum(rel);
 }
 
+/* comparator for sorting OffsetNumbers */
+static inline int
+cmpOffsetNumbers(const void *a, const void *b)
+{
+	return pg_cmp_u16(*(const OffsetNumber *) a, *(const OffsetNumber *) b);
+}
+
+#define ST_SORT sort_vacuumpage_offnum
+#define ST_ELEMENT_TYPE OffsetNumber
+#define ST_COMPARE(a, b) cmpOffsetNumbers(a, b)
+#define ST_SCOPE static
+#define ST_DEFINE
+#include "lib/sort_template.h"
+
 /*
  * btvacuumpage --- VACUUM one page
  *
@@ -1473,6 +1488,10 @@ backtrack:
 		nhtidslive = 0;
 		if (callback)
 		{
+			OffsetNumber workbuf_offs[MaxIndexTuplesPerPage];
+			ItemPointerData workbuf_htids[MaxIndexTuplesPerPage];
+			int			workbuf_nitem = 0;
+
 			/* btbulkdelete callback tells us what to delete (or update) */
 			for (offnum = minoff;
 				 offnum <= maxoff;
@@ -1486,16 +1505,13 @@ backtrack:
 				Assert(!BTreeTupleIsPivot(itup));
 				if (!BTreeTupleIsPosting(itup))
 				{
-					bool		dead;
-
-					/* Regular tuple, standard table TID representation */
-					if (callback(&itup->t_tid, 1, &dead, callback_state) > 0)
-					{
-						deletable[ndeletable++] = offnum;
-						nhtidsdead++;
-					}
-					else
-						nhtidslive++;
+					/*
+					 * Regular tuple, standard table TID representation. Will
+					 * verify them as a whole later.
+					 */
+					workbuf_offs[workbuf_nitem] = offnum;
+					workbuf_htids[workbuf_nitem] = itup->t_tid;
+					workbuf_nitem++;
 				}
 				else
 				{
@@ -1542,6 +1558,38 @@ backtrack:
 					nhtidslive += nremaining;
 				}
 			}
+
+			if (workbuf_nitem > 0)
+			{
+				bool		workbuf_deletable[MaxIndexTuplesPerPage];
+				bool		need_sort;
+				int			ndels;
+
+				/*
+				 * We will sort the deletable array if there are existing
+				 * offsets as it should be sorted in ascending order (see
+				 * _bt_delitems_vacuum()).
+				 */
+				need_sort = (ndeletable > 0);
+
+				ndels = callback(workbuf_htids, workbuf_nitem, workbuf_deletable,
+								 callback_state);
+				if (ndels > 0)
+				{
+					for (int i = 0; i < workbuf_nitem; i++)
+					{
+						if (workbuf_deletable[i])
+							deletable[ndeletable++] = workbuf_offs[i];
+					}
+
+					if (need_sort)
+						sort_vacuumpage_offnum(deletable, ndeletable);
+
+					nhtidsdead += ndels;
+				}
+
+				nhtidslive += workbuf_nitem - ndels;
+			}
 		}
 
 		/*
@@ -1666,45 +1714,35 @@ static BTVacuumPosting
 btreevacuumposting(BTVacState *vstate, IndexTuple posting,
 				   OffsetNumber updatedoffset, int *nremaining)
 {
-	int			live = 0;
+	int			ndeletable;
 	int			nitem = BTreeTupleGetNPosting(posting);
 	ItemPointer items = BTreeTupleGetPosting(posting);
+	bool		deletable[MaxIndexTuplesPerPage];
 	BTVacuumPosting vacposting = NULL;
 
-	for (int i = 0; i < nitem; i++)
+	ndeletable = vstate->callback(items, nitem, deletable, vstate->callback_state);
+
+	/* All items are live */
+	if (ndeletable == 0)
 	{
-		bool		dead;
+		*nremaining = nitem;
+		return NULL;
+	}
 
-		if (vstate->callback(items + i, 1, &dead, vstate->callback_state) == 0)
-		{
-			/* Live table TID */
-			live++;
-		}
-		else if (vacposting == NULL)
-		{
-			/*
-			 * First dead table TID encountered.
-			 *
-			 * It's now clear that we need to delete one or more dead table
-			 * TIDs, so start maintaining metadata describing how to update
-			 * existing posting list tuple.
-			 */
-			vacposting = palloc(offsetof(BTVacuumPostingData, deletetids) +
-								nitem * sizeof(uint16));
+	vacposting = palloc(offsetof(BTVacuumPostingData, deletetids) +
+						ndeletable * sizeof(uint16));
+	vacposting->itup = posting;
+	vacposting->updatedoffset = updatedoffset;
+	vacposting->ndeletedtids = 0;
 
-			vacposting->itup = posting;
-			vacposting->updatedoffset = updatedoffset;
-			vacposting->ndeletedtids = 0;
-			vacposting->deletetids[vacposting->ndeletedtids++] = i;
-		}
-		else
-		{
-			/* Second or subsequent dead table TID */
+	for (int i = 0; i < nitem; i++)
+	{
+		if (deletable[i])
 			vacposting->deletetids[vacposting->ndeletedtids++] = i;
-		}
 	}
+	Assert(ndeletable == vacposting->ndeletedtids);
 
-	*nremaining = live;
+	*nremaining = nitem - ndeletable;
 	return vacposting;
 }
 
-- 
2.43.5

#11Matheus Alcantara
matheusssilv97@gmail.com
In reply to: Masahiko Sawada (#4)
Re: Batch TIDs lookup in ambulkdelete

On 06/06/25 18:34, Masahiko Sawada wrote:

Thank you for the report. With the 0001 and 0002 patches, I got a
SEGV. I've fixed this issue in the attached updated version patches.
I've confirmed that the patches pass CI tests but I'm not sure it
fixes the shared memory segment leak problem you reported. The
attached patches incorporated the comments[1] from John as well.

Thanks for the new version. I've tested the v3 version attached on [1]
and I can confirm that the tests are now passing and I can't see the
shared memory issues that I mentioned on v1.

Thanks!

[1]: /messages/by-id/CAD21AoAcfp5kdcsT5727Vw4JF-Rw7b73zXh_GwXqNHg3P7-UoA@mail.gmail.com

--
Matheus Alcantara

#12Masahiko Sawada
sawada.mshk@gmail.com
In reply to: John Naylor (#9)
Re: Batch TIDs lookup in ambulkdelete

On Mon, Jun 9, 2025 at 4:03 AM John Naylor <johncnaylorls@gmail.com> wrote:

On Sat, Jun 7, 2025 at 4:34 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

BTW I found that the constant 'maxblkno' in test_tidstore.sql actually
equals InvalidBlockNumber, not MaxBlockNumber. I don't think it makes
sense for TidStore to use InvalidBlockNumber as a key. The attached
0001 patch fixes it. I think we can fix it separately on HEAD as well
as in the back branches.

I don't see a bug here, so I don't see the need for a backpatch -- the
block numbers in the tests are just numbers, they don't refer to
actual relations. I understand the desire to make it closer to
reality, but it seems cosmetic.

Okay, the patch fixes the test and it's not a bug. Related to this, I
realized that TidStoreSetBlockOffsets() checks whether the given
offset number is a valid OffsetNumber but doesn't do that for the
given block number. That's not a bug either, but I think we can add a
check along with the test change.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#13John Naylor
johncnaylorls@gmail.com
In reply to: Masahiko Sawada (#12)
Re: Batch TIDs lookup in ambulkdelete

On Thu, Jun 12, 2025 at 1:16 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Related to this, I
realized that TidStoreSetBlockOffsets() checks if the given offset
number is a valid OffsetNumber

To be specific, it does not check OffsetNumberIsValid; rather, it
checks that the offset can actually be stored in TidStore's bitmap
array (see the comment above MAX_OFFSET_IN_BITMAP). In practice, I
think it would only matter with e.g. the combination of a 32-bit
platform and 32kB page size, and even then only if some
fork/AM/future core change allowed about half the page to fill up
with line pointers, since INT8_MAX * BITS_PER_BITMAPWORD *
sizeof(ItemIdData) = 16256
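Spelling out that arithmetic under the stated assumptions (32-bit bitmapword, 32kB pages, 4-byte line pointers; the macros below mirror but are not the actual PostgreSQL definitions):

```c
#include <stdint.h>

/* Assumptions for the scenario described above */
#define BITS_PER_BITMAPWORD 32      /* bitmapword is 32 bits on this platform */
#define SIZEOF_ITEMIDDATA   4       /* sizeof(ItemIdData) */
#define BLCKSZ              32768   /* 32kB page size */

/*
 * Bytes of line pointers needed to exceed what an int8 "nwords" can
 * address: INT8_MAX words of BITS_PER_BITMAPWORD offsets each, with one
 * line pointer per offset.
 */
static int
line_pointer_bytes(void)
{
    return INT8_MAX * BITS_PER_BITMAPWORD * SIZEOF_ITEMIDDATA;
}
```

That works out to 127 * 32 * 4 = 16256 bytes, just under half of a 32kB page, which is why roughly half the page would have to fill with line pointers before the limit could be hit.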

We should probably turn that elog into an assert. Maybe it would be
clearer if we just asserted the bitmap index is within range. We could
be more sure about the assertion if we changed "nwords" to be
unsigned. I think I had the idea to reserve -1 etc for a possible
special meaning, but I'm now thinking there's no use for that, since
we already have a separate struct member for flags.

but doesn't do that for the given block number. That's not a bug
either, but I think we can add a check along with the test change.

The difference here is that no configuration I can think of will cause
an unstorable block number to arrive here by accident. In fact, I
imagine TidStore would work just fine with 64-bit block numbers.

--
John Naylor
Amazon Web Services