Emit fewer vacuum records by reaping removable tuples during pruning

Started by Melanie Plageman · about 2 years ago · 103 messages
#1 Melanie Plageman
melanieplageman@gmail.com
3 attachment(s)

Hi,

When there are no indexes on the relation, we can set would-be dead
items LP_UNUSED and remove them during pruning. This saves us a vacuum
WAL record, reducing WAL volume (and time spent writing and syncing
WAL).

See this example:

drop table if exists foo;
create table foo(a int) with (autovacuum_enabled=false);
insert into foo select i from generate_series(1,10000000)i;
update foo set a = 10;
\timing on
vacuum foo;

On my machine, the attached patch set provides a 10% speedup for vacuum
for this example -- and a 40% decrease in WAL bytes emitted.
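
One rough way to measure the WAL emitted by the vacuum itself (on an
otherwise idle instance, so the LSN delta is dominated by this command) is
to bracket it with the WAL insert position, e.g. replacing the final step
above with:

select pg_current_wal_insert_lsn() as start_lsn \gset
vacuum foo;
select pg_wal_lsn_diff(pg_current_wal_insert_lsn(), :'start_lsn') as vacuum_wal_bytes;

Comparing the result on master vs. with the patches applied should show
the reduction.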

Admittedly, this case is probably unusual in the real world. On-access
pruning would preclude it. Throw a SELECT * FROM foo before the vacuum
and the patch has no performance benefit.
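
(Concretely, with the script above that would be:

update foo set a = 10;
select * from foo;  -- on-access pruning removes the dead tuples here
vacuum foo;

since the scan already pruned the pages before vacuum visits them.)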

However, it has no downside as far as I can tell. And, IMHO, it is a
code clarity improvement. This change means that lazy_vacuum_heap_page()
is only called when we are actually doing a second pass and reaping dead
items. I found it quite confusing that lazy_vacuum_heap_page() was
called by lazy_scan_heap() to set dead items unused in a block that we
just pruned.

I think it also makes it clear that we should update the VM in
lazy_scan_prune(). All callers of lazy_scan_prune() will now consider
updating the VM after returning. And most of the state communicated back
to lazy_scan_heap() from lazy_scan_prune() is to inform it whether or
not to update the VM. I didn't do that in this patch set because I would
need to pass all_visible_according_to_vm to lazy_scan_prune() and that
change didn't seem worth the improvement in code clarity in
lazy_scan_heap().

I am planning to add a VM update into the freeze record, at which point
I will move the VM update code into lazy_scan_prune(). This will then
allow us to consolidate the freespace map update code for the prune and
noprune cases and make lazy_scan_heap() short and sweet.

Note that (on principle) this patch set is on top of the bug fix I
proposed in [1].

- Melanie

[1]: /messages/by-id/CAAKRu_YiL=44GvGnt1dpYouDSSoV7wzxVoXs8m3p311rp-TVQQ@mail.gmail.com

Attachments:

v1-0003-Set-would-be-dead-items-LP_UNUSED-while-pruning.patch (application/x-patch)
From 15e826c114f2df220c8e08ea0ad045bd462e43da Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 13 Nov 2023 14:10:05 -0500
Subject: [PATCH v1 3/3] Set would-be dead items LP_UNUSED while pruning

If there are no indexes on a relation, items can be marked LP_UNUSED
instead of LP_DEAD during lazy_scan_prune(). This avoids a separate
invocation of lazy_vacuum_heap_page() and saves a vacuum WAL record.

To accomplish this, pass lazy_scan_prune() a new parameter, pronto_reap,
which indicates that dead line pointers should be set to LP_UNUSED
during pruning, allowing earlier reaping of tuples.

Because we don't update the freespace map until after dropping the lock
on the buffer and we need the lock while we update the visibility map,
save our intent to update the freespace map in
LVPagePruneState.recordfreespace.
---
 src/backend/access/heap/heapam.c     |   9 +-
 src/backend/access/heap/pruneheap.c  |  77 ++++++++++++---
 src/backend/access/heap/vacuumlazy.c | 140 +++++++++++----------------
 src/include/access/heapam.h          |   3 +-
 4 files changed, 125 insertions(+), 104 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 14de8158d4..b578c32eeb 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -8803,8 +8803,13 @@ heap_xlog_prune(XLogReaderState *record)
 		nunused = (end - nowunused);
 		Assert(nunused >= 0);
 
-		/* Update all line pointers per the record, and repair fragmentation */
-		heap_page_prune_execute(buffer,
+		/*
+		 * Update all line pointers per the record, and repair fragmentation.
+		 * We always pass pronto_reap as true, because we don't know whether
+		 * or not this option was used when pruning. This reduces the
+		 * validation done on replay in an assert build.
+		 */
+		heap_page_prune_execute(buffer, true,
 								redirected, nredirected,
 								nowdead, ndead,
 								nowunused, nunused);
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index c5f1abd95a..750a03c925 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -35,6 +35,8 @@ typedef struct
 
 	/* tuple visibility test, initialized for the relation */
 	GlobalVisState *vistest;
+	/* whether or not dead items can be set LP_UNUSED during pruning */
+	bool		pronto_reap;
 
 	TransactionId new_prune_xid;	/* new prune hint value for page */
 	TransactionId snapshotConflictHorizon;	/* latest xid removed */
@@ -148,7 +150,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 		{
 			PruneResult presult;
 
-			heap_page_prune(relation, buffer, vistest, &presult, NULL);
+			heap_page_prune(relation, buffer, vistest, false,
+							&presult, NULL);
 
 			/*
 			 * Report the number of tuples reclaimed to pgstats.  This is
@@ -193,6 +196,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
  * (see heap_prune_satisfies_vacuum and
  * HeapTupleSatisfiesVacuum).
  *
+ * pronto_reap indicates whether or not dead items can be set LP_UNUSED during
+ * pruning.
+ *
  * off_loc is the offset location required by the caller to use in error
  * callback.
  *
@@ -203,6 +209,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 void
 heap_page_prune(Relation relation, Buffer buffer,
 				GlobalVisState *vistest,
+				bool pronto_reap,
 				PruneResult *presult,
 				OffsetNumber *off_loc)
 {
@@ -227,6 +234,7 @@ heap_page_prune(Relation relation, Buffer buffer,
 	prstate.new_prune_xid = InvalidTransactionId;
 	prstate.rel = relation;
 	prstate.vistest = vistest;
+	prstate.pronto_reap = pronto_reap;
 	prstate.snapshotConflictHorizon = InvalidTransactionId;
 	prstate.nredirected = prstate.ndead = prstate.nunused = 0;
 	memset(prstate.marked, 0, sizeof(prstate.marked));
@@ -306,9 +314,9 @@ heap_page_prune(Relation relation, Buffer buffer,
 		if (off_loc)
 			*off_loc = offnum;
 
-		/* Nothing to do if slot is empty or already dead */
+		/* Nothing to do if slot is empty */
 		itemid = PageGetItemId(page, offnum);
-		if (!ItemIdIsUsed(itemid) || ItemIdIsDead(itemid))
+		if (!ItemIdIsUsed(itemid))
 			continue;
 
 		/* Process this item or chain of items */
@@ -330,7 +338,7 @@ heap_page_prune(Relation relation, Buffer buffer,
 		 * Apply the planned item changes, then repair page fragmentation, and
 		 * update the page's hint bit about whether it has free line pointers.
 		 */
-		heap_page_prune_execute(buffer,
+		heap_page_prune_execute(buffer, prstate.pronto_reap,
 								prstate.redirected, prstate.nredirected,
 								prstate.nowdead, prstate.ndead,
 								prstate.nowunused, prstate.nunused);
@@ -581,7 +589,17 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
 		 * function.)
 		 */
 		if (ItemIdIsDead(lp))
+		{
+			/*
+			 * If the relation has no indexes, we can set dead line pointers
+			 * LP_UNUSED now. We don't increment ndeleted here since the LP
+			 * was already marked dead.
+			 */
+			if (prstate->pronto_reap)
+				heap_prune_record_unused(prstate, offnum);
+
 			break;
+		}
 
 		Assert(ItemIdIsNormal(lp));
 		htup = (HeapTupleHeader) PageGetItem(dp, lp);
@@ -715,7 +733,17 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
 		 * redirect the root to the correct chain member.
 		 */
 		if (i >= nchain)
-			heap_prune_record_dead(prstate, rootoffnum);
+		{
+			/*
+			 * If the relation has no indexes, we can remove dead tuples
+			 * during pruning instead of marking their line pointers dead. Set
+			 * this tuple's line pointer LP_UNUSED.
+			 */
+			if (prstate->pronto_reap)
+				heap_prune_record_unused(prstate, rootoffnum);
+			else
+				heap_prune_record_dead(prstate, rootoffnum);
+		}
 		else
 			heap_prune_record_redirect(prstate, rootoffnum, chainitems[i]);
 	}
@@ -726,9 +754,12 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
 		 * item.  This can happen if the loop in heap_page_prune caused us to
 		 * visit the dead successor of a redirect item before visiting the
 		 * redirect item.  We can clean up by setting the redirect item to
-		 * DEAD state.
+		 * DEAD state. If pronto_reap is true, we can set it LP_UNUSED now.
 		 */
-		heap_prune_record_dead(prstate, rootoffnum);
+		if (prstate->pronto_reap)
+			heap_prune_record_unused(prstate, rootoffnum);
+		else
+			heap_prune_record_dead(prstate, rootoffnum);
 	}
 
 	return ndeleted;
@@ -792,7 +823,7 @@ heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum)
  * buffer.
  */
 void
-heap_page_prune_execute(Buffer buffer,
+heap_page_prune_execute(Buffer buffer, bool pronto_reap,
 						OffsetNumber *redirected, int nredirected,
 						OffsetNumber *nowdead, int ndead,
 						OffsetNumber *nowunused, int nunused)
@@ -902,14 +933,28 @@ heap_page_prune_execute(Buffer buffer,
 
 #ifdef USE_ASSERT_CHECKING
 
-		/*
-		 * Only heap-only tuples can become LP_UNUSED during pruning.  They
-		 * don't need to be left in place as LP_DEAD items until VACUUM gets
-		 * around to doing index vacuuming.
-		 */
-		Assert(ItemIdHasStorage(lp) && ItemIdIsNormal(lp));
-		htup = (HeapTupleHeader) PageGetItem(page, lp);
-		Assert(HeapTupleHeaderIsHeapOnly(htup));
+		if (pronto_reap)
+		{
+			/*
+			 * If the relation has no indexes, we may set any of LP_NORMAL,
+			 * LP_REDIRECT, or LP_DEAD items to LP_UNUSED during pruning. We
+			 * can't check much here except that, if the item is LP_NORMAL, it
+			 * should have storage before it is set LP_UNUSED.
+			 */
+			Assert(!ItemIdIsNormal(lp) || ItemIdHasStorage(lp));
+		}
+		else
+		{
+			/*
+			 * If the relation has indexes, only heap-only tuples can become
+			 * LP_UNUSED during pruning. They don't need to be left in place
+			 * as LP_DEAD items until VACUUM gets around to doing index
+			 * vacuuming.
+			 */
+			Assert(ItemIdHasStorage(lp) && ItemIdIsNormal(lp));
+			htup = (HeapTupleHeader) PageGetItem(page, lp);
+			Assert(HeapTupleHeaderIsHeapOnly(htup));
+		}
 #endif
 
 		ItemIdSetUnused(lp);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 379f693280..26e1ea110f 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -218,6 +218,8 @@ typedef struct LVRelState
 typedef struct LVPagePruneState
 {
 	bool		has_lpdead_items;	/* includes existing LP_DEAD items */
+	/* whether or not caller should do FSM processing for block */
+	bool		recordfreespace;
 
 	/*
 	 * State describes the proper VM bit states to set for the page following
@@ -829,6 +831,7 @@ lazy_scan_heap(LVRelState *vacrel)
 				next_fsm_block_to_vacuum = 0;
 	VacDeadItems *dead_items = vacrel->dead_items;
 	Buffer		vmbuffer = InvalidBuffer;
+	int			tuples_already_deleted;
 	bool		next_unskippable_allvis,
 				skipping_current_range;
 	const int	initprog_index[] = {
@@ -1009,6 +1012,8 @@ lazy_scan_heap(LVRelState *vacrel)
 			continue;
 		}
 
+		tuples_already_deleted = vacrel->tuples_deleted;
+
 		/*
 		 * Prune, freeze, and count tuples.
 		 *
@@ -1022,63 +1027,6 @@ lazy_scan_heap(LVRelState *vacrel)
 
 		Assert(!prunestate.all_visible || !prunestate.has_lpdead_items);
 
-		if (vacrel->nindexes == 0)
-		{
-			/*
-			 * Consider the need to do page-at-a-time heap vacuuming when
-			 * using the one-pass strategy now.
-			 *
-			 * The one-pass strategy will never call lazy_vacuum().  The steps
-			 * performed here can be thought of as the one-pass equivalent of
-			 * a call to lazy_vacuum().
-			 */
-			if (prunestate.has_lpdead_items)
-			{
-				Size		freespace;
-
-				lazy_vacuum_heap_page(vacrel, blkno, buf, 0, vmbuffer);
-
-				/* Forget the LP_DEAD items that we just vacuumed */
-				dead_items->num_items = 0;
-
-				/*
-				 * Now perform FSM processing for blkno, and move on to next
-				 * page.
-				 *
-				 * Our call to lazy_vacuum_heap_page() will have considered if
-				 * it's possible to set all_visible/all_frozen independently
-				 * of lazy_scan_prune().  Note that prunestate was invalidated
-				 * by lazy_vacuum_heap_page() call.
-				 */
-				freespace = PageGetHeapFreeSpace(page);
-
-				UnlockReleaseBuffer(buf);
-				RecordPageWithFreeSpace(vacrel->rel, blkno, freespace);
-
-				/*
-				 * Periodically perform FSM vacuuming to make newly-freed
-				 * space visible on upper FSM pages.
-				 */
-				if (blkno - next_fsm_block_to_vacuum >= VACUUM_FSM_EVERY_PAGES)
-				{
-					FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum,
-											blkno);
-					next_fsm_block_to_vacuum = blkno;
-				}
-
-				continue;
-			}
-
-			/*
-			 * There was no call to lazy_vacuum_heap_page() because pruning
-			 * didn't encounter/create any LP_DEAD items that needed to be
-			 * vacuumed.  Prune state has not been invalidated, so proceed
-			 * with prunestate-driven visibility map and FSM steps (just like
-			 * the two-pass strategy).
-			 */
-			Assert(dead_items->num_items == 0);
-		}
-
 		/*
 		 * Handle setting visibility map bit based on information from the VM
 		 * (as of last lazy_scan_skip() call), and from prunestate
@@ -1189,38 +1137,45 @@ lazy_scan_heap(LVRelState *vacrel)
 
 		/*
 		 * Final steps for block: drop cleanup lock, record free space in the
-		 * FSM
+		 * FSM.
+		 *
+		 * If we will likely do index vacuuming, wait until
+		 * lazy_vacuum_heap_rel() to save free space. This doesn't just save
+		 * us some cycles; it also allows us to record any additional free
+		 * space that lazy_vacuum_heap_page() will make available in cases
+		 * where it's possible to truncate the page's line pointer array.
+		 *
+		 * Note: It's not in fact 100% certain that we really will call
+		 * lazy_vacuum_heap_rel() -- lazy_vacuum() might yet opt to skip index
+		 * vacuuming (and so must skip heap vacuuming).  This is deemed okay
+		 * because it only happens in emergencies, or when there is very
+		 * little free space anyway. (Besides, we start recording free space
+		 * in the FSM once index vacuuming has been abandoned.)
 		 */
-		if (prunestate.has_lpdead_items && vacrel->do_index_vacuuming)
-		{
-			/*
-			 * Wait until lazy_vacuum_heap_rel() to save free space.  This
-			 * doesn't just save us some cycles; it also allows us to record
-			 * any additional free space that lazy_vacuum_heap_page() will
-			 * make available in cases where it's possible to truncate the
-			 * page's line pointer array.
-			 *
-			 * Note: It's not in fact 100% certain that we really will call
-			 * lazy_vacuum_heap_rel() -- lazy_vacuum() might yet opt to skip
-			 * index vacuuming (and so must skip heap vacuuming).  This is
-			 * deemed okay because it only happens in emergencies, or when
-			 * there is very little free space anyway. (Besides, we start
-			 * recording free space in the FSM once index vacuuming has been
-			 * abandoned.)
-			 *
-			 * Note: The one-pass (no indexes) case is only supposed to make
-			 * it this far when there were no LP_DEAD items during pruning.
-			 */
-			Assert(vacrel->nindexes > 0);
-			UnlockReleaseBuffer(buf);
-		}
-		else
+		if (prunestate.recordfreespace)
 		{
 			Size		freespace = PageGetHeapFreeSpace(page);
 
 			UnlockReleaseBuffer(buf);
 			RecordPageWithFreeSpace(vacrel->rel, blkno, freespace);
 		}
+		else
+			UnlockReleaseBuffer(buf);
+
+		/*
+		 * Periodically perform FSM vacuuming to make newly-freed space
+		 * visible on upper FSM pages. This is done after vacuuming if the
+		 * table has indexes.
+		 */
+		if (vacrel->nindexes == 0 &&
+			vacrel->tuples_deleted > tuples_already_deleted &&
+			(blkno - next_fsm_block_to_vacuum >= VACUUM_FSM_EVERY_PAGES))
+		{
+			FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum,
+									blkno);
+			next_fsm_block_to_vacuum = blkno;
+		}
+
 	}
 
 	vacrel->blkno = InvalidBlockNumber;
@@ -1544,6 +1499,7 @@ lazy_scan_prune(LVRelState *vacrel,
 				live_tuples,
 				recently_dead_tuples;
 	HeapPageFreeze pagefrz;
+	bool		pronto_reap;
 	bool		hastup = false;
 	int64		fpi_before = pgWalUsage.wal_fpi;
 	OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
@@ -1569,6 +1525,8 @@ lazy_scan_prune(LVRelState *vacrel,
 	live_tuples = 0;
 	recently_dead_tuples = 0;
 
+	pronto_reap = vacrel->nindexes == 0;
+
 	/*
 	 * Prune all HOT-update chains in this page.
 	 *
@@ -1577,7 +1535,8 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * lpdead_items's final value can be thought of as the number of tuples
 	 * that were deleted from indexes.
 	 */
-	heap_page_prune(rel, buf, vacrel->vistest, &presult, &vacrel->offnum);
+	heap_page_prune(rel, buf, vacrel->vistest, pronto_reap,
+					&presult, &vacrel->offnum);
 
 	/*
 	 * Now scan the page to collect LP_DEAD items and check for tuples
@@ -1587,6 +1546,7 @@ lazy_scan_prune(LVRelState *vacrel,
 	prunestate->all_visible = true;
 	prunestate->all_frozen = true;
 	prunestate->visibility_cutoff_xid = InvalidTransactionId;
+	prunestate->recordfreespace = false;
 
 	for (offnum = FirstOffsetNumber;
 		 offnum <= maxoff;
@@ -1908,6 +1868,15 @@ lazy_scan_prune(LVRelState *vacrel,
 	vacrel->live_tuples += live_tuples;
 	vacrel->recently_dead_tuples += recently_dead_tuples;
 
+	/*
+	 * If we will not do index vacuuming, either because we have no indexes,
+	 * because there is nothing to vacuum, or because do_index_vacuuming is
+	 * false, make sure we update the freespace map.
+	 */
+	if (vacrel->nindexes == 0 ||
+		!vacrel->do_index_vacuuming || lpdead_items == 0)
+		prunestate->recordfreespace = true;
+
 	/* Remember the location of the last page with nonremovable tuples */
 	if (hastup)
 		vacrel->nonempty_pages = blkno + 1;
@@ -1929,7 +1898,8 @@ lazy_scan_prune(LVRelState *vacrel,
  * callers.
  *
  * recordfreespace flag instructs caller on whether or not it should do
- * generic FSM processing for page.
+ * generic FSM processing for page. vacrel is updated with page-level counts
+ * and to indicate whether or not rel truncation is safe.
  */
 static bool
 lazy_scan_noprune(LVRelState *vacrel,
@@ -2512,7 +2482,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	bool		all_frozen;
 	LVSavedErrInfo saved_err_info;
 
-	Assert(vacrel->nindexes == 0 || vacrel->do_index_vacuuming);
+	Assert(vacrel->do_index_vacuuming);
 
 	pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_VACUUMED, blkno);
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index a2d7a0ea72..1c731dded6 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -320,9 +320,10 @@ struct GlobalVisState;
 extern void heap_page_prune_opt(Relation relation, Buffer buffer);
 extern void heap_page_prune(Relation relation, Buffer buffer,
 							struct GlobalVisState *vistest,
+							bool pronto_reap,
 							PruneResult *presult,
 							OffsetNumber *off_loc);
-extern void heap_page_prune_execute(Buffer buffer,
+extern void heap_page_prune_execute(Buffer buffer, bool pronto_reap,
 									OffsetNumber *redirected, int nredirected,
 									OffsetNumber *nowdead, int ndead,
 									OffsetNumber *nowunused, int nunused);
-- 
2.37.2

v1-0001-Release-lock-on-heap-buffer-before-vacuuming-FSM.patch (application/x-patch)
From 85ffd55860ebc23f6570033a11db509cf2a00346 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 13 Nov 2023 16:39:25 -0500
Subject: [PATCH v1 1/3] Release lock on heap buffer before vacuuming FSM

When there are no indexes on a table, we vacuum each heap block after
pruning it and then update the freespace map. Periodically, we also
vacuum the freespace map. This was done while unnecessarily holding a
lock on the heap page. Move down the call to FreeSpaceMapVacuumRange()
so that we have already recorded any newly freed space and released the
lock on the heap page.
---
 src/backend/access/heap/vacuumlazy.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 6985d299b2..8b729828ce 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1046,18 +1046,6 @@ lazy_scan_heap(LVRelState *vacrel)
 				/* Forget the LP_DEAD items that we just vacuumed */
 				dead_items->num_items = 0;
 
-				/*
-				 * Periodically perform FSM vacuuming to make newly-freed
-				 * space visible on upper FSM pages.  Note we have not yet
-				 * performed FSM processing for blkno.
-				 */
-				if (blkno - next_fsm_block_to_vacuum >= VACUUM_FSM_EVERY_PAGES)
-				{
-					FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum,
-											blkno);
-					next_fsm_block_to_vacuum = blkno;
-				}
-
 				/*
 				 * Now perform FSM processing for blkno, and move on to next
 				 * page.
@@ -1071,6 +1059,18 @@ lazy_scan_heap(LVRelState *vacrel)
 
 				UnlockReleaseBuffer(buf);
 				RecordPageWithFreeSpace(vacrel->rel, blkno, freespace);
+
+				/*
+				 * Periodically perform FSM vacuuming to make newly-freed
+				 * space visible on upper FSM pages.
+				 */
+				if (blkno - next_fsm_block_to_vacuum >= VACUUM_FSM_EVERY_PAGES)
+				{
+					FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum,
+											blkno);
+					next_fsm_block_to_vacuum = blkno;
+				}
+
 				continue;
 			}
 
-- 
2.37.2

v1-0002-Indicate-rel-truncation-unsafe-in-lazy_scan-no-pr.patch (application/x-patch)
From dd38d0152b7d0044eb8a82e6aa27239a8c071da2 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 13 Nov 2023 16:47:08 -0500
Subject: [PATCH v1 2/3] Indicate rel truncation unsafe in lazy_scan[no]prune

Both lazy_scan_prune() and lazy_scan_noprune() must determine whether or
not there are tuples on the page making rel truncation unsafe.
LVRelState->nonempty_pages is updated to reflect this. Previously, both
functions set an output parameter or output parameter member, hastup, to
indicate that nonempty_pages should be updated to reflect the latest
non-removable page. There doesn't seem to be any reason to wait until
lazy_scan_[no]prune() returns to update nonempty_pages. Plenty of other
counters in the LVRelState are updated in lazy_scan_[no]prune().
This allows us to get rid of the output parameter hastup.
---
 src/backend/access/heap/vacuumlazy.c | 50 +++++++++++++++-------------
 1 file changed, 26 insertions(+), 24 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8b729828ce..379f693280 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -217,7 +217,6 @@ typedef struct LVRelState
  */
 typedef struct LVPagePruneState
 {
-	bool		hastup;			/* Page prevents rel truncation? */
 	bool		has_lpdead_items;	/* includes existing LP_DEAD items */
 
 	/*
@@ -253,7 +252,7 @@ static void lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							LVPagePruneState *prunestate);
 static bool lazy_scan_noprune(LVRelState *vacrel, Buffer buf,
 							  BlockNumber blkno, Page page,
-							  bool *hastup, bool *recordfreespace);
+							  bool *recordfreespace);
 static void lazy_vacuum(LVRelState *vacrel);
 static bool lazy_vacuum_all_indexes(LVRelState *vacrel);
 static void lazy_vacuum_heap_rel(LVRelState *vacrel);
@@ -959,8 +958,7 @@ lazy_scan_heap(LVRelState *vacrel)
 		page = BufferGetPage(buf);
 		if (!ConditionalLockBufferForCleanup(buf))
 		{
-			bool		hastup,
-						recordfreespace;
+			bool		recordfreespace;
 
 			LockBuffer(buf, BUFFER_LOCK_SHARE);
 
@@ -972,20 +970,21 @@ lazy_scan_heap(LVRelState *vacrel)
 				continue;
 			}
 
-			/* Collect LP_DEAD items in dead_items array, count tuples */
-			if (lazy_scan_noprune(vacrel, buf, blkno, page, &hastup,
+			/*
+			 * Collect LP_DEAD items in dead_items array, count tuples,
+			 * determine if rel truncation is safe
+			 */
+			if (lazy_scan_noprune(vacrel, buf, blkno, page,
 								  &recordfreespace))
 			{
 				Size		freespace = 0;
 
 				/*
 				 * Processed page successfully (without cleanup lock) -- just
-				 * need to perform rel truncation and FSM steps, much like the
-				 * lazy_scan_prune case.  Don't bother trying to match its
-				 * visibility map setting steps, though.
+				 * need to update the FSM, much like the lazy_scan_prune case.
+				 * Don't bother trying to match its visibility map setting
+				 * steps, though.
 				 */
-				if (hastup)
-					vacrel->nonempty_pages = blkno + 1;
 				if (recordfreespace)
 					freespace = PageGetHeapFreeSpace(page);
 				UnlockReleaseBuffer(buf);
@@ -1023,10 +1022,6 @@ lazy_scan_heap(LVRelState *vacrel)
 
 		Assert(!prunestate.all_visible || !prunestate.has_lpdead_items);
 
-		/* Remember the location of the last page with nonremovable tuples */
-		if (prunestate.hastup)
-			vacrel->nonempty_pages = blkno + 1;
-
 		if (vacrel->nindexes == 0)
 		{
 			/*
@@ -1549,6 +1544,7 @@ lazy_scan_prune(LVRelState *vacrel,
 				live_tuples,
 				recently_dead_tuples;
 	HeapPageFreeze pagefrz;
+	bool		hastup = false;
 	int64		fpi_before = pgWalUsage.wal_fpi;
 	OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
 	HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
@@ -1587,7 +1583,6 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * Now scan the page to collect LP_DEAD items and check for tuples
 	 * requiring freezing among remaining tuples with storage
 	 */
-	prunestate->hastup = false;
 	prunestate->has_lpdead_items = false;
 	prunestate->all_visible = true;
 	prunestate->all_frozen = true;
@@ -1614,7 +1609,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		if (ItemIdIsRedirected(itemid))
 		{
 			/* page makes rel truncation unsafe */
-			prunestate->hastup = true;
+			hastup = true;
 			continue;
 		}
 
@@ -1744,7 +1739,7 @@ lazy_scan_prune(LVRelState *vacrel,
 				break;
 		}
 
-		prunestate->hastup = true;	/* page makes rel truncation unsafe */
+		hastup = true;			/* page makes rel truncation unsafe */
 
 		/* Tuple with storage -- consider need to freeze */
 		if (heap_prepare_freeze_tuple(htup, &vacrel->cutoffs, &pagefrz,
@@ -1912,6 +1907,10 @@ lazy_scan_prune(LVRelState *vacrel,
 	vacrel->lpdead_items += lpdead_items;
 	vacrel->live_tuples += live_tuples;
 	vacrel->recently_dead_tuples += recently_dead_tuples;
+
+	/* Remember the location of the last page with nonremovable tuples */
+	if (hastup)
+		vacrel->nonempty_pages = blkno + 1;
 }
 
 /*
@@ -1929,7 +1928,6 @@ lazy_scan_prune(LVRelState *vacrel,
  * one or more tuples on the page.  We always return true for non-aggressive
  * callers.
  *
- * See lazy_scan_prune for an explanation of hastup return flag.
  * recordfreespace flag instructs caller on whether or not it should do
  * generic FSM processing for page.
  */
@@ -1938,7 +1936,6 @@ lazy_scan_noprune(LVRelState *vacrel,
 				  Buffer buf,
 				  BlockNumber blkno,
 				  Page page,
-				  bool *hastup,
 				  bool *recordfreespace)
 {
 	OffsetNumber offnum,
@@ -1947,6 +1944,7 @@ lazy_scan_noprune(LVRelState *vacrel,
 				live_tuples,
 				recently_dead_tuples,
 				missed_dead_tuples;
+	bool		hastup;
 	HeapTupleHeader tupleheader;
 	TransactionId NoFreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
 	MultiXactId NoFreezePageRelminMxid = vacrel->NewRelminMxid;
@@ -1954,7 +1952,7 @@ lazy_scan_noprune(LVRelState *vacrel,
 
 	Assert(BufferGetBlockNumber(buf) == blkno);
 
-	*hastup = false;			/* for now */
+	hastup = false;				/* for now */
 	*recordfreespace = false;	/* for now */
 
 	lpdead_items = 0;
@@ -1978,7 +1976,7 @@ lazy_scan_noprune(LVRelState *vacrel,
 
 		if (ItemIdIsRedirected(itemid))
 		{
-			*hastup = true;
+			hastup = true;
 			continue;
 		}
 
@@ -1992,7 +1990,7 @@ lazy_scan_noprune(LVRelState *vacrel,
 			continue;
 		}
 
-		*hastup = true;			/* page prevents rel truncation */
+		hastup = true;			/* page prevents rel truncation */
 		tupleheader = (HeapTupleHeader) PageGetItem(page, itemid);
 		if (heap_tuple_should_freeze(tupleheader, &vacrel->cutoffs,
 									 &NoFreezePageRelfrozenXid,
@@ -2094,7 +2092,7 @@ lazy_scan_noprune(LVRelState *vacrel,
 			 * but it beats having to maintain specialized heap vacuuming code
 			 * forever, for vanishingly little benefit.)
 			 */
-			*hastup = true;
+			hastup = true;
 			missed_dead_tuples += lpdead_items;
 		}
 
@@ -2150,6 +2148,10 @@ lazy_scan_noprune(LVRelState *vacrel,
 	if (missed_dead_tuples > 0)
 		vacrel->missed_dead_pages++;
 
+	/* rel truncation is unsafe */
+	if (hastup)
+		vacrel->nonempty_pages = blkno + 1;
+
 	/* Caller won't need to call lazy_scan_prune with same page */
 	return true;
 }
-- 
2.37.2

#2 Peter Geoghegan
pg@bowt.ie
In reply to: Melanie Plageman (#1)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Mon, Nov 13, 2023 at 2:29 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

I think it also makes it clear that we should update the VM in
lazy_scan_prune(). All callers of lazy_scan_prune() will now consider
updating the VM after returning. And most of the state communicated back
to lazy_scan_heap() from lazy_scan_prune() is to inform it whether or
not to update the VM.

That makes sense.

I didn't do that in this patch set because I would
need to pass all_visible_according_to_vm to lazy_scan_prune() and that
change didn't seem worth the improvement in code clarity in
lazy_scan_heap().

Have you thought about finding a way to get rid of
all_visible_according_to_vm? (Not necessarily in the scope of the
ongoing work, just in general.)

all_visible_according_to_vm is inherently prone to races -- it tells
us what the VM used to say about the page, back when we looked. It is
very likely that the page isn't visible in this sense, anyway, because
VACUUM is after all choosing to scan the page in the first place when
we end up in lazy_scan_prune. (Granted, this is much less true than it
should be due to the influence of SKIP_PAGES_THRESHOLD, which
implements a weird and inefficient form of prefetching/readahead.)

Why should we necessarily care what all_visible_according_to_vm says
or would say at this point? We're looking at the heap page itself,
which is more authoritative than the VM (theoretically they're equally
authoritative, but not really, not when push comes to shove). The best
answer that I can think of is that all_visible_according_to_vm gives
us a way of noticing and then logging any inconsistencies between VM
and heap that we might end up "repairing" in passing (once we've
rechecked the VM). But maybe that could happen elsewhere.

Perhaps that cross-check could be pushed into visibilitymap.c, when we
go to set a !PageIsAllVisible(page) page all-visible in the VM. If
we're setting it and find that it's already set from earlier on, then
remember and complain/LOG it. No all_visible_according_to_vm
required, plus this seems like it might be more thorough.

Just a thought.

--
Peter Geoghegan

#3 Melanie Plageman
melanieplageman@gmail.com
In reply to: Peter Geoghegan (#2)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Mon, Nov 13, 2023 at 5:59 PM Peter Geoghegan <pg@bowt.ie> wrote:

On Mon, Nov 13, 2023 at 2:29 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

I think it also makes it clear that we should update the VM in
lazy_scan_prune(). All callers of lazy_scan_prune() will now consider
updating the VM after returning. And most of the state communicated back
to lazy_scan_heap() from lazy_scan_prune() is to inform it whether or
not to update the VM.

That makes sense.

I didn't do that in this patch set because I would
need to pass all_visible_according_to_vm to lazy_scan_prune() and that
change didn't seem worth the improvement in code clarity in
lazy_scan_heap().

Have you thought about finding a way to get rid of
all_visible_according_to_vm? (Not necessarily in the scope of the
ongoing work, just in general.)

all_visible_according_to_vm is inherently prone to races -- it tells
us what the VM used to say about the page, back when we looked. It is
very likely that the page isn't visible in this sense, anyway, because
VACUUM is after all choosing to scan the page in the first place when
we end up in lazy_scan_prune. (Granted, this is much less true than it
should be due to the influence of SKIP_PAGES_THRESHOLD, which
implements a weird and inefficient form of prefetching/readahead.)

Why should we necessarily care what all_visible_according_to_vm says
or would say at this point? We're looking at the heap page itself,
which is more authoritative than the VM (theoretically they're equally
authoritative, but not really, not when push comes to shove). The best
answer that I can think of is that all_visible_according_to_vm gives
us a way of noticing and then logging any inconsistencies between VM
and heap that we might end up "repairing" in passing (once we've
rechecked the VM). But maybe that could happen elsewhere.

Perhaps that cross-check could be pushed into visibilitymap.c, when we
go to set a !PageIsAllVisible(page) page all-visible in the VM. If
we're setting it and find that it's already set from earlier on, then
remember and complaing/LOG it. No all_visible_according_to_vm
required, plus this seems like it might be more thorough.

Setting aside the data corruption cases (the repairing you mentioned), I
think the primary case that all_visible_according_to_vm seeks to protect
us from is if the page was already marked all visible in the VM and
pruning did not change that. So, it avoids a call to visibilitymap_set()
when the page was known to already be set all visible (as long as we
didn't newly set all frozen).

As you say, this does not seem like the common case. Pages we vacuum
will most often not be all visible.

I actually wonder if it is worse to rely on that old value of all
visible and then call visibilitymap_set(), which takes an exclusive lock
on the VM page, than it would be to just call visibilitymap_get_status()
anew. This is what we do with all frozen.

The problem is that it is kind of hard to tell because the whole thing
is a bit of a tangled knot.

I've ripped this code apart and put it back together again six ways from
Sunday trying to get rid of all_visible_according_to_vm and
next_unskippable_allvis/skipping_current_range's bootleg prefetching
without incurring any additional calls to visibilitymap_get_status().

As best I can tell, our best case scenario is Thomas' streaming read API
goes in, we add vacuum as a user, and we can likely remove the skip
range logic.

Then, the question remains about how useful it is to save the visibility
map statuses we got when deciding whether or not to skip a block.

- Melanie

[1]: /messages/by-id/CA+hUKGJkOiOCa+mag4BF+zHo7qo=o9CFheB8=g6uT5TUm2gkvA@mail.gmail.com

#4 Melanie Plageman
melanieplageman@gmail.com
In reply to: Melanie Plageman (#1)
2 attachment(s)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Mon, Nov 13, 2023 at 5:28 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

When there are no indexes on the relation, we can set would-be dead
items LP_UNUSED and remove them during pruning. This saves us a vacuum
WAL record, reducing WAL volume (and time spent writing and syncing
WAL).

...

Note that (on principle) this patch set is on top of the bug fix I
proposed in [1].

[1] /messages/by-id/CAAKRu_YiL=44GvGnt1dpYouDSSoV7wzxVoXs8m3p311rp-TVQQ@mail.gmail.com

Rebased on top of the fix in b2e237afddc56a and registered for the January commitfest:
https://commitfest.postgresql.org/46/4665/

- Melanie

Attachments:

v2-0002-Set-would-be-dead-items-LP_UNUSED-while-pruning.patch (text/x-patch; charset=US-ASCII)
From ad07d577d7309bc0603357da3548c249c580bd1b Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 13 Nov 2023 14:10:05 -0500
Subject: [PATCH v2 2/2] Set would-be dead items LP_UNUSED while pruning

If there are no indexes on a relation, items can be marked LP_UNUSED
instead of LP_DEAD during lazy_scan_prune(). This avoids a separate
invocation of lazy_vacuum_heap_page() and saves a vacuum WAL record.

To accomplish this, pass lazy_scan_prune() a new parameter, pronto_reap,
which indicates that dead line pointers should be set to LP_UNUSED
during pruning, allowing earlier reaping of tuples.

Because we don't update the freespace map until after dropping the lock
on the buffer and we need the lock while we update the visibility map,
save our intent to update the freespace map in
LVPagePruneState.recordfreespace.

Discussion: https://postgr.es/m/CAAKRu_bgvb_k0gKOXWzNKWHt560R0smrGe3E8zewKPs8fiMKkw%40mail.gmail.com
---
 src/backend/access/heap/heapam.c     |   9 +-
 src/backend/access/heap/pruneheap.c  |  77 +++++++++++---
 src/backend/access/heap/vacuumlazy.c | 146 ++++++++++-----------------
 src/include/access/heapam.h          |   3 +-
 4 files changed, 125 insertions(+), 110 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 14de8158d49..b578c32eeb6 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -8803,8 +8803,13 @@ heap_xlog_prune(XLogReaderState *record)
 		nunused = (end - nowunused);
 		Assert(nunused >= 0);
 
-		/* Update all line pointers per the record, and repair fragmentation */
-		heap_page_prune_execute(buffer,
+		/*
+		 * Update all line pointers per the record, and repair fragmentation.
+		 * We always pass pronto_reap as true, because we don't know whether
+		 * or not this option was used when pruning. This reduces the
+		 * validation done on replay in an assert build.
+		 */
+		heap_page_prune_execute(buffer, true,
 								redirected, nredirected,
 								nowdead, ndead,
 								nowunused, nunused);
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index c5f1abd95a9..750a03c9259 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -35,6 +35,8 @@ typedef struct
 
 	/* tuple visibility test, initialized for the relation */
 	GlobalVisState *vistest;
+	/* whether or not dead items can be set LP_UNUSED during pruning */
+	bool		pronto_reap;
 
 	TransactionId new_prune_xid;	/* new prune hint value for page */
 	TransactionId snapshotConflictHorizon;	/* latest xid removed */
@@ -148,7 +150,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 		{
 			PruneResult presult;
 
-			heap_page_prune(relation, buffer, vistest, &presult, NULL);
+			heap_page_prune(relation, buffer, vistest, false,
+							&presult, NULL);
 
 			/*
 			 * Report the number of tuples reclaimed to pgstats.  This is
@@ -193,6 +196,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
  * (see heap_prune_satisfies_vacuum and
  * HeapTupleSatisfiesVacuum).
  *
+ * pronto_reap indicates whether or not dead items can be set LP_UNUSED during
+ * pruning.
+ *
  * off_loc is the offset location required by the caller to use in error
  * callback.
  *
@@ -203,6 +209,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 void
 heap_page_prune(Relation relation, Buffer buffer,
 				GlobalVisState *vistest,
+				bool pronto_reap,
 				PruneResult *presult,
 				OffsetNumber *off_loc)
 {
@@ -227,6 +234,7 @@ heap_page_prune(Relation relation, Buffer buffer,
 	prstate.new_prune_xid = InvalidTransactionId;
 	prstate.rel = relation;
 	prstate.vistest = vistest;
+	prstate.pronto_reap = pronto_reap;
 	prstate.snapshotConflictHorizon = InvalidTransactionId;
 	prstate.nredirected = prstate.ndead = prstate.nunused = 0;
 	memset(prstate.marked, 0, sizeof(prstate.marked));
@@ -306,9 +314,9 @@ heap_page_prune(Relation relation, Buffer buffer,
 		if (off_loc)
 			*off_loc = offnum;
 
-		/* Nothing to do if slot is empty or already dead */
+		/* Nothing to do if slot is empty */
 		itemid = PageGetItemId(page, offnum);
-		if (!ItemIdIsUsed(itemid) || ItemIdIsDead(itemid))
+		if (!ItemIdIsUsed(itemid))
 			continue;
 
 		/* Process this item or chain of items */
@@ -330,7 +338,7 @@ heap_page_prune(Relation relation, Buffer buffer,
 		 * Apply the planned item changes, then repair page fragmentation, and
 		 * update the page's hint bit about whether it has free line pointers.
 		 */
-		heap_page_prune_execute(buffer,
+		heap_page_prune_execute(buffer, prstate.pronto_reap,
 								prstate.redirected, prstate.nredirected,
 								prstate.nowdead, prstate.ndead,
 								prstate.nowunused, prstate.nunused);
@@ -581,7 +589,17 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
 		 * function.)
 		 */
 		if (ItemIdIsDead(lp))
+		{
+			/*
+			 * If the relation has no indexes, we can set dead line pointers
+			 * LP_UNUSED now. We don't increment ndeleted here since the LP
+			 * was already marked dead.
+			 */
+			if (prstate->pronto_reap)
+				heap_prune_record_unused(prstate, offnum);
+
 			break;
+		}
 
 		Assert(ItemIdIsNormal(lp));
 		htup = (HeapTupleHeader) PageGetItem(dp, lp);
@@ -715,7 +733,17 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
 		 * redirect the root to the correct chain member.
 		 */
 		if (i >= nchain)
-			heap_prune_record_dead(prstate, rootoffnum);
+		{
+			/*
+			 * If the relation has no indexes, we can remove dead tuples
+			 * during pruning instead of marking their line pointers dead. Set
+			 * this tuple's line pointer LP_UNUSED.
+			 */
+			if (prstate->pronto_reap)
+				heap_prune_record_unused(prstate, rootoffnum);
+			else
+				heap_prune_record_dead(prstate, rootoffnum);
+		}
 		else
 			heap_prune_record_redirect(prstate, rootoffnum, chainitems[i]);
 	}
@@ -726,9 +754,12 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
 		 * item.  This can happen if the loop in heap_page_prune caused us to
 		 * visit the dead successor of a redirect item before visiting the
 		 * redirect item.  We can clean up by setting the redirect item to
-		 * DEAD state.
+		 * DEAD state. If pronto_reap is true, we can set it LP_UNUSED now.
 		 */
-		heap_prune_record_dead(prstate, rootoffnum);
+		if (prstate->pronto_reap)
+			heap_prune_record_unused(prstate, rootoffnum);
+		else
+			heap_prune_record_dead(prstate, rootoffnum);
 	}
 
 	return ndeleted;
@@ -792,7 +823,7 @@ heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum)
  * buffer.
  */
 void
-heap_page_prune_execute(Buffer buffer,
+heap_page_prune_execute(Buffer buffer, bool pronto_reap,
 						OffsetNumber *redirected, int nredirected,
 						OffsetNumber *nowdead, int ndead,
 						OffsetNumber *nowunused, int nunused)
@@ -902,14 +933,28 @@ heap_page_prune_execute(Buffer buffer,
 
 #ifdef USE_ASSERT_CHECKING
 
-		/*
-		 * Only heap-only tuples can become LP_UNUSED during pruning.  They
-		 * don't need to be left in place as LP_DEAD items until VACUUM gets
-		 * around to doing index vacuuming.
-		 */
-		Assert(ItemIdHasStorage(lp) && ItemIdIsNormal(lp));
-		htup = (HeapTupleHeader) PageGetItem(page, lp);
-		Assert(HeapTupleHeaderIsHeapOnly(htup));
+		if (pronto_reap)
+		{
+			/*
+			 * If the relation has no indexes, we may set any of LP_NORMAL,
+			 * LP_REDIRECT, or LP_DEAD items to LP_UNUSED during pruning. We
+			 * can't check much here except that, if the item is LP_NORMAL, it
+			 * should have storage before it is set LP_UNUSED.
+			 */
+			Assert(!ItemIdIsNormal(lp) || ItemIdHasStorage(lp));
+		}
+		else
+		{
+			/*
+			 * If the relation has indexes, only heap-only tuples can become
+			 * LP_UNUSED during pruning. They don't need to be left in place
+			 * as LP_DEAD items until VACUUM gets around to doing index
+			 * vacuuming.
+			 */
+			Assert(ItemIdHasStorage(lp) && ItemIdIsNormal(lp));
+			htup = (HeapTupleHeader) PageGetItem(page, lp);
+			Assert(HeapTupleHeaderIsHeapOnly(htup));
+		}
 #endif
 
 		ItemIdSetUnused(lp);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index d158cccd00d..26e1ea110f4 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -218,6 +218,8 @@ typedef struct LVRelState
 typedef struct LVPagePruneState
 {
 	bool		has_lpdead_items;	/* includes existing LP_DEAD items */
+	/* whether or not caller should do FSM processing for block */
+	bool		recordfreespace;
 
 	/*
 	 * State describes the proper VM bit states to set for the page following
@@ -829,6 +831,7 @@ lazy_scan_heap(LVRelState *vacrel)
 				next_fsm_block_to_vacuum = 0;
 	VacDeadItems *dead_items = vacrel->dead_items;
 	Buffer		vmbuffer = InvalidBuffer;
+	int			tuples_already_deleted;
 	bool		next_unskippable_allvis,
 				skipping_current_range;
 	const int	initprog_index[] = {
@@ -1009,6 +1012,8 @@ lazy_scan_heap(LVRelState *vacrel)
 			continue;
 		}
 
+		tuples_already_deleted = vacrel->tuples_deleted;
+
 		/*
 		 * Prune, freeze, and count tuples.
 		 *
@@ -1022,69 +1027,6 @@ lazy_scan_heap(LVRelState *vacrel)
 
 		Assert(!prunestate.all_visible || !prunestate.has_lpdead_items);
 
-		if (vacrel->nindexes == 0)
-		{
-			/*
-			 * Consider the need to do page-at-a-time heap vacuuming when
-			 * using the one-pass strategy now.
-			 *
-			 * The one-pass strategy will never call lazy_vacuum().  The steps
-			 * performed here can be thought of as the one-pass equivalent of
-			 * a call to lazy_vacuum().
-			 */
-			if (prunestate.has_lpdead_items)
-			{
-				Size		freespace;
-
-				lazy_vacuum_heap_page(vacrel, blkno, buf, 0, vmbuffer);
-
-				/* Forget the LP_DEAD items that we just vacuumed */
-				dead_items->num_items = 0;
-
-				/*
-				 * Now perform FSM processing for blkno, and move on to next
-				 * page.
-				 *
-				 * Our call to lazy_vacuum_heap_page() will have considered if
-				 * it's possible to set all_visible/all_frozen independently
-				 * of lazy_scan_prune().  Note that prunestate was invalidated
-				 * by lazy_vacuum_heap_page() call.
-				 */
-				freespace = PageGetHeapFreeSpace(page);
-
-				UnlockReleaseBuffer(buf);
-				RecordPageWithFreeSpace(vacrel->rel, blkno, freespace);
-
-				/*
-				 * Periodically perform FSM vacuuming to make newly-freed
-				 * space visible on upper FSM pages. FreeSpaceMapVacuumRange()
-				 * vacuums the portion of the freespace map covering heap
-				 * pages from start to end - 1. Include the block we just
-				 * vacuumed by passing it blkno + 1. Overflow isn't an issue
-				 * because MaxBlockNumber + 1 is InvalidBlockNumber which
-				 * causes FreeSpaceMapVacuumRange() to vacuum freespace map
-				 * pages covering the remainder of the relation.
-				 */
-				if (blkno - next_fsm_block_to_vacuum >= VACUUM_FSM_EVERY_PAGES)
-				{
-					FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum,
-											blkno + 1);
-					next_fsm_block_to_vacuum = blkno + 1;
-				}
-
-				continue;
-			}
-
-			/*
-			 * There was no call to lazy_vacuum_heap_page() because pruning
-			 * didn't encounter/create any LP_DEAD items that needed to be
-			 * vacuumed.  Prune state has not been invalidated, so proceed
-			 * with prunestate-driven visibility map and FSM steps (just like
-			 * the two-pass strategy).
-			 */
-			Assert(dead_items->num_items == 0);
-		}
-
 		/*
 		 * Handle setting visibility map bit based on information from the VM
 		 * (as of last lazy_scan_skip() call), and from prunestate
@@ -1195,38 +1137,45 @@ lazy_scan_heap(LVRelState *vacrel)
 
 		/*
 		 * Final steps for block: drop cleanup lock, record free space in the
-		 * FSM
+		 * FSM.
+		 *
+		 * If we will likely do index vacuuming, wait until
+		 * lazy_vacuum_heap_rel() to save free space. This doesn't just save
+		 * us some cycles; it also allows us to record any additional free
+		 * space that lazy_vacuum_heap_page() will make available in cases
+		 * where it's possible to truncate the page's line pointer array.
+		 *
+		 * Note: It's not in fact 100% certain that we really will call
+		 * lazy_vacuum_heap_rel() -- lazy_vacuum() might yet opt to skip index
+		 * vacuuming (and so must skip heap vacuuming).  This is deemed okay
+		 * because it only happens in emergencies, or when there is very
+		 * little free space anyway. (Besides, we start recording free space
+		 * in the FSM once index vacuuming has been abandoned.)
 		 */
-		if (prunestate.has_lpdead_items && vacrel->do_index_vacuuming)
-		{
-			/*
-			 * Wait until lazy_vacuum_heap_rel() to save free space.  This
-			 * doesn't just save us some cycles; it also allows us to record
-			 * any additional free space that lazy_vacuum_heap_page() will
-			 * make available in cases where it's possible to truncate the
-			 * page's line pointer array.
-			 *
-			 * Note: It's not in fact 100% certain that we really will call
-			 * lazy_vacuum_heap_rel() -- lazy_vacuum() might yet opt to skip
-			 * index vacuuming (and so must skip heap vacuuming).  This is
-			 * deemed okay because it only happens in emergencies, or when
-			 * there is very little free space anyway. (Besides, we start
-			 * recording free space in the FSM once index vacuuming has been
-			 * abandoned.)
-			 *
-			 * Note: The one-pass (no indexes) case is only supposed to make
-			 * it this far when there were no LP_DEAD items during pruning.
-			 */
-			Assert(vacrel->nindexes > 0);
-			UnlockReleaseBuffer(buf);
-		}
-		else
+		if (prunestate.recordfreespace)
 		{
 			Size		freespace = PageGetHeapFreeSpace(page);
 
 			UnlockReleaseBuffer(buf);
 			RecordPageWithFreeSpace(vacrel->rel, blkno, freespace);
 		}
+		else
+			UnlockReleaseBuffer(buf);
+
+		/*
+		 * Periodically perform FSM vacuuming to make newly-freed space
+		 * visible on upper FSM pages. This is done after vacuuming if the
+		 * table has indexes.
+		 */
+		if (vacrel->nindexes == 0 &&
+			vacrel->tuples_deleted > tuples_already_deleted &&
+			(blkno - next_fsm_block_to_vacuum >= VACUUM_FSM_EVERY_PAGES))
+		{
+			FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum,
+									blkno);
+			next_fsm_block_to_vacuum = blkno;
+		}
+
 	}
 
 	vacrel->blkno = InvalidBlockNumber;
@@ -1550,6 +1499,7 @@ lazy_scan_prune(LVRelState *vacrel,
 				live_tuples,
 				recently_dead_tuples;
 	HeapPageFreeze pagefrz;
+	bool		pronto_reap;
 	bool		hastup = false;
 	int64		fpi_before = pgWalUsage.wal_fpi;
 	OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
@@ -1575,6 +1525,8 @@ lazy_scan_prune(LVRelState *vacrel,
 	live_tuples = 0;
 	recently_dead_tuples = 0;
 
+	pronto_reap = vacrel->nindexes == 0;
+
 	/*
 	 * Prune all HOT-update chains in this page.
 	 *
@@ -1583,7 +1535,8 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * lpdead_items's final value can be thought of as the number of tuples
 	 * that were deleted from indexes.
 	 */
-	heap_page_prune(rel, buf, vacrel->vistest, &presult, &vacrel->offnum);
+	heap_page_prune(rel, buf, vacrel->vistest, pronto_reap,
+					&presult, &vacrel->offnum);
 
 	/*
 	 * Now scan the page to collect LP_DEAD items and check for tuples
@@ -1593,6 +1546,7 @@ lazy_scan_prune(LVRelState *vacrel,
 	prunestate->all_visible = true;
 	prunestate->all_frozen = true;
 	prunestate->visibility_cutoff_xid = InvalidTransactionId;
+	prunestate->recordfreespace = false;
 
 	for (offnum = FirstOffsetNumber;
 		 offnum <= maxoff;
@@ -1914,6 +1868,15 @@ lazy_scan_prune(LVRelState *vacrel,
 	vacrel->live_tuples += live_tuples;
 	vacrel->recently_dead_tuples += recently_dead_tuples;
 
+	/*
+	 * If we will not do index vacuuming, either because we have no indexes,
+	 * because there is nothing to vacuum, or because do_index_vacuuming is
+	 * false, make sure we update the freespace map.
+	 */
+	if (vacrel->nindexes == 0 ||
+		!vacrel->do_index_vacuuming || lpdead_items == 0)
+		prunestate->recordfreespace = true;
+
 	/* Remember the location of the last page with nonremovable tuples */
 	if (hastup)
 		vacrel->nonempty_pages = blkno + 1;
@@ -1935,7 +1898,8 @@ lazy_scan_prune(LVRelState *vacrel,
  * callers.
  *
  * recordfreespace flag instructs caller on whether or not it should do
- * generic FSM processing for page.
+ * generic FSM processing for page. vacrel is updated with page-level counts
+ * and to indicate whether or not rel truncation is safe.
  */
 static bool
 lazy_scan_noprune(LVRelState *vacrel,
@@ -2518,7 +2482,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	bool		all_frozen;
 	LVSavedErrInfo saved_err_info;
 
-	Assert(vacrel->nindexes == 0 || vacrel->do_index_vacuuming);
+	Assert(vacrel->do_index_vacuuming);
 
 	pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_VACUUMED, blkno);
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index a2d7a0ea72f..1c731dded65 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -320,9 +320,10 @@ struct GlobalVisState;
 extern void heap_page_prune_opt(Relation relation, Buffer buffer);
 extern void heap_page_prune(Relation relation, Buffer buffer,
 							struct GlobalVisState *vistest,
+							bool pronto_reap,
 							PruneResult *presult,
 							OffsetNumber *off_loc);
-extern void heap_page_prune_execute(Buffer buffer,
+extern void heap_page_prune_execute(Buffer buffer, bool pronto_reap,
 									OffsetNumber *redirected, int nredirected,
 									OffsetNumber *nowdead, int ndead,
 									OffsetNumber *nowunused, int nunused);
-- 
2.37.2

v2-0001-Indicate-rel-truncation-unsafe-in-lazy_scan-no-pr.patch (text/x-patch; charset=US-ASCII)
From 608658f2cbc0acde55aac815c0fdb523ec24c452 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 13 Nov 2023 16:47:08 -0500
Subject: [PATCH v2 1/2] Indicate rel truncation unsafe in lazy_scan[no]prune

Both lazy_scan_prune() and lazy_scan_noprune() must determine whether or
not there are tuples on the page making rel truncation unsafe.
LVRelState->nonempty_pages is updated to reflect this. Previously, both
functions set an output parameter or output parameter member, hastup, to
indicate that nonempty_pages should be updated to reflect the latest
non-removable page. There doesn't seem to be any reason to wait until
lazy_scan_[no]prune() returns to update nonempty_pages. Plenty of other
counters in the LVRelState are updated in lazy_scan_[no]prune().
This allows us to get rid of the output parameter hastup.
---
 src/backend/access/heap/vacuumlazy.c | 50 +++++++++++++++-------------
 1 file changed, 26 insertions(+), 24 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 59f51f40e1b..d158cccd00d 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -217,7 +217,6 @@ typedef struct LVRelState
  */
 typedef struct LVPagePruneState
 {
-	bool		hastup;			/* Page prevents rel truncation? */
 	bool		has_lpdead_items;	/* includes existing LP_DEAD items */
 
 	/*
@@ -253,7 +252,7 @@ static void lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							LVPagePruneState *prunestate);
 static bool lazy_scan_noprune(LVRelState *vacrel, Buffer buf,
 							  BlockNumber blkno, Page page,
-							  bool *hastup, bool *recordfreespace);
+							  bool *recordfreespace);
 static void lazy_vacuum(LVRelState *vacrel);
 static bool lazy_vacuum_all_indexes(LVRelState *vacrel);
 static void lazy_vacuum_heap_rel(LVRelState *vacrel);
@@ -959,8 +958,7 @@ lazy_scan_heap(LVRelState *vacrel)
 		page = BufferGetPage(buf);
 		if (!ConditionalLockBufferForCleanup(buf))
 		{
-			bool		hastup,
-						recordfreespace;
+			bool		recordfreespace;
 
 			LockBuffer(buf, BUFFER_LOCK_SHARE);
 
@@ -972,20 +970,21 @@ lazy_scan_heap(LVRelState *vacrel)
 				continue;
 			}
 
-			/* Collect LP_DEAD items in dead_items array, count tuples */
-			if (lazy_scan_noprune(vacrel, buf, blkno, page, &hastup,
+			/*
+			 * Collect LP_DEAD items in dead_items array, count tuples,
+			 * determine if rel truncation is safe
+			 */
+			if (lazy_scan_noprune(vacrel, buf, blkno, page,
 								  &recordfreespace))
 			{
 				Size		freespace = 0;
 
 				/*
 				 * Processed page successfully (without cleanup lock) -- just
-				 * need to perform rel truncation and FSM steps, much like the
-				 * lazy_scan_prune case.  Don't bother trying to match its
-				 * visibility map setting steps, though.
+				 * need to update the FSM, much like the lazy_scan_prune case.
+				 * Don't bother trying to match its visibility map setting
+				 * steps, though.
 				 */
-				if (hastup)
-					vacrel->nonempty_pages = blkno + 1;
 				if (recordfreespace)
 					freespace = PageGetHeapFreeSpace(page);
 				UnlockReleaseBuffer(buf);
@@ -1023,10 +1022,6 @@ lazy_scan_heap(LVRelState *vacrel)
 
 		Assert(!prunestate.all_visible || !prunestate.has_lpdead_items);
 
-		/* Remember the location of the last page with nonremovable tuples */
-		if (prunestate.hastup)
-			vacrel->nonempty_pages = blkno + 1;
-
 		if (vacrel->nindexes == 0)
 		{
 			/*
@@ -1555,6 +1550,7 @@ lazy_scan_prune(LVRelState *vacrel,
 				live_tuples,
 				recently_dead_tuples;
 	HeapPageFreeze pagefrz;
+	bool		hastup = false;
 	int64		fpi_before = pgWalUsage.wal_fpi;
 	OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
 	HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
@@ -1593,7 +1589,6 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * Now scan the page to collect LP_DEAD items and check for tuples
 	 * requiring freezing among remaining tuples with storage
 	 */
-	prunestate->hastup = false;
 	prunestate->has_lpdead_items = false;
 	prunestate->all_visible = true;
 	prunestate->all_frozen = true;
@@ -1620,7 +1615,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		if (ItemIdIsRedirected(itemid))
 		{
 			/* page makes rel truncation unsafe */
-			prunestate->hastup = true;
+			hastup = true;
 			continue;
 		}
 
@@ -1750,7 +1745,7 @@ lazy_scan_prune(LVRelState *vacrel,
 				break;
 		}
 
-		prunestate->hastup = true;	/* page makes rel truncation unsafe */
+		hastup = true;			/* page makes rel truncation unsafe */
 
 		/* Tuple with storage -- consider need to freeze */
 		if (heap_prepare_freeze_tuple(htup, &vacrel->cutoffs, &pagefrz,
@@ -1918,6 +1913,10 @@ lazy_scan_prune(LVRelState *vacrel,
 	vacrel->lpdead_items += lpdead_items;
 	vacrel->live_tuples += live_tuples;
 	vacrel->recently_dead_tuples += recently_dead_tuples;
+
+	/* Remember the location of the last page with nonremovable tuples */
+	if (hastup)
+		vacrel->nonempty_pages = blkno + 1;
 }
 
 /*
@@ -1935,7 +1934,6 @@ lazy_scan_prune(LVRelState *vacrel,
  * one or more tuples on the page.  We always return true for non-aggressive
  * callers.
  *
- * See lazy_scan_prune for an explanation of hastup return flag.
  * recordfreespace flag instructs caller on whether or not it should do
  * generic FSM processing for page.
  */
@@ -1944,7 +1942,6 @@ lazy_scan_noprune(LVRelState *vacrel,
 				  Buffer buf,
 				  BlockNumber blkno,
 				  Page page,
-				  bool *hastup,
 				  bool *recordfreespace)
 {
 	OffsetNumber offnum,
@@ -1953,6 +1950,7 @@ lazy_scan_noprune(LVRelState *vacrel,
 				live_tuples,
 				recently_dead_tuples,
 				missed_dead_tuples;
+	bool		hastup;
 	HeapTupleHeader tupleheader;
 	TransactionId NoFreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
 	MultiXactId NoFreezePageRelminMxid = vacrel->NewRelminMxid;
@@ -1960,7 +1958,7 @@ lazy_scan_noprune(LVRelState *vacrel,
 
 	Assert(BufferGetBlockNumber(buf) == blkno);
 
-	*hastup = false;			/* for now */
+	hastup = false;				/* for now */
 	*recordfreespace = false;	/* for now */
 
 	lpdead_items = 0;
@@ -1984,7 +1982,7 @@ lazy_scan_noprune(LVRelState *vacrel,
 
 		if (ItemIdIsRedirected(itemid))
 		{
-			*hastup = true;
+			hastup = true;
 			continue;
 		}
 
@@ -1998,7 +1996,7 @@ lazy_scan_noprune(LVRelState *vacrel,
 			continue;
 		}
 
-		*hastup = true;			/* page prevents rel truncation */
+		hastup = true;			/* page prevents rel truncation */
 		tupleheader = (HeapTupleHeader) PageGetItem(page, itemid);
 		if (heap_tuple_should_freeze(tupleheader, &vacrel->cutoffs,
 									 &NoFreezePageRelfrozenXid,
@@ -2100,7 +2098,7 @@ lazy_scan_noprune(LVRelState *vacrel,
 			 * but it beats having to maintain specialized heap vacuuming code
 			 * forever, for vanishingly little benefit.)
 			 */
-			*hastup = true;
+			hastup = true;
 			missed_dead_tuples += lpdead_items;
 		}
 
@@ -2156,6 +2154,10 @@ lazy_scan_noprune(LVRelState *vacrel,
 	if (missed_dead_tuples > 0)
 		vacrel->missed_dead_pages++;
 
+	/* rel truncation is unsafe */
+	if (hastup)
+		vacrel->nonempty_pages = blkno + 1;
+
 	/* Caller won't need to call lazy_scan_prune with same page */
 	return true;
 }
-- 
2.37.2

#5Melanie Plageman
melanieplageman@gmail.com
In reply to: Melanie Plageman (#4)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Fri, Nov 17, 2023 at 6:12 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

On Mon, Nov 13, 2023 at 5:28 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

When there are no indexes on the relation, we can set would-be dead
items LP_UNUSED and remove them during pruning. This saves us a vacuum
WAL record, reducing WAL volume (and time spent writing and syncing
WAL).

...

Note that (on principle) this patch set is on top of the bug fix I
proposed in [1].

[1] /messages/by-id/CAAKRu_YiL=44GvGnt1dpYouDSSoV7wzxVoXs8m3p311rp-TVQQ@mail.gmail.com

Rebased on top of fix in b2e237afddc56a and registered for the january fest
https://commitfest.postgresql.org/46/4665/

I got an off-list question about whether or not this codepath is
exercised in existing regression tests. It is -- vacuum.sql tests
include those which vacuum a table with no indexes and tuples that can
be deleted.

I also looked through [1] to see if there were any user-facing docs
which happened to mention the exact implementation details of how and
when tuples are deleted by vacuum. I didn't see anything like that, so
I don't think there are user-facing docs which need updating.

- Melanie

[1]: https://www.postgresql.org/docs/devel/routine-vacuuming.html

#6Michael Paquier
michael@paquier.xyz
In reply to: Melanie Plageman (#3)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Mon, Nov 13, 2023 at 07:06:15PM -0500, Melanie Plageman wrote:

As best I can tell, our best case scenario is Thomas' streaming read API
goes in, we add vacuum as a user, and we can likely remove the skip
range logic.

This does not prevent the work you've been doing in 0001 and 0002
posted upthread, right? Some progress is always better than no
progress, and I can see the appeal behind doing 0001 actually to keep
the updates of the block numbers closer to where we determine if
relation truncation is safe or not rather than use an intermediate
state in LVPagePruneState.

0002 is much, much, much trickier..
--
Michael

#7Melanie Plageman
melanieplageman@gmail.com
In reply to: Michael Paquier (#6)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Sat, Dec 23, 2023 at 9:14 PM Michael Paquier <michael@paquier.xyz> wrote:

On Mon, Nov 13, 2023 at 07:06:15PM -0500, Melanie Plageman wrote:

As best I can tell, our best case scenario is Thomas' streaming read API
goes in, we add vacuum as a user, and we can likely remove the skip
range logic.

This does not prevent the work you've been doing in 0001 and 0002
posted upthread, right? Some progress is always better than no
progress

Correct. Peter and I were mainly discussing next refactoring steps as
we move toward combining the prune, freeze, and VM records. This
thread's patches stand alone.

I can see the appeal behind doing 0001 actually to keep
the updates of the block numbers closer to where we determine if
relation truncation is safe or not rather than use an intermediate
state in LVPagePruneState.

Exactly.

0002 is much, much, much trickier..

Do you have specific concerns about its correctness? I understand it
is an area where we have to be sure we are correct. But, to be fair,
that is true of all the pruning and vacuuming code.

- Melanie

#8Robert Haas
robertmhaas@gmail.com
In reply to: Melanie Plageman (#7)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Wed, Dec 27, 2023 at 11:27 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:

Do you have specific concerns about its correctness? I understand it
is an area where we have to be sure we are correct. But, to be fair,
that is true of all the pruning and vacuuming code.

I'm kind of concerned that 0002 might be a performance regression. It
pushes more branches down into the heap-pruning code, which I think
can sometimes be quite hot, for the sake of a case that rarely occurs
in practice. I take your point about it improving things when there
are no indexes, but what about when there are? And even if there are
no adverse performance consequences, is it really worth complicating
the logic at such a low level?

Also, I find "pronto_reap" to be a poor choice of name. "pronto" is an
informal word that seems to have no advantage over something like
"immediate" or "now," and I don't think "reap" has a precise,
universally-understood meaning. You could call this "mark_unused_now"
or "immediately_mark_unused" or something and it would be far more
self-documenting, IMHO.

--
Robert Haas
EDB: http://www.enterprisedb.com

#9Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#8)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

Hi,

On 2024-01-04 12:31:36 -0500, Robert Haas wrote:

On Wed, Dec 27, 2023 at 11:27 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:

Do you have specific concerns about its correctness? I understand it
is an area where we have to be sure we are correct. But, to be fair,
that is true of all the pruning and vacuuming code.

I'm kind of concerned that 0002 might be a performance regression. It
pushes more branches down into the heap-pruning code, which I think
can sometimes be quite hot, for the sake of a case that rarely occurs
in practice.

I was wondering the same when talking about this with Melanie. But I guess
there are some scenarios that aren't unrealistic; consider, e.g., bulk data
loading with COPY followed by an UPDATE to massage the data afterwards, before
creating the indexes.

Where I could see this becoming more interesting / powerful is if we were able
to do 'pronto reaping' not just from vacuum but also during on-access
pruning. For ETL workloads VACUUM might often not run between modifications of
the same page, but on-access pruning will. Being able to reclaim dead items at
that point seems like it could be a pretty sizable improvement?

I take your point about it improving things when there are no indexes, but
what about when there are?

I suspect that branch prediction should take care of the additional
branches. heap_prune_chain() indeed can be very hot, but I think we're
primarily bottlenecked on data cache misses.

For a single VACUUM, whether we'd do pronto reaping would be constant -> very
well predictable. We could add an unlikely() to make sure the
branch-predictor-is-empty case optimizes for the much more common case of
having indexes. Falsely assuming we'd not pronto reap wouldn't be
particularly bad, as the wins for the case are so much bigger.
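
As a rough sketch -- this is not in the posted patch, just an illustration of
the hint I mean -- it would simply go on the pronto_reap test in
heap_prune_chain():

    /*
     * Sketch only: bias codegen (and a cold branch predictor) toward the
     * common has-indexes case; mispredicting the rare no-indexes case is
     * cheap compared to the WAL savings it buys.
     */
    if (unlikely(prstate->pronto_reap))
        heap_prune_record_unused(prstate, rootoffnum);
    else
        heap_prune_record_dead(prstate, rootoffnum);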

If we were to use pronto reaping for on-access pruning, it's perhaps a bit
less predictable, as pruning for pages of a relation with indexes could be
interleaved with pruning for relations without. But even there I suspect it'd
not be the primary bottleneck: we call heap_prune_chain() in a loop for
every tuple on a page, so the branch predictor should quickly learn whether
we're using pronto reaping, whereas we don't get any less cache-miss heavy
when looking at subsequent tuples.

And even if there are no adverse performance consequences, is it really
worth complicating the logic at such a low level?

Yes, I think this is the main question here. It's not clear though that the
state after the patch is meaningfully more complicated? It removes nontrivial
code from lazy_scan_heap() and pays for that with a bit more complexity in
heap_prune_chain().

Also, I find "pronto_reap" to be a poor choice of name. "pronto" is an
informal word that seems to have no advantage over something like
"immediate" or "now,"

I didn't like that either :)

and I don't think "reap" has a precise, universally-understood meaning.

Less concerned about that.

You could call this "mark_unused_now" or "immediately_mark_unused" or
something and it would be far more self-documenting, IMHO.

How about 'no_indexes' or such?

Greetings,

Andres Freund

#10Andres Freund
andres@anarazel.de
In reply to: Melanie Plageman (#4)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

Hi,

On 2023-11-17 18:12:08 -0500, Melanie Plageman wrote:

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 14de8158d49..b578c32eeb6 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -8803,8 +8803,13 @@ heap_xlog_prune(XLogReaderState *record)
nunused = (end - nowunused);
Assert(nunused >= 0);
-		/* Update all line pointers per the record, and repair fragmentation */
-		heap_page_prune_execute(buffer,
+		/*
+		 * Update all line pointers per the record, and repair fragmentation.
+		 * We always pass pronto_reap as true, because we don't know whether
+		 * or not this option was used when pruning. This reduces the
+		 * validation done on replay in an assert build.
+		 */

Hm, that seems not great. Both because we lose validation and because it
seems to invite problems down the line, due to pronto_reap falsely being set
to true in heap_page_prune_execute().
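
A sketch of one alternative -- the flag and the flags field are hypothetical,
i.e. not something the patch adds -- would be to carry the information in the
prune record itself, so that replay can pass the real value instead of
guessing:

    /*
     * Hypothetical: xl_heap_prune would need a flags field and an
     * XLHP_PRONTO_REAP bit set by the primary when pruning without indexes.
     */
    heap_page_prune_execute(buffer,
                            (xlrec->flags & XLHP_PRONTO_REAP) != 0,
                            redirected, nredirected,
                            nowdead, ndead,
                            nowunused, nunused);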

@@ -581,7 +589,17 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
* function.)
*/
if (ItemIdIsDead(lp))
+		{
+			/*
+			 * If the relation has no indexes, we can set dead line pointers
+			 * LP_UNUSED now. We don't increment ndeleted here since the LP
+			 * was already marked dead.
+			 */
+			if (prstate->pronto_reap)
+				heap_prune_record_unused(prstate, offnum);
+
break;
+		}

I wasn't immediately sure whether this is reachable - but it is, e.g. after
on-access pruning (which currently doesn't yet use pronto reaping), after
pg_upgrade or dropping an index.

Assert(ItemIdIsNormal(lp));
htup = (HeapTupleHeader) PageGetItem(dp, lp);
@@ -715,7 +733,17 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
* redirect the root to the correct chain member.
*/
if (i >= nchain)
-			heap_prune_record_dead(prstate, rootoffnum);
+		{
+			/*
+			 * If the relation has no indexes, we can remove dead tuples
+			 * during pruning instead of marking their line pointers dead. Set
+			 * this tuple's line pointer LP_UNUSED.
+			 */
+			if (prstate->pronto_reap)
+				heap_prune_record_unused(prstate, rootoffnum);
+			else
+				heap_prune_record_dead(prstate, rootoffnum);
+		}
else
heap_prune_record_redirect(prstate, rootoffnum, chainitems[i]);
}
@@ -726,9 +754,12 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
* item.  This can happen if the loop in heap_page_prune caused us to
* visit the dead successor of a redirect item before visiting the
* redirect item.  We can clean up by setting the redirect item to
-		 * DEAD state.
+		 * DEAD state. If pronto_reap is true, we can set it LP_UNUSED now.
*/
-		heap_prune_record_dead(prstate, rootoffnum);
+		if (prstate->pronto_reap)
+			heap_prune_record_unused(prstate, rootoffnum);
+		else
+			heap_prune_record_dead(prstate, rootoffnum);
}

return ndeleted;

There's three new calls to heap_prune_record_unused() and the logic got more
nested. Is there a way to get to a nicer end result?

From 608658f2cbc0acde55aac815c0fdb523ec24c452 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 13 Nov 2023 16:47:08 -0500
Subject: [PATCH v2 1/2] Indicate rel truncation unsafe in lazy_scan[no]prune

Both lazy_scan_prune() and lazy_scan_noprune() must determine whether or
not there are tuples on the page making rel truncation unsafe.
LVRelState->nonempty_pages is updated to reflect this. Previously, both
functions set an output parameter or output parameter member, hastup, to
indicate that nonempty_pages should be updated to reflect the latest
non-removable page. There doesn't seem to be any reason to wait until
lazy_scan_[no]prune() returns to update nonempty_pages. Plenty of other
counters in the LVRelState are updated in lazy_scan_[no]prune().
This allows us to get rid of the output parameter hastup.

@@ -972,20 +970,21 @@ lazy_scan_heap(LVRelState *vacrel)
continue;
}

-			/* Collect LP_DEAD items in dead_items array, count tuples */
-			if (lazy_scan_noprune(vacrel, buf, blkno, page, &hastup,
+			/*
+			 * Collect LP_DEAD items in dead_items array, count tuples,
+			 * determine if rel truncation is safe
+			 */
+			if (lazy_scan_noprune(vacrel, buf, blkno, page,
&recordfreespace))
{
Size		freespace = 0;
/*
* Processed page successfully (without cleanup lock) -- just
-				 * need to perform rel truncation and FSM steps, much like the
-				 * lazy_scan_prune case.  Don't bother trying to match its
-				 * visibility map setting steps, though.
+				 * need to update the FSM, much like the lazy_scan_prune case.
+				 * Don't bother trying to match its visibility map setting
+				 * steps, though.
*/
-				if (hastup)
-					vacrel->nonempty_pages = blkno + 1;
if (recordfreespace)
freespace = PageGetHeapFreeSpace(page);
UnlockReleaseBuffer(buf);

The comment continues to say that we "determine if rel truncation is safe" -
but I don't see that? Oh, I see, it's done inside lazy_scan_noprune(). This
doesn't seem like a clear improvement to me. Particularly because it's only
set if lazy_scan_noprune() actually does something.

I don't like the existing code in lazy_scan_heap(). But this kinda seems like
tinkering around the edges, without getting to the heart of the issue. I think
we should

1) Move everything after ReadBufferExtended() and the end of the loop into its
own function

2) All the code in the loop body after the call to lazy_scan_prune() is
specific to the lazy_scan_prune() path; it doesn't make sense that it's at
the same level as the calls to lazy_scan_noprune(),
lazy_scan_new_or_empty() or lazy_scan_prune(). Either it should be in
lazy_scan_prune() or a new wrapper function.

3) It's imo wrong that we have UnlockReleaseBuffer() (there are 6 different
places unlocking if I didn't miscount!) and RecordPageWithFreeSpace() calls
in this many places. I think this is largely a consequence of the previous
points. Once those are addressed, we can have one common place.

But I digress.

Greetings,

Andres Freund

#11Melanie Plageman
melanieplageman@gmail.com
In reply to: Andres Freund (#10)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

Thanks for the review!

On Thu, Jan 4, 2024 at 3:03 PM Andres Freund <andres@anarazel.de> wrote:

On 2023-11-17 18:12:08 -0500, Melanie Plageman wrote:

Assert(ItemIdIsNormal(lp));
htup = (HeapTupleHeader) PageGetItem(dp, lp);
@@ -715,7 +733,17 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
* redirect the root to the correct chain member.
*/
if (i >= nchain)
-                     heap_prune_record_dead(prstate, rootoffnum);
+             {
+                     /*
+                      * If the relation has no indexes, we can remove dead tuples
+                      * during pruning instead of marking their line pointers dead. Set
+                      * this tuple's line pointer LP_UNUSED.
+                      */
+                     if (prstate->pronto_reap)
+                             heap_prune_record_unused(prstate, rootoffnum);
+                     else
+                             heap_prune_record_dead(prstate, rootoffnum);
+             }
else
heap_prune_record_redirect(prstate, rootoffnum, chainitems[i]);
}
@@ -726,9 +754,12 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
* item.  This can happen if the loop in heap_page_prune caused us to
* visit the dead successor of a redirect item before visiting the
* redirect item.  We can clean up by setting the redirect item to
-              * DEAD state.
+              * DEAD state. If pronto_reap is true, we can set it LP_UNUSED now.
*/
-             heap_prune_record_dead(prstate, rootoffnum);
+             if (prstate->pronto_reap)
+                     heap_prune_record_unused(prstate, rootoffnum);
+             else
+                     heap_prune_record_dead(prstate, rootoffnum);
}

return ndeleted;

There's three new calls to heap_prune_record_unused() and the logic got more
nested. Is there a way to get to a nicer end result?

So, I could do another loop through the line pointers in
heap_page_prune() (after the loop calling heap_prune_chain()) and, if
pronto_reap is true, set dead line pointers LP_UNUSED. Then, when
constructing the WAL record, I would just not add the prstate.nowdead
that I saved from heap_prune_chain() to the prune WAL record.

This would eliminate the extra if statements from heap_prune_chain().
It will be more performant than sticking with the original (master)
call to lazy_vacuum_heap_page(). However, I'm not convinced that the
extra loop to set line pointers LP_DEAD -> LP_UNUSED is less confusing
than keeping the if pronto_reap test in heap_prune_chain().
heap_prune_chain() is where line pointers' new values are decided. It
seems weird to pick one new value for a line pointer in
heap_prune_chain() and then pick another, different new value in a
loop after heap_prune_chain(). I don't see any way to eliminate the if
pronto_reap tests without a separate loop setting LP_DEAD->LP_UNUSED,
though.
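
For concreteness, that alternative would look roughly like this (sketch only,
reusing the existing PruneState arrays -- it is not what the posted patches
do):

    /* sketch: after the heap_prune_chain() loop in heap_page_prune() */
    if (prstate.pronto_reap)
    {
        /* turn every would-be LP_DEAD item into LP_UNUSED instead */
        for (int i = 0; i < prstate.ndead; i++)
            prstate.nowunused[prstate.nunused++] = prstate.nowdead[i];
        /* and leave nothing to log as newly dead in the prune record */
        prstate.ndead = 0;
    }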

From 608658f2cbc0acde55aac815c0fdb523ec24c452 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 13 Nov 2023 16:47:08 -0500
Subject: [PATCH v2 1/2] Indicate rel truncation unsafe in lazy_scan[no]prune

Both lazy_scan_prune() and lazy_scan_noprune() must determine whether or
not there are tuples on the page making rel truncation unsafe.
LVRelState->nonempty_pages is updated to reflect this. Previously, both
functions set an output parameter or output parameter member, hastup, to
indicate that nonempty_pages should be updated to reflect the latest
non-removable page. There doesn't seem to be any reason to wait until
lazy_scan_[no]prune() returns to update nonempty_pages. Plenty of other
counters in the LVRelState are updated in lazy_scan_[no]prune().
This allows us to get rid of the output parameter hastup.

@@ -972,20 +970,21 @@ lazy_scan_heap(LVRelState *vacrel)
continue;
}

-                     /* Collect LP_DEAD items in dead_items array, count tuples */
-                     if (lazy_scan_noprune(vacrel, buf, blkno, page, &hastup,
+                     /*
+                      * Collect LP_DEAD items in dead_items array, count tuples,
+                      * determine if rel truncation is safe
+                      */
+                     if (lazy_scan_noprune(vacrel, buf, blkno, page,
&recordfreespace))
{
Size            freespace = 0;
/*
* Processed page successfully (without cleanup lock) -- just
-                              * need to perform rel truncation and FSM steps, much like the
-                              * lazy_scan_prune case.  Don't bother trying to match its
-                              * visibility map setting steps, though.
+                              * need to update the FSM, much like the lazy_scan_prune case.
+                              * Don't bother trying to match its visibility map setting
+                              * steps, though.
*/
-                             if (hastup)
-                                     vacrel->nonempty_pages = blkno + 1;
if (recordfreespace)
freespace = PageGetHeapFreeSpace(page);
UnlockReleaseBuffer(buf);

The comment continues to say that we "determine if rel truncation is safe" -
but I don't see that? Oh, I see, it's done inside lazy_scan_noprune(). This
doesn't seem like a clear improvement to me. Particularly because it's only
set if lazy_scan_noprune() actually does something.

I don't get what the last sentence means ("Particularly because...").
The new location of the hastup test in lazy_scan_noprune() is above an
unconditional return true, so it is also only set if
lazy_scan_noprune() actually does something. I think the
lazy_scan_[no]prune() functions shouldn't try to export the hastup
information to lazy_scan_heap(). It's confusing. We should be moving
all of the page-specific processing into the individual functions
instead of in the body of lazy_scan_heap().

I don't like the existing code in lazy_scan_heap(). But this kinda seems like
tinkering around the edges, without getting to the heart of the issue. I think
we should

1) Move everything after ReadBufferExtended() and the end of the loop into its
own function

2) All the code in the loop body after the call to lazy_scan_prune() is
specific to the lazy_scan_prune() path; it doesn't make sense that it's at
the same level as the calls to lazy_scan_noprune(),
lazy_scan_new_or_empty() or lazy_scan_prune(). Either it should be in
lazy_scan_prune() or a new wrapper function.

3) It's imo wrong that we have UnlockReleaseBuffer() (there are 6 different
places unlocking if I didn't miscount!) and RecordPageWithFreeSpace() calls
in this many places. I think this is largely a consequence of the previous
points. Once those are addressed, we can have one common place.

I have other patches that do versions of all of the above, but they
didn't seem to really fit with this patch set. I am taking a step to
move code out of lazy_scan_heap() that doesn't belong there. The fact
that other code should also be moved from there seems more like a "yes
and" than a "no but". That being said, do you think I should introduce
patches doing further refactoring of lazy_scan_heap() (like what you
suggest above) into this thread?

- Melanie

#12Melanie Plageman
melanieplageman@gmail.com
In reply to: Robert Haas (#8)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Thu, Jan 4, 2024 at 12:31 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Dec 27, 2023 at 11:27 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:

Do you have specific concerns about its correctness? I understand it
is an area where we have to be sure we are correct. But, to be fair,
that is true of all the pruning and vacuuming code.

I'm kind of concerned that 0002 might be a performance regression. It
pushes more branches down into the heap-pruning code, which I think
can sometimes be quite hot, for the sake of a case that rarely occurs
in practice. I take your point about it improving things when there
are no indexes, but what about when there are? And even if there are
no adverse performance consequences, is it really worth complicating
the logic at such a low level?

Regarding the additional code complexity, I think the extra call to
lazy_vacuum_heap_page() in lazy_scan_heap() actually represents a fair
amount of code complexity. It is a special case of page-level
processing that should be handled by heap_page_prune() and not
lazy_scan_heap().

lazy_scan_heap() is responsible for three main things -- loop through
the blocks in a relation and process each page (pruning, freezing,
etc), invoke index vacuuming, invoke functions to loop through
dead_items and vacuum pages. The logic to do the per-page processing
is spread across three places, though.

When a single page is being processed, page pruning happens in
heap_page_prune(). Freezing, dead items recording, and visibility
checks happen in lazy_scan_prune(). Visibility map updates and
freespace map updates happen back in lazy_scan_heap(). Except, if the
table has no indexes, in which case, lazy_scan_heap() also invokes
lazy_vacuum_heap_page() to set dead line pointers unused and do
another separate visibility check and VM update. I maintain that all
page-level processing should be done in the page-level processing
functions (like lazy_scan_prune()). And lazy_scan_heap() shouldn't be
directly responsible for special case page-level processing.

Also, I find "pronto_reap" to be a poor choice of name. "pronto" is an
informal word that seems to have no advantage over something like
"immediate" or "now," and I don't think "reap" has a precise,
universally-understood meaning. You could call this "mark_unused_now"
or "immediately_mark_unused" or something and it would be far more
self-documenting, IMHO.

Yes, I see how pronto is unnecessarily informal. If there are no cases
other than when the table has no indexes that we would consider
immediately marking LPs unused, then perhaps it is better to call it
"no_indexes" (per andres' suggestion)?

- Melanie

#13Robert Haas
robertmhaas@gmail.com
In reply to: Melanie Plageman (#12)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Thu, Jan 4, 2024 at 6:03 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

When a single page is being processed, page pruning happens in
heap_page_prune(). Freezing, dead items recording, and visibility
checks happen in lazy_scan_prune(). Visibility map updates and
freespace map updates happen back in lazy_scan_heap(). Except, if the
table has no indexes, in which case, lazy_scan_heap() also invokes
lazy_vacuum_heap_page() to set dead line pointers unused and do
another separate visibility check and VM update. I maintain that all
page-level processing should be done in the page-level processing
functions (like lazy_scan_prune()). And lazy_scan_heap() shouldn't be
directly responsible for special case page-level processing.

But you can just as easily turn this argument on its head, can't you?
In general, except for HOT tuples, line pointers are marked dead by
pruning and unused by vacuum. Here you want to turn it on its head and
make pruning do what would normally be vacuum's responsibility.

I mean, that's not to say that your argument is "wrong" ... but what I
just said really is how I think about it, too.

Also, I find "pronto_reap" to be a poor choice of name. "pronto" is an
informal word that seems to have no advantage over something like
"immediate" or "now," and I don't think "reap" has a precise,
universally-understood meaning. You could call this "mark_unused_now"
or "immediately_mark_unused" or something and it would be far more
self-documenting, IMHO.

Yes, I see how pronto is unnecessarily informal. If there are no cases
other than when the table has no indexes that we would consider
immediately marking LPs unused, then perhaps it is better to call it
"no_indexes" (per andres' suggestion)?

wfm.

--
Robert Haas
EDB: http://www.enterprisedb.com

#14Melanie Plageman
melanieplageman@gmail.com
In reply to: Robert Haas (#13)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Fri, Jan 5, 2024 at 8:59 AM Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Jan 4, 2024 at 6:03 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

When a single page is being processed, page pruning happens in
heap_page_prune(). Freezing, dead items recording, and visibility
checks happen in lazy_scan_prune(). Visibility map updates and
freespace map updates happen back in lazy_scan_heap(). Except, if the
table has no indexes, in which case, lazy_scan_heap() also invokes
lazy_vacuum_heap_page() to set dead line pointers unused and do
another separate visibility check and VM update. I maintain that all
page-level processing should be done in the page-level processing
functions (like lazy_scan_prune()). And lazy_scan_heap() shouldn't be
directly responsible for special case page-level processing.

But you can just as easily turn this argument on its head, can't you?
In general, except for HOT tuples, line pointers are marked dead by
pruning and unused by vacuum. Here you want to turn it on its head and
make pruning do what would normally be vacuum's responsibility.

I actually think we are going to want to stop referring to these steps
as pruning and vacuuming. It is confusing because vacuuming refers to
the whole process of doing garbage collection on the table and also to
the specific step of setting dead line pointers unused. If we called
these steps, say, pruning and reaping, that might be clearer.

Vacuuming consists of three phases -- the first pass, index vacuuming,
and the second pass. I don't think we should dictate what happens in
each pass. That is, we shouldn't expect only pruning to happen in the
first pass and only reaping to happen in the second pass. For example,
I think Andres has previously proposed doing another round of pruning
after index vacuuming. The second pass/third phase is distinguished
primarily by being after index vacuuming.

If we think about it this way, that frees us up to set dead line
pointers unused in the first pass when the table has no indexes. For
clarity, I could add a block comment explaining that doing this is an
optimization and not a logical requirement. One way to make this even
more clear would be to set the dead line pointers unused in a separate
loop after heap_prune_chain() as I proposed upthread.

- Melanie

#15Robert Haas
robertmhaas@gmail.com
In reply to: Melanie Plageman (#14)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Fri, Jan 5, 2024 at 12:59 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

I actually think we are going to want to stop referring to these steps
as pruning and vacuuming. It is confusing because vacuuming refers to
the whole process of doing garbage collection on the table and also to
the specific step of setting dead line pointers unused. If we called
these steps, say, pruning and reaping, that might be clearer.

I agree that there's some terminological confusion here, but if we
leave all of the current naming unchanged and start using new naming
conventions in new code, I don't think that's going to clear things
up. Getting rid of the weird naming with pruning, "scanning" (that is
part of vacuum), and "vacuuming" (that is another part of vacuum) is
gonna take some work.

Vacuuming consists of three phases -- the first pass, index vacuuming,
and the second pass. I don't think we should dictate what happens in
each pass. That is, we shouldn't expect only pruning to happen in the
first pass and only reaping to happen in the second pass. For example,
I think Andres has previously proposed doing another round of pruning
after index vacuuming. The second pass/third phase is distinguished
primarily by being after index vacuuming.

If we think about it this way, that frees us up to set dead line
pointers unused in the first pass when the table has no indexes. For
clarity, I could add a block comment explaining that doing this is an
optimization and not a logical requirement. One way to make this even
more clear would be to set the dead line pointers unused in a separate
loop after heap_prune_chain() as I proposed upthread.

I don't really disagree with any of this, but I think the question is
whether the code is cleaner with mark-LP-as-unused pushed down into
pruning or whether it's better the way it is. Andres seems confident
that the change won't suck for performance, which is good, but I'm not
quite convinced that it's the right direction to go with the code, and
he doesn't seem to be either. Perhaps this all turns on this point:

MP> I am planning to add a VM update into the freeze record, at which point
MP> I will move the VM update code into lazy_scan_prune(). This will then
MP> allow us to consolidate the freespace map update code for the prune and
MP> noprune cases and make lazy_scan_heap() short and sweet.

Can we see what that looks like on top of this change?

--
Robert Haas
EDB: http://www.enterprisedb.com

#16Andres Freund
andres@anarazel.de
In reply to: Melanie Plageman (#11)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

Hi,

On 2024-01-04 17:37:27 -0500, Melanie Plageman wrote:

On Thu, Jan 4, 2024 at 3:03 PM Andres Freund <andres@anarazel.de> wrote:

On 2023-11-17 18:12:08 -0500, Melanie Plageman wrote:

Assert(ItemIdIsNormal(lp));
htup = (HeapTupleHeader) PageGetItem(dp, lp);
@@ -715,7 +733,17 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
* redirect the root to the correct chain member.
*/
if (i >= nchain)
-                     heap_prune_record_dead(prstate, rootoffnum);
+             {
+                     /*
+                      * If the relation has no indexes, we can remove dead tuples
+                      * during pruning instead of marking their line pointers dead. Set
+                      * this tuple's line pointer LP_UNUSED.
+                      */
+                     if (prstate->pronto_reap)
+                             heap_prune_record_unused(prstate, rootoffnum);
+                     else
+                             heap_prune_record_dead(prstate, rootoffnum);
+             }
else
heap_prune_record_redirect(prstate, rootoffnum, chainitems[i]);
}
@@ -726,9 +754,12 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
* item.  This can happen if the loop in heap_page_prune caused us to
* visit the dead successor of a redirect item before visiting the
* redirect item.  We can clean up by setting the redirect item to
-              * DEAD state.
+              * DEAD state. If pronto_reap is true, we can set it LP_UNUSED now.
*/
-             heap_prune_record_dead(prstate, rootoffnum);
+             if (prstate->pronto_reap)
+                     heap_prune_record_unused(prstate, rootoffnum);
+             else
+                     heap_prune_record_dead(prstate, rootoffnum);
}

return ndeleted;

There's three new calls to heap_prune_record_unused() and the logic got more
nested. Is there a way to get to a nicer end result?

So, I could do another loop through the line pointers in
heap_page_prune() (after the loop calling heap_prune_chain()) and, if
pronto_reap is true, set dead line pointers LP_UNUSED. Then, when
constructing the WAL record, I would just not add the prstate.nowdead
that I saved from heap_prune_chain() to the prune WAL record.

Hm, that seems a bit sad as well. I am wondering if we could move the
pronto_reap handling into heap_prune_record_dead() or a wrapper of it. I am
more concerned about the human reader than the CPU here...
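
Something along these lines, maybe (sketch only, wrapper name made up):

    static void
    heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum)
    {
        /* with no indexes there is no second heap pass, so reap right away */
        if (prstate->pronto_reap)
            heap_prune_record_unused(prstate, offnum);
        else
            heap_prune_record_dead(prstate, offnum);
    }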

@@ -972,20 +970,21 @@ lazy_scan_heap(LVRelState *vacrel)
continue;
}

-                     /* Collect LP_DEAD items in dead_items array, count tuples */
-                     if (lazy_scan_noprune(vacrel, buf, blkno, page, &hastup,
+                     /*
+                      * Collect LP_DEAD items in dead_items array, count tuples,
+                      * determine if rel truncation is safe
+                      */
+                     if (lazy_scan_noprune(vacrel, buf, blkno, page,
&recordfreespace))
{
Size            freespace = 0;
/*
* Processed page successfully (without cleanup lock) -- just
-                              * need to perform rel truncation and FSM steps, much like the
-                              * lazy_scan_prune case.  Don't bother trying to match its
-                              * visibility map setting steps, though.
+                              * need to update the FSM, much like the lazy_scan_prune case.
+                              * Don't bother trying to match its visibility map setting
+                              * steps, though.
*/
-                             if (hastup)
-                                     vacrel->nonempty_pages = blkno + 1;
if (recordfreespace)
freespace = PageGetHeapFreeSpace(page);
UnlockReleaseBuffer(buf);

The comment continues to say that we "determine if rel truncation is safe" -
but I don't see that? Oh, I see, it's done inside lazy_scan_noprune(). This
doesn't seem like a clear improvement to me. Particularly because it's only
set if lazy_scan_noprune() actually does something.

I don't get what the last sentence means ("Particularly because...").

Took me a second to understand myself again too, oops. What I think I meant is
that it seems error-prone that it's only set in some paths inside
lazy_scan_noprune(). Previously it was at least a bit clearer in
lazy_scan_heap() that it would be set for the different possible paths.

I don't like the existing code in lazy_scan_heap(). But this kinda seems like
tinkering around the edges, without getting to the heart of the issue. I think
we should

1) Move everything after ReadBufferExtended() and the end of the loop into its
own function

2) All the code in the loop body after the call to lazy_scan_prune() is
specific to the lazy_scan_prune() path; it doesn't make sense that it's at
the same level as the calls to lazy_scan_noprune(),
lazy_scan_new_or_empty() or lazy_scan_prune(). Either it should be in
lazy_scan_prune() or a new wrapper function.

3) It's imo wrong that we have UnlockReleaseBuffer() (there are 6 different
places unlocking if I didn't miscount!) and RecordPageWithFreeSpace() calls
in this many places. I think this is largely a consequence of the previous
points. Once those are addressed, we can have one common place.

I have other patches that do versions of all of the above, but they
didn't seem to really fit with this patch set. I am taking a step to
move code out of lazy_scan_heap() that doesn't belong there. The fact
that other code should also be moved from there seems more like a "yes
and" than a "no but". That being said, do you think I should introduce
patches doing further refactoring of lazy_scan_heap() (like what you
suggest above) into this thread?

It probably should not be part of this patchset. I probably shouldn't have
written the above here, but after concluding that I didn't think your small
refactoring patch was quite right, I couldn't stop myself from thinking about
what would be right.

Greetings,

Andres Freund

#17Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#13)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

Hi,

On 2024-01-05 08:59:41 -0500, Robert Haas wrote:

On Thu, Jan 4, 2024 at 6:03 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

When a single page is being processed, page pruning happens in
heap_page_prune(). Freezing, dead items recording, and visibility
checks happen in lazy_scan_prune(). Visibility map updates and
freespace map updates happen back in lazy_scan_heap(). Except, if the
table has no indexes, in which case, lazy_scan_heap() also invokes
lazy_vacuum_heap_page() to set dead line pointers unused and do
another separate visibility check and VM update. I maintain that all
page-level processing should be done in the page-level processing
functions (like lazy_scan_prune()). And lazy_scan_heap() shouldn't be
directly responsible for special case page-level processing.

But you can just as easily turn this argument on its head, can't you?
In general, except for HOT tuples, line pointers are marked dead by
pruning and unused by vacuum. Here you want to turn it on its head and
make pruning do what would normally be vacuum's responsibility.

OTOH, the pruning logic, including its WAL record, already supports marking
items unused; all we need to do is tell it to do so in a few more cases. If
we didn't already need to have support for this, I'd have a much harder time
arguing for doing this.

One important part of the larger project is to combine the WAL records for
pruning, freezing and setting the all-visible/all-frozen bit into one WAL
record. We can't set all-frozen before we have removed the dead items. So
either we need to combine pruning and setting items unused for no-index tables
or we end up considerably less efficient in the no-indexes case.

An aside:

As I think we chatted about before, I eventually would like the option to
remove index entries for a tuple during on-access pruning, for OLTP
workloads. I.e. before removing the tuple, construct the corresponding index
tuple, use it to look up index entries pointing to the tuple. If all the index
entries were found (they might not be, if they already were marked dead during
a lookup, or if an expression wasn't actually immutable), we can prune without
the full index scan. Obviously this would only be suitable for some
workloads, but it could be quite beneficial when you have huge indexes. The
reason I mention this is that then we'd have another source of marking items
unused during pruning.

Greetings,

Andres Freund

#18Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#17)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Fri, Jan 5, 2024 at 3:05 PM Andres Freund <andres@anarazel.de> wrote:

OTOH, the pruning logic, including its WAL record, already supports marking
items unused; all we need to do is tell it to do so in a few more cases. If
we didn't already need to have support for this, I'd have a much harder time
arguing for doing this.

One important part of the larger project is to combine the WAL records for
pruning, freezing and setting the all-visible/all-frozen bit into one WAL
record. We can't set all-frozen before we have removed the dead items. So
either we need to combine pruning and setting items unused for no-index tables
or we end up considerably less efficient in the no-indexes case.

Those are fair arguments.

An aside:

As I think we chatted about before, I eventually would like the option to
remove index entries for a tuple during on-access pruning, for OLTP
workloads. I.e. before removing the tuple, construct the corresponding index
tuple, use it to look up index entries pointing to the tuple. If all the index
entries were found (they might not be, if they already were marked dead during
a lookup, or if an expression wasn't actually immutable), we can prune without
the full index scan. Obviously this would only be suitable for some
workloads, but it could be quite beneficial when you have huge indexes. The
reason I mention this is that then we'd have another source of marking items
unused during pruning.

I will be astonished if you can make this work well enough to avoid
huge regressions in plausible cases. There are plenty of cases where
we do a very thorough job opportunistically removing index tuples.

--
Robert Haas
EDB: http://www.enterprisedb.com

#19Melanie Plageman
melanieplageman@gmail.com
In reply to: Robert Haas (#15)
6 attachment(s)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Fri, Jan 5, 2024 at 1:47 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Jan 5, 2024 at 12:59 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
MP> I am planning to add a VM update into the freeze record, at which point
MP> I will move the VM update code into lazy_scan_prune(). This will then
MP> allow us to consolidate the freespace map update code for the prune and
MP> noprune cases and make lazy_scan_heap() short and sweet.

Can we see what that looks like on top of this change?

Yes, attached is a patch set which does this. My previous patchset
already reduced the number of places we unlock the buffer and update
the freespace map in lazy_scan_heap(). This patchset combines the
lazy_scan_prune() and lazy_scan_noprune() FSM update cases. I also
have a version which moves the freespace map updates into
lazy_scan_prune() and lazy_scan_noprune() -- eliminating all of these
from lazy_scan_heap(). This is arguably more clear. But Andres
mentioned he wanted fewer places unlocking the buffer and updating the
FSM.

- Melanie

Attachments:

v3-0004-Inline-LVPagePruneState-members-in-lazy_scan_prun.patchtext/x-patch; charset=US-ASCII; name=v3-0004-Inline-LVPagePruneState-members-in-lazy_scan_prun.patchDownload
From 5f8d7544fbe53645b9d46b0eeb74c39cbfdc21f7 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Fri, 5 Jan 2024 11:47:31 -0500
Subject: [PATCH v3 4/6] Inline LVPagePruneState members in lazy_scan_prune()

After moving the code to update the visibility map from lazy_scan_heap
into lazy_scan_prune, most of the members of LVPagePruneState are no
longer required. Make them local variables in lazy_scan_prune().
---
 src/backend/access/heap/vacuumlazy.c | 104 ++++++++++++++-------------
 1 file changed, 55 insertions(+), 49 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 1bc7c56396f..c0266eabb44 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -217,18 +217,8 @@ typedef struct LVRelState
  */
 typedef struct LVPagePruneState
 {
-	bool		has_lpdead_items;	/* includes existing LP_DEAD items */
 	/* whether or not caller should do FSM processing for block */
 	bool		recordfreespace;
-
-	/*
-	 * State describes the proper VM bit states to set for the page following
-	 * pruning and freezing.  all_visible implies !has_lpdead_items, but don't
-	 * trust all_frozen result unless all_visible is also set to true.
-	 */
-	bool		all_visible;	/* Every item visible to all? */
-	bool		all_frozen;		/* provided all_visible is also true */
-	TransactionId visibility_cutoff_xid;	/* For recovery conflicts */
 } LVPagePruneState;
 
 /* Struct for saving and restoring vacuum error information. */
@@ -1397,6 +1387,10 @@ lazy_scan_prune(LVRelState *vacrel,
 	HeapPageFreeze pagefrz;
 	bool		no_indexes;
 	bool		hastup = false;
+	bool		all_visible,
+				all_frozen;
+	TransactionId visibility_cutoff_xid;
+	bool		has_lpdead_items;	/* includes existing LP_DEAD items */
 	int64		fpi_before = pgWalUsage.wal_fpi;
 	OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
 	HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
@@ -1436,14 +1430,24 @@ lazy_scan_prune(LVRelState *vacrel,
 
 	/*
 	 * Now scan the page to collect LP_DEAD items and check for tuples
-	 * requiring freezing among remaining tuples with storage
+	 * requiring freezing among remaining tuples with storage.
+	 * has_lpdead_items includes existing LP_DEAD items.
 	 */
-	prunestate->has_lpdead_items = false;
-	prunestate->all_visible = true;
-	prunestate->all_frozen = true;
-	prunestate->visibility_cutoff_xid = InvalidTransactionId;
+	has_lpdead_items = false;
+	/* for recovery conflicts */
+	visibility_cutoff_xid = InvalidTransactionId;
 	prunestate->recordfreespace = false;
 
+	/*
+	 * We will update the VM after collecting LP_DEAD items and freezing
+	 * tuples. Keep track of whether or not the page is all_visible and
+	 * all_frozen and use this information to update the VM. all_visible
+	 * implies !has_lpdead_items, but don't trust all_frozen result unless
+	 * all_visible is also set to true.
+	 */
+	all_visible = true;
+	all_frozen = true;
+
 	for (offnum = FirstOffsetNumber;
 		 offnum <= maxoff;
 		 offnum = OffsetNumberNext(offnum))
@@ -1530,13 +1534,13 @@ lazy_scan_prune(LVRelState *vacrel,
 				 * asynchronously. See SetHintBits for more info. Check that
 				 * the tuple is hinted xmin-committed because of that.
 				 */
-				if (prunestate->all_visible)
+				if (all_visible)
 				{
 					TransactionId xmin;
 
 					if (!HeapTupleHeaderXminCommitted(htup))
 					{
-						prunestate->all_visible = false;
+						all_visible = false;
 						break;
 					}
 
@@ -1548,14 +1552,14 @@ lazy_scan_prune(LVRelState *vacrel,
 					if (!TransactionIdPrecedes(xmin,
 											   vacrel->cutoffs.OldestXmin))
 					{
-						prunestate->all_visible = false;
+						all_visible = false;
 						break;
 					}
 
 					/* Track newest xmin on page. */
-					if (TransactionIdFollows(xmin, prunestate->visibility_cutoff_xid) &&
+					if (TransactionIdFollows(xmin, visibility_cutoff_xid) &&
 						TransactionIdIsNormal(xmin))
-						prunestate->visibility_cutoff_xid = xmin;
+						visibility_cutoff_xid = xmin;
 				}
 				break;
 			case HEAPTUPLE_RECENTLY_DEAD:
@@ -1566,7 +1570,7 @@ lazy_scan_prune(LVRelState *vacrel,
 				 * pruning.)
 				 */
 				recently_dead_tuples++;
-				prunestate->all_visible = false;
+				all_visible = false;
 				break;
 			case HEAPTUPLE_INSERT_IN_PROGRESS:
 
@@ -1577,11 +1581,11 @@ lazy_scan_prune(LVRelState *vacrel,
 				 * results.  This assumption is a bit shaky, but it is what
 				 * acquire_sample_rows() does, so be consistent.
 				 */
-				prunestate->all_visible = false;
+				all_visible = false;
 				break;
 			case HEAPTUPLE_DELETE_IN_PROGRESS:
 				/* This is an expected case during concurrent vacuum */
-				prunestate->all_visible = false;
+				all_visible = false;
 
 				/*
 				 * Count such rows as live.  As above, we assume the deleting
@@ -1611,7 +1615,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		 * definitely cannot be set all-frozen in the visibility map later on
 		 */
 		if (!totally_frozen)
-			prunestate->all_frozen = false;
+			all_frozen = false;
 	}
 
 	/*
@@ -1629,7 +1633,7 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * page all-frozen afterwards (might not happen until final heap pass).
 	 */
 	if (pagefrz.freeze_required || tuples_frozen == 0 ||
-		(prunestate->all_visible && prunestate->all_frozen &&
+		(all_visible && all_frozen &&
 		 fpi_before != pgWalUsage.wal_fpi))
 	{
 		/*
@@ -1667,11 +1671,11 @@ lazy_scan_prune(LVRelState *vacrel,
 			 * once we're done with it.  Otherwise we generate a conservative
 			 * cutoff by stepping back from OldestXmin.
 			 */
-			if (prunestate->all_visible && prunestate->all_frozen)
+			if (all_visible && all_frozen)
 			{
 				/* Using same cutoff when setting VM is now unnecessary */
-				snapshotConflictHorizon = prunestate->visibility_cutoff_xid;
-				prunestate->visibility_cutoff_xid = InvalidTransactionId;
+				snapshotConflictHorizon = visibility_cutoff_xid;
+				visibility_cutoff_xid = InvalidTransactionId;
 			}
 			else
 			{
@@ -1694,7 +1698,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		 */
 		vacrel->NewRelfrozenXid = pagefrz.NoFreezePageRelfrozenXid;
 		vacrel->NewRelminMxid = pagefrz.NoFreezePageRelminMxid;
-		prunestate->all_frozen = false;
+		all_frozen = false;
 		tuples_frozen = 0;		/* avoid miscounts in instrumentation */
 	}
 
@@ -1707,16 +1711,17 @@ lazy_scan_prune(LVRelState *vacrel,
 	 */
 #ifdef USE_ASSERT_CHECKING
 	/* Note that all_frozen value does not matter when !all_visible */
-	if (prunestate->all_visible && lpdead_items == 0)
+	if (all_visible && lpdead_items == 0)
 	{
-		TransactionId cutoff;
-		bool		all_frozen;
+		TransactionId debug_cutoff;
+		bool		debug_all_frozen;
 
-		if (!heap_page_is_all_visible(vacrel, buf, &cutoff, &all_frozen))
+		if (!heap_page_is_all_visible(vacrel, buf,
+									  &debug_cutoff, &debug_all_frozen))
 			Assert(false);
 
-		Assert(!TransactionIdIsValid(cutoff) ||
-			   cutoff == prunestate->visibility_cutoff_xid);
+		Assert(!TransactionIdIsValid(debug_cutoff) ||
+			   debug_cutoff == visibility_cutoff_xid);
 	}
 #endif
 
@@ -1729,7 +1734,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		ItemPointerData tmp;
 
 		vacrel->lpdead_item_pages++;
-		prunestate->has_lpdead_items = true;
+		has_lpdead_items = true;
 
 		ItemPointerSetBlockNumber(&tmp, blkno);
 
@@ -1754,7 +1759,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		 * Now that freezing has been finalized, unset all_visible.  It needs
 		 * to reflect the present state of things, as expected by our caller.
 		 */
-		prunestate->all_visible = false;
+		all_visible = false;
 	}
 
 	/* Finally, add page-local counts to whole-VACUUM counts */
@@ -1777,19 +1782,20 @@ lazy_scan_prune(LVRelState *vacrel,
 	if (hastup)
 		vacrel->nonempty_pages = blkno + 1;
 
-	Assert(!prunestate->all_visible || !prunestate->has_lpdead_items);
+	Assert(!all_visible || !has_lpdead_items);
 
 	/*
 	 * Handle setting visibility map bit based on information from the VM (as
-	 * of last lazy_scan_skip() call), and from prunestate
+	 * of last lazy_scan_skip() call), and from all_visible and all_frozen
+	 * variables
 	 */
-	if (!all_visible_according_to_vm && prunestate->all_visible)
+	if (!all_visible_according_to_vm && all_visible)
 	{
 		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
 
-		if (prunestate->all_frozen)
+		if (all_frozen)
 		{
-			Assert(!TransactionIdIsValid(prunestate->visibility_cutoff_xid));
+			Assert(!TransactionIdIsValid(visibility_cutoff_xid));
 			flags |= VISIBILITYMAP_ALL_FROZEN;
 		}
 
@@ -1809,7 +1815,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		PageSetAllVisible(page);
 		MarkBufferDirty(buf);
 		visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
-						  vmbuffer, prunestate->visibility_cutoff_xid,
+						  vmbuffer, visibility_cutoff_xid,
 						  flags);
 	}
 
@@ -1842,7 +1848,7 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
 	 * however.
 	 */
-	else if (prunestate->has_lpdead_items && PageIsAllVisible(page))
+	else if (has_lpdead_items && PageIsAllVisible(page))
 	{
 		elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
 			 vacrel->relname, blkno);
@@ -1855,16 +1861,16 @@ lazy_scan_prune(LVRelState *vacrel,
 	/*
 	 * If the all-visible page is all-frozen but not marked as such yet, mark
 	 * it as all-frozen.  Note that all_frozen is only valid if all_visible is
-	 * true, so we must check both prunestate fields.
+	 * true, so we must check both all_visible and all_frozen.
 	 */
-	else if (all_visible_according_to_vm && prunestate->all_visible &&
-			 prunestate->all_frozen &&
+	else if (all_visible_according_to_vm && all_visible &&
+			 all_frozen &&
 			 !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
 	{
 		/*
 		 * Avoid relying on all_visible_according_to_vm as a proxy for the
 		 * page-level PD_ALL_VISIBLE bit being set, since it might have become
-		 * stale -- even when all_visible is set in prunestate
+		 * stale -- even when all_visible is set
 		 */
 		if (!PageIsAllVisible(page))
 		{
@@ -1879,7 +1885,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		 * since a snapshotConflictHorizon sufficient to make everything safe
 		 * for REDO was logged when the page's tuples were frozen.
 		 */
-		Assert(!TransactionIdIsValid(prunestate->visibility_cutoff_xid));
+		Assert(!TransactionIdIsValid(visibility_cutoff_xid));
 		visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
 						  vmbuffer, InvalidTransactionId,
 						  VISIBILITYMAP_ALL_VISIBLE |
-- 
2.37.2

v3-0001-Indicate-rel-truncation-unsafe-in-lazy_scan-no-pr.patch (text/x-patch)
From a575569076a7fb138e0a87463d7022d1e1710f25 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 13 Nov 2023 16:47:08 -0500
Subject: [PATCH v3 1/6] Indicate rel truncation unsafe in lazy_scan[no]prune

Both lazy_scan_prune() and lazy_scan_noprune() must determine whether or
not there are tuples on the page making rel truncation unsafe.
LVRelState->nonempty_pages is updated to reflect this. Previously, both
functions set an output parameter or output parameter member, hastup, to
indicate that nonempty_pages should be updated to reflect the latest
non-removable page. There doesn't seem to be any reason to wait until
lazy_scan_[no]prune() returns to update nonempty_pages. Plenty of other
counters in the LVRelState are updated in lazy_scan_[no]prune().
This allows us to get rid of the output parameter hastup.
---
 src/backend/access/heap/vacuumlazy.c | 50 +++++++++++++++-------------
 1 file changed, 26 insertions(+), 24 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index abbba8947fa..c1a63bf6125 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -217,7 +217,6 @@ typedef struct LVRelState
  */
 typedef struct LVPagePruneState
 {
-	bool		hastup;			/* Page prevents rel truncation? */
 	bool		has_lpdead_items;	/* includes existing LP_DEAD items */
 
 	/*
@@ -253,7 +252,7 @@ static void lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							LVPagePruneState *prunestate);
 static bool lazy_scan_noprune(LVRelState *vacrel, Buffer buf,
 							  BlockNumber blkno, Page page,
-							  bool *hastup, bool *recordfreespace);
+							  bool *recordfreespace);
 static void lazy_vacuum(LVRelState *vacrel);
 static bool lazy_vacuum_all_indexes(LVRelState *vacrel);
 static void lazy_vacuum_heap_rel(LVRelState *vacrel);
@@ -959,8 +958,7 @@ lazy_scan_heap(LVRelState *vacrel)
 		page = BufferGetPage(buf);
 		if (!ConditionalLockBufferForCleanup(buf))
 		{
-			bool		hastup,
-						recordfreespace;
+			bool		recordfreespace;
 
 			LockBuffer(buf, BUFFER_LOCK_SHARE);
 
@@ -972,20 +970,21 @@ lazy_scan_heap(LVRelState *vacrel)
 				continue;
 			}
 
-			/* Collect LP_DEAD items in dead_items array, count tuples */
-			if (lazy_scan_noprune(vacrel, buf, blkno, page, &hastup,
+			/*
+			 * Collect LP_DEAD items in dead_items array, count tuples,
+			 * determine if rel truncation is safe
+			 */
+			if (lazy_scan_noprune(vacrel, buf, blkno, page,
 								  &recordfreespace))
 			{
 				Size		freespace = 0;
 
 				/*
 				 * Processed page successfully (without cleanup lock) -- just
-				 * need to perform rel truncation and FSM steps, much like the
-				 * lazy_scan_prune case.  Don't bother trying to match its
-				 * visibility map setting steps, though.
+				 * need to update the FSM, much like the lazy_scan_prune case.
+				 * Don't bother trying to match its visibility map setting
+				 * steps, though.
 				 */
-				if (hastup)
-					vacrel->nonempty_pages = blkno + 1;
 				if (recordfreespace)
 					freespace = PageGetHeapFreeSpace(page);
 				UnlockReleaseBuffer(buf);
@@ -1023,10 +1022,6 @@ lazy_scan_heap(LVRelState *vacrel)
 
 		Assert(!prunestate.all_visible || !prunestate.has_lpdead_items);
 
-		/* Remember the location of the last page with nonremovable tuples */
-		if (prunestate.hastup)
-			vacrel->nonempty_pages = blkno + 1;
-
 		if (vacrel->nindexes == 0)
 		{
 			/*
@@ -1555,6 +1550,7 @@ lazy_scan_prune(LVRelState *vacrel,
 				live_tuples,
 				recently_dead_tuples;
 	HeapPageFreeze pagefrz;
+	bool		hastup = false;
 	int64		fpi_before = pgWalUsage.wal_fpi;
 	OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
 	HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
@@ -1593,7 +1589,6 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * Now scan the page to collect LP_DEAD items and check for tuples
 	 * requiring freezing among remaining tuples with storage
 	 */
-	prunestate->hastup = false;
 	prunestate->has_lpdead_items = false;
 	prunestate->all_visible = true;
 	prunestate->all_frozen = true;
@@ -1620,7 +1615,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		if (ItemIdIsRedirected(itemid))
 		{
 			/* page makes rel truncation unsafe */
-			prunestate->hastup = true;
+			hastup = true;
 			continue;
 		}
 
@@ -1750,7 +1745,7 @@ lazy_scan_prune(LVRelState *vacrel,
 				break;
 		}
 
-		prunestate->hastup = true;	/* page makes rel truncation unsafe */
+		hastup = true;			/* page makes rel truncation unsafe */
 
 		/* Tuple with storage -- consider need to freeze */
 		if (heap_prepare_freeze_tuple(htup, &vacrel->cutoffs, &pagefrz,
@@ -1918,6 +1913,10 @@ lazy_scan_prune(LVRelState *vacrel,
 	vacrel->lpdead_items += lpdead_items;
 	vacrel->live_tuples += live_tuples;
 	vacrel->recently_dead_tuples += recently_dead_tuples;
+
+	/* Remember the location of the last page with nonremovable tuples */
+	if (hastup)
+		vacrel->nonempty_pages = blkno + 1;
 }
 
 /*
@@ -1935,7 +1934,6 @@ lazy_scan_prune(LVRelState *vacrel,
  * one or more tuples on the page.  We always return true for non-aggressive
  * callers.
  *
- * See lazy_scan_prune for an explanation of hastup return flag.
  * recordfreespace flag instructs caller on whether or not it should do
  * generic FSM processing for page.
  */
@@ -1944,7 +1942,6 @@ lazy_scan_noprune(LVRelState *vacrel,
 				  Buffer buf,
 				  BlockNumber blkno,
 				  Page page,
-				  bool *hastup,
 				  bool *recordfreespace)
 {
 	OffsetNumber offnum,
@@ -1953,6 +1950,7 @@ lazy_scan_noprune(LVRelState *vacrel,
 				live_tuples,
 				recently_dead_tuples,
 				missed_dead_tuples;
+	bool		hastup;
 	HeapTupleHeader tupleheader;
 	TransactionId NoFreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
 	MultiXactId NoFreezePageRelminMxid = vacrel->NewRelminMxid;
@@ -1960,7 +1958,7 @@ lazy_scan_noprune(LVRelState *vacrel,
 
 	Assert(BufferGetBlockNumber(buf) == blkno);
 
-	*hastup = false;			/* for now */
+	hastup = false;				/* for now */
 	*recordfreespace = false;	/* for now */
 
 	lpdead_items = 0;
@@ -1984,7 +1982,7 @@ lazy_scan_noprune(LVRelState *vacrel,
 
 		if (ItemIdIsRedirected(itemid))
 		{
-			*hastup = true;
+			hastup = true;
 			continue;
 		}
 
@@ -1998,7 +1996,7 @@ lazy_scan_noprune(LVRelState *vacrel,
 			continue;
 		}
 
-		*hastup = true;			/* page prevents rel truncation */
+		hastup = true;			/* page prevents rel truncation */
 		tupleheader = (HeapTupleHeader) PageGetItem(page, itemid);
 		if (heap_tuple_should_freeze(tupleheader, &vacrel->cutoffs,
 									 &NoFreezePageRelfrozenXid,
@@ -2100,7 +2098,7 @@ lazy_scan_noprune(LVRelState *vacrel,
 			 * but it beats having to maintain specialized heap vacuuming code
 			 * forever, for vanishingly little benefit.)
 			 */
-			*hastup = true;
+			hastup = true;
 			missed_dead_tuples += lpdead_items;
 		}
 
@@ -2156,6 +2154,10 @@ lazy_scan_noprune(LVRelState *vacrel,
 	if (missed_dead_tuples > 0)
 		vacrel->missed_dead_pages++;
 
+	/* rel truncation is unsafe */
+	if (hastup)
+		vacrel->nonempty_pages = blkno + 1;
+
 	/* Caller won't need to call lazy_scan_prune with same page */
 	return true;
 }
-- 
2.37.2

v3-0003-Update-VM-in-lazy_scan_prune.patch (text/x-patch)
From c1b4d81a85a0a25b65ffa707ef6bd7e9258a3a8a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Fri, 5 Jan 2024 11:12:58 -0500
Subject: [PATCH v3 3/6] Update VM in lazy_scan_prune()

The visibility map is updated after lazy_scan_prune() returns. Most of
the output parameters of lazy_scan_prune() are used to update the
visibility map. Moving that code into lazy_scan_prune() simplifies
lazy_scan_heap() and requires less communication between the two.
---
 src/backend/access/heap/vacuumlazy.c | 225 ++++++++++++++-------------
 1 file changed, 115 insertions(+), 110 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 2f6c1b1ba51..1bc7c56396f 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -251,6 +251,7 @@ static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   bool sharelock, Buffer vmbuffer);
 static void lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
+							Buffer vmbuffer, bool all_visible_according_to_vm,
 							LVPagePruneState *prunestate);
 static bool lazy_scan_noprune(LVRelState *vacrel, Buffer buf,
 							  BlockNumber blkno, Page page,
@@ -1023,117 +1024,10 @@ lazy_scan_heap(LVRelState *vacrel)
 		 * were pruned some time earlier.  Also considers freezing XIDs in the
 		 * tuple headers of remaining items with storage.
 		 */
-		lazy_scan_prune(vacrel, buf, blkno, page, &prunestate);
+		lazy_scan_prune(vacrel, buf, blkno, page,
+						vmbuffer, all_visible_according_to_vm,
+						&prunestate);
 
-		Assert(!prunestate.all_visible || !prunestate.has_lpdead_items);
-
-		/*
-		 * Handle setting visibility map bit based on information from the VM
-		 * (as of last lazy_scan_skip() call), and from prunestate
-		 */
-		if (!all_visible_according_to_vm && prunestate.all_visible)
-		{
-			uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
-
-			if (prunestate.all_frozen)
-			{
-				Assert(!TransactionIdIsValid(prunestate.visibility_cutoff_xid));
-				flags |= VISIBILITYMAP_ALL_FROZEN;
-			}
-
-			/*
-			 * It should never be the case that the visibility map page is set
-			 * while the page-level bit is clear, but the reverse is allowed
-			 * (if checksums are not enabled).  Regardless, set both bits so
-			 * that we get back in sync.
-			 *
-			 * NB: If the heap page is all-visible but the VM bit is not set,
-			 * we don't need to dirty the heap page.  However, if checksums
-			 * are enabled, we do need to make sure that the heap page is
-			 * dirtied before passing it to visibilitymap_set(), because it
-			 * may be logged.  Given that this situation should only happen in
-			 * rare cases after a crash, it is not worth optimizing.
-			 */
-			PageSetAllVisible(page);
-			MarkBufferDirty(buf);
-			visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
-							  vmbuffer, prunestate.visibility_cutoff_xid,
-							  flags);
-		}
-
-		/*
-		 * As of PostgreSQL 9.2, the visibility map bit should never be set if
-		 * the page-level bit is clear.  However, it's possible that the bit
-		 * got cleared after lazy_scan_skip() was called, so we must recheck
-		 * with buffer lock before concluding that the VM is corrupt.
-		 */
-		else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
-				 visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
-		{
-			elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-				 vacrel->relname, blkno);
-			visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-								VISIBILITYMAP_VALID_BITS);
-		}
-
-		/*
-		 * It's possible for the value returned by
-		 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-		 * wrong for us to see tuples that appear to not be visible to
-		 * everyone yet, while PD_ALL_VISIBLE is already set. The real safe
-		 * xmin value never moves backwards, but
-		 * GetOldestNonRemovableTransactionId() is conservative and sometimes
-		 * returns a value that's unnecessarily small, so if we see that
-		 * contradiction it just means that the tuples that we think are not
-		 * visible to everyone yet actually are, and the PD_ALL_VISIBLE flag
-		 * is correct.
-		 *
-		 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE
-		 * set, however.
-		 */
-		else if (prunestate.has_lpdead_items && PageIsAllVisible(page))
-		{
-			elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-				 vacrel->relname, blkno);
-			PageClearAllVisible(page);
-			MarkBufferDirty(buf);
-			visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-								VISIBILITYMAP_VALID_BITS);
-		}
-
-		/*
-		 * If the all-visible page is all-frozen but not marked as such yet,
-		 * mark it as all-frozen.  Note that all_frozen is only valid if
-		 * all_visible is true, so we must check both prunestate fields.
-		 */
-		else if (all_visible_according_to_vm && prunestate.all_visible &&
-				 prunestate.all_frozen &&
-				 !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
-		{
-			/*
-			 * Avoid relying on all_visible_according_to_vm as a proxy for the
-			 * page-level PD_ALL_VISIBLE bit being set, since it might have
-			 * become stale -- even when all_visible is set in prunestate
-			 */
-			if (!PageIsAllVisible(page))
-			{
-				PageSetAllVisible(page);
-				MarkBufferDirty(buf);
-			}
-
-			/*
-			 * Set the page all-frozen (and all-visible) in the VM.
-			 *
-			 * We can pass InvalidTransactionId as our visibility_cutoff_xid,
-			 * since a snapshotConflictHorizon sufficient to make everything
-			 * safe for REDO was logged when the page's tuples were frozen.
-			 */
-			Assert(!TransactionIdIsValid(prunestate.visibility_cutoff_xid));
-			visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
-							  vmbuffer, InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE |
-							  VISIBILITYMAP_ALL_FROZEN);
-		}
 
 		/*
 		 * Final steps for block: drop cleanup lock, record free space in the
@@ -1487,6 +1381,8 @@ lazy_scan_prune(LVRelState *vacrel,
 				Buffer buf,
 				BlockNumber blkno,
 				Page page,
+				Buffer vmbuffer,
+				bool all_visible_according_to_vm,
 				LVPagePruneState *prunestate)
 {
 	Relation	rel = vacrel->rel;
@@ -1880,6 +1776,115 @@ lazy_scan_prune(LVRelState *vacrel,
 	/* Remember the location of the last page with nonremovable tuples */
 	if (hastup)
 		vacrel->nonempty_pages = blkno + 1;
+
+	Assert(!prunestate->all_visible || !prunestate->has_lpdead_items);
+
+	/*
+	 * Handle setting visibility map bit based on information from the VM (as
+	 * of last lazy_scan_skip() call), and from prunestate
+	 */
+	if (!all_visible_according_to_vm && prunestate->all_visible)
+	{
+		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
+
+		if (prunestate->all_frozen)
+		{
+			Assert(!TransactionIdIsValid(prunestate->visibility_cutoff_xid));
+			flags |= VISIBILITYMAP_ALL_FROZEN;
+		}
+
+		/*
+		 * It should never be the case that the visibility map page is set
+		 * while the page-level bit is clear, but the reverse is allowed (if
+		 * checksums are not enabled).  Regardless, set both bits so that we
+		 * get back in sync.
+		 *
+		 * NB: If the heap page is all-visible but the VM bit is not set, we
+		 * don't need to dirty the heap page.  However, if checksums are
+		 * enabled, we do need to make sure that the heap page is dirtied
+		 * before passing it to visibilitymap_set(), because it may be logged.
+		 * Given that this situation should only happen in rare cases after a
+		 * crash, it is not worth optimizing.
+		 */
+		PageSetAllVisible(page);
+		MarkBufferDirty(buf);
+		visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
+						  vmbuffer, prunestate->visibility_cutoff_xid,
+						  flags);
+	}
+
+	/*
+	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+	 * page-level bit is clear.  However, it's possible that the bit got
+	 * cleared after lazy_scan_skip() was called, so we must recheck with
+	 * buffer lock before concluding that the VM is corrupt.
+	 */
+	else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
+			 visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
+	{
+		elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+			 vacrel->relname, blkno);
+		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+	}
+
+	/*
+	 * It's possible for the value returned by
+	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+	 * wrong for us to see tuples that appear to not be visible to everyone
+	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
+	 * conservative and sometimes returns a value that's unnecessarily small,
+	 * so if we see that contradiction it just means that the tuples that we
+	 * think are not visible to everyone yet actually are, and the
+	 * PD_ALL_VISIBLE flag is correct.
+	 *
+	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+	 * however.
+	 */
+	else if (prunestate->has_lpdead_items && PageIsAllVisible(page))
+	{
+		elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+			 vacrel->relname, blkno);
+		PageClearAllVisible(page);
+		MarkBufferDirty(buf);
+		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+	}
+
+	/*
+	 * If the all-visible page is all-frozen but not marked as such yet, mark
+	 * it as all-frozen.  Note that all_frozen is only valid if all_visible is
+	 * true, so we must check both prunestate fields.
+	 */
+	else if (all_visible_according_to_vm && prunestate->all_visible &&
+			 prunestate->all_frozen &&
+			 !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
+	{
+		/*
+		 * Avoid relying on all_visible_according_to_vm as a proxy for the
+		 * page-level PD_ALL_VISIBLE bit being set, since it might have become
+		 * stale -- even when all_visible is set in prunestate
+		 */
+		if (!PageIsAllVisible(page))
+		{
+			PageSetAllVisible(page);
+			MarkBufferDirty(buf);
+		}
+
+		/*
+		 * Set the page all-frozen (and all-visible) in the VM.
+		 *
+		 * We can pass InvalidTransactionId as our visibility_cutoff_xid,
+		 * since a snapshotConflictHorizon sufficient to make everything safe
+		 * for REDO was logged when the page's tuples were frozen.
+		 */
+		Assert(!TransactionIdIsValid(prunestate->visibility_cutoff_xid));
+		visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
+						  vmbuffer, InvalidTransactionId,
+						  VISIBILITYMAP_ALL_VISIBLE |
+						  VISIBILITYMAP_ALL_FROZEN);
+	}
 }
 
 /*
-- 
2.37.2

v3-0002-Set-would-be-dead-items-LP_UNUSED-while-pruning.patch (text/x-patch)
From 80056f7d05a7bf32f017fb2559f026973ed9d93c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 13 Nov 2023 14:10:05 -0500
Subject: [PATCH v3 2/6] Set would-be dead items LP_UNUSED while pruning

If there are no indexes on a relation, items can be marked LP_UNUSED
instead of LP_DEAD during lazy_scan_prune(). This avoids a separate
invocation of lazy_vacuum_heap_page() and saves a vacuum WAL record.

To accomplish this, pass lazy_scan_prune() a new parameter, no_indexes,
which indicates that dead line pointers should be set to LP_UNUSED
during pruning, allowing earlier reaping of tuples.

Because we don't update the freespace map until after dropping the lock
on the buffer and we need the lock while we update the visibility map,
save our intent to update the freespace map in
LVPagePruneState.recordfreespace.

Discussion: https://postgr.es/m/CAAKRu_bgvb_k0gKOXWzNKWHt560R0smrGe3E8zewKPs8fiMKkw%40mail.gmail.com
---
 src/backend/access/heap/heapam.c     |   9 +-
 src/backend/access/heap/pruneheap.c  |  77 +++++++++++---
 src/backend/access/heap/vacuumlazy.c | 146 ++++++++++-----------------
 src/include/access/heapam.h          |   3 +-
 4 files changed, 125 insertions(+), 110 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 707460a5364..27d87b619f9 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -8810,8 +8810,13 @@ heap_xlog_prune(XLogReaderState *record)
 		nunused = (end - nowunused);
 		Assert(nunused >= 0);
 
-		/* Update all line pointers per the record, and repair fragmentation */
-		heap_page_prune_execute(buffer,
+		/*
+		 * Update all line pointers per the record, and repair fragmentation.
+		 * We always pass no_indexes as true, because we don't know whether or
+		 * not this option was used when pruning. This reduces the validation
+		 * done on replay in an assert build.
+		 */
+		heap_page_prune_execute(buffer, true,
 								redirected, nredirected,
 								nowdead, ndead,
 								nowunused, nunused);
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 3e0a1a260e6..d3f75a29ccc 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -35,6 +35,8 @@ typedef struct
 
 	/* tuple visibility test, initialized for the relation */
 	GlobalVisState *vistest;
+	/* whether or not dead items can be set LP_UNUSED during pruning */
+	bool		no_indexes;
 
 	TransactionId new_prune_xid;	/* new prune hint value for page */
 	TransactionId snapshotConflictHorizon;	/* latest xid removed */
@@ -148,7 +150,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 		{
 			PruneResult presult;
 
-			heap_page_prune(relation, buffer, vistest, &presult, NULL);
+			heap_page_prune(relation, buffer, vistest, false,
+							&presult, NULL);
 
 			/*
 			 * Report the number of tuples reclaimed to pgstats.  This is
@@ -193,6 +196,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
  * (see heap_prune_satisfies_vacuum and
  * HeapTupleSatisfiesVacuum).
  *
+ * no_indexes indicates whether or not dead items can be set LP_UNUSED during
+ * pruning.
+ *
  * off_loc is the offset location required by the caller to use in error
  * callback.
  *
@@ -203,6 +209,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 void
 heap_page_prune(Relation relation, Buffer buffer,
 				GlobalVisState *vistest,
+				bool no_indexes,
 				PruneResult *presult,
 				OffsetNumber *off_loc)
 {
@@ -227,6 +234,7 @@ heap_page_prune(Relation relation, Buffer buffer,
 	prstate.new_prune_xid = InvalidTransactionId;
 	prstate.rel = relation;
 	prstate.vistest = vistest;
+	prstate.no_indexes = no_indexes;
 	prstate.snapshotConflictHorizon = InvalidTransactionId;
 	prstate.nredirected = prstate.ndead = prstate.nunused = 0;
 	memset(prstate.marked, 0, sizeof(prstate.marked));
@@ -306,9 +314,9 @@ heap_page_prune(Relation relation, Buffer buffer,
 		if (off_loc)
 			*off_loc = offnum;
 
-		/* Nothing to do if slot is empty or already dead */
+		/* Nothing to do if slot is empty */
 		itemid = PageGetItemId(page, offnum);
-		if (!ItemIdIsUsed(itemid) || ItemIdIsDead(itemid))
+		if (!ItemIdIsUsed(itemid))
 			continue;
 
 		/* Process this item or chain of items */
@@ -330,7 +338,7 @@ heap_page_prune(Relation relation, Buffer buffer,
 		 * Apply the planned item changes, then repair page fragmentation, and
 		 * update the page's hint bit about whether it has free line pointers.
 		 */
-		heap_page_prune_execute(buffer,
+		heap_page_prune_execute(buffer, prstate.no_indexes,
 								prstate.redirected, prstate.nredirected,
 								prstate.nowdead, prstate.ndead,
 								prstate.nowunused, prstate.nunused);
@@ -581,7 +589,17 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
 		 * function.)
 		 */
 		if (ItemIdIsDead(lp))
+		{
+			/*
+			 * If the relation has no indexes, we can set dead line pointers
+			 * LP_UNUSED now. We don't increment ndeleted here since the LP
+			 * was already marked dead.
+			 */
+			if (prstate->no_indexes)
+				heap_prune_record_unused(prstate, offnum);
+
 			break;
+		}
 
 		Assert(ItemIdIsNormal(lp));
 		htup = (HeapTupleHeader) PageGetItem(dp, lp);
@@ -715,7 +733,17 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
 		 * redirect the root to the correct chain member.
 		 */
 		if (i >= nchain)
-			heap_prune_record_dead(prstate, rootoffnum);
+		{
+			/*
+			 * If the relation has no indexes, we can remove dead tuples
+			 * during pruning instead of marking their line pointers dead. Set
+			 * this tuple's line pointer LP_UNUSED.
+			 */
+			if (prstate->no_indexes)
+				heap_prune_record_unused(prstate, rootoffnum);
+			else
+				heap_prune_record_dead(prstate, rootoffnum);
+		}
 		else
 			heap_prune_record_redirect(prstate, rootoffnum, chainitems[i]);
 	}
@@ -726,9 +754,12 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
 		 * item.  This can happen if the loop in heap_page_prune caused us to
 		 * visit the dead successor of a redirect item before visiting the
 		 * redirect item.  We can clean up by setting the redirect item to
-		 * DEAD state.
+		 * DEAD state. If no_indexes is true, we can set it LP_UNUSED now.
 		 */
-		heap_prune_record_dead(prstate, rootoffnum);
+		if (prstate->no_indexes)
+			heap_prune_record_unused(prstate, rootoffnum);
+		else
+			heap_prune_record_dead(prstate, rootoffnum);
 	}
 
 	return ndeleted;
@@ -792,7 +823,7 @@ heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum)
  * buffer.
  */
 void
-heap_page_prune_execute(Buffer buffer,
+heap_page_prune_execute(Buffer buffer, bool no_indexes,
 						OffsetNumber *redirected, int nredirected,
 						OffsetNumber *nowdead, int ndead,
 						OffsetNumber *nowunused, int nunused)
@@ -902,14 +933,28 @@ heap_page_prune_execute(Buffer buffer,
 
 #ifdef USE_ASSERT_CHECKING
 
-		/*
-		 * Only heap-only tuples can become LP_UNUSED during pruning.  They
-		 * don't need to be left in place as LP_DEAD items until VACUUM gets
-		 * around to doing index vacuuming.
-		 */
-		Assert(ItemIdHasStorage(lp) && ItemIdIsNormal(lp));
-		htup = (HeapTupleHeader) PageGetItem(page, lp);
-		Assert(HeapTupleHeaderIsHeapOnly(htup));
+		if (no_indexes)
+		{
+			/*
+			 * If the relation has no indexes, we may set any of LP_NORMAL,
+			 * LP_REDIRECT, or LP_DEAD items to LP_UNUSED during pruning. We
+			 * can't check much here except that, if the item is LP_NORMAL, it
+			 * should have storage before it is set LP_UNUSED.
+			 */
+			Assert(!ItemIdIsNormal(lp) || ItemIdHasStorage(lp));
+		}
+		else
+		{
+			/*
+			 * If the relation has indexes, only heap-only tuples can become
+			 * LP_UNUSED during pruning. They don't need to be left in place
+			 * as LP_DEAD items until VACUUM gets around to doing index
+			 * vacuuming.
+			 */
+			Assert(ItemIdHasStorage(lp) && ItemIdIsNormal(lp));
+			htup = (HeapTupleHeader) PageGetItem(page, lp);
+			Assert(HeapTupleHeaderIsHeapOnly(htup));
+		}
 #endif
 
 		ItemIdSetUnused(lp);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index c1a63bf6125..2f6c1b1ba51 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -218,6 +218,8 @@ typedef struct LVRelState
 typedef struct LVPagePruneState
 {
 	bool		has_lpdead_items;	/* includes existing LP_DEAD items */
+	/* whether or not caller should do FSM processing for block */
+	bool		recordfreespace;
 
 	/*
 	 * State describes the proper VM bit states to set for the page following
@@ -829,6 +831,7 @@ lazy_scan_heap(LVRelState *vacrel)
 				next_fsm_block_to_vacuum = 0;
 	VacDeadItems *dead_items = vacrel->dead_items;
 	Buffer		vmbuffer = InvalidBuffer;
+	int			tuples_already_deleted;
 	bool		next_unskippable_allvis,
 				skipping_current_range;
 	const int	initprog_index[] = {
@@ -1009,6 +1012,8 @@ lazy_scan_heap(LVRelState *vacrel)
 			continue;
 		}
 
+		tuples_already_deleted = vacrel->tuples_deleted;
+
 		/*
 		 * Prune, freeze, and count tuples.
 		 *
@@ -1022,69 +1027,6 @@ lazy_scan_heap(LVRelState *vacrel)
 
 		Assert(!prunestate.all_visible || !prunestate.has_lpdead_items);
 
-		if (vacrel->nindexes == 0)
-		{
-			/*
-			 * Consider the need to do page-at-a-time heap vacuuming when
-			 * using the one-pass strategy now.
-			 *
-			 * The one-pass strategy will never call lazy_vacuum().  The steps
-			 * performed here can be thought of as the one-pass equivalent of
-			 * a call to lazy_vacuum().
-			 */
-			if (prunestate.has_lpdead_items)
-			{
-				Size		freespace;
-
-				lazy_vacuum_heap_page(vacrel, blkno, buf, 0, vmbuffer);
-
-				/* Forget the LP_DEAD items that we just vacuumed */
-				dead_items->num_items = 0;
-
-				/*
-				 * Now perform FSM processing for blkno, and move on to next
-				 * page.
-				 *
-				 * Our call to lazy_vacuum_heap_page() will have considered if
-				 * it's possible to set all_visible/all_frozen independently
-				 * of lazy_scan_prune().  Note that prunestate was invalidated
-				 * by lazy_vacuum_heap_page() call.
-				 */
-				freespace = PageGetHeapFreeSpace(page);
-
-				UnlockReleaseBuffer(buf);
-				RecordPageWithFreeSpace(vacrel->rel, blkno, freespace);
-
-				/*
-				 * Periodically perform FSM vacuuming to make newly-freed
-				 * space visible on upper FSM pages. FreeSpaceMapVacuumRange()
-				 * vacuums the portion of the freespace map covering heap
-				 * pages from start to end - 1. Include the block we just
-				 * vacuumed by passing it blkno + 1. Overflow isn't an issue
-				 * because MaxBlockNumber + 1 is InvalidBlockNumber which
-				 * causes FreeSpaceMapVacuumRange() to vacuum freespace map
-				 * pages covering the remainder of the relation.
-				 */
-				if (blkno - next_fsm_block_to_vacuum >= VACUUM_FSM_EVERY_PAGES)
-				{
-					FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum,
-											blkno + 1);
-					next_fsm_block_to_vacuum = blkno + 1;
-				}
-
-				continue;
-			}
-
-			/*
-			 * There was no call to lazy_vacuum_heap_page() because pruning
-			 * didn't encounter/create any LP_DEAD items that needed to be
-			 * vacuumed.  Prune state has not been invalidated, so proceed
-			 * with prunestate-driven visibility map and FSM steps (just like
-			 * the two-pass strategy).
-			 */
-			Assert(dead_items->num_items == 0);
-		}
-
 		/*
 		 * Handle setting visibility map bit based on information from the VM
 		 * (as of last lazy_scan_skip() call), and from prunestate
@@ -1195,38 +1137,45 @@ lazy_scan_heap(LVRelState *vacrel)
 
 		/*
 		 * Final steps for block: drop cleanup lock, record free space in the
-		 * FSM
+		 * FSM.
+		 *
+		 * If we will likely do index vacuuming, wait until
+		 * lazy_vacuum_heap_rel() to save free space. This doesn't just save
+		 * us some cycles; it also allows us to record any additional free
+		 * space that lazy_vacuum_heap_page() will make available in cases
+		 * where it's possible to truncate the page's line pointer array.
+		 *
+		 * Note: It's not in fact 100% certain that we really will call
+		 * lazy_vacuum_heap_rel() -- lazy_vacuum() might yet opt to skip index
+		 * vacuuming (and so must skip heap vacuuming).  This is deemed okay
+		 * because it only happens in emergencies, or when there is very
+		 * little free space anyway. (Besides, we start recording free space
+		 * in the FSM once index vacuuming has been abandoned.)
 		 */
-		if (prunestate.has_lpdead_items && vacrel->do_index_vacuuming)
-		{
-			/*
-			 * Wait until lazy_vacuum_heap_rel() to save free space.  This
-			 * doesn't just save us some cycles; it also allows us to record
-			 * any additional free space that lazy_vacuum_heap_page() will
-			 * make available in cases where it's possible to truncate the
-			 * page's line pointer array.
-			 *
-			 * Note: It's not in fact 100% certain that we really will call
-			 * lazy_vacuum_heap_rel() -- lazy_vacuum() might yet opt to skip
-			 * index vacuuming (and so must skip heap vacuuming).  This is
-			 * deemed okay because it only happens in emergencies, or when
-			 * there is very little free space anyway. (Besides, we start
-			 * recording free space in the FSM once index vacuuming has been
-			 * abandoned.)
-			 *
-			 * Note: The one-pass (no indexes) case is only supposed to make
-			 * it this far when there were no LP_DEAD items during pruning.
-			 */
-			Assert(vacrel->nindexes > 0);
-			UnlockReleaseBuffer(buf);
-		}
-		else
+		if (prunestate.recordfreespace)
 		{
 			Size		freespace = PageGetHeapFreeSpace(page);
 
 			UnlockReleaseBuffer(buf);
 			RecordPageWithFreeSpace(vacrel->rel, blkno, freespace);
 		}
+		else
+			UnlockReleaseBuffer(buf);
+
+		/*
+		 * Periodically perform FSM vacuuming to make newly-freed space
+		 * visible on upper FSM pages. This is done after vacuuming if the
+		 * table has indexes.
+		 */
+		if (vacrel->nindexes == 0 &&
+			vacrel->tuples_deleted > tuples_already_deleted &&
+			(blkno - next_fsm_block_to_vacuum >= VACUUM_FSM_EVERY_PAGES))
+		{
+			FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum,
+									blkno);
+			next_fsm_block_to_vacuum = blkno;
+		}
+
 	}
 
 	vacrel->blkno = InvalidBlockNumber;
@@ -1550,6 +1499,7 @@ lazy_scan_prune(LVRelState *vacrel,
 				live_tuples,
 				recently_dead_tuples;
 	HeapPageFreeze pagefrz;
+	bool		no_indexes;
 	bool		hastup = false;
 	int64		fpi_before = pgWalUsage.wal_fpi;
 	OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
@@ -1575,6 +1525,8 @@ lazy_scan_prune(LVRelState *vacrel,
 	live_tuples = 0;
 	recently_dead_tuples = 0;
 
+	no_indexes = vacrel->nindexes == 0;
+
 	/*
 	 * Prune all HOT-update chains in this page.
 	 *
@@ -1583,7 +1535,8 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * lpdead_items's final value can be thought of as the number of tuples
 	 * that were deleted from indexes.
 	 */
-	heap_page_prune(rel, buf, vacrel->vistest, &presult, &vacrel->offnum);
+	heap_page_prune(rel, buf, vacrel->vistest, no_indexes,
+					&presult, &vacrel->offnum);
 
 	/*
 	 * Now scan the page to collect LP_DEAD items and check for tuples
@@ -1593,6 +1546,7 @@ lazy_scan_prune(LVRelState *vacrel,
 	prunestate->all_visible = true;
 	prunestate->all_frozen = true;
 	prunestate->visibility_cutoff_xid = InvalidTransactionId;
+	prunestate->recordfreespace = false;
 
 	for (offnum = FirstOffsetNumber;
 		 offnum <= maxoff;
@@ -1914,6 +1868,15 @@ lazy_scan_prune(LVRelState *vacrel,
 	vacrel->live_tuples += live_tuples;
 	vacrel->recently_dead_tuples += recently_dead_tuples;
 
+	/*
+	 * If we will not do index vacuuming, either because we have no indexes,
+	 * because there is nothing to vacuum, or because do_index_vacuuming is
+	 * false, make sure we update the freespace map.
+	 */
+	if (vacrel->nindexes == 0 ||
+		!vacrel->do_index_vacuuming || lpdead_items == 0)
+		prunestate->recordfreespace = true;
+
 	/* Remember the location of the last page with nonremovable tuples */
 	if (hastup)
 		vacrel->nonempty_pages = blkno + 1;
@@ -1935,7 +1898,8 @@ lazy_scan_prune(LVRelState *vacrel,
  * callers.
  *
  * recordfreespace flag instructs caller on whether or not it should do
- * generic FSM processing for page.
+ * generic FSM processing for page. vacrel is updated with page-level counts
+ * and to indicate whether or not rel truncation is safe.
  */
 static bool
 lazy_scan_noprune(LVRelState *vacrel,
@@ -2518,7 +2482,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	bool		all_frozen;
 	LVSavedErrInfo saved_err_info;
 
-	Assert(vacrel->nindexes == 0 || vacrel->do_index_vacuuming);
+	Assert(vacrel->do_index_vacuuming);
 
 	pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_VACUUMED, blkno);
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 932ec0d6f2b..867ec36cb64 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -320,9 +320,10 @@ struct GlobalVisState;
 extern void heap_page_prune_opt(Relation relation, Buffer buffer);
 extern void heap_page_prune(Relation relation, Buffer buffer,
 							struct GlobalVisState *vistest,
+							bool no_indexes,
 							PruneResult *presult,
 							OffsetNumber *off_loc);
-extern void heap_page_prune_execute(Buffer buffer,
+extern void heap_page_prune_execute(Buffer buffer, bool no_indexes,
 									OffsetNumber *redirected, int nredirected,
 									OffsetNumber *nowdead, int ndead,
 									OffsetNumber *nowunused, int nunused);
-- 
2.37.2

v3-0005-Remove-unneeded-struct-LVPagePruneState.patch (text/x-patch)
From f4c0cf21f068e3633ec9dbd183b68aad8fd26db7 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Fri, 5 Jan 2024 11:56:37 -0500
Subject: [PATCH v3 5/6] Remove unneeded struct LVPagePruneState

recordfreespace is the only needed output parameter for
lazy_scan_prune(), so LVPagePruneState is no longer needed. This also
aligns lazy_scan_prune() with lazy_scan_noprune() which has
recordfreespace as an output parameter.
---
 src/backend/access/heap/vacuumlazy.c | 25 +++++++------------------
 src/tools/pgindent/typedefs.list     |  1 -
 2 files changed, 7 insertions(+), 19 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index c0266eabb44..86942ca6004 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -212,15 +212,6 @@ typedef struct LVRelState
 	int64		missed_dead_tuples; /* # removable, but not removed */
 } LVRelState;
 
-/*
- * State returned by lazy_scan_prune()
- */
-typedef struct LVPagePruneState
-{
-	/* whether or not caller should do FSM processing for block */
-	bool		recordfreespace;
-} LVPagePruneState;
-
 /* Struct for saving and restoring vacuum error information. */
 typedef struct LVSavedErrInfo
 {
@@ -242,7 +233,7 @@ static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 static void lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
 							Buffer vmbuffer, bool all_visible_according_to_vm,
-							LVPagePruneState *prunestate);
+							bool *recordfreespace);
 static bool lazy_scan_noprune(LVRelState *vacrel, Buffer buf,
 							  BlockNumber blkno, Page page,
 							  bool *recordfreespace);
@@ -825,6 +816,7 @@ lazy_scan_heap(LVRelState *vacrel)
 	int			tuples_already_deleted;
 	bool		next_unskippable_allvis,
 				skipping_current_range;
+	bool		recordfreespace;
 	const int	initprog_index[] = {
 		PROGRESS_VACUUM_PHASE,
 		PROGRESS_VACUUM_TOTAL_HEAP_BLKS,
@@ -847,7 +839,6 @@ lazy_scan_heap(LVRelState *vacrel)
 		Buffer		buf;
 		Page		page;
 		bool		all_visible_according_to_vm;
-		LVPagePruneState prunestate;
 
 		if (blkno == next_unskippable_block)
 		{
@@ -952,8 +943,6 @@ lazy_scan_heap(LVRelState *vacrel)
 		page = BufferGetPage(buf);
 		if (!ConditionalLockBufferForCleanup(buf))
 		{
-			bool		recordfreespace;
-
 			LockBuffer(buf, BUFFER_LOCK_SHARE);
 
 			/* Check for new or empty pages before lazy_scan_noprune call */
@@ -1004,6 +993,7 @@ lazy_scan_heap(LVRelState *vacrel)
 		}
 
 		tuples_already_deleted = vacrel->tuples_deleted;
+		recordfreespace = false;
 
 		/*
 		 * Prune, freeze, and count tuples.
@@ -1016,7 +1006,7 @@ lazy_scan_heap(LVRelState *vacrel)
 		 */
 		lazy_scan_prune(vacrel, buf, blkno, page,
 						vmbuffer, all_visible_according_to_vm,
-						&prunestate);
+						&recordfreespace);
 
 
 		/*
@@ -1036,7 +1026,7 @@ lazy_scan_heap(LVRelState *vacrel)
 		 * little free space anyway. (Besides, we start recording free space
 		 * in the FSM once index vacuuming has been abandoned.)
 		 */
-		if (prunestate.recordfreespace)
+		if (recordfreespace)
 		{
 			Size		freespace = PageGetHeapFreeSpace(page);
 
@@ -1373,7 +1363,7 @@ lazy_scan_prune(LVRelState *vacrel,
 				Page page,
 				Buffer vmbuffer,
 				bool all_visible_according_to_vm,
-				LVPagePruneState *prunestate)
+				bool *recordfreespace)
 {
 	Relation	rel = vacrel->rel;
 	OffsetNumber offnum,
@@ -1436,7 +1426,6 @@ lazy_scan_prune(LVRelState *vacrel,
 	has_lpdead_items = false;
 	/* for recovery conflicts */
 	visibility_cutoff_xid = InvalidTransactionId;
-	prunestate->recordfreespace = false;
 
 	/*
 	 * We will update the VM after collecting LP_DEAD items and freezing
@@ -1776,7 +1765,7 @@ lazy_scan_prune(LVRelState *vacrel,
 	 */
 	if (vacrel->nindexes == 0 ||
 		!vacrel->do_index_vacuuming || lpdead_items == 0)
-		prunestate->recordfreespace = true;
+		*recordfreespace = true;
 
 	/* Remember the location of the last page with nonremovable tuples */
 	if (hastup)
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 5fd46b7bd1f..1edafd0f516 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1405,7 +1405,6 @@ LPVOID
 LPWSTR
 LSEG
 LUID
-LVPagePruneState
 LVRelState
 LVSavedErrInfo
 LWLock
-- 
2.37.2

v3-0006-Conslidate-FSM-update-code-in-lazy_scan_heap.patch (text/x-patch)
From c6fc52bc33f271c8ed9c685270bd6c00edbefa7b Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Fri, 5 Jan 2024 15:18:11 -0500
Subject: [PATCH v3 6/6] Conslidate FSM update code in lazy_scan_heap()

lazy_scan_prune() and lazy_scan_noprune() both require an FSM
update after returning. Consolidate these FSM updates, being careful not
to call lazy_scan_prune() if lazy_scan_noprune() has already finished
processing the page.
---
 src/backend/access/heap/vacuumlazy.c | 54 ++++++++++++----------------
 1 file changed, 23 insertions(+), 31 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 86942ca6004..03bfbed30df 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -817,6 +817,7 @@ lazy_scan_heap(LVRelState *vacrel)
 	bool		next_unskippable_allvis,
 				skipping_current_range;
 	bool		recordfreespace;
+	bool		do_prune = true;
 	const int	initprog_index[] = {
 		PROGRESS_VACUUM_PHASE,
 		PROGRESS_VACUUM_TOTAL_HEAP_BLKS,
@@ -955,46 +956,35 @@ lazy_scan_heap(LVRelState *vacrel)
 
 			/*
 			 * Collect LP_DEAD items in dead_items array, count tuples,
-			 * determine if rel truncation is safe
+			 * determine if rel truncation is safe. If lazy_scan_noprune()
+			 * returns false, we must get a cleanup lock and call
+			 * lazy_scan_prune().
 			 */
-			if (lazy_scan_noprune(vacrel, buf, blkno, page,
-								  &recordfreespace))
-			{
-				Size		freespace = 0;
-
-				/*
-				 * Processed page successfully (without cleanup lock) -- just
-				 * need to update the FSM, much like the lazy_scan_prune case.
-				 * Don't bother trying to match its visibility map setting
-				 * steps, though.
-				 */
-				if (recordfreespace)
-					freespace = PageGetHeapFreeSpace(page);
-				UnlockReleaseBuffer(buf);
-				if (recordfreespace)
-					RecordPageWithFreeSpace(vacrel->rel, blkno, freespace);
-				continue;
-			}
+			do_prune = !lazy_scan_noprune(vacrel, buf, blkno, page,
+										  &recordfreespace);
 
 			/*
 			 * lazy_scan_noprune could not do all required processing.  Wait
 			 * for a cleanup lock, and call lazy_scan_prune in the usual way.
 			 */
-			Assert(vacrel->aggressive);
-			LockBuffer(buf, BUFFER_LOCK_UNLOCK);
-			LockBufferForCleanup(buf);
+			if (do_prune)
+			{
+				Assert(vacrel->aggressive);
+				LockBuffer(buf, BUFFER_LOCK_UNLOCK);
+				LockBufferForCleanup(buf);
+			}
 		}
 
-		/* Check for new or empty pages before lazy_scan_prune call */
-		if (lazy_scan_new_or_empty(vacrel, buf, blkno, page, false, vmbuffer))
+		tuples_already_deleted = vacrel->tuples_deleted;
+
+		/* Check for new or empty pages before lazy_scan_prune processing */
+		if (do_prune &&
+			lazy_scan_new_or_empty(vacrel, buf, blkno, page, false, vmbuffer))
 		{
 			/* Processed as new/empty page (lock and pin released) */
 			continue;
 		}
 
-		tuples_already_deleted = vacrel->tuples_deleted;
-		recordfreespace = false;
-
 		/*
 		 * Prune, freeze, and count tuples.
 		 *
@@ -1004,10 +994,10 @@ lazy_scan_heap(LVRelState *vacrel)
 		 * were pruned some time earlier.  Also considers freezing XIDs in the
 		 * tuple headers of remaining items with storage.
 		 */
-		lazy_scan_prune(vacrel, buf, blkno, page,
-						vmbuffer, all_visible_according_to_vm,
-						&recordfreespace);
-
+		if (do_prune)
+			lazy_scan_prune(vacrel, buf, blkno, page,
+							vmbuffer, all_visible_according_to_vm,
+							&recordfreespace);
 
 		/*
 		 * Final steps for block: drop cleanup lock, record free space in the
@@ -1387,6 +1377,8 @@ lazy_scan_prune(LVRelState *vacrel,
 
 	Assert(BufferGetBlockNumber(buf) == blkno);
 
+	*recordfreespace = false;
+
 	/*
 	 * maxoff might be reduced following line pointer array truncation in
 	 * heap_page_prune.  That's safe for us to ignore, since the reclaimed
-- 
2.37.2

#20Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#18)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

Hi,

On 2024-01-05 15:23:12 -0500, Robert Haas wrote:

On Fri, Jan 5, 2024 at 3:05 PM Andres Freund <andres@anarazel.de> wrote:

An aside:

As I think we chatted about before, I eventually would like the option to
remove index entries for a tuple during on-access pruning, for OLTP
workloads. I.e. before removing the tuple, construct the corresponding index
tuple, use it to look up index entries pointing to the tuple. If all the index
entries were found (they might not be, if they already were marked dead during
a lookup, or if an expression wasn't actually immutable), we can prune without
the full index scan. Obviously this would only be suitable for some
workloads, but it could be quite beneficial when you have huge indexes. The
reason I mention this is that then we'd have another source of marking items
unused during pruning.

I will be astonished if you can make this work well enough to avoid
huge regressions in plausible cases. There are plenty of cases where
we do a very thorough job opportunistically removing index tuples.

These days the AM is often involved with that, via
table_index_delete_tuples()/heap_index_delete_tuples(). That IIRC has to
happen before physically removing the already-marked-killed index entries. We
can't rely on being able to actually prune the heap page at that point, since
there might be other backends pinning it, but often we will be able to. If we
were to prune below heap_index_delete_tuples(), we wouldn't need to recheck
that index again during "individual tuple pruning", if the to-be-marked-unused
heap tuple is one of the tuples passed to heap_index_delete_tuples(), which
presumably will very commonly be the case.

At least for nbtree, we are much more aggressive about marking index entries
as killed, than about actually removing the index entries. "individual tuple
pruning" would have to look for killed-but-still-present index entries, not
just for "live" entries.

Greetings,

Andres Freund

#21Melanie Plageman
melanieplageman@gmail.com
In reply to: Andres Freund (#16)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Fri, Jan 5, 2024 at 2:47 PM Andres Freund <andres@anarazel.de> wrote:

On 2024-01-04 17:37:27 -0500, Melanie Plageman wrote:

On Thu, Jan 4, 2024 at 3:03 PM Andres Freund <andres@anarazel.de> wrote:

On 2023-11-17 18:12:08 -0500, Melanie Plageman wrote:

@@ -972,20 +970,21 @@ lazy_scan_heap(LVRelState *vacrel)
continue;
}

-                     /* Collect LP_DEAD items in dead_items array, count tuples */
-                     if (lazy_scan_noprune(vacrel, buf, blkno, page, &hastup,
+                     /*
+                      * Collect LP_DEAD items in dead_items array, count tuples,
+                      * determine if rel truncation is safe
+                      */
+                     if (lazy_scan_noprune(vacrel, buf, blkno, page,
&recordfreespace))
{
Size            freespace = 0;
/*
* Processed page successfully (without cleanup lock) -- just
-                              * need to perform rel truncation and FSM steps, much like the
-                              * lazy_scan_prune case.  Don't bother trying to match its
-                              * visibility map setting steps, though.
+                              * need to update the FSM, much like the lazy_scan_prune case.
+                              * Don't bother trying to match its visibility map setting
+                              * steps, though.
*/
-                             if (hastup)
-                                     vacrel->nonempty_pages = blkno + 1;
if (recordfreespace)
freespace = PageGetHeapFreeSpace(page);
UnlockReleaseBuffer(buf);

The comment continues to say that we "determine if rel truncation is safe" -
but I don't see that? Oh, I see, it's done inside lazy_scan_noprune(). This
doesn't seem like a clear improvement to me. Particularly because it's only
set if lazy_scan_noprune() actually does something.

I don't get what the last sentence means ("Particularly because...").

Took me a second to understand myself again too, oops. What I think I meant is
that it seems error-prone that it's only set in some paths inside
lazy_scan_noprune(). Previously it was at least a bit clearer in
lazy_scan_heap() that it would be set for the different possible paths.

I see what you are saying. But if lazy_scan_noprune() doesn't do
anything, then lazy_scan_heap() calls lazy_scan_prune(), which does set
hastup and update vacrel->nonempty_pages if needed.

Using hastup in lazy_scan_[no]prune() also means that they are
directly updating LVRelState after determining how to update it.
lazy_scan_heap() isn't responsible for doing that anymore. I don't see
a reason
to be passing information back to lazy_scan_heap() to update
LVRelState when lazy_scan_[no]prune() has access to the LVRelState.

Importantly, once I combine the prune and freeze records, hastup is
set in heap_page_prune() instead of lazy_scan_prune() (that whole loop
in lazy_scan_prune() is eliminated). And I don't like having to pass
hastup back through lazy_scan_prune() and then to lazy_scan_heap() so
that lazy_scan_heap() can use it (and use it to update a data
structure available in lazy_scan_prune()).
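
For reference, the two shapes being compared here, distilled from the
v3-0001 patch upthread (only the relevant lines are pulled out, nothing
new):

    /* output-parameter shape: lazy_scan_heap() applies the update */
    lazy_scan_noprune(vacrel, buf, blkno, page, &hastup, &recordfreespace);
    if (hastup)
        vacrel->nonempty_pages = blkno + 1;

    /* v3-0001 shape: no hastup output parameter; the same update runs at
     * the end of lazy_scan_prune() and lazy_scan_noprune() instead */
    if (hastup)
        vacrel->nonempty_pages = blkno + 1;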

- Melanie

#22Melanie Plageman
melanieplageman@gmail.com
In reply to: Andres Freund (#10)
4 attachment(s)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

Patch 0001 in the attached set addresses the following review feedback:

- pronto_reap renamed to no_indexes
- reduce the number of callers of heap_prune_record_unused() by calling
it from heap_prune_record_dead() when appropriate (a sketch of this
shape follows below)
- add unlikely hint to no_indexes test
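
For the second item, a sketch of the shape that bullet describes; the
pre-existing body of heap_prune_record_dead() is approximated here and
may not match the attached patch exactly:

static void
heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum)
{
    /*
     * If the relation has no indexes, no index vacuuming pass will ever
     * need to see an LP_DEAD stub for this item, so record it LP_UNUSED
     * right away.  Keeping the decision here means the call sites in
     * heap_prune_chain() no longer branch on no_indexes themselves.
     */
    if (unlikely(prstate->no_indexes))
    {
        heap_prune_record_unused(prstate, offnum);
        return;
    }

    Assert(prstate->ndead < MaxHeapTuplesPerPage);
    prstate->nowdead[prstate->ndead] = offnum;
    prstate->ndead++;

    Assert(!prstate->marked[offnum]);
    prstate->marked[offnum] = true;
}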

I've also dropped the patch which moves the test of hastup into
lazy_scan_[no]prune(). In the future, I plan to remove the loop from
lazy_scan_prune() which sets hastup and set it instead in
heap_page_prune(). Changes to hastup's usage could be done with that
change instead of in this set.

The one review point I did not address in the code is that which I've
responded to inline below.

Though not required for the immediate reaping feature, I've included
patches 0002-0004 which are purely refactoring. These patches simplify
lazy_scan_heap() by moving the visibility map code into
lazy_scan_prune() and consolidating the updates to the FSM and
vacrel->nonempty_pages. I've proposed them in this thread because there
is an interdependence between eliminating the lazy_vacuum_heap_page()
call, moving the VM code, and combining the three FSM updates
(lazy_scan_prune() [index and no index] and lazy_scan_noprune()).

This illustrates the code clarity benefits of the change to mark line
pointers LP_UNUSED during pruning if the table has no indexes. I can
propose them in another thread if 0001 is merged.

On Thu, Jan 4, 2024 at 3:03 PM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2023-11-17 18:12:08 -0500, Melanie Plageman wrote:

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 14de8158d49..b578c32eeb6 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -8803,8 +8803,13 @@ heap_xlog_prune(XLogReaderState *record)
nunused = (end - nowunused);
Assert(nunused >= 0);
-             /* Update all line pointers per the record, and repair fragmentation */
-             heap_page_prune_execute(buffer,
+             /*
+              * Update all line pointers per the record, and repair fragmentation.
+              * We always pass pronto_reap as true, because we don't know whether
+              * or not this option was used when pruning. This reduces the
+              * validation done on replay in an assert build.
+              */

Hm, that seems not great. Both because we loose validation and because it
seems to invite problems down the line, due to pronto_reap falsely being set
to true in heap_page_prune_execute().

I see what you are saying. With the name change to no_indexes, it
would be especially misleading to future programmers who might decide
to use that parameter for something else. Are you thinking it would be
okay to check the catalog to see if the table has indexes in
heap_xlog_prune() or are you suggesting that I add some kind of flag
to the prune WAL record?
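
For illustration, the WAL-record-flag option might look roughly like
this on the replay side; the field name below is made up and none of
this is taken from the attached patches:

    /*
     * Hypothetical: xl_heap_prune grows a flag recording whether the
     * original pruning ran with no_indexes set, so replay can pass the
     * same value and keep the assertions in heap_page_prune_execute()
     * as strict as they are during normal running.
     */
    heap_page_prune_execute(buffer, xlrec->no_indexes,
                            redirected, nredirected,
                            nowdead, ndead,
                            nowunused, nunused);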

- Melanie

Attachments:

v4-0004-Conslidate-FSM-update-code-in-lazy_scan_heap.patch (text/x-patch)
From 4a2129de86aee6e1069b381eda47320d3a26f286 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Fri, 5 Jan 2024 15:18:11 -0500
Subject: [PATCH v4 4/4] Conslidate FSM update code in lazy_scan_heap()

lazy_scan_prune() and lazy_scan_noprune() both require an FSM update
after returning. Consolidate these FSM updates, being careful not to
call lazy_scan_prune() or lazy_scan_new_or_empty() if
lazy_scan_noprune() has already finished processing the page.

Note that lazy_scan_prune() and lazy_scan_noprune() also both require
vacrel->nonempty_pages to be updated. Consolidate this as well.
---
 src/backend/access/heap/vacuumlazy.c | 51 ++++++++++++----------------
 1 file changed, 22 insertions(+), 29 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 3a22192728b..dd8476b0fd8 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -818,6 +818,7 @@ lazy_scan_heap(LVRelState *vacrel)
 				skipping_current_range;
 	bool		recordfreespace;
 	bool		hastup;
+	bool		do_prune = true;
 	const int	initprog_index[] = {
 		PROGRESS_VACUUM_PHASE,
 		PROGRESS_VACUUM_TOTAL_HEAP_BLKS,
@@ -954,39 +955,30 @@ lazy_scan_heap(LVRelState *vacrel)
 				continue;
 			}
 
-			/* Collect LP_DEAD items in dead_items array, count tuples */
-			if (lazy_scan_noprune(vacrel, buf, blkno, page, &hastup,
-								  &recordfreespace))
-			{
-				Size		freespace = 0;
-
-				/*
-				 * Processed page successfully (without cleanup lock) -- just
-				 * need to perform rel truncation and FSM steps, much like the
-				 * lazy_scan_prune case.  Don't bother trying to match its
-				 * visibility map setting steps, though.
-				 */
-				if (hastup)
-					vacrel->nonempty_pages = blkno + 1;
-				if (recordfreespace)
-					freespace = PageGetHeapFreeSpace(page);
-				UnlockReleaseBuffer(buf);
-				if (recordfreespace)
-					RecordPageWithFreeSpace(vacrel->rel, blkno, freespace);
-				continue;
-			}
+			/*
+			 * Collect LP_DEAD items in dead_items array, count tuples,
+			 * determine if rel truncation is safe. If lazy_scan_noprune()
+			 * returns false, we must get a cleanup lock and call
+			 * lazy_scan_prune().
+			 */
+			do_prune = !lazy_scan_noprune(vacrel, buf, blkno, page,
+										  &hastup, &recordfreespace);
 
 			/*
 			 * lazy_scan_noprune could not do all required processing.  Wait
 			 * for a cleanup lock, and call lazy_scan_prune in the usual way.
 			 */
-			Assert(vacrel->aggressive);
-			LockBuffer(buf, BUFFER_LOCK_UNLOCK);
-			LockBufferForCleanup(buf);
+			if (do_prune)
+			{
+				Assert(vacrel->aggressive);
+				LockBuffer(buf, BUFFER_LOCK_UNLOCK);
+				LockBufferForCleanup(buf);
+			}
 		}
 
-		/* Check for new or empty pages before lazy_scan_prune call */
-		if (lazy_scan_new_or_empty(vacrel, buf, blkno, page, false, vmbuffer))
+		/* Check for new or empty pages before lazy_scan_prune processing */
+		if (do_prune &&
+			lazy_scan_new_or_empty(vacrel, buf, blkno, page, false, vmbuffer))
 		{
 			/* Processed as new/empty page (lock and pin released) */
 			continue;
@@ -1003,9 +995,10 @@ lazy_scan_heap(LVRelState *vacrel)
 		 * were pruned some time earlier.  Also considers freezing XIDs in the
 		 * tuple headers of remaining items with storage.
 		 */
-		lazy_scan_prune(vacrel, buf, blkno, page,
-						vmbuffer, all_visible_according_to_vm,
-						&recordfreespace, &hastup);
+		if (do_prune)
+			lazy_scan_prune(vacrel, buf, blkno, page,
+							vmbuffer, all_visible_according_to_vm,
+							&recordfreespace, &hastup);
 
 		/* Remember the location of the last page with nonremovable tuples */
 		if (hastup)
-- 
2.37.2

v4-0001-Set-would-be-dead-items-LP_UNUSED-while-pruning.patch (text/x-patch)
From 6a6a78640d69b5309d8d1fd14bde3b38eee1cb79 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 13 Nov 2023 14:10:05 -0500
Subject: [PATCH v4 1/4] Set would-be dead items LP_UNUSED while pruning

If there are no indexes on a relation, items can be marked LP_UNUSED
instead of LP_DEAD during lazy_scan_prune(). This avoids a separate
invocation of lazy_vacuum_heap_page() and saves a vacuum WAL record.

To accomplish this, pass lazy_scan_prune() a new parameter, no_indexes,
which indicates that dead line pointers should be set to LP_UNUSED
during pruning, allowing earlier reaping of tuples.

Because we don't update the freespace map until after dropping the lock
on the buffer and we need the lock while we update the visibility map,
save our intent to update the freespace map in output parameter
recordfreespace. This is not added to the LVPagePruneState because
future commits will eliminate the LVPagePruneState.

Discussion: https://postgr.es/m/CAAKRu_bgvb_k0gKOXWzNKWHt560R0smrGe3E8zewKPs8fiMKkw%40mail.gmail.com
---
 src/backend/access/heap/heapam.c     |   9 +-
 src/backend/access/heap/pruneheap.c  |  71 +++++++++---
 src/backend/access/heap/vacuumlazy.c | 156 +++++++++++----------------
 src/include/access/heapam.h          |   3 +-
 4 files changed, 126 insertions(+), 113 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 707460a5364..27d87b619f9 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -8810,8 +8810,13 @@ heap_xlog_prune(XLogReaderState *record)
 		nunused = (end - nowunused);
 		Assert(nunused >= 0);
 
-		/* Update all line pointers per the record, and repair fragmentation */
-		heap_page_prune_execute(buffer,
+		/*
+		 * Update all line pointers per the record, and repair fragmentation.
+		 * We always pass no_indexes as true, because we don't know whether or
+		 * not this option was used when pruning. This reduces the validation
+		 * done on replay in an assert build.
+		 */
+		heap_page_prune_execute(buffer, true,
 								redirected, nredirected,
 								nowdead, ndead,
 								nowunused, nunused);
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 3e0a1a260e6..80e3db873e4 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -35,6 +35,8 @@ typedef struct
 
 	/* tuple visibility test, initialized for the relation */
 	GlobalVisState *vistest;
+	/* whether or not dead items can be set LP_UNUSED during pruning */
+	bool		no_indexes;
 
 	TransactionId new_prune_xid;	/* new prune hint value for page */
 	TransactionId snapshotConflictHorizon;	/* latest xid removed */
@@ -148,7 +150,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 		{
 			PruneResult presult;
 
-			heap_page_prune(relation, buffer, vistest, &presult, NULL);
+			heap_page_prune(relation, buffer, vistest, false,
+							&presult, NULL);
 
 			/*
 			 * Report the number of tuples reclaimed to pgstats.  This is
@@ -193,6 +196,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
  * (see heap_prune_satisfies_vacuum and
  * HeapTupleSatisfiesVacuum).
  *
+ * no_indexes indicates whether or not dead items can be set LP_UNUSED during
+ * pruning.
+ *
  * off_loc is the offset location required by the caller to use in error
  * callback.
  *
@@ -203,6 +209,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 void
 heap_page_prune(Relation relation, Buffer buffer,
 				GlobalVisState *vistest,
+				bool no_indexes,
 				PruneResult *presult,
 				OffsetNumber *off_loc)
 {
@@ -227,6 +234,7 @@ heap_page_prune(Relation relation, Buffer buffer,
 	prstate.new_prune_xid = InvalidTransactionId;
 	prstate.rel = relation;
 	prstate.vistest = vistest;
+	prstate.no_indexes = no_indexes;
 	prstate.snapshotConflictHorizon = InvalidTransactionId;
 	prstate.nredirected = prstate.ndead = prstate.nunused = 0;
 	memset(prstate.marked, 0, sizeof(prstate.marked));
@@ -306,9 +314,9 @@ heap_page_prune(Relation relation, Buffer buffer,
 		if (off_loc)
 			*off_loc = offnum;
 
-		/* Nothing to do if slot is empty or already dead */
+		/* Nothing to do if slot is empty */
 		itemid = PageGetItemId(page, offnum);
-		if (!ItemIdIsUsed(itemid) || ItemIdIsDead(itemid))
+		if (!ItemIdIsUsed(itemid))
 			continue;
 
 		/* Process this item or chain of items */
@@ -330,7 +338,7 @@ heap_page_prune(Relation relation, Buffer buffer,
 		 * Apply the planned item changes, then repair page fragmentation, and
 		 * update the page's hint bit about whether it has free line pointers.
 		 */
-		heap_page_prune_execute(buffer,
+		heap_page_prune_execute(buffer, prstate.no_indexes,
 								prstate.redirected, prstate.nredirected,
 								prstate.nowdead, prstate.ndead,
 								prstate.nowunused, prstate.nunused);
@@ -581,7 +589,17 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
 		 * function.)
 		 */
 		if (ItemIdIsDead(lp))
+		{
+			/*
+			 * If the relation has no indexes, we can set dead line pointers
+			 * LP_UNUSED now. We don't increment ndeleted here since the LP
+			 * was already marked dead.
+			 */
+			if (unlikely(prstate->no_indexes))
+				heap_prune_record_unused(prstate, offnum);
+
 			break;
+		}
 
 		Assert(ItemIdIsNormal(lp));
 		htup = (HeapTupleHeader) PageGetItem(dp, lp);
@@ -726,7 +744,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
 		 * item.  This can happen if the loop in heap_page_prune caused us to
 		 * visit the dead successor of a redirect item before visiting the
 		 * redirect item.  We can clean up by setting the redirect item to
-		 * DEAD state.
+		 * DEAD state or LP_UNUSED if the table has no indexes.
 		 */
 		heap_prune_record_dead(prstate, rootoffnum);
 	}
@@ -767,6 +785,17 @@ heap_prune_record_redirect(PruneState *prstate,
 static void
 heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum)
 {
+	/*
+	 * If the relation has no indexes, we can remove dead tuples during
+	 * pruning instead of marking their line pointers dead. Set this tuple's
+	 * line pointer LP_UNUSED. We hint that tables with indexes are more
+	 * likely.
+	 */
+	if (unlikely(prstate->no_indexes))
+	{
+		heap_prune_record_unused(prstate, offnum);
+		return;
+	}
 	Assert(prstate->ndead < MaxHeapTuplesPerPage);
 	prstate->nowdead[prstate->ndead] = offnum;
 	prstate->ndead++;
@@ -792,7 +821,7 @@ heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum)
  * buffer.
  */
 void
-heap_page_prune_execute(Buffer buffer,
+heap_page_prune_execute(Buffer buffer, bool no_indexes,
 						OffsetNumber *redirected, int nredirected,
 						OffsetNumber *nowdead, int ndead,
 						OffsetNumber *nowunused, int nunused)
@@ -902,14 +931,28 @@ heap_page_prune_execute(Buffer buffer,
 
 #ifdef USE_ASSERT_CHECKING
 
-		/*
-		 * Only heap-only tuples can become LP_UNUSED during pruning.  They
-		 * don't need to be left in place as LP_DEAD items until VACUUM gets
-		 * around to doing index vacuuming.
-		 */
-		Assert(ItemIdHasStorage(lp) && ItemIdIsNormal(lp));
-		htup = (HeapTupleHeader) PageGetItem(page, lp);
-		Assert(HeapTupleHeaderIsHeapOnly(htup));
+		if (no_indexes)
+		{
+			/*
+			 * If the relation has no indexes, we may set any of LP_NORMAL,
+			 * LP_REDIRECT, or LP_DEAD items to LP_UNUSED during pruning. We
+			 * can't check much here except that, if the item is LP_NORMAL, it
+			 * should have storage before it is set LP_UNUSED.
+			 */
+			Assert(!ItemIdIsNormal(lp) || ItemIdHasStorage(lp));
+		}
+		else
+		{
+			/*
+			 * If the relation has indexes, only heap-only tuples can become
+			 * LP_UNUSED during pruning. They don't need to be left in place
+			 * as LP_DEAD items until VACUUM gets around to doing index
+			 * vacuuming.
+			 */
+			Assert(ItemIdHasStorage(lp) && ItemIdIsNormal(lp));
+			htup = (HeapTupleHeader) PageGetItem(page, lp);
+			Assert(HeapTupleHeaderIsHeapOnly(htup));
+		}
 #endif
 
 		ItemIdSetUnused(lp);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index abbba8947fa..1b26d63d3d6 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -250,7 +250,8 @@ static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   bool sharelock, Buffer vmbuffer);
 static void lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
-							LVPagePruneState *prunestate);
+							LVPagePruneState *prunestate,
+							bool *recordfreespace);
 static bool lazy_scan_noprune(LVRelState *vacrel, Buffer buf,
 							  BlockNumber blkno, Page page,
 							  bool *hastup, bool *recordfreespace);
@@ -830,8 +831,10 @@ lazy_scan_heap(LVRelState *vacrel)
 				next_fsm_block_to_vacuum = 0;
 	VacDeadItems *dead_items = vacrel->dead_items;
 	Buffer		vmbuffer = InvalidBuffer;
+	int			tuples_already_deleted;
 	bool		next_unskippable_allvis,
 				skipping_current_range;
+	bool		recordfreespace;
 	const int	initprog_index[] = {
 		PROGRESS_VACUUM_PHASE,
 		PROGRESS_VACUUM_TOTAL_HEAP_BLKS,
@@ -959,8 +962,7 @@ lazy_scan_heap(LVRelState *vacrel)
 		page = BufferGetPage(buf);
 		if (!ConditionalLockBufferForCleanup(buf))
 		{
-			bool		hastup,
-						recordfreespace;
+			bool		hastup;
 
 			LockBuffer(buf, BUFFER_LOCK_SHARE);
 
@@ -1010,6 +1012,8 @@ lazy_scan_heap(LVRelState *vacrel)
 			continue;
 		}
 
+		tuples_already_deleted = vacrel->tuples_deleted;
+
 		/*
 		 * Prune, freeze, and count tuples.
 		 *
@@ -1019,7 +1023,7 @@ lazy_scan_heap(LVRelState *vacrel)
 		 * were pruned some time earlier.  Also considers freezing XIDs in the
 		 * tuple headers of remaining items with storage.
 		 */
-		lazy_scan_prune(vacrel, buf, blkno, page, &prunestate);
+		lazy_scan_prune(vacrel, buf, blkno, page, &prunestate, &recordfreespace);
 
 		Assert(!prunestate.all_visible || !prunestate.has_lpdead_items);
 
@@ -1027,69 +1031,6 @@ lazy_scan_heap(LVRelState *vacrel)
 		if (prunestate.hastup)
 			vacrel->nonempty_pages = blkno + 1;
 
-		if (vacrel->nindexes == 0)
-		{
-			/*
-			 * Consider the need to do page-at-a-time heap vacuuming when
-			 * using the one-pass strategy now.
-			 *
-			 * The one-pass strategy will never call lazy_vacuum().  The steps
-			 * performed here can be thought of as the one-pass equivalent of
-			 * a call to lazy_vacuum().
-			 */
-			if (prunestate.has_lpdead_items)
-			{
-				Size		freespace;
-
-				lazy_vacuum_heap_page(vacrel, blkno, buf, 0, vmbuffer);
-
-				/* Forget the LP_DEAD items that we just vacuumed */
-				dead_items->num_items = 0;
-
-				/*
-				 * Now perform FSM processing for blkno, and move on to next
-				 * page.
-				 *
-				 * Our call to lazy_vacuum_heap_page() will have considered if
-				 * it's possible to set all_visible/all_frozen independently
-				 * of lazy_scan_prune().  Note that prunestate was invalidated
-				 * by lazy_vacuum_heap_page() call.
-				 */
-				freespace = PageGetHeapFreeSpace(page);
-
-				UnlockReleaseBuffer(buf);
-				RecordPageWithFreeSpace(vacrel->rel, blkno, freespace);
-
-				/*
-				 * Periodically perform FSM vacuuming to make newly-freed
-				 * space visible on upper FSM pages. FreeSpaceMapVacuumRange()
-				 * vacuums the portion of the freespace map covering heap
-				 * pages from start to end - 1. Include the block we just
-				 * vacuumed by passing it blkno + 1. Overflow isn't an issue
-				 * because MaxBlockNumber + 1 is InvalidBlockNumber which
-				 * causes FreeSpaceMapVacuumRange() to vacuum freespace map
-				 * pages covering the remainder of the relation.
-				 */
-				if (blkno - next_fsm_block_to_vacuum >= VACUUM_FSM_EVERY_PAGES)
-				{
-					FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum,
-											blkno + 1);
-					next_fsm_block_to_vacuum = blkno + 1;
-				}
-
-				continue;
-			}
-
-			/*
-			 * There was no call to lazy_vacuum_heap_page() because pruning
-			 * didn't encounter/create any LP_DEAD items that needed to be
-			 * vacuumed.  Prune state has not been invalidated, so proceed
-			 * with prunestate-driven visibility map and FSM steps (just like
-			 * the two-pass strategy).
-			 */
-			Assert(dead_items->num_items == 0);
-		}
-
 		/*
 		 * Handle setting visibility map bit based on information from the VM
 		 * (as of last lazy_scan_skip() call), and from prunestate
@@ -1200,38 +1141,45 @@ lazy_scan_heap(LVRelState *vacrel)
 
 		/*
 		 * Final steps for block: drop cleanup lock, record free space in the
-		 * FSM
+		 * FSM.
+		 *
+		 * If we will likely do index vacuuming, wait until
+		 * lazy_vacuum_heap_rel() to save free space. This doesn't just save
+		 * us some cycles; it also allows us to record any additional free
+		 * space that lazy_vacuum_heap_page() will make available in cases
+		 * where it's possible to truncate the page's line pointer array.
+		 *
+		 * Note: It's not in fact 100% certain that we really will call
+		 * lazy_vacuum_heap_rel() -- lazy_vacuum() might yet opt to skip index
+		 * vacuuming (and so must skip heap vacuuming).  This is deemed okay
+		 * because it only happens in emergencies, or when there is very
+		 * little free space anyway. (Besides, we start recording free space
+		 * in the FSM once index vacuuming has been abandoned.)
 		 */
-		if (prunestate.has_lpdead_items && vacrel->do_index_vacuuming)
-		{
-			/*
-			 * Wait until lazy_vacuum_heap_rel() to save free space.  This
-			 * doesn't just save us some cycles; it also allows us to record
-			 * any additional free space that lazy_vacuum_heap_page() will
-			 * make available in cases where it's possible to truncate the
-			 * page's line pointer array.
-			 *
-			 * Note: It's not in fact 100% certain that we really will call
-			 * lazy_vacuum_heap_rel() -- lazy_vacuum() might yet opt to skip
-			 * index vacuuming (and so must skip heap vacuuming).  This is
-			 * deemed okay because it only happens in emergencies, or when
-			 * there is very little free space anyway. (Besides, we start
-			 * recording free space in the FSM once index vacuuming has been
-			 * abandoned.)
-			 *
-			 * Note: The one-pass (no indexes) case is only supposed to make
-			 * it this far when there were no LP_DEAD items during pruning.
-			 */
-			Assert(vacrel->nindexes > 0);
-			UnlockReleaseBuffer(buf);
-		}
-		else
+		if (recordfreespace)
 		{
 			Size		freespace = PageGetHeapFreeSpace(page);
 
 			UnlockReleaseBuffer(buf);
 			RecordPageWithFreeSpace(vacrel->rel, blkno, freespace);
 		}
+		else
+			UnlockReleaseBuffer(buf);
+
+		/*
+		 * Periodically perform FSM vacuuming to make newly-freed space
+		 * visible on upper FSM pages. This is done after vacuuming if the
+		 * table has indexes.
+		 */
+		if (vacrel->nindexes == 0 &&
+			vacrel->tuples_deleted > tuples_already_deleted &&
+			(blkno - next_fsm_block_to_vacuum >= VACUUM_FSM_EVERY_PAGES))
+		{
+			FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum,
+									blkno);
+			next_fsm_block_to_vacuum = blkno;
+		}
+
 	}
 
 	vacrel->blkno = InvalidBlockNumber;
@@ -1543,7 +1491,8 @@ lazy_scan_prune(LVRelState *vacrel,
 				Buffer buf,
 				BlockNumber blkno,
 				Page page,
-				LVPagePruneState *prunestate)
+				LVPagePruneState *prunestate,
+				bool *recordfreespace)
 {
 	Relation	rel = vacrel->rel;
 	OffsetNumber offnum,
@@ -1555,6 +1504,7 @@ lazy_scan_prune(LVRelState *vacrel,
 				live_tuples,
 				recently_dead_tuples;
 	HeapPageFreeze pagefrz;
+	bool		no_indexes;
 	int64		fpi_before = pgWalUsage.wal_fpi;
 	OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
 	HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
@@ -1579,6 +1529,8 @@ lazy_scan_prune(LVRelState *vacrel,
 	live_tuples = 0;
 	recently_dead_tuples = 0;
 
+	no_indexes = vacrel->nindexes == 0;
+
 	/*
 	 * Prune all HOT-update chains in this page.
 	 *
@@ -1587,7 +1539,8 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * lpdead_items's final value can be thought of as the number of tuples
 	 * that were deleted from indexes.
 	 */
-	heap_page_prune(rel, buf, vacrel->vistest, &presult, &vacrel->offnum);
+	heap_page_prune(rel, buf, vacrel->vistest, no_indexes,
+					&presult, &vacrel->offnum);
 
 	/*
 	 * Now scan the page to collect LP_DEAD items and check for tuples
@@ -1598,6 +1551,7 @@ lazy_scan_prune(LVRelState *vacrel,
 	prunestate->all_visible = true;
 	prunestate->all_frozen = true;
 	prunestate->visibility_cutoff_xid = InvalidTransactionId;
+	*recordfreespace = false;
 
 	for (offnum = FirstOffsetNumber;
 		 offnum <= maxoff;
@@ -1918,6 +1872,15 @@ lazy_scan_prune(LVRelState *vacrel,
 	vacrel->lpdead_items += lpdead_items;
 	vacrel->live_tuples += live_tuples;
 	vacrel->recently_dead_tuples += recently_dead_tuples;
+
+	/*
+	 * If we will not do index vacuuming, either because we have no indexes,
+	 * because there is nothing to vacuum, or because do_index_vacuuming is
+	 * false, make sure we update the freespace map.
+	 */
+	if (vacrel->nindexes == 0 ||
+		!vacrel->do_index_vacuuming || lpdead_items == 0)
+		*recordfreespace = true;
 }
 
 /*
@@ -1937,7 +1900,8 @@ lazy_scan_prune(LVRelState *vacrel,
  *
  * See lazy_scan_prune for an explanation of hastup return flag.
  * recordfreespace flag instructs caller on whether or not it should do
- * generic FSM processing for page.
+ * generic FSM processing for page. vacrel is updated with page-level counts
+ * and to indicate whether or not rel truncation is safe.
  */
 static bool
 lazy_scan_noprune(LVRelState *vacrel,
@@ -2516,7 +2480,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	bool		all_frozen;
 	LVSavedErrInfo saved_err_info;
 
-	Assert(vacrel->nindexes == 0 || vacrel->do_index_vacuuming);
+	Assert(vacrel->do_index_vacuuming);
 
 	pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_VACUUMED, blkno);
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 932ec0d6f2b..867ec36cb64 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -320,9 +320,10 @@ struct GlobalVisState;
 extern void heap_page_prune_opt(Relation relation, Buffer buffer);
 extern void heap_page_prune(Relation relation, Buffer buffer,
 							struct GlobalVisState *vistest,
+							bool no_indexes,
 							PruneResult *presult,
 							OffsetNumber *off_loc);
-extern void heap_page_prune_execute(Buffer buffer,
+extern void heap_page_prune_execute(Buffer buffer, bool no_indexes,
 									OffsetNumber *redirected, int nredirected,
 									OffsetNumber *nowdead, int ndead,
 									OffsetNumber *nowunused, int nunused);
-- 
2.37.2

v4-0002-Shift-VM-update-code-to-lazy_scan_prune.patch (text/x-patch)
From c84eb7b31b1f3a260c63247efe7fd8d895de36fc Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Fri, 5 Jan 2024 11:12:58 -0500
Subject: [PATCH v4 2/4] Shift VM update code to lazy_scan_prune()

The visibility map is updated after lazy_scan_prune() returns in
lazy_scan_heap(). Half of the output parameters of lazy_scan_prune() are
used to update the visibility map. Moving that code into
lazy_scan_prune() simplifies lazy_scan_heap() and requires less
communication between the two. This is a pure lift and shift. No VM
update logic is changed.
---
 src/backend/access/heap/vacuumlazy.c | 229 ++++++++++++++-------------
 1 file changed, 116 insertions(+), 113 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 1b26d63d3d6..6244aee331f 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -250,8 +250,8 @@ static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   bool sharelock, Buffer vmbuffer);
 static void lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
-							LVPagePruneState *prunestate,
-							bool *recordfreespace);
+							Buffer vmbuffer, bool all_visible_according_to_vm,
+							LVPagePruneState *prunestate, bool *recordfreespace);
 static bool lazy_scan_noprune(LVRelState *vacrel, Buffer buf,
 							  BlockNumber blkno, Page page,
 							  bool *hastup, bool *recordfreespace);
@@ -1023,122 +1023,14 @@ lazy_scan_heap(LVRelState *vacrel)
 		 * were pruned some time earlier.  Also considers freezing XIDs in the
 		 * tuple headers of remaining items with storage.
 		 */
-		lazy_scan_prune(vacrel, buf, blkno, page, &prunestate, &recordfreespace);
-
-		Assert(!prunestate.all_visible || !prunestate.has_lpdead_items);
+		lazy_scan_prune(vacrel, buf, blkno, page,
+						vmbuffer, all_visible_according_to_vm,
+						&prunestate, &recordfreespace);
 
 		/* Remember the location of the last page with nonremovable tuples */
 		if (prunestate.hastup)
 			vacrel->nonempty_pages = blkno + 1;
 
-		/*
-		 * Handle setting visibility map bit based on information from the VM
-		 * (as of last lazy_scan_skip() call), and from prunestate
-		 */
-		if (!all_visible_according_to_vm && prunestate.all_visible)
-		{
-			uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
-
-			if (prunestate.all_frozen)
-			{
-				Assert(!TransactionIdIsValid(prunestate.visibility_cutoff_xid));
-				flags |= VISIBILITYMAP_ALL_FROZEN;
-			}
-
-			/*
-			 * It should never be the case that the visibility map page is set
-			 * while the page-level bit is clear, but the reverse is allowed
-			 * (if checksums are not enabled).  Regardless, set both bits so
-			 * that we get back in sync.
-			 *
-			 * NB: If the heap page is all-visible but the VM bit is not set,
-			 * we don't need to dirty the heap page.  However, if checksums
-			 * are enabled, we do need to make sure that the heap page is
-			 * dirtied before passing it to visibilitymap_set(), because it
-			 * may be logged.  Given that this situation should only happen in
-			 * rare cases after a crash, it is not worth optimizing.
-			 */
-			PageSetAllVisible(page);
-			MarkBufferDirty(buf);
-			visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
-							  vmbuffer, prunestate.visibility_cutoff_xid,
-							  flags);
-		}
-
-		/*
-		 * As of PostgreSQL 9.2, the visibility map bit should never be set if
-		 * the page-level bit is clear.  However, it's possible that the bit
-		 * got cleared after lazy_scan_skip() was called, so we must recheck
-		 * with buffer lock before concluding that the VM is corrupt.
-		 */
-		else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
-				 visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
-		{
-			elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-				 vacrel->relname, blkno);
-			visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-								VISIBILITYMAP_VALID_BITS);
-		}
-
-		/*
-		 * It's possible for the value returned by
-		 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-		 * wrong for us to see tuples that appear to not be visible to
-		 * everyone yet, while PD_ALL_VISIBLE is already set. The real safe
-		 * xmin value never moves backwards, but
-		 * GetOldestNonRemovableTransactionId() is conservative and sometimes
-		 * returns a value that's unnecessarily small, so if we see that
-		 * contradiction it just means that the tuples that we think are not
-		 * visible to everyone yet actually are, and the PD_ALL_VISIBLE flag
-		 * is correct.
-		 *
-		 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE
-		 * set, however.
-		 */
-		else if (prunestate.has_lpdead_items && PageIsAllVisible(page))
-		{
-			elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-				 vacrel->relname, blkno);
-			PageClearAllVisible(page);
-			MarkBufferDirty(buf);
-			visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-								VISIBILITYMAP_VALID_BITS);
-		}
-
-		/*
-		 * If the all-visible page is all-frozen but not marked as such yet,
-		 * mark it as all-frozen.  Note that all_frozen is only valid if
-		 * all_visible is true, so we must check both prunestate fields.
-		 */
-		else if (all_visible_according_to_vm && prunestate.all_visible &&
-				 prunestate.all_frozen &&
-				 !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
-		{
-			/*
-			 * Avoid relying on all_visible_according_to_vm as a proxy for the
-			 * page-level PD_ALL_VISIBLE bit being set, since it might have
-			 * become stale -- even when all_visible is set in prunestate
-			 */
-			if (!PageIsAllVisible(page))
-			{
-				PageSetAllVisible(page);
-				MarkBufferDirty(buf);
-			}
-
-			/*
-			 * Set the page all-frozen (and all-visible) in the VM.
-			 *
-			 * We can pass InvalidTransactionId as our visibility_cutoff_xid,
-			 * since a snapshotConflictHorizon sufficient to make everything
-			 * safe for REDO was logged when the page's tuples were frozen.
-			 */
-			Assert(!TransactionIdIsValid(prunestate.visibility_cutoff_xid));
-			visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
-							  vmbuffer, InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE |
-							  VISIBILITYMAP_ALL_FROZEN);
-		}
-
 		/*
 		 * Final steps for block: drop cleanup lock, record free space in the
 		 * FSM.
@@ -1491,6 +1383,8 @@ lazy_scan_prune(LVRelState *vacrel,
 				Buffer buf,
 				BlockNumber blkno,
 				Page page,
+				Buffer vmbuffer,
+				bool all_visible_according_to_vm,
 				LVPagePruneState *prunestate,
 				bool *recordfreespace)
 {
@@ -1881,6 +1775,115 @@ lazy_scan_prune(LVRelState *vacrel,
 	if (vacrel->nindexes == 0 ||
 		!vacrel->do_index_vacuuming || lpdead_items == 0)
 		*recordfreespace = true;
+
+	Assert(!prunestate->all_visible || !prunestate->has_lpdead_items);
+
+	/*
+	 * Handle setting visibility map bit based on information from the VM (as
+	 * of last lazy_scan_skip() call), and from prunestate
+	 */
+	if (!all_visible_according_to_vm && prunestate->all_visible)
+	{
+		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
+
+		if (prunestate->all_frozen)
+		{
+			Assert(!TransactionIdIsValid(prunestate->visibility_cutoff_xid));
+			flags |= VISIBILITYMAP_ALL_FROZEN;
+		}
+
+		/*
+		 * It should never be the case that the visibility map page is set
+		 * while the page-level bit is clear, but the reverse is allowed (if
+		 * checksums are not enabled).  Regardless, set both bits so that we
+		 * get back in sync.
+		 *
+		 * NB: If the heap page is all-visible but the VM bit is not set, we
+		 * don't need to dirty the heap page.  However, if checksums are
+		 * enabled, we do need to make sure that the heap page is dirtied
+		 * before passing it to visibilitymap_set(), because it may be logged.
+		 * Given that this situation should only happen in rare cases after a
+		 * crash, it is not worth optimizing.
+		 */
+		PageSetAllVisible(page);
+		MarkBufferDirty(buf);
+		visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
+						  vmbuffer, prunestate->visibility_cutoff_xid,
+						  flags);
+	}
+
+	/*
+	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+	 * page-level bit is clear.  However, it's possible that the bit got
+	 * cleared after lazy_scan_skip() was called, so we must recheck with
+	 * buffer lock before concluding that the VM is corrupt.
+	 */
+	else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
+			 visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
+	{
+		elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+			 vacrel->relname, blkno);
+		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+	}
+
+	/*
+	 * It's possible for the value returned by
+	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+	 * wrong for us to see tuples that appear to not be visible to everyone
+	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
+	 * conservative and sometimes returns a value that's unnecessarily small,
+	 * so if we see that contradiction it just means that the tuples that we
+	 * think are not visible to everyone yet actually are, and the
+	 * PD_ALL_VISIBLE flag is correct.
+	 *
+	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+	 * however.
+	 */
+	else if (prunestate->has_lpdead_items && PageIsAllVisible(page))
+	{
+		elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+			 vacrel->relname, blkno);
+		PageClearAllVisible(page);
+		MarkBufferDirty(buf);
+		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+	}
+
+	/*
+	 * If the all-visible page is all-frozen but not marked as such yet, mark
+	 * it as all-frozen.  Note that all_frozen is only valid if all_visible is
+	 * true, so we must check both prunestate fields.
+	 */
+	else if (all_visible_according_to_vm && prunestate->all_visible &&
+			 prunestate->all_frozen &&
+			 !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
+	{
+		/*
+		 * Avoid relying on all_visible_according_to_vm as a proxy for the
+		 * page-level PD_ALL_VISIBLE bit being set, since it might have become
+		 * stale -- even when all_visible is set in prunestate
+		 */
+		if (!PageIsAllVisible(page))
+		{
+			PageSetAllVisible(page);
+			MarkBufferDirty(buf);
+		}
+
+		/*
+		 * Set the page all-frozen (and all-visible) in the VM.
+		 *
+		 * We can pass InvalidTransactionId as our visibility_cutoff_xid,
+		 * since a snapshotConflictHorizon sufficient to make everything safe
+		 * for REDO was logged when the page's tuples were frozen.
+		 */
+		Assert(!TransactionIdIsValid(prunestate->visibility_cutoff_xid));
+		visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
+						  vmbuffer, InvalidTransactionId,
+						  VISIBILITYMAP_ALL_VISIBLE |
+						  VISIBILITYMAP_ALL_FROZEN);
+	}
 }
 
 /*
-- 
2.37.2

v4-0003-Inline-LVPagePruneState-members-in-lazy_scan_prun.patch (text/x-patch)
From e53030caec9adabd05cb9237b70936d186b2368d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Fri, 5 Jan 2024 11:47:31 -0500
Subject: [PATCH v4 3/4] Inline LVPagePruneState members in lazy_scan_prune()

After moving the code to update the visibility map from lazy_scan_heap()
into lazy_scan_prune, most of the members of LVPagePruneState are no
longer required. Make them local variables in lazy_scan_prune().

recordfreespace remains as an output parameter and hastup is added as an
output parameter of lazy_scan_prune(). This mirrors lazy_scan_noprune().
Future commits will consolidate lazy_scan_prune() and
lazy_scan_noprune()'s update of the FSM and setting of
vacrel->nonempty_pages.
---
 src/backend/access/heap/vacuumlazy.c | 131 +++++++++++++--------------
 src/tools/pgindent/typedefs.list     |   1 -
 2 files changed, 62 insertions(+), 70 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 6244aee331f..3a22192728b 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -212,24 +212,6 @@ typedef struct LVRelState
 	int64		missed_dead_tuples; /* # removable, but not removed */
 } LVRelState;
 
-/*
- * State returned by lazy_scan_prune()
- */
-typedef struct LVPagePruneState
-{
-	bool		hastup;			/* Page prevents rel truncation? */
-	bool		has_lpdead_items;	/* includes existing LP_DEAD items */
-
-	/*
-	 * State describes the proper VM bit states to set for the page following
-	 * pruning and freezing.  all_visible implies !has_lpdead_items, but don't
-	 * trust all_frozen result unless all_visible is also set to true.
-	 */
-	bool		all_visible;	/* Every item visible to all? */
-	bool		all_frozen;		/* provided all_visible is also true */
-	TransactionId visibility_cutoff_xid;	/* For recovery conflicts */
-} LVPagePruneState;
-
 /* Struct for saving and restoring vacuum error information. */
 typedef struct LVSavedErrInfo
 {
@@ -251,7 +233,7 @@ static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 static void lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
 							Buffer vmbuffer, bool all_visible_according_to_vm,
-							LVPagePruneState *prunestate, bool *recordfreespace);
+							bool *recordfreespace, bool *hastup);
 static bool lazy_scan_noprune(LVRelState *vacrel, Buffer buf,
 							  BlockNumber blkno, Page page,
 							  bool *hastup, bool *recordfreespace);
@@ -835,6 +817,7 @@ lazy_scan_heap(LVRelState *vacrel)
 	bool		next_unskippable_allvis,
 				skipping_current_range;
 	bool		recordfreespace;
+	bool		hastup;
 	const int	initprog_index[] = {
 		PROGRESS_VACUUM_PHASE,
 		PROGRESS_VACUUM_TOTAL_HEAP_BLKS,
@@ -857,7 +840,6 @@ lazy_scan_heap(LVRelState *vacrel)
 		Buffer		buf;
 		Page		page;
 		bool		all_visible_according_to_vm;
-		LVPagePruneState prunestate;
 
 		if (blkno == next_unskippable_block)
 		{
@@ -962,8 +944,6 @@ lazy_scan_heap(LVRelState *vacrel)
 		page = BufferGetPage(buf);
 		if (!ConditionalLockBufferForCleanup(buf))
 		{
-			bool		hastup;
-
 			LockBuffer(buf, BUFFER_LOCK_SHARE);
 
 			/* Check for new or empty pages before lazy_scan_noprune call */
@@ -1025,10 +1005,10 @@ lazy_scan_heap(LVRelState *vacrel)
 		 */
 		lazy_scan_prune(vacrel, buf, blkno, page,
 						vmbuffer, all_visible_according_to_vm,
-						&prunestate, &recordfreespace);
+						&recordfreespace, &hastup);
 
 		/* Remember the location of the last page with nonremovable tuples */
-		if (prunestate.hastup)
+		if (hastup)
 			vacrel->nonempty_pages = blkno + 1;
 
 		/*
@@ -1385,8 +1365,8 @@ lazy_scan_prune(LVRelState *vacrel,
 				Page page,
 				Buffer vmbuffer,
 				bool all_visible_according_to_vm,
-				LVPagePruneState *prunestate,
-				bool *recordfreespace)
+				bool *recordfreespace,
+				bool *hastup)
 {
 	Relation	rel = vacrel->rel;
 	OffsetNumber offnum,
@@ -1394,11 +1374,14 @@ lazy_scan_prune(LVRelState *vacrel,
 	ItemId		itemid;
 	PruneResult presult;
 	int			tuples_frozen,
-				lpdead_items,
 				live_tuples,
 				recently_dead_tuples;
 	HeapPageFreeze pagefrz;
 	bool		no_indexes;
+	bool		all_visible,
+				all_frozen;
+	TransactionId visibility_cutoff_xid;
+	int			lpdead_items;	/* includes existing LP_DEAD items */
 	int64		fpi_before = pgWalUsage.wal_fpi;
 	OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
 	HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
@@ -1438,14 +1421,23 @@ lazy_scan_prune(LVRelState *vacrel,
 
 	/*
 	 * Now scan the page to collect LP_DEAD items and check for tuples
-	 * requiring freezing among remaining tuples with storage
+	 * requiring freezing among remaining tuples with storage. lpdead_items
+	 * includes existing LP_DEAD items.
 	 */
-	prunestate->hastup = false;
-	prunestate->has_lpdead_items = false;
-	prunestate->all_visible = true;
-	prunestate->all_frozen = true;
-	prunestate->visibility_cutoff_xid = InvalidTransactionId;
 	*recordfreespace = false;
+	*hastup = false;
+	/* for recovery conflicts */
+	visibility_cutoff_xid = InvalidTransactionId;
+
+	/*
+	 * We will update the VM after collecting LP_DEAD items and freezing
+	 * tuples. Keep track of whether or not the page is all_visible and
+	 * all_frozen and use this information to update the VM. all_visible
+	 * implies lpdead_items == 0, but don't trust all_frozen result unless
+	 * all_visible is also set to true.
+	 */
+	all_visible = true;
+	all_frozen = true;
 
 	for (offnum = FirstOffsetNumber;
 		 offnum <= maxoff;
@@ -1468,7 +1460,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		if (ItemIdIsRedirected(itemid))
 		{
 			/* page makes rel truncation unsafe */
-			prunestate->hastup = true;
+			*hastup = true;
 			continue;
 		}
 
@@ -1533,13 +1525,13 @@ lazy_scan_prune(LVRelState *vacrel,
 				 * asynchronously. See SetHintBits for more info. Check that
 				 * the tuple is hinted xmin-committed because of that.
 				 */
-				if (prunestate->all_visible)
+				if (all_visible)
 				{
 					TransactionId xmin;
 
 					if (!HeapTupleHeaderXminCommitted(htup))
 					{
-						prunestate->all_visible = false;
+						all_visible = false;
 						break;
 					}
 
@@ -1551,14 +1543,14 @@ lazy_scan_prune(LVRelState *vacrel,
 					if (!TransactionIdPrecedes(xmin,
 											   vacrel->cutoffs.OldestXmin))
 					{
-						prunestate->all_visible = false;
+						all_visible = false;
 						break;
 					}
 
 					/* Track newest xmin on page. */
-					if (TransactionIdFollows(xmin, prunestate->visibility_cutoff_xid) &&
+					if (TransactionIdFollows(xmin, visibility_cutoff_xid) &&
 						TransactionIdIsNormal(xmin))
-						prunestate->visibility_cutoff_xid = xmin;
+						visibility_cutoff_xid = xmin;
 				}
 				break;
 			case HEAPTUPLE_RECENTLY_DEAD:
@@ -1569,7 +1561,7 @@ lazy_scan_prune(LVRelState *vacrel,
 				 * pruning.)
 				 */
 				recently_dead_tuples++;
-				prunestate->all_visible = false;
+				all_visible = false;
 				break;
 			case HEAPTUPLE_INSERT_IN_PROGRESS:
 
@@ -1580,11 +1572,11 @@ lazy_scan_prune(LVRelState *vacrel,
 				 * results.  This assumption is a bit shaky, but it is what
 				 * acquire_sample_rows() does, so be consistent.
 				 */
-				prunestate->all_visible = false;
+				all_visible = false;
 				break;
 			case HEAPTUPLE_DELETE_IN_PROGRESS:
 				/* This is an expected case during concurrent vacuum */
-				prunestate->all_visible = false;
+				all_visible = false;
 
 				/*
 				 * Count such rows as live.  As above, we assume the deleting
@@ -1598,7 +1590,7 @@ lazy_scan_prune(LVRelState *vacrel,
 				break;
 		}
 
-		prunestate->hastup = true;	/* page makes rel truncation unsafe */
+		*hastup = true;			/* page makes rel truncation unsafe */
 
 		/* Tuple with storage -- consider need to freeze */
 		if (heap_prepare_freeze_tuple(htup, &vacrel->cutoffs, &pagefrz,
@@ -1614,7 +1606,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		 * definitely cannot be set all-frozen in the visibility map later on
 		 */
 		if (!totally_frozen)
-			prunestate->all_frozen = false;
+			all_frozen = false;
 	}
 
 	/*
@@ -1632,7 +1624,7 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * page all-frozen afterwards (might not happen until final heap pass).
 	 */
 	if (pagefrz.freeze_required || tuples_frozen == 0 ||
-		(prunestate->all_visible && prunestate->all_frozen &&
+		(all_visible && all_frozen &&
 		 fpi_before != pgWalUsage.wal_fpi))
 	{
 		/*
@@ -1670,11 +1662,11 @@ lazy_scan_prune(LVRelState *vacrel,
 			 * once we're done with it.  Otherwise we generate a conservative
 			 * cutoff by stepping back from OldestXmin.
 			 */
-			if (prunestate->all_visible && prunestate->all_frozen)
+			if (all_visible && all_frozen)
 			{
 				/* Using same cutoff when setting VM is now unnecessary */
-				snapshotConflictHorizon = prunestate->visibility_cutoff_xid;
-				prunestate->visibility_cutoff_xid = InvalidTransactionId;
+				snapshotConflictHorizon = visibility_cutoff_xid;
+				visibility_cutoff_xid = InvalidTransactionId;
 			}
 			else
 			{
@@ -1697,7 +1689,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		 */
 		vacrel->NewRelfrozenXid = pagefrz.NoFreezePageRelfrozenXid;
 		vacrel->NewRelminMxid = pagefrz.NoFreezePageRelminMxid;
-		prunestate->all_frozen = false;
+		all_frozen = false;
 		tuples_frozen = 0;		/* avoid miscounts in instrumentation */
 	}
 
@@ -1710,16 +1702,17 @@ lazy_scan_prune(LVRelState *vacrel,
 	 */
 #ifdef USE_ASSERT_CHECKING
 	/* Note that all_frozen value does not matter when !all_visible */
-	if (prunestate->all_visible && lpdead_items == 0)
+	if (all_visible && lpdead_items == 0)
 	{
-		TransactionId cutoff;
-		bool		all_frozen;
+		TransactionId debug_cutoff;
+		bool		debug_all_frozen;
 
-		if (!heap_page_is_all_visible(vacrel, buf, &cutoff, &all_frozen))
+		if (!heap_page_is_all_visible(vacrel, buf,
+									  &debug_cutoff, &debug_all_frozen))
 			Assert(false);
 
-		Assert(!TransactionIdIsValid(cutoff) ||
-			   cutoff == prunestate->visibility_cutoff_xid);
+		Assert(!TransactionIdIsValid(debug_cutoff) ||
+			   debug_cutoff == visibility_cutoff_xid);
 	}
 #endif
 
@@ -1732,7 +1725,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		ItemPointerData tmp;
 
 		vacrel->lpdead_item_pages++;
-		prunestate->has_lpdead_items = true;
 
 		ItemPointerSetBlockNumber(&tmp, blkno);
 
@@ -1757,7 +1749,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		 * Now that freezing has been finalized, unset all_visible.  It needs
 		 * to reflect the present state of things, as expected by our caller.
 		 */
-		prunestate->all_visible = false;
+		all_visible = false;
 	}
 
 	/* Finally, add page-local counts to whole-VACUUM counts */
@@ -1776,19 +1768,20 @@ lazy_scan_prune(LVRelState *vacrel,
 		!vacrel->do_index_vacuuming || lpdead_items == 0)
 		*recordfreespace = true;
 
-	Assert(!prunestate->all_visible || !prunestate->has_lpdead_items);
+	Assert(!all_visible || lpdead_items == 0);
 
 	/*
 	 * Handle setting visibility map bit based on information from the VM (as
-	 * of last lazy_scan_skip() call), and from prunestate
+	 * of last lazy_scan_skip() call), and from all_visible and all_frozen
+	 * variables
 	 */
-	if (!all_visible_according_to_vm && prunestate->all_visible)
+	if (!all_visible_according_to_vm && all_visible)
 	{
 		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
 
-		if (prunestate->all_frozen)
+		if (all_frozen)
 		{
-			Assert(!TransactionIdIsValid(prunestate->visibility_cutoff_xid));
+			Assert(!TransactionIdIsValid(visibility_cutoff_xid));
 			flags |= VISIBILITYMAP_ALL_FROZEN;
 		}
 
@@ -1808,7 +1801,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		PageSetAllVisible(page);
 		MarkBufferDirty(buf);
 		visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
-						  vmbuffer, prunestate->visibility_cutoff_xid,
+						  vmbuffer, visibility_cutoff_xid,
 						  flags);
 	}
 
@@ -1841,7 +1834,7 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
 	 * however.
 	 */
-	else if (prunestate->has_lpdead_items && PageIsAllVisible(page))
+	else if (lpdead_items > 0 && PageIsAllVisible(page))
 	{
 		elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
 			 vacrel->relname, blkno);
@@ -1854,16 +1847,16 @@ lazy_scan_prune(LVRelState *vacrel,
 	/*
 	 * If the all-visible page is all-frozen but not marked as such yet, mark
 	 * it as all-frozen.  Note that all_frozen is only valid if all_visible is
-	 * true, so we must check both prunestate fields.
+	 * true, so we must check both all_visible and all_frozen.
 	 */
-	else if (all_visible_according_to_vm && prunestate->all_visible &&
-			 prunestate->all_frozen &&
+	else if (all_visible_according_to_vm && all_visible &&
+			 all_frozen &&
 			 !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
 	{
 		/*
 		 * Avoid relying on all_visible_according_to_vm as a proxy for the
 		 * page-level PD_ALL_VISIBLE bit being set, since it might have become
-		 * stale -- even when all_visible is set in prunestate
+		 * stale -- even when all_visible is set
 		 */
 		if (!PageIsAllVisible(page))
 		{
@@ -1878,7 +1871,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		 * since a snapshotConflictHorizon sufficient to make everything safe
 		 * for REDO was logged when the page's tuples were frozen.
 		 */
-		Assert(!TransactionIdIsValid(prunestate->visibility_cutoff_xid));
+		Assert(!TransactionIdIsValid(visibility_cutoff_xid));
 		visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
 						  vmbuffer, InvalidTransactionId,
 						  VISIBILITYMAP_ALL_VISIBLE |
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 5fd46b7bd1f..1edafd0f516 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1405,7 +1405,6 @@ LPVOID
 LPWSTR
 LSEG
 LUID
-LVPagePruneState
 LVRelState
 LVSavedErrInfo
 LWLock
-- 
2.37.2

In reply to: Robert Haas (#18)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Fri, Jan 5, 2024 at 12:23 PM Robert Haas <robertmhaas@gmail.com> wrote:

As I think we chatted about before, I eventually would like the option to
remove index entries for a tuple during on-access pruning, for OLTP
workloads. I.e. before removing the tuple, construct the corresponding index
tuple, use it to look up index entries pointing to the tuple. If all the index
entries were found (they might not be, if they already were marked dead during
a lookup, or if an expression wasn't actually immutable), we can prune without
the full index scan. Obviously this would only be suitable for some
workloads, but it could be quite beneficial when you have huge indexes. The
reason I mention this is that then we'd have another source of marking items
unused during pruning.

I will be astonished if you can make this work well enough to avoid
huge regressions in plausible cases. There are plenty of cases where
we do a very thorough job opportunistically removing index tuples.

Right. In particular, bottom-up index deletion works well because it
adds a kind of natural backpressure to one important special case (the
case of non-HOT updates that don't "logically change" any indexed
column). It isn't really all that "opportunistic" in my understanding
of the term -- the overall effect is to *systematically* control bloat
in a way that is actually quite reliable. Like you, I have my doubts
that it would be valuable to be more proactive about deleting dead
index tuples that are just random dead tuples. There may be a great
many dead index tuples diffusely spread across an index -- these can
be quite harmless, and not worth proactively cleaning up (even at a
fairly low cost). What we mostly need to worry about is *concentrated*
build-up of dead index tuples in particular leaf pages.

A natural question to ask is: what cases remain, where we could stand
to add more backpressure? What other "special case" do we not yet
address? I think that retail index tuple deletion could work well as
part of a limited form of "transaction rollback" that cleans up after
a just-aborted transaction, within the backend that executed the
transaction itself. I suspect that this case now has outsized
importance, precisely because it's the one remaining case where the
system accumulates index bloat without any sort of natural
backpressure. Making the transaction/backend that creates bloat
directly responsible for proactively cleaning it up tends to have a
stabilizing effect over time. The system is made to live within its
means.

We could even fully reverse heap page line pointer bloat under this
"transaction rollback" scheme -- I bet that aborted xacts are a
disproportionate source of line pointer bloat. Barring a hard crash,
or a very large transaction, we could "undo" the physical changes to
relations before permitting the backend to retry the transaction from
scratch. This would just work as an optimization.

--
Peter Geoghegan

In reply to: Andres Freund (#20)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Fri, Jan 5, 2024 at 12:57 PM Andres Freund <andres@anarazel.de> wrote:

I will be astonished if you can make this work well enough to avoid
huge regressions in plausible cases. There are plenty of cases where
we do a very thorough job opportunistically removing index tuples.

These days the AM is often involved with that, via
table_index_delete_tuples()/heap_index_delete_tuples(). That IIRC has to
happen before physically removing the already-marked-killed index entries. We
can't rely on being able to actually prune the heap page at that point, there
might be other backends pinning it, but often we will be able to. If we were
to prune below heap_index_delete_tuples(), we wouldn't need to recheck that
index again during "individual tuple pruning", if the to-be-marked-unused heap
tuple is one of the tuples passed to heap_index_delete_tuples(). Which
presumably will be very commonly the case.

I don't understand. Making heap_index_delete_tuples() prune heap pages
in passing such that we can ultimately mark dead heap tuples LP_UNUSED
necessitates high level coordination -- it has to happen at a level
much higher than heap_index_delete_tuples(). In other words, making it
all work safely requires the same high level context that makes it
safe for VACUUM to set a stub LP_DEAD line pointer to LP_UNUSED (index
tuples must never be allowed to point to TIDs/heap line pointers that
can be concurrently recycled).
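
To spell the hazard out (a hypothetical interleaving -- and note that the
patches on this thread can't produce it, since they only set items
LP_UNUSED when the table has no indexes at all):

    /*
     * 1. Index tuple I still points to heap TID (blk, off), currently LP_DEAD.
     * 2. Something sets (blk, off) LP_UNUSED without having deleted I first.
     * 3. A concurrent inserter reuses (blk, off) for a new, unrelated tuple.
     * 4. An index scan that follows I now lands on the wrong tuple.
     */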

Obviously my idea of "a limited form of transaction rollback" has the
required high-level context available, which is the crucial factor
that allows it to safely reverse all bloat -- even line pointer bloat
(which is traditionally something that only VACUUM can do safely). I
have a hard time imagining a scheme that can do that outside of VACUUM
without directly targeting some special case, such as the case that
I'm calling "transaction rollback". In other words, I have a hard time
imagining how this would ever be practical as part of any truly
opportunistic cleanup process. AFAICT the dependency between indexes
and the heap is just too delicate for such a scheme to ever really be
practical.

At least for nbtree, we are much more aggressive about marking index entries
as killed, than about actually removing the index entries. "individual tuple
pruning" would have to look for killed-but-still-present index entries, not
just for "live" entries.

These days having index tuples directly marked LP_DEAD is surprisingly
unimportant to heap_index_delete_tuples(). The batching optimization
implemented by _bt_simpledel_pass() tends to be very effective in
practice. We only need to have the right *general* idea about which
heap pages to visit -- which heap pages will yield some number of
deletable index tuples.

--
Peter Geoghegan

In reply to: Melanie Plageman (#14)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Fri, Jan 5, 2024 at 12:59 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

But you can just as easily turn this argument on its head, can't you?
In general, except for HOT tuples, line pointers are marked dead by
pruning and unused by vacuum. Here you want to turn it on its head and
make pruning do what would normally be vacuum's responsibility.

I actually think we are going to want to stop referring to these steps
as pruning and vacuuming. It is confusing because vacuuming refers to
the whole process of doing garbage collection on the table and also to
the specific step of setting dead line pointers unused. If we called
these steps say, pruning and reaping, that may be more clear.

What about index VACUUM records? Should they be renamed to REAP records, too?

Vacuuming consists of three phases -- the first pass, index vacuuming,
and the second pass. I don't think we should dictate what happens in
each pass. That is, we shouldn't expect only pruning to happen in the
first pass and only reaping to happen in the second pass.

Why not? It's not self-evident that it matters much either way. I
don't see it being worth the complexity (which is not to say that I'm
opposed to what you're trying to do here).

Note that we only need a cleanup lock for the first heap pass right now
(not the second heap pass). So if you're going to prune in the second
heap pass, you're going to have to add a mechanism that makes it safe
(by acquiring a cleanup lock once we decide that we actually want to
prune, say). Or maybe you'd just give up on the fact that we don't
need cleanup locks for the second heap pass these days instead (which
seems like a step backwards).
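
To make "acquiring a cleanup lock once we decide that we actually want to
prune" concrete -- a rough, hypothetical sketch, not anything in the posted
patches, and it assumes we'd accept giving up the second pass's
no-cleanup-lock property:

    buf = ReadBufferExtended(vacrel->rel, MAIN_FORKNUM, blkno,
                             RBM_NORMAL, vacrel->bstrategy);
    if (ConditionalLockBufferForCleanup(buf))
    {
        /* cleanup lock acquired: safe to prune and to set items LP_UNUSED */
    }
    else
    {
        /* fall back: exclusive lock, only set known LP_DEAD items unused */
        LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
    }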

For example,
I think Andres has previously proposed doing another round of pruning
after index vacuuming. The second pass/third phase is distinguished
primarily by being after index vacuuming.

I think of the second pass/third phase as being very similar to index vacuuming.

Both processes/phases don't require cleanup locks (actually nbtree
VACUUM does require cleanup locks, but the reasons why are pretty
esoteric, and not shared by every index AM). And, both
processes/phases don't need to generate their own recovery conflicts.
Neither type of WAL record requires a snapshotConflictHorizon field of
its own, since we can safely assume that some PRUNE record must have
taken care of all that earlier on.

--
Peter Geoghegan

#26Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#20)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Fri, Jan 5, 2024 at 3:57 PM Andres Freund <andres@anarazel.de> wrote:

I will be astonished if you can make this work well enough to avoid
huge regressions in plausible cases. There are plenty of cases where
we do a very thorough job opportunistically removing index tuples.

These days the AM is often involved with that, via
table_index_delete_tuples()/heap_index_delete_tuples(). That IIRC has to
happen before physically removing the already-marked-killed index entries. We
can't rely on being able to actually prune the heap page at that point, there
might be other backends pinning it, but often we will be able to. If we were
to prune below heap_index_delete_tuples(), we wouldn't need to recheck that
index again during "individual tuple pruning", if the to-be-marked-unused heap
tuple is one of the tuples passed to heap_index_delete_tuples(). Which
presumably will be very commonly the case.

At least for nbtree, we are much more aggressive about marking index entries
as killed, than about actually removing the index entries. "individual tuple
pruning" would have to look for killed-but-still-present index entries, not
just for "live" entries.

I don't want to derail this thread, but I don't really see what you
have in mind here. The first paragraph sounds like you're imagining
that while pruning the index entries we might jump over to the heap
and clean things up there, too, but that seems like it wouldn't work
if the table has more than one index. I thought you were talking about
starting with a heap tuple and bouncing around to every index to see
if we can find index pointers to kill in every one of them. That
*could* work out, but you only need one index to have been
opportunistically cleaned up in order for it to fail to work out.
There might well be some workloads where that's often the case, but
the regressions in the workloads where it isn't the case seem like
they would be rather substantial, because doing an extra lookup in
every index for each heap tuple visited sounds pricey.
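
To put rough, made-up numbers on "pricey": with, say, 5 indexes and a
btree descent touching ~4 pages, handling 100 dead tuples on a single heap
page this way is on the order of 5 * 4 * 100 = 2000 extra page accesses,
versus the amortized cost of one scan of each index per VACUUM.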

--
Robert Haas
EDB: http://www.enterprisedb.com

#27Robert Haas
robertmhaas@gmail.com
In reply to: Melanie Plageman (#19)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Fri, Jan 5, 2024 at 3:34 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

Yes, attached is a patch set which does this. My previous patchset
already reduced the number of places we unlock the buffer and update
the freespace map in lazy_scan_heap(). This patchset combines the
lazy_scan_prune() and lazy_scan_noprune() FSM update cases. I also
have a version which moves the freespace map updates into
lazy_scan_prune() and lazy_scan_noprune() -- eliminating all of these
from lazy_scan_heap(). This is arguably more clear. But Andres
mentioned he wanted fewer places unlocking the buffer and updating the
FSM.

Hmm, interesting. I haven't had time to study this fully today, but I
think 0001 looks fine and could just be committed. Hooray for killing
useless variables with dumb names.

This part of 0002 makes me very, very uncomfortable:

+               /*
+                * Update all line pointers per the record, and repair fragmentation.
+                * We always pass no_indexes as true, because we don't know whether or
+                * not this option was used when pruning. This reduces the validation
+                * done on replay in an assert build.
+                */
+               heap_page_prune_execute(buffer, true,
                                        redirected, nredirected,
                                        nowdead, ndead,
                                        nowunused, nunused);

The problem that I have with this is that we're essentially saying
that it's ok to lie to heap_page_prune_execute because we know how
it's going to use the information, and therefore we know that the lie
is harmless. But that's not how things are supposed to work. We should
either find a way to tell the truth, or change the name of the
parameter so that it's not a lie, or change the function so that it
doesn't need this parameter in the first place, or something. I can
occasionally stomach this sort of lying as a last resort when there's
no realistic way of adapting the code being called, but that's surely
not the case here -- this is a newborn parameter, and it shouldn't be
a lie on day 1. Just imagine if some future developer thought that the
no_indexes parameter meant that the relation actually didn't have
indexes (the nerve of them!).

I took a VERY fast look through the rest of the patch set and I think
that the overall direction looks like it's probably reasonable, but
that's a very preliminary conclusion which I reserve the right to
revise after studying further. @Andres: Are you planning to
review/commit any of this? Are you hoping that I'm going to do that?
Somebody else want to jump in here?

--
Robert Haas
EDB: http://www.enterprisedb.com

#28Michael Paquier
michael@paquier.xyz
In reply to: Robert Haas (#27)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Mon, Jan 08, 2024 at 03:50:47PM -0500, Robert Haas wrote:

Hmm, interesting. I haven't had time to study this fully today, but I
think 0001 looks fine and could just be committed. Hooray for killing
useless variables with dumb names.

I looked at 0001 a couple of weeks ago and thought that it
was fine because there's only one caller of lazy_scan_prune() and one
caller of lazy_scan_noprune() so all the code paths were covered.

+   /* rel truncation is unsafe */
+   if (hastup)
+       vacrel->nonempty_pages = blkno + 1;

Except for this comment that I found misleading because this is not
about the fact that truncation is unsafe, it's about correctly
tracking the last block where we have tuples to ensure a correct
truncation. Perhaps this could just reuse "Remember the location of
the last page with nonremovable tuples"? If people object to that,
feel free.
--
Michael

#29Melanie Plageman
melanieplageman@gmail.com
In reply to: Michael Paquier (#28)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Tue, Jan 9, 2024 at 12:56 AM Michael Paquier <michael@paquier.xyz> wrote:

On Mon, Jan 08, 2024 at 03:50:47PM -0500, Robert Haas wrote:

Hmm, interesting. I haven't had time to study this fully today, but I
think 0001 looks fine and could just be committed. Hooray for killing
useless variables with dumb names.

I looked at 0001 a couple of weeks ago and thought that it
was fine because there's only one caller of lazy_scan_prune() and one
caller of lazy_scan_noprune() so all the code paths were covered.

+   /* rel truncation is unsafe */
+   if (hastup)
+       vacrel->nonempty_pages = blkno + 1;

Andres had actually said that he didn't like pushing the update of
nonempty_pages into lazy_scan_[no]prune(). So, my v4 patch set
eliminates this.

I can see an argument for doing both the update of
vacrel->nonempty_pages and the FSM updates in lazy_scan_[no]prune()
because it eliminates some of the back-and-forth between the
block-specific functions and lazy_scan_heap().
lazy_scan_new_or_empty() has special logic for deciding how to update
the FSM -- so that remains in lazy_scan_new_or_empty() either way.

On the other hand, the comment above lazy_scan_new_or_empty() says we
can get rid of this special handling if we make relation extension
crash safe. Then it would make more sense to have a consolidated FSM
update in lazy_scan_heap(). However it does still mean that we repeat
the "UnlockReleaseBuffer()" and FSM update code in even more places.

Ultimately I can see arguments for and against. Is it better to avoid
having the same few lines of code in two places or avoid unneeded
communication between page-level functions and lazy_scan_heap()?

Except for this comment that I found misleading because this is not
about the fact that truncation is unsafe, it's about correctly
tracking the last block where we have tuples to ensure a correct
truncation. Perhaps this could just reuse "Remember the location of
the last page with nonremovable tuples"? If people object to that,
feel free.

I agree the comment could be better. But, simply saying that it tracks
the last page with non-removable tuples makes it less clear how
important this is. It makes it sound like it could be simply for stats
purposes. I'll update the comment to something that includes that
sentiment but is more exact than "rel truncation is unsafe".

- Melanie

#30Melanie Plageman
melanieplageman@gmail.com
In reply to: Robert Haas (#27)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Mon, Jan 8, 2024 at 3:51 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Jan 5, 2024 at 3:34 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

This part of 0002 makes me very, very uncomfortable:

+               /*
+                * Update all line pointers per the record, and repair
fragmentation.
+                * We always pass no_indexes as true, because we don't
know whether or
+                * not this option was used when pruning. This reduces
the validation
+                * done on replay in an assert build.
+                */
+               heap_page_prune_execute(buffer, true,

redirected, nredirected,
nowdead, ndead,

nowunused, nunused);

The problem that I have with this is that we're essentially saying
that it's ok to lie to heap_page_prune_execute because we know how
it's going to use the information, and therefore we know that the lie
is harmless. But that's not how things are supposed to work. We should
either find a way to tell the truth, or change the name of the
parameter so that it's not a lie, or change the function so that it
doesn't need this parameter in the first place, or something. I can
occasionally stomach this sort of lying as a last resort when there's
no realistic way of adapting the code being called, but that's surely
not the case here -- this is a newborn parameter, and it shouldn't be
a lie on day 1. Just imagine if some future developer thought that the
no_indexes parameter meant that the relation actually didn't have
indexes (the nerve of them!).

I agree that this is an issue.

The easiest solution would be to change the name of the parameter to
heap_page_prune_execute()'s from "no_indexes" to something like
"validate_unused", since it is only used in assert builds for
validation.

However, though I wish a name change was the right way to solve this
problem, my gut feeling is that it is not. It seems like we should
rely only on the WAL record itself in recovery. Right now the
parameter is used exclusively for validation, so it isn't so bad. But
what if someone uses this parameter in the future in heap_xlog_prune()
to decide how to modify the page?

It seems like the right solution would be to add a flag into the prune
record indicating what to pass to heap_page_prune_execute(). In the
future, I'd like to add flags for updating the VM to each of the prune
and vacuum records (eliminating the separate VM update record). Thus,
a new flags member of the prune record could have future use. However,
this would add a uint8 to the record. I can go and look for some
padding if you think this is the right direction?
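
(Purely as an illustration, and not from any posted patch: one hypothetical
shape for such a record. The flags member and the XLHP_NO_INDEXES bit are
names invented for this sketch; the existing members are roughly what
xl_heap_prune has on master today.)

#define XLHP_NO_INDEXES		0x01	/* dead items were set LP_UNUSED */

typedef struct xl_heap_prune
{
	TransactionId snapshotConflictHorizon;
	uint16		nredirected;
	uint16		ndead;
	bool		isCatalogRel;	/* existing member */
	uint8		flags;			/* hypothetical new member */
	/* OFFSET NUMBERS are in the block reference 0 */
} xl_heap_prune;

heap_xlog_prune() could then pass (xlrec->flags & XLHP_NO_INDEXES) down to
heap_page_prune_execute() instead of a hard-coded value.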

- Melanie

#31Robert Haas
robertmhaas@gmail.com
In reply to: Melanie Plageman (#29)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Tue, Jan 9, 2024 at 10:56 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:

Andres had actually said that he didn't like pushing the update of
nonempty_pages into lazy_scan_[no]prune(). So, my v4 patch set
eliminates this.

Mmph - but I was so looking forward to killing hastup!

On the other hand, the comment above lazy_scan_new_or_empty() says we
can get rid of this special handling if we make relation extension
crash safe. Then it would make more sense to have a consolidated FSM
update in lazy_scan_heap(). However it does still mean that we repeat
the "UnlockReleaseBuffer()" and FSM update code in even more places.

I wouldn't hold my breath waiting for relation extension to become
crash-safe. Even if you were a whale, you'd be about four orders of
magnitude short of holding it long enough.

Ultimately I can see arguments for and against. Is it better to avoid
having the same few lines of code in two places or avoid unneeded
communication between page-level functions and lazy_scan_heap()?

To me, the conceptual complexity of an extra structure member is a
bigger cost than duplicating TWO lines of code. If we were talking
about 20 lines of code, I'd say rename it to something less dumb.

Except for this comment that I found misleading because this is not
about the fact that truncation is unsafe, it's about correctly
tracking the last block where we have tuples to ensure a correct
truncation. Perhaps this could just reuse "Remember the location of
the last page with nonremovable tuples"? If people object to that,
feel free.

I agree the comment could be better. But, simply saying that it tracks
the last page with non-removable tuples makes it less clear how
important this is. It makes it sound like it could be simply for stats
purposes. I'll update the comment to something that includes that
sentiment but is more exact than "rel truncation is unsafe".

I agree that "rel truncation is unsafe" is less clear than desirable,
but I'm not sure that I agree that tracking the last page with
non-removable tuples makes it sound unimportant. However, the comments
also need to make everybody happy, not just me. Maybe something like
"can't truncate away this page" or similar. A long-form comment that
really spells it out is fine too, but I don't know if we really need
it.

--
Robert Haas
EDB: http://www.enterprisedb.com

#32Robert Haas
robertmhaas@gmail.com
In reply to: Melanie Plageman (#30)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Tue, Jan 9, 2024 at 11:35 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:

The easiest solution would be to change the name of the parameter to
heap_page_prune_execute()'s from "no_indexes" to something like
"validate_unused", since it is only used in assert builds for
validation.

Right.

However, though I wish a name change was the right way to solve this
problem, my gut feeling is that it is not. It seems like we should
rely only on the WAL record itself in recovery. Right now the
parameter is used exclusively for validation, so it isn't so bad. But
what if someone uses this parameter in the future in heap_xlog_prune()
to decide how to modify the page?

Exactly.

It seems like the right solution would be to add a flag into the prune
record indicating what to pass to heap_page_prune_execute(). In the
future, I'd like to add flags for updating the VM to each of the prune
and vacuum records (eliminating the separate VM update record). Thus,
a new flags member of the prune record could have future use. However,
this would add a uint8 to the record. I can go and look for some
padding if you think this is the right direction?

I thought about this approach and it might be OK but I don't love it,
because it means making the WAL record bigger on production systems
for the sake of an assertion that only fires for developers. Sure, it's
possible that there might be another use in the future, but there
might also not be another use in the future.

How about changing if (no_indexes) to if (ndead == 0) and adding a
comment like this: /* If there are any tuples being marked LP_DEAD,
then the relation must have indexes, so every item being marked unused
must be a heap-only tuple. But if there are no tuples being marked
LP_DEAD, then it's possible that the relation has no indexes, in which
case all we know is that the line pointer shouldn't already be
LP_UNUSED. */
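
As a rough sketch (reusing the lp/page/htup variables and the ndead parameter
that heap_page_prune_execute() already has, and not meant as final code), the
unused-items loop would then assert something like:

#ifdef USE_ASSERT_CHECKING
		if (ndead > 0)
		{
			/* relation must have indexes, so only heap-only tuples here */
			Assert(ItemIdHasStorage(lp) && ItemIdIsNormal(lp));
			htup = (HeapTupleHeader) PageGetItem(page, lp);
			Assert(HeapTupleHeaderIsHeapOnly(htup));
		}
		else
		{
			/* relation may have no indexes; item just shouldn't already be unused */
			Assert(ItemIdIsUsed(lp));
		}
#endif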

BTW:

+                        * LP_REDIRECT, or LP_DEAD items to LP_UNUSED
during pruning. We
+                        * can't check much here except that, if the
item is LP_NORMAL, it
+                        * should have storage before it is set LP_UNUSED.

Is it really helpful to check this here, or just confusing/grasping at
straws? I mean, the requirement that LP_NORMAL items have storage is a
general one, IIUC, not something that's specific to this situation. It
feels like the equivalent of checking that your children don't set
fire to the couch on Tuesdays.

--
Robert Haas
EDB: http://www.enterprisedb.com

#33Melanie Plageman
melanieplageman@gmail.com
In reply to: Robert Haas (#31)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Tue, Jan 9, 2024 at 1:31 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Jan 9, 2024 at 10:56 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:

Andres had actually said that he didn't like pushing the update of
nonempty_pages into lazy_scan_[no]prune(). So, my v4 patch set
eliminates this.

Mmph - but I was so looking forward to killing hastup!

Ultimately I can see arguments for and against. Is it better to avoid
having the same few lines of code in two places or avoid unneeded
communication between page-level functions and lazy_scan_heap()?

To me, the conceptual complexity of an extra structure member is a
bigger cost than duplicating TWO lines of code. If we were talking
about 20 lines of code, I'd say rename it to something less dumb.

Yes, I agree. I thought about it more, and I prefer updating the FSM
and setting nonempty_pages into lazy_scan_[no]prune(). Originally, I
had ordered the patch set with that first (before the patch to do
immediate reaping), but there is no reason for it to be so. Using
hastup can be done in a subsequent commit on top of the immediate
reaping patch. I will post a new version of the immediate reaping
patch which addresses your feedback. Then, separately, I will post a
revised version of the lazy_scan_heap() refactoring patches.

- Melanie

#34Robert Haas
robertmhaas@gmail.com
In reply to: Melanie Plageman (#33)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Tue, Jan 9, 2024 at 2:23 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

Yes, I agree. I thought about it more, and I prefer updating the FSM
and setting nonempty_pages into lazy_scan_[no]prune(). Originally, I
had ordered the patch set with that first (before the patch to do
immediate reaping), but there is no reason for it to be so. Using
hastup can be done in a subsequent commit on top of the immediate
reaping patch. I will post a new version of the immediate reaping
patch which addresses your feedback. Then, separately, I will post a
revised version of the lazy_scan_heap() refactoring patches.

I kind of liked it first, because I thought we could just do it and
get it out of the way, but if Andres doesn't agree with the idea, it
probably does make sense to push it later, as you say here.

--
Robert Haas
EDB: http://www.enterprisedb.com

#35Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#34)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

Hi,

On January 9, 2024 11:33:29 AM PST, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Jan 9, 2024 at 2:23 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

Yes, I agree. I thought about it more, and I prefer updating the FSM
and setting nonempty_pages into lazy_scan_[no]prune(). Originally, I
had ordered the patch set with that first (before the patch to do
immediate reaping), but there is no reason for it to be so. Using
hastup can be done in a subsequent commit on top of the immediate
reaping patch. I will post a new version of the immediate reaping
patch which addresses your feedback. Then, separately, I will post a
revised version of the lazy_scan_heap() refactoring patches.

I kind of liked it first, because I thought we could just do it and
get it out of the way, but if Andres doesn't agree with the idea, it
probably does make sense to push it later, as you say here.

I don't have that strong feelings about it. If both of you think it looks good, go ahead...

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

#36Melanie Plageman
melanieplageman@gmail.com
In reply to: Robert Haas (#32)
1 attachment(s)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

I had already written the patch for immediate reaping addressing the
below feedback before I saw the emails that said everyone is happy
with using hastup in lazy_scan_[no]prune() in a preliminary patch. Let
me know if you have a strong preference for reordering. Otherwise, I
will write the three subsequent patches on top of this one.

On Tue, Jan 9, 2024 at 2:00 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Jan 9, 2024 at 11:35 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:

The easiest solution would be to change the name of the parameter to
heap_page_prune_execute()'s from "no_indexes" to something like
"validate_unused", since it is only used in assert builds for
validation.

Right.

However, though I wish a name change was the right way to solve this
problem, my gut feeling is that it is not. It seems like we should
rely only on the WAL record itself in recovery. Right now the
parameter is used exclusively for validation, so it isn't so bad. But
what if someone uses this parameter in the future in heap_xlog_prune()
to decide how to modify the page?

Exactly.

It seems like the right solution would be to add a flag into the prune
record indicating what to pass to heap_page_prune_execute(). In the
future, I'd like to add flags for updating the VM to each of the prune
and vacuum records (eliminating the separate VM update record). Thus,
a new flags member of the prune record could have future use. However,
this would add a uint8 to the record. I can go and look for some
padding if you think this is the right direction?

I thought about this approach and it might be OK but I don't love it,
because it means making the WAL record bigger on production systems
for the sake of an assertion that only fires for developers. Sure, it's
possible that there might be another use in the future, but there
might also not be another use in the future.

How about changing if (no_indexes) to if (ndead == 0) and adding a
comment like this: /* If there are any tuples being marked LP_DEAD,
then the relation must have indexes, so every item being marked unused
must be a heap-only tuple. But if there are no tuples being marked
LP_DEAD, then it's possible that the relation has no indexes, in which
case all we know is that the line pointer shouldn't already be
LP_UNUSED. */

Ah, I like this a lot. Attached patch does this. I've added a modified
version of the comment you suggested. My only question is if we are
losing something without this sentence (from the old comment):

- * ... They don't need to be left in place as LP_DEAD items
until VACUUM gets
- * around to doing index vacuuming.

I don't feel like it adds a lot, but it is absent from the new
comment, so thought I would check.

BTW:

+                        * LP_REDIRECT, or LP_DEAD items to LP_UNUSED
during pruning. We
+                        * can't check much here except that, if the
item is LP_NORMAL, it
+                        * should have storage before it is set LP_UNUSED.

Is it really helpful to check this here, or just confusing/grasping at
straws? I mean, the requirement that LP_NORMAL items have storage is a
general one, IIUC, not something that's specific to this situation. It
feels like the equivalent of checking that your children don't set
fire to the couch on Tuesdays.

Hmm. Yes. I suppose I was trying to find something to validate. Is it
worth checking that the line pointer is not already LP_UNUSED? Or is
that a bit ridiculous?

- Melanie

Attachments:

v5-0001-Set-would-be-dead-items-LP_UNUSED-while-pruning.patch (text/x-patch)
From 2d018672abe9107196a4ff30254bd63a71e5ddb5 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 9 Jan 2024 15:00:24 -0500
Subject: [PATCH v5 1/4] Set would-be dead items LP_UNUSED while pruning

If there are no indexes on a relation, items can be marked LP_UNUSED
instead of LP_DEAD while pruning. This avoids a separate
invocation of lazy_vacuum_heap_page() and saves a vacuum WAL record.

To accomplish this, pass heap_page_prune() a new parameter, no_indexes,
which indicates that dead line pointers should be set to LP_UNUSED
during pruning, allowing earlier reaping of tuples.

Because we don't update the freespace map until after dropping the lock
on the buffer and we need the lock while we update the visibility map,
save our intent to update the freespace map in output parameter
recordfreespace. This is not added to the LVPagePruneState because
future commits will eliminate the LVPagePruneState.

Discussion: https://postgr.es/m/CAAKRu_bgvb_k0gKOXWzNKWHt560R0smrGe3E8zewKPs8fiMKkw%40mail.gmail.com
---
 src/backend/access/heap/pruneheap.c  |  61 +++++++++--
 src/backend/access/heap/vacuumlazy.c | 150 ++++++++++-----------------
 src/include/access/heapam.h          |   1 +
 3 files changed, 107 insertions(+), 105 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 3e0a1a260e6..9e5ddc80f4a 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -35,6 +35,8 @@ typedef struct
 
 	/* tuple visibility test, initialized for the relation */
 	GlobalVisState *vistest;
+	/* whether or not dead items can be set LP_UNUSED during pruning */
+	bool		no_indexes;
 
 	TransactionId new_prune_xid;	/* new prune hint value for page */
 	TransactionId snapshotConflictHorizon;	/* latest xid removed */
@@ -148,7 +150,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 		{
 			PruneResult presult;
 
-			heap_page_prune(relation, buffer, vistest, &presult, NULL);
+			/*
+			 * For now, pass no_indexes == false regardless of whether or not
+			 * the relation has indexes. In the future we may enable immediate
+			 * reaping for on access pruning.
+			 */
+			heap_page_prune(relation, buffer, vistest, false,
+							&presult, NULL);
 
 			/*
 			 * Report the number of tuples reclaimed to pgstats.  This is
@@ -193,6 +201,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
  * (see heap_prune_satisfies_vacuum and
  * HeapTupleSatisfiesVacuum).
  *
+ * no_indexes indicates whether or not dead items can be set LP_UNUSED during
+ * pruning.
+ *
  * off_loc is the offset location required by the caller to use in error
  * callback.
  *
@@ -203,6 +214,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 void
 heap_page_prune(Relation relation, Buffer buffer,
 				GlobalVisState *vistest,
+				bool no_indexes,
 				PruneResult *presult,
 				OffsetNumber *off_loc)
 {
@@ -227,6 +239,7 @@ heap_page_prune(Relation relation, Buffer buffer,
 	prstate.new_prune_xid = InvalidTransactionId;
 	prstate.rel = relation;
 	prstate.vistest = vistest;
+	prstate.no_indexes = no_indexes;
 	prstate.snapshotConflictHorizon = InvalidTransactionId;
 	prstate.nredirected = prstate.ndead = prstate.nunused = 0;
 	memset(prstate.marked, 0, sizeof(prstate.marked));
@@ -306,9 +319,9 @@ heap_page_prune(Relation relation, Buffer buffer,
 		if (off_loc)
 			*off_loc = offnum;
 
-		/* Nothing to do if slot is empty or already dead */
+		/* Nothing to do if slot is empty */
 		itemid = PageGetItemId(page, offnum);
-		if (!ItemIdIsUsed(itemid) || ItemIdIsDead(itemid))
+		if (!ItemIdIsUsed(itemid))
 			continue;
 
 		/* Process this item or chain of items */
@@ -581,7 +594,17 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
 		 * function.)
 		 */
 		if (ItemIdIsDead(lp))
+		{
+			/*
+			 * If the relation has no indexes, we can set dead line pointers
+			 * LP_UNUSED now. We don't increment ndeleted here since the LP
+			 * was already marked dead.
+			 */
+			if (unlikely(prstate->no_indexes))
+				heap_prune_record_unused(prstate, offnum);
+
 			break;
+		}
 
 		Assert(ItemIdIsNormal(lp));
 		htup = (HeapTupleHeader) PageGetItem(dp, lp);
@@ -726,7 +749,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
 		 * item.  This can happen if the loop in heap_page_prune caused us to
 		 * visit the dead successor of a redirect item before visiting the
 		 * redirect item.  We can clean up by setting the redirect item to
-		 * DEAD state.
+		 * DEAD state or LP_UNUSED if the table has no indexes.
 		 */
 		heap_prune_record_dead(prstate, rootoffnum);
 	}
@@ -767,6 +790,17 @@ heap_prune_record_redirect(PruneState *prstate,
 static void
 heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum)
 {
+	/*
+	 * If the relation has no indexes, we can remove dead tuples during
+	 * pruning instead of marking their line pointers dead. Set this tuple's
+	 * line pointer LP_UNUSED. We hint that tables with indexes are more
+	 * likely.
+	 */
+	if (unlikely(prstate->no_indexes))
+	{
+		heap_prune_record_unused(prstate, offnum);
+		return;
+	}
 	Assert(prstate->ndead < MaxHeapTuplesPerPage);
 	prstate->nowdead[prstate->ndead] = offnum;
 	prstate->ndead++;
@@ -903,13 +937,20 @@ heap_page_prune_execute(Buffer buffer,
 #ifdef USE_ASSERT_CHECKING
 
 		/*
-		 * Only heap-only tuples can become LP_UNUSED during pruning.  They
-		 * don't need to be left in place as LP_DEAD items until VACUUM gets
-		 * around to doing index vacuuming.
+		 * If there are any items being marked LP_DEAD, then the relation must
+		 * have indexes, so every item being marked LP_UNUSED must refer to a
+		 * heap-only tuple. But if there are no items being marked LP_DEAD,
+		 * then it's possible that the relation has no indexes, in which case
+		 * all we know is that the line pointer shouldn't already be
+		 * LP_UNUSED.
 		 */
-		Assert(ItemIdHasStorage(lp) && ItemIdIsNormal(lp));
-		htup = (HeapTupleHeader) PageGetItem(page, lp);
-		Assert(HeapTupleHeaderIsHeapOnly(htup));
+		if (ndead > 0)
+		{
+			Assert(ItemIdHasStorage(lp) && ItemIdIsNormal(lp));
+			htup = (HeapTupleHeader) PageGetItem(page, lp);
+			Assert(HeapTupleHeaderIsHeapOnly(htup));
+		}
+
 #endif
 
 		ItemIdSetUnused(lp);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index abbba8947fa..e9636b2c47e 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -250,7 +250,8 @@ static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   bool sharelock, Buffer vmbuffer);
 static void lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
-							LVPagePruneState *prunestate);
+							LVPagePruneState *prunestate,
+							bool *recordfreespace);
 static bool lazy_scan_noprune(LVRelState *vacrel, Buffer buf,
 							  BlockNumber blkno, Page page,
 							  bool *hastup, bool *recordfreespace);
@@ -830,8 +831,10 @@ lazy_scan_heap(LVRelState *vacrel)
 				next_fsm_block_to_vacuum = 0;
 	VacDeadItems *dead_items = vacrel->dead_items;
 	Buffer		vmbuffer = InvalidBuffer;
+	int			tuples_already_deleted;
 	bool		next_unskippable_allvis,
 				skipping_current_range;
+	bool		recordfreespace;
 	const int	initprog_index[] = {
 		PROGRESS_VACUUM_PHASE,
 		PROGRESS_VACUUM_TOTAL_HEAP_BLKS,
@@ -959,8 +962,7 @@ lazy_scan_heap(LVRelState *vacrel)
 		page = BufferGetPage(buf);
 		if (!ConditionalLockBufferForCleanup(buf))
 		{
-			bool		hastup,
-						recordfreespace;
+			bool		hastup;
 
 			LockBuffer(buf, BUFFER_LOCK_SHARE);
 
@@ -1010,6 +1012,8 @@ lazy_scan_heap(LVRelState *vacrel)
 			continue;
 		}
 
+		tuples_already_deleted = vacrel->tuples_deleted;
+
 		/*
 		 * Prune, freeze, and count tuples.
 		 *
@@ -1019,7 +1023,7 @@ lazy_scan_heap(LVRelState *vacrel)
 		 * were pruned some time earlier.  Also considers freezing XIDs in the
 		 * tuple headers of remaining items with storage.
 		 */
-		lazy_scan_prune(vacrel, buf, blkno, page, &prunestate);
+		lazy_scan_prune(vacrel, buf, blkno, page, &prunestate, &recordfreespace);
 
 		Assert(!prunestate.all_visible || !prunestate.has_lpdead_items);
 
@@ -1027,69 +1031,6 @@ lazy_scan_heap(LVRelState *vacrel)
 		if (prunestate.hastup)
 			vacrel->nonempty_pages = blkno + 1;
 
-		if (vacrel->nindexes == 0)
-		{
-			/*
-			 * Consider the need to do page-at-a-time heap vacuuming when
-			 * using the one-pass strategy now.
-			 *
-			 * The one-pass strategy will never call lazy_vacuum().  The steps
-			 * performed here can be thought of as the one-pass equivalent of
-			 * a call to lazy_vacuum().
-			 */
-			if (prunestate.has_lpdead_items)
-			{
-				Size		freespace;
-
-				lazy_vacuum_heap_page(vacrel, blkno, buf, 0, vmbuffer);
-
-				/* Forget the LP_DEAD items that we just vacuumed */
-				dead_items->num_items = 0;
-
-				/*
-				 * Now perform FSM processing for blkno, and move on to next
-				 * page.
-				 *
-				 * Our call to lazy_vacuum_heap_page() will have considered if
-				 * it's possible to set all_visible/all_frozen independently
-				 * of lazy_scan_prune().  Note that prunestate was invalidated
-				 * by lazy_vacuum_heap_page() call.
-				 */
-				freespace = PageGetHeapFreeSpace(page);
-
-				UnlockReleaseBuffer(buf);
-				RecordPageWithFreeSpace(vacrel->rel, blkno, freespace);
-
-				/*
-				 * Periodically perform FSM vacuuming to make newly-freed
-				 * space visible on upper FSM pages. FreeSpaceMapVacuumRange()
-				 * vacuums the portion of the freespace map covering heap
-				 * pages from start to end - 1. Include the block we just
-				 * vacuumed by passing it blkno + 1. Overflow isn't an issue
-				 * because MaxBlockNumber + 1 is InvalidBlockNumber which
-				 * causes FreeSpaceMapVacuumRange() to vacuum freespace map
-				 * pages covering the remainder of the relation.
-				 */
-				if (blkno - next_fsm_block_to_vacuum >= VACUUM_FSM_EVERY_PAGES)
-				{
-					FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum,
-											blkno + 1);
-					next_fsm_block_to_vacuum = blkno + 1;
-				}
-
-				continue;
-			}
-
-			/*
-			 * There was no call to lazy_vacuum_heap_page() because pruning
-			 * didn't encounter/create any LP_DEAD items that needed to be
-			 * vacuumed.  Prune state has not been invalidated, so proceed
-			 * with prunestate-driven visibility map and FSM steps (just like
-			 * the two-pass strategy).
-			 */
-			Assert(dead_items->num_items == 0);
-		}
-
 		/*
 		 * Handle setting visibility map bit based on information from the VM
 		 * (as of last lazy_scan_skip() call), and from prunestate
@@ -1200,38 +1141,45 @@ lazy_scan_heap(LVRelState *vacrel)
 
 		/*
 		 * Final steps for block: drop cleanup lock, record free space in the
-		 * FSM
+		 * FSM.
+		 *
+		 * If we will likely do index vacuuming, wait until
+		 * lazy_vacuum_heap_rel() to save free space. This doesn't just save
+		 * us some cycles; it also allows us to record any additional free
+		 * space that lazy_vacuum_heap_page() will make available in cases
+		 * where it's possible to truncate the page's line pointer array.
+		 *
+		 * Note: It's not in fact 100% certain that we really will call
+		 * lazy_vacuum_heap_rel() -- lazy_vacuum() might yet opt to skip index
+		 * vacuuming (and so must skip heap vacuuming).  This is deemed okay
+		 * because it only happens in emergencies, or when there is very
+		 * little free space anyway. (Besides, we start recording free space
+		 * in the FSM once index vacuuming has been abandoned.)
 		 */
-		if (prunestate.has_lpdead_items && vacrel->do_index_vacuuming)
-		{
-			/*
-			 * Wait until lazy_vacuum_heap_rel() to save free space.  This
-			 * doesn't just save us some cycles; it also allows us to record
-			 * any additional free space that lazy_vacuum_heap_page() will
-			 * make available in cases where it's possible to truncate the
-			 * page's line pointer array.
-			 *
-			 * Note: It's not in fact 100% certain that we really will call
-			 * lazy_vacuum_heap_rel() -- lazy_vacuum() might yet opt to skip
-			 * index vacuuming (and so must skip heap vacuuming).  This is
-			 * deemed okay because it only happens in emergencies, or when
-			 * there is very little free space anyway. (Besides, we start
-			 * recording free space in the FSM once index vacuuming has been
-			 * abandoned.)
-			 *
-			 * Note: The one-pass (no indexes) case is only supposed to make
-			 * it this far when there were no LP_DEAD items during pruning.
-			 */
-			Assert(vacrel->nindexes > 0);
-			UnlockReleaseBuffer(buf);
-		}
-		else
+		if (recordfreespace)
 		{
 			Size		freespace = PageGetHeapFreeSpace(page);
 
 			UnlockReleaseBuffer(buf);
 			RecordPageWithFreeSpace(vacrel->rel, blkno, freespace);
 		}
+		else
+			UnlockReleaseBuffer(buf);
+
+		/*
+		 * Periodically perform FSM vacuuming to make newly-freed space
+		 * visible on upper FSM pages. This is done after vacuuming if the
+		 * table has indexes.
+		 */
+		if (vacrel->nindexes == 0 &&
+			vacrel->tuples_deleted > tuples_already_deleted &&
+			(blkno - next_fsm_block_to_vacuum >= VACUUM_FSM_EVERY_PAGES))
+		{
+			FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum,
+									blkno);
+			next_fsm_block_to_vacuum = blkno;
+		}
+
 	}
 
 	vacrel->blkno = InvalidBlockNumber;
@@ -1543,7 +1491,8 @@ lazy_scan_prune(LVRelState *vacrel,
 				Buffer buf,
 				BlockNumber blkno,
 				Page page,
-				LVPagePruneState *prunestate)
+				LVPagePruneState *prunestate,
+				bool *recordfreespace)
 {
 	Relation	rel = vacrel->rel;
 	OffsetNumber offnum,
@@ -1587,7 +1536,8 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * lpdead_items's final value can be thought of as the number of tuples
 	 * that were deleted from indexes.
 	 */
-	heap_page_prune(rel, buf, vacrel->vistest, &presult, &vacrel->offnum);
+	heap_page_prune(rel, buf, vacrel->vistest, vacrel->nindexes == 0,
+					&presult, &vacrel->offnum);
 
 	/*
 	 * Now scan the page to collect LP_DEAD items and check for tuples
@@ -1598,6 +1548,7 @@ lazy_scan_prune(LVRelState *vacrel,
 	prunestate->all_visible = true;
 	prunestate->all_frozen = true;
 	prunestate->visibility_cutoff_xid = InvalidTransactionId;
+	*recordfreespace = false;
 
 	for (offnum = FirstOffsetNumber;
 		 offnum <= maxoff;
@@ -1918,6 +1869,15 @@ lazy_scan_prune(LVRelState *vacrel,
 	vacrel->lpdead_items += lpdead_items;
 	vacrel->live_tuples += live_tuples;
 	vacrel->recently_dead_tuples += recently_dead_tuples;
+
+	/*
+	 * If we will not do index vacuuming, either because we have no indexes,
+	 * because there is nothing to vacuum, or because do_index_vacuuming is
+	 * false, make sure we update the freespace map.
+	 */
+	if (vacrel->nindexes == 0 ||
+		!vacrel->do_index_vacuuming || lpdead_items == 0)
+		*recordfreespace = true;
 }
 
 /*
@@ -2516,7 +2476,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	bool		all_frozen;
 	LVSavedErrInfo saved_err_info;
 
-	Assert(vacrel->nindexes == 0 || vacrel->do_index_vacuuming);
+	Assert(vacrel->do_index_vacuuming);
 
 	pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_VACUUMED, blkno);
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 932ec0d6f2b..4a2aebedb57 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -320,6 +320,7 @@ struct GlobalVisState;
 extern void heap_page_prune_opt(Relation relation, Buffer buffer);
 extern void heap_page_prune(Relation relation, Buffer buffer,
 							struct GlobalVisState *vistest,
+							bool no_indexes,
 							PruneResult *presult,
 							OffsetNumber *off_loc);
 extern void heap_page_prune_execute(Buffer buffer,
-- 
2.37.2

#37Robert Haas
robertmhaas@gmail.com
In reply to: Melanie Plageman (#36)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Tue, Jan 9, 2024 at 3:13 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

I had already written the patch for immediate reaping addressing the
below feedback before I saw the emails that said everyone is happy
with using hastup in lazy_scan_[no]prune() in a preliminary patch. Let
me know if you have a strong preference for reordering. Otherwise, I
will write the three subsequent patches on top of this one.

I don't know if it rises to the level of a strong preference. It's
just a preference.

Ah, I like this a lot. Attached patch does this. I've added a modified
version of the comment you suggested. My only question is if we are
losing something without this sentence (from the old comment):

- * ... They don't need to be left in place as LP_DEAD items
until VACUUM gets
- * around to doing index vacuuming.

I don't feel like it adds a lot, but it is absent from the new
comment, so thought I would check.

I agree that we can leave that out. It wouldn't be bad to include it
if someone had a nice way of doing that, but it doesn't seem critical,
and if forcing it in there makes the comment less clear overall, it's
a net loss IMHO.

Hmm. Yes. I suppose I was trying to find something to validate. Is it
worth checking that the line pointer is not already LP_UNUSED? Or is
that a bit ridiculous?

I think that's worthwhile (hence my proposed wording).

--
Robert Haas
EDB: http://www.enterprisedb.com

#38Jim Nasby
jim.nasby@gmail.com
In reply to: Robert Haas (#26)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On 1/8/24 2:10 PM, Robert Haas wrote:

On Fri, Jan 5, 2024 at 3:57 PM Andres Freund <andres@anarazel.de> wrote:

I will be astonished if you can make this work well enough to avoid
huge regressions in plausible cases. There are plenty of cases where
we do a very thorough job opportunistically removing index tuples.

These days the AM is often involved with that, via
table_index_delete_tuples()/heap_index_delete_tuples(). That IIRC has to
happen before physically removing the already-marked-killed index entries. We
can't rely on being able to actually prune the heap page at that point, there
might be other backends pinning it, but often we will be able to. If we were
to prune below heap_index_delete_tuples(), we wouldn't need to recheck that
index again during "individual tuple pruning", if the to-be-marked-unused heap
tuple is one of the tuples passed to heap_index_delete_tuples(). Which
presumably will be very commonly the case.

At least for nbtree, we are much more aggressive about marking index entries
as killed, than about actually removing the index entries. "individual tuple
pruning" would have to look for killed-but-still-present index entries, not
just for "live" entries.

I don't want to derail this thread, but I don't really see what you
have in mind here. The first paragraph sounds like you're imagining
that while pruning the index entries we might jump over to the heap
and clean things up there, too, but that seems like it wouldn't work
if the table has more than one index. I thought you were talking about
starting with a heap tuple and bouncing around to every index to see
if we can find index pointers to kill in every one of them. That
*could* work out, but you only need one index to have been
opportunistically cleaned up in order for it to fail to work out.
There might well be some workloads where that's often the case, but
the regressions in the workloads where it isn't the case seem like
they would be rather substantial, because doing an extra lookup in
every index for each heap tuple visited sounds pricey.

The idea of probing indexes for tuples that are now dead has come up in
the past, and the concern has always been whether it's actually safe to
do so. An obvious example is an index on a function whose behavior has
since changed, so you can't reliably determine whether a particular tuple
is present in the index. That's bad enough during an index scan, but
potentially worse while doing heap cleanup. Given that operators are
functions, this risk exists to some degree in even simple indexes.

Depending on the gains this might still be worth doing, at least for
some cases. It's hard to conceive of this breaking for indexes on
integers, for example. But we'd still need to be cautious.
--
Jim Nasby, Data Architect, Austin TX

#39Melanie Plageman
melanieplageman@gmail.com
In reply to: Robert Haas (#37)
5 attachment(s)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Tue, Jan 9, 2024 at 3:40 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Jan 9, 2024 at 3:13 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

I had already written the patch for immediate reaping addressing the
below feedback before I saw the emails that said everyone is happy
with using hastup in lazy_scan_[no]prune() in a preliminary patch. Let
me know if you have a strong preference for reordering. Otherwise, I
will write the three subsequent patches on top of this one.

I don't know if it rises to the level of a strong preference. It's
just a preference.

Attached v6 has the immediate reaping patch first followed by the code
to use hastup in lazy_scan_[no]prune(). 0003 and 0004 move the VM
update code into lazy_scan_prune() and eliminate LVPagePruneState
entirely. 0005 moves the FSM update into lazy_scan_[no]prune(),
substantially simplifying lazy_scan_heap().

I agree that we can leave that out. It wouldn't be bad to include it
if someone had a nice way of doing that, but it doesn't seem critical,
and if forcing it in there makes the comment less clear overall, it's
a net loss IMHO.

Hmm. Yes. I suppose I was trying to find something to validate. Is it
worth checking that the line pointer is not already LP_UNUSED? Or is
that a bit ridiculous?

I think that's worthwhile (hence my proposed wording).

Done in attached v6.

- Melanie

Attachments:

v6-0003-Move-VM-update-code-to-lazy_scan_prune.patch (text/x-patch)
From 7c499e72cccd3ce9bf45d3824bc13894ea8d4d3a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Fri, 5 Jan 2024 11:12:58 -0500
Subject: [PATCH v6 3/5] Move VM update code to lazy_scan_prune()

The visibility map is updated after lazy_scan_prune() returns in
lazy_scan_heap(). Most of the output parameters of lazy_scan_prune() are
used to update the VM in lazy_scan_heap(). Moving that code into
lazy_scan_prune() simplifies lazy_scan_heap() and requires less
communication between the two. This is a pure lift and shift. No VM
update logic is changed. All of the members of the prunestate are now
obsolete. It will be removed in a future commit.
---
 src/backend/access/heap/vacuumlazy.c | 229 ++++++++++++++-------------
 1 file changed, 116 insertions(+), 113 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index a5b9319fbcc..bbfd3437fc7 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -249,8 +249,8 @@ static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   bool sharelock, Buffer vmbuffer);
 static void lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
-							LVPagePruneState *prunestate,
-							bool *recordfreespace);
+							Buffer vmbuffer, bool all_visible_according_to_vm,
+							LVPagePruneState *prunestate, bool *recordfreespace);
 static bool lazy_scan_noprune(LVRelState *vacrel, Buffer buf,
 							  BlockNumber blkno, Page page,
 							  bool *recordfreespace);
@@ -1022,117 +1022,9 @@ lazy_scan_heap(LVRelState *vacrel)
 		 * tuple headers of remaining items with storage. It also determines
 		 * if truncating this block is safe.
 		 */
-		lazy_scan_prune(vacrel, buf, blkno, page, &prunestate, &recordfreespace);
-
-		Assert(!prunestate.all_visible || !prunestate.has_lpdead_items);
-
-		/*
-		 * Handle setting visibility map bit based on information from the VM
-		 * (as of last lazy_scan_skip() call), and from prunestate
-		 */
-		if (!all_visible_according_to_vm && prunestate.all_visible)
-		{
-			uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
-
-			if (prunestate.all_frozen)
-			{
-				Assert(!TransactionIdIsValid(prunestate.visibility_cutoff_xid));
-				flags |= VISIBILITYMAP_ALL_FROZEN;
-			}
-
-			/*
-			 * It should never be the case that the visibility map page is set
-			 * while the page-level bit is clear, but the reverse is allowed
-			 * (if checksums are not enabled).  Regardless, set both bits so
-			 * that we get back in sync.
-			 *
-			 * NB: If the heap page is all-visible but the VM bit is not set,
-			 * we don't need to dirty the heap page.  However, if checksums
-			 * are enabled, we do need to make sure that the heap page is
-			 * dirtied before passing it to visibilitymap_set(), because it
-			 * may be logged.  Given that this situation should only happen in
-			 * rare cases after a crash, it is not worth optimizing.
-			 */
-			PageSetAllVisible(page);
-			MarkBufferDirty(buf);
-			visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
-							  vmbuffer, prunestate.visibility_cutoff_xid,
-							  flags);
-		}
-
-		/*
-		 * As of PostgreSQL 9.2, the visibility map bit should never be set if
-		 * the page-level bit is clear.  However, it's possible that the bit
-		 * got cleared after lazy_scan_skip() was called, so we must recheck
-		 * with buffer lock before concluding that the VM is corrupt.
-		 */
-		else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
-				 visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
-		{
-			elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-				 vacrel->relname, blkno);
-			visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-								VISIBILITYMAP_VALID_BITS);
-		}
-
-		/*
-		 * It's possible for the value returned by
-		 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-		 * wrong for us to see tuples that appear to not be visible to
-		 * everyone yet, while PD_ALL_VISIBLE is already set. The real safe
-		 * xmin value never moves backwards, but
-		 * GetOldestNonRemovableTransactionId() is conservative and sometimes
-		 * returns a value that's unnecessarily small, so if we see that
-		 * contradiction it just means that the tuples that we think are not
-		 * visible to everyone yet actually are, and the PD_ALL_VISIBLE flag
-		 * is correct.
-		 *
-		 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE
-		 * set, however.
-		 */
-		else if (prunestate.has_lpdead_items && PageIsAllVisible(page))
-		{
-			elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-				 vacrel->relname, blkno);
-			PageClearAllVisible(page);
-			MarkBufferDirty(buf);
-			visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-								VISIBILITYMAP_VALID_BITS);
-		}
-
-		/*
-		 * If the all-visible page is all-frozen but not marked as such yet,
-		 * mark it as all-frozen.  Note that all_frozen is only valid if
-		 * all_visible is true, so we must check both prunestate fields.
-		 */
-		else if (all_visible_according_to_vm && prunestate.all_visible &&
-				 prunestate.all_frozen &&
-				 !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
-		{
-			/*
-			 * Avoid relying on all_visible_according_to_vm as a proxy for the
-			 * page-level PD_ALL_VISIBLE bit being set, since it might have
-			 * become stale -- even when all_visible is set in prunestate
-			 */
-			if (!PageIsAllVisible(page))
-			{
-				PageSetAllVisible(page);
-				MarkBufferDirty(buf);
-			}
-
-			/*
-			 * Set the page all-frozen (and all-visible) in the VM.
-			 *
-			 * We can pass InvalidTransactionId as our visibility_cutoff_xid,
-			 * since a snapshotConflictHorizon sufficient to make everything
-			 * safe for REDO was logged when the page's tuples were frozen.
-			 */
-			Assert(!TransactionIdIsValid(prunestate.visibility_cutoff_xid));
-			visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
-							  vmbuffer, InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE |
-							  VISIBILITYMAP_ALL_FROZEN);
-		}
+		lazy_scan_prune(vacrel, buf, blkno, page,
+						vmbuffer, all_visible_according_to_vm,
+						&prunestate, &recordfreespace);
 
 		/*
 		 * Final steps for block: drop cleanup lock, record free space in the
@@ -1486,6 +1378,8 @@ lazy_scan_prune(LVRelState *vacrel,
 				Buffer buf,
 				BlockNumber blkno,
 				Page page,
+				Buffer vmbuffer,
+				bool all_visible_according_to_vm,
 				LVPagePruneState *prunestate,
 				bool *recordfreespace)
 {
@@ -1877,6 +1771,115 @@ lazy_scan_prune(LVRelState *vacrel,
 	/* Can't truncate this page */
 	if (hastup)
 		vacrel->nonempty_pages = blkno + 1;
+
+	Assert(!prunestate->all_visible || !prunestate->has_lpdead_items);
+
+	/*
+	 * Handle setting visibility map bit based on information from the VM (as
+	 * of last lazy_scan_skip() call), and from prunestate
+	 */
+	if (!all_visible_according_to_vm && prunestate->all_visible)
+	{
+		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
+
+		if (prunestate->all_frozen)
+		{
+			Assert(!TransactionIdIsValid(prunestate->visibility_cutoff_xid));
+			flags |= VISIBILITYMAP_ALL_FROZEN;
+		}
+
+		/*
+		 * It should never be the case that the visibility map page is set
+		 * while the page-level bit is clear, but the reverse is allowed (if
+		 * checksums are not enabled).  Regardless, set both bits so that we
+		 * get back in sync.
+		 *
+		 * NB: If the heap page is all-visible but the VM bit is not set, we
+		 * don't need to dirty the heap page.  However, if checksums are
+		 * enabled, we do need to make sure that the heap page is dirtied
+		 * before passing it to visibilitymap_set(), because it may be logged.
+		 * Given that this situation should only happen in rare cases after a
+		 * crash, it is not worth optimizing.
+		 */
+		PageSetAllVisible(page);
+		MarkBufferDirty(buf);
+		visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
+						  vmbuffer, prunestate->visibility_cutoff_xid,
+						  flags);
+	}
+
+	/*
+	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+	 * page-level bit is clear.  However, it's possible that the bit got
+	 * cleared after lazy_scan_skip() was called, so we must recheck with
+	 * buffer lock before concluding that the VM is corrupt.
+	 */
+	else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
+			 visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
+	{
+		elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+			 vacrel->relname, blkno);
+		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+	}
+
+	/*
+	 * It's possible for the value returned by
+	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+	 * wrong for us to see tuples that appear to not be visible to everyone
+	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
+	 * conservative and sometimes returns a value that's unnecessarily small,
+	 * so if we see that contradiction it just means that the tuples that we
+	 * think are not visible to everyone yet actually are, and the
+	 * PD_ALL_VISIBLE flag is correct.
+	 *
+	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+	 * however.
+	 */
+	else if (prunestate->has_lpdead_items && PageIsAllVisible(page))
+	{
+		elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+			 vacrel->relname, blkno);
+		PageClearAllVisible(page);
+		MarkBufferDirty(buf);
+		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+	}
+
+	/*
+	 * If the all-visible page is all-frozen but not marked as such yet, mark
+	 * it as all-frozen.  Note that all_frozen is only valid if all_visible is
+	 * true, so we must check both prunestate fields.
+	 */
+	else if (all_visible_according_to_vm && prunestate->all_visible &&
+			 prunestate->all_frozen &&
+			 !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
+	{
+		/*
+		 * Avoid relying on all_visible_according_to_vm as a proxy for the
+		 * page-level PD_ALL_VISIBLE bit being set, since it might have become
+		 * stale -- even when all_visible is set in prunestate
+		 */
+		if (!PageIsAllVisible(page))
+		{
+			PageSetAllVisible(page);
+			MarkBufferDirty(buf);
+		}
+
+		/*
+		 * Set the page all-frozen (and all-visible) in the VM.
+		 *
+		 * We can pass InvalidTransactionId as our visibility_cutoff_xid,
+		 * since a snapshotConflictHorizon sufficient to make everything safe
+		 * for REDO was logged when the page's tuples were frozen.
+		 */
+		Assert(!TransactionIdIsValid(prunestate->visibility_cutoff_xid));
+		visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
+						  vmbuffer, InvalidTransactionId,
+						  VISIBILITYMAP_ALL_VISIBLE |
+						  VISIBILITYMAP_ALL_FROZEN);
+	}
 }
 
 /*
-- 
2.37.2

v6-0001-Set-would-be-dead-items-LP_UNUSED-while-pruning.patch (text/x-patch)
From b6d9ce973bf60733381260ffaa5d368792cbb419 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 9 Jan 2024 15:00:24 -0500
Subject: [PATCH v6 1/5] Set would-be dead items LP_UNUSED while pruning

If there are no indexes on a relation, items can be marked LP_UNUSED
instead of LP_DEAD while pruning. This avoids a separate
invocation of lazy_vacuum_heap_page() and saves a vacuum WAL record.

To accomplish this, pass heap_page_prune() a new parameter, no_indexes,
which indicates that dead line pointers should be set to LP_UNUSED
during pruning, allowing earlier reaping of tuples.

Because we don't update the freespace map until after dropping the lock
on the buffer and we need the lock while we update the visibility map,
save our intent to update the freespace map in output parameter
recordfreespace. This parameter will likely be removed, along with the
LVPagePruneState, by future commits pushing per-page processing from
lazy_scan_heap() into lazy_scan_[no]prune().

Discussion: https://postgr.es/m/CAAKRu_bgvb_k0gKOXWzNKWHt560R0smrGe3E8zewKPs8fiMKkw%40mail.gmail.com
---
 src/backend/access/heap/pruneheap.c  |  65 ++++++++++--
 src/backend/access/heap/vacuumlazy.c | 150 ++++++++++-----------------
 src/include/access/heapam.h          |   1 +
 3 files changed, 111 insertions(+), 105 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 3e0a1a260e6..4ca9d54fe4a 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -35,6 +35,8 @@ typedef struct
 
 	/* tuple visibility test, initialized for the relation */
 	GlobalVisState *vistest;
+	/* whether or not dead items can be set LP_UNUSED during pruning */
+	bool		no_indexes;
 
 	TransactionId new_prune_xid;	/* new prune hint value for page */
 	TransactionId snapshotConflictHorizon;	/* latest xid removed */
@@ -148,7 +150,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 		{
 			PruneResult presult;
 
-			heap_page_prune(relation, buffer, vistest, &presult, NULL);
+			/*
+			 * For now, pass no_indexes == false regardless of whether or not
+			 * the relation has indexes. In the future we may enable immediate
+			 * reaping for on access pruning.
+			 */
+			heap_page_prune(relation, buffer, vistest, false,
+							&presult, NULL);
 
 			/*
 			 * Report the number of tuples reclaimed to pgstats.  This is
@@ -193,6 +201,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
  * (see heap_prune_satisfies_vacuum and
  * HeapTupleSatisfiesVacuum).
  *
+ * no_indexes indicates whether or not dead items can be set LP_UNUSED during
+ * pruning.
+ *
  * off_loc is the offset location required by the caller to use in error
  * callback.
  *
@@ -203,6 +214,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 void
 heap_page_prune(Relation relation, Buffer buffer,
 				GlobalVisState *vistest,
+				bool no_indexes,
 				PruneResult *presult,
 				OffsetNumber *off_loc)
 {
@@ -227,6 +239,7 @@ heap_page_prune(Relation relation, Buffer buffer,
 	prstate.new_prune_xid = InvalidTransactionId;
 	prstate.rel = relation;
 	prstate.vistest = vistest;
+	prstate.no_indexes = no_indexes;
 	prstate.snapshotConflictHorizon = InvalidTransactionId;
 	prstate.nredirected = prstate.ndead = prstate.nunused = 0;
 	memset(prstate.marked, 0, sizeof(prstate.marked));
@@ -306,9 +319,9 @@ heap_page_prune(Relation relation, Buffer buffer,
 		if (off_loc)
 			*off_loc = offnum;
 
-		/* Nothing to do if slot is empty or already dead */
+		/* Nothing to do if slot is empty */
 		itemid = PageGetItemId(page, offnum);
-		if (!ItemIdIsUsed(itemid) || ItemIdIsDead(itemid))
+		if (!ItemIdIsUsed(itemid))
 			continue;
 
 		/* Process this item or chain of items */
@@ -581,7 +594,17 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
 		 * function.)
 		 */
 		if (ItemIdIsDead(lp))
+		{
+			/*
+			 * If the relation has no indexes, we can set dead line pointers
+			 * LP_UNUSED now. We don't increment ndeleted here since the LP
+			 * was already marked dead.
+			 */
+			if (unlikely(prstate->no_indexes))
+				heap_prune_record_unused(prstate, offnum);
+
 			break;
+		}
 
 		Assert(ItemIdIsNormal(lp));
 		htup = (HeapTupleHeader) PageGetItem(dp, lp);
@@ -726,7 +749,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
 		 * item.  This can happen if the loop in heap_page_prune caused us to
 		 * visit the dead successor of a redirect item before visiting the
 		 * redirect item.  We can clean up by setting the redirect item to
-		 * DEAD state.
+		 * DEAD state or LP_UNUSED if the table has no indexes.
 		 */
 		heap_prune_record_dead(prstate, rootoffnum);
 	}
@@ -767,6 +790,17 @@ heap_prune_record_redirect(PruneState *prstate,
 static void
 heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum)
 {
+	/*
+	 * If the relation has no indexes, we can remove dead tuples during
+	 * pruning instead of marking their line pointers dead. Set this tuple's
+	 * line pointer LP_UNUSED. We hint that tables with indexes are more
+	 * likely.
+	 */
+	if (unlikely(prstate->no_indexes))
+	{
+		heap_prune_record_unused(prstate, offnum);
+		return;
+	}
 	Assert(prstate->ndead < MaxHeapTuplesPerPage);
 	prstate->nowdead[prstate->ndead] = offnum;
 	prstate->ndead++;
@@ -903,13 +937,24 @@ heap_page_prune_execute(Buffer buffer,
 #ifdef USE_ASSERT_CHECKING
 
 		/*
-		 * Only heap-only tuples can become LP_UNUSED during pruning.  They
-		 * don't need to be left in place as LP_DEAD items until VACUUM gets
-		 * around to doing index vacuuming.
+		 * If there are any items being marked LP_DEAD, then the relation must
+		 * have indexes, so every item being marked LP_UNUSED must refer to a
+		 * heap-only tuple. But if there are no items being marked LP_DEAD,
+		 * then it's possible that the relation has no indexes, in which case
+		 * all we know is that the line pointer shouldn't already be
+		 * LP_UNUSED.
 		 */
-		Assert(ItemIdHasStorage(lp) && ItemIdIsNormal(lp));
-		htup = (HeapTupleHeader) PageGetItem(page, lp);
-		Assert(HeapTupleHeaderIsHeapOnly(htup));
+		if (ndead > 0)
+		{
+			Assert(ItemIdHasStorage(lp) && ItemIdIsNormal(lp));
+			htup = (HeapTupleHeader) PageGetItem(page, lp);
+			Assert(HeapTupleHeaderIsHeapOnly(htup));
+		}
+		else
+		{
+			Assert(ItemIdIsUsed(lp));
+		}
+
 #endif
 
 		ItemIdSetUnused(lp);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index abbba8947fa..e9636b2c47e 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -250,7 +250,8 @@ static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   bool sharelock, Buffer vmbuffer);
 static void lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
-							LVPagePruneState *prunestate);
+							LVPagePruneState *prunestate,
+							bool *recordfreespace);
 static bool lazy_scan_noprune(LVRelState *vacrel, Buffer buf,
 							  BlockNumber blkno, Page page,
 							  bool *hastup, bool *recordfreespace);
@@ -830,8 +831,10 @@ lazy_scan_heap(LVRelState *vacrel)
 				next_fsm_block_to_vacuum = 0;
 	VacDeadItems *dead_items = vacrel->dead_items;
 	Buffer		vmbuffer = InvalidBuffer;
+	int			tuples_already_deleted;
 	bool		next_unskippable_allvis,
 				skipping_current_range;
+	bool		recordfreespace;
 	const int	initprog_index[] = {
 		PROGRESS_VACUUM_PHASE,
 		PROGRESS_VACUUM_TOTAL_HEAP_BLKS,
@@ -959,8 +962,7 @@ lazy_scan_heap(LVRelState *vacrel)
 		page = BufferGetPage(buf);
 		if (!ConditionalLockBufferForCleanup(buf))
 		{
-			bool		hastup,
-						recordfreespace;
+			bool		hastup;
 
 			LockBuffer(buf, BUFFER_LOCK_SHARE);
 
@@ -1010,6 +1012,8 @@ lazy_scan_heap(LVRelState *vacrel)
 			continue;
 		}
 
+		tuples_already_deleted = vacrel->tuples_deleted;
+
 		/*
 		 * Prune, freeze, and count tuples.
 		 *
@@ -1019,7 +1023,7 @@ lazy_scan_heap(LVRelState *vacrel)
 		 * were pruned some time earlier.  Also considers freezing XIDs in the
 		 * tuple headers of remaining items with storage.
 		 */
-		lazy_scan_prune(vacrel, buf, blkno, page, &prunestate);
+		lazy_scan_prune(vacrel, buf, blkno, page, &prunestate, &recordfreespace);
 
 		Assert(!prunestate.all_visible || !prunestate.has_lpdead_items);
 
@@ -1027,69 +1031,6 @@ lazy_scan_heap(LVRelState *vacrel)
 		if (prunestate.hastup)
 			vacrel->nonempty_pages = blkno + 1;
 
-		if (vacrel->nindexes == 0)
-		{
-			/*
-			 * Consider the need to do page-at-a-time heap vacuuming when
-			 * using the one-pass strategy now.
-			 *
-			 * The one-pass strategy will never call lazy_vacuum().  The steps
-			 * performed here can be thought of as the one-pass equivalent of
-			 * a call to lazy_vacuum().
-			 */
-			if (prunestate.has_lpdead_items)
-			{
-				Size		freespace;
-
-				lazy_vacuum_heap_page(vacrel, blkno, buf, 0, vmbuffer);
-
-				/* Forget the LP_DEAD items that we just vacuumed */
-				dead_items->num_items = 0;
-
-				/*
-				 * Now perform FSM processing for blkno, and move on to next
-				 * page.
-				 *
-				 * Our call to lazy_vacuum_heap_page() will have considered if
-				 * it's possible to set all_visible/all_frozen independently
-				 * of lazy_scan_prune().  Note that prunestate was invalidated
-				 * by lazy_vacuum_heap_page() call.
-				 */
-				freespace = PageGetHeapFreeSpace(page);
-
-				UnlockReleaseBuffer(buf);
-				RecordPageWithFreeSpace(vacrel->rel, blkno, freespace);
-
-				/*
-				 * Periodically perform FSM vacuuming to make newly-freed
-				 * space visible on upper FSM pages. FreeSpaceMapVacuumRange()
-				 * vacuums the portion of the freespace map covering heap
-				 * pages from start to end - 1. Include the block we just
-				 * vacuumed by passing it blkno + 1. Overflow isn't an issue
-				 * because MaxBlockNumber + 1 is InvalidBlockNumber which
-				 * causes FreeSpaceMapVacuumRange() to vacuum freespace map
-				 * pages covering the remainder of the relation.
-				 */
-				if (blkno - next_fsm_block_to_vacuum >= VACUUM_FSM_EVERY_PAGES)
-				{
-					FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum,
-											blkno + 1);
-					next_fsm_block_to_vacuum = blkno + 1;
-				}
-
-				continue;
-			}
-
-			/*
-			 * There was no call to lazy_vacuum_heap_page() because pruning
-			 * didn't encounter/create any LP_DEAD items that needed to be
-			 * vacuumed.  Prune state has not been invalidated, so proceed
-			 * with prunestate-driven visibility map and FSM steps (just like
-			 * the two-pass strategy).
-			 */
-			Assert(dead_items->num_items == 0);
-		}
-
 		/*
 		 * Handle setting visibility map bit based on information from the VM
 		 * (as of last lazy_scan_skip() call), and from prunestate
@@ -1200,38 +1141,45 @@ lazy_scan_heap(LVRelState *vacrel)
 
 		/*
 		 * Final steps for block: drop cleanup lock, record free space in the
-		 * FSM
+		 * FSM.
+		 *
+		 * If we will likely do index vacuuming, wait until
+		 * lazy_vacuum_heap_rel() to save free space. This doesn't just save
+		 * us some cycles; it also allows us to record any additional free
+		 * space that lazy_vacuum_heap_page() will make available in cases
+		 * where it's possible to truncate the page's line pointer array.
+		 *
+		 * Note: It's not in fact 100% certain that we really will call
+		 * lazy_vacuum_heap_rel() -- lazy_vacuum() might yet opt to skip index
+		 * vacuuming (and so must skip heap vacuuming).  This is deemed okay
+		 * because it only happens in emergencies, or when there is very
+		 * little free space anyway. (Besides, we start recording free space
+		 * in the FSM once index vacuuming has been abandoned.)
 		 */
-		if (prunestate.has_lpdead_items && vacrel->do_index_vacuuming)
-		{
-			/*
-			 * Wait until lazy_vacuum_heap_rel() to save free space.  This
-			 * doesn't just save us some cycles; it also allows us to record
-			 * any additional free space that lazy_vacuum_heap_page() will
-			 * make available in cases where it's possible to truncate the
-			 * page's line pointer array.
-			 *
-			 * Note: It's not in fact 100% certain that we really will call
-			 * lazy_vacuum_heap_rel() -- lazy_vacuum() might yet opt to skip
-			 * index vacuuming (and so must skip heap vacuuming).  This is
-			 * deemed okay because it only happens in emergencies, or when
-			 * there is very little free space anyway. (Besides, we start
-			 * recording free space in the FSM once index vacuuming has been
-			 * abandoned.)
-			 *
-			 * Note: The one-pass (no indexes) case is only supposed to make
-			 * it this far when there were no LP_DEAD items during pruning.
-			 */
-			Assert(vacrel->nindexes > 0);
-			UnlockReleaseBuffer(buf);
-		}
-		else
+		if (recordfreespace)
 		{
 			Size		freespace = PageGetHeapFreeSpace(page);
 
 			UnlockReleaseBuffer(buf);
 			RecordPageWithFreeSpace(vacrel->rel, blkno, freespace);
 		}
+		else
+			UnlockReleaseBuffer(buf);
+
+		/*
+		 * Periodically perform FSM vacuuming to make newly-freed space
+		 * visible on upper FSM pages. This is done after vacuuming if the
+		 * table has indexes.
+		 */
+		if (vacrel->nindexes == 0 &&
+			vacrel->tuples_deleted > tuples_already_deleted &&
+			(blkno - next_fsm_block_to_vacuum >= VACUUM_FSM_EVERY_PAGES))
+		{
+			FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum,
+									blkno);
+			next_fsm_block_to_vacuum = blkno;
+		}
+
 	}
 
 	vacrel->blkno = InvalidBlockNumber;
@@ -1543,7 +1491,8 @@ lazy_scan_prune(LVRelState *vacrel,
 				Buffer buf,
 				BlockNumber blkno,
 				Page page,
-				LVPagePruneState *prunestate)
+				LVPagePruneState *prunestate,
+				bool *recordfreespace)
 {
 	Relation	rel = vacrel->rel;
 	OffsetNumber offnum,
@@ -1587,7 +1536,8 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * lpdead_items's final value can be thought of as the number of tuples
 	 * that were deleted from indexes.
 	 */
-	heap_page_prune(rel, buf, vacrel->vistest, &presult, &vacrel->offnum);
+	heap_page_prune(rel, buf, vacrel->vistest, vacrel->nindexes == 0,
+					&presult, &vacrel->offnum);
 
 	/*
 	 * Now scan the page to collect LP_DEAD items and check for tuples
@@ -1598,6 +1548,7 @@ lazy_scan_prune(LVRelState *vacrel,
 	prunestate->all_visible = true;
 	prunestate->all_frozen = true;
 	prunestate->visibility_cutoff_xid = InvalidTransactionId;
+	*recordfreespace = false;
 
 	for (offnum = FirstOffsetNumber;
 		 offnum <= maxoff;
@@ -1918,6 +1869,15 @@ lazy_scan_prune(LVRelState *vacrel,
 	vacrel->lpdead_items += lpdead_items;
 	vacrel->live_tuples += live_tuples;
 	vacrel->recently_dead_tuples += recently_dead_tuples;
+
+	/*
+	 * If we will not do index vacuuming, either because we have no indexes,
+	 * because there is nothing to vacuum, or because do_index_vacuuming is
+	 * false, make sure we update the freespace map.
+	 */
+	if (vacrel->nindexes == 0 ||
+		!vacrel->do_index_vacuuming || lpdead_items == 0)
+		*recordfreespace = true;
 }
 
 /*
@@ -2516,7 +2476,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	bool		all_frozen;
 	LVSavedErrInfo saved_err_info;
 
-	Assert(vacrel->nindexes == 0 || vacrel->do_index_vacuuming);
+	Assert(vacrel->do_index_vacuuming);
 
 	pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_VACUUMED, blkno);
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 932ec0d6f2b..4a2aebedb57 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -320,6 +320,7 @@ struct GlobalVisState;
 extern void heap_page_prune_opt(Relation relation, Buffer buffer);
 extern void heap_page_prune(Relation relation, Buffer buffer,
 							struct GlobalVisState *vistest,
+							bool no_indexes,
 							PruneResult *presult,
 							OffsetNumber *off_loc);
 extern void heap_page_prune_execute(Buffer buffer,
-- 
2.37.2

v6-0002-Indicate-rel-truncation-unsafe-in-lazy_scan-no-pr.patch (text/x-patch)
From 5da8d22ccbdbda1f47f41badb432eb66d7b86206 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 13 Nov 2023 16:47:08 -0500
Subject: [PATCH v6 2/5] Indicate rel truncation unsafe in lazy_scan[no]prune

Both lazy_scan_prune() and lazy_scan_noprune() must determine whether or
not there are tuples on the page making rel truncation unsafe.
LVRelState->nonempty_pages is updated to reflect this.

Previously, both functions set an output parameter or output parameter
member, hastup, to indicate that nonempty_pages should be updated to
reflect the latest non-removable page. There doesn't seem to be any
reason to wait until lazy_scan_[no]prune() returns to update
nonempty_pages. Plenty of other counters in the LVRelState are updated
in lazy_scan_[no]prune(). This allows us to get rid of the output
parameter hastup.

Discussion: https://postgr.es/m/CAAKRu_b_9N2i9fn02P9t_oeLqcEj4%3DE-EK20bZ69nThYWna-Xg%40mail.gmail.com
---
 src/backend/access/heap/vacuumlazy.c | 52 +++++++++++++++-------------
 1 file changed, 27 insertions(+), 25 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index e9636b2c47e..a5b9319fbcc 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -217,7 +217,6 @@ typedef struct LVRelState
  */
 typedef struct LVPagePruneState
 {
-	bool		hastup;			/* Page prevents rel truncation? */
 	bool		has_lpdead_items;	/* includes existing LP_DEAD items */
 
 	/*
@@ -254,7 +253,7 @@ static void lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							bool *recordfreespace);
 static bool lazy_scan_noprune(LVRelState *vacrel, Buffer buf,
 							  BlockNumber blkno, Page page,
-							  bool *hastup, bool *recordfreespace);
+							  bool *recordfreespace);
 static void lazy_vacuum(LVRelState *vacrel);
 static bool lazy_vacuum_all_indexes(LVRelState *vacrel);
 static void lazy_vacuum_heap_rel(LVRelState *vacrel);
@@ -962,8 +961,6 @@ lazy_scan_heap(LVRelState *vacrel)
 		page = BufferGetPage(buf);
 		if (!ConditionalLockBufferForCleanup(buf))
 		{
-			bool		hastup;
-
 			LockBuffer(buf, BUFFER_LOCK_SHARE);
 
 			/* Check for new or empty pages before lazy_scan_noprune call */
@@ -974,20 +971,21 @@ lazy_scan_heap(LVRelState *vacrel)
 				continue;
 			}
 
-			/* Collect LP_DEAD items in dead_items array, count tuples */
-			if (lazy_scan_noprune(vacrel, buf, blkno, page, &hastup,
+			/*
+			 * Collect LP_DEAD items in dead_items array, count tuples,
+			 * determine if rel truncation is safe
+			 */
+			if (lazy_scan_noprune(vacrel, buf, blkno, page,
 								  &recordfreespace))
 			{
 				Size		freespace = 0;
 
 				/*
 				 * Processed page successfully (without cleanup lock) -- just
-				 * need to perform rel truncation and FSM steps, much like the
-				 * lazy_scan_prune case.  Don't bother trying to match its
-				 * visibility map setting steps, though.
+				 * need to update the FSM, much like the lazy_scan_prune case.
+				 * Don't bother trying to match its visibility map setting
+				 * steps, though.
 				 */
-				if (hastup)
-					vacrel->nonempty_pages = blkno + 1;
 				if (recordfreespace)
 					freespace = PageGetHeapFreeSpace(page);
 				UnlockReleaseBuffer(buf);
@@ -1021,16 +1019,13 @@ lazy_scan_heap(LVRelState *vacrel)
 		 * dead_items array.  This includes LP_DEAD line pointers that we
 		 * pruned ourselves, as well as existing LP_DEAD line pointers that
 		 * were pruned some time earlier.  Also considers freezing XIDs in the
-		 * tuple headers of remaining items with storage.
+		 * tuple headers of remaining items with storage. It also determines
+		 * if truncating this block is safe.
 		 */
 		lazy_scan_prune(vacrel, buf, blkno, page, &prunestate, &recordfreespace);
 
 		Assert(!prunestate.all_visible || !prunestate.has_lpdead_items);
 
-		/* Remember the location of the last page with nonremovable tuples */
-		if (prunestate.hastup)
-			vacrel->nonempty_pages = blkno + 1;
-
 		/*
 		 * Handle setting visibility map bit based on information from the VM
 		 * (as of last lazy_scan_skip() call), and from prunestate
@@ -1504,6 +1499,7 @@ lazy_scan_prune(LVRelState *vacrel,
 				live_tuples,
 				recently_dead_tuples;
 	HeapPageFreeze pagefrz;
+	bool		hastup = false;
 	int64		fpi_before = pgWalUsage.wal_fpi;
 	OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
 	HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
@@ -1543,7 +1539,6 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * Now scan the page to collect LP_DEAD items and check for tuples
 	 * requiring freezing among remaining tuples with storage
 	 */
-	prunestate->hastup = false;
 	prunestate->has_lpdead_items = false;
 	prunestate->all_visible = true;
 	prunestate->all_frozen = true;
@@ -1571,7 +1566,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		if (ItemIdIsRedirected(itemid))
 		{
 			/* page makes rel truncation unsafe */
-			prunestate->hastup = true;
+			hastup = true;
 			continue;
 		}
 
@@ -1701,7 +1696,7 @@ lazy_scan_prune(LVRelState *vacrel,
 				break;
 		}
 
-		prunestate->hastup = true;	/* page makes rel truncation unsafe */
+		hastup = true;			/* page makes rel truncation unsafe */
 
 		/* Tuple with storage -- consider need to freeze */
 		if (heap_prepare_freeze_tuple(htup, &vacrel->cutoffs, &pagefrz,
@@ -1878,6 +1873,10 @@ lazy_scan_prune(LVRelState *vacrel,
 	if (vacrel->nindexes == 0 ||
 		!vacrel->do_index_vacuuming || lpdead_items == 0)
 		*recordfreespace = true;
+
+	/* Can't truncate this page */
+	if (hastup)
+		vacrel->nonempty_pages = blkno + 1;
 }
 
 /*
@@ -1895,7 +1894,6 @@ lazy_scan_prune(LVRelState *vacrel,
  * one or more tuples on the page.  We always return true for non-aggressive
  * callers.
  *
- * See lazy_scan_prune for an explanation of hastup return flag.
  * recordfreespace flag instructs caller on whether or not it should do
  * generic FSM processing for page.
  */
@@ -1904,7 +1902,6 @@ lazy_scan_noprune(LVRelState *vacrel,
 				  Buffer buf,
 				  BlockNumber blkno,
 				  Page page,
-				  bool *hastup,
 				  bool *recordfreespace)
 {
 	OffsetNumber offnum,
@@ -1913,6 +1910,7 @@ lazy_scan_noprune(LVRelState *vacrel,
 				live_tuples,
 				recently_dead_tuples,
 				missed_dead_tuples;
+	bool		hastup;
 	HeapTupleHeader tupleheader;
 	TransactionId NoFreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
 	MultiXactId NoFreezePageRelminMxid = vacrel->NewRelminMxid;
@@ -1920,7 +1918,7 @@ lazy_scan_noprune(LVRelState *vacrel,
 
 	Assert(BufferGetBlockNumber(buf) == blkno);
 
-	*hastup = false;			/* for now */
+	hastup = false;				/* for now */
 	*recordfreespace = false;	/* for now */
 
 	lpdead_items = 0;
@@ -1944,7 +1942,7 @@ lazy_scan_noprune(LVRelState *vacrel,
 
 		if (ItemIdIsRedirected(itemid))
 		{
-			*hastup = true;
+			hastup = true;
 			continue;
 		}
 
@@ -1958,7 +1956,7 @@ lazy_scan_noprune(LVRelState *vacrel,
 			continue;
 		}
 
-		*hastup = true;			/* page prevents rel truncation */
+		hastup = true;			/* page prevents rel truncation */
 		tupleheader = (HeapTupleHeader) PageGetItem(page, itemid);
 		if (heap_tuple_should_freeze(tupleheader, &vacrel->cutoffs,
 									 &NoFreezePageRelfrozenXid,
@@ -2060,7 +2058,7 @@ lazy_scan_noprune(LVRelState *vacrel,
 			 * but it beats having to maintain specialized heap vacuuming code
 			 * forever, for vanishingly little benefit.)
 			 */
-			*hastup = true;
+			hastup = true;
 			missed_dead_tuples += lpdead_items;
 		}
 
@@ -2116,6 +2114,10 @@ lazy_scan_noprune(LVRelState *vacrel,
 	if (missed_dead_tuples > 0)
 		vacrel->missed_dead_pages++;
 
+	/* Can't truncate this page */
+	if (hastup)
+		vacrel->nonempty_pages = blkno + 1;
+
 	/* Caller won't need to call lazy_scan_prune with same page */
 	return true;
 }
-- 
2.37.2

v6-0005-Move-freespace-map-update-into-lazy_scan_-no-prun.patch (text/x-patch)
From eba3ec6ec6ab602b4204f35d8b38f1792c3f9f73 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 9 Jan 2024 17:18:06 -0500
Subject: [PATCH v6 5/5] Move freespace map update into lazy_scan_[no]prune()

Previously, lazy_scan_heap() updated the freespace map after calling
either lazy_scan_prune() or lazy_scan_noprune() if any space had been
reclaimed. Both functions had an output parameter recordfreespace to enable
lazy_scan_heap() to do this. Instead, update the freespace map in
lazy_scan_prune() and lazy_scan_noprune(). With this change, all
page-specific processing is done in lazy_scan_prune(),
lazy_scan_noprune(), and lazy_scan_new_or_empty(). This simplifies
lazy_scan_heap().
---
 src/backend/access/heap/vacuumlazy.c | 118 +++++++++++++--------------
 1 file changed, 58 insertions(+), 60 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index a554a650d62..bfcb7d73459 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -232,11 +232,9 @@ static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   bool sharelock, Buffer vmbuffer);
 static void lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
-							Buffer vmbuffer, bool all_visible_according_to_vm,
-							bool *recordfreespace);
+							Buffer vmbuffer, bool all_visible_according_to_vm);
 static bool lazy_scan_noprune(LVRelState *vacrel, Buffer buf,
-							  BlockNumber blkno, Page page,
-							  bool *recordfreespace);
+							  BlockNumber blkno, Page page);
 static void lazy_vacuum(LVRelState *vacrel);
 static bool lazy_vacuum_all_indexes(LVRelState *vacrel);
 static void lazy_vacuum_heap_rel(LVRelState *vacrel);
@@ -816,7 +814,6 @@ lazy_scan_heap(LVRelState *vacrel)
 	int			tuples_already_deleted;
 	bool		next_unskippable_allvis,
 				skipping_current_range;
-	bool		recordfreespace;
 	const int	initprog_index[] = {
 		PROGRESS_VACUUM_PHASE,
 		PROGRESS_VACUUM_TOTAL_HEAP_BLKS,
@@ -955,24 +952,14 @@ lazy_scan_heap(LVRelState *vacrel)
 
 			/*
 			 * Collect LP_DEAD items in dead_items array, count tuples,
-			 * determine if rel truncation is safe
+			 * determine if rel truncation is safe, and update the FSM.
 			 */
-			if (lazy_scan_noprune(vacrel, buf, blkno, page,
-								  &recordfreespace))
+			if (lazy_scan_noprune(vacrel, buf, blkno, page))
 			{
-				Size		freespace = 0;
-
 				/*
-				 * Processed page successfully (without cleanup lock) -- just
-				 * need to update the FSM, much like the lazy_scan_prune case.
-				 * Don't bother trying to match its visibility map setting
-				 * steps, though.
+				 * Processed page successfully (without cleanup lock). Buffer
+				 * lock and pin released.
 				 */
-				if (recordfreespace)
-					freespace = PageGetHeapFreeSpace(page);
-				UnlockReleaseBuffer(buf);
-				if (recordfreespace)
-					RecordPageWithFreeSpace(vacrel->rel, blkno, freespace);
 				continue;
 			}
 
@@ -1002,38 +989,11 @@ lazy_scan_heap(LVRelState *vacrel)
 		 * pruned ourselves, as well as existing LP_DEAD line pointers that
 		 * were pruned some time earlier.  Also considers freezing XIDs in the
 		 * tuple headers of remaining items with storage. It also determines
-		 * if truncating this block is safe.
+		 * if truncating this block is safe and updates the FSM (if relevant).
+		 * Buffer is returned with lock and pin released.
 		 */
 		lazy_scan_prune(vacrel, buf, blkno, page,
-						vmbuffer, all_visible_according_to_vm,
-						&recordfreespace);
-
-		/*
-		 * Final steps for block: drop cleanup lock, record free space in the
-		 * FSM.
-		 *
-		 * If we will likely do index vacuuming, wait until
-		 * lazy_vacuum_heap_rel() to save free space. This doesn't just save
-		 * us some cycles; it also allows us to record any additional free
-		 * space that lazy_vacuum_heap_page() will make available in cases
-		 * where it's possible to truncate the page's line pointer array.
-		 *
-		 * Note: It's not in fact 100% certain that we really will call
-		 * lazy_vacuum_heap_rel() -- lazy_vacuum() might yet opt to skip index
-		 * vacuuming (and so must skip heap vacuuming).  This is deemed okay
-		 * because it only happens in emergencies, or when there is very
-		 * little free space anyway. (Besides, we start recording free space
-		 * in the FSM once index vacuuming has been abandoned.)
-		 */
-		if (recordfreespace)
-		{
-			Size		freespace = PageGetHeapFreeSpace(page);
-
-			UnlockReleaseBuffer(buf);
-			RecordPageWithFreeSpace(vacrel->rel, blkno, freespace);
-		}
-		else
-			UnlockReleaseBuffer(buf);
+						vmbuffer, all_visible_according_to_vm);
 
 		/*
 		 * Periodically perform FSM vacuuming to make newly-freed space
@@ -1361,8 +1321,7 @@ lazy_scan_prune(LVRelState *vacrel,
 				BlockNumber blkno,
 				Page page,
 				Buffer vmbuffer,
-				bool all_visible_according_to_vm,
-				bool *recordfreespace)
+				bool all_visible_according_to_vm)
 {
 	Relation	rel = vacrel->rel;
 	OffsetNumber offnum,
@@ -1377,6 +1336,7 @@ lazy_scan_prune(LVRelState *vacrel,
 	bool		hastup = false;
 	bool		all_visible,
 				all_frozen;
+	bool		recordfreespace;
 	TransactionId visibility_cutoff_xid;
 	int64		fpi_before = pgWalUsage.wal_fpi;
 	OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
@@ -1419,7 +1379,7 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * the visibility cutoff xid for recovery conflicts.
 	 */
 	visibility_cutoff_xid = InvalidTransactionId;
-	*recordfreespace = false;
+	recordfreespace = false;
 
 	/*
 	 * We will update the VM after collecting LP_DEAD items and freezing
@@ -1758,7 +1718,7 @@ lazy_scan_prune(LVRelState *vacrel,
 	 */
 	if (vacrel->nindexes == 0 ||
 		!vacrel->do_index_vacuuming || lpdead_items == 0)
-		*recordfreespace = true;
+		recordfreespace = true;
 
 	/* Can't truncate this page */
 	if (hastup)
@@ -1873,6 +1833,32 @@ lazy_scan_prune(LVRelState *vacrel,
 						  VISIBILITYMAP_ALL_VISIBLE |
 						  VISIBILITYMAP_ALL_FROZEN);
 	}
+
+	/*
+	 * Final steps for block: drop cleanup lock, record free space in the FSM.
+	 *
+	 * If we will likely do index vacuuming, wait until lazy_vacuum_heap_rel()
+	 * to save free space. This doesn't just save us some cycles; it also
+	 * allows us to record any additional free space that
+	 * lazy_vacuum_heap_page() will make available in cases where it's
+	 * possible to truncate the page's line pointer array.
+	 *
+	 * Note: It's not in fact 100% certain that we really will call
+	 * lazy_vacuum_heap_rel() -- lazy_vacuum() might yet opt to skip index
+	 * vacuuming (and so must skip heap vacuuming).  This is deemed okay
+	 * because it only happens in emergencies, or when there is very little
+	 * free space anyway. (Besides, we start recording free space in the FSM
+	 * once index vacuuming has been abandoned.)
+	 */
+	if (recordfreespace)
+	{
+		Size		freespace = PageGetHeapFreeSpace(page);
+
+		UnlockReleaseBuffer(buf);
+		RecordPageWithFreeSpace(vacrel->rel, blkno, freespace);
+	}
+	else
+		UnlockReleaseBuffer(buf);
 }
 
 /*
@@ -1897,8 +1883,7 @@ static bool
 lazy_scan_noprune(LVRelState *vacrel,
 				  Buffer buf,
 				  BlockNumber blkno,
-				  Page page,
-				  bool *recordfreespace)
+				  Page page)
 {
 	OffsetNumber offnum,
 				maxoff;
@@ -1907,6 +1892,8 @@ lazy_scan_noprune(LVRelState *vacrel,
 				recently_dead_tuples,
 				missed_dead_tuples;
 	bool		hastup;
+	bool		recordfreespace;
+	Size		freespace = 0;
 	HeapTupleHeader tupleheader;
 	TransactionId NoFreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
 	MultiXactId NoFreezePageRelminMxid = vacrel->NewRelminMxid;
@@ -1915,7 +1902,7 @@ lazy_scan_noprune(LVRelState *vacrel,
 	Assert(BufferGetBlockNumber(buf) == blkno);
 
 	hastup = false;				/* for now */
-	*recordfreespace = false;	/* for now */
+	recordfreespace = false;	/* for now */
 
 	lpdead_items = 0;
 	live_tuples = 0;
@@ -2058,7 +2045,7 @@ lazy_scan_noprune(LVRelState *vacrel,
 			missed_dead_tuples += lpdead_items;
 		}
 
-		*recordfreespace = true;
+		recordfreespace = true;
 	}
 	else if (lpdead_items == 0)
 	{
@@ -2066,7 +2053,7 @@ lazy_scan_noprune(LVRelState *vacrel,
 		 * Won't be vacuuming this page later, so record page's freespace in
 		 * the FSM now
 		 */
-		*recordfreespace = true;
+		recordfreespace = true;
 	}
 	else
 	{
@@ -2098,7 +2085,7 @@ lazy_scan_noprune(LVRelState *vacrel,
 		 * Assume that we'll go on to vacuum this heap page during final pass
 		 * over the heap.  Don't record free space until then.
 		 */
-		*recordfreespace = false;
+		recordfreespace = false;
 	}
 
 	/*
@@ -2114,7 +2101,18 @@ lazy_scan_noprune(LVRelState *vacrel,
 	if (hastup)
 		vacrel->nonempty_pages = blkno + 1;
 
-	/* Caller won't need to call lazy_scan_prune with same page */
+	/*
+	 * Since we processed page successfully without cleanup lock, caller won't
+	 * need to call lazy_scan_prune() with the same page. Now, we just need to
+	 * update the FSM, much like the lazy_scan_prune case. Don't bother trying
+	 * to match its visibility map setting steps, though.
+	 */
+	if (recordfreespace)
+		freespace = PageGetHeapFreeSpace(page);
+	UnlockReleaseBuffer(buf);
+	if (recordfreespace)
+		RecordPageWithFreeSpace(vacrel->rel, blkno, freespace);
+
 	return true;
 }
 
-- 
2.37.2

v6-0004-Inline-LVPagePruneState-members-in-lazy_scan_prun.patch (text/x-patch)
From 1f636ac362b7a31065eef466965a5aad5862de28 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 9 Jan 2024 17:17:01 -0500
Subject: [PATCH v6 4/5] Inline LVPagePruneState members in lazy_scan_prune()

After moving the code to update the visibility map from lazy_scan_heap()
into lazy_scan_prune(), the LVPagePruneState is obsolete. Replace all
instances of use of its members in lazy_scan_prune() with local
variables and remove the struct.
---
 src/backend/access/heap/vacuumlazy.c | 113 +++++++++++++--------------
 src/tools/pgindent/typedefs.list     |   1 -
 2 files changed, 53 insertions(+), 61 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index bbfd3437fc7..a554a650d62 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -212,23 +212,6 @@ typedef struct LVRelState
 	int64		missed_dead_tuples; /* # removable, but not removed */
 } LVRelState;
 
-/*
- * State returned by lazy_scan_prune()
- */
-typedef struct LVPagePruneState
-{
-	bool		has_lpdead_items;	/* includes existing LP_DEAD items */
-
-	/*
-	 * State describes the proper VM bit states to set for the page following
-	 * pruning and freezing.  all_visible implies !has_lpdead_items, but don't
-	 * trust all_frozen result unless all_visible is also set to true.
-	 */
-	bool		all_visible;	/* Every item visible to all? */
-	bool		all_frozen;		/* provided all_visible is also true */
-	TransactionId visibility_cutoff_xid;	/* For recovery conflicts */
-} LVPagePruneState;
-
 /* Struct for saving and restoring vacuum error information. */
 typedef struct LVSavedErrInfo
 {
@@ -250,7 +233,7 @@ static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 static void lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
 							Buffer vmbuffer, bool all_visible_according_to_vm,
-							LVPagePruneState *prunestate, bool *recordfreespace);
+							bool *recordfreespace);
 static bool lazy_scan_noprune(LVRelState *vacrel, Buffer buf,
 							  BlockNumber blkno, Page page,
 							  bool *recordfreespace);
@@ -856,7 +839,6 @@ lazy_scan_heap(LVRelState *vacrel)
 		Buffer		buf;
 		Page		page;
 		bool		all_visible_according_to_vm;
-		LVPagePruneState prunestate;
 
 		if (blkno == next_unskippable_block)
 		{
@@ -1024,7 +1006,7 @@ lazy_scan_heap(LVRelState *vacrel)
 		 */
 		lazy_scan_prune(vacrel, buf, blkno, page,
 						vmbuffer, all_visible_according_to_vm,
-						&prunestate, &recordfreespace);
+						&recordfreespace);
 
 		/*
 		 * Final steps for block: drop cleanup lock, record free space in the
@@ -1380,7 +1362,6 @@ lazy_scan_prune(LVRelState *vacrel,
 				Page page,
 				Buffer vmbuffer,
 				bool all_visible_according_to_vm,
-				LVPagePruneState *prunestate,
 				bool *recordfreespace)
 {
 	Relation	rel = vacrel->rel;
@@ -1394,6 +1375,9 @@ lazy_scan_prune(LVRelState *vacrel,
 				recently_dead_tuples;
 	HeapPageFreeze pagefrz;
 	bool		hastup = false;
+	bool		all_visible,
+				all_frozen;
+	TransactionId visibility_cutoff_xid;
 	int64		fpi_before = pgWalUsage.wal_fpi;
 	OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
 	HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
@@ -1431,14 +1415,22 @@ lazy_scan_prune(LVRelState *vacrel,
 
 	/*
 	 * Now scan the page to collect LP_DEAD items and check for tuples
-	 * requiring freezing among remaining tuples with storage
+	 * requiring freezing among remaining tuples with storage. Keep track of
+	 * the visibility cutoff xid for recovery conflicts.
 	 */
-	prunestate->has_lpdead_items = false;
-	prunestate->all_visible = true;
-	prunestate->all_frozen = true;
-	prunestate->visibility_cutoff_xid = InvalidTransactionId;
+	visibility_cutoff_xid = InvalidTransactionId;
 	*recordfreespace = false;
 
+	/*
+	 * We will update the VM after collecting LP_DEAD items and freezing
+	 * tuples. Keep track of whether or not the page is all_visible and
+	 * all_frozen and use this information to update the VM. all_visible
+	 * implies 0 lpdead_items, but don't trust all_frozen result unless
+	 * all_visible is also set to true.
+	 */
+	all_visible = true;
+	all_frozen = true;
+
 	for (offnum = FirstOffsetNumber;
 		 offnum <= maxoff;
 		 offnum = OffsetNumberNext(offnum))
@@ -1525,13 +1517,13 @@ lazy_scan_prune(LVRelState *vacrel,
 				 * asynchronously. See SetHintBits for more info. Check that
 				 * the tuple is hinted xmin-committed because of that.
 				 */
-				if (prunestate->all_visible)
+				if (all_visible)
 				{
 					TransactionId xmin;
 
 					if (!HeapTupleHeaderXminCommitted(htup))
 					{
-						prunestate->all_visible = false;
+						all_visible = false;
 						break;
 					}
 
@@ -1543,14 +1535,14 @@ lazy_scan_prune(LVRelState *vacrel,
 					if (!TransactionIdPrecedes(xmin,
 											   vacrel->cutoffs.OldestXmin))
 					{
-						prunestate->all_visible = false;
+						all_visible = false;
 						break;
 					}
 
 					/* Track newest xmin on page. */
-					if (TransactionIdFollows(xmin, prunestate->visibility_cutoff_xid) &&
+					if (TransactionIdFollows(xmin, visibility_cutoff_xid) &&
 						TransactionIdIsNormal(xmin))
-						prunestate->visibility_cutoff_xid = xmin;
+						visibility_cutoff_xid = xmin;
 				}
 				break;
 			case HEAPTUPLE_RECENTLY_DEAD:
@@ -1561,7 +1553,7 @@ lazy_scan_prune(LVRelState *vacrel,
 				 * pruning.)
 				 */
 				recently_dead_tuples++;
-				prunestate->all_visible = false;
+				all_visible = false;
 				break;
 			case HEAPTUPLE_INSERT_IN_PROGRESS:
 
@@ -1572,11 +1564,11 @@ lazy_scan_prune(LVRelState *vacrel,
 				 * results.  This assumption is a bit shaky, but it is what
 				 * acquire_sample_rows() does, so be consistent.
 				 */
-				prunestate->all_visible = false;
+				all_visible = false;
 				break;
 			case HEAPTUPLE_DELETE_IN_PROGRESS:
 				/* This is an expected case during concurrent vacuum */
-				prunestate->all_visible = false;
+				all_visible = false;
 
 				/*
 				 * Count such rows as live.  As above, we assume the deleting
@@ -1606,7 +1598,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		 * definitely cannot be set all-frozen in the visibility map later on
 		 */
 		if (!totally_frozen)
-			prunestate->all_frozen = false;
+			all_frozen = false;
 	}
 
 	/*
@@ -1624,7 +1616,7 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * page all-frozen afterwards (might not happen until final heap pass).
 	 */
 	if (pagefrz.freeze_required || tuples_frozen == 0 ||
-		(prunestate->all_visible && prunestate->all_frozen &&
+		(all_visible && all_frozen &&
 		 fpi_before != pgWalUsage.wal_fpi))
 	{
 		/*
@@ -1662,11 +1654,11 @@ lazy_scan_prune(LVRelState *vacrel,
 			 * once we're done with it.  Otherwise we generate a conservative
 			 * cutoff by stepping back from OldestXmin.
 			 */
-			if (prunestate->all_visible && prunestate->all_frozen)
+			if (all_visible && all_frozen)
 			{
 				/* Using same cutoff when setting VM is now unnecessary */
-				snapshotConflictHorizon = prunestate->visibility_cutoff_xid;
-				prunestate->visibility_cutoff_xid = InvalidTransactionId;
+				snapshotConflictHorizon = visibility_cutoff_xid;
+				visibility_cutoff_xid = InvalidTransactionId;
 			}
 			else
 			{
@@ -1689,7 +1681,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		 */
 		vacrel->NewRelfrozenXid = pagefrz.NoFreezePageRelfrozenXid;
 		vacrel->NewRelminMxid = pagefrz.NoFreezePageRelminMxid;
-		prunestate->all_frozen = false;
+		all_frozen = false;
 		tuples_frozen = 0;		/* avoid miscounts in instrumentation */
 	}
 
@@ -1702,16 +1694,17 @@ lazy_scan_prune(LVRelState *vacrel,
 	 */
 #ifdef USE_ASSERT_CHECKING
 	/* Note that all_frozen value does not matter when !all_visible */
-	if (prunestate->all_visible && lpdead_items == 0)
+	if (all_visible && lpdead_items == 0)
 	{
-		TransactionId cutoff;
-		bool		all_frozen;
+		TransactionId debug_cutoff;
+		bool		debug_all_frozen;
 
-		if (!heap_page_is_all_visible(vacrel, buf, &cutoff, &all_frozen))
+		if (!heap_page_is_all_visible(vacrel, buf,
+									  &debug_cutoff, &debug_all_frozen))
 			Assert(false);
 
-		Assert(!TransactionIdIsValid(cutoff) ||
-			   cutoff == prunestate->visibility_cutoff_xid);
+		Assert(!TransactionIdIsValid(debug_cutoff) ||
+			   debug_cutoff == visibility_cutoff_xid);
 	}
 #endif
 
@@ -1724,7 +1717,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		ItemPointerData tmp;
 
 		vacrel->lpdead_item_pages++;
-		prunestate->has_lpdead_items = true;
 
 		ItemPointerSetBlockNumber(&tmp, blkno);
 
@@ -1749,7 +1741,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		 * Now that freezing has been finalized, unset all_visible.  It needs
 		 * to reflect the present state of things, as expected by our caller.
 		 */
-		prunestate->all_visible = false;
+		all_visible = false;
 	}
 
 	/* Finally, add page-local counts to whole-VACUUM counts */
@@ -1772,19 +1764,20 @@ lazy_scan_prune(LVRelState *vacrel,
 	if (hastup)
 		vacrel->nonempty_pages = blkno + 1;
 
-	Assert(!prunestate->all_visible || !prunestate->has_lpdead_items);
+	Assert(!all_visible || lpdead_items == 0);
 
 	/*
 	 * Handle setting visibility map bit based on information from the VM (as
-	 * of last lazy_scan_skip() call), and from prunestate
+	 * of last lazy_scan_skip() call), and from all_visible and all_frozen
+	 * variables
 	 */
-	if (!all_visible_according_to_vm && prunestate->all_visible)
+	if (!all_visible_according_to_vm && all_visible)
 	{
 		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
 
-		if (prunestate->all_frozen)
+		if (all_frozen)
 		{
-			Assert(!TransactionIdIsValid(prunestate->visibility_cutoff_xid));
+			Assert(!TransactionIdIsValid(visibility_cutoff_xid));
 			flags |= VISIBILITYMAP_ALL_FROZEN;
 		}
 
@@ -1804,7 +1797,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		PageSetAllVisible(page);
 		MarkBufferDirty(buf);
 		visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
-						  vmbuffer, prunestate->visibility_cutoff_xid,
+						  vmbuffer, visibility_cutoff_xid,
 						  flags);
 	}
 
@@ -1837,7 +1830,7 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
 	 * however.
 	 */
-	else if (prunestate->has_lpdead_items && PageIsAllVisible(page))
+	else if (lpdead_items > 0 && PageIsAllVisible(page))
 	{
 		elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
 			 vacrel->relname, blkno);
@@ -1850,16 +1843,16 @@ lazy_scan_prune(LVRelState *vacrel,
 	/*
 	 * If the all-visible page is all-frozen but not marked as such yet, mark
 	 * it as all-frozen.  Note that all_frozen is only valid if all_visible is
-	 * true, so we must check both prunestate fields.
+	 * true, so we must check both all_visible and all_frozen.
 	 */
-	else if (all_visible_according_to_vm && prunestate->all_visible &&
-			 prunestate->all_frozen &&
+	else if (all_visible_according_to_vm && all_visible &&
+			 all_frozen &&
 			 !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
 	{
 		/*
 		 * Avoid relying on all_visible_according_to_vm as a proxy for the
 		 * page-level PD_ALL_VISIBLE bit being set, since it might have become
-		 * stale -- even when all_visible is set in prunestate
+		 * stale -- even when all_visible is set
 		 */
 		if (!PageIsAllVisible(page))
 		{
@@ -1874,7 +1867,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		 * since a snapshotConflictHorizon sufficient to make everything safe
 		 * for REDO was logged when the page's tuples were frozen.
 		 */
-		Assert(!TransactionIdIsValid(prunestate->visibility_cutoff_xid));
+		Assert(!TransactionIdIsValid(visibility_cutoff_xid));
 		visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
 						  vmbuffer, InvalidTransactionId,
 						  VISIBILITYMAP_ALL_VISIBLE |
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 5fd46b7bd1f..1edafd0f516 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1405,7 +1405,6 @@ LPVOID
 LPWSTR
 LSEG
 LUID
-LVPagePruneState
 LVRelState
 LVSavedErrInfo
 LWLock
-- 
2.37.2

#40Robert Haas
robertmhaas@gmail.com
In reply to: Melanie Plageman (#39)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Tue, Jan 9, 2024 at 5:42 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

Done in attached v6.

Don't kill me, but:

+                       /*
+                        * For now, pass no_indexes == false regardless of whether or not
+                        * the relation has indexes. In the future we may enable immediate
+                        * reaping for on access pruning.
+                        */
+                       heap_page_prune(relation, buffer, vistest, false,
+                                                       &presult, NULL);

My previous comments about lying seem equally applicable here.

-               if (!ItemIdIsUsed(itemid) || ItemIdIsDead(itemid))
+               if (!ItemIdIsUsed(itemid))
                        continue;

There is a bit of overhead here for the !no_indexes case. I assume it
doesn't matter.

 static void
 heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum)
 {
+       /*
+        * If the relation has no indexes, we can remove dead tuples during
+        * pruning instead of marking their line pointers dead. Set this tuple's
+        * line pointer LP_UNUSED. We hint that tables with indexes are more
+        * likely.
+        */
+       if (unlikely(prstate->no_indexes))
+       {
+               heap_prune_record_unused(prstate, offnum);
+               return;
+       }

I think this should be pushed to the callers. Else you make the
existing function name into another lie.

+ bool recordfreespace;

Not sure if it's necessary to move this to an outer scope like this?
The whole handling of this looks kind of confusing. If we're going to
do it this way, then I think lazy_scan_prune() definitely needs to
document how it handles this function (i.e. might set true to false,
won't set false to true, also meaning). But are we sure we want to let
a local variable with a weird name "leak out" like this?

+ Assert(vacrel->do_index_vacuuming);

Is this related?

--
Robert Haas
EDB: http://www.enterprisedb.com

#41Melanie Plageman
melanieplageman@gmail.com
In reply to: Robert Haas (#40)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Wed, Jan 10, 2024 at 3:54 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Jan 9, 2024 at 5:42 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

Done in attached v6.

Don't kill me, but:

+                       /*
+                        * For now, pass no_indexes == false regardless of whether or not
+                        * the relation has indexes. In the future we may enable immediate
+                        * reaping for on access pruning.
+                        */
+                       heap_page_prune(relation, buffer, vistest, false,
+                                                       &presult, NULL);

My previous comments about lying seem equally applicable here.

Yes, the options I can think of are:

1) rename the parameter to "immed_reap" or similar and make very clear
in heap_page_prune_opt() that you are not to pass true.
2) make immediate reaping work for on-access pruning. I would need a
low cost way to find out if there are any indexes on the table. Do you
think this is possible? Should we just rename the parameter for now
and think about that later?
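
For (2), the cheapest check I can think of is the relcache flag --
something like the sketch below. I'm not sure it is actually safe,
which is part of why I'm asking:

    /* hypothetical check in heap_page_prune_opt(); not in v6 */
    bool        no_indexes = !RelationGetForm(relation)->relhasindex;

    heap_page_prune(relation, buffer, vistest, no_indexes,
                    &presult, NULL);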

-               if (!ItemIdIsUsed(itemid) || ItemIdIsDead(itemid))
+               if (!ItemIdIsUsed(itemid))
continue;

There is a bit of overhead here for the !no_indexes case. I assume it
doesn't matter.

Right. Should be okay. Alternatively, and I'm not saying this is a
good idea, but we could throw this into the loop in heap_page_prune()
which calls heap_prune_chain():

+       if (ItemIdIsDead(itemid) && prstate.no_indexes)
+       {
+           heap_prune_record_unused(&prstate, offnum);
+           continue;
+       }

I think that is correct?
But, from a consistency perspective, we don't call the
heap_prune_record* functions directly from heap_page_prune() in any
other cases. And it seems like it would be nice to have all of those
in one place?

static void
heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum)
{
+       /*
+        * If the relation has no indexes, we can remove dead tuples during
+        * pruning instead of marking their line pointers dead. Set this tuple's
+        * line pointer LP_UNUSED. We hint that tables with indexes are more
+        * likely.
+        */
+       if (unlikely(prstate->no_indexes))
+       {
+               heap_prune_record_unused(prstate, offnum);
+               return;
+       }

I think this should be pushed to the callers. Else you make the
existing function name into another lie.

Yes, so I preferred it in the body of heap_prune_chain() (the caller).
Andres didn't like the extra level of indentation. I could wrap
heap_record_dead() and heap_record_unused(), but I couldn't really
think of a good wrapper name. heap_record_dead_or_unused() seems long
and literal. But, it might be the best I can do. I can't think of a
general word which encompasses both the transition to death and to
disposal.

+ bool recordfreespace;

Not sure if it's necessary to move this to an outer scope like this?
The whole handling of this looks kind of confusing. If we're going to
do it this way, then I think lazy_scan_prune() definitely needs to
document how it handles this function (i.e. might set true to false,
won't set false to true, also meaning). But are we sure we want to let
a local variable with a weird name "leak out" like this?

Which function do you mean when you say "document how
lazy_scan_prune() handles this function"? And no, we definitely don't
want a variable like this to be hanging out in lazy_scan_heap(), IMHO.
The other patches in the stack move the FSM updates into
lazy_scan_[no]prune() and eliminate this parameter. I moved up the
scope because lazy_scan_noprune() already had recordfreespace as an
output parameter and initialized it unconditionally inside. I
initialize it unconditionally in lazy_scan_prune() as well. I mention
in the commit message that this is temporary because we plan to
eliminate recordfreespace as an output parameter by updating the FSM
in lazy_scan_[no]prune(). I could have stuck recordfreespace into the
LVPagePruneState with the other output parameters. But, leaving it as
a separate output parameter made the diffs lovely for the rest of the
patches in the stack.

+ Assert(vacrel->do_index_vacuuming);

Is this related?

Yes, previously the assert was:
Assert(vacrel->nindexes == 0 || vacrel->do_index_vacuuming);
And we eliminated the caller of lazy_vacuum_heap_page() with
vacrel->nindexes == 0. Now it should only be called after doing index
vacuuming (thus index vacuuming should definitely be enabled).

- Melanie

#42Robert Haas
robertmhaas@gmail.com
In reply to: Melanie Plageman (#41)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Wed, Jan 10, 2024 at 5:28 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

Yes, the options I can think of are:

1) rename the parameter to "immed_reap" or similar and make very clear
in heap_page_prune_opt() that you are not to pass true.
2) make immediate reaping work for on-access pruning. I would need a
low cost way to find out if there are any indexes on the table. Do you
think this is possible? Should we just rename the parameter for now
and think about that later?

I don't think we can implement (2), because:

robert.haas=# create table test (a int);
CREATE TABLE
robert.haas=# begin;
BEGIN
robert.haas=*# select * from test;
a
---
(0 rows)

<in another window>

robert.haas=# create index on test (a);
CREATE INDEX

In English, the lock we hold during regular table access isn't strong
enough to foreclose concurrent addition of an index.

So renaming the parameter seems like the way to go. How about "mark_unused_now"?
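
Spelling that out (just a sketch, and the name is only my suggestion),
the two call sites would end up looking something like this, with
VACUUM the only caller that can ever pass true:

    /* heap_page_prune_opt(): on-access pruning can't rule out indexes */
    heap_page_prune(relation, buffer, vistest, false,
                    &presult, NULL);

    /* lazy_scan_prune(): safe to reap immediately when there are no indexes */
    heap_page_prune(rel, buf, vacrel->vistest, vacrel->nindexes == 0,
                    &presult, &vacrel->offnum);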

-               if (!ItemIdIsUsed(itemid) || ItemIdIsDead(itemid))
+               if (!ItemIdIsUsed(itemid))
continue;

There is a bit of overhead here for the !no_indexes case. I assume it
doesn't matter.

Right. Should be okay. Alternatively, and I'm not saying this is a
good idea, but we could throw this into the loop in heap_page_prune()
which calls heap_prune_chain():

+       if (ItemIdIsDead(itemid) && prstate.no_indexes)
+       {
+           heap_prune_record_unused(&prstate, offnum);
+           continue;
+       }

I think that is correct?

Wouldn't that be wrong in any case where heap_prune_chain() calls
heap_prune_record_dead()?

Yes, so I preferred it in the body of heap_prune_chain() (the caller).
Andres didn't like the extra level of indentation. I could wrap
heap_record_dead() and heap_record_unused(), but I couldn't really
think of a good wrapper name. heap_record_dead_or_unused() seems long
and literal. But, it might be the best I can do. I can't think of a
general word which encompasses both the transition to death and to
disposal.

I'm sure we could solve the wordsmithing problem but I think it's
clearer if the heap_prune_record_* functions don't get clever.

+ bool recordfreespace;

Not sure if it's necessary to move this to an outer scope like this?
The whole handling of this looks kind of confusing. If we're going to
do it this way, then I think lazy_scan_prune() definitely needs to
document how it handles this function (i.e. might set true to false,
won't set false to true, also meaning). But are we sure we want to let
a local variable with a weird name "leak out" like this?

Which function do you mean when you say "document how
lazy_scan_prune() handles this function".

"function" was a thinko for "variable".

And no we definitely don't
want a variable like this to be hanging out in lazy_scan_heap(), IMHO.
The other patches in the stack move the FSM updates into
lazy_scan_[no]prune() and eliminate this parameter. I moved up the
scope because lazy_scan_noprune() already had recordfreespace as an
output parameter and initialized it unconditionally inside. I
initialize it unconditionally in lazy_scan_prune() as well. I mention
in the commit message that this is temporary because we plan to
eliminate recordfreespace as an output parameter by updating the FSM
in lazy_scan_[no]prune(). I could have stuck recordfreespace into the
LVPagePruneState with the other output parameters. But, leaving it as
a separate output parameter made the diffs lovely for the rest of the
patches in the stack.

I guess I'll have to look at more of the patches to see what I think
of this. Looking at this patch in isolation, it's ugly, IMHO, but it
seems we agree on that.

+ Assert(vacrel->do_index_vacuuming);

Is this related?

Yes, previously the assert was:
Assert(vacrel->nindexes == 0 || vacrel->do_index_vacuuming);
And we eliminated the caller of lazy_vacuum_heap_page() with
vacrel->nindexes == 0. Now it should only be called after doing index
vacuuming (thus index vacuuming should definitely be enabled).

Ah.

--
Robert Haas
EDB: http://www.enterprisedb.com

#43Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#35)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Tue, Jan 9, 2024 at 2:35 PM Andres Freund <andres@anarazel.de> wrote:

I don't have that strong feelings about it. If both of you think it looks good, go ahead...

Done.

--
Robert Haas
EDB: http://www.enterprisedb.com

#44Melanie Plageman
melanieplageman@gmail.com
In reply to: Robert Haas (#42)
4 attachment(s)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

Attached v7 is rebased over the commit you just made to remove
LVPagePruneState->hastup.

On Thu, Jan 11, 2024 at 11:54 AM Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Jan 10, 2024 at 5:28 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

Yes, the options I can think of are:

1) rename the parameter to "immed_reap" or similar and make very clear
in heap_page_prune_opt() that you are not to pass true.
2) make immediate reaping work for on-access pruning. I would need a
low cost way to find out if there are any indexes on the table. Do you
think this is possible? Should we just rename the parameter for now
and think about that later?

I don't think we can implement (2), because:

robert.haas=# create table test (a int);
CREATE TABLE
robert.haas=# begin;
BEGIN
robert.haas=*# select * from test;
a
---
(0 rows)

<in another window>

robert.haas=# create index on test (a);
CREATE INDEX

In English, the lock we hold during regular table access isn't strong
enough to foreclose concurrent addition of an index.

Ah, I see.

So renaming the parameter seems like the way to go. How about "mark_unused_now"?

I've done this in attached v7.

-               if (!ItemIdIsUsed(itemid) || ItemIdIsDead(itemid))
+               if (!ItemIdIsUsed(itemid))
continue;

There is a bit of overhead here for the !no_indexes case. I assume it
doesn't matter.

Right. Should be okay. Alternatively, and I'm not saying this is a
good idea, but we could throw this into the loop in heap_page_prune()
which calls heap_prune_chain():

+       if (ItemIdIsDead(itemid) && prstate.no_indexes)
+       {
+           heap_prune_record_unused(&prstate, offnum);
+           continue;
+       }

I think that is correct?

Wouldn't that be wrong in any case where heap_prune_chain() calls
heap_prune_record_dead()?

Hmm. But previously, we skipped heap_prune_chain() for already dead
line pointers. In my patch, I rely on the test in the loop in
heap_prune_chain() to set those LP_UNUSED if mark_unused_now is true
(previously the below code just broke out of the loop).

/*
* Likewise, a dead line pointer can't be part of the chain. (We
* already eliminated the case of dead root tuple outside this
* function.)
*/
if (ItemIdIsDead(lp))
{
/*
* If the caller set mark_unused_now true, we can set dead line
* pointers LP_UNUSED now. We don't increment ndeleted here since
* the LP was already marked dead.
*/
if (unlikely(prstate->mark_unused_now))
heap_prune_record_unused(prstate, offnum);

break;
}

so wouldn't what I suggested simply set the item LP_UNUSED before
invoking heap_prune_chain()? Thereby achieving the same result without
invoking heap_prune_chain() for already dead line pointers? I could be
missing something. That heap_prune_chain() logic sometimes gets me
turned around.

Yes, so I preferred it in the body of heap_prune_chain() (the caller).
Andres didn't like the extra level of indentation. I could wrap
heap_record_dead() and heap_record_unused(), but I couldn't really
think of a good wrapper name. heap_record_dead_or_unused() seems long
and literal. But, it might be the best I can do. I can't think of a
general word which encompasses both the transition to death and to
disposal.

I'm sure we could solve the wordsmithing problem but I think it's
clearer if the heap_prune_record_* functions don't get clever.

I've gone with my heap_record_dead_or_unused() suggestion.
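
For reference, the wrapper ends up being roughly the following (the
spelled-out name is heap_prune_record_dead_or_unused(); the comment
wording in the attached patch may differ slightly):

static void
heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum)
{
    /*
     * If the relation has no indexes, reap the tuple right away rather
     * than leaving an LP_DEAD item behind for a second pass.
     */
    if (prstate->mark_unused_now)
        heap_prune_record_unused(prstate, offnum);
    else
        heap_prune_record_dead(prstate, offnum);
}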

- Melanie

Attachments:

v7-0002-Move-VM-update-code-to-lazy_scan_prune.patch (text/x-patch)
From dedbd202dfd800c3ac8b06b58c5d16a036b4a3e2 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Fri, 5 Jan 2024 11:12:58 -0500
Subject: [PATCH v7 2/4] Move VM update code to lazy_scan_prune()

The visibility map is updated after lazy_scan_prune() returns in
lazy_scan_heap(). Most of the output parameters of lazy_scan_prune() are
used to update the VM in lazy_scan_heap(). Moving that code into
lazy_scan_prune() simplifies lazy_scan_heap() and requires less
communication between the two. This is a pure lift and shift. No VM
update logic is changed. All of the members of the prunestate are now
obsolete. It will be removed in a future commit.
---
 src/backend/access/heap/vacuumlazy.c | 229 ++++++++++++++-------------
 1 file changed, 116 insertions(+), 113 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 85c0919dc71..82495ef8082 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -249,8 +249,8 @@ static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   bool sharelock, Buffer vmbuffer);
 static void lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
-							LVPagePruneState *prunestate,
-							bool *recordfreespace);
+							Buffer vmbuffer, bool all_visible_according_to_vm,
+							LVPagePruneState *prunestate, bool *recordfreespace);
 static bool lazy_scan_noprune(LVRelState *vacrel, Buffer buf,
 							  BlockNumber blkno, Page page,
 							  bool *recordfreespace);
@@ -1022,117 +1022,9 @@ lazy_scan_heap(LVRelState *vacrel)
 		 * tuple headers of remaining items with storage. It also determines
 		 * if truncating this block is safe.
 		 */
-		lazy_scan_prune(vacrel, buf, blkno, page, &prunestate, &recordfreespace);
-
-		Assert(!prunestate.all_visible || !prunestate.has_lpdead_items);
-
-		/*
-		 * Handle setting visibility map bit based on information from the VM
-		 * (as of last lazy_scan_skip() call), and from prunestate
-		 */
-		if (!all_visible_according_to_vm && prunestate.all_visible)
-		{
-			uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
-
-			if (prunestate.all_frozen)
-			{
-				Assert(!TransactionIdIsValid(prunestate.visibility_cutoff_xid));
-				flags |= VISIBILITYMAP_ALL_FROZEN;
-			}
-
-			/*
-			 * It should never be the case that the visibility map page is set
-			 * while the page-level bit is clear, but the reverse is allowed
-			 * (if checksums are not enabled).  Regardless, set both bits so
-			 * that we get back in sync.
-			 *
-			 * NB: If the heap page is all-visible but the VM bit is not set,
-			 * we don't need to dirty the heap page.  However, if checksums
-			 * are enabled, we do need to make sure that the heap page is
-			 * dirtied before passing it to visibilitymap_set(), because it
-			 * may be logged.  Given that this situation should only happen in
-			 * rare cases after a crash, it is not worth optimizing.
-			 */
-			PageSetAllVisible(page);
-			MarkBufferDirty(buf);
-			visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
-							  vmbuffer, prunestate.visibility_cutoff_xid,
-							  flags);
-		}
-
-		/*
-		 * As of PostgreSQL 9.2, the visibility map bit should never be set if
-		 * the page-level bit is clear.  However, it's possible that the bit
-		 * got cleared after lazy_scan_skip() was called, so we must recheck
-		 * with buffer lock before concluding that the VM is corrupt.
-		 */
-		else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
-				 visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
-		{
-			elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-				 vacrel->relname, blkno);
-			visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-								VISIBILITYMAP_VALID_BITS);
-		}
-
-		/*
-		 * It's possible for the value returned by
-		 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-		 * wrong for us to see tuples that appear to not be visible to
-		 * everyone yet, while PD_ALL_VISIBLE is already set. The real safe
-		 * xmin value never moves backwards, but
-		 * GetOldestNonRemovableTransactionId() is conservative and sometimes
-		 * returns a value that's unnecessarily small, so if we see that
-		 * contradiction it just means that the tuples that we think are not
-		 * visible to everyone yet actually are, and the PD_ALL_VISIBLE flag
-		 * is correct.
-		 *
-		 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE
-		 * set, however.
-		 */
-		else if (prunestate.has_lpdead_items && PageIsAllVisible(page))
-		{
-			elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-				 vacrel->relname, blkno);
-			PageClearAllVisible(page);
-			MarkBufferDirty(buf);
-			visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-								VISIBILITYMAP_VALID_BITS);
-		}
-
-		/*
-		 * If the all-visible page is all-frozen but not marked as such yet,
-		 * mark it as all-frozen.  Note that all_frozen is only valid if
-		 * all_visible is true, so we must check both prunestate fields.
-		 */
-		else if (all_visible_according_to_vm && prunestate.all_visible &&
-				 prunestate.all_frozen &&
-				 !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
-		{
-			/*
-			 * Avoid relying on all_visible_according_to_vm as a proxy for the
-			 * page-level PD_ALL_VISIBLE bit being set, since it might have
-			 * become stale -- even when all_visible is set in prunestate
-			 */
-			if (!PageIsAllVisible(page))
-			{
-				PageSetAllVisible(page);
-				MarkBufferDirty(buf);
-			}
-
-			/*
-			 * Set the page all-frozen (and all-visible) in the VM.
-			 *
-			 * We can pass InvalidTransactionId as our visibility_cutoff_xid,
-			 * since a snapshotConflictHorizon sufficient to make everything
-			 * safe for REDO was logged when the page's tuples were frozen.
-			 */
-			Assert(!TransactionIdIsValid(prunestate.visibility_cutoff_xid));
-			visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
-							  vmbuffer, InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE |
-							  VISIBILITYMAP_ALL_FROZEN);
-		}
+		lazy_scan_prune(vacrel, buf, blkno, page,
+						vmbuffer, all_visible_according_to_vm,
+						&prunestate, &recordfreespace);
 
 		/*
 		 * Final steps for block: drop cleanup lock, record free space in the
@@ -1486,6 +1378,8 @@ lazy_scan_prune(LVRelState *vacrel,
 				Buffer buf,
 				BlockNumber blkno,
 				Page page,
+				Buffer vmbuffer,
+				bool all_visible_according_to_vm,
 				LVPagePruneState *prunestate,
 				bool *recordfreespace)
 {
@@ -1881,6 +1775,115 @@ lazy_scan_prune(LVRelState *vacrel,
 	if (vacrel->nindexes == 0 ||
 		!vacrel->do_index_vacuuming || lpdead_items == 0)
 		*recordfreespace = true;
+
+	Assert(!prunestate->all_visible || !prunestate->has_lpdead_items);
+
+	/*
+	 * Handle setting visibility map bit based on information from the VM (as
+	 * of last lazy_scan_skip() call), and from prunestate
+	 */
+	if (!all_visible_according_to_vm && prunestate->all_visible)
+	{
+		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
+
+		if (prunestate->all_frozen)
+		{
+			Assert(!TransactionIdIsValid(prunestate->visibility_cutoff_xid));
+			flags |= VISIBILITYMAP_ALL_FROZEN;
+		}
+
+		/*
+		 * It should never be the case that the visibility map page is set
+		 * while the page-level bit is clear, but the reverse is allowed (if
+		 * checksums are not enabled).  Regardless, set both bits so that we
+		 * get back in sync.
+		 *
+		 * NB: If the heap page is all-visible but the VM bit is not set, we
+		 * don't need to dirty the heap page.  However, if checksums are
+		 * enabled, we do need to make sure that the heap page is dirtied
+		 * before passing it to visibilitymap_set(), because it may be logged.
+		 * Given that this situation should only happen in rare cases after a
+		 * crash, it is not worth optimizing.
+		 */
+		PageSetAllVisible(page);
+		MarkBufferDirty(buf);
+		visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
+						  vmbuffer, prunestate->visibility_cutoff_xid,
+						  flags);
+	}
+
+	/*
+	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+	 * page-level bit is clear.  However, it's possible that the bit got
+	 * cleared after lazy_scan_skip() was called, so we must recheck with
+	 * buffer lock before concluding that the VM is corrupt.
+	 */
+	else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
+			 visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
+	{
+		elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+			 vacrel->relname, blkno);
+		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+	}
+
+	/*
+	 * It's possible for the value returned by
+	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+	 * wrong for us to see tuples that appear to not be visible to everyone
+	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
+	 * conservative and sometimes returns a value that's unnecessarily small,
+	 * so if we see that contradiction it just means that the tuples that we
+	 * think are not visible to everyone yet actually are, and the
+	 * PD_ALL_VISIBLE flag is correct.
+	 *
+	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+	 * however.
+	 */
+	else if (prunestate->has_lpdead_items && PageIsAllVisible(page))
+	{
+		elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+			 vacrel->relname, blkno);
+		PageClearAllVisible(page);
+		MarkBufferDirty(buf);
+		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+	}
+
+	/*
+	 * If the all-visible page is all-frozen but not marked as such yet, mark
+	 * it as all-frozen.  Note that all_frozen is only valid if all_visible is
+	 * true, so we must check both prunestate fields.
+	 */
+	else if (all_visible_according_to_vm && prunestate->all_visible &&
+			 prunestate->all_frozen &&
+			 !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
+	{
+		/*
+		 * Avoid relying on all_visible_according_to_vm as a proxy for the
+		 * page-level PD_ALL_VISIBLE bit being set, since it might have become
+		 * stale -- even when all_visible is set in prunestate
+		 */
+		if (!PageIsAllVisible(page))
+		{
+			PageSetAllVisible(page);
+			MarkBufferDirty(buf);
+		}
+
+		/*
+		 * Set the page all-frozen (and all-visible) in the VM.
+		 *
+		 * We can pass InvalidTransactionId as our visibility_cutoff_xid,
+		 * since a snapshotConflictHorizon sufficient to make everything safe
+		 * for REDO was logged when the page's tuples were frozen.
+		 */
+		Assert(!TransactionIdIsValid(prunestate->visibility_cutoff_xid));
+		visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
+						  vmbuffer, InvalidTransactionId,
+						  VISIBILITYMAP_ALL_VISIBLE |
+						  VISIBILITYMAP_ALL_FROZEN);
+	}
 }
 
 /*
-- 
2.37.2

v7-0003-Inline-LVPagePruneState-members-in-lazy_scan_prun.patch (text/x-patch)
From 34081d04bf63af011ec94ed3203a45a916f96b06 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 9 Jan 2024 17:17:01 -0500
Subject: [PATCH v7 3/4] Inline LVPagePruneState members in lazy_scan_prune()

After moving the code to update the visibility map from lazy_scan_heap()
into lazy_scan_prune(), the LVPagePruneState is obsolete. Replace all
instances of use of its members in lazy_scan_prune() with local
variables and remove the struct.
---
 src/backend/access/heap/vacuumlazy.c | 113 +++++++++++++--------------
 src/tools/pgindent/typedefs.list     |   1 -
 2 files changed, 53 insertions(+), 61 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 82495ef8082..12c2fc78a43 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -212,23 +212,6 @@ typedef struct LVRelState
 	int64		missed_dead_tuples; /* # removable, but not removed */
 } LVRelState;
 
-/*
- * State returned by lazy_scan_prune()
- */
-typedef struct LVPagePruneState
-{
-	bool		has_lpdead_items;	/* includes existing LP_DEAD items */
-
-	/*
-	 * State describes the proper VM bit states to set for the page following
-	 * pruning and freezing.  all_visible implies !has_lpdead_items, but don't
-	 * trust all_frozen result unless all_visible is also set to true.
-	 */
-	bool		all_visible;	/* Every item visible to all? */
-	bool		all_frozen;		/* provided all_visible is also true */
-	TransactionId visibility_cutoff_xid;	/* For recovery conflicts */
-} LVPagePruneState;
-
 /* Struct for saving and restoring vacuum error information. */
 typedef struct LVSavedErrInfo
 {
@@ -250,7 +233,7 @@ static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 static void lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
 							Buffer vmbuffer, bool all_visible_according_to_vm,
-							LVPagePruneState *prunestate, bool *recordfreespace);
+							bool *recordfreespace);
 static bool lazy_scan_noprune(LVRelState *vacrel, Buffer buf,
 							  BlockNumber blkno, Page page,
 							  bool *recordfreespace);
@@ -856,7 +839,6 @@ lazy_scan_heap(LVRelState *vacrel)
 		Buffer		buf;
 		Page		page;
 		bool		all_visible_according_to_vm;
-		LVPagePruneState prunestate;
 
 		if (blkno == next_unskippable_block)
 		{
@@ -1024,7 +1006,7 @@ lazy_scan_heap(LVRelState *vacrel)
 		 */
 		lazy_scan_prune(vacrel, buf, blkno, page,
 						vmbuffer, all_visible_according_to_vm,
-						&prunestate, &recordfreespace);
+						&recordfreespace);
 
 		/*
 		 * Final steps for block: drop cleanup lock, record free space in the
@@ -1380,7 +1362,6 @@ lazy_scan_prune(LVRelState *vacrel,
 				Page page,
 				Buffer vmbuffer,
 				bool all_visible_according_to_vm,
-				LVPagePruneState *prunestate,
 				bool *recordfreespace)
 {
 	Relation	rel = vacrel->rel;
@@ -1394,6 +1375,9 @@ lazy_scan_prune(LVRelState *vacrel,
 				recently_dead_tuples;
 	HeapPageFreeze pagefrz;
 	bool		hastup = false;
+	bool		all_visible,
+				all_frozen;
+	TransactionId visibility_cutoff_xid;
 	int64		fpi_before = pgWalUsage.wal_fpi;
 	OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
 	HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
@@ -1435,14 +1419,22 @@ lazy_scan_prune(LVRelState *vacrel,
 
 	/*
 	 * Now scan the page to collect LP_DEAD items and check for tuples
-	 * requiring freezing among remaining tuples with storage
+	 * requiring freezing among remaining tuples with storage. Keep track of
+	 * the visibility cutoff xid for recovery conflicts.
 	 */
-	prunestate->has_lpdead_items = false;
-	prunestate->all_visible = true;
-	prunestate->all_frozen = true;
-	prunestate->visibility_cutoff_xid = InvalidTransactionId;
+	visibility_cutoff_xid = InvalidTransactionId;
 	*recordfreespace = false;
 
+	/*
+	 * We will update the VM after collecting LP_DEAD items and freezing
+	 * tuples. Keep track of whether or not the page is all_visible and
+	 * all_frozen and use this information to update the VM. all_visible
+	 * implies 0 lpdead_items, but don't trust all_frozen result unless
+	 * all_visible is also set to true.
+	 */
+	all_visible = true;
+	all_frozen = true;
+
 	for (offnum = FirstOffsetNumber;
 		 offnum <= maxoff;
 		 offnum = OffsetNumberNext(offnum))
@@ -1529,13 +1521,13 @@ lazy_scan_prune(LVRelState *vacrel,
 				 * asynchronously. See SetHintBits for more info. Check that
 				 * the tuple is hinted xmin-committed because of that.
 				 */
-				if (prunestate->all_visible)
+				if (all_visible)
 				{
 					TransactionId xmin;
 
 					if (!HeapTupleHeaderXminCommitted(htup))
 					{
-						prunestate->all_visible = false;
+						all_visible = false;
 						break;
 					}
 
@@ -1547,14 +1539,14 @@ lazy_scan_prune(LVRelState *vacrel,
 					if (!TransactionIdPrecedes(xmin,
 											   vacrel->cutoffs.OldestXmin))
 					{
-						prunestate->all_visible = false;
+						all_visible = false;
 						break;
 					}
 
 					/* Track newest xmin on page. */
-					if (TransactionIdFollows(xmin, prunestate->visibility_cutoff_xid) &&
+					if (TransactionIdFollows(xmin, visibility_cutoff_xid) &&
 						TransactionIdIsNormal(xmin))
-						prunestate->visibility_cutoff_xid = xmin;
+						visibility_cutoff_xid = xmin;
 				}
 				break;
 			case HEAPTUPLE_RECENTLY_DEAD:
@@ -1565,7 +1557,7 @@ lazy_scan_prune(LVRelState *vacrel,
 				 * pruning.)
 				 */
 				recently_dead_tuples++;
-				prunestate->all_visible = false;
+				all_visible = false;
 				break;
 			case HEAPTUPLE_INSERT_IN_PROGRESS:
 
@@ -1576,11 +1568,11 @@ lazy_scan_prune(LVRelState *vacrel,
 				 * results.  This assumption is a bit shaky, but it is what
 				 * acquire_sample_rows() does, so be consistent.
 				 */
-				prunestate->all_visible = false;
+				all_visible = false;
 				break;
 			case HEAPTUPLE_DELETE_IN_PROGRESS:
 				/* This is an expected case during concurrent vacuum */
-				prunestate->all_visible = false;
+				all_visible = false;
 
 				/*
 				 * Count such rows as live.  As above, we assume the deleting
@@ -1610,7 +1602,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		 * definitely cannot be set all-frozen in the visibility map later on
 		 */
 		if (!totally_frozen)
-			prunestate->all_frozen = false;
+			all_frozen = false;
 	}
 
 	/*
@@ -1628,7 +1620,7 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * page all-frozen afterwards (might not happen until final heap pass).
 	 */
 	if (pagefrz.freeze_required || tuples_frozen == 0 ||
-		(prunestate->all_visible && prunestate->all_frozen &&
+		(all_visible && all_frozen &&
 		 fpi_before != pgWalUsage.wal_fpi))
 	{
 		/*
@@ -1666,11 +1658,11 @@ lazy_scan_prune(LVRelState *vacrel,
 			 * once we're done with it.  Otherwise we generate a conservative
 			 * cutoff by stepping back from OldestXmin.
 			 */
-			if (prunestate->all_visible && prunestate->all_frozen)
+			if (all_visible && all_frozen)
 			{
 				/* Using same cutoff when setting VM is now unnecessary */
-				snapshotConflictHorizon = prunestate->visibility_cutoff_xid;
-				prunestate->visibility_cutoff_xid = InvalidTransactionId;
+				snapshotConflictHorizon = visibility_cutoff_xid;
+				visibility_cutoff_xid = InvalidTransactionId;
 			}
 			else
 			{
@@ -1693,7 +1685,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		 */
 		vacrel->NewRelfrozenXid = pagefrz.NoFreezePageRelfrozenXid;
 		vacrel->NewRelminMxid = pagefrz.NoFreezePageRelminMxid;
-		prunestate->all_frozen = false;
+		all_frozen = false;
 		tuples_frozen = 0;		/* avoid miscounts in instrumentation */
 	}
 
@@ -1706,16 +1698,17 @@ lazy_scan_prune(LVRelState *vacrel,
 	 */
 #ifdef USE_ASSERT_CHECKING
 	/* Note that all_frozen value does not matter when !all_visible */
-	if (prunestate->all_visible && lpdead_items == 0)
+	if (all_visible && lpdead_items == 0)
 	{
-		TransactionId cutoff;
-		bool		all_frozen;
+		TransactionId debug_cutoff;
+		bool		debug_all_frozen;
 
-		if (!heap_page_is_all_visible(vacrel, buf, &cutoff, &all_frozen))
+		if (!heap_page_is_all_visible(vacrel, buf,
+									  &debug_cutoff, &debug_all_frozen))
 			Assert(false);
 
-		Assert(!TransactionIdIsValid(cutoff) ||
-			   cutoff == prunestate->visibility_cutoff_xid);
+		Assert(!TransactionIdIsValid(debug_cutoff) ||
+			   debug_cutoff == visibility_cutoff_xid);
 	}
 #endif
 
@@ -1728,7 +1721,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		ItemPointerData tmp;
 
 		vacrel->lpdead_item_pages++;
-		prunestate->has_lpdead_items = true;
 
 		ItemPointerSetBlockNumber(&tmp, blkno);
 
@@ -1753,7 +1745,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		 * Now that freezing has been finalized, unset all_visible.  It needs
 		 * to reflect the present state of things, as expected by our caller.
 		 */
-		prunestate->all_visible = false;
+		all_visible = false;
 	}
 
 	/* Finally, add page-local counts to whole-VACUUM counts */
@@ -1776,19 +1768,20 @@ lazy_scan_prune(LVRelState *vacrel,
 		!vacrel->do_index_vacuuming || lpdead_items == 0)
 		*recordfreespace = true;
 
-	Assert(!prunestate->all_visible || !prunestate->has_lpdead_items);
+	Assert(!all_visible || lpdead_items == 0);
 
 	/*
 	 * Handle setting visibility map bit based on information from the VM (as
-	 * of last lazy_scan_skip() call), and from prunestate
+	 * of last lazy_scan_skip() call), and from all_visible and all_frozen
+	 * variables
 	 */
-	if (!all_visible_according_to_vm && prunestate->all_visible)
+	if (!all_visible_according_to_vm && all_visible)
 	{
 		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
 
-		if (prunestate->all_frozen)
+		if (all_frozen)
 		{
-			Assert(!TransactionIdIsValid(prunestate->visibility_cutoff_xid));
+			Assert(!TransactionIdIsValid(visibility_cutoff_xid));
 			flags |= VISIBILITYMAP_ALL_FROZEN;
 		}
 
@@ -1808,7 +1801,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		PageSetAllVisible(page);
 		MarkBufferDirty(buf);
 		visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
-						  vmbuffer, prunestate->visibility_cutoff_xid,
+						  vmbuffer, visibility_cutoff_xid,
 						  flags);
 	}
 
@@ -1841,7 +1834,7 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
 	 * however.
 	 */
-	else if (prunestate->has_lpdead_items && PageIsAllVisible(page))
+	else if (lpdead_items > 0 && PageIsAllVisible(page))
 	{
 		elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
 			 vacrel->relname, blkno);
@@ -1854,16 +1847,16 @@ lazy_scan_prune(LVRelState *vacrel,
 	/*
 	 * If the all-visible page is all-frozen but not marked as such yet, mark
 	 * it as all-frozen.  Note that all_frozen is only valid if all_visible is
-	 * true, so we must check both prunestate fields.
+	 * true, so we must check both all_visible and all_frozen.
 	 */
-	else if (all_visible_according_to_vm && prunestate->all_visible &&
-			 prunestate->all_frozen &&
+	else if (all_visible_according_to_vm && all_visible &&
+			 all_frozen &&
 			 !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
 	{
 		/*
 		 * Avoid relying on all_visible_according_to_vm as a proxy for the
 		 * page-level PD_ALL_VISIBLE bit being set, since it might have become
-		 * stale -- even when all_visible is set in prunestate
+		 * stale -- even when all_visible is set
 		 */
 		if (!PageIsAllVisible(page))
 		{
@@ -1878,7 +1871,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		 * since a snapshotConflictHorizon sufficient to make everything safe
 		 * for REDO was logged when the page's tuples were frozen.
 		 */
-		Assert(!TransactionIdIsValid(prunestate->visibility_cutoff_xid));
+		Assert(!TransactionIdIsValid(visibility_cutoff_xid));
 		visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
 						  vmbuffer, InvalidTransactionId,
 						  VISIBILITYMAP_ALL_VISIBLE |
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index f582eb59e7d..e0b0ed53d66 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1405,7 +1405,6 @@ LPVOID
 LPWSTR
 LSEG
 LUID
-LVPagePruneState
 LVRelState
 LVSavedErrInfo
 LWLock
-- 
2.37.2

v7-0001-Set-would-be-dead-items-LP_UNUSED-while-pruning.patch (text/x-patch)
From 9edfd48edbbc4022370bc232fd680fd548a9dd67 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 9 Jan 2024 15:00:24 -0500
Subject: [PATCH v7 1/4] Set would-be dead items LP_UNUSED while pruning

If there are no indexes on a relation, items can be marked LP_UNUSED
instead of LP_DEAD while pruning. This avoids a separate
invocation of lazy_vacuum_heap_page() and saves a vacuum WAL record.

To accomplish this, pass heap_page_prune() a new parameter, mark_unused_now,
which indicates that dead line pointers should be set to LP_UNUSED
during pruning, allowing earlier reaping of tuples.

We only enable this for vacuum's pruning. On access pruning always
passes mark_unused_now == false.

Because we don't update the freespace map until after dropping the lock
on the buffer and we need the lock while we update the visibility map,
save our intent to update the freespace map in output parameter
recordfreespace. This parameter will likely be removed, along with the
LVPagePruneState, by future commits pushing per-page processing from
lazy_scan_heap() into lazy_scan_[no]prune().

Discussion: https://postgr.es/m/CAAKRu_bgvb_k0gKOXWzNKWHt560R0smrGe3E8zewKPs8fiMKkw%40mail.gmail.com
---
 src/backend/access/heap/pruneheap.c  |  79 +++++++++++---
 src/backend/access/heap/vacuumlazy.c | 153 ++++++++++-----------------
 src/include/access/heapam.h          |   1 +
 3 files changed, 126 insertions(+), 107 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 3e0a1a260e6..be83f7bdf05 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -35,6 +35,8 @@ typedef struct
 
 	/* tuple visibility test, initialized for the relation */
 	GlobalVisState *vistest;
+	/* whether or not dead items can be set LP_UNUSED during pruning */
+	bool		mark_unused_now;
 
 	TransactionId new_prune_xid;	/* new prune hint value for page */
 	TransactionId snapshotConflictHorizon;	/* latest xid removed */
@@ -67,6 +69,7 @@ static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
 static void heap_prune_record_redirect(PruneState *prstate,
 									   OffsetNumber offnum, OffsetNumber rdoffnum);
 static void heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum);
+static void heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum);
 static void heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum);
 static void page_verify_redirects(Page page);
 
@@ -148,7 +151,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 		{
 			PruneResult presult;
 
-			heap_page_prune(relation, buffer, vistest, &presult, NULL);
+			/*
+			 * For now, pass mark_unused_now == false regardless of whether or
+			 * not the relation has indexes, since we cannot safely determine
+			 * that during on-access pruning with the current implementation.
+			 */
+			heap_page_prune(relation, buffer, vistest, false,
+							&presult, NULL);
 
 			/*
 			 * Report the number of tuples reclaimed to pgstats.  This is
@@ -193,6 +202,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
  * (see heap_prune_satisfies_vacuum and
  * HeapTupleSatisfiesVacuum).
  *
+ * mark_unused_now indicates whether or not dead items can be set LP_UNUSED during
+ * pruning.
+ *
  * off_loc is the offset location required by the caller to use in error
  * callback.
  *
@@ -203,6 +215,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 void
 heap_page_prune(Relation relation, Buffer buffer,
 				GlobalVisState *vistest,
+				bool mark_unused_now,
 				PruneResult *presult,
 				OffsetNumber *off_loc)
 {
@@ -227,6 +240,7 @@ heap_page_prune(Relation relation, Buffer buffer,
 	prstate.new_prune_xid = InvalidTransactionId;
 	prstate.rel = relation;
 	prstate.vistest = vistest;
+	prstate.mark_unused_now = mark_unused_now;
 	prstate.snapshotConflictHorizon = InvalidTransactionId;
 	prstate.nredirected = prstate.ndead = prstate.nunused = 0;
 	memset(prstate.marked, 0, sizeof(prstate.marked));
@@ -306,9 +320,9 @@ heap_page_prune(Relation relation, Buffer buffer,
 		if (off_loc)
 			*off_loc = offnum;
 
-		/* Nothing to do if slot is empty or already dead */
+		/* Nothing to do if slot is empty */
 		itemid = PageGetItemId(page, offnum);
-		if (!ItemIdIsUsed(itemid) || ItemIdIsDead(itemid))
+		if (!ItemIdIsUsed(itemid))
 			continue;
 
 		/* Process this item or chain of items */
@@ -581,7 +595,17 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
 		 * function.)
 		 */
 		if (ItemIdIsDead(lp))
+		{
+			/*
+			 * If the caller set mark_unused_now true, we can set dead line
+			 * pointers LP_UNUSED now. We don't increment ndeleted here since
+			 * the LP was already marked dead.
+			 */
+			if (unlikely(prstate->mark_unused_now))
+				heap_prune_record_unused(prstate, offnum);
+
 			break;
+		}
 
 		Assert(ItemIdIsNormal(lp));
 		htup = (HeapTupleHeader) PageGetItem(dp, lp);
@@ -715,7 +739,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
 		 * redirect the root to the correct chain member.
 		 */
 		if (i >= nchain)
-			heap_prune_record_dead(prstate, rootoffnum);
+			heap_prune_record_dead_or_unused(prstate, rootoffnum);
 		else
 			heap_prune_record_redirect(prstate, rootoffnum, chainitems[i]);
 	}
@@ -726,9 +750,9 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
 		 * item.  This can happen if the loop in heap_page_prune caused us to
 		 * visit the dead successor of a redirect item before visiting the
 		 * redirect item.  We can clean up by setting the redirect item to
-		 * DEAD state.
+		 * DEAD state or LP_UNUSED if the caller indicated.
 		 */
-		heap_prune_record_dead(prstate, rootoffnum);
+		heap_prune_record_dead_or_unused(prstate, rootoffnum);
 	}
 
 	return ndeleted;
@@ -774,6 +798,27 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum)
 	prstate->marked[offnum] = true;
 }
 
+/*
+ * Depending on whether or not the caller set mark_unused_now to true, record that a
+ * line pointer should be marked LP_DEAD or LP_UNUSED. There are other cases in
+ * which we will mark line pointers LP_UNUSED, but we will not mark line
+ * pointers LP_DEAD if mark_unused_now is true.
+ */
+static void
+heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum)
+{
+	/*
+	 * If the caller set mark_unused_now to true, we can remove dead tuples
+	 * during pruning instead of marking their line pointers dead. Set this
+	 * tuple's line pointer LP_UNUSED. We hint that this option is less
+	 * likely.
+	 */
+	if (unlikely(prstate->mark_unused_now))
+		heap_prune_record_unused(prstate, offnum);
+	else
+		heap_prune_record_dead(prstate, offnum);
+}
+
 /* Record line pointer to be marked unused */
 static void
 heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum)
@@ -903,13 +948,23 @@ heap_page_prune_execute(Buffer buffer,
 #ifdef USE_ASSERT_CHECKING
 
 		/*
-		 * Only heap-only tuples can become LP_UNUSED during pruning.  They
-		 * don't need to be left in place as LP_DEAD items until VACUUM gets
-		 * around to doing index vacuuming.
+		 * During pruning, the caller may have passed mark_unused_now == true,
+		 * which allows would-be dead items to be set LP_UNUSED instead. This
+		 * is only possible if the relation has no indexes. If there are any
+		 * dead items, then mark_unused_now was not true and every item being
+		 * marked LP_UNUSED must refer to a heap-only tuple.
 		 */
-		Assert(ItemIdHasStorage(lp) && ItemIdIsNormal(lp));
-		htup = (HeapTupleHeader) PageGetItem(page, lp);
-		Assert(HeapTupleHeaderIsHeapOnly(htup));
+		if (ndead > 0)
+		{
+			Assert(ItemIdHasStorage(lp) && ItemIdIsNormal(lp));
+			htup = (HeapTupleHeader) PageGetItem(page, lp);
+			Assert(HeapTupleHeaderIsHeapOnly(htup));
+		}
+		else
+		{
+			Assert(ItemIdIsUsed(lp));
+		}
+
 #endif
 
 		ItemIdSetUnused(lp);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index b63cad1335f..85c0919dc71 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -249,7 +249,8 @@ static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   bool sharelock, Buffer vmbuffer);
 static void lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
-							LVPagePruneState *prunestate);
+							LVPagePruneState *prunestate,
+							bool *recordfreespace);
 static bool lazy_scan_noprune(LVRelState *vacrel, Buffer buf,
 							  BlockNumber blkno, Page page,
 							  bool *recordfreespace);
@@ -829,8 +830,10 @@ lazy_scan_heap(LVRelState *vacrel)
 				next_fsm_block_to_vacuum = 0;
 	VacDeadItems *dead_items = vacrel->dead_items;
 	Buffer		vmbuffer = InvalidBuffer;
+	int			tuples_already_deleted;
 	bool		next_unskippable_allvis,
 				skipping_current_range;
+	bool		recordfreespace;
 	const int	initprog_index[] = {
 		PROGRESS_VACUUM_PHASE,
 		PROGRESS_VACUUM_TOTAL_HEAP_BLKS,
@@ -958,8 +961,6 @@ lazy_scan_heap(LVRelState *vacrel)
 		page = BufferGetPage(buf);
 		if (!ConditionalLockBufferForCleanup(buf))
 		{
-			bool		recordfreespace;
-
 			LockBuffer(buf, BUFFER_LOCK_SHARE);
 
 			/* Check for new or empty pages before lazy_scan_noprune call */
@@ -1009,6 +1010,8 @@ lazy_scan_heap(LVRelState *vacrel)
 			continue;
 		}
 
+		tuples_already_deleted = vacrel->tuples_deleted;
+
 		/*
 		 * Prune, freeze, and count tuples.
 		 *
@@ -1019,73 +1022,10 @@ lazy_scan_heap(LVRelState *vacrel)
 		 * tuple headers of remaining items with storage. It also determines
 		 * if truncating this block is safe.
 		 */
-		lazy_scan_prune(vacrel, buf, blkno, page, &prunestate);
+		lazy_scan_prune(vacrel, buf, blkno, page, &prunestate, &recordfreespace);
 
 		Assert(!prunestate.all_visible || !prunestate.has_lpdead_items);
 
-		if (vacrel->nindexes == 0)
-		{
-			/*
-			 * Consider the need to do page-at-a-time heap vacuuming when
-			 * using the one-pass strategy now.
-			 *
-			 * The one-pass strategy will never call lazy_vacuum().  The steps
-			 * performed here can be thought of as the one-pass equivalent of
-			 * a call to lazy_vacuum().
-			 */
-			if (prunestate.has_lpdead_items)
-			{
-				Size		freespace;
-
-				lazy_vacuum_heap_page(vacrel, blkno, buf, 0, vmbuffer);
-
-				/* Forget the LP_DEAD items that we just vacuumed */
-				dead_items->num_items = 0;
-
-				/*
-				 * Now perform FSM processing for blkno, and move on to next
-				 * page.
-				 *
-				 * Our call to lazy_vacuum_heap_page() will have considered if
-				 * it's possible to set all_visible/all_frozen independently
-				 * of lazy_scan_prune().  Note that prunestate was invalidated
-				 * by lazy_vacuum_heap_page() call.
-				 */
-				freespace = PageGetHeapFreeSpace(page);
-
-				UnlockReleaseBuffer(buf);
-				RecordPageWithFreeSpace(vacrel->rel, blkno, freespace);
-
-				/*
-				 * Periodically perform FSM vacuuming to make newly-freed
-				 * space visible on upper FSM pages. FreeSpaceMapVacuumRange()
-				 * vacuums the portion of the freespace map covering heap
-				 * pages from start to end - 1. Include the block we just
-				 * vacuumed by passing it blkno + 1. Overflow isn't an issue
-				 * because MaxBlockNumber + 1 is InvalidBlockNumber which
-				 * causes FreeSpaceMapVacuumRange() to vacuum freespace map
-				 * pages covering the remainder of the relation.
-				 */
-				if (blkno - next_fsm_block_to_vacuum >= VACUUM_FSM_EVERY_PAGES)
-				{
-					FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum,
-											blkno + 1);
-					next_fsm_block_to_vacuum = blkno + 1;
-				}
-
-				continue;
-			}
-
-			/*
-			 * There was no call to lazy_vacuum_heap_page() because pruning
-			 * didn't encounter/create any LP_DEAD items that needed to be
-			 * vacuumed.  Prune state has not been invalidated, so proceed
-			 * with prunestate-driven visibility map and FSM steps (just like
-			 * the two-pass strategy).
-			 */
-			Assert(dead_items->num_items == 0);
-		}
-
 		/*
 		 * Handle setting visibility map bit based on information from the VM
 		 * (as of last lazy_scan_skip() call), and from prunestate
@@ -1196,38 +1136,45 @@ lazy_scan_heap(LVRelState *vacrel)
 
 		/*
 		 * Final steps for block: drop cleanup lock, record free space in the
-		 * FSM
+		 * FSM.
+		 *
+		 * If we will likely do index vacuuming, wait until
+		 * lazy_vacuum_heap_rel() to save free space. This doesn't just save
+		 * us some cycles; it also allows us to record any additional free
+		 * space that lazy_vacuum_heap_page() will make available in cases
+		 * where it's possible to truncate the page's line pointer array.
+		 *
+		 * Note: It's not in fact 100% certain that we really will call
+		 * lazy_vacuum_heap_rel() -- lazy_vacuum() might yet opt to skip index
+		 * vacuuming (and so must skip heap vacuuming).  This is deemed okay
+		 * because it only happens in emergencies, or when there is very
+		 * little free space anyway. (Besides, we start recording free space
+		 * in the FSM once index vacuuming has been abandoned.)
 		 */
-		if (prunestate.has_lpdead_items && vacrel->do_index_vacuuming)
-		{
-			/*
-			 * Wait until lazy_vacuum_heap_rel() to save free space.  This
-			 * doesn't just save us some cycles; it also allows us to record
-			 * any additional free space that lazy_vacuum_heap_page() will
-			 * make available in cases where it's possible to truncate the
-			 * page's line pointer array.
-			 *
-			 * Note: It's not in fact 100% certain that we really will call
-			 * lazy_vacuum_heap_rel() -- lazy_vacuum() might yet opt to skip
-			 * index vacuuming (and so must skip heap vacuuming).  This is
-			 * deemed okay because it only happens in emergencies, or when
-			 * there is very little free space anyway. (Besides, we start
-			 * recording free space in the FSM once index vacuuming has been
-			 * abandoned.)
-			 *
-			 * Note: The one-pass (no indexes) case is only supposed to make
-			 * it this far when there were no LP_DEAD items during pruning.
-			 */
-			Assert(vacrel->nindexes > 0);
-			UnlockReleaseBuffer(buf);
-		}
-		else
+		if (recordfreespace)
 		{
 			Size		freespace = PageGetHeapFreeSpace(page);
 
 			UnlockReleaseBuffer(buf);
 			RecordPageWithFreeSpace(vacrel->rel, blkno, freespace);
 		}
+		else
+			UnlockReleaseBuffer(buf);
+
+		/*
+		 * Periodically perform FSM vacuuming to make newly-freed space
+		 * visible on upper FSM pages. This is done after vacuuming if the
+		 * table has indexes.
+		 */
+		if (vacrel->nindexes == 0 &&
+			vacrel->tuples_deleted > tuples_already_deleted &&
+			(blkno - next_fsm_block_to_vacuum >= VACUUM_FSM_EVERY_PAGES))
+		{
+			FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum,
+									blkno);
+			next_fsm_block_to_vacuum = blkno;
+		}
+
 	}
 
 	vacrel->blkno = InvalidBlockNumber;
@@ -1539,7 +1486,8 @@ lazy_scan_prune(LVRelState *vacrel,
 				Buffer buf,
 				BlockNumber blkno,
 				Page page,
-				LVPagePruneState *prunestate)
+				LVPagePruneState *prunestate,
+				bool *recordfreespace)
 {
 	Relation	rel = vacrel->rel;
 	OffsetNumber offnum,
@@ -1583,8 +1531,13 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * in presult.ndeleted. It should not be confused with lpdead_items;
 	 * lpdead_items's final value can be thought of as the number of tuples
 	 * that were deleted from indexes.
+	 *
+	 * If the relation has no indexes, we can immediately mark would-be dead
+	 * items LP_UNUSED, so mark_unused_now should be true if no indexes and
+	 * false otherwise.
 	 */
-	heap_page_prune(rel, buf, vacrel->vistest, &presult, &vacrel->offnum);
+	heap_page_prune(rel, buf, vacrel->vistest, vacrel->nindexes == 0,
+					&presult, &vacrel->offnum);
 
 	/*
 	 * Now scan the page to collect LP_DEAD items and check for tuples
@@ -1594,6 +1547,7 @@ lazy_scan_prune(LVRelState *vacrel,
 	prunestate->all_visible = true;
 	prunestate->all_frozen = true;
 	prunestate->visibility_cutoff_xid = InvalidTransactionId;
+	*recordfreespace = false;
 
 	for (offnum = FirstOffsetNumber;
 		 offnum <= maxoff;
@@ -1918,6 +1872,15 @@ lazy_scan_prune(LVRelState *vacrel,
 	/* Can't truncate this page */
 	if (hastup)
 		vacrel->nonempty_pages = blkno + 1;
+
+	/*
+	 * If we will not do index vacuuming, either because we have no indexes,
+	 * because there is nothing to vacuum, or because do_index_vacuuming is
+	 * false, make sure we update the freespace map.
+	 */
+	if (vacrel->nindexes == 0 ||
+		!vacrel->do_index_vacuuming || lpdead_items == 0)
+		*recordfreespace = true;
 }
 
 /*
@@ -2519,7 +2482,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	bool		all_frozen;
 	LVSavedErrInfo saved_err_info;
 
-	Assert(vacrel->nindexes == 0 || vacrel->do_index_vacuuming);
+	Assert(vacrel->do_index_vacuuming);
 
 	pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_VACUUMED, blkno);
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 932ec0d6f2b..4b133f68593 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -320,6 +320,7 @@ struct GlobalVisState;
 extern void heap_page_prune_opt(Relation relation, Buffer buffer);
 extern void heap_page_prune(Relation relation, Buffer buffer,
 							struct GlobalVisState *vistest,
+							bool mark_unused_now,
 							PruneResult *presult,
 							OffsetNumber *off_loc);
 extern void heap_page_prune_execute(Buffer buffer,
-- 
2.37.2

v7-0004-Move-freespace-map-update-into-lazy_scan_-no-prun.patch (text/x-patch)
From d58851de77c61b70abb6e37b8b7192cba1df756e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 9 Jan 2024 17:18:06 -0500
Subject: [PATCH v7 4/4] Move freespace map update into lazy_scan_[no]prune()

Previously, lazy_scan_heap() updated the freespace map after calling
both lazy_scan_prune() and lazy_scan_noprune() if any space has been
reclaimed. They both had an output parameter recordfreespace to enable
lazy_scan_heap() to do this. Instead, update the freespace map in
lazy_scan_prune() and lazy_scan_noprune(). With this change, all
page-specific processing is done in lazy_scan_prune(),
lazy_scan_noprune(), and lazy_scan_new_or_empty(). This simplifies
lazy_scan_heap().
---
 src/backend/access/heap/vacuumlazy.c | 118 +++++++++++++--------------
 1 file changed, 58 insertions(+), 60 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 12c2fc78a43..cacd87cf0a9 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -232,11 +232,9 @@ static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   bool sharelock, Buffer vmbuffer);
 static void lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
-							Buffer vmbuffer, bool all_visible_according_to_vm,
-							bool *recordfreespace);
+							Buffer vmbuffer, bool all_visible_according_to_vm);
 static bool lazy_scan_noprune(LVRelState *vacrel, Buffer buf,
-							  BlockNumber blkno, Page page,
-							  bool *recordfreespace);
+							  BlockNumber blkno, Page page);
 static void lazy_vacuum(LVRelState *vacrel);
 static bool lazy_vacuum_all_indexes(LVRelState *vacrel);
 static void lazy_vacuum_heap_rel(LVRelState *vacrel);
@@ -816,7 +814,6 @@ lazy_scan_heap(LVRelState *vacrel)
 	int			tuples_already_deleted;
 	bool		next_unskippable_allvis,
 				skipping_current_range;
-	bool		recordfreespace;
 	const int	initprog_index[] = {
 		PROGRESS_VACUUM_PHASE,
 		PROGRESS_VACUUM_TOTAL_HEAP_BLKS,
@@ -955,24 +952,14 @@ lazy_scan_heap(LVRelState *vacrel)
 
 			/*
 			 * Collect LP_DEAD items in dead_items array, count tuples,
-			 * determine if rel truncation is safe
+			 * determine if rel truncation is safe, and update the FSM.
 			 */
-			if (lazy_scan_noprune(vacrel, buf, blkno, page,
-								  &recordfreespace))
+			if (lazy_scan_noprune(vacrel, buf, blkno, page))
 			{
-				Size		freespace = 0;
-
 				/*
-				 * Processed page successfully (without cleanup lock) -- just
-				 * need to update the FSM, much like the lazy_scan_prune case.
-				 * Don't bother trying to match its visibility map setting
-				 * steps, though.
+				 * Processed page successfully (without cleanup lock). Buffer
+				 * lock and pin released.
 				 */
-				if (recordfreespace)
-					freespace = PageGetHeapFreeSpace(page);
-				UnlockReleaseBuffer(buf);
-				if (recordfreespace)
-					RecordPageWithFreeSpace(vacrel->rel, blkno, freespace);
 				continue;
 			}
 
@@ -1002,38 +989,11 @@ lazy_scan_heap(LVRelState *vacrel)
 		 * pruned ourselves, as well as existing LP_DEAD line pointers that
 		 * were pruned some time earlier.  Also considers freezing XIDs in the
 		 * tuple headers of remaining items with storage. It also determines
-		 * if truncating this block is safe.
+		 * if truncating this block is safe and updates the FSM (if relevant).
+		 * Buffer is returned with lock and pin released.
 		 */
 		lazy_scan_prune(vacrel, buf, blkno, page,
-						vmbuffer, all_visible_according_to_vm,
-						&recordfreespace);
-
-		/*
-		 * Final steps for block: drop cleanup lock, record free space in the
-		 * FSM.
-		 *
-		 * If we will likely do index vacuuming, wait until
-		 * lazy_vacuum_heap_rel() to save free space. This doesn't just save
-		 * us some cycles; it also allows us to record any additional free
-		 * space that lazy_vacuum_heap_page() will make available in cases
-		 * where it's possible to truncate the page's line pointer array.
-		 *
-		 * Note: It's not in fact 100% certain that we really will call
-		 * lazy_vacuum_heap_rel() -- lazy_vacuum() might yet opt to skip index
-		 * vacuuming (and so must skip heap vacuuming).  This is deemed okay
-		 * because it only happens in emergencies, or when there is very
-		 * little free space anyway. (Besides, we start recording free space
-		 * in the FSM once index vacuuming has been abandoned.)
-		 */
-		if (recordfreespace)
-		{
-			Size		freespace = PageGetHeapFreeSpace(page);
-
-			UnlockReleaseBuffer(buf);
-			RecordPageWithFreeSpace(vacrel->rel, blkno, freespace);
-		}
-		else
-			UnlockReleaseBuffer(buf);
+						vmbuffer, all_visible_according_to_vm);
 
 		/*
 		 * Periodically perform FSM vacuuming to make newly-freed space
@@ -1361,8 +1321,7 @@ lazy_scan_prune(LVRelState *vacrel,
 				BlockNumber blkno,
 				Page page,
 				Buffer vmbuffer,
-				bool all_visible_according_to_vm,
-				bool *recordfreespace)
+				bool all_visible_according_to_vm)
 {
 	Relation	rel = vacrel->rel;
 	OffsetNumber offnum,
@@ -1377,6 +1336,7 @@ lazy_scan_prune(LVRelState *vacrel,
 	bool		hastup = false;
 	bool		all_visible,
 				all_frozen;
+	bool		recordfreespace;
 	TransactionId visibility_cutoff_xid;
 	int64		fpi_before = pgWalUsage.wal_fpi;
 	OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
@@ -1423,7 +1383,7 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * the visibility cutoff xid for recovery conflicts.
 	 */
 	visibility_cutoff_xid = InvalidTransactionId;
-	*recordfreespace = false;
+	recordfreespace = false;
 
 	/*
 	 * We will update the VM after collecting LP_DEAD items and freezing
@@ -1766,7 +1726,7 @@ lazy_scan_prune(LVRelState *vacrel,
 	 */
 	if (vacrel->nindexes == 0 ||
 		!vacrel->do_index_vacuuming || lpdead_items == 0)
-		*recordfreespace = true;
+		recordfreespace = true;
 
 	Assert(!all_visible || lpdead_items == 0);
 
@@ -1877,6 +1837,32 @@ lazy_scan_prune(LVRelState *vacrel,
 						  VISIBILITYMAP_ALL_VISIBLE |
 						  VISIBILITYMAP_ALL_FROZEN);
 	}
+
+	/*
+	 * Final steps for block: drop cleanup lock, record free space in the FSM.
+	 *
+	 * If we will likely do index vacuuming, wait until lazy_vacuum_heap_rel()
+	 * to save free space. This doesn't just save us some cycles; it also
+	 * allows us to record any additional free space that
+	 * lazy_vacuum_heap_page() will make available in cases where it's
+	 * possible to truncate the page's line pointer array.
+	 *
+	 * Note: It's not in fact 100% certain that we really will call
+	 * lazy_vacuum_heap_rel() -- lazy_vacuum() might yet opt to skip index
+	 * vacuuming (and so must skip heap vacuuming).  This is deemed okay
+	 * because it only happens in emergencies, or when there is very little
+	 * free space anyway. (Besides, we start recording free space in the FSM
+	 * once index vacuuming has been abandoned.)
+	 */
+	if (recordfreespace)
+	{
+		Size		freespace = PageGetHeapFreeSpace(page);
+
+		UnlockReleaseBuffer(buf);
+		RecordPageWithFreeSpace(vacrel->rel, blkno, freespace);
+	}
+	else
+		UnlockReleaseBuffer(buf);
 }
 
 /*
@@ -1901,8 +1887,7 @@ static bool
 lazy_scan_noprune(LVRelState *vacrel,
 				  Buffer buf,
 				  BlockNumber blkno,
-				  Page page,
-				  bool *recordfreespace)
+				  Page page)
 {
 	OffsetNumber offnum,
 				maxoff;
@@ -1911,6 +1896,8 @@ lazy_scan_noprune(LVRelState *vacrel,
 				recently_dead_tuples,
 				missed_dead_tuples;
 	bool		hastup;
+	bool		recordfreespace;
+	Size		freespace = 0;
 	HeapTupleHeader tupleheader;
 	TransactionId NoFreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
 	MultiXactId NoFreezePageRelminMxid = vacrel->NewRelminMxid;
@@ -1919,7 +1906,7 @@ lazy_scan_noprune(LVRelState *vacrel,
 	Assert(BufferGetBlockNumber(buf) == blkno);
 
 	hastup = false;				/* for now */
-	*recordfreespace = false;	/* for now */
+	recordfreespace = false;	/* for now */
 
 	lpdead_items = 0;
 	live_tuples = 0;
@@ -2062,7 +2049,7 @@ lazy_scan_noprune(LVRelState *vacrel,
 			missed_dead_tuples += lpdead_items;
 		}
 
-		*recordfreespace = true;
+		recordfreespace = true;
 	}
 	else if (lpdead_items == 0)
 	{
@@ -2070,7 +2057,7 @@ lazy_scan_noprune(LVRelState *vacrel,
 		 * Won't be vacuuming this page later, so record page's freespace in
 		 * the FSM now
 		 */
-		*recordfreespace = true;
+		recordfreespace = true;
 	}
 	else
 	{
@@ -2102,7 +2089,7 @@ lazy_scan_noprune(LVRelState *vacrel,
 		 * Assume that we'll go on to vacuum this heap page during final pass
 		 * over the heap.  Don't record free space until then.
 		 */
-		*recordfreespace = false;
+		recordfreespace = false;
 	}
 
 	/*
@@ -2118,7 +2105,18 @@ lazy_scan_noprune(LVRelState *vacrel,
 	if (hastup)
 		vacrel->nonempty_pages = blkno + 1;
 
-	/* Caller won't need to call lazy_scan_prune with same page */
+	/*
+	 * Since we processed page successfully without cleanup lock, caller won't
+	 * need to call lazy_scan_prune() with the same page. Now, we just need to
+	 * update the FSM, much like the lazy_scan_prune case. Don't bother trying
+	 * to match its visibility map setting steps, though.
+	 */
+	if (recordfreespace)
+		freespace = PageGetHeapFreeSpace(page);
+	UnlockReleaseBuffer(buf);
+	if (recordfreespace)
+		RecordPageWithFreeSpace(vacrel->rel, blkno, freespace);
+
 	return true;
 }
 
-- 
2.37.2

#45Robert Haas
robertmhaas@gmail.com
In reply to: Melanie Plageman (#44)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Thu, Jan 11, 2024 at 2:30 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

Attached v7 is rebased over the commit you just made to remove
LVPagePruneState->hastup.

Apologies for making you rebase but I was tired of thinking about that patch.

I'm still kind of hung up on the changes that 0001 makes to vacuumlazy.c.

Say we didn't add the recordfreespace parameter to lazy_scan_prune().
Couldn't the caller just compute it? lpdead_items goes out of scope,
but there's also prstate.has_lpdead_items.

Pressing that gripe a bit more, I just figured out that "Wait until
lazy_vacuum_heap_rel() to save free space" gets turned into "If we
will likely do index vacuuming, wait until lazy_vacuum_heap_rel() to
save free space." That follows the good practice of phrasing the
comment conditionally when the comment is outside the if-statement.
But the if-statement says merely "if (recordfreespace)", which is not
obviously related to "If we will likely do index vacuuming" but under
the hood actually is. But it seems like an unpleasant amount of action
at a distance. If that condition said if (vacrel->nindexes > 0 &&
vacrel->do_index_vacuuming && prstate.has_lpdead_items)
UnlockReleaseBuffer(); else { PageGetHeapFreeSpace;
UnlockReleaseBuffer; RecordPageWithFreeSpace } it would be a lot more
obvious that the code was doing what the comment says.
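
For reference, that restructuring would look roughly like this (an
untested sketch, using the variable names from the v7-0001 hunks
upthread, where this code still lives in lazy_scan_heap()):

    if (vacrel->nindexes > 0 && vacrel->do_index_vacuuming &&
        prunestate.has_lpdead_items)
    {
        /* free space is recorded later, by lazy_vacuum_heap_rel() */
        UnlockReleaseBuffer(buf);
    }
    else
    {
        Size        freespace = PageGetHeapFreeSpace(page);

        UnlockReleaseBuffer(buf);
        RecordPageWithFreeSpace(vacrel->rel, blkno, freespace);
    }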

That's a refactoring question, but I'm also wondering if there's a
functional issue. Pre-patch, if the table has no indexes and some
items are left LP_DEAD, then we mark them unused, forget them,
RecordPageWithFreeSpace(), and if enough pages have been visited,
FreeSpaceMapVacuumRange(). Post-patch, if the table has no indexes, no
items will be left LP_DEAD, and *recordfreespace will be set to true,
so we'll RecordPageWithFreeSpace(). But this seems to me to mean that
post-patch we'll RecordPageWithFreeSpace() in more cases than we do now.
Specifically, imagine a case where the current code wouldn't mark any
items LP_DEAD and the page also had no pre-existing items that were
LP_DEAD and the table also has no indexes. Currently, I believe we
wouldn't RecordPageWithFreeSpace(), but with the patch, I think we would.
Am I wrong?

Note that the call to FreeSpaceMapVacuumRange() seems to try to guard
against this problem by testing for vacrel->tuples_deleted >
tuples_already_deleted. I haven't tried to verify whether that guard
is correct, but the fact that FreeSpaceMapVacuumRange() has such a
guard and RecordPageWithFreeSpace() does not have one makes me
suspicious.

Another interesting effect of the patch is that instead of getting
lazy_vacuum_heap_page()'s handling of the all-visible status of the
page, we get the somewhat more complex handling done by
lazy_scan_heap(). I haven't fully through the consequences of that,
but if you have, I'd be interested to hear your analysis.

--
Robert Haas
EDB: http://www.enterprisedb.com

#46Melanie Plageman
melanieplageman@gmail.com
In reply to: Robert Haas (#45)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Thu, Jan 11, 2024 at 4:49 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Jan 11, 2024 at 2:30 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

I'm still kind of hung up on the changes that 0001 makes to vacuumlazy.c.

Say we didn't add the recordfreespace parameter to lazy_scan_prune().
Couldn't the caller just compute it? lpdead_items goes out of scope,
but there's also prstate.has_lpdead_items.

Pressing that gripe a bit more, I just figured out that "Wait until
lazy_vacuum_heap_rel() to save free space" gets turned into "If we
will likely do index vacuuming, wait until lazy_vacuum_heap_rel() to
save free space." That follows the good practice of phrasing the
comment conditionally when the comment is outside the if-statement.
But the if-statement says merely "if (recordfreespace)", which is not
obviously related to "If we will likely do index vacuuming" but under
the hood actually is. But it seems like an unpleasant amount of action
at a distance. If that condition said if (vacrel->nindexes > 0 &&
vacrel->do_index_vacuuming && prstate.has_lpdead_items)
UnlockReleaseBuffer(); else { PageGetHeapFreeSpace;
UnlockReleaseBuffer; RecordPageWithFreeSpace } it would be a lot more
obvious that the code was doing what the comment says.

Yes, this is a good point. Seems like writing maintainable code is
really about never lying and then figuring out when hiding the truth
is also lying. Darn!

My only excuse is that lazy_scan_noprune() has a similarly confusing
output parameter, recordfreespace, both of which I removed in a later
patch in the set. I think we should change it as you say.

That's a refactoring question, but I'm also wondering if there's a
functional issue. Pre-patch, if the table has no indexes and some
items are left LP_DEAD, then we mark them unused, forget them,
RecordPageWithFreeSpace(), and if enough pages have been visited,
FreeSpaceMapVacuumRange(). Post-patch, if the table has no indexes, no
items will be left LP_DEAD, and *recordfreespace will be set to true,
so we'll RecordPageWithFreeSpace(). But this seems to me to mean that
post-patch we'll RecordPageWithFreeSpace() in more cases than we do now.
Specifically, imagine a case where the current code wouldn't mark any
items LP_DEAD and the page also had no pre-existing items that were
LP_DEAD and the table also has no indexes. Currently, I believe we
wouldn't RecordPageWithFreeSpace(), but with the patch, I think we would.
Am I wrong?

Ah! I think you are right. Good catch. I could fix this with logic like this:

bool space_freed = vacrel->tuples_deleted > tuples_already_deleted;
if ((vacrel->nindexes == 0 && space_freed) ||
    (vacrel->nindexes > 0 && (space_freed || !vacrel->do_index_vacuuming)))

I think I made this mistake when working on a different version of
this that combined the prune and no prune cases. I noticed that
lazy_scan_noprune() updates the FSM whenever there are no indexes. I
wonder why this is (and why we don't do it in the prune case).

Note that the call to FreeSpaceMapVacuumRange() seems to try to guard
against this problem by testing for vacrel->tuples_deleted >
tuples_already_deleted. I haven't tried to verify whether that guard
is correct, but the fact that FreeSpaceMapVacuumRange() has such a
guard and RecordPageWithFreeSpace() does not have one makes me
suspicious.

FreeSpaceMapVacuumRange() is not called for the no prune case, so I
think this is right.

Another interesting effect of the patch is that instead of getting
lazy_vacuum_heap_page()'s handling of the all-visible status of the
page, we get the somewhat more complex handling done by
lazy_scan_heap(). I haven't fully thought through the consequences of that,
but if you have, I'd be interested to hear your analysis.

lazy_vacuum_heap_page() calls heap_page_is_all_visible() which does
another HeapTupleSatisfiesVacuum() call -- which is definitely going
to be more expensive than not doing that. In one case, in
lazy_scan_heap(), we might do a visibilitymap_get_status() (via
VM_ALL_FROZEN()) to avoid calling visibilitymap_set() if the page is
already marked all frozen in the VM. But that would pale in comparison
to another HeapTupleSatisfiesVacuum() (I think).

The VM update code in lazy_scan_heap() looks complicated but two out
of four cases are basically to deal with uncommon data corruption.
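
For reference, those four branches are roughly the following -- a
paraphrase, not the exact conditions in lazy_scan_heap():

/*
 * 1. Page is now all-visible                  -> set the VM bit(s).
 * 2. VM says all-visible but the page-level PD_ALL_VISIBLE flag is
 *    clear                                    -> corruption; clear the VM bit.
 * 3. Page is marked all-visible but has LP_DEAD items
 *                                             -> corruption; clear flag and VM bit.
 * 4. Page was already all-visible and is now also all-frozen
 *                                             -> set the all-frozen VM bit.
 */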

- Melanie

#47Michael Paquier
michael@paquier.xyz
In reply to: Robert Haas (#43)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Thu, Jan 11, 2024 at 01:43:27PM -0500, Robert Haas wrote:

On Tue, Jan 9, 2024 at 2:35 PM Andres Freund <andres@anarazel.de> wrote:

I don't have that strong feelings about it. If both of you think it
looks good, go ahead...

Done.

Thanks for e2d5b3b9b643.
--
Michael

#48Robert Haas
robertmhaas@gmail.com
In reply to: Melanie Plageman (#46)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Thu, Jan 11, 2024 at 9:05 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

Yes, this is a good point. Seems like writing maintainable code is
really about never lying and then figuring out when hiding the truth
is also lying. Darn!

I think that's pretty true. In this particular case, this code has a
fair number of preexisting problems of this type, IMHO. It's been
touched by many hands, each person with their own design ideas, and
the result isn't as coherent as if it were written by a single mind
from scratch. But, because this code is also absolutely critical to
the system not eating user data, any changes have to be approached
with the highest level of caution. I think it's good to be really
careful about this sort of refactoring anywhere, because finding out a
year later that you broke something and having to go back and fix it
is no fun even if the consequences are minor ... here, they might not
be.

Ah! I think you are right. Good catch. I could fix this with logic like this:

bool space_freed = vacrel->tuples_deleted > tuples_already_deleted;
if ((vacrel->nindexes == 0 && space_freed) ||
(vacrel->nindexes > 0 && (space_freed || !vacrel->do_index_vacuuming)))

Perhaps that would be better written as space_freed ||
(vacrel->nindexes > 0 && !vacrel->do_index_vacuuming), at which point
you might not need to introduce the variable.

I think I made this mistake when working on a different version of
this that combined the prune and no prune cases. I noticed that
lazy_scan_noprune() updates the FSM whenever there are no indexes. I
wonder why this is (and why we don't do it in the prune case).

Yeah, this all seems distinctly under-commented. As noted above, this
code has grown organically, and some things just never got written
down.

Looking through the git history, I see that this behavior seems to
date back to 44fa84881fff4529d68e2437a58ad2c906af5805 which introduced
lazy_scan_noprune(). The comments don't explain the reasoning, but my
guess is that it was just an accident. It's not entirely evident to me
whether there might ever be good reasons to update the freespace map
for a page where we haven't freed up any space -- after all, the free
space map isn't crash-safe, so you could always postulate that
updating it will correct an existing inaccuracy. But I really doubt
that there's any good reason for lazy_scan_prune() and
lazy_scan_noprune() to have different ideas about whether to update
the FSM or not, especially in an obscure corner case like this.

--
Robert Haas
EDB: http://www.enterprisedb.com

#49Peter Geoghegan
pg@bowt.ie
In reply to: Robert Haas (#48)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Fri, Jan 12, 2024 at 9:44 AM Robert Haas <robertmhaas@gmail.com> wrote:

Looking through the git history, I see that this behavior seems to
date back to 44fa84881fff4529d68e2437a58ad2c906af5805 which introduced
lazy_scan_noprune(). The comments don't explain the reasoning, but my
guess is that it was just an accident. It's not entirely evident to me
whether there might ever be good reasons to update the freespace map
for a page where we haven't freed up any space -- after all, the free
space map isn't crash-safe, so you could always postulate that
updating it will correct an existing inaccuracy. But I really doubt
that there's any good reason for lazy_scan_prune() and
lazy_scan_noprune() to have different ideas about whether to update
the FSM or not, especially in an obscure corner case like this.

Why do you think that lazy_scan_prune() and lazy_scan_noprune() have
different ideas about whether to update the FSM or not?

Barring certain failsafe edge cases, we always call
PageGetHeapFreeSpace() exactly once for each scanned page. While it's
true that we won't always do that in the first heap pass (in cases
where VACUUM has indexes), that's only because we expect to do it in
the second heap pass instead -- since we only want to do it once.

It's true that lazy_scan_noprune unconditionally calls
PageGetHeapFreeSpace() when vacrel->nindexes == 0. But that's the same
behavior as lazy_scan_prune when vacrel->nindexes == 0. In both cases
we know that there won't be any second heap pass, and so in both cases
we always call PageGetHeapFreeSpace() in the first heap pass. It's
just that it's a bit harder to see that in the lazy_scan_prune case.
No?

--
Peter Geoghegan

#50Melanie Plageman
melanieplageman@gmail.com
In reply to: Robert Haas (#48)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Fri, Jan 12, 2024 at 9:44 AM Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Jan 11, 2024 at 9:05 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

Ah! I think you are right. Good catch. I could fix this with logic like this:

bool space_freed = vacrel->tuples_deleted > tuples_already_deleted;
if ((vacrel->nindexes == 0 && space_freed) ||
(vacrel->nindexes > 0 && (space_freed || !vacrel->do_index_vacuuming)))

Perhaps that would be better written as space_freed ||
(vacrel->nindexes > 0 && !vacrel->do_index_vacuuming), at which point
you might not need to introduce the variable.

As I revisited this, I realized why I had written this logic for
updating the FSM:

if (vacrel->nindexes == 0 ||
!vacrel->do_index_vacuuming || lpdead_items == 0)

It is because in master, in lazy_scan_heap(), when there are no
indexes and no space has been freed, we actually keep going and hit
the other logic to update the FSM at the bottom of the loop:

if (prunestate.has_lpdead_items && vacrel->do_index_vacuuming)

The effect is that if there are no indexes and no space is freed we
always update the FSM. When combined, that means that if there are no
indexes, we always update the FSM. With the logic you suggested, we
fail a pageinspect test which expects the FSM to exist when it
doesn't. So, adding the logic:

if (space_freed ||
(vacrel->nindexes > 0 && !vacrel->do_index_vacuuming) ||
vacrel->nindexes == 0)

We pass that pageinspect test. But, we fail a freespacemap test which
expects some freespace reported which isn't:

        id        | blkno | is_avail
 -----------------+-------+----------
- freespace_tab   |     0 | t
+ freespace_tab   |     0 | f

which I presume is because space_freed is not the same as !has_lpdead_items.

I can't really see what logic would be right here.

I think I made this mistake when working on a different version of
this that combined the prune and no prune cases. I noticed that
lazy_scan_noprune() updates the FSM whenever there are no indexes. I
wonder why this is (and why we don't do it in the prune case).

Yeah, this all seems distinctly under-commented. As noted above, this
code has grown organically, and some things just never got written
down.

Looking through the git history, I see that this behavior seems to
date back to 44fa84881fff4529d68e2437a58ad2c906af5805 which introduced
lazy_scan_noprune(). The comments don't explain the reasoning, but my
guess is that it was just an accident. It's not entirely evident to me
whether there might ever be good reasons to update the freespace map
for a page where we haven't freed up any space -- after all, the free
space map isn't crash-safe, so you could always postulate that
updating it will correct an existing inaccuracy. But I really doubt
that there's any good reason for lazy_scan_prune() and
lazy_scan_noprune() to have different ideas about whether to update
the FSM or not, especially in an obscure corner case like this.

All of this makes me wonder why we would update the FSM in vacuum when
no space was freed. I thought that RelationAddBlocks() and
RelationGetBufferForTuple() would handle making sure the FSM gets
created and updated if the reason there is free space is that the
page hasn't been filled yet.

- Melanie

#51Melanie Plageman
melanieplageman@gmail.com
In reply to: Peter Geoghegan (#49)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Fri, Jan 12, 2024 at 11:50 AM Peter Geoghegan <pg@bowt.ie> wrote:

On Fri, Jan 12, 2024 at 9:44 AM Robert Haas <robertmhaas@gmail.com> wrote:

Looking through the git history, I see that this behavior seems to
date back to 44fa84881fff4529d68e2437a58ad2c906af5805 which introduced
lazy_scan_noprune(). The comments don't explain the reasoning, but my
guess is that it was just an accident. It's not entirely evident to me
whether there might ever be good reasons to update the freespace map
for a page where we haven't freed up any space -- after all, the free
space map isn't crash-safe, so you could always postulate that
updating it will correct an existing inaccuracy. But I really doubt
that there's any good reason for lazy_scan_prune() and
lazy_scan_noprune() to have different ideas about whether to update
the FSM or not, especially in an obscure corner case like this.

Why do you think that lazy_scan_prune() and lazy_scan_noprune() have
different ideas about whether to update the FSM or not?

Barring certain failsafe edge cases, we always call
PageGetHeapFreeSpace() exactly once for each scanned page. While it's
true that we won't always do that in the first heap pass (in cases
where VACUUM has indexes), that's only because we expect to do it in
the second heap pass instead -- since we only want to do it once.

It's true that lazy_scan_noprune unconditionally calls
PageGetHeapFreeSpace() when vacrel->nindexes == 0. But that's the same
behavior as lazy_scan_prune when vacrel->nindexes == 0. In both cases
we know that there won't be any second heap pass, and so in both cases
we always call PageGetHeapFreeSpace() in the first heap pass. It's
just that it's a bit harder to see that in the lazy_scan_prune case.
No?

So, I think this is the logic in master:

Prune case, first pass

- indexes == 0 && space_freed -> update FSM
- indexes == 0 && (!space_freed || !index_vacuuming) -> update FSM
- indexes > 0 && (!space_freed || !index_vacuuming) -> update FSM

which reduces to:

- indexes == 0 || !space_freed || !index_vacuuming

No Prune (except aggressive vacuum), first pass:

- indexes == 0 -> update FSM
- indexes > 0 && !space_freed -> update FSM

which reduces to:

- indexes == 0 || !space_freed

which is, during the first pass, if we will not do index vacuuming,
either because we have no indexes, because there is nothing to vacuum,
or because do_index_vacuuming is false, make sure we update the
freespace map now.

but what about no prune when do_index_vacuuming is false?

I still don't understand why vacuum is responsible for updating the
FSM per page when no line pointers have been set unused. That is how
PageGetFreeSpace() figures out if there is free space, right?

- Melanie

#52Peter Geoghegan
pg@bowt.ie
In reply to: Melanie Plageman (#51)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Fri, Jan 12, 2024 at 12:33 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

So, I think this is the logic in master:

Prune case, first pass

...
- indexes > 0 && (!space_freed || !index_vacuuming) -> update FSM

What is "space_freed"? Isn't that something from your uncommitted patch?

As I said, the aim is to call PageGetHeapFreeSpace() (*not*
PageGetFreeSpace(), which is only used for index pages) exactly once
per heap page scanned. This is supposed to happen independently of
whatever specific work was/will be required for the heap page. In
general, we don't ever trust that the FSM is already up-to-date.
Presumably because the FSM isn't crash safe.

On master, prunestate.has_lpdead_items may be set true when our VACUUM
wasn't actually the thing that performed pruning that freed tuple
storage -- typically when some other backend was the one that did all
required pruning at some earlier point in time, often via
opportunistic pruning. For better or worse, the only thing that VACUUM
aims to do is make sure that PageGetHeapFreeSpace() gets called
exactly once per scanned page.

which is, during the first pass, if we will not do index vacuuming,
either because we have no indexes, because there is nothing to vacuum,
or because do_index_vacuuming is false, make sure we update the
freespace map now.

but what about no prune when do_index_vacuuming is false?

Note that do_index_vacuuming cannot actually affect the "nindexes ==
0" case. We have an assertion that's kinda relevant to this point:

if (prunestate.has_lpdead_items && vacrel->do_index_vacuuming)
{
/*
* Wait until lazy_vacuum_heap_rel() to save free space.
* ....
*/
Assert(vacrel->nindexes > 0);
UnlockReleaseBuffer(buf);
}
else
{
....
}

We should never get this far down in the lazy_scan_heap() loop when
"nindexes == 0 && prunestate.has_lpdead_items". That's handled in the
special "single heap pass" branch after lazy_scan_prune().

As for the case where we use lazy_scan_noprune() for a "nindexes == 0"
VACUUM's heap page, we also can't get this far down. (Actually,
lazy_scan_noprune lies if it has to in order to make sure that
lazy_scan_heap never has to deal with a "nindexes == 0" VACUUM with
LP_DEAD items from a no-cleanup-lock page. This is a kludge -- see
comments about how this is "dishonest" inside lazy_scan_noprune().)

I still don't understand why vacuum is responsible for updating the
FSM per page when no line pointers have been set unused. That is how
PageGetFreeSpace() figures out if there is free space, right?

You mean PageGetHeapFreeSpace? Not really. (Though even pruning can
set line pointers unused, at least for heap-only tuples.)

Even if pruning doesn't happen in VACUUM, that doesn't mean that the
FSM is up-to-date.

In short, we do these things with the free space map because it is a
map of free space (which isn't crash safe) -- nothing more. I happen
to agree that that general design has a lot of problems, but those
seem out of scope here.

--
Peter Geoghegan

#53Robert Haas
robertmhaas@gmail.com
In reply to: Peter Geoghegan (#49)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Fri, Jan 12, 2024 at 11:50 AM Peter Geoghegan <pg@bowt.ie> wrote:

Why do you think that lazy_scan_prune() and lazy_scan_noprune() have
different ideas about whether to update the FSM or not?

When lazy_scan_prune() is called, we call RecordPageWithFreeSpace if
vacrel->nindexes == 0 && prunestate.has_lpdead_items. See the code
near the comment that begins "Consider the need to do page-at-a-time
heap vacuuming".

When lazy_scan_noprune() is called, whether we call
RecordPageWithFreeSpace depends on the output parameter
recordfreespace. That is set to true whenever vacrel->nindexes == 0 ||
lpdead_items == 0. See the code near the comment that begins "Save any
LP_DEAD items".

The case where I thought there was a behavior difference is when
vacrel->nindexes == 0 and lpdead_items == 0 and thus
prunestate.has_lpdead_items is false. But now I see (from Melanie's
email) that this isn't really true, because in that case we fall
through to the logic that we use when indexes are present, giving us a
second chance to call RecordPageWithFreeSpace(), which we take when
(prunestate.has_lpdead_items && vacrel->do_index_vacuuming) comes out
false, as it always does in the scenario that I postulated.

P.S. to Melanie: I'll respond to your further emails next but I wanted
to respond to this one from Peter first so he didn't think I was
rudely ignoring him. :-)

P.P.S. to everyone: Yikes, this logic is really confusing.

--
Robert Haas
EDB: http://www.enterprisedb.com

#54Melanie Plageman
melanieplageman@gmail.com
In reply to: Peter Geoghegan (#52)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Fri, Jan 12, 2024 at 1:07 PM Peter Geoghegan <pg@bowt.ie> wrote:

On Fri, Jan 12, 2024 at 12:33 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

So, I think this is the logic in master:

Prune case, first pass

...
- indexes > 0 && (!space_freed || !index_vacuuming) -> update FSM

What is "space_freed"? Isn't that something from your uncommitted patch?

Yes, I was mixing the two together.

I just want to make sure that we agree that, on master, when
lazy_scan_prune() is called, the logic for whether or not to update
the FSM after the first pass is:

indexes == 0 || !has_lpdead_items || !index_vacuuming

and when lazy_scan_noprune() is called, the logic for whether or not
to update the FSM after the first pass is:

indexes == 0 || !has_lpdead_items

Those seem different to me.
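
If it helps to see exactly where those two conditions diverge, here is
a throwaway standalone program -- not PostgreSQL code; the variables
are just the three flags above -- that enumerates every combination and
prints the rows where they disagree:

#include <stdbool.h>
#include <stdio.h>

int
main(void)
{
	for (int indexes = 0; indexes <= 1; indexes++)
		for (int has_lpdead_items = 0; has_lpdead_items <= 1; has_lpdead_items++)
			for (int index_vacuuming = 0; index_vacuuming <= 1; index_vacuuming++)
			{
				/* FSM-update test after lazy_scan_prune() */
				bool		prune = (indexes == 0 ||
									 !has_lpdead_items ||
									 !index_vacuuming);
				/* FSM-update test after lazy_scan_noprune() */
				bool		noprune = (indexes == 0 ||
									   !has_lpdead_items);

				if (prune != noprune)
					printf("indexes=%d has_lpdead_items=%d index_vacuuming=%d: "
						   "prune case updates the FSM, noprune case does not\n",
						   indexes, has_lpdead_items, index_vacuuming);
			}
	return 0;
}

The only row it prints is indexes > 0, has_lpdead_items, and index
vacuuming disabled -- that is, the two paths only disagree once the
failsafe has turned index vacuuming off.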

- Melanie

#55Melanie Plageman
melanieplageman@gmail.com
In reply to: Peter Geoghegan (#52)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Fri, Jan 12, 2024 at 1:07 PM Peter Geoghegan <pg@bowt.ie> wrote:

On Fri, Jan 12, 2024 at 12:33 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

So, I think this is the logic in master:

Prune case, first pass

...
- indexes > 0 && (!space_freed || !index_vacuuming) -> update FSM

What is "space_freed"? Isn't that something from your uncommitted patch?

As I said, the aim is to call PageGetHeapFreeSpace() (*not*
PageGetFreeSpace(), which is only used for index pages) exactly once
per heap page scanned. This is supposed to happen independently of
whatever specific work was/will be required for the heap page. In
general, we don't ever trust that the FSM is already up-to-date.
Presumably because the FSM isn't crash safe.

On master, prunestate.has_lpdead_items may be set true when our VACUUM
wasn't actually the thing that performed pruning that freed tuple
storage -- typically when some other backend was the one that did all
required pruning at some earlier point in time, often via
opportunistic pruning. For better or worse, the only thing that VACUUM
aims to do is make sure that PageGetHeapFreeSpace() gets called
exactly once per scanned page.

...

I still don't understand why vacuum is responsible for updating the
FSM per page when no line pointers have been set unused. That is how
PageGetFreeSpace() figures out if there is free space, right?

You mean PageGetHeapFreeSpace? Not really. (Though even pruning can
set line pointers unused, at least for heap-only tuples.)

Even if pruning doesn't happen in VACUUM, that doesn't mean that the
FSM is up-to-date.

In short, we do these things with the free space map because it is a
map of free space (which isn't crash safe) -- nothing more. I happen
to agree that that general design has a lot of problems, but those
seem out of scope here.

So, there are 3 issues I am trying to understand:

1) How often should vacuum update the FSM (not vacuum as in the second
pass but vacuum as in the whole thing that is happening in
lazy_scan_heap())?
2) What is the exact logic in master that ensures that vacuum
implements the cadence in 1)?
3) How can the logic in 2) be replicated exactly in my patch that sets
would-be dead items LP_UNUSED during pruning?

From what Peter is saying, I think 1) is decided and is once per page
(across all passes).
For 2), see my previous email. And for 3), TBD until 2) is agreed upon.

- Melanie

#56Robert Haas
robertmhaas@gmail.com
In reply to: Melanie Plageman (#54)
1 attachment(s)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Fri, Jan 12, 2024 at 1:52 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

Yes, I was mixing the two together.

I just want to make sure that we agree that, on master, when
lazy_scan_prune() is called, the logic for whether or not to update
the FSM after the first pass is:

indexes == 0 || !has_lpdead_items || !index_vacuuming

and when lazy_scan_noprune() is called, the logic for whether or not
to update the FSM after the first pass is:

indexes == 0 || !has_lpdead_items

Those seem different to me.

This analysis seems correct to me, except that "when
lazy_scan_noprune() is called" should really say "when
lazy_scan_noprune() is called (and returns true)", because when it
returns false we fall through and call lazy_scan_prune() afterwards.

Here's a draft patch to clean up the inconsistency here. It also gets
rid of recordfreespace, because ISTM that recordfreespace is adding to
the confusion here rather than helping anything.

--
Robert Haas
EDB: http://www.enterprisedb.com

Attachments:

v1-0001-Be-more-consistent-about-whether-to-update-the-FS.patch.nocfbotapplication/octet-stream; name=v1-0001-Be-more-consistent-about-whether-to-update-the-FS.patch.nocfbotDownload
From 09745c1131674b829038892317f13fc3670cb32d Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Fri, 12 Jan 2024 14:25:22 -0500
Subject: [PATCH v1] Be more consistent about whether to update the FSM while
 vacuuming.

Previously, when lazy_scan_noprune() was called and returned true, we would
update the FSM immediately if the relation had no indexes or if the page
contained no dead items. On the other hand, when lazy_scan_prune() was
called, we would update the FSM if either of those things was true or
if index vacuuming was disabled. Eliminate that behavioral difference by
considering vacrel->do_index_vacuuming in both cases.

Also, instead of having lazy_scan_noprune() make the decision internally
and pass it back to the caller via *recordfreespace, just make the
decision directly in the caller. That seems less confusing, since the
caller also decides in the lazy_scan_prune() case; moreover, this way,
the whole test is in one place, instead of spread out.
---
 src/backend/access/heap/vacuumlazy.c | 51 ++++++++++++----------------
 1 file changed, 21 insertions(+), 30 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index b63cad1335..ea452f4415 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -251,8 +251,7 @@ static void lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
 							LVPagePruneState *prunestate);
 static bool lazy_scan_noprune(LVRelState *vacrel, Buffer buf,
-							  BlockNumber blkno, Page page,
-							  bool *recordfreespace);
+							  BlockNumber blkno, Page page);
 static void lazy_vacuum(LVRelState *vacrel);
 static bool lazy_vacuum_all_indexes(LVRelState *vacrel);
 static void lazy_vacuum_heap_rel(LVRelState *vacrel);
@@ -958,8 +957,6 @@ lazy_scan_heap(LVRelState *vacrel)
 		page = BufferGetPage(buf);
 		if (!ConditionalLockBufferForCleanup(buf))
 		{
-			bool		recordfreespace;
-
 			LockBuffer(buf, BUFFER_LOCK_SHARE);
 
 			/* Check for new or empty pages before lazy_scan_noprune call */
@@ -974,17 +971,29 @@ lazy_scan_heap(LVRelState *vacrel)
 			 * Collect LP_DEAD items in dead_items array, count tuples,
 			 * determine if rel truncation is safe
 			 */
-			if (lazy_scan_noprune(vacrel, buf, blkno, page,
-								  &recordfreespace))
+			if (lazy_scan_noprune(vacrel, buf, blkno, page))
 			{
 				Size		freespace = 0;
+				bool		recordfreespace;
 
 				/*
-				 * Processed page successfully (without cleanup lock) -- just
-				 * need to update the FSM, much like the lazy_scan_prune case.
-				 * Don't bother trying to match its visibility map setting
-				 * steps, though.
+				 * We processed the page successfully (without a cleanup lock).
+				 *
+				 * Update the FSM, just as we would in the case where
+				 * lazy_scan_prune() is called. Our goal is to update the
+				 * freespace map the last time we touch the page. If the
+				 * relation has no indexes, or if index vacuuming is disabled,
+				 * there will be no second heap pass; if this particular page
+				 * has no dead items, the second heap pass will not touch this
+				 * page. So, in those cases, update the FSM now.
+				 *
+				 * After a call to lazy_scan_prune(), we would also try to
+				 * adjust the page-level all-visible bit and the visibility
+				 * map, but we skip that step in this path.
 				 */
+				recordfreespace = vacrel->nindexes == 0
+					|| !vacrel->do_index_vacuuming
+					|| !prunestate.has_lpdead_items;
 				if (recordfreespace)
 					freespace = PageGetHeapFreeSpace(page);
 				UnlockReleaseBuffer(buf);
@@ -1942,8 +1951,7 @@ static bool
 lazy_scan_noprune(LVRelState *vacrel,
 				  Buffer buf,
 				  BlockNumber blkno,
-				  Page page,
-				  bool *recordfreespace)
+				  Page page)
 {
 	OffsetNumber offnum,
 				maxoff;
@@ -1960,7 +1968,6 @@ lazy_scan_noprune(LVRelState *vacrel,
 	Assert(BufferGetBlockNumber(buf) == blkno);
 
 	hastup = false;				/* for now */
-	*recordfreespace = false;	/* for now */
 
 	lpdead_items = 0;
 	live_tuples = 0;
@@ -2102,18 +2109,8 @@ lazy_scan_noprune(LVRelState *vacrel,
 			hastup = true;
 			missed_dead_tuples += lpdead_items;
 		}
-
-		*recordfreespace = true;
-	}
-	else if (lpdead_items == 0)
-	{
-		/*
-		 * Won't be vacuuming this page later, so record page's freespace in
-		 * the FSM now
-		 */
-		*recordfreespace = true;
 	}
-	else
+	else if (lpdead_items != 0)
 	{
 		VacDeadItems *dead_items = vacrel->dead_items;
 		ItemPointerData tmp;
@@ -2138,12 +2135,6 @@ lazy_scan_noprune(LVRelState *vacrel,
 									 dead_items->num_items);
 
 		vacrel->lpdead_items += lpdead_items;
-
-		/*
-		 * Assume that we'll go on to vacuum this heap page during final pass
-		 * over the heap.  Don't record free space until then.
-		 */
-		*recordfreespace = false;
 	}
 
 	/*
-- 
2.39.3 (Apple Git-145)

#57Peter Geoghegan
pg@bowt.ie
In reply to: Robert Haas (#56)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Fri, Jan 12, 2024 at 2:32 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Jan 12, 2024 at 1:52 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
This analysis seems correct to me, except that "when
lazy_scan_noprune() is called" should really say "when
lazy_scan_noprune() is called (and returns true)", because when it
returns false we fall through and call lazy_scan_prune() afterwards.

Now that I see your patch, I understand what Melanie must have meant.
I agree that there is a small inconsistency here, that we could well
do without.

In general I am in favor of religiously eliminating such
inconsistencies (between lazy_scan_prune and lazy_scan_noprune),
unless there is a reason not to. Not because it's necessarily
important. More because it's just too hard to be sure whether it might
matter. It's usually easier to defensively assume that it matters.

Here's a draft patch to clean up the inconsistency here. It also gets
rid of recordfreespace, because ISTM that recordfreespace is adding to
the confusion here rather than helping anything.

You're using "!prunestate.has_lpdead_items" as part of your test that
sets "recordfreespace". But lazy_scan_noprune doesn't get passed a
pointer to prunestate, so clearly you'll need to detect the same
condition some other way.

--
Peter Geoghegan

#58Robert Haas
robertmhaas@gmail.com
In reply to: Peter Geoghegan (#57)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Fri, Jan 12, 2024 at 2:43 PM Peter Geoghegan <pg@bowt.ie> wrote:

You're using "!prunestate.has_lpdead_items" as part of your test that
sets "recordfreespace". But lazy_scan_noprune doesn't get passed a
pointer to prunestate, so clearly you'll need to detect the same
condition some other way.

OOPS. Thanks.

--
Robert Haas
EDB: http://www.enterprisedb.com

#59Peter Geoghegan
pg@bowt.ie
In reply to: Melanie Plageman (#54)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Fri, Jan 12, 2024 at 1:52 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

On Fri, Jan 12, 2024 at 1:07 PM Peter Geoghegan <pg@bowt.ie> wrote:

What is "space_freed"? Isn't that something from your uncommitted patch?

Yes, I was mixing the two together.

An understandable mistake.

I just want to make sure that we agree that, on master, when
lazy_scan_prune() is called, the logic for whether or not to update
the FSM after the first pass is:

indexes == 0 || !has_lpdead_items || !index_vacuuming

and when lazy_scan_noprune() is called, the logic for whether or not
to update the FSM after the first pass is:

indexes == 0 || !has_lpdead_items

Those seem different to me.

Right. As I said to Robert just now, I can now see that they're
slightly different conditions.

FWIW my brain was just ignoring " || !index_vacuuming". I dismissed it
as an edge-case, only relevant when the failsafe has kicked in. Which
it is. But that's still no reason to allow an inconsistency that we
can easily just avoid.

--
Peter Geoghegan

#60Melanie Plageman
melanieplageman@gmail.com
In reply to: Robert Haas (#58)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Fri, Jan 12, 2024 at 2:47 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Jan 12, 2024 at 2:43 PM Peter Geoghegan <pg@bowt.ie> wrote:

You're using "!prunestate.has_lpdead_items" as part of your test that
sets "recordfreespace". But lazy_scan_noprune doesn't get passed a
pointer to prunestate, so clearly you'll need to detect the same
condition some other way.

OOPS. Thanks.

Also, I think you should combine these in lazy_scan_noprune() now

/* Save any LP_DEAD items found on the page in dead_items array */
if (vacrel->nindexes == 0)
{
/* Using one-pass strategy (since table has no indexes) */
if (lpdead_items > 0)
{

Since we don't set recordfreespace in the outer if statement anymore

And I noticed you missed a reference to recordfreespace output
parameter in the function comment above lazy_scan_noprune().

- Melanie

#61Robert Haas
robertmhaas@gmail.com
In reply to: Melanie Plageman (#60)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Fri, Jan 12, 2024 at 3:04 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

Also, I think you should combine these in lazy_scan_noprune() now

/* Save any LP_DEAD items found on the page in dead_items array */
if (vacrel->nindexes == 0)
{
/* Using one-pass strategy (since table has no indexes) */
if (lpdead_items > 0)
{

Since we don't set recordfreespace in the outer if statement anymore

Well, maybe, but there's an else clause attached to the outer "if", so
you have to be a bit careful. I didn't think it was critical to
further rejigger this.

And I noticed you missed a reference to recordfreespace output
parameter in the function comment above lazy_scan_noprune().

OK.

So what's the best way to solve the problem that Peter pointed out?
Should we pass in the prunestate? Maybe just replace bool
*recordfreespace with bool *has_lpdead_items?

--
Robert Haas
EDB: http://www.enterprisedb.com

#62Melanie Plageman
melanieplageman@gmail.com
In reply to: Robert Haas (#61)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Fri, Jan 12, 2024 at 3:22 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Jan 12, 2024 at 3:04 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

So what's the best way to solve the problem that Peter pointed out?
Should we pass in the prunestate? Maybe just replace bool
*recordfreespace with bool *has_lpdead_items?

Yea, that works for now. I mean, I think the way we should do it is
update the FSM in lazy_scan_noprune(), but, for the purposes of this
patch, yes. has_lpdead_items output parameter seems fine to me.

- Melanie

#63Robert Haas
robertmhaas@gmail.com
In reply to: Melanie Plageman (#62)
1 attachment(s)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Fri, Jan 12, 2024 at 4:05 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

Yea, that works for now. I mean, I think the way we should do it is
update the FSM in lazy_scan_noprune(), but, for the purposes of this
patch, yes. has_lpdead_items output parameter seems fine to me.

Here's v2.

It's not exactly clear to me why you want to update the FSM in
lazy_scan_[no]prune(). When I first looked at v7-0004, I said to
myself "well, this is dumb, because now we're just duplicating
something that is common to both cases". But then I realized that the
logic was *already* duplicated in lazy_scan_heap(), and that by
pushing the duplication down to lazy_scan_[no]prune(), you made the
two copies identical rather than, as at present, having two copies
that differ from each other. Perhaps that's a sufficient reason to
make the change all by itself, but it seems like what would be really
good is if we only needed one copy of the logic. I don't know if
that's achievable, though.

More generally, I somewhat question whether it's really right to push
things from lazy_scan_heap() and into lazy_scan_[no]prune(), mostly
because of the risk of having to duplicate logic. I'm not even really
convinced that it's good for us to have both of those functions.
There's an awful lot of code duplication between them already. Each
has a loop that tests the status of each item and then, for LP_USED,
switches on htsv_get_valid_status or HeapTupleSatisfiesVacuum. It
doesn't seem trivial to unify all that code, but it doesn't seem very
nice to have two copies of it, either.

--
Robert Haas
EDB: http://www.enterprisedb.com

Attachments:

v2-0001-Be-more-consistent-about-whether-to-update-the-FS.patchapplication/octet-stream; name=v2-0001-Be-more-consistent-about-whether-to-update-the-FS.patchDownload
From 32684f41d1dd50f726aa0dfe8a5d816aa5c42d64 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Mon, 15 Jan 2024 12:05:52 -0500
Subject: [PATCH v2] Be more consistent about whether to update the FSM while
 vacuuming.

Previously, when lazy_scan_noprune() was called and returned true, we would
update the FSM immediately if the relation had no indexes or if the page
contained no dead items. On the other hand, when lazy_scan_prune() was
called, we would update the FSM if either of those things was true or
if index vacuuming was disabled. Eliminate that behavioral difference by
considering vacrel->do_index_vacuuming in both cases.

Also, instead of having lazy_scan_noprune() make the decision
internally and pass it back to the caller via *recordfreespace, just
have it pass the number of LP_DEAD items back to the caller, and then
let the caller make the decision.  That seems less confusing, since
the caller also decides in the lazy_scan_prune() case; moreover, this
way, the whole test is in one place, instead of spread out.

Patch by me, reviewed by Peter Geoghegan and Melanie Plageman.
---
 src/backend/access/heap/vacuumlazy.c | 58 ++++++++++++++--------------
 1 file changed, 29 insertions(+), 29 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index b63cad1335..f17816b81d 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -252,7 +252,7 @@ static void lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							LVPagePruneState *prunestate);
 static bool lazy_scan_noprune(LVRelState *vacrel, Buffer buf,
 							  BlockNumber blkno, Page page,
-							  bool *recordfreespace);
+							  bool *has_lpdead_items);
 static void lazy_vacuum(LVRelState *vacrel);
 static bool lazy_vacuum_all_indexes(LVRelState *vacrel);
 static void lazy_vacuum_heap_rel(LVRelState *vacrel);
@@ -958,7 +958,7 @@ lazy_scan_heap(LVRelState *vacrel)
 		page = BufferGetPage(buf);
 		if (!ConditionalLockBufferForCleanup(buf))
 		{
-			bool		recordfreespace;
+			bool		has_lpdead_items;
 
 			LockBuffer(buf, BUFFER_LOCK_SHARE);
 
@@ -974,17 +974,29 @@ lazy_scan_heap(LVRelState *vacrel)
 			 * Collect LP_DEAD items in dead_items array, count tuples,
 			 * determine if rel truncation is safe
 			 */
-			if (lazy_scan_noprune(vacrel, buf, blkno, page,
-								  &recordfreespace))
+			if (lazy_scan_noprune(vacrel, buf, blkno, page, &has_lpdead_items))
 			{
 				Size		freespace = 0;
+				bool		recordfreespace;
 
 				/*
-				 * Processed page successfully (without cleanup lock) -- just
-				 * need to update the FSM, much like the lazy_scan_prune case.
-				 * Don't bother trying to match its visibility map setting
-				 * steps, though.
+				 * We processed the page successfully (without a cleanup lock).
+				 *
+				 * Update the FSM, just as we would in the case where
+				 * lazy_scan_prune() is called. Our goal is to update the
+				 * freespace map the last time we touch the page. If the
+				 * relation has no indexes, or if index vacuuming is disabled,
+				 * there will be no second heap pass; if this particular page
+				 * has no dead items, the second heap pass will not touch this
+				 * page. So, in those cases, update the FSM now.
+				 *
+				 * After a call to lazy_scan_prune(), we would also try to
+				 * adjust the page-level all-visible bit and the visibility
+				 * map, but we skip that step in this path.
 				 */
+				recordfreespace = vacrel->nindexes == 0
+					|| !vacrel->do_index_vacuuming
+					|| !has_lpdead_items;
 				if (recordfreespace)
 					freespace = PageGetHeapFreeSpace(page);
 				UnlockReleaseBuffer(buf);
@@ -1935,15 +1947,17 @@ lazy_scan_prune(LVRelState *vacrel,
  * one or more tuples on the page.  We always return true for non-aggressive
  * callers.
  *
- * recordfreespace flag instructs caller on whether or not it should do
- * generic FSM processing for page.
+ * If this function returns true, *has_lpdead_items gets set to true or false
+ * depending on whether, upon return from this function, any LP_DEAD items are
+ * present on the page. If this function returns false, *has_lpdead_items
+ * is not updated.
  */
 static bool
 lazy_scan_noprune(LVRelState *vacrel,
 				  Buffer buf,
 				  BlockNumber blkno,
 				  Page page,
-				  bool *recordfreespace)
+				  bool *has_lpdead_items)
 {
 	OffsetNumber offnum,
 				maxoff;
@@ -1960,7 +1974,6 @@ lazy_scan_noprune(LVRelState *vacrel,
 	Assert(BufferGetBlockNumber(buf) == blkno);
 
 	hastup = false;				/* for now */
-	*recordfreespace = false;	/* for now */
 
 	lpdead_items = 0;
 	live_tuples = 0;
@@ -2102,18 +2115,8 @@ lazy_scan_noprune(LVRelState *vacrel,
 			hastup = true;
 			missed_dead_tuples += lpdead_items;
 		}
-
-		*recordfreespace = true;
-	}
-	else if (lpdead_items == 0)
-	{
-		/*
-		 * Won't be vacuuming this page later, so record page's freespace in
-		 * the FSM now
-		 */
-		*recordfreespace = true;
 	}
-	else
+	else if (lpdead_items != 0)
 	{
 		VacDeadItems *dead_items = vacrel->dead_items;
 		ItemPointerData tmp;
@@ -2138,12 +2141,6 @@ lazy_scan_noprune(LVRelState *vacrel,
 									 dead_items->num_items);
 
 		vacrel->lpdead_items += lpdead_items;
-
-		/*
-		 * Assume that we'll go on to vacuum this heap page during final pass
-		 * over the heap.  Don't record free space until then.
-		 */
-		*recordfreespace = false;
 	}
 
 	/*
@@ -2159,6 +2156,9 @@ lazy_scan_noprune(LVRelState *vacrel,
 	if (hastup)
 		vacrel->nonempty_pages = blkno + 1;
 
+	/* Did we find LP_DEAD items? */
+	*has_lpdead_items = (lpdead_items > 0);
+
 	/* Caller won't need to call lazy_scan_prune with same page */
 	return true;
 }
-- 
2.39.3 (Apple Git-145)

#64Melanie Plageman
melanieplageman@gmail.com
In reply to: Robert Haas (#63)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Mon, Jan 15, 2024 at 12:29:57PM -0500, Robert Haas wrote:

On Fri, Jan 12, 2024 at 4:05 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

Yea, that works for now. I mean, I think the way we should do it is
update the FSM in lazy_scan_noprune(), but, for the purposes of this
patch, yes. has_lpdead_items output parameter seems fine to me.

Here's v2.

It's not exactly clear to me why you want to update the FSM in
lazy_scan_[no]prune(). When I first looked at v7-0004, I said to
myself "well, this is dumb, because now we're just duplicating
something that is common to both cases". But then I realized that the
logic was *already* duplicated in lazy_scan_heap(), and that by
pushing the duplication down to lazy_scan_[no]prune(), you made the
two copies identical rather than, as at present, having two copies
that differ from each other. Perhaps that's a sufficient reason to
make the change all by itself, but it seems like what would be really
good is if we only needed one copy of the logic. I don't know if
that's achievable, though.

Upthread in v4-0004, I do have a version which combines the two FSM
updates for lazy_scan_prune() and lazy_scan_noprune().

If you move the VM updates from lazy_scan_heap() into lazy_scan_prune(),
then there is little difference between the prune and no prune cases in
lazy_scan_heap(). The main difference is that, when lazy_scan_noprune()
returns true, you are supposed to avoid calling lazy_scan_new_or_empty()
again and have to avoid calling lazy_scan_prune(). I solved this with a
local variable "do_prune" and checked it before calling
lazy_scan_new_or_empty() and lazy_scan_prune().

I moved away from this approach because it felt odd to test "do_prune"
before calling lazy_scan_new_or_empty(). Though, perhaps it doesn't
matter if we just call lazy_scan_new_or_empty() again. We do that when
lazy_scan_noprune() returns false anyway.

I thought perhaps we could circumvent this issue by moving
lazy_scan_new_or_empty() into lazy_scan_prune(). But, this seemed wrong
to me because, as it stands now, if lazy_scan_new_or_empty() returns
true, we would want to skip the VM update code and FSM update code in
lazy_scan_heap() after lazy_scan_prune(). That would mean
lazy_scan_prune() would have to return something to indicate that.

More generally, I somewhat question whether it's really right to push
things from lazy_scan_heap() and into lazy_scan_[no]prune(), mostly
because of the risk of having to duplicate logic. I'm not even really
convinced that it's good for us to have both of those functions.
There's an awful lot of code duplication between them already. Each
has a loop that tests the status of each item and then, for LP_USED,
switches on htsv_get_valid_status or HeapTupleSatisfiesVacuum. It
doesn't seem trivial to unify all that code, but it doesn't seem very
nice to have two copies of it, either.

I agree that the duplicated logic in both places is undesirable.
I think the biggest issue with combining them would be that when
lazy_scan_noprune() returns false, it needs to go get a cleanup lock and
then invoke the logic of lazy_scan_prune() for all the tuples on the
page. This seems a little hard to get right in a single function.

Then there are more trivial-to-solve differences like invoking
heap_tuple_should_freeze() and not bothering with the whole
heap_prepare_freeze_tuple() if we only have the share lock.

I am willing to try and write a version of this to see if it is better.
I will say, though, my agenda was to eventually push the actions taken
in the loop in lazy_scan_prune() into heap_page_prune() and
heap_prune_chain().

From 32684f41d1dd50f726aa0dfe8a5d816aa5c42d64 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Mon, 15 Jan 2024 12:05:52 -0500
Subject: [PATCH v2] Be more consistent about whether to update the FSM while
vacuuming.

Few small review comments below, but, overall, LGTM.

Previously, when lazy_scan_noprune() was called and returned true, we would
update the FSM immediately if the relation had no indexes or if the page
contained no dead items. On the other hand, when lazy_scan_prune() was
called, we would update the FSM if either of those things was true or
if index vacuuming was disabled. Eliminate that behavioral difference by
considering vacrel->do_index_vacuuming in both cases.

Also, instead of having lazy_scan_noprune() make the decision
internally and pass it back to the caller via *recordfreespace, just
have it pass the number of LP_DEAD items back to the caller, and then

It doesn't pass the number of LP_DEAD items back to the caller. It
passes a boolean.

let the caller make the decision. That seems less confusing, since
the caller also decides in the lazy_scan_prune() case; moreover, this
way, the whole test is in one place, instead of spread out.

Perhaps it isn't important, but I find this wording confusing. You
mention lazy_scan_prune() and then mention that "the whole test is in
one place instead of spread out" -- which kind of makes it sound like
you are consolidating FSM updates for both the lazy_scan_noprune() and
lazy_scan_prune() cases. Perhaps simply flipping the order of the "since
the caller" and "moreover, this way" conjunctions would solve it. I
defer to your judgment.

---
src/backend/access/heap/vacuumlazy.c | 58 ++++++++++++++--------------
1 file changed, 29 insertions(+), 29 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index b63cad1335..f17816b81d 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1960,7 +1974,6 @@ lazy_scan_noprune(LVRelState *vacrel,
Assert(BufferGetBlockNumber(buf) == blkno);

hastup = false; /* for now */
- *recordfreespace = false; /* for now */

lpdead_items = 0;
live_tuples = 0;
@@ -2102,18 +2115,8 @@ lazy_scan_noprune(LVRelState *vacrel,
hastup = true;
missed_dead_tuples += lpdead_items;
}
-
-		*recordfreespace = true;
-	}
-	else if (lpdead_items == 0)
-	{
-		/*
-		 * Won't be vacuuming this page later, so record page's freespace in
-		 * the FSM now
-		 */
-		*recordfreespace = true;
}
-	else
+	else if (lpdead_items != 0)

It stuck out to me a bit that this test is if lpdead_items != 0 and the
one above it:

/* Save any LP_DEAD items found on the page in dead_items array */
if (vacrel->nindexes == 0)
{
/* Using one-pass strategy (since table has no indexes) */
if (lpdead_items > 0)
{

tests if lpdead_items > 0. It is more the inconsistency that bothered me
than the fact that lpdead_items is signed.

@@ -2159,6 +2156,9 @@ lazy_scan_noprune(LVRelState *vacrel,
if (hastup)
vacrel->nonempty_pages = blkno + 1;

+	/* Did we find LP_DEAD items? */
+	*has_lpdead_items = (lpdead_items > 0);

I would drop this comment. The code doesn't really need it, and the
reason we care if there are LP_DEAD items is not because they are dead
but because we want to know if we'll touch this page again. You don't
need to rehash all that here, so I think omitting the comment is enough.

- Melanie

#65Robert Haas
robertmhaas@gmail.com
In reply to: Melanie Plageman (#64)
1 attachment(s)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Mon, Jan 15, 2024 at 4:03 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

It doesn't pass the number of LP_DEAD items back to the caller. It
passes a boolean.

Oops.

Perhaps it isn't important, but I find this wording confusing. You
mention lazy_scan_prune() and then mention that "the whole test is in
one place instead of spread out" -- which kind of makes it sound like
you are consolidating FSM updates for both the lazy_scan_noprune() and
lazy_scan_prune() cases. Perhaps simply flipping the order of the "since
the caller" and "moreover, this way" conjunctions would solve it. I
defer to your judgment.

I rewrote the commit message a bit. See what you think of this version.

tests if lpdead_items > 0. It is more the inconsistency that bothered me
than the fact that lpdead_items is signed.

Changed.

@@ -2159,6 +2156,9 @@ lazy_scan_noprune(LVRelState *vacrel,
if (hastup)
vacrel->nonempty_pages = blkno + 1;

+     /* Did we find LP_DEAD items? */
+     *has_lpdead_items = (lpdead_items > 0);

I would drop this comment. The code doesn't really need it, and the
reason we care if there are LP_DEAD items is not because they are dead
but because we want to know if we'll touch this page again. You don't
need to rehash all that here, so I think omitting the comment is enough.

I want to keep the comment. I guess it's a pet peeve of mine, but I
hate it when people do this:

/* some comment */
some_code();

some_more_code();

/* some other comment */
even_more_code();

IMV, this makes it unclear whether /* some comment */ is describing
both some_code() and some_more_code(), or just the former. To be fair,
there is often no practical confusion, because if the comment is good
and the code is nothing too complicated then you understand which way
the author meant it. But sometimes the comment is bad or out of date
and sometimes the code is difficult to understand and then you're left
scratching your head as to what the author meant. I prefer to insert a
comment above some_more_code() in such cases, even if it's a bit
perfunctory. I think it makes the code easier to read.

--
Robert Haas
EDB: http://www.enterprisedb.com

Attachments:

v3-0001-Be-more-consistent-about-whether-to-update-the-FS.patchapplication/octet-stream; name=v3-0001-Be-more-consistent-about-whether-to-update-the-FS.patchDownload
From 32d14a1f38a72eafa0704361d768f965df82996f Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Tue, 16 Jan 2024 10:11:34 -0500
Subject: [PATCH v3] Be more consistent about whether to update the FSM while
 vacuuming.

Previously, when lazy_scan_noprune() was called and returned true, we would
update the FSM immediately if the relation had no indexes or if the page
contained no dead items. On the other hand, when lazy_scan_prune() was
called, we would update the FSM if either of those things was true or
if index vacuuming was disabled. Eliminate that behavioral difference by
considering vacrel->do_index_vacuuming in both cases.

Also, make lazy_scan_heap() responsible for deciding whether to update
the FSM, instead of doing it inside lazy_scan_noprune(). This is
more consistent with the lazy_scan_prune() case. lazy_scan_noprune()
still needs an output parameter for whether there are LP_DEAD items
on the page, but the real decision-making now happens in the caller.

Patch by me, reviewed by Peter Geoghegan and Melanie Plageman.
---
 src/backend/access/heap/vacuumlazy.c | 59 ++++++++++++++--------------
 1 file changed, 30 insertions(+), 29 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index b63cad1335..e7a942e183 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -252,7 +252,7 @@ static void lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							LVPagePruneState *prunestate);
 static bool lazy_scan_noprune(LVRelState *vacrel, Buffer buf,
 							  BlockNumber blkno, Page page,
-							  bool *recordfreespace);
+							  bool *has_lpdead_items);
 static void lazy_vacuum(LVRelState *vacrel);
 static bool lazy_vacuum_all_indexes(LVRelState *vacrel);
 static void lazy_vacuum_heap_rel(LVRelState *vacrel);
@@ -958,7 +958,7 @@ lazy_scan_heap(LVRelState *vacrel)
 		page = BufferGetPage(buf);
 		if (!ConditionalLockBufferForCleanup(buf))
 		{
-			bool		recordfreespace;
+			bool		has_lpdead_items;
 
 			LockBuffer(buf, BUFFER_LOCK_SHARE);
 
@@ -974,17 +974,30 @@ lazy_scan_heap(LVRelState *vacrel)
 			 * Collect LP_DEAD items in dead_items array, count tuples,
 			 * determine if rel truncation is safe
 			 */
-			if (lazy_scan_noprune(vacrel, buf, blkno, page,
-								  &recordfreespace))
+			if (lazy_scan_noprune(vacrel, buf, blkno, page, &has_lpdead_items))
 			{
 				Size		freespace = 0;
+				bool		recordfreespace;
 
 				/*
-				 * Processed page successfully (without cleanup lock) -- just
-				 * need to update the FSM, much like the lazy_scan_prune case.
-				 * Don't bother trying to match its visibility map setting
-				 * steps, though.
+				 * We processed the page successfully (without a cleanup
+				 * lock).
+				 *
+				 * Update the FSM, just as we would in the case where
+				 * lazy_scan_prune() is called. Our goal is to update the
+				 * freespace map the last time we touch the page. If the
+				 * relation has no indexes, or if index vacuuming is disabled,
+				 * there will be no second heap pass; if this particular page
+				 * has no dead items, the second heap pass will not touch this
+				 * page. So, in those cases, update the FSM now.
+				 *
+				 * After a call to lazy_scan_prune(), we would also try to
+				 * adjust the page-level all-visible bit and the visibility
+				 * map, but we skip that step in this path.
 				 */
+				recordfreespace = vacrel->nindexes == 0
+					|| !vacrel->do_index_vacuuming
+					|| !has_lpdead_items;
 				if (recordfreespace)
 					freespace = PageGetHeapFreeSpace(page);
 				UnlockReleaseBuffer(buf);
@@ -1935,15 +1948,17 @@ lazy_scan_prune(LVRelState *vacrel,
  * one or more tuples on the page.  We always return true for non-aggressive
  * callers.
  *
- * recordfreespace flag instructs caller on whether or not it should do
- * generic FSM processing for page.
+ * If this function returns true, *has_lpdead_items gets set to true or false
+ * depending on whether, upon return from this function, any LP_DEAD items are
+ * present on the page. If this function returns false, *has_lpdead_items
+ * is not updated.
  */
 static bool
 lazy_scan_noprune(LVRelState *vacrel,
 				  Buffer buf,
 				  BlockNumber blkno,
 				  Page page,
-				  bool *recordfreespace)
+				  bool *has_lpdead_items)
 {
 	OffsetNumber offnum,
 				maxoff;
@@ -1960,7 +1975,6 @@ lazy_scan_noprune(LVRelState *vacrel,
 	Assert(BufferGetBlockNumber(buf) == blkno);
 
 	hastup = false;				/* for now */
-	*recordfreespace = false;	/* for now */
 
 	lpdead_items = 0;
 	live_tuples = 0;
@@ -2102,18 +2116,8 @@ lazy_scan_noprune(LVRelState *vacrel,
 			hastup = true;
 			missed_dead_tuples += lpdead_items;
 		}
-
-		*recordfreespace = true;
-	}
-	else if (lpdead_items == 0)
-	{
-		/*
-		 * Won't be vacuuming this page later, so record page's freespace in
-		 * the FSM now
-		 */
-		*recordfreespace = true;
 	}
-	else
+	else if (lpdead_items > 0)
 	{
 		VacDeadItems *dead_items = vacrel->dead_items;
 		ItemPointerData tmp;
@@ -2138,12 +2142,6 @@ lazy_scan_noprune(LVRelState *vacrel,
 									 dead_items->num_items);
 
 		vacrel->lpdead_items += lpdead_items;
-
-		/*
-		 * Assume that we'll go on to vacuum this heap page during final pass
-		 * over the heap.  Don't record free space until then.
-		 */
-		*recordfreespace = false;
 	}
 
 	/*
@@ -2159,6 +2157,9 @@ lazy_scan_noprune(LVRelState *vacrel,
 	if (hastup)
 		vacrel->nonempty_pages = blkno + 1;
 
+	/* Did we find LP_DEAD items? */
+	*has_lpdead_items = (lpdead_items > 0);
+
 	/* Caller won't need to call lazy_scan_prune with same page */
 	return true;
 }
-- 
2.39.3 (Apple Git-145)

#66Melanie Plageman
melanieplageman@gmail.com
In reply to: Robert Haas (#65)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Tue, Jan 16, 2024 at 10:24 AM Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Jan 15, 2024 at 4:03 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

Perhaps it isn't important, but I find this wording confusing. You
mention lazy_scan_prune() and then mention that "the whole test is in
one place instead of spread out" -- which kind of makes it sound like
you are consolidating FSM updates for both the lazy_scan_noprune() and
lazy_scan_prune() cases. Perhaps simply flipping the order of the "since
the caller" and "moreover, this way" conjunctions would solve it. I
defer to your judgment.

I rewrote the commit message a bit. See what you think of this version.

All LGTM.

#67Robert Haas
robertmhaas@gmail.com
In reply to: Melanie Plageman (#66)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Tue, Jan 16, 2024 at 11:28 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:

All LGTM.

Committed.

--
Robert Haas
EDB: http://www.enterprisedb.com

#68Jim Nasby
jim.nasby@gmail.com
In reply to: Robert Haas (#53)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On 1/12/24 12:45 PM, Robert Haas wrote:

P.P.S. to everyone: Yikes, this logic is really confusing.

Having studied all this code several years ago when it was even simpler
- it was *still* very hard to grok even back then. I *greatly
appreciate* the effort that y'all are putting into increasing the
clarity here.

BTW, back in the day the whole "no indexes" optimization was a really
tiny amount of code... I think it amounted to 2 or 3 if statements. I
haven't yet attempted to grok this patchset, but I'm definitely
wondering how much it's worth continuing to optimize that case. Clearly
it'd be very expensive to memoize dead tuples just to trawl that list a
single time to clean the heap, but outside of that I'm not sure other
optimizations are worth it given the amount of code
complexity/duplication they seem to require - especially for code where
correctness is so crucial.
--
Jim Nasby, Data Architect, Austin TX

#69Robert Haas
robertmhaas@gmail.com
In reply to: Jim Nasby (#68)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Tue, Jan 16, 2024 at 4:28 PM Jim Nasby <jim.nasby@gmail.com> wrote:

On 1/12/24 12:45 PM, Robert Haas wrote:

P.P.S. to everyone: Yikes, this logic is really confusing.

Having studied all this code several years ago when it was even simpler
- it was *still* very hard to grok even back then. I *greatly
appreciate* the effort that y'all are putting into increasing the
clarity here.

Thanks. And yeah, I agree.

BTW, back in the day the whole "no indexes" optimization was a really
tiny amount of code... I think it amounted to 2 or 3 if statements. I
haven't yet attempted to grok this patchset, but I'm definitely
wondering how much it's worth continuing to optimize that case. Clearly
it'd be very expensive to memoize dead tuples just to trawl that list a
single time to clean the heap, but outside of that I'm not sure other
optimizations are worth it given the amount of code
complexity/duplication they seem to require - especially for code where
correctness is so crucial.

Personally, I don't think throwing away that optimization is the way
to go. The idea isn't intrinsically complicated, I believe. It's just
that the code has become messy because of too many hands touching it.
At least, that's my read.

--
Robert Haas
EDB: http://www.enterprisedb.com

#70Melanie Plageman
melanieplageman@gmail.com
In reply to: Robert Haas (#67)
4 attachment(s)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Tue, Jan 16, 2024 at 2:23 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Jan 16, 2024 at 11:28 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:

All LGTM.

Committed.

Attached v8 patch set is rebased over this.

In 0004, I've taken the approach you seem to favor and combined the FSM
updates from the prune and no prune cases in lazy_scan_heap() instead
of pushing the FSM updates into lazy_scan_prune() and
lazy_scan_noprune().

I did not guard against calling lazy_scan_new_or_empty() a second time
in the case that lazy_scan_noprune() was called. I can do this. I
mentioned upthread I found it confusing for lazy_scan_new_or_empty()
to be guarded by if (do_prune). The overhead of calling it wouldn't be
terribly high. I can change that based on your opinion of what is
better.

The big open question/functional change is when we consider vacuuming
the FSM. Previously, we only considered vacuuming the FSM in the no
indexes, has dead items case. After combining the FSM updates from
lazy_scan_prune()'s no indexes/has lpdead items case,
lazy_scan_prune()'s regular case, and lazy_scan_noprune(), all of them
consider vacuuming the FSM. I could guard against this, but I wasn't
sure why we would want to only vacuum the FSM in the no indexes/has
dead items case.
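
For reference, the combined FSM handling at the end of the per-block loop in
lazy_scan_heap() boils down to roughly the following (a condensed sketch using
the names from the attached patches; the real code carries longer comments):

    if (vacrel->nindexes == 0 ||
        !vacrel->do_index_vacuuming ||
        !has_lpdead_items)
    {
        /* We won't revisit this page, so record its free space now */
        Size        freespace = PageGetHeapFreeSpace(page);

        UnlockReleaseBuffer(buf);
        RecordPageWithFreeSpace(vacrel->rel, blkno, freespace);

        /*
         * Periodically vacuum the FSM itself. With the combined code this
         * is now considered on every path that updates the FSM.
         */
        if (blkno - next_fsm_block_to_vacuum >= VACUUM_FSM_EVERY_PAGES)
        {
            FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum,
                                    blkno);
            next_fsm_block_to_vacuum = blkno;
        }
    }
    else
        UnlockReleaseBuffer(buf);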

I also noticed while rebasing something I missed while reviewing
45d395cd75ffc5b -- has_lpdead_items is set in a slightly different
place in lazy_scan_noprune() than lazy_scan_prune() (with otherwise
identical surrounding code). Both are correct, but it would have been
nice for them to be the same. If the patches below are committed, we
could standardize on the location in lazy_scan_noprune().

- Melanie

Attachments:

v8-0002-Move-VM-update-code-to-lazy_scan_prune.patchapplication/x-patch; name=v8-0002-Move-VM-update-code-to-lazy_scan_prune.patchDownload
From 1aaf95e70997f363b4783d023f92ad16f796e7b4 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Fri, 5 Jan 2024 11:12:58 -0500
Subject: [PATCH v8 2/4] Move VM update code to lazy_scan_prune()

The visibility map is updated after lazy_scan_prune() returns in
lazy_scan_heap(). Most of the output parameters of lazy_scan_prune() are
used to update the VM in lazy_scan_heap(). Moving that code into
lazy_scan_prune() simplifies lazy_scan_heap() and requires less
communication between the two. This is a pure lift and shift. No VM
update logic is changed. All but one of the members of the prunestate
are now obsolete. It will be removed in a future commit.
---
 src/backend/access/heap/vacuumlazy.c | 226 ++++++++++++++-------------
 1 file changed, 115 insertions(+), 111 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index c7ef879be4e..6535d8bdeea 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -249,6 +249,7 @@ static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   bool sharelock, Buffer vmbuffer);
 static void lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
+							Buffer vmbuffer, bool all_visible_according_to_vm,
 							LVPagePruneState *prunestate);
 static bool lazy_scan_noprune(LVRelState *vacrel, Buffer buf,
 							  BlockNumber blkno, Page page,
@@ -1032,117 +1033,9 @@ lazy_scan_heap(LVRelState *vacrel)
 		 * tuple headers of remaining items with storage. It also determines
 		 * if truncating this block is safe.
 		 */
-		lazy_scan_prune(vacrel, buf, blkno, page, &prunestate);
-
-		Assert(!prunestate.all_visible || !prunestate.has_lpdead_items);
-
-		/*
-		 * Handle setting visibility map bit based on information from the VM
-		 * (as of last lazy_scan_skip() call), and from prunestate
-		 */
-		if (!all_visible_according_to_vm && prunestate.all_visible)
-		{
-			uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
-
-			if (prunestate.all_frozen)
-			{
-				Assert(!TransactionIdIsValid(prunestate.visibility_cutoff_xid));
-				flags |= VISIBILITYMAP_ALL_FROZEN;
-			}
-
-			/*
-			 * It should never be the case that the visibility map page is set
-			 * while the page-level bit is clear, but the reverse is allowed
-			 * (if checksums are not enabled).  Regardless, set both bits so
-			 * that we get back in sync.
-			 *
-			 * NB: If the heap page is all-visible but the VM bit is not set,
-			 * we don't need to dirty the heap page.  However, if checksums
-			 * are enabled, we do need to make sure that the heap page is
-			 * dirtied before passing it to visibilitymap_set(), because it
-			 * may be logged.  Given that this situation should only happen in
-			 * rare cases after a crash, it is not worth optimizing.
-			 */
-			PageSetAllVisible(page);
-			MarkBufferDirty(buf);
-			visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
-							  vmbuffer, prunestate.visibility_cutoff_xid,
-							  flags);
-		}
-
-		/*
-		 * As of PostgreSQL 9.2, the visibility map bit should never be set if
-		 * the page-level bit is clear.  However, it's possible that the bit
-		 * got cleared after lazy_scan_skip() was called, so we must recheck
-		 * with buffer lock before concluding that the VM is corrupt.
-		 */
-		else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
-				 visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
-		{
-			elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-				 vacrel->relname, blkno);
-			visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-								VISIBILITYMAP_VALID_BITS);
-		}
-
-		/*
-		 * It's possible for the value returned by
-		 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-		 * wrong for us to see tuples that appear to not be visible to
-		 * everyone yet, while PD_ALL_VISIBLE is already set. The real safe
-		 * xmin value never moves backwards, but
-		 * GetOldestNonRemovableTransactionId() is conservative and sometimes
-		 * returns a value that's unnecessarily small, so if we see that
-		 * contradiction it just means that the tuples that we think are not
-		 * visible to everyone yet actually are, and the PD_ALL_VISIBLE flag
-		 * is correct.
-		 *
-		 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE
-		 * set, however.
-		 */
-		else if (prunestate.has_lpdead_items && PageIsAllVisible(page))
-		{
-			elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-				 vacrel->relname, blkno);
-			PageClearAllVisible(page);
-			MarkBufferDirty(buf);
-			visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-								VISIBILITYMAP_VALID_BITS);
-		}
-
-		/*
-		 * If the all-visible page is all-frozen but not marked as such yet,
-		 * mark it as all-frozen.  Note that all_frozen is only valid if
-		 * all_visible is true, so we must check both prunestate fields.
-		 */
-		else if (all_visible_according_to_vm && prunestate.all_visible &&
-				 prunestate.all_frozen &&
-				 !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
-		{
-			/*
-			 * Avoid relying on all_visible_according_to_vm as a proxy for the
-			 * page-level PD_ALL_VISIBLE bit being set, since it might have
-			 * become stale -- even when all_visible is set in prunestate
-			 */
-			if (!PageIsAllVisible(page))
-			{
-				PageSetAllVisible(page);
-				MarkBufferDirty(buf);
-			}
-
-			/*
-			 * Set the page all-frozen (and all-visible) in the VM.
-			 *
-			 * We can pass InvalidTransactionId as our visibility_cutoff_xid,
-			 * since a snapshotConflictHorizon sufficient to make everything
-			 * safe for REDO was logged when the page's tuples were frozen.
-			 */
-			Assert(!TransactionIdIsValid(prunestate.visibility_cutoff_xid));
-			visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
-							  vmbuffer, InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE |
-							  VISIBILITYMAP_ALL_FROZEN);
-		}
+		lazy_scan_prune(vacrel, buf, blkno, page,
+						vmbuffer, all_visible_according_to_vm,
+						&prunestate);
 
 		/*
 		 * Final steps for block: drop cleanup lock, record free space in the
@@ -1495,6 +1388,8 @@ lazy_scan_prune(LVRelState *vacrel,
 				Buffer buf,
 				BlockNumber blkno,
 				Page page,
+				Buffer vmbuffer,
+				bool all_visible_according_to_vm,
 				LVPagePruneState *prunestate)
 {
 	Relation	rel = vacrel->rel;
@@ -1879,6 +1774,115 @@ lazy_scan_prune(LVRelState *vacrel,
 	/* Can't truncate this page */
 	if (hastup)
 		vacrel->nonempty_pages = blkno + 1;
+
+	Assert(!prunestate->all_visible || !prunestate->has_lpdead_items);
+
+	/*
+	 * Handle setting visibility map bit based on information from the VM (as
+	 * of last lazy_scan_skip() call), and from prunestate
+	 */
+	if (!all_visible_according_to_vm && prunestate->all_visible)
+	{
+		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
+
+		if (prunestate->all_frozen)
+		{
+			Assert(!TransactionIdIsValid(prunestate->visibility_cutoff_xid));
+			flags |= VISIBILITYMAP_ALL_FROZEN;
+		}
+
+		/*
+		 * It should never be the case that the visibility map page is set
+		 * while the page-level bit is clear, but the reverse is allowed (if
+		 * checksums are not enabled).  Regardless, set both bits so that we
+		 * get back in sync.
+		 *
+		 * NB: If the heap page is all-visible but the VM bit is not set, we
+		 * don't need to dirty the heap page.  However, if checksums are
+		 * enabled, we do need to make sure that the heap page is dirtied
+		 * before passing it to visibilitymap_set(), because it may be logged.
+		 * Given that this situation should only happen in rare cases after a
+		 * crash, it is not worth optimizing.
+		 */
+		PageSetAllVisible(page);
+		MarkBufferDirty(buf);
+		visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
+						  vmbuffer, prunestate->visibility_cutoff_xid,
+						  flags);
+	}
+
+	/*
+	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+	 * page-level bit is clear.  However, it's possible that the bit got
+	 * cleared after lazy_scan_skip() was called, so we must recheck with
+	 * buffer lock before concluding that the VM is corrupt.
+	 */
+	else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
+			 visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
+	{
+		elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+			 vacrel->relname, blkno);
+		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+	}
+
+	/*
+	 * It's possible for the value returned by
+	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+	 * wrong for us to see tuples that appear to not be visible to everyone
+	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
+	 * conservative and sometimes returns a value that's unnecessarily small,
+	 * so if we see that contradiction it just means that the tuples that we
+	 * think are not visible to everyone yet actually are, and the
+	 * PD_ALL_VISIBLE flag is correct.
+	 *
+	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+	 * however.
+	 */
+	else if (prunestate->has_lpdead_items && PageIsAllVisible(page))
+	{
+		elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+			 vacrel->relname, blkno);
+		PageClearAllVisible(page);
+		MarkBufferDirty(buf);
+		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+	}
+
+	/*
+	 * If the all-visible page is all-frozen but not marked as such yet, mark
+	 * it as all-frozen.  Note that all_frozen is only valid if all_visible is
+	 * true, so we must check both prunestate fields.
+	 */
+	else if (all_visible_according_to_vm && prunestate->all_visible &&
+			 prunestate->all_frozen &&
+			 !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
+	{
+		/*
+		 * Avoid relying on all_visible_according_to_vm as a proxy for the
+		 * page-level PD_ALL_VISIBLE bit being set, since it might have become
+		 * stale -- even when all_visible is set in prunestate
+		 */
+		if (!PageIsAllVisible(page))
+		{
+			PageSetAllVisible(page);
+			MarkBufferDirty(buf);
+		}
+
+		/*
+		 * Set the page all-frozen (and all-visible) in the VM.
+		 *
+		 * We can pass InvalidTransactionId as our visibility_cutoff_xid,
+		 * since a snapshotConflictHorizon sufficient to make everything safe
+		 * for REDO was logged when the page's tuples were frozen.
+		 */
+		Assert(!TransactionIdIsValid(prunestate->visibility_cutoff_xid));
+		visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
+						  vmbuffer, InvalidTransactionId,
+						  VISIBILITYMAP_ALL_VISIBLE |
+						  VISIBILITYMAP_ALL_FROZEN);
+	}
 }
 
 /*
-- 
2.37.2

v8-0001-Set-would-be-dead-items-LP_UNUSED-while-pruning.patchapplication/x-patch; name=v8-0001-Set-would-be-dead-items-LP_UNUSED-while-pruning.patchDownload
From 82d7ec9dbfd663c77a2dfbeea80eea00ca62deed Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 16 Jan 2024 17:41:55 -0500
Subject: [PATCH v8 1/4] Set would-be dead items LP_UNUSED while pruning

If there are no indexes on a relation, items can be marked LP_UNUSED
instead of LP_DEAD while pruning. This avoids a separate
invocation of lazy_vacuum_heap_page() and saves a vacuum WAL record.

To accomplish this, pass heap_page_prune() a new parameter, mark_unused_now,
which indicates that dead line pointers should be set to LP_UNUSED
during pruning, allowing earlier reaping of tuples.

We only enable this for vacuum's pruning. On access pruning always
passes mark_unused_now == false.

Discussion: https://postgr.es/m/CAAKRu_bgvb_k0gKOXWzNKWHt560R0smrGe3E8zewKPs8fiMKkw%40mail.gmail.com
---
 src/backend/access/heap/pruneheap.c  |  79 ++++++++++++++---
 src/backend/access/heap/vacuumlazy.c | 128 ++++++++-------------------
 src/include/access/heapam.h          |   1 +
 3 files changed, 106 insertions(+), 102 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 3e0a1a260e6..be83f7bdf05 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -35,6 +35,8 @@ typedef struct
 
 	/* tuple visibility test, initialized for the relation */
 	GlobalVisState *vistest;
+	/* whether or not dead items can be set LP_UNUSED during pruning */
+	bool		mark_unused_now;
 
 	TransactionId new_prune_xid;	/* new prune hint value for page */
 	TransactionId snapshotConflictHorizon;	/* latest xid removed */
@@ -67,6 +69,7 @@ static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
 static void heap_prune_record_redirect(PruneState *prstate,
 									   OffsetNumber offnum, OffsetNumber rdoffnum);
 static void heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum);
+static void heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum);
 static void heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum);
 static void page_verify_redirects(Page page);
 
@@ -148,7 +151,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 		{
 			PruneResult presult;
 
-			heap_page_prune(relation, buffer, vistest, &presult, NULL);
+			/*
+			 * For now, pass mark_unused_now == false regardless of whether or
+			 * not the relation has indexes, since we cannot safely determine
+			 * that during on-access pruning with the current implementation.
+			 */
+			heap_page_prune(relation, buffer, vistest, false,
+							&presult, NULL);
 
 			/*
 			 * Report the number of tuples reclaimed to pgstats.  This is
@@ -193,6 +202,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
  * (see heap_prune_satisfies_vacuum and
  * HeapTupleSatisfiesVacuum).
  *
+ * mark_unused_now indicates whether or not dead items can be set LP_UNUSED during
+ * pruning.
+ *
  * off_loc is the offset location required by the caller to use in error
  * callback.
  *
@@ -203,6 +215,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 void
 heap_page_prune(Relation relation, Buffer buffer,
 				GlobalVisState *vistest,
+				bool mark_unused_now,
 				PruneResult *presult,
 				OffsetNumber *off_loc)
 {
@@ -227,6 +240,7 @@ heap_page_prune(Relation relation, Buffer buffer,
 	prstate.new_prune_xid = InvalidTransactionId;
 	prstate.rel = relation;
 	prstate.vistest = vistest;
+	prstate.mark_unused_now = mark_unused_now;
 	prstate.snapshotConflictHorizon = InvalidTransactionId;
 	prstate.nredirected = prstate.ndead = prstate.nunused = 0;
 	memset(prstate.marked, 0, sizeof(prstate.marked));
@@ -306,9 +320,9 @@ heap_page_prune(Relation relation, Buffer buffer,
 		if (off_loc)
 			*off_loc = offnum;
 
-		/* Nothing to do if slot is empty or already dead */
+		/* Nothing to do if slot is empty */
 		itemid = PageGetItemId(page, offnum);
-		if (!ItemIdIsUsed(itemid) || ItemIdIsDead(itemid))
+		if (!ItemIdIsUsed(itemid))
 			continue;
 
 		/* Process this item or chain of items */
@@ -581,7 +595,17 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
 		 * function.)
 		 */
 		if (ItemIdIsDead(lp))
+		{
+			/*
+			 * If the caller set mark_unused_now true, we can set dead line
+			 * pointers LP_UNUSED now. We don't increment ndeleted here since
+			 * the LP was already marked dead.
+			 */
+			if (unlikely(prstate->mark_unused_now))
+				heap_prune_record_unused(prstate, offnum);
+
 			break;
+		}
 
 		Assert(ItemIdIsNormal(lp));
 		htup = (HeapTupleHeader) PageGetItem(dp, lp);
@@ -715,7 +739,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
 		 * redirect the root to the correct chain member.
 		 */
 		if (i >= nchain)
-			heap_prune_record_dead(prstate, rootoffnum);
+			heap_prune_record_dead_or_unused(prstate, rootoffnum);
 		else
 			heap_prune_record_redirect(prstate, rootoffnum, chainitems[i]);
 	}
@@ -726,9 +750,9 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
 		 * item.  This can happen if the loop in heap_page_prune caused us to
 		 * visit the dead successor of a redirect item before visiting the
 		 * redirect item.  We can clean up by setting the redirect item to
-		 * DEAD state.
+		 * DEAD state or LP_UNUSED if the caller indicated.
 		 */
-		heap_prune_record_dead(prstate, rootoffnum);
+		heap_prune_record_dead_or_unused(prstate, rootoffnum);
 	}
 
 	return ndeleted;
@@ -774,6 +798,27 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum)
 	prstate->marked[offnum] = true;
 }
 
+/*
+ * Depending on whether or not the caller set mark_unused_now to true, record that a
+ * line pointer should be marked LP_DEAD or LP_UNUSED. There are other cases in
+ * which we will mark line pointers LP_UNUSED, but we will not mark line
+ * pointers LP_DEAD if mark_unused_now is true.
+ */
+static void
+heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum)
+{
+	/*
+	 * If the caller set mark_unused_now to true, we can remove dead tuples
+	 * during pruning instead of marking their line pointers dead. Set this
+	 * tuple's line pointer LP_UNUSED. We hint that this option is less
+	 * likely.
+	 */
+	if (unlikely(prstate->mark_unused_now))
+		heap_prune_record_unused(prstate, offnum);
+	else
+		heap_prune_record_dead(prstate, offnum);
+}
+
 /* Record line pointer to be marked unused */
 static void
 heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum)
@@ -903,13 +948,23 @@ heap_page_prune_execute(Buffer buffer,
 #ifdef USE_ASSERT_CHECKING
 
 		/*
-		 * Only heap-only tuples can become LP_UNUSED during pruning.  They
-		 * don't need to be left in place as LP_DEAD items until VACUUM gets
-		 * around to doing index vacuuming.
+		 * During pruning, the caller may have passed mark_unused_now == true,
+		 * which allows would-be dead items to be set LP_UNUSED instead. This
+		 * is only possible if the relation has no indexes. If there are any
+		 * dead items, then mark_unused_now was not true and every item being
+		 * marked LP_UNUSED must refer to a heap-only tuple.
 		 */
-		Assert(ItemIdHasStorage(lp) && ItemIdIsNormal(lp));
-		htup = (HeapTupleHeader) PageGetItem(page, lp);
-		Assert(HeapTupleHeaderIsHeapOnly(htup));
+		if (ndead > 0)
+		{
+			Assert(ItemIdHasStorage(lp) && ItemIdIsNormal(lp));
+			htup = (HeapTupleHeader) PageGetItem(page, lp);
+			Assert(HeapTupleHeaderIsHeapOnly(htup));
+		}
+		else
+		{
+			Assert(ItemIdIsUsed(lp));
+		}
+
 #endif
 
 		ItemIdSetUnused(lp);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index e7a942e1835..c7ef879be4e 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1036,69 +1036,6 @@ lazy_scan_heap(LVRelState *vacrel)
 
 		Assert(!prunestate.all_visible || !prunestate.has_lpdead_items);
 
-		if (vacrel->nindexes == 0)
-		{
-			/*
-			 * Consider the need to do page-at-a-time heap vacuuming when
-			 * using the one-pass strategy now.
-			 *
-			 * The one-pass strategy will never call lazy_vacuum().  The steps
-			 * performed here can be thought of as the one-pass equivalent of
-			 * a call to lazy_vacuum().
-			 */
-			if (prunestate.has_lpdead_items)
-			{
-				Size		freespace;
-
-				lazy_vacuum_heap_page(vacrel, blkno, buf, 0, vmbuffer);
-
-				/* Forget the LP_DEAD items that we just vacuumed */
-				dead_items->num_items = 0;
-
-				/*
-				 * Now perform FSM processing for blkno, and move on to next
-				 * page.
-				 *
-				 * Our call to lazy_vacuum_heap_page() will have considered if
-				 * it's possible to set all_visible/all_frozen independently
-				 * of lazy_scan_prune().  Note that prunestate was invalidated
-				 * by lazy_vacuum_heap_page() call.
-				 */
-				freespace = PageGetHeapFreeSpace(page);
-
-				UnlockReleaseBuffer(buf);
-				RecordPageWithFreeSpace(vacrel->rel, blkno, freespace);
-
-				/*
-				 * Periodically perform FSM vacuuming to make newly-freed
-				 * space visible on upper FSM pages. FreeSpaceMapVacuumRange()
-				 * vacuums the portion of the freespace map covering heap
-				 * pages from start to end - 1. Include the block we just
-				 * vacuumed by passing it blkno + 1. Overflow isn't an issue
-				 * because MaxBlockNumber + 1 is InvalidBlockNumber which
-				 * causes FreeSpaceMapVacuumRange() to vacuum freespace map
-				 * pages covering the remainder of the relation.
-				 */
-				if (blkno - next_fsm_block_to_vacuum >= VACUUM_FSM_EVERY_PAGES)
-				{
-					FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum,
-											blkno + 1);
-					next_fsm_block_to_vacuum = blkno + 1;
-				}
-
-				continue;
-			}
-
-			/*
-			 * There was no call to lazy_vacuum_heap_page() because pruning
-			 * didn't encounter/create any LP_DEAD items that needed to be
-			 * vacuumed.  Prune state has not been invalidated, so proceed
-			 * with prunestate-driven visibility map and FSM steps (just like
-			 * the two-pass strategy).
-			 */
-			Assert(dead_items->num_items == 0);
-		}
-
 		/*
 		 * Handle setting visibility map bit based on information from the VM
 		 * (as of last lazy_scan_skip() call), and from prunestate
@@ -1209,38 +1146,44 @@ lazy_scan_heap(LVRelState *vacrel)
 
 		/*
 		 * Final steps for block: drop cleanup lock, record free space in the
-		 * FSM
+		 * FSM.
+		 *
+		 * If we will likely do index vacuuming, wait until
+		 * lazy_vacuum_heap_rel() to save free space. This doesn't just save
+		 * us some cycles; it also allows us to record any additional free
+		 * space that lazy_vacuum_heap_page() will make available in cases
+		 * where it's possible to truncate the page's line pointer array.
+		 *
+		 * Note: It's not in fact 100% certain that we really will call
+		 * lazy_vacuum_heap_rel() -- lazy_vacuum() might yet opt to skip index
+		 * vacuuming (and so must skip heap vacuuming).  This is deemed okay
+		 * because it only happens in emergencies, or when there is very
+		 * little free space anyway. (Besides, we start recording free space
+		 * in the FSM once index vacuuming has been abandoned.)
 		 */
-		if (prunestate.has_lpdead_items && vacrel->do_index_vacuuming)
-		{
-			/*
-			 * Wait until lazy_vacuum_heap_rel() to save free space.  This
-			 * doesn't just save us some cycles; it also allows us to record
-			 * any additional free space that lazy_vacuum_heap_page() will
-			 * make available in cases where it's possible to truncate the
-			 * page's line pointer array.
-			 *
-			 * Note: It's not in fact 100% certain that we really will call
-			 * lazy_vacuum_heap_rel() -- lazy_vacuum() might yet opt to skip
-			 * index vacuuming (and so must skip heap vacuuming).  This is
-			 * deemed okay because it only happens in emergencies, or when
-			 * there is very little free space anyway. (Besides, we start
-			 * recording free space in the FSM once index vacuuming has been
-			 * abandoned.)
-			 *
-			 * Note: The one-pass (no indexes) case is only supposed to make
-			 * it this far when there were no LP_DEAD items during pruning.
-			 */
-			Assert(vacrel->nindexes > 0);
-			UnlockReleaseBuffer(buf);
-		}
-		else
+		if (vacrel->nindexes == 0
+			|| !vacrel->do_index_vacuuming
+			|| !prunestate.has_lpdead_items)
 		{
 			Size		freespace = PageGetHeapFreeSpace(page);
 
 			UnlockReleaseBuffer(buf);
 			RecordPageWithFreeSpace(vacrel->rel, blkno, freespace);
+
+			/*
+			 * Periodically perform FSM vacuuming to make newly-freed space
+			 * visible on upper FSM pages. This is done after vacuuming if the
+			 * table has indexes.
+			 */
+			if (blkno - next_fsm_block_to_vacuum >= VACUUM_FSM_EVERY_PAGES)
+			{
+				FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum,
+										blkno);
+				next_fsm_block_to_vacuum = blkno;
+			}
 		}
+		else
+			UnlockReleaseBuffer(buf);
 	}
 
 	vacrel->blkno = InvalidBlockNumber;
@@ -1596,8 +1539,13 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * in presult.ndeleted. It should not be confused with lpdead_items;
 	 * lpdead_items's final value can be thought of as the number of tuples
 	 * that were deleted from indexes.
+	 *
+	 * If the relation has no indexes, we can immediately mark would-be dead
+	 * items LP_UNUSED, so mark_unused_now should be true if no indexes and
+	 * false otherwise.
 	 */
-	heap_page_prune(rel, buf, vacrel->vistest, &presult, &vacrel->offnum);
+	heap_page_prune(rel, buf, vacrel->vistest, vacrel->nindexes == 0,
+					&presult, &vacrel->offnum);
 
 	/*
 	 * Now scan the page to collect LP_DEAD items and check for tuples
@@ -2520,7 +2468,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	bool		all_frozen;
 	LVSavedErrInfo saved_err_info;
 
-	Assert(vacrel->nindexes == 0 || vacrel->do_index_vacuuming);
+	Assert(vacrel->do_index_vacuuming);
 
 	pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_VACUUMED, blkno);
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 932ec0d6f2b..4b133f68593 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -320,6 +320,7 @@ struct GlobalVisState;
 extern void heap_page_prune_opt(Relation relation, Buffer buffer);
 extern void heap_page_prune(Relation relation, Buffer buffer,
 							struct GlobalVisState *vistest,
+							bool mark_unused_now,
 							PruneResult *presult,
 							OffsetNumber *off_loc);
 extern void heap_page_prune_execute(Buffer buffer,
-- 
2.37.2

v8-0004-Combine-FSM-updates-for-prune-and-no-prune-cases.patchapplication/x-patch; name=v8-0004-Combine-FSM-updates-for-prune-and-no-prune-cases.patchDownload
From d39ebcdc835f840d36f79a3fa05db6c2d5918cf0 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 16 Jan 2024 17:28:07 -0500
Subject: [PATCH v8 4/4] Combine FSM updates for prune and no prune cases

The goal is to update the FSM once for each page vacuumed. The logic for
ensuring this is the same for lazy_scan_prune() and lazy_scan_noprune().
Combine those FSM updates.

There is one functional difference. Now, the FSM may be vacuumed
whenever we update the FSM in lazy_scan_heap(). Previously, it was not
considered if lazy_scan_noprune() returned true or if we called
lazy_scan_prune() for a block of a relation with indexes (or a relation
with no indexes but which did not have any dead items after pruning).
---
 src/backend/access/heap/vacuumlazy.c | 66 ++++++++++++----------------
 1 file changed, 27 insertions(+), 39 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 610740c44e7..86b087016d2 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -816,6 +816,7 @@ lazy_scan_heap(LVRelState *vacrel)
 	Buffer		vmbuffer = InvalidBuffer;
 	bool		next_unskippable_allvis,
 				skipping_current_range;
+	bool		do_prune;
 	const int	initprog_index[] = {
 		PROGRESS_VACUUM_PHASE,
 		PROGRESS_VACUUM_TOTAL_HEAP_BLKS,
@@ -954,48 +955,28 @@ lazy_scan_heap(LVRelState *vacrel)
 
 			/*
 			 * Collect LP_DEAD items in dead_items array, count tuples,
-			 * determine if rel truncation is safe
+			 * determine if rel truncation is safe. If lazy_scan_noprune()
+			 * returns false, we must get a cleanup lock and call
+			 * lazy_scan_prune().
 			 */
-			if (lazy_scan_noprune(vacrel, buf, blkno, page, &has_lpdead_items))
-			{
-				Size		freespace = 0;
-				bool		recordfreespace;
-
-				/*
-				 * We processed the page successfully (without a cleanup
-				 * lock).
-				 *
-				 * Update the FSM, just as we would in the case where
-				 * lazy_scan_prune() is called. Our goal is to update the
-				 * freespace map the last time we touch the page. If the
-				 * relation has no indexes, or if index vacuuming is disabled,
-				 * there will be no second heap pass; if this particular page
-				 * has no dead items, the second heap pass will not touch this
-				 * page. So, in those cases, update the FSM now.
-				 *
-				 * After a call to lazy_scan_prune(), we would also try to
-				 * adjust the page-level all-visible bit and the visibility
-				 * map, but we skip that step in this path.
-				 */
-				recordfreespace = vacrel->nindexes == 0
-					|| !vacrel->do_index_vacuuming
-					|| !has_lpdead_items;
-				if (recordfreespace)
-					freespace = PageGetHeapFreeSpace(page);
-				UnlockReleaseBuffer(buf);
-				if (recordfreespace)
-					RecordPageWithFreeSpace(vacrel->rel, blkno, freespace);
-				continue;
-			}
+			do_prune = !lazy_scan_noprune(vacrel, buf, blkno, page,
+										  &has_lpdead_items);
 
 			/*
 			 * lazy_scan_noprune could not do all required processing.  Wait
-			 * for a cleanup lock, and call lazy_scan_prune in the usual way.
+			 * for a cleanup lock, and call lazy_scan_prune() in the usual
+			 * way.
 			 */
-			Assert(vacrel->aggressive);
-			LockBuffer(buf, BUFFER_LOCK_UNLOCK);
-			LockBufferForCleanup(buf);
+			if (do_prune)
+			{
+				Assert(vacrel->aggressive);
+				LockBuffer(buf, BUFFER_LOCK_UNLOCK);
+				LockBufferForCleanup(buf);
+			}
 		}
+		/* If we do get a cleanup lock, we will definitely prune */
+		else
+			do_prune = true;
 
 		/* Check for new or empty pages before lazy_scan_prune call */
 		if (lazy_scan_new_or_empty(vacrel, buf, blkno, page, false, vmbuffer))
@@ -1014,9 +995,10 @@ lazy_scan_heap(LVRelState *vacrel)
 		 * tuple headers of remaining items with storage. It also determines
 		 * if truncating this block is safe.
 		 */
-		lazy_scan_prune(vacrel, buf, blkno, page,
-						vmbuffer, all_visible_according_to_vm,
-						&has_lpdead_items);
+		if (do_prune)
+			lazy_scan_prune(vacrel, buf, blkno, page,
+							vmbuffer, all_visible_according_to_vm,
+							&has_lpdead_items);
 
 		/*
 		 * Final steps for block: drop cleanup lock, record free space in the
@@ -1028,6 +1010,12 @@ lazy_scan_heap(LVRelState *vacrel)
 		 * space that lazy_vacuum_heap_page() will make available in cases
 		 * where it's possible to truncate the page's line pointer array.
 		 *
+		 * Our goal is to update the freespace map the last time we touch the
+		 * page. If the relation has no indexes, or if index vacuuming is
+		 * disabled, there will be no second heap pass; if this particular
+		 * page has no dead items, the second heap pass will not touch this
+		 * page. So, in those cases, update the FSM now.
+		 *
 		 * Note: It's not in fact 100% certain that we really will call
 		 * lazy_vacuum_heap_rel() -- lazy_vacuum() might yet opt to skip index
 		 * vacuuming (and so must skip heap vacuuming).  This is deemed okay
-- 
2.37.2

v8-0003-Inline-LVPagePruneState-members-in-lazy_scan_prun.patchapplication/x-patch; name=v8-0003-Inline-LVPagePruneState-members-in-lazy_scan_prun.patchDownload
From 615400a228dbfd6e0a48364d8b571cd94b422e43 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 9 Jan 2024 17:17:01 -0500
Subject: [PATCH v8 3/4] Inline LVPagePruneState members in lazy_scan_prune()

After moving the code to update the visibility map from lazy_scan_heap()
into lazy_scan_prune(), the LVPagePruneState is mainly obsolete. Replace
all instances of use of its members in lazy_scan_prune() with local
variables and remove the struct.

Only has_lpdead_items remains in use.
---
 src/backend/access/heap/vacuumlazy.c | 131 ++++++++++++++-------------
 src/tools/pgindent/typedefs.list     |   1 -
 2 files changed, 68 insertions(+), 64 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 6535d8bdeea..610740c44e7 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -212,23 +212,6 @@ typedef struct LVRelState
 	int64		missed_dead_tuples; /* # removable, but not removed */
 } LVRelState;
 
-/*
- * State returned by lazy_scan_prune()
- */
-typedef struct LVPagePruneState
-{
-	bool		has_lpdead_items;	/* includes existing LP_DEAD items */
-
-	/*
-	 * State describes the proper VM bit states to set for the page following
-	 * pruning and freezing.  all_visible implies !has_lpdead_items, but don't
-	 * trust all_frozen result unless all_visible is also set to true.
-	 */
-	bool		all_visible;	/* Every item visible to all? */
-	bool		all_frozen;		/* provided all_visible is also true */
-	TransactionId visibility_cutoff_xid;	/* For recovery conflicts */
-} LVPagePruneState;
-
 /* Struct for saving and restoring vacuum error information. */
 typedef struct LVSavedErrInfo
 {
@@ -250,7 +233,7 @@ static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 static void lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
 							Buffer vmbuffer, bool all_visible_according_to_vm,
-							LVPagePruneState *prunestate);
+							bool *has_lpdead_items);
 static bool lazy_scan_noprune(LVRelState *vacrel, Buffer buf,
 							  BlockNumber blkno, Page page,
 							  bool *has_lpdead_items);
@@ -828,6 +811,7 @@ lazy_scan_heap(LVRelState *vacrel)
 				blkno,
 				next_unskippable_block,
 				next_fsm_block_to_vacuum = 0;
+	bool		has_lpdead_items;
 	VacDeadItems *dead_items = vacrel->dead_items;
 	Buffer		vmbuffer = InvalidBuffer;
 	bool		next_unskippable_allvis,
@@ -854,7 +838,6 @@ lazy_scan_heap(LVRelState *vacrel)
 		Buffer		buf;
 		Page		page;
 		bool		all_visible_according_to_vm;
-		LVPagePruneState prunestate;
 
 		if (blkno == next_unskippable_block)
 		{
@@ -959,8 +942,6 @@ lazy_scan_heap(LVRelState *vacrel)
 		page = BufferGetPage(buf);
 		if (!ConditionalLockBufferForCleanup(buf))
 		{
-			bool		has_lpdead_items;
-
 			LockBuffer(buf, BUFFER_LOCK_SHARE);
 
 			/* Check for new or empty pages before lazy_scan_noprune call */
@@ -1035,7 +1016,7 @@ lazy_scan_heap(LVRelState *vacrel)
 		 */
 		lazy_scan_prune(vacrel, buf, blkno, page,
 						vmbuffer, all_visible_according_to_vm,
-						&prunestate);
+						&has_lpdead_items);
 
 		/*
 		 * Final steps for block: drop cleanup lock, record free space in the
@@ -1056,7 +1037,7 @@ lazy_scan_heap(LVRelState *vacrel)
 		 */
 		if (vacrel->nindexes == 0
 			|| !vacrel->do_index_vacuuming
-			|| !prunestate.has_lpdead_items)
+			|| !has_lpdead_items)
 		{
 			Size		freespace = PageGetHeapFreeSpace(page);
 
@@ -1382,6 +1363,14 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
  * right after this operation completes instead of in the middle of it. Note that
  * any tuple that becomes dead after the call to heap_page_prune() can't need to
  * be frozen, because it was visible to another session when vacuum started.
+ *
+ * vmbuffer is the buffer containing the VM block with visibility information
+ * for the heap block, blkno. all_visible_according_to_vm is the saved
+ * visibility status of the heap block looked up earlier by the caller. We
+ * won't rely entirely on this status, as it may be out of date.
+ *
+ * *has_lpdead_items is set to true or false depending on whether, upon return
+ * from this function, any LP_DEAD items are still present on the page.
  */
 static void
 lazy_scan_prune(LVRelState *vacrel,
@@ -1390,7 +1379,7 @@ lazy_scan_prune(LVRelState *vacrel,
 				Page page,
 				Buffer vmbuffer,
 				bool all_visible_according_to_vm,
-				LVPagePruneState *prunestate)
+				bool *has_lpdead_items)
 {
 	Relation	rel = vacrel->rel;
 	OffsetNumber offnum,
@@ -1403,6 +1392,9 @@ lazy_scan_prune(LVRelState *vacrel,
 				recently_dead_tuples;
 	HeapPageFreeze pagefrz;
 	bool		hastup = false;
+	bool		all_visible,
+				all_frozen;
+	TransactionId visibility_cutoff_xid;
 	int64		fpi_before = pgWalUsage.wal_fpi;
 	OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
 	HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
@@ -1444,12 +1436,21 @@ lazy_scan_prune(LVRelState *vacrel,
 
 	/*
 	 * Now scan the page to collect LP_DEAD items and check for tuples
-	 * requiring freezing among remaining tuples with storage
+	 * requiring freezing among remaining tuples with storage. Keep track of
+	 * the visibility cutoff xid for recovery conflicts.
 	 */
-	prunestate->has_lpdead_items = false;
-	prunestate->all_visible = true;
-	prunestate->all_frozen = true;
-	prunestate->visibility_cutoff_xid = InvalidTransactionId;
+	*has_lpdead_items = false;
+	visibility_cutoff_xid = InvalidTransactionId;
+
+	/*
+	 * We will update the VM after collecting LP_DEAD items and freezing
+	 * tuples. Keep track of whether or not the page is all_visible and
+	 * all_frozen and use this information to update the VM. all_visible
+	 * implies 0 lpdead_items, but don't trust all_frozen result unless
+	 * all_visible is also set to true.
+	 */
+	all_visible = true;
+	all_frozen = true;
 
 	for (offnum = FirstOffsetNumber;
 		 offnum <= maxoff;
@@ -1537,13 +1538,13 @@ lazy_scan_prune(LVRelState *vacrel,
 				 * asynchronously. See SetHintBits for more info. Check that
 				 * the tuple is hinted xmin-committed because of that.
 				 */
-				if (prunestate->all_visible)
+				if (all_visible)
 				{
 					TransactionId xmin;
 
 					if (!HeapTupleHeaderXminCommitted(htup))
 					{
-						prunestate->all_visible = false;
+						all_visible = false;
 						break;
 					}
 
@@ -1555,14 +1556,14 @@ lazy_scan_prune(LVRelState *vacrel,
 					if (!TransactionIdPrecedes(xmin,
 											   vacrel->cutoffs.OldestXmin))
 					{
-						prunestate->all_visible = false;
+						all_visible = false;
 						break;
 					}
 
 					/* Track newest xmin on page. */
-					if (TransactionIdFollows(xmin, prunestate->visibility_cutoff_xid) &&
+					if (TransactionIdFollows(xmin, visibility_cutoff_xid) &&
 						TransactionIdIsNormal(xmin))
-						prunestate->visibility_cutoff_xid = xmin;
+						visibility_cutoff_xid = xmin;
 				}
 				break;
 			case HEAPTUPLE_RECENTLY_DEAD:
@@ -1573,7 +1574,7 @@ lazy_scan_prune(LVRelState *vacrel,
 				 * pruning.)
 				 */
 				recently_dead_tuples++;
-				prunestate->all_visible = false;
+				all_visible = false;
 				break;
 			case HEAPTUPLE_INSERT_IN_PROGRESS:
 
@@ -1584,11 +1585,11 @@ lazy_scan_prune(LVRelState *vacrel,
 				 * results.  This assumption is a bit shaky, but it is what
 				 * acquire_sample_rows() does, so be consistent.
 				 */
-				prunestate->all_visible = false;
+				all_visible = false;
 				break;
 			case HEAPTUPLE_DELETE_IN_PROGRESS:
 				/* This is an expected case during concurrent vacuum */
-				prunestate->all_visible = false;
+				all_visible = false;
 
 				/*
 				 * Count such rows as live.  As above, we assume the deleting
@@ -1618,7 +1619,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		 * definitely cannot be set all-frozen in the visibility map later on
 		 */
 		if (!totally_frozen)
-			prunestate->all_frozen = false;
+			all_frozen = false;
 	}
 
 	/*
@@ -1636,7 +1637,7 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * page all-frozen afterwards (might not happen until final heap pass).
 	 */
 	if (pagefrz.freeze_required || tuples_frozen == 0 ||
-		(prunestate->all_visible && prunestate->all_frozen &&
+		(all_visible && all_frozen &&
 		 fpi_before != pgWalUsage.wal_fpi))
 	{
 		/*
@@ -1674,11 +1675,11 @@ lazy_scan_prune(LVRelState *vacrel,
 			 * once we're done with it.  Otherwise we generate a conservative
 			 * cutoff by stepping back from OldestXmin.
 			 */
-			if (prunestate->all_visible && prunestate->all_frozen)
+			if (all_visible && all_frozen)
 			{
 				/* Using same cutoff when setting VM is now unnecessary */
-				snapshotConflictHorizon = prunestate->visibility_cutoff_xid;
-				prunestate->visibility_cutoff_xid = InvalidTransactionId;
+				snapshotConflictHorizon = visibility_cutoff_xid;
+				visibility_cutoff_xid = InvalidTransactionId;
 			}
 			else
 			{
@@ -1701,7 +1702,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		 */
 		vacrel->NewRelfrozenXid = pagefrz.NoFreezePageRelfrozenXid;
 		vacrel->NewRelminMxid = pagefrz.NoFreezePageRelminMxid;
-		prunestate->all_frozen = false;
+		all_frozen = false;
 		tuples_frozen = 0;		/* avoid miscounts in instrumentation */
 	}
 
@@ -1714,16 +1715,17 @@ lazy_scan_prune(LVRelState *vacrel,
 	 */
 #ifdef USE_ASSERT_CHECKING
 	/* Note that all_frozen value does not matter when !all_visible */
-	if (prunestate->all_visible && lpdead_items == 0)
+	if (all_visible && lpdead_items == 0)
 	{
-		TransactionId cutoff;
-		bool		all_frozen;
+		TransactionId debug_cutoff;
+		bool		debug_all_frozen;
 
-		if (!heap_page_is_all_visible(vacrel, buf, &cutoff, &all_frozen))
+		if (!heap_page_is_all_visible(vacrel, buf,
+									  &debug_cutoff, &debug_all_frozen))
 			Assert(false);
 
-		Assert(!TransactionIdIsValid(cutoff) ||
-			   cutoff == prunestate->visibility_cutoff_xid);
+		Assert(!TransactionIdIsValid(debug_cutoff) ||
+			   debug_cutoff == visibility_cutoff_xid);
 	}
 #endif
 
@@ -1736,7 +1738,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		ItemPointerData tmp;
 
 		vacrel->lpdead_item_pages++;
-		prunestate->has_lpdead_items = true;
 
 		ItemPointerSetBlockNumber(&tmp, blkno);
 
@@ -1761,7 +1762,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		 * Now that freezing has been finalized, unset all_visible.  It needs
 		 * to reflect the present state of things, as expected by our caller.
 		 */
-		prunestate->all_visible = false;
+		all_visible = false;
 	}
 
 	/* Finally, add page-local counts to whole-VACUUM counts */
@@ -1775,19 +1776,23 @@ lazy_scan_prune(LVRelState *vacrel,
 	if (hastup)
 		vacrel->nonempty_pages = blkno + 1;
 
-	Assert(!prunestate->all_visible || !prunestate->has_lpdead_items);
+	/* Did we find LP_DEAD items? */
+	*has_lpdead_items = (lpdead_items > 0);
+
+	Assert(!all_visible || !(*has_lpdead_items));
 
 	/*
 	 * Handle setting visibility map bit based on information from the VM (as
-	 * of last lazy_scan_skip() call), and from prunestate
+	 * of last lazy_scan_skip() call), and from all_visible and all_frozen
+	 * variables
 	 */
-	if (!all_visible_according_to_vm && prunestate->all_visible)
+	if (!all_visible_according_to_vm && all_visible)
 	{
 		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
 
-		if (prunestate->all_frozen)
+		if (all_frozen)
 		{
-			Assert(!TransactionIdIsValid(prunestate->visibility_cutoff_xid));
+			Assert(!TransactionIdIsValid(visibility_cutoff_xid));
 			flags |= VISIBILITYMAP_ALL_FROZEN;
 		}
 
@@ -1807,7 +1812,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		PageSetAllVisible(page);
 		MarkBufferDirty(buf);
 		visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
-						  vmbuffer, prunestate->visibility_cutoff_xid,
+						  vmbuffer, visibility_cutoff_xid,
 						  flags);
 	}
 
@@ -1840,7 +1845,7 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
 	 * however.
 	 */
-	else if (prunestate->has_lpdead_items && PageIsAllVisible(page))
+	else if (lpdead_items > 0 && PageIsAllVisible(page))
 	{
 		elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
 			 vacrel->relname, blkno);
@@ -1853,16 +1858,16 @@ lazy_scan_prune(LVRelState *vacrel,
 	/*
 	 * If the all-visible page is all-frozen but not marked as such yet, mark
 	 * it as all-frozen.  Note that all_frozen is only valid if all_visible is
-	 * true, so we must check both prunestate fields.
+	 * true, so we must check both all_visible and all_frozen.
 	 */
-	else if (all_visible_according_to_vm && prunestate->all_visible &&
-			 prunestate->all_frozen &&
+	else if (all_visible_according_to_vm && all_visible &&
+			 all_frozen &&
 			 !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
 	{
 		/*
 		 * Avoid relying on all_visible_according_to_vm as a proxy for the
 		 * page-level PD_ALL_VISIBLE bit being set, since it might have become
-		 * stale -- even when all_visible is set in prunestate
+		 * stale -- even when all_visible is set
 		 */
 		if (!PageIsAllVisible(page))
 		{
@@ -1877,7 +1882,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		 * since a snapshotConflictHorizon sufficient to make everything safe
 		 * for REDO was logged when the page's tuples were frozen.
 		 */
-		Assert(!TransactionIdIsValid(prunestate->visibility_cutoff_xid));
+		Assert(!TransactionIdIsValid(visibility_cutoff_xid));
 		visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
 						  vmbuffer, InvalidTransactionId,
 						  VISIBILITYMAP_ALL_VISIBLE |
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index f582eb59e7d..e0b0ed53d66 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1405,7 +1405,6 @@ LPVOID
 LPWSTR
 LSEG
 LUID
-LVPagePruneState
 LVRelState
 LVSavedErrInfo
 LWLock
-- 
2.37.2

#71Robert Haas
robertmhaas@gmail.com
In reply to: Melanie Plageman (#70)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Tue, Jan 16, 2024 at 6:07 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

Attached v8 patch set is rebased over this.

Reviewing 0001, consider the case where a table has no indexes.
Pre-patch, PageTruncateLinePointerArray() will happen when
lazy_vacuum_heap_page() is called; post-patch, it will not occur.
That's a loss. Should we care? On the plus side, visibility map
repair, if required, can now take place. That's a gain.

I'm otherwise satisfied with this patch now, except for some extremely
minor nitpicking:

+ * For now, pass mark_unused_now == false regardless of whether or

Personally, I would write "pass mark_unused_now as false" here,
because we're not testing equality. Or else "pass mark_unused_now =
false". This is not an equality test.

+ * During pruning, the caller may have passed mark_unused_now == true,

Again here, but also, this is referring to the name of a parameter to
a function whose name is not given. I think this should either
talk fully in terms of code ("When heap_page_prune was called,
mark_unused_now may have been passed as true, which allows would-be
LP_DEAD items to be made LP_UNUSED instead.") or fully in English
("During pruning, we may have decided to mark would-be dead items as
unused.").

In 0004, I've taken the approach you seem to favor and combined the FSM
updates from the prune and no prune cases in lazy_scan_heap() instead
of pushing the FSM updates into lazy_scan_prune() and
lazy_scan_noprune().

I do like that approach.

I think do_prune can be declared one scope inward, in the per-block
for loop. I would probably initialize it to true so I could drop the
stubby else block:

+               /* If we do get a cleanup lock, we will definitely prune */
+               else
+                       do_prune = true;

And then I'd probably write the call as if (!lazy_scan_noprune())
do_prune = true.

If I wanted to stick with not initializing do_prune, I'd put the
comment inside as /* We got the cleanup lock, so we will definitely
prune */ and add braces since that makes it a two-line block.

I did not guard against calling lazy_scan_new_or_empty() a second time
in the case that lazy_scan_noprune() was called. I can do this. I
mentioned upthread I found it confusing for lazy_scan_new_or_empty()
to be guarded by if (do_prune). The overhead of calling it wouldn't be
terribly high. I can change that based on your opinion of what is
better.

To me, the relevant question here isn't reader confusion, because that
can be cleared up with a comment explaining why we do or do not test
do_prune. Indeed, I'd say it's a textbook example of when you should
comment a test: when it might look wrong to the reader but you have a
good reason for doing it.

But that brings me to what I think the real question is here: do we,
uh, have a good reason for doing it? At first blush the structure
looks a bit odd here. lazy_scan_new_or_empty() is intended to handle
PageIsNew() and PageIsEmpty() cases, lazy_scan_noprune() the cases
where we process the page without a cleanup lock, and
lazy_scan_prune() the regular case. So you might think that
lazy_scan_new_or_empty() would always be applied *first*, that we
would then conditionally apply lazy_scan_noprune(), and finally
conditionally apply lazy_scan_prune(). Like this:

bool got_cleanup_lock = ConditionalLockBufferForCleanup(buf);
if (lazy_scan_new_or_empty())
continue;
if (!got_cleanup_lock && !lazy_scan_noprune())
{
LockBuffer(buf, BUFFER_LOCK_UNLOCK);
LockBufferForCleanup(buf);
got_cleanup_lock = true;
}
if (got_cleanup_lock)
lazy_scan_prune();

The current organization of the code seems to imply that we don't need
to worry about the PageIsNew() and PageIsEmpty() cases before calling
lazy_scan_noprune(), and at the moment I'm not understanding why that
should be the case. I wonder if this is a bug, or if I'm just
confused.

The big open question/functional change is when we consider vacuuming
the FSM. Previously, we only considered vacuuming the FSM in the no
indexes, has dead items case. After combining the FSM updates from
lazy_scan_prune()'s no indexes/has lpdead items case,
lazy_scan_prune()'s regular case, and lazy_scan_noprune(), all of them
consider vacuuming the FSM. I could guard against this, but I wasn't
sure why we would want to only vacuum the FSM in the no indexes/has
dead items case.

I don't get it. Conceptually, I agree that we don't want to be
inconsistent here without some good reason. One of the big advantages
of unifying different code paths is that you avoid being accidentally
inconsistent. If different things are different it shows up as a test
in the code instead of just having different code paths in different
places that may or may not match.

But I thought the whole point of
45d395cd75ffc5b4c824467140127a5d11696d4c was to iron out the existing
inconsistencies so that we could unify this code without having to
change any more behavior. In particular, I thought we just made it
consistently adhere to the principle Peter articulated, where we
record free space when we're touching the page for presumptively the
last time. I gather that you think it's still not consistent, but I
don't understand what the remaining inconsistency is. Can you explain
further?

--
Robert Haas
EDB: http://www.enterprisedb.com

#72Melanie Plageman
melanieplageman@gmail.com
In reply to: Robert Haas (#71)
4 attachment(s)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Wed, Jan 17, 2024 at 12:17 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Jan 16, 2024 at 6:07 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

Attached v8 patch set is rebased over this.

Reviewing 0001, consider the case where a table has no indexes.
Pre-patch, PageTruncateLinePointerArray() will happen when
lazy_vacuum_heap_page() is called; post-patch, it will not occur.
That's a loss. Should we care? On the plus side, visibility map
repair, if required, can now take place. That's a gain.

I thought that this wasn't an issue because heap_page_prune_execute()
calls PageRepairFragmentation() which similarly modifies pd_lower and
sets the hint bit about free line pointers.

I'm otherwise satisfied with this patch now, except for some extremely
minor nitpicking:

+ * For now, pass mark_unused_now == false regardless of whether or

Personally, I would write "pass mark_unused_now as false" here,
because we're not testing equality. Or else "pass mark_unused_now =
false". This is not an equality test.

+ * During pruning, the caller may have passed mark_unused_now == true,

Again here, but also, this is referring to the name of a parameter to
a function whose name is not given. I think this should either
talk fully in terms of code ("When heap_page_prune was called,
mark_unused_now may have been passed as true, which allows would-be
LP_DEAD items to be made LP_UNUSED instead.") or fully in English
("During pruning, we may have decided to mark would-be dead items as
unused.").

Fixed both of the above issues as suggested in attached v9

In 0004, I've taken the approach you seem to favor and combined the FSM
updates from the prune and no prune cases in lazy_scan_heap() instead
of pushing the FSM updates into lazy_scan_prune() and
lazy_scan_noprune().

I do like that approach.

I think do_prune can be declared one scope inward, in the per-block
for loop. I would probably initialize it to true so I could drop the
stubby else block:

+               /* If we do get a cleanup lock, we will definitely prune */
+               else
+                       do_prune = true;

And then I'd probably write the call as if (!lazy_scan_noprune())
do_prune = true.

If I wanted to stick with not initializing do_prune, I'd put the
comment inside as /* We got the cleanup lock, so we will definitely
prune */ and add braces since that makes it a two-line block.

If we don't unconditionally set do_prune using the result of
lazy_scan_noprune(), then we cannot leave do_prune uninitialized. I
preferred having it uninitialized, as it didn't imply that doing
pruning was the default. Also, it made it simpler to have that comment
about always pruning when we get the cleanup lock.

However, with the changes you mentioned below (got_cleanup_lock), this
discussion is moot.

I did not guard against calling lazy_scan_new_or_empty() a second time
in the case that lazy_scan_noprune() was called. I can do this. I
mentioned upthread I found it confusing for lazy_scan_new_or_empty()
to be guarded by if (do_prune). The overhead of calling it wouldn't be
terribly high. I can change that based on your opinion of what is
better.

To me, the relevant question here isn't reader confusion, because that
can be cleared up with a comment explaining why we do or do not test
do_prune. Indeed, I'd say it's a textbook example of when you should
comment a test: when it might look wrong to the reader but you have a
good reason for doing it.

But that brings me to what I think the real question is here: do we,
uh, have a good reason for doing it? At first blush the structure
looks a bit odd here. lazy_scan_new_or_empty() is intended to handle
PageIsNew() and PageIsEmpty() cases, lazy_scan_noprune() the cases
where we process the page without a cleanup lock, and
lazy_scan_prune() the regular case. So you might think that
lazy_scan_new_or_empty() would always be applied *first*, that we
would then conditionally apply lazy_scan_noprune(), and finally
conditionally apply lazy_scan_prune(). Like this:

bool got_cleanup_lock = ConditionalLockBufferForCleanup(buf);
if (lazy_scan_new_or_empty())
    continue;
if (!got_cleanup_lock && !lazy_scan_noprune())
{
    LockBuffer(buf, BUFFER_LOCK_UNLOCK);
    LockBufferForCleanup(buf);
    got_cleanup_lock = true;
}
if (got_cleanup_lock)
    lazy_scan_prune();

The current organization of the code seems to imply that we don't need
to worry about the PageIsNew() and PageIsEmpty() cases before calling
lazy_scan_noprune(), and at the moment I'm not understanding why that
should be the case. I wonder if this is a bug, or if I'm just
confused.

Yes, I also spent some time thinking about this. In master, we do
always call lazy_scan_new_or_empty() before calling
lazy_scan_noprune(). The code is aiming to ensure we call
lazy_scan_new_or_empty() once before calling either of
lazy_scan_noprune() or lazy_scan_prune(). I think it didn't call
lazy_scan_new_or_empty() unconditionally first because of the
different lock types expected. But, your structure has solved that.
I've used a version of your example code above in attached v9. It is
much nicer.

The big open question/functional change is when we consider vacuuming
the FSM. Previously, we only considered vacuuming the FSM in the
no-indexes/has-dead-items case. After combining the FSM updates from
lazy_scan_prune()'s no-indexes/has-lpdead-items case,
lazy_scan_prune()'s regular case, and lazy_scan_noprune(), all of them
consider vacuuming the FSM. I could guard against this, but I wasn't
sure why we would want to vacuum the FSM only in the
no-indexes/has-dead-items case.

I don't get it. Conceptually, I agree that we don't want to be
inconsistent here without some good reason. One of the big advantages
of unifying different code paths is that you avoid being accidentally
inconsistent. If different things are different it shows up as a test
in the code instead of just having different code paths in different
places that may or may not match.

But I thought the whole point of
45d395cd75ffc5b4c824467140127a5d11696d4c was to iron out the existing
inconsistencies so that we could unify this code without having to
change any more behavior. In particular, I thought we just made it
consistently adhere to the principle Peter articulated, where we
record free space when we're touching the page for presumptively the
last time. I gather that you think it's still not consistent, but I
don't understand what the remaining inconsistency is. Can you explain
further?

Ah, I realize I was not clear. I am now talking about inconsistencies
in vacuuming the FSM itself. FreeSpaceMapVacuumRange(). Not updating
the freespace map during the course of vacuuming the heap relation.
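
To make the distinction concrete, the two operations involved look like
this (just a sketch using the existing FSM API, with the variable names
from vacuumlazy.c):

/* Per-page FSM update: record this heap page's free space in its FSM slot */
RecordPageWithFreeSpace(vacrel->rel, blkno, freespace);

/*
 * FSM vacuuming: propagate the recorded values up through the FSM's upper
 * levels for heap blocks in [next_fsm_block_to_vacuum, blkno), so that
 * searches descending from the FSM root can actually find the space.
 */
FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum, blkno);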

- Melanie

Attachments:

v9-0002-Move-VM-update-code-to-lazy_scan_prune.patch (text/x-patch)
From 1806d60e2eaf968cf816509c564715562836c49d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Fri, 5 Jan 2024 11:12:58 -0500
Subject: [PATCH v9 2/4] Move VM update code to lazy_scan_prune()

The visibility map is updated after lazy_scan_prune() returns in
lazy_scan_heap(). Most of the output parameters of lazy_scan_prune() are
used to update the VM in lazy_scan_heap(). Moving that code into
lazy_scan_prune() simplifies lazy_scan_heap() and requires less
communication between the two. This is a pure lift and shift. No VM
update logic is changed. All but one of the members of the prunestate
are now obsolete. It will be removed in a future commit.
---
 src/backend/access/heap/vacuumlazy.c | 226 ++++++++++++++-------------
 1 file changed, 115 insertions(+), 111 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index c7ef879be4..6535d8bdee 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -249,6 +249,7 @@ static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   bool sharelock, Buffer vmbuffer);
 static void lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
+							Buffer vmbuffer, bool all_visible_according_to_vm,
 							LVPagePruneState *prunestate);
 static bool lazy_scan_noprune(LVRelState *vacrel, Buffer buf,
 							  BlockNumber blkno, Page page,
@@ -1032,117 +1033,9 @@ lazy_scan_heap(LVRelState *vacrel)
 		 * tuple headers of remaining items with storage. It also determines
 		 * if truncating this block is safe.
 		 */
-		lazy_scan_prune(vacrel, buf, blkno, page, &prunestate);
-
-		Assert(!prunestate.all_visible || !prunestate.has_lpdead_items);
-
-		/*
-		 * Handle setting visibility map bit based on information from the VM
-		 * (as of last lazy_scan_skip() call), and from prunestate
-		 */
-		if (!all_visible_according_to_vm && prunestate.all_visible)
-		{
-			uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
-
-			if (prunestate.all_frozen)
-			{
-				Assert(!TransactionIdIsValid(prunestate.visibility_cutoff_xid));
-				flags |= VISIBILITYMAP_ALL_FROZEN;
-			}
-
-			/*
-			 * It should never be the case that the visibility map page is set
-			 * while the page-level bit is clear, but the reverse is allowed
-			 * (if checksums are not enabled).  Regardless, set both bits so
-			 * that we get back in sync.
-			 *
-			 * NB: If the heap page is all-visible but the VM bit is not set,
-			 * we don't need to dirty the heap page.  However, if checksums
-			 * are enabled, we do need to make sure that the heap page is
-			 * dirtied before passing it to visibilitymap_set(), because it
-			 * may be logged.  Given that this situation should only happen in
-			 * rare cases after a crash, it is not worth optimizing.
-			 */
-			PageSetAllVisible(page);
-			MarkBufferDirty(buf);
-			visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
-							  vmbuffer, prunestate.visibility_cutoff_xid,
-							  flags);
-		}
-
-		/*
-		 * As of PostgreSQL 9.2, the visibility map bit should never be set if
-		 * the page-level bit is clear.  However, it's possible that the bit
-		 * got cleared after lazy_scan_skip() was called, so we must recheck
-		 * with buffer lock before concluding that the VM is corrupt.
-		 */
-		else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
-				 visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
-		{
-			elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-				 vacrel->relname, blkno);
-			visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-								VISIBILITYMAP_VALID_BITS);
-		}
-
-		/*
-		 * It's possible for the value returned by
-		 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-		 * wrong for us to see tuples that appear to not be visible to
-		 * everyone yet, while PD_ALL_VISIBLE is already set. The real safe
-		 * xmin value never moves backwards, but
-		 * GetOldestNonRemovableTransactionId() is conservative and sometimes
-		 * returns a value that's unnecessarily small, so if we see that
-		 * contradiction it just means that the tuples that we think are not
-		 * visible to everyone yet actually are, and the PD_ALL_VISIBLE flag
-		 * is correct.
-		 *
-		 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE
-		 * set, however.
-		 */
-		else if (prunestate.has_lpdead_items && PageIsAllVisible(page))
-		{
-			elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-				 vacrel->relname, blkno);
-			PageClearAllVisible(page);
-			MarkBufferDirty(buf);
-			visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-								VISIBILITYMAP_VALID_BITS);
-		}
-
-		/*
-		 * If the all-visible page is all-frozen but not marked as such yet,
-		 * mark it as all-frozen.  Note that all_frozen is only valid if
-		 * all_visible is true, so we must check both prunestate fields.
-		 */
-		else if (all_visible_according_to_vm && prunestate.all_visible &&
-				 prunestate.all_frozen &&
-				 !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
-		{
-			/*
-			 * Avoid relying on all_visible_according_to_vm as a proxy for the
-			 * page-level PD_ALL_VISIBLE bit being set, since it might have
-			 * become stale -- even when all_visible is set in prunestate
-			 */
-			if (!PageIsAllVisible(page))
-			{
-				PageSetAllVisible(page);
-				MarkBufferDirty(buf);
-			}
-
-			/*
-			 * Set the page all-frozen (and all-visible) in the VM.
-			 *
-			 * We can pass InvalidTransactionId as our visibility_cutoff_xid,
-			 * since a snapshotConflictHorizon sufficient to make everything
-			 * safe for REDO was logged when the page's tuples were frozen.
-			 */
-			Assert(!TransactionIdIsValid(prunestate.visibility_cutoff_xid));
-			visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
-							  vmbuffer, InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE |
-							  VISIBILITYMAP_ALL_FROZEN);
-		}
+		lazy_scan_prune(vacrel, buf, blkno, page,
+						vmbuffer, all_visible_according_to_vm,
+						&prunestate);
 
 		/*
 		 * Final steps for block: drop cleanup lock, record free space in the
@@ -1495,6 +1388,8 @@ lazy_scan_prune(LVRelState *vacrel,
 				Buffer buf,
 				BlockNumber blkno,
 				Page page,
+				Buffer vmbuffer,
+				bool all_visible_according_to_vm,
 				LVPagePruneState *prunestate)
 {
 	Relation	rel = vacrel->rel;
@@ -1879,6 +1774,115 @@ lazy_scan_prune(LVRelState *vacrel,
 	/* Can't truncate this page */
 	if (hastup)
 		vacrel->nonempty_pages = blkno + 1;
+
+	Assert(!prunestate->all_visible || !prunestate->has_lpdead_items);
+
+	/*
+	 * Handle setting visibility map bit based on information from the VM (as
+	 * of last lazy_scan_skip() call), and from prunestate
+	 */
+	if (!all_visible_according_to_vm && prunestate->all_visible)
+	{
+		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
+
+		if (prunestate->all_frozen)
+		{
+			Assert(!TransactionIdIsValid(prunestate->visibility_cutoff_xid));
+			flags |= VISIBILITYMAP_ALL_FROZEN;
+		}
+
+		/*
+		 * It should never be the case that the visibility map page is set
+		 * while the page-level bit is clear, but the reverse is allowed (if
+		 * checksums are not enabled).  Regardless, set both bits so that we
+		 * get back in sync.
+		 *
+		 * NB: If the heap page is all-visible but the VM bit is not set, we
+		 * don't need to dirty the heap page.  However, if checksums are
+		 * enabled, we do need to make sure that the heap page is dirtied
+		 * before passing it to visibilitymap_set(), because it may be logged.
+		 * Given that this situation should only happen in rare cases after a
+		 * crash, it is not worth optimizing.
+		 */
+		PageSetAllVisible(page);
+		MarkBufferDirty(buf);
+		visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
+						  vmbuffer, prunestate->visibility_cutoff_xid,
+						  flags);
+	}
+
+	/*
+	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+	 * page-level bit is clear.  However, it's possible that the bit got
+	 * cleared after lazy_scan_skip() was called, so we must recheck with
+	 * buffer lock before concluding that the VM is corrupt.
+	 */
+	else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
+			 visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
+	{
+		elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+			 vacrel->relname, blkno);
+		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+	}
+
+	/*
+	 * It's possible for the value returned by
+	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+	 * wrong for us to see tuples that appear to not be visible to everyone
+	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
+	 * conservative and sometimes returns a value that's unnecessarily small,
+	 * so if we see that contradiction it just means that the tuples that we
+	 * think are not visible to everyone yet actually are, and the
+	 * PD_ALL_VISIBLE flag is correct.
+	 *
+	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+	 * however.
+	 */
+	else if (prunestate->has_lpdead_items && PageIsAllVisible(page))
+	{
+		elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+			 vacrel->relname, blkno);
+		PageClearAllVisible(page);
+		MarkBufferDirty(buf);
+		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+	}
+
+	/*
+	 * If the all-visible page is all-frozen but not marked as such yet, mark
+	 * it as all-frozen.  Note that all_frozen is only valid if all_visible is
+	 * true, so we must check both prunestate fields.
+	 */
+	else if (all_visible_according_to_vm && prunestate->all_visible &&
+			 prunestate->all_frozen &&
+			 !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
+	{
+		/*
+		 * Avoid relying on all_visible_according_to_vm as a proxy for the
+		 * page-level PD_ALL_VISIBLE bit being set, since it might have become
+		 * stale -- even when all_visible is set in prunestate
+		 */
+		if (!PageIsAllVisible(page))
+		{
+			PageSetAllVisible(page);
+			MarkBufferDirty(buf);
+		}
+
+		/*
+		 * Set the page all-frozen (and all-visible) in the VM.
+		 *
+		 * We can pass InvalidTransactionId as our visibility_cutoff_xid,
+		 * since a snapshotConflictHorizon sufficient to make everything safe
+		 * for REDO was logged when the page's tuples were frozen.
+		 */
+		Assert(!TransactionIdIsValid(prunestate->visibility_cutoff_xid));
+		visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
+						  vmbuffer, InvalidTransactionId,
+						  VISIBILITYMAP_ALL_VISIBLE |
+						  VISIBILITYMAP_ALL_FROZEN);
+	}
 }
 
 /*
-- 
2.37.2

v9-0001-Set-would-be-dead-items-LP_UNUSED-while-pruning.patch (text/x-patch)
From 05305fa4c6c13bfdbc1b15bfb7bde4e2d1bd70f0 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 16 Jan 2024 17:41:55 -0500
Subject: [PATCH v9 1/4] Set would-be dead items LP_UNUSED while pruning

If there are no indexes on a relation, items can be marked LP_UNUSED
instead of LP_DEAD while pruning. This avoids a separate
invocation of lazy_vacuum_heap_page() and saves a vacuum WAL record.

To accomplish this, pass heap_page_prune() a new parameter, mark_unused_now,
which indicates that dead line pointers should be set to LP_UNUSED
during pruning, allowing earlier reaping of tuples.

We only enable this for vacuum's pruning. On access pruning always
passes mark_unused_now == false.

Discussion: https://postgr.es/m/CAAKRu_bgvb_k0gKOXWzNKWHt560R0smrGe3E8zewKPs8fiMKkw%40mail.gmail.com
---
 src/backend/access/heap/pruneheap.c  |  80 ++++++++++++++---
 src/backend/access/heap/vacuumlazy.c | 128 ++++++++-------------------
 src/include/access/heapam.h          |   1 +
 3 files changed, 107 insertions(+), 102 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 3e0a1a260e..5917633567 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -35,6 +35,8 @@ typedef struct
 
 	/* tuple visibility test, initialized for the relation */
 	GlobalVisState *vistest;
+	/* whether or not dead items can be set LP_UNUSED during pruning */
+	bool		mark_unused_now;
 
 	TransactionId new_prune_xid;	/* new prune hint value for page */
 	TransactionId snapshotConflictHorizon;	/* latest xid removed */
@@ -67,6 +69,7 @@ static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
 static void heap_prune_record_redirect(PruneState *prstate,
 									   OffsetNumber offnum, OffsetNumber rdoffnum);
 static void heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum);
+static void heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum);
 static void heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum);
 static void page_verify_redirects(Page page);
 
@@ -148,7 +151,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 		{
 			PruneResult presult;
 
-			heap_page_prune(relation, buffer, vistest, &presult, NULL);
+			/*
+			 * For now, pass mark_unused_now as false regardless of whether or
+			 * not the relation has indexes, since we cannot safely determine
+			 * that during on-access pruning with the current implementation.
+			 */
+			heap_page_prune(relation, buffer, vistest, false,
+							&presult, NULL);
 
 			/*
 			 * Report the number of tuples reclaimed to pgstats.  This is
@@ -193,6 +202,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
  * (see heap_prune_satisfies_vacuum and
  * HeapTupleSatisfiesVacuum).
  *
+ * mark_unused_now indicates whether or not dead items can be set LP_UNUSED during
+ * pruning.
+ *
  * off_loc is the offset location required by the caller to use in error
  * callback.
  *
@@ -203,6 +215,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 void
 heap_page_prune(Relation relation, Buffer buffer,
 				GlobalVisState *vistest,
+				bool mark_unused_now,
 				PruneResult *presult,
 				OffsetNumber *off_loc)
 {
@@ -227,6 +240,7 @@ heap_page_prune(Relation relation, Buffer buffer,
 	prstate.new_prune_xid = InvalidTransactionId;
 	prstate.rel = relation;
 	prstate.vistest = vistest;
+	prstate.mark_unused_now = mark_unused_now;
 	prstate.snapshotConflictHorizon = InvalidTransactionId;
 	prstate.nredirected = prstate.ndead = prstate.nunused = 0;
 	memset(prstate.marked, 0, sizeof(prstate.marked));
@@ -306,9 +320,9 @@ heap_page_prune(Relation relation, Buffer buffer,
 		if (off_loc)
 			*off_loc = offnum;
 
-		/* Nothing to do if slot is empty or already dead */
+		/* Nothing to do if slot is empty */
 		itemid = PageGetItemId(page, offnum);
-		if (!ItemIdIsUsed(itemid) || ItemIdIsDead(itemid))
+		if (!ItemIdIsUsed(itemid))
 			continue;
 
 		/* Process this item or chain of items */
@@ -581,7 +595,17 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
 		 * function.)
 		 */
 		if (ItemIdIsDead(lp))
+		{
+			/*
+			 * If the caller set mark_unused_now true, we can set dead line
+			 * pointers LP_UNUSED now. We don't increment ndeleted here since
+			 * the LP was already marked dead.
+			 */
+			if (unlikely(prstate->mark_unused_now))
+				heap_prune_record_unused(prstate, offnum);
+
 			break;
+		}
 
 		Assert(ItemIdIsNormal(lp));
 		htup = (HeapTupleHeader) PageGetItem(dp, lp);
@@ -715,7 +739,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
 		 * redirect the root to the correct chain member.
 		 */
 		if (i >= nchain)
-			heap_prune_record_dead(prstate, rootoffnum);
+			heap_prune_record_dead_or_unused(prstate, rootoffnum);
 		else
 			heap_prune_record_redirect(prstate, rootoffnum, chainitems[i]);
 	}
@@ -726,9 +750,9 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
 		 * item.  This can happen if the loop in heap_page_prune caused us to
 		 * visit the dead successor of a redirect item before visiting the
 		 * redirect item.  We can clean up by setting the redirect item to
-		 * DEAD state.
+		 * DEAD state or LP_UNUSED if the caller indicated.
 		 */
-		heap_prune_record_dead(prstate, rootoffnum);
+		heap_prune_record_dead_or_unused(prstate, rootoffnum);
 	}
 
 	return ndeleted;
@@ -774,6 +798,27 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum)
 	prstate->marked[offnum] = true;
 }
 
+/*
+ * Depending on whether or not the caller set mark_unused_now to true, record that a
+ * line pointer should be marked LP_DEAD or LP_UNUSED. There are other cases in
+ * which we will mark line pointers LP_UNUSED, but we will not mark line
+ * pointers LP_DEAD if mark_unused_now is true.
+ */
+static void
+heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum)
+{
+	/*
+	 * If the caller set mark_unused_now to true, we can remove dead tuples
+	 * during pruning instead of marking their line pointers dead. Set this
+	 * tuple's line pointer LP_UNUSED. We hint that this option is less
+	 * likely.
+	 */
+	if (unlikely(prstate->mark_unused_now))
+		heap_prune_record_unused(prstate, offnum);
+	else
+		heap_prune_record_dead(prstate, offnum);
+}
+
 /* Record line pointer to be marked unused */
 static void
 heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum)
@@ -903,13 +948,24 @@ heap_page_prune_execute(Buffer buffer,
 #ifdef USE_ASSERT_CHECKING
 
 		/*
-		 * Only heap-only tuples can become LP_UNUSED during pruning.  They
-		 * don't need to be left in place as LP_DEAD items until VACUUM gets
-		 * around to doing index vacuuming.
+		 * When heap_page_prune() was called, mark_unused_now may have been
+		 * passed as true, which allows would-be LP_DEAD items to be made
+		 * LP_UNUSED instead. This is only possible if the relation has no
+		 * indexes. If there are any dead items, then mark_unused_now was not
+		 * true and every item being marked LP_UNUSED must refer to a
+		 * heap-only tuple.
 		 */
-		Assert(ItemIdHasStorage(lp) && ItemIdIsNormal(lp));
-		htup = (HeapTupleHeader) PageGetItem(page, lp);
-		Assert(HeapTupleHeaderIsHeapOnly(htup));
+		if (ndead > 0)
+		{
+			Assert(ItemIdHasStorage(lp) && ItemIdIsNormal(lp));
+			htup = (HeapTupleHeader) PageGetItem(page, lp);
+			Assert(HeapTupleHeaderIsHeapOnly(htup));
+		}
+		else
+		{
+			Assert(ItemIdIsUsed(lp));
+		}
+
 #endif
 
 		ItemIdSetUnused(lp);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index e7a942e183..c7ef879be4 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1036,69 +1036,6 @@ lazy_scan_heap(LVRelState *vacrel)
 
 		Assert(!prunestate.all_visible || !prunestate.has_lpdead_items);
 
-		if (vacrel->nindexes == 0)
-		{
-			/*
-			 * Consider the need to do page-at-a-time heap vacuuming when
-			 * using the one-pass strategy now.
-			 *
-			 * The one-pass strategy will never call lazy_vacuum().  The steps
-			 * performed here can be thought of as the one-pass equivalent of
-			 * a call to lazy_vacuum().
-			 */
-			if (prunestate.has_lpdead_items)
-			{
-				Size		freespace;
-
-				lazy_vacuum_heap_page(vacrel, blkno, buf, 0, vmbuffer);
-
-				/* Forget the LP_DEAD items that we just vacuumed */
-				dead_items->num_items = 0;
-
-				/*
-				 * Now perform FSM processing for blkno, and move on to next
-				 * page.
-				 *
-				 * Our call to lazy_vacuum_heap_page() will have considered if
-				 * it's possible to set all_visible/all_frozen independently
-				 * of lazy_scan_prune().  Note that prunestate was invalidated
-				 * by lazy_vacuum_heap_page() call.
-				 */
-				freespace = PageGetHeapFreeSpace(page);
-
-				UnlockReleaseBuffer(buf);
-				RecordPageWithFreeSpace(vacrel->rel, blkno, freespace);
-
-				/*
-				 * Periodically perform FSM vacuuming to make newly-freed
-				 * space visible on upper FSM pages. FreeSpaceMapVacuumRange()
-				 * vacuums the portion of the freespace map covering heap
-				 * pages from start to end - 1. Include the block we just
-				 * vacuumed by passing it blkno + 1. Overflow isn't an issue
-				 * because MaxBlockNumber + 1 is InvalidBlockNumber which
-				 * causes FreeSpaceMapVacuumRange() to vacuum freespace map
-				 * pages covering the remainder of the relation.
-				 */
-				if (blkno - next_fsm_block_to_vacuum >= VACUUM_FSM_EVERY_PAGES)
-				{
-					FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum,
-											blkno + 1);
-					next_fsm_block_to_vacuum = blkno + 1;
-				}
-
-				continue;
-			}
-
-			/*
-			 * There was no call to lazy_vacuum_heap_page() because pruning
-			 * didn't encounter/create any LP_DEAD items that needed to be
-			 * vacuumed.  Prune state has not been invalidated, so proceed
-			 * with prunestate-driven visibility map and FSM steps (just like
-			 * the two-pass strategy).
-			 */
-			Assert(dead_items->num_items == 0);
-		}
-
 		/*
 		 * Handle setting visibility map bit based on information from the VM
 		 * (as of last lazy_scan_skip() call), and from prunestate
@@ -1209,38 +1146,44 @@ lazy_scan_heap(LVRelState *vacrel)
 
 		/*
 		 * Final steps for block: drop cleanup lock, record free space in the
-		 * FSM
+		 * FSM.
+		 *
+		 * If we will likely do index vacuuming, wait until
+		 * lazy_vacuum_heap_rel() to save free space. This doesn't just save
+		 * us some cycles; it also allows us to record any additional free
+		 * space that lazy_vacuum_heap_page() will make available in cases
+		 * where it's possible to truncate the page's line pointer array.
+		 *
+		 * Note: It's not in fact 100% certain that we really will call
+		 * lazy_vacuum_heap_rel() -- lazy_vacuum() might yet opt to skip index
+		 * vacuuming (and so must skip heap vacuuming).  This is deemed okay
+		 * because it only happens in emergencies, or when there is very
+		 * little free space anyway. (Besides, we start recording free space
+		 * in the FSM once index vacuuming has been abandoned.)
 		 */
-		if (prunestate.has_lpdead_items && vacrel->do_index_vacuuming)
-		{
-			/*
-			 * Wait until lazy_vacuum_heap_rel() to save free space.  This
-			 * doesn't just save us some cycles; it also allows us to record
-			 * any additional free space that lazy_vacuum_heap_page() will
-			 * make available in cases where it's possible to truncate the
-			 * page's line pointer array.
-			 *
-			 * Note: It's not in fact 100% certain that we really will call
-			 * lazy_vacuum_heap_rel() -- lazy_vacuum() might yet opt to skip
-			 * index vacuuming (and so must skip heap vacuuming).  This is
-			 * deemed okay because it only happens in emergencies, or when
-			 * there is very little free space anyway. (Besides, we start
-			 * recording free space in the FSM once index vacuuming has been
-			 * abandoned.)
-			 *
-			 * Note: The one-pass (no indexes) case is only supposed to make
-			 * it this far when there were no LP_DEAD items during pruning.
-			 */
-			Assert(vacrel->nindexes > 0);
-			UnlockReleaseBuffer(buf);
-		}
-		else
+		if (vacrel->nindexes == 0
+			|| !vacrel->do_index_vacuuming
+			|| !prunestate.has_lpdead_items)
 		{
 			Size		freespace = PageGetHeapFreeSpace(page);
 
 			UnlockReleaseBuffer(buf);
 			RecordPageWithFreeSpace(vacrel->rel, blkno, freespace);
+
+			/*
+			 * Periodically perform FSM vacuuming to make newly-freed space
+			 * visible on upper FSM pages. This is done after vacuuming if the
+			 * table has indexes.
+			 */
+			if (blkno - next_fsm_block_to_vacuum >= VACUUM_FSM_EVERY_PAGES)
+			{
+				FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum,
+										blkno);
+				next_fsm_block_to_vacuum = blkno;
+			}
 		}
+		else
+			UnlockReleaseBuffer(buf);
 	}
 
 	vacrel->blkno = InvalidBlockNumber;
@@ -1596,8 +1539,13 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * in presult.ndeleted. It should not be confused with lpdead_items;
 	 * lpdead_items's final value can be thought of as the number of tuples
 	 * that were deleted from indexes.
+	 *
+	 * If the relation has no indexes, we can immediately mark would-be dead
+	 * items LP_UNUSED, so mark_unused_now should be true if no indexes and
+	 * false otherwise.
 	 */
-	heap_page_prune(rel, buf, vacrel->vistest, &presult, &vacrel->offnum);
+	heap_page_prune(rel, buf, vacrel->vistest, vacrel->nindexes == 0,
+					&presult, &vacrel->offnum);
 
 	/*
 	 * Now scan the page to collect LP_DEAD items and check for tuples
@@ -2520,7 +2468,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	bool		all_frozen;
 	LVSavedErrInfo saved_err_info;
 
-	Assert(vacrel->nindexes == 0 || vacrel->do_index_vacuuming);
+	Assert(vacrel->do_index_vacuuming);
 
 	pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_VACUUMED, blkno);
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 932ec0d6f2..4b133f6859 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -320,6 +320,7 @@ struct GlobalVisState;
 extern void heap_page_prune_opt(Relation relation, Buffer buffer);
 extern void heap_page_prune(Relation relation, Buffer buffer,
 							struct GlobalVisState *vistest,
+							bool mark_unused_now,
 							PruneResult *presult,
 							OffsetNumber *off_loc);
 extern void heap_page_prune_execute(Buffer buffer,
-- 
2.37.2

v9-0003-Inline-LVPagePruneState-members-in-lazy_scan_prun.patch (text/x-patch)
From 7f126b1e76c975179960c3f60300f73b087497b9 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 9 Jan 2024 17:17:01 -0500
Subject: [PATCH v9 3/4] Inline LVPagePruneState members in lazy_scan_prune()

After moving the code to update the visibility map from lazy_scan_heap()
into lazy_scan_prune(), the LVPagePruneState is mainly obsolete. Replace
all instances of use of its members in lazy_scan_prune() with local
variables and remove the struct.

Only has_lpdead_items remains in use.
---
 src/backend/access/heap/vacuumlazy.c | 131 ++++++++++++++-------------
 src/tools/pgindent/typedefs.list     |   1 -
 2 files changed, 68 insertions(+), 64 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 6535d8bdee..610740c44e 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -212,23 +212,6 @@ typedef struct LVRelState
 	int64		missed_dead_tuples; /* # removable, but not removed */
 } LVRelState;
 
-/*
- * State returned by lazy_scan_prune()
- */
-typedef struct LVPagePruneState
-{
-	bool		has_lpdead_items;	/* includes existing LP_DEAD items */
-
-	/*
-	 * State describes the proper VM bit states to set for the page following
-	 * pruning and freezing.  all_visible implies !has_lpdead_items, but don't
-	 * trust all_frozen result unless all_visible is also set to true.
-	 */
-	bool		all_visible;	/* Every item visible to all? */
-	bool		all_frozen;		/* provided all_visible is also true */
-	TransactionId visibility_cutoff_xid;	/* For recovery conflicts */
-} LVPagePruneState;
-
 /* Struct for saving and restoring vacuum error information. */
 typedef struct LVSavedErrInfo
 {
@@ -250,7 +233,7 @@ static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 static void lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
 							Buffer vmbuffer, bool all_visible_according_to_vm,
-							LVPagePruneState *prunestate);
+							bool *has_lpdead_items);
 static bool lazy_scan_noprune(LVRelState *vacrel, Buffer buf,
 							  BlockNumber blkno, Page page,
 							  bool *has_lpdead_items);
@@ -828,6 +811,7 @@ lazy_scan_heap(LVRelState *vacrel)
 				blkno,
 				next_unskippable_block,
 				next_fsm_block_to_vacuum = 0;
+	bool		has_lpdead_items;
 	VacDeadItems *dead_items = vacrel->dead_items;
 	Buffer		vmbuffer = InvalidBuffer;
 	bool		next_unskippable_allvis,
@@ -854,7 +838,6 @@ lazy_scan_heap(LVRelState *vacrel)
 		Buffer		buf;
 		Page		page;
 		bool		all_visible_according_to_vm;
-		LVPagePruneState prunestate;
 
 		if (blkno == next_unskippable_block)
 		{
@@ -959,8 +942,6 @@ lazy_scan_heap(LVRelState *vacrel)
 		page = BufferGetPage(buf);
 		if (!ConditionalLockBufferForCleanup(buf))
 		{
-			bool		has_lpdead_items;
-
 			LockBuffer(buf, BUFFER_LOCK_SHARE);
 
 			/* Check for new or empty pages before lazy_scan_noprune call */
@@ -1035,7 +1016,7 @@ lazy_scan_heap(LVRelState *vacrel)
 		 */
 		lazy_scan_prune(vacrel, buf, blkno, page,
 						vmbuffer, all_visible_according_to_vm,
-						&prunestate);
+						&has_lpdead_items);
 
 		/*
 		 * Final steps for block: drop cleanup lock, record free space in the
@@ -1056,7 +1037,7 @@ lazy_scan_heap(LVRelState *vacrel)
 		 */
 		if (vacrel->nindexes == 0
 			|| !vacrel->do_index_vacuuming
-			|| !prunestate.has_lpdead_items)
+			|| !has_lpdead_items)
 		{
 			Size		freespace = PageGetHeapFreeSpace(page);
 
@@ -1382,6 +1363,14 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
  * right after this operation completes instead of in the middle of it. Note that
  * any tuple that becomes dead after the call to heap_page_prune() can't need to
  * be frozen, because it was visible to another session when vacuum started.
+ *
+ * vmbuffer is the buffer containing the VM block with visibility information
+ * for the heap block, blkno. all_visible_according_to_vm is the saved
+ * visibility status of the heap block looked up earlier by the caller. We
+ * won't rely entirely on this status, as it may be out of date.
+ *
+ * *has_lpdead_items is set to true or false depending on whether, upon return
+ * from this function, any LP_DEAD items are still present on the page.
  */
 static void
 lazy_scan_prune(LVRelState *vacrel,
@@ -1390,7 +1379,7 @@ lazy_scan_prune(LVRelState *vacrel,
 				Page page,
 				Buffer vmbuffer,
 				bool all_visible_according_to_vm,
-				LVPagePruneState *prunestate)
+				bool *has_lpdead_items)
 {
 	Relation	rel = vacrel->rel;
 	OffsetNumber offnum,
@@ -1403,6 +1392,9 @@ lazy_scan_prune(LVRelState *vacrel,
 				recently_dead_tuples;
 	HeapPageFreeze pagefrz;
 	bool		hastup = false;
+	bool		all_visible,
+				all_frozen;
+	TransactionId visibility_cutoff_xid;
 	int64		fpi_before = pgWalUsage.wal_fpi;
 	OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
 	HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
@@ -1444,12 +1436,21 @@ lazy_scan_prune(LVRelState *vacrel,
 
 	/*
 	 * Now scan the page to collect LP_DEAD items and check for tuples
-	 * requiring freezing among remaining tuples with storage
+	 * requiring freezing among remaining tuples with storage. Keep track of
+	 * the visibility cutoff xid for recovery conflicts.
 	 */
-	prunestate->has_lpdead_items = false;
-	prunestate->all_visible = true;
-	prunestate->all_frozen = true;
-	prunestate->visibility_cutoff_xid = InvalidTransactionId;
+	*has_lpdead_items = false;
+	visibility_cutoff_xid = InvalidTransactionId;
+
+	/*
+	 * We will update the VM after collecting LP_DEAD items and freezing
+	 * tuples. Keep track of whether or not the page is all_visible and
+	 * all_frozen and use this information to update the VM. all_visible
+	 * implies 0 lpdead_items, but don't trust all_frozen result unless
+	 * all_visible is also set to true.
+	 */
+	all_visible = true;
+	all_frozen = true;
 
 	for (offnum = FirstOffsetNumber;
 		 offnum <= maxoff;
@@ -1537,13 +1538,13 @@ lazy_scan_prune(LVRelState *vacrel,
 				 * asynchronously. See SetHintBits for more info. Check that
 				 * the tuple is hinted xmin-committed because of that.
 				 */
-				if (prunestate->all_visible)
+				if (all_visible)
 				{
 					TransactionId xmin;
 
 					if (!HeapTupleHeaderXminCommitted(htup))
 					{
-						prunestate->all_visible = false;
+						all_visible = false;
 						break;
 					}
 
@@ -1555,14 +1556,14 @@ lazy_scan_prune(LVRelState *vacrel,
 					if (!TransactionIdPrecedes(xmin,
 											   vacrel->cutoffs.OldestXmin))
 					{
-						prunestate->all_visible = false;
+						all_visible = false;
 						break;
 					}
 
 					/* Track newest xmin on page. */
-					if (TransactionIdFollows(xmin, prunestate->visibility_cutoff_xid) &&
+					if (TransactionIdFollows(xmin, visibility_cutoff_xid) &&
 						TransactionIdIsNormal(xmin))
-						prunestate->visibility_cutoff_xid = xmin;
+						visibility_cutoff_xid = xmin;
 				}
 				break;
 			case HEAPTUPLE_RECENTLY_DEAD:
@@ -1573,7 +1574,7 @@ lazy_scan_prune(LVRelState *vacrel,
 				 * pruning.)
 				 */
 				recently_dead_tuples++;
-				prunestate->all_visible = false;
+				all_visible = false;
 				break;
 			case HEAPTUPLE_INSERT_IN_PROGRESS:
 
@@ -1584,11 +1585,11 @@ lazy_scan_prune(LVRelState *vacrel,
 				 * results.  This assumption is a bit shaky, but it is what
 				 * acquire_sample_rows() does, so be consistent.
 				 */
-				prunestate->all_visible = false;
+				all_visible = false;
 				break;
 			case HEAPTUPLE_DELETE_IN_PROGRESS:
 				/* This is an expected case during concurrent vacuum */
-				prunestate->all_visible = false;
+				all_visible = false;
 
 				/*
 				 * Count such rows as live.  As above, we assume the deleting
@@ -1618,7 +1619,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		 * definitely cannot be set all-frozen in the visibility map later on
 		 */
 		if (!totally_frozen)
-			prunestate->all_frozen = false;
+			all_frozen = false;
 	}
 
 	/*
@@ -1636,7 +1637,7 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * page all-frozen afterwards (might not happen until final heap pass).
 	 */
 	if (pagefrz.freeze_required || tuples_frozen == 0 ||
-		(prunestate->all_visible && prunestate->all_frozen &&
+		(all_visible && all_frozen &&
 		 fpi_before != pgWalUsage.wal_fpi))
 	{
 		/*
@@ -1674,11 +1675,11 @@ lazy_scan_prune(LVRelState *vacrel,
 			 * once we're done with it.  Otherwise we generate a conservative
 			 * cutoff by stepping back from OldestXmin.
 			 */
-			if (prunestate->all_visible && prunestate->all_frozen)
+			if (all_visible && all_frozen)
 			{
 				/* Using same cutoff when setting VM is now unnecessary */
-				snapshotConflictHorizon = prunestate->visibility_cutoff_xid;
-				prunestate->visibility_cutoff_xid = InvalidTransactionId;
+				snapshotConflictHorizon = visibility_cutoff_xid;
+				visibility_cutoff_xid = InvalidTransactionId;
 			}
 			else
 			{
@@ -1701,7 +1702,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		 */
 		vacrel->NewRelfrozenXid = pagefrz.NoFreezePageRelfrozenXid;
 		vacrel->NewRelminMxid = pagefrz.NoFreezePageRelminMxid;
-		prunestate->all_frozen = false;
+		all_frozen = false;
 		tuples_frozen = 0;		/* avoid miscounts in instrumentation */
 	}
 
@@ -1714,16 +1715,17 @@ lazy_scan_prune(LVRelState *vacrel,
 	 */
 #ifdef USE_ASSERT_CHECKING
 	/* Note that all_frozen value does not matter when !all_visible */
-	if (prunestate->all_visible && lpdead_items == 0)
+	if (all_visible && lpdead_items == 0)
 	{
-		TransactionId cutoff;
-		bool		all_frozen;
+		TransactionId debug_cutoff;
+		bool		debug_all_frozen;
 
-		if (!heap_page_is_all_visible(vacrel, buf, &cutoff, &all_frozen))
+		if (!heap_page_is_all_visible(vacrel, buf,
+									  &debug_cutoff, &debug_all_frozen))
 			Assert(false);
 
-		Assert(!TransactionIdIsValid(cutoff) ||
-			   cutoff == prunestate->visibility_cutoff_xid);
+		Assert(!TransactionIdIsValid(debug_cutoff) ||
+			   debug_cutoff == visibility_cutoff_xid);
 	}
 #endif
 
@@ -1736,7 +1738,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		ItemPointerData tmp;
 
 		vacrel->lpdead_item_pages++;
-		prunestate->has_lpdead_items = true;
 
 		ItemPointerSetBlockNumber(&tmp, blkno);
 
@@ -1761,7 +1762,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		 * Now that freezing has been finalized, unset all_visible.  It needs
 		 * to reflect the present state of things, as expected by our caller.
 		 */
-		prunestate->all_visible = false;
+		all_visible = false;
 	}
 
 	/* Finally, add page-local counts to whole-VACUUM counts */
@@ -1775,19 +1776,23 @@ lazy_scan_prune(LVRelState *vacrel,
 	if (hastup)
 		vacrel->nonempty_pages = blkno + 1;
 
-	Assert(!prunestate->all_visible || !prunestate->has_lpdead_items);
+	/* Did we find LP_DEAD items? */
+	*has_lpdead_items = (lpdead_items > 0);
+
+	Assert(!all_visible || !(*has_lpdead_items));
 
 	/*
 	 * Handle setting visibility map bit based on information from the VM (as
-	 * of last lazy_scan_skip() call), and from prunestate
+	 * of last lazy_scan_skip() call), and from all_visible and all_frozen
+	 * variables
 	 */
-	if (!all_visible_according_to_vm && prunestate->all_visible)
+	if (!all_visible_according_to_vm && all_visible)
 	{
 		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
 
-		if (prunestate->all_frozen)
+		if (all_frozen)
 		{
-			Assert(!TransactionIdIsValid(prunestate->visibility_cutoff_xid));
+			Assert(!TransactionIdIsValid(visibility_cutoff_xid));
 			flags |= VISIBILITYMAP_ALL_FROZEN;
 		}
 
@@ -1807,7 +1812,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		PageSetAllVisible(page);
 		MarkBufferDirty(buf);
 		visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
-						  vmbuffer, prunestate->visibility_cutoff_xid,
+						  vmbuffer, visibility_cutoff_xid,
 						  flags);
 	}
 
@@ -1840,7 +1845,7 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
 	 * however.
 	 */
-	else if (prunestate->has_lpdead_items && PageIsAllVisible(page))
+	else if (lpdead_items > 0 && PageIsAllVisible(page))
 	{
 		elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
 			 vacrel->relname, blkno);
@@ -1853,16 +1858,16 @@ lazy_scan_prune(LVRelState *vacrel,
 	/*
 	 * If the all-visible page is all-frozen but not marked as such yet, mark
 	 * it as all-frozen.  Note that all_frozen is only valid if all_visible is
-	 * true, so we must check both prunestate fields.
+	 * true, so we must check both all_visible and all_frozen.
 	 */
-	else if (all_visible_according_to_vm && prunestate->all_visible &&
-			 prunestate->all_frozen &&
+	else if (all_visible_according_to_vm && all_visible &&
+			 all_frozen &&
 			 !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
 	{
 		/*
 		 * Avoid relying on all_visible_according_to_vm as a proxy for the
 		 * page-level PD_ALL_VISIBLE bit being set, since it might have become
-		 * stale -- even when all_visible is set in prunestate
+		 * stale -- even when all_visible is set
 		 */
 		if (!PageIsAllVisible(page))
 		{
@@ -1877,7 +1882,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		 * since a snapshotConflictHorizon sufficient to make everything safe
 		 * for REDO was logged when the page's tuples were frozen.
 		 */
-		Assert(!TransactionIdIsValid(prunestate->visibility_cutoff_xid));
+		Assert(!TransactionIdIsValid(visibility_cutoff_xid));
 		visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
 						  vmbuffer, InvalidTransactionId,
 						  VISIBILITYMAP_ALL_VISIBLE |
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 29fd1cae64..17c0104376 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1405,7 +1405,6 @@ LPVOID
 LPWSTR
 LSEG
 LUID
-LVPagePruneState
 LVRelState
 LVSavedErrInfo
 LWLock
-- 
2.37.2

v9-0004-Combine-FSM-updates-for-prune-and-no-prune-cases.patch (text/x-patch)
From 11a846cd54941f6f2110868ba97e252e251fde4b Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 16 Jan 2024 17:28:07 -0500
Subject: [PATCH v9 4/4] Combine FSM updates for prune and no prune cases

The goal is to update the FSM once for each page vacuumed. The logic for
ensuring this is the same for lazy_scan_prune() and lazy_scan_noprune().
Combine those FSM updates. While doing this, de-indent the logic for
calling lazy_scan_noprune() and the first invocation of
lazy_scan_new_or_empty(). With these changes, it is easier to see that
lazy_scan_new_or_empty() is called once before invoking
lazy_scan_noprune() and lazy_scan_prune().

There is one functional difference. Now, the FSM may be vacuumed
whenever we update the FSM in lazy_scan_heap(). Previously, it was not
considered if lazy_scan_noprune() returned true or if we called
lazy_scan_prune() for a block of a relation with indexes (or a relation
with no indexes but which did not have any dead items after pruning).
---
 src/backend/access/heap/vacuumlazy.c | 84 ++++++++++------------------
 1 file changed, 30 insertions(+), 54 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 610740c44e..93fb33ef81 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -838,6 +838,7 @@ lazy_scan_heap(LVRelState *vacrel)
 		Buffer		buf;
 		Page		page;
 		bool		all_visible_according_to_vm;
+		bool		got_cleanup_lock = false;
 
 		if (blkno == next_unskippable_block)
 		{
@@ -940,54 +941,28 @@ lazy_scan_heap(LVRelState *vacrel)
 		buf = ReadBufferExtended(vacrel->rel, MAIN_FORKNUM, blkno, RBM_NORMAL,
 								 vacrel->bstrategy);
 		page = BufferGetPage(buf);
-		if (!ConditionalLockBufferForCleanup(buf))
-		{
-			LockBuffer(buf, BUFFER_LOCK_SHARE);
 
-			/* Check for new or empty pages before lazy_scan_noprune call */
-			if (lazy_scan_new_or_empty(vacrel, buf, blkno, page, true,
-									   vmbuffer))
-			{
-				/* Processed as new/empty page (lock and pin released) */
-				continue;
-			}
+		got_cleanup_lock = ConditionalLockBufferForCleanup(buf);
 
-			/*
-			 * Collect LP_DEAD items in dead_items array, count tuples,
-			 * determine if rel truncation is safe
-			 */
-			if (lazy_scan_noprune(vacrel, buf, blkno, page, &has_lpdead_items))
-			{
-				Size		freespace = 0;
-				bool		recordfreespace;
+		if (!got_cleanup_lock)
+			LockBuffer(buf, BUFFER_LOCK_SHARE);
 
-				/*
-				 * We processed the page successfully (without a cleanup
-				 * lock).
-				 *
-				 * Update the FSM, just as we would in the case where
-				 * lazy_scan_prune() is called. Our goal is to update the
-				 * freespace map the last time we touch the page. If the
-				 * relation has no indexes, or if index vacuuming is disabled,
-				 * there will be no second heap pass; if this particular page
-				 * has no dead items, the second heap pass will not touch this
-				 * page. So, in those cases, update the FSM now.
-				 *
-				 * After a call to lazy_scan_prune(), we would also try to
-				 * adjust the page-level all-visible bit and the visibility
-				 * map, but we skip that step in this path.
-				 */
-				recordfreespace = vacrel->nindexes == 0
-					|| !vacrel->do_index_vacuuming
-					|| !has_lpdead_items;
-				if (recordfreespace)
-					freespace = PageGetHeapFreeSpace(page);
-				UnlockReleaseBuffer(buf);
-				if (recordfreespace)
-					RecordPageWithFreeSpace(vacrel->rel, blkno, freespace);
-				continue;
-			}
+		/* Check for new or empty pages before lazy_scan_[no]prune call */
+		if (lazy_scan_new_or_empty(vacrel, buf, blkno, page, !got_cleanup_lock,
+								   vmbuffer))
+		{
+			/* Processed as new/empty page (lock and pin released) */
+			continue;
+		}
 
+		/*
+		 * Collect LP_DEAD items in dead_items array, count tuples, determine
+		 * if rel truncation is safe. If lazy_scan_noprune() returns false, we
+		 * must get a cleanup lock and call lazy_scan_prune().
+		 */
+		if (!got_cleanup_lock &&
+			!lazy_scan_noprune(vacrel, buf, blkno, page, &has_lpdead_items))
+		{
 			/*
 			 * lazy_scan_noprune could not do all required processing.  Wait
 			 * for a cleanup lock, and call lazy_scan_prune in the usual way.
@@ -995,13 +970,7 @@ lazy_scan_heap(LVRelState *vacrel)
 			Assert(vacrel->aggressive);
 			LockBuffer(buf, BUFFER_LOCK_UNLOCK);
 			LockBufferForCleanup(buf);
-		}
-
-		/* Check for new or empty pages before lazy_scan_prune call */
-		if (lazy_scan_new_or_empty(vacrel, buf, blkno, page, false, vmbuffer))
-		{
-			/* Processed as new/empty page (lock and pin released) */
-			continue;
+			got_cleanup_lock = true;
 		}
 
 		/*
@@ -1014,9 +983,10 @@ lazy_scan_heap(LVRelState *vacrel)
 		 * tuple headers of remaining items with storage. It also determines
 		 * if truncating this block is safe.
 		 */
-		lazy_scan_prune(vacrel, buf, blkno, page,
-						vmbuffer, all_visible_according_to_vm,
-						&has_lpdead_items);
+		if (got_cleanup_lock)
+			lazy_scan_prune(vacrel, buf, blkno, page,
+							vmbuffer, all_visible_according_to_vm,
+							&has_lpdead_items);
 
 		/*
 		 * Final steps for block: drop cleanup lock, record free space in the
@@ -1028,6 +998,12 @@ lazy_scan_heap(LVRelState *vacrel)
 		 * space that lazy_vacuum_heap_page() will make available in cases
 		 * where it's possible to truncate the page's line pointer array.
 		 *
+		 * Our goal is to update the freespace map the last time we touch the
+		 * page. If the relation has no indexes, or if index vacuuming is
+		 * disabled, there will be no second heap pass; if this particular
+		 * page has no dead items, the second heap pass will not touch this
+		 * page. So, in those cases, update the FSM now.
+		 *
 		 * Note: It's not in fact 100% certain that we really will call
 		 * lazy_vacuum_heap_rel() -- lazy_vacuum() might yet opt to skip index
 		 * vacuuming (and so must skip heap vacuuming).  This is deemed okay
-- 
2.37.2

#73Robert Haas
robertmhaas@gmail.com
In reply to: Melanie Plageman (#72)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Wed, Jan 17, 2024 at 3:12 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

Reviewing 0001, consider the case where a table has no indexes.
Pre-patch, PageTruncateLinePointerArray() will happen when
lazy_vacuum_heap_page() is called; post-patch, it will not occur.
That's a loss. Should we care? On the plus side, visibility map
repair, if required, can now take place. That's a gain.

I thought that this wasn't an issue because heap_page_prune_execute()
calls PageRepairFragmentation() which similarly modifies pd_lower and
sets the hint bit about free line pointers.

Ah, OK, I didn't understand that PageRepairFragmentation() does what
is also done by PageTruncateLinePointerArray().

Yes, I also spent some time thinking about this. In master, we do
always call lazy_scan_new_or_empty() before calling
lazy_scan_noprune(). The code is aiming to ensure we call
lazy_scan_new_or_empty() once before calling either of
lazy_scan_noprune() or lazy_scan_prune(). I think it didn't call
lazy_scan_new_or_empty() unconditionally first because of the
different lock types expected. But, your structure has solved that.
I've used a version of your example code above in attached v9. It is
much nicer.

Oh, OK, I see it now. I missed that lazy_scan_new_or_empty() was
called either way. Glad that my proposed restructuring managed to be
helpful despite that confusion, though. :-)

At a quick glance, I also like the way this looks. I'll review it more
thoroughly later. Does this patch require 0002 and 0003 or could it
equally well go first? I confess that I don't entirely understand why
we want 0002 and 0003.

Ah, I realize I was not clear. I am now talking about inconsistencies
in vacuuming the FSM itself. FreeSpaceMapVacuumRange(). Not updating
the freespace map during the course of vacuuming the heap relation.

Fair enough, but I'm still not quite sure exactly what the question
is. It looks to me like the current code, when there are indexes,
vacuums the FSM after each round of index vacuuming. When there are no
indexes, doing it after each round of index vacuuming would mean never
doing it, so instead we vacuum the FSM every ~8GB. I assume what
happened here is that somebody decided doing it after each round of
index vacuuming was the "right thing," and then realized that was not
going to work if no index vacuuming was happening, and so inserted the
8GB threshold to cover that case.

I don't really know what to make of all of this. On a happy PostgreSQL
system, doing anything after each round of index vacuuming means doing
it once, because multiple rounds of index vacuuming are extremely
expensive and we hope that it won't ever occur. From that point of
view, the 8GB threshold is better, because it means that when we
vacuum a large relation, space should become visible to the rest of
the system incrementally without needing to wait for the entire vacuum
to finish.

On the other hand, we also have this idea that we want to record free
space in the FSM once, after the last time we touch the page. Given
that behavior, vacuuming the FSM every 8GB when we haven't yet done
index vacuuming wouldn't accomplish much of anything, because we
haven't updated it for the pages we just touched.

On the third hand, the current behavior seems slightly ridiculous,
because pruning the page is where we're mostly going to free up space,
so we might be better off just updating the FSM then instead of
waiting. That free space could be mighty useful during the long wait
between pruning and marking line pointers unused.

On the fourth hand, that seems like a significant behavior change that
we might not want to undertake without a bunch of research that we
might not want to do right now -- and if we did do it, should we then
update the FSM a second time after marking line pointers unused?

I'm not sure if any of this is answering your actual question, though.
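
For reference, the two existing cadences look roughly like this
(paraphrased from master's lazy_scan_heap(); not an exact hunk):

/*
 * With indexes: when the dead_items array is nearly full, do a round of
 * index vacuuming plus a second heap pass, then vacuum the FSM for the
 * blocks processed since the last round.
 */
if (dead_items->max_items - dead_items->num_items < MaxHeapTuplesPerPage)
{
    lazy_vacuum(vacrel);
    FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum, blkno);
    next_fsm_block_to_vacuum = blkno;
}

/*
 * With no indexes: there is no round of index vacuuming, so (as in the
 * hunk removed by v9-0001) the FSM is instead vacuumed every
 * VACUUM_FSM_EVERY_PAGES blocks, i.e. roughly every 8GB of heap at the
 * default 8kB block size.
 */
if (blkno - next_fsm_block_to_vacuum >= VACUUM_FSM_EVERY_PAGES)
{
    FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum, blkno + 1);
    next_fsm_block_to_vacuum = blkno + 1;
}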

--
Robert Haas
EDB: http://www.enterprisedb.com

#74Peter Geoghegan
pg@bowt.ie
In reply to: Robert Haas (#73)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Wed, Jan 17, 2024 at 3:58 PM Robert Haas <robertmhaas@gmail.com> wrote:

Ah, I realize I was not clear. I am now talking about inconsistencies
in vacuuming the FSM itself. FreeSpaceMapVacuumRange(). Not updating
the freespace map during the course of vacuuming the heap relation.

Fair enough, but I'm still not quite sure exactly what the question
is. It looks to me like the current code, when there are indexes,
vacuums the FSM after each round of index vacuuming. When there are no
indexes, doing it after each round of index vacuuming would mean never
doing it, so instead we vacuum the FSM every ~8GB. I assume what
happened here is that somebody decided doing it after each round of
index vacuuming was the "right thing," and then realized that was not
going to work if no index vacuuming was happening, and so inserted the
8GB threshold to cover that case.

Note that VACUUM_FSM_EVERY_PAGES is applied against the number of
rel_pages "processed" so far -- *including* any pages that were
skipped using the visibility map. It would make a bit more sense if it
was applied against scanned_pages instead (just like
FAILSAFE_EVERY_PAGES has been since commit 07eef53955). In other
words, VACUUM_FSM_EVERY_PAGES is applied against a thing that has only
a very loose relationship with physical work performed/time elapsed.
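
A cadence tied to actual work done would instead have to track the
scanned_pages value at which the FSM was last vacuumed. Just a sketch;
fsm_vacuumed_at_scanned_pages is a made-up counter that no posted patch
has:

if (vacrel->scanned_pages - fsm_vacuumed_at_scanned_pages >= VACUUM_FSM_EVERY_PAGES)
{
    FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum, blkno + 1);
    next_fsm_block_to_vacuum = blkno + 1;
    fsm_vacuumed_at_scanned_pages = vacrel->scanned_pages;
}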

I tend to suspect that VACUUM_FSM_EVERY_PAGES is fundamentally the
wrong idea. If it's such a good idea then why not apply it all the
time? That is, why not apply it independently of whether nindexes==0
in the current VACUUM operation? (You know, just like with
FAILSAFE_EVERY_PAGES.)

--
Peter Geoghegan

#75Peter Geoghegan
pg@bowt.ie
In reply to: Peter Geoghegan (#74)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Wed, Jan 17, 2024 at 4:25 PM Peter Geoghegan <pg@bowt.ie> wrote:

I tend to suspect that VACUUM_FSM_EVERY_PAGES is fundamentally the
wrong idea. If it's such a good idea then why not apply it all the
time? That is, why not apply it independently of whether nindexes==0
in the current VACUUM operation? (You know, just like with
FAILSAFE_EVERY_PAGES.)

Actually, I suppose that we couldn't apply it independently of
nindexes==0. Then we'd call FreeSpaceMapVacuumRange() before our
second pass over the heap takes place for those LP_DEAD-containing
heap pages scanned since the last round of index/heap vacuuming took
place (or since VACUUM began). We need to make sure that the FSM has
the most recent possible information known to VACUUM, which would
break if we applied VACUUM_FSM_EVERY_PAGES rules when nindexes > 0.

Even still, the design of VACUUM_FSM_EVERY_PAGES seems questionable to me.

--
Peter Geoghegan

#76Melanie Plageman
melanieplageman@gmail.com
In reply to: Robert Haas (#73)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Wed, Jan 17, 2024 at 3:58 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Jan 17, 2024 at 3:12 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

Yes, I also spent some time thinking about this. In master, we do
always call lazy_scan_new_or_empty() before calling
lazy_scan_noprune(). The code is aiming to ensure we call
lazy_scan_new_or_empty() once before calling either of
lazy_scan_noprune() or lazy_scan_prune(). I think it didn't call
lazy_scan_new_or_empty() unconditionally first because of the
different lock types expected. But, your structure has solved that.
I've used a version of your example code above in attached v9. It is
much nicer.

Oh, OK, I see it now. I missed that lazy_scan_new_or_empty() was
called either way. Glad that my proposed restructuring managed to be
helpful despite that confusion, though. :-)

At a quick glance, I also like the way this looks. I'll review it more
thoroughly later. Does this patch require 0002 and 0003 or could it
equally well go first? I confess that I don't entirely understand why
we want 0002 and 0003.

Well, 0002 and 0003 move the updates to the visibility map into
lazy_scan_prune(). We only want to update the VM if we called
lazy_scan_prune() (i.e. not if lazy_scan_noprune() returned true). We
also need the lock on the heap page when updating the visibility map,
but we want to have released the lock before updating the FSM, so we
need to update the VM first and then the FSM.

The VM update code, in my opinion, belongs in lazy_scan_prune() --
since it is only done when lazy_scan_prune() is called. To keep the VM
update code in lazy_scan_heap() and still consolidate the FSM update
code, we would have to surround all of the VM update code in a test
(if got_cleanup_lock, I suppose). I don't see any advantage in doing
that.
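
To make the intended ordering concrete, the per-block flow in
lazy_scan_heap() after 0002/0003 looks roughly like this (condensed
sketch, not the exact code):

    lazy_scan_prune(vacrel, buf, blkno, page, vmbuffer,
                    all_visible_according_to_vm, &has_lpdead_items);
    /* VM bits are set inside lazy_scan_prune(), buffer lock still held */

    if (vacrel->nindexes == 0 || !vacrel->do_index_vacuuming ||
        !has_lpdead_items)
    {
        Size        freespace = PageGetHeapFreeSpace(page);

        UnlockReleaseBuffer(buf);
        /* FSM is only updated once the lock and pin are released */
        RecordPageWithFreeSpace(vacrel->rel, blkno, freespace);
    }
    else
        UnlockReleaseBuffer(buf);   /* second heap pass will record it */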

Ah, I realize I was not clear. I am now talking about inconsistencies
in vacuuming the FSM itself. FreeSpaceMapVacuumRange(). Not updating
the freespace map during the course of vacuuming the heap relation.

Fair enough, but I'm still not quite sure exactly what the question
is. It looks to me like the current code, when there are indexes,
vacuums the FSM after each round of index vacuuming. When there are no
indexes, doing it after each round of index vacuuming would mean never
doing it, so instead we vacuum the FSM every ~8GB. I assume what
happened here is that somebody decided doing it after each round of
index vacuuming was the "right thing," and then realized that was not
going to work if no index vacuuming was happening, and so inserted the
8GB threshold to cover that case.

Ah, I see. I understood that we want to update the FSM every 8GB, but
I didn't understand that we wanted to check if we were at that 8GB
only after a round of index vacuuming. That would explain why we also
had to do it in the no indexes case -- because, as you say, there
wouldn't be a round of index vacuuming.

This does mean that something is not quite right with 0001 as well as
0004. We'd end up checking if we are at 8GB much more often. I should
probably find a way to replicate the cadence on master.

I don't really know what to make of
all of this. On a happy PostgreSQL system, doing anything after each
round of index vacuuming means doing it once, because multiple rounds
of indexing vacuum are extremely expensive and we hope that it won't
ever occur. From that point of view, the 8GB threshold is better,
because it means that when we vacuum a large relation, space should
become visible to the rest of the system incrementally without needing
to wait for the entire vacuum to finish. On the other hand, we also
have this idea that we want to record free space in the FSM once,
after the last time we touch the page. Given that behavior, vacuuming
the FSM every 8GB when we haven't yet done index vacuuming wouldn't
accomplish much of anything, because we haven't updated it for the
pages we just touched. On the third hand, the current behavior seems
slightly ridiculous, because pruning the page is where we're mostly
going to free up space, so we might be better off just updating the
FSM then instead of waiting. That free space could be mighty useful
during the long wait between pruning and marking line pointers unused.
On the fourth hand, that seems like a significant behavior change that
we might not want to undertake without a bunch of research that we
might not want to do right now -- and if we did do it, should we then
update the FSM a second time after marking line pointers unused?

I suspect we'd need to do some testing of various scenarios to justify
such a change.

- Melanie

#77Melanie Plageman
melanieplageman@gmail.com
In reply to: Peter Geoghegan (#74)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Wed, Jan 17, 2024 at 4:25 PM Peter Geoghegan <pg@bowt.ie> wrote:

On Wed, Jan 17, 2024 at 3:58 PM Robert Haas <robertmhaas@gmail.com> wrote:

Ah, I realize I was not clear. I am now talking about inconsistencies
in vacuuming the FSM itself. FreeSpaceMapVacuumRange(). Not updating
the freespace map during the course of vacuuming the heap relation.

Fair enough, but I'm still not quite sure exactly what the question
is. It looks to me like the current code, when there are indexes,
vacuums the FSM after each round of index vacuuming. When there are no
indexes, doing it after each round of index vacuuming would mean never
doing it, so instead we vacuum the FSM every ~8GB. I assume what
happened here is that somebody decided doing it after each round of
index vacuuming was the "right thing," and then realized that was not
going to work if no index vacuuming was happening, and so inserted the
8GB threshold to cover that case.

Note that VACUUM_FSM_EVERY_PAGES is applied against the number of
rel_pages "processed" so far -- *including* any pages that were
skipped using the visibility map. It would make a bit more sense if it
was applied against scanned_pages instead (just like
FAILSAFE_EVERY_PAGES has been since commit 07eef53955). In other
words, VACUUM_FSM_EVERY_PAGES is applied against a thing that has only
a very loose relationship with physical work performed/time elapsed.

This is a good point. Seems like a very reasonable change to make, as
I would think that was the original intent.

- Melanie

#78Melanie Plageman
melanieplageman@gmail.com
In reply to: Peter Geoghegan (#75)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Wed, Jan 17, 2024 at 4:31 PM Peter Geoghegan <pg@bowt.ie> wrote:

On Wed, Jan 17, 2024 at 4:25 PM Peter Geoghegan <pg@bowt.ie> wrote:

I tend to suspect that VACUUM_FSM_EVERY_PAGES is fundamentally the
wrong idea. If it's such a good idea then why not apply it all the
time? That is, why not apply it independently of whether nindexes==0
in the current VACUUM operation? (You know, just like with
FAILSAFE_EVERY_PAGES.)

Actually, I suppose that we couldn't apply it independently of
nindexes==0. Then we'd call FreeSpaceMapVacuumRange() before our
second pass over the heap takes place for those LP_DEAD-containing
heap pages scanned since the last round of index/heap vacuuming took
place (or since VACUUM began). We need to make sure that the FSM has
the most recent possible information known to VACUUM, which would
break if we applied VACUUM_FSM_EVERY_PAGES rules when nindexes > 0.

Even still, the design of VACUUM_FSM_EVERY_PAGES seems questionable to me.

I now see I misunderstood and my earlier email was wrong. I didn't
notice that we only use VACUUM_FSM_EVERY_PAGES if nindexes == 0.
So, in master, we call FreeSpaceMapVacuumRange() always after a round
of index vacuuming and periodically if there are no indexes.

It seems like you are asking whether or not we should vacuum the FSM at a
different cadence for the no indexes case (and potentially count
blocks actually vacuumed instead of blocks considered).

And it seems like Robert is asking whether or not we should
FreeSpaceMapVacuumRange() more frequently than after index vacuuming
in the nindexes > 0 case.

Other than the overhead of the actual vacuuming of the FSM, what are
the potential downsides of knowing about freespace sooner? It could
change what pages are inserted to. What are the possible undesirable
side effects?

- Melanie

#79Peter Geoghegan
pg@bowt.ie
In reply to: Melanie Plageman (#78)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Wed, Jan 17, 2024 at 5:47 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

On Wed, Jan 17, 2024 at 4:31 PM Peter Geoghegan <pg@bowt.ie> wrote:

On Wed, Jan 17, 2024 at 4:25 PM Peter Geoghegan <pg@bowt.ie> wrote:

I tend to suspect that VACUUM_FSM_EVERY_PAGES is fundamentally the
wrong idea. If it's such a good idea then why not apply it all the
time? That is, why not apply it independently of whether nindexes==0
in the current VACUUM operation? (You know, just like with
FAILSAFE_EVERY_PAGES.)

Actually, I suppose that we couldn't apply it independently of
nindexes==0. Then we'd call FreeSpaceMapVacuumRange() before our
second pass over the heap takes place for those LP_DEAD-containing
heap pages scanned since the last round of index/heap vacuuming took
place (or since VACUUM began). We need to make sure that the FSM has
the most recent possible information known to VACUUM, which would
break if we applied VACUUM_FSM_EVERY_PAGES rules when nindexes > 0.

Even still, the design of VACUUM_FSM_EVERY_PAGES seems questionable to me.

I now see I misunderstood and my earlier email was wrong. I didn't
notice that we only use VACUUM_FSM_EVERY_PAGES if nindexes == 0.
So, in master, we call FreeSpaceMapVacuumRange() always after a round
of index vacuuming and periodically if there are no indexes.

The "nindexes == 0" if() that comes just after our call to
lazy_scan_prune() is "the one-pass equivalent of a call to
lazy_vacuum()". Though this includes the call to
FreeSpaceMapVacuumRange() that immediately follows the two-pass case
calling lazy_vacuum(), too.
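
For reference, that block on master is shaped roughly like this
(heavily condensed from the hunk that 0001 removes):

    if (vacrel->nindexes == 0)
    {
        if (prunestate.has_lpdead_items)
        {
            lazy_vacuum_heap_page(vacrel, blkno, buf, 0, vmbuffer);

            /* Forget the LP_DEAD items that we just vacuumed */
            dead_items->num_items = 0;

            /* ... record free space, maybe FreeSpaceMapVacuumRange() ... */
            continue;
        }
    }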

It seems like you are asking whether not we should vacuum the FSM at a
different cadence for the no indexes case (and potentially count
blocks actually vacuumed instead of blocks considered).

And it seems like Robert is asking whether or not we should
FreeSpaceMapVacuumRange() more frequently than after index vacuuming
in the nindexes > 0 case.

There is no particular reason for the nindexes==0 case to care about
how often we'd call FreeSpaceMapVacuumRange() in the counterfactual
world where the same VACUUM ran on the same table, except that it was
nindexes > 0 instead. At least I don't see any.

Other than the overhead of the actual vacuuming of the FSM, what are
the potential downsides of knowing about freespace sooner? It could
change what pages are inserted to. What are the possible undesirable
side effects?

The whole VACUUM_FSM_EVERY_PAGES thing comes from commit 851a26e266.
The commit message of that work seems to suppose that calling
FreeSpaceMapVacuumRange() more frequently is pretty much strictly
better than calling it less frequently, at least up to the point where
certain more-or-less fixed costs paid once per
FreeSpaceMapVacuumRange() start to become a problem. I think that
that's probably about right.

The commit message also says that we "arbitrarily update upper FSM
pages after each 8GB of heap" (in the nindexes==0 case). So
VACUUM_FSM_EVERY_PAGES is only very approximately analogous to what we
do in the nindexes > 0 case. That seems reasonable because these two
cases really aren't so comparable in terms of the FSM vacuuming
requirements -- the nindexes==0 case legitimately doesn't have the
same dependency on heap vacuuming (and index vacuuming) that we have
to consider when nindexes > 0.
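
(If I'm remembering the definition correctly, VACUUM_FSM_EVERY_PAGES
is just 8GB expressed in heap blocks, i.e. something like

    #define VACUUM_FSM_EVERY_PAGES \
        ((BlockNumber) (((uint64) 8 * 1024 * 1024 * 1024) / BLCKSZ))

so "every 8GB" really means "every 8GB worth of blkno advancement".)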

--
Peter Geoghegan

#80Robert Haas
robertmhaas@gmail.com
In reply to: Peter Geoghegan (#75)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Wed, Jan 17, 2024 at 4:31 PM Peter Geoghegan <pg@bowt.ie> wrote:

Actually, I suppose that we couldn't apply it independently of
nindexes==0. Then we'd call FreeSpaceMapVacuumRange() before our
second pass over the heap takes place for those LP_DEAD-containing
heap pages scanned since the last round of index/heap vacuuming took
place (or since VACUUM began). We need to make sure that the FSM has
the most recent possible information known to VACUUM, which would
break if we applied VACUUM_FSM_EVERY_PAGES rules when nindexes > 0.

Even still, the design of VACUUM_FSM_EVERY_PAGES seems questionable to me.

I agree with all of this. I thought I'd said all of this, actually, in
my prior email, but perhaps it wasn't as clear as it needed to be.

But I also said one more thing that I'd still like to hear your
thoughts about, which is: why is it right to update the FSM after the
second heap pass rather than the first one? I can't help but suspect
this is an algorithmic holdover from pre-HOT days, when VACUUM's first
heap pass was read-only and all the work happened in the second pass.
Now, nearly all of the free space that will ever become free becomes
free in the first pass, so why not advertise it then, instead of
waiting?

Admittedly, HOT is not yet 15 years old, so maybe it's too soon to
adapt our VACUUM algorithm for it. *wink*

--
Robert Haas
EDB: http://www.enterprisedb.com

#81Melanie Plageman
melanieplageman@gmail.com
In reply to: Melanie Plageman (#76)
4 attachment(s)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Wed, Jan 17, 2024 at 5:33 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

This does mean that something is not quite right with 0001 as well as
0004. We'd end up checking if we are at 8GB much more often. I should
probably find a way to replicate the cadence on master.

I believe I've done this in attached v10.
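
The relevant guard in the attached series ends up looking like this
(from v10-0004), so the periodic FSM vacuuming only advances when
pruning actually freed up space on the page, which should roughly
match the cadence on master:

    if (got_cleanup_lock && vacrel->nindexes == 0 && has_lpdead_items &&
        blkno - next_fsm_block_to_vacuum >= VACUUM_FSM_EVERY_PAGES)
    {
        FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum,
                                blkno);
        next_fsm_block_to_vacuum = blkno;
    }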

- Melanie

Attachments:

v10-0004-Combine-FSM-updates-for-prune-and-no-prune-cases.patchtext/x-patch; charset=US-ASCII; name=v10-0004-Combine-FSM-updates-for-prune-and-no-prune-cases.patchDownload
From 23fe74086fadf6ed4c7697e7b35b3ff9e0a5f87a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 16 Jan 2024 17:28:07 -0500
Subject: [PATCH v10 4/4] Combine FSM updates for prune and no prune cases

The goal is to update the FSM once for each page vacuumed. The logic for
ensuring this is the same for lazy_scan_prune() and lazy_scan_noprune().
Combine those FSM updates. While doing this, de-indent the logic for
calling lazy_scan_noprune() and the first invocation of
lazy_scan_new_or_empty(). With these changes, it is easier to see that
lazy_scan_new_or_empty() is called once before invoking
lazy_scan_noprune() and lazy_scan_prune().

Discussion: https://postgr.es/m/CA%2BTgmoaLTvipm%3Dxx4rJLr07m908PCu%3DQH3uCjD1UOn8YaEuO2g%40mail.gmail.com
---
 src/backend/access/heap/vacuumlazy.c | 89 +++++++++++-----------------
 1 file changed, 33 insertions(+), 56 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 32631d27c5..703136413f 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -838,6 +838,7 @@ lazy_scan_heap(LVRelState *vacrel)
 		Buffer		buf;
 		Page		page;
 		bool		all_visible_according_to_vm;
+		bool		got_cleanup_lock = false;
 
 		if (blkno == next_unskippable_block)
 		{
@@ -940,54 +941,28 @@ lazy_scan_heap(LVRelState *vacrel)
 		buf = ReadBufferExtended(vacrel->rel, MAIN_FORKNUM, blkno, RBM_NORMAL,
 								 vacrel->bstrategy);
 		page = BufferGetPage(buf);
-		if (!ConditionalLockBufferForCleanup(buf))
-		{
-			LockBuffer(buf, BUFFER_LOCK_SHARE);
 
-			/* Check for new or empty pages before lazy_scan_noprune call */
-			if (lazy_scan_new_or_empty(vacrel, buf, blkno, page, true,
-									   vmbuffer))
-			{
-				/* Processed as new/empty page (lock and pin released) */
-				continue;
-			}
+		got_cleanup_lock = ConditionalLockBufferForCleanup(buf);
 
-			/*
-			 * Collect LP_DEAD items in dead_items array, count tuples,
-			 * determine if rel truncation is safe
-			 */
-			if (lazy_scan_noprune(vacrel, buf, blkno, page, &has_lpdead_items))
-			{
-				Size		freespace = 0;
-				bool		recordfreespace;
+		if (!got_cleanup_lock)
+			LockBuffer(buf, BUFFER_LOCK_SHARE);
 
-				/*
-				 * We processed the page successfully (without a cleanup
-				 * lock).
-				 *
-				 * Update the FSM, just as we would in the case where
-				 * lazy_scan_prune() is called. Our goal is to update the
-				 * freespace map the last time we touch the page. If the
-				 * relation has no indexes, or if index vacuuming is disabled,
-				 * there will be no second heap pass; if this particular page
-				 * has no dead items, the second heap pass will not touch this
-				 * page. So, in those cases, update the FSM now.
-				 *
-				 * After a call to lazy_scan_prune(), we would also try to
-				 * adjust the page-level all-visible bit and the visibility
-				 * map, but we skip that step in this path.
-				 */
-				recordfreespace = vacrel->nindexes == 0
-					|| !vacrel->do_index_vacuuming
-					|| !has_lpdead_items;
-				if (recordfreespace)
-					freespace = PageGetHeapFreeSpace(page);
-				UnlockReleaseBuffer(buf);
-				if (recordfreespace)
-					RecordPageWithFreeSpace(vacrel->rel, blkno, freespace);
-				continue;
-			}
+		/* Check for new or empty pages before lazy_scan_[no]prune call */
+		if (lazy_scan_new_or_empty(vacrel, buf, blkno, page, !got_cleanup_lock,
+								   vmbuffer))
+		{
+			/* Processed as new/empty page (lock and pin released) */
+			continue;
+		}
 
+		/*
+		 * Collect LP_DEAD items in dead_items array, count tuples, determine
+		 * if rel truncation is safe. If lazy_scan_noprune() returns false, we
+		 * must get a cleanup lock and call lazy_scan_prune().
+		 */
+		if (!got_cleanup_lock &&
+			!lazy_scan_noprune(vacrel, buf, blkno, page, &has_lpdead_items))
+		{
 			/*
 			 * lazy_scan_noprune could not do all required processing.  Wait
 			 * for a cleanup lock, and call lazy_scan_prune in the usual way.
@@ -995,13 +970,7 @@ lazy_scan_heap(LVRelState *vacrel)
 			Assert(vacrel->aggressive);
 			LockBuffer(buf, BUFFER_LOCK_UNLOCK);
 			LockBufferForCleanup(buf);
-		}
-
-		/* Check for new or empty pages before lazy_scan_prune call */
-		if (lazy_scan_new_or_empty(vacrel, buf, blkno, page, false, vmbuffer))
-		{
-			/* Processed as new/empty page (lock and pin released) */
-			continue;
+			got_cleanup_lock = true;
 		}
 
 		/*
@@ -1014,9 +983,10 @@ lazy_scan_heap(LVRelState *vacrel)
 		 * tuple headers of remaining items with storage. It also determines
 		 * if truncating this block is safe.
 		 */
-		lazy_scan_prune(vacrel, buf, blkno, page,
-						vmbuffer, all_visible_according_to_vm,
-						&has_lpdead_items);
+		if (got_cleanup_lock)
+			lazy_scan_prune(vacrel, buf, blkno, page,
+							vmbuffer, all_visible_according_to_vm,
+							&has_lpdead_items);
 
 		/*
 		 * Final steps for block: drop cleanup lock, record free space in the
@@ -1028,6 +998,12 @@ lazy_scan_heap(LVRelState *vacrel)
 		 * space that lazy_vacuum_heap_page() will make available in cases
 		 * where it's possible to truncate the page's line pointer array.
 		 *
+		 * Our goal is to update the freespace map the last time we touch the
+		 * page. If the relation has no indexes, or if index vacuuming is
+		 * disabled, there will be no second heap pass; if this particular
+		 * page has no dead items, the second heap pass will not touch this
+		 * page. So, in those cases, update the FSM now.
+		 *
 		 * Note: It's not in fact 100% certain that we really will call
 		 * lazy_vacuum_heap_rel() -- lazy_vacuum() might yet opt to skip index
 		 * vacuuming (and so must skip heap vacuuming).  This is deemed okay
@@ -1047,9 +1023,10 @@ lazy_scan_heap(LVRelState *vacrel)
 			/*
 			 * Periodically perform FSM vacuuming to make newly-freed space
 			 * visible on upper FSM pages. This is done after vacuuming if the
-			 * table has indexes.
+			 * table has indexes. Space won't have been freed up if only
+			 * lazy_scan_noprune() was called, so check got_cleanup_lock.
 			 */
-			if (vacrel->nindexes == 0 && has_lpdead_items &&
+			if (got_cleanup_lock && vacrel->nindexes == 0 && has_lpdead_items &&
 				blkno - next_fsm_block_to_vacuum >= VACUUM_FSM_EVERY_PAGES)
 			{
 				FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum,
-- 
2.37.2

v10-0003-Inline-LVPagePruneState-members-in-lazy_scan_pru.patchtext/x-patch; charset=US-ASCII; name=v10-0003-Inline-LVPagePruneState-members-in-lazy_scan_pru.patchDownload
From 04bc4a900f8fc6dc78bfbbbe5454c9ee1f4bc703 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 9 Jan 2024 17:17:01 -0500
Subject: [PATCH v10 3/4] Inline LVPagePruneState members in lazy_scan_prune()

After moving the code to update the visibility map from lazy_scan_heap()
into lazy_scan_prune(), the LVPagePruneState is mainly obsolete. Replace
all instances of use of its members in lazy_scan_prune() with local
variables and remove the struct.

Only has_lpdead_items remains in use.
---
 src/backend/access/heap/vacuumlazy.c | 133 ++++++++++++++-------------
 src/tools/pgindent/typedefs.list     |   1 -
 2 files changed, 69 insertions(+), 65 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 34bc6b6890..32631d27c5 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -212,23 +212,6 @@ typedef struct LVRelState
 	int64		missed_dead_tuples; /* # removable, but not removed */
 } LVRelState;
 
-/*
- * State returned by lazy_scan_prune()
- */
-typedef struct LVPagePruneState
-{
-	bool		has_lpdead_items;	/* includes existing LP_DEAD items */
-
-	/*
-	 * State describes the proper VM bit states to set for the page following
-	 * pruning and freezing.  all_visible implies !has_lpdead_items, but don't
-	 * trust all_frozen result unless all_visible is also set to true.
-	 */
-	bool		all_visible;	/* Every item visible to all? */
-	bool		all_frozen;		/* provided all_visible is also true */
-	TransactionId visibility_cutoff_xid;	/* For recovery conflicts */
-} LVPagePruneState;
-
 /* Struct for saving and restoring vacuum error information. */
 typedef struct LVSavedErrInfo
 {
@@ -250,7 +233,7 @@ static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 static void lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
 							Buffer vmbuffer, bool all_visible_according_to_vm,
-							LVPagePruneState *prunestate);
+							bool *has_lpdead_items);
 static bool lazy_scan_noprune(LVRelState *vacrel, Buffer buf,
 							  BlockNumber blkno, Page page,
 							  bool *has_lpdead_items);
@@ -828,6 +811,7 @@ lazy_scan_heap(LVRelState *vacrel)
 				blkno,
 				next_unskippable_block,
 				next_fsm_block_to_vacuum = 0;
+	bool		has_lpdead_items;
 	VacDeadItems *dead_items = vacrel->dead_items;
 	Buffer		vmbuffer = InvalidBuffer;
 	bool		next_unskippable_allvis,
@@ -854,7 +838,6 @@ lazy_scan_heap(LVRelState *vacrel)
 		Buffer		buf;
 		Page		page;
 		bool		all_visible_according_to_vm;
-		LVPagePruneState prunestate;
 
 		if (blkno == next_unskippable_block)
 		{
@@ -959,8 +942,6 @@ lazy_scan_heap(LVRelState *vacrel)
 		page = BufferGetPage(buf);
 		if (!ConditionalLockBufferForCleanup(buf))
 		{
-			bool		has_lpdead_items;
-
 			LockBuffer(buf, BUFFER_LOCK_SHARE);
 
 			/* Check for new or empty pages before lazy_scan_noprune call */
@@ -1035,7 +1016,7 @@ lazy_scan_heap(LVRelState *vacrel)
 		 */
 		lazy_scan_prune(vacrel, buf, blkno, page,
 						vmbuffer, all_visible_according_to_vm,
-						&prunestate);
+						&has_lpdead_items);
 
 		/*
 		 * Final steps for block: drop cleanup lock, record free space in the
@@ -1056,7 +1037,7 @@ lazy_scan_heap(LVRelState *vacrel)
 		 */
 		if (vacrel->nindexes == 0
 			|| !vacrel->do_index_vacuuming
-			|| !prunestate.has_lpdead_items)
+			|| !has_lpdead_items)
 		{
 			Size		freespace = PageGetHeapFreeSpace(page);
 
@@ -1068,7 +1049,7 @@ lazy_scan_heap(LVRelState *vacrel)
 			 * visible on upper FSM pages. This is done after vacuuming if the
 			 * table has indexes.
 			 */
-			if (vacrel->nindexes == 0 && prunestate.has_lpdead_items &&
+			if (vacrel->nindexes == 0 && has_lpdead_items &&
 				blkno - next_fsm_block_to_vacuum >= VACUUM_FSM_EVERY_PAGES)
 			{
 				FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum,
@@ -1383,6 +1364,14 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
  * right after this operation completes instead of in the middle of it. Note that
  * any tuple that becomes dead after the call to heap_page_prune() can't need to
  * be frozen, because it was visible to another session when vacuum started.
+ *
+ * vmbuffer is the buffer containing the VM block with visibility information
+ * for the heap block, blkno. all_visible_according_to_vm is the saved
+ * visibility status of the heap block looked up earlier by the caller. We
+ * won't rely entirely on this status, as it may be out of date.
+ *
+ * *has_lpdead_items is set to true or false depending on whether, upon return
+ * from this function, any LP_DEAD items are still present on the page.
  */
 static void
 lazy_scan_prune(LVRelState *vacrel,
@@ -1391,7 +1380,7 @@ lazy_scan_prune(LVRelState *vacrel,
 				Page page,
 				Buffer vmbuffer,
 				bool all_visible_according_to_vm,
-				LVPagePruneState *prunestate)
+				bool *has_lpdead_items)
 {
 	Relation	rel = vacrel->rel;
 	OffsetNumber offnum,
@@ -1404,6 +1393,9 @@ lazy_scan_prune(LVRelState *vacrel,
 				recently_dead_tuples;
 	HeapPageFreeze pagefrz;
 	bool		hastup = false;
+	bool		all_visible,
+				all_frozen;
+	TransactionId visibility_cutoff_xid;
 	int64		fpi_before = pgWalUsage.wal_fpi;
 	OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
 	HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
@@ -1445,12 +1437,21 @@ lazy_scan_prune(LVRelState *vacrel,
 
 	/*
 	 * Now scan the page to collect LP_DEAD items and check for tuples
-	 * requiring freezing among remaining tuples with storage
+	 * requiring freezing among remaining tuples with storage. Keep track of
+	 * the visibility cutoff xid for recovery conflicts.
 	 */
-	prunestate->has_lpdead_items = false;
-	prunestate->all_visible = true;
-	prunestate->all_frozen = true;
-	prunestate->visibility_cutoff_xid = InvalidTransactionId;
+	*has_lpdead_items = false;
+	visibility_cutoff_xid = InvalidTransactionId;
+
+	/*
+	 * We will update the VM after collecting LP_DEAD items and freezing
+	 * tuples. Keep track of whether or not the page is all_visible and
+	 * all_frozen and use this information to update the VM. all_visible
+	 * implies 0 lpdead_items, but don't trust all_frozen result unless
+	 * all_visible is also set to true.
+	 */
+	all_visible = true;
+	all_frozen = true;
 
 	for (offnum = FirstOffsetNumber;
 		 offnum <= maxoff;
@@ -1538,13 +1539,13 @@ lazy_scan_prune(LVRelState *vacrel,
 				 * asynchronously. See SetHintBits for more info. Check that
 				 * the tuple is hinted xmin-committed because of that.
 				 */
-				if (prunestate->all_visible)
+				if (all_visible)
 				{
 					TransactionId xmin;
 
 					if (!HeapTupleHeaderXminCommitted(htup))
 					{
-						prunestate->all_visible = false;
+						all_visible = false;
 						break;
 					}
 
@@ -1556,14 +1557,14 @@ lazy_scan_prune(LVRelState *vacrel,
 					if (!TransactionIdPrecedes(xmin,
 											   vacrel->cutoffs.OldestXmin))
 					{
-						prunestate->all_visible = false;
+						all_visible = false;
 						break;
 					}
 
 					/* Track newest xmin on page. */
-					if (TransactionIdFollows(xmin, prunestate->visibility_cutoff_xid) &&
+					if (TransactionIdFollows(xmin, visibility_cutoff_xid) &&
 						TransactionIdIsNormal(xmin))
-						prunestate->visibility_cutoff_xid = xmin;
+						visibility_cutoff_xid = xmin;
 				}
 				break;
 			case HEAPTUPLE_RECENTLY_DEAD:
@@ -1574,7 +1575,7 @@ lazy_scan_prune(LVRelState *vacrel,
 				 * pruning.)
 				 */
 				recently_dead_tuples++;
-				prunestate->all_visible = false;
+				all_visible = false;
 				break;
 			case HEAPTUPLE_INSERT_IN_PROGRESS:
 
@@ -1585,11 +1586,11 @@ lazy_scan_prune(LVRelState *vacrel,
 				 * results.  This assumption is a bit shaky, but it is what
 				 * acquire_sample_rows() does, so be consistent.
 				 */
-				prunestate->all_visible = false;
+				all_visible = false;
 				break;
 			case HEAPTUPLE_DELETE_IN_PROGRESS:
 				/* This is an expected case during concurrent vacuum */
-				prunestate->all_visible = false;
+				all_visible = false;
 
 				/*
 				 * Count such rows as live.  As above, we assume the deleting
@@ -1619,7 +1620,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		 * definitely cannot be set all-frozen in the visibility map later on
 		 */
 		if (!totally_frozen)
-			prunestate->all_frozen = false;
+			all_frozen = false;
 	}
 
 	/*
@@ -1637,7 +1638,7 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * page all-frozen afterwards (might not happen until final heap pass).
 	 */
 	if (pagefrz.freeze_required || tuples_frozen == 0 ||
-		(prunestate->all_visible && prunestate->all_frozen &&
+		(all_visible && all_frozen &&
 		 fpi_before != pgWalUsage.wal_fpi))
 	{
 		/*
@@ -1675,11 +1676,11 @@ lazy_scan_prune(LVRelState *vacrel,
 			 * once we're done with it.  Otherwise we generate a conservative
 			 * cutoff by stepping back from OldestXmin.
 			 */
-			if (prunestate->all_visible && prunestate->all_frozen)
+			if (all_visible && all_frozen)
 			{
 				/* Using same cutoff when setting VM is now unnecessary */
-				snapshotConflictHorizon = prunestate->visibility_cutoff_xid;
-				prunestate->visibility_cutoff_xid = InvalidTransactionId;
+				snapshotConflictHorizon = visibility_cutoff_xid;
+				visibility_cutoff_xid = InvalidTransactionId;
 			}
 			else
 			{
@@ -1702,7 +1703,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		 */
 		vacrel->NewRelfrozenXid = pagefrz.NoFreezePageRelfrozenXid;
 		vacrel->NewRelminMxid = pagefrz.NoFreezePageRelminMxid;
-		prunestate->all_frozen = false;
+		all_frozen = false;
 		tuples_frozen = 0;		/* avoid miscounts in instrumentation */
 	}
 
@@ -1715,16 +1716,17 @@ lazy_scan_prune(LVRelState *vacrel,
 	 */
 #ifdef USE_ASSERT_CHECKING
 	/* Note that all_frozen value does not matter when !all_visible */
-	if (prunestate->all_visible && lpdead_items == 0)
+	if (all_visible && lpdead_items == 0)
 	{
-		TransactionId cutoff;
-		bool		all_frozen;
+		TransactionId debug_cutoff;
+		bool		debug_all_frozen;
 
-		if (!heap_page_is_all_visible(vacrel, buf, &cutoff, &all_frozen))
+		if (!heap_page_is_all_visible(vacrel, buf,
+									  &debug_cutoff, &debug_all_frozen))
 			Assert(false);
 
-		Assert(!TransactionIdIsValid(cutoff) ||
-			   cutoff == prunestate->visibility_cutoff_xid);
+		Assert(!TransactionIdIsValid(debug_cutoff) ||
+			   debug_cutoff == visibility_cutoff_xid);
 	}
 #endif
 
@@ -1737,7 +1739,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		ItemPointerData tmp;
 
 		vacrel->lpdead_item_pages++;
-		prunestate->has_lpdead_items = true;
 
 		ItemPointerSetBlockNumber(&tmp, blkno);
 
@@ -1762,7 +1763,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		 * Now that freezing has been finalized, unset all_visible.  It needs
 		 * to reflect the present state of things, as expected by our caller.
 		 */
-		prunestate->all_visible = false;
+		all_visible = false;
 	}
 
 	/* Finally, add page-local counts to whole-VACUUM counts */
@@ -1776,19 +1777,23 @@ lazy_scan_prune(LVRelState *vacrel,
 	if (hastup)
 		vacrel->nonempty_pages = blkno + 1;
 
-	Assert(!prunestate->all_visible || !prunestate->has_lpdead_items);
+	/* Did we find LP_DEAD items? */
+	*has_lpdead_items = (lpdead_items > 0);
+
+	Assert(!all_visible || !(*has_lpdead_items));
 
 	/*
 	 * Handle setting visibility map bit based on information from the VM (as
-	 * of last lazy_scan_skip() call), and from prunestate
+	 * of last lazy_scan_skip() call), and from all_visible and all_frozen
+	 * variables
 	 */
-	if (!all_visible_according_to_vm && prunestate->all_visible)
+	if (!all_visible_according_to_vm && all_visible)
 	{
 		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
 
-		if (prunestate->all_frozen)
+		if (all_frozen)
 		{
-			Assert(!TransactionIdIsValid(prunestate->visibility_cutoff_xid));
+			Assert(!TransactionIdIsValid(visibility_cutoff_xid));
 			flags |= VISIBILITYMAP_ALL_FROZEN;
 		}
 
@@ -1808,7 +1813,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		PageSetAllVisible(page);
 		MarkBufferDirty(buf);
 		visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
-						  vmbuffer, prunestate->visibility_cutoff_xid,
+						  vmbuffer, visibility_cutoff_xid,
 						  flags);
 	}
 
@@ -1841,7 +1846,7 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
 	 * however.
 	 */
-	else if (prunestate->has_lpdead_items && PageIsAllVisible(page))
+	else if (lpdead_items > 0 && PageIsAllVisible(page))
 	{
 		elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
 			 vacrel->relname, blkno);
@@ -1854,16 +1859,16 @@ lazy_scan_prune(LVRelState *vacrel,
 	/*
 	 * If the all-visible page is all-frozen but not marked as such yet, mark
 	 * it as all-frozen.  Note that all_frozen is only valid if all_visible is
-	 * true, so we must check both prunestate fields.
+	 * true, so we must check both all_visible and all_frozen.
 	 */
-	else if (all_visible_according_to_vm && prunestate->all_visible &&
-			 prunestate->all_frozen &&
+	else if (all_visible_according_to_vm && all_visible &&
+			 all_frozen &&
 			 !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
 	{
 		/*
 		 * Avoid relying on all_visible_according_to_vm as a proxy for the
 		 * page-level PD_ALL_VISIBLE bit being set, since it might have become
-		 * stale -- even when all_visible is set in prunestate
+		 * stale -- even when all_visible is set
 		 */
 		if (!PageIsAllVisible(page))
 		{
@@ -1878,7 +1883,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		 * since a snapshotConflictHorizon sufficient to make everything safe
 		 * for REDO was logged when the page's tuples were frozen.
 		 */
-		Assert(!TransactionIdIsValid(prunestate->visibility_cutoff_xid));
+		Assert(!TransactionIdIsValid(visibility_cutoff_xid));
 		visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
 						  vmbuffer, InvalidTransactionId,
 						  VISIBILITYMAP_ALL_VISIBLE |
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 29fd1cae64..17c0104376 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1405,7 +1405,6 @@ LPVOID
 LPWSTR
 LSEG
 LUID
-LVPagePruneState
 LVRelState
 LVSavedErrInfo
 LWLock
-- 
2.37.2

v10-0002-Move-VM-update-code-to-lazy_scan_prune.patchtext/x-patch; charset=US-ASCII; name=v10-0002-Move-VM-update-code-to-lazy_scan_prune.patchDownload
From 6aa1cd8f04a5ca188100267bc23cfa74426e6af3 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Fri, 5 Jan 2024 11:12:58 -0500
Subject: [PATCH v10 2/4] Move VM update code to lazy_scan_prune()

The visibility map is updated after lazy_scan_prune() returns in
lazy_scan_heap(). Most of the output parameters of lazy_scan_prune() are
used to update the VM in lazy_scan_heap(). Moving that code into
lazy_scan_prune() simplifies lazy_scan_heap() and requires less
communication between the two. This is a pure lift and shift. No VM
update logic is changed. All but one of the members of the prunestate
are now obsolete. It will be removed in a future commit.
---
 src/backend/access/heap/vacuumlazy.c | 226 ++++++++++++++-------------
 1 file changed, 115 insertions(+), 111 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 2f530ad62c..34bc6b6890 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -249,6 +249,7 @@ static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   bool sharelock, Buffer vmbuffer);
 static void lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
+							Buffer vmbuffer, bool all_visible_according_to_vm,
 							LVPagePruneState *prunestate);
 static bool lazy_scan_noprune(LVRelState *vacrel, Buffer buf,
 							  BlockNumber blkno, Page page,
@@ -1032,117 +1033,9 @@ lazy_scan_heap(LVRelState *vacrel)
 		 * tuple headers of remaining items with storage. It also determines
 		 * if truncating this block is safe.
 		 */
-		lazy_scan_prune(vacrel, buf, blkno, page, &prunestate);
-
-		Assert(!prunestate.all_visible || !prunestate.has_lpdead_items);
-
-		/*
-		 * Handle setting visibility map bit based on information from the VM
-		 * (as of last lazy_scan_skip() call), and from prunestate
-		 */
-		if (!all_visible_according_to_vm && prunestate.all_visible)
-		{
-			uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
-
-			if (prunestate.all_frozen)
-			{
-				Assert(!TransactionIdIsValid(prunestate.visibility_cutoff_xid));
-				flags |= VISIBILITYMAP_ALL_FROZEN;
-			}
-
-			/*
-			 * It should never be the case that the visibility map page is set
-			 * while the page-level bit is clear, but the reverse is allowed
-			 * (if checksums are not enabled).  Regardless, set both bits so
-			 * that we get back in sync.
-			 *
-			 * NB: If the heap page is all-visible but the VM bit is not set,
-			 * we don't need to dirty the heap page.  However, if checksums
-			 * are enabled, we do need to make sure that the heap page is
-			 * dirtied before passing it to visibilitymap_set(), because it
-			 * may be logged.  Given that this situation should only happen in
-			 * rare cases after a crash, it is not worth optimizing.
-			 */
-			PageSetAllVisible(page);
-			MarkBufferDirty(buf);
-			visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
-							  vmbuffer, prunestate.visibility_cutoff_xid,
-							  flags);
-		}
-
-		/*
-		 * As of PostgreSQL 9.2, the visibility map bit should never be set if
-		 * the page-level bit is clear.  However, it's possible that the bit
-		 * got cleared after lazy_scan_skip() was called, so we must recheck
-		 * with buffer lock before concluding that the VM is corrupt.
-		 */
-		else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
-				 visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
-		{
-			elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-				 vacrel->relname, blkno);
-			visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-								VISIBILITYMAP_VALID_BITS);
-		}
-
-		/*
-		 * It's possible for the value returned by
-		 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-		 * wrong for us to see tuples that appear to not be visible to
-		 * everyone yet, while PD_ALL_VISIBLE is already set. The real safe
-		 * xmin value never moves backwards, but
-		 * GetOldestNonRemovableTransactionId() is conservative and sometimes
-		 * returns a value that's unnecessarily small, so if we see that
-		 * contradiction it just means that the tuples that we think are not
-		 * visible to everyone yet actually are, and the PD_ALL_VISIBLE flag
-		 * is correct.
-		 *
-		 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE
-		 * set, however.
-		 */
-		else if (prunestate.has_lpdead_items && PageIsAllVisible(page))
-		{
-			elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-				 vacrel->relname, blkno);
-			PageClearAllVisible(page);
-			MarkBufferDirty(buf);
-			visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
-								VISIBILITYMAP_VALID_BITS);
-		}
-
-		/*
-		 * If the all-visible page is all-frozen but not marked as such yet,
-		 * mark it as all-frozen.  Note that all_frozen is only valid if
-		 * all_visible is true, so we must check both prunestate fields.
-		 */
-		else if (all_visible_according_to_vm && prunestate.all_visible &&
-				 prunestate.all_frozen &&
-				 !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
-		{
-			/*
-			 * Avoid relying on all_visible_according_to_vm as a proxy for the
-			 * page-level PD_ALL_VISIBLE bit being set, since it might have
-			 * become stale -- even when all_visible is set in prunestate
-			 */
-			if (!PageIsAllVisible(page))
-			{
-				PageSetAllVisible(page);
-				MarkBufferDirty(buf);
-			}
-
-			/*
-			 * Set the page all-frozen (and all-visible) in the VM.
-			 *
-			 * We can pass InvalidTransactionId as our visibility_cutoff_xid,
-			 * since a snapshotConflictHorizon sufficient to make everything
-			 * safe for REDO was logged when the page's tuples were frozen.
-			 */
-			Assert(!TransactionIdIsValid(prunestate.visibility_cutoff_xid));
-			visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
-							  vmbuffer, InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE |
-							  VISIBILITYMAP_ALL_FROZEN);
-		}
+		lazy_scan_prune(vacrel, buf, blkno, page,
+						vmbuffer, all_visible_according_to_vm,
+						&prunestate);
 
 		/*
 		 * Final steps for block: drop cleanup lock, record free space in the
@@ -1496,6 +1389,8 @@ lazy_scan_prune(LVRelState *vacrel,
 				Buffer buf,
 				BlockNumber blkno,
 				Page page,
+				Buffer vmbuffer,
+				bool all_visible_according_to_vm,
 				LVPagePruneState *prunestate)
 {
 	Relation	rel = vacrel->rel;
@@ -1880,6 +1775,115 @@ lazy_scan_prune(LVRelState *vacrel,
 	/* Can't truncate this page */
 	if (hastup)
 		vacrel->nonempty_pages = blkno + 1;
+
+	Assert(!prunestate->all_visible || !prunestate->has_lpdead_items);
+
+	/*
+	 * Handle setting visibility map bit based on information from the VM (as
+	 * of last lazy_scan_skip() call), and from prunestate
+	 */
+	if (!all_visible_according_to_vm && prunestate->all_visible)
+	{
+		uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
+
+		if (prunestate->all_frozen)
+		{
+			Assert(!TransactionIdIsValid(prunestate->visibility_cutoff_xid));
+			flags |= VISIBILITYMAP_ALL_FROZEN;
+		}
+
+		/*
+		 * It should never be the case that the visibility map page is set
+		 * while the page-level bit is clear, but the reverse is allowed (if
+		 * checksums are not enabled).  Regardless, set both bits so that we
+		 * get back in sync.
+		 *
+		 * NB: If the heap page is all-visible but the VM bit is not set, we
+		 * don't need to dirty the heap page.  However, if checksums are
+		 * enabled, we do need to make sure that the heap page is dirtied
+		 * before passing it to visibilitymap_set(), because it may be logged.
+		 * Given that this situation should only happen in rare cases after a
+		 * crash, it is not worth optimizing.
+		 */
+		PageSetAllVisible(page);
+		MarkBufferDirty(buf);
+		visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
+						  vmbuffer, prunestate->visibility_cutoff_xid,
+						  flags);
+	}
+
+	/*
+	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+	 * page-level bit is clear.  However, it's possible that the bit got
+	 * cleared after lazy_scan_skip() was called, so we must recheck with
+	 * buffer lock before concluding that the VM is corrupt.
+	 */
+	else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
+			 visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
+	{
+		elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+			 vacrel->relname, blkno);
+		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+	}
+
+	/*
+	 * It's possible for the value returned by
+	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+	 * wrong for us to see tuples that appear to not be visible to everyone
+	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
+	 * conservative and sometimes returns a value that's unnecessarily small,
+	 * so if we see that contradiction it just means that the tuples that we
+	 * think are not visible to everyone yet actually are, and the
+	 * PD_ALL_VISIBLE flag is correct.
+	 *
+	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+	 * however.
+	 */
+	else if (prunestate->has_lpdead_items && PageIsAllVisible(page))
+	{
+		elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+			 vacrel->relname, blkno);
+		PageClearAllVisible(page);
+		MarkBufferDirty(buf);
+		visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+	}
+
+	/*
+	 * If the all-visible page is all-frozen but not marked as such yet, mark
+	 * it as all-frozen.  Note that all_frozen is only valid if all_visible is
+	 * true, so we must check both prunestate fields.
+	 */
+	else if (all_visible_according_to_vm && prunestate->all_visible &&
+			 prunestate->all_frozen &&
+			 !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
+	{
+		/*
+		 * Avoid relying on all_visible_according_to_vm as a proxy for the
+		 * page-level PD_ALL_VISIBLE bit being set, since it might have become
+		 * stale -- even when all_visible is set in prunestate
+		 */
+		if (!PageIsAllVisible(page))
+		{
+			PageSetAllVisible(page);
+			MarkBufferDirty(buf);
+		}
+
+		/*
+		 * Set the page all-frozen (and all-visible) in the VM.
+		 *
+		 * We can pass InvalidTransactionId as our visibility_cutoff_xid,
+		 * since a snapshotConflictHorizon sufficient to make everything safe
+		 * for REDO was logged when the page's tuples were frozen.
+		 */
+		Assert(!TransactionIdIsValid(prunestate->visibility_cutoff_xid));
+		visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
+						  vmbuffer, InvalidTransactionId,
+						  VISIBILITYMAP_ALL_VISIBLE |
+						  VISIBILITYMAP_ALL_FROZEN);
+	}
 }
 
 /*
-- 
2.37.2

v10-0001-Set-would-be-dead-items-LP_UNUSED-while-pruning.patchtext/x-patch; charset=US-ASCII; name=v10-0001-Set-would-be-dead-items-LP_UNUSED-while-pruning.patchDownload
From e1568664171befdb1281d360f0e20a6545260f39 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 16 Jan 2024 17:41:55 -0500
Subject: [PATCH v10 1/4] Set would-be dead items LP_UNUSED while pruning

If there are no indexes on a relation, items can be marked LP_UNUSED
instead of LP_DEAD during while pruning. This avoids a separate
invocation of lazy_vacuum_heap_page() and saves a vacuum WAL record.

To accomplish this, pass heap_page_prune() a new parameter, mark_unused_now,
which indicates that dead line pointers should be set to LP_UNUSED
during pruning, allowing earlier reaping of tuples.

We only enable this for vacuum's pruning. On access pruning always
passes mark_unused_now == false.

Discussion: https://postgr.es/m/CAAKRu_bgvb_k0gKOXWzNKWHt560R0smrGe3E8zewKPs8fiMKkw%40mail.gmail.com
---
 src/backend/access/heap/pruneheap.c  |  80 ++++++++++++++---
 src/backend/access/heap/vacuumlazy.c | 129 ++++++++-------------------
 src/include/access/heapam.h          |   1 +
 3 files changed, 108 insertions(+), 102 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 3e0a1a260e..5917633567 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -35,6 +35,8 @@ typedef struct
 
 	/* tuple visibility test, initialized for the relation */
 	GlobalVisState *vistest;
+	/* whether or not dead items can be set LP_UNUSED during pruning */
+	bool		mark_unused_now;
 
 	TransactionId new_prune_xid;	/* new prune hint value for page */
 	TransactionId snapshotConflictHorizon;	/* latest xid removed */
@@ -67,6 +69,7 @@ static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
 static void heap_prune_record_redirect(PruneState *prstate,
 									   OffsetNumber offnum, OffsetNumber rdoffnum);
 static void heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum);
+static void heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum);
 static void heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum);
 static void page_verify_redirects(Page page);
 
@@ -148,7 +151,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 		{
 			PruneResult presult;
 
-			heap_page_prune(relation, buffer, vistest, &presult, NULL);
+			/*
+			 * For now, pass mark_unused_now as false regardless of whether or
+			 * not the relation has indexes, since we cannot safely determine
+			 * that during on-access pruning with the current implementation.
+			 */
+			heap_page_prune(relation, buffer, vistest, false,
+							&presult, NULL);
 
 			/*
 			 * Report the number of tuples reclaimed to pgstats.  This is
@@ -193,6 +202,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
  * (see heap_prune_satisfies_vacuum and
  * HeapTupleSatisfiesVacuum).
  *
+ * mark_unused_now indicates whether or not dead items can be set LP_UNUSED during
+ * pruning.
+ *
  * off_loc is the offset location required by the caller to use in error
  * callback.
  *
@@ -203,6 +215,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 void
 heap_page_prune(Relation relation, Buffer buffer,
 				GlobalVisState *vistest,
+				bool mark_unused_now,
 				PruneResult *presult,
 				OffsetNumber *off_loc)
 {
@@ -227,6 +240,7 @@ heap_page_prune(Relation relation, Buffer buffer,
 	prstate.new_prune_xid = InvalidTransactionId;
 	prstate.rel = relation;
 	prstate.vistest = vistest;
+	prstate.mark_unused_now = mark_unused_now;
 	prstate.snapshotConflictHorizon = InvalidTransactionId;
 	prstate.nredirected = prstate.ndead = prstate.nunused = 0;
 	memset(prstate.marked, 0, sizeof(prstate.marked));
@@ -306,9 +320,9 @@ heap_page_prune(Relation relation, Buffer buffer,
 		if (off_loc)
 			*off_loc = offnum;
 
-		/* Nothing to do if slot is empty or already dead */
+		/* Nothing to do if slot is empty */
 		itemid = PageGetItemId(page, offnum);
-		if (!ItemIdIsUsed(itemid) || ItemIdIsDead(itemid))
+		if (!ItemIdIsUsed(itemid))
 			continue;
 
 		/* Process this item or chain of items */
@@ -581,7 +595,17 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
 		 * function.)
 		 */
 		if (ItemIdIsDead(lp))
+		{
+			/*
+			 * If the caller set mark_unused_now true, we can set dead line
+			 * pointers LP_UNUSED now. We don't increment ndeleted here since
+			 * the LP was already marked dead.
+			 */
+			if (unlikely(prstate->mark_unused_now))
+				heap_prune_record_unused(prstate, offnum);
+
 			break;
+		}
 
 		Assert(ItemIdIsNormal(lp));
 		htup = (HeapTupleHeader) PageGetItem(dp, lp);
@@ -715,7 +739,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
 		 * redirect the root to the correct chain member.
 		 */
 		if (i >= nchain)
-			heap_prune_record_dead(prstate, rootoffnum);
+			heap_prune_record_dead_or_unused(prstate, rootoffnum);
 		else
 			heap_prune_record_redirect(prstate, rootoffnum, chainitems[i]);
 	}
@@ -726,9 +750,9 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
 		 * item.  This can happen if the loop in heap_page_prune caused us to
 		 * visit the dead successor of a redirect item before visiting the
 		 * redirect item.  We can clean up by setting the redirect item to
-		 * DEAD state.
+		 * DEAD state or LP_UNUSED if the caller indicated.
 		 */
-		heap_prune_record_dead(prstate, rootoffnum);
+		heap_prune_record_dead_or_unused(prstate, rootoffnum);
 	}
 
 	return ndeleted;
@@ -774,6 +798,27 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum)
 	prstate->marked[offnum] = true;
 }
 
+/*
+ * Depending on whether or not the caller set mark_unused_now to true, record that a
+ * line pointer should be marked LP_DEAD or LP_UNUSED. There are other cases in
+ * which we will mark line pointers LP_UNUSED, but we will not mark line
+ * pointers LP_DEAD if mark_unused_now is true.
+ */
+static void
+heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum)
+{
+	/*
+	 * If the caller set mark_unused_now to true, we can remove dead tuples
+	 * during pruning instead of marking their line pointers dead. Set this
+	 * tuple's line pointer LP_UNUSED. We hint that this option is less
+	 * likely.
+	 */
+	if (unlikely(prstate->mark_unused_now))
+		heap_prune_record_unused(prstate, offnum);
+	else
+		heap_prune_record_dead(prstate, offnum);
+}
+
 /* Record line pointer to be marked unused */
 static void
 heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum)
@@ -903,13 +948,24 @@ heap_page_prune_execute(Buffer buffer,
 #ifdef USE_ASSERT_CHECKING
 
 		/*
-		 * Only heap-only tuples can become LP_UNUSED during pruning.  They
-		 * don't need to be left in place as LP_DEAD items until VACUUM gets
-		 * around to doing index vacuuming.
+		 * When heap_page_prune() was called, mark_unused_now may have been
+		 * passed as true, which allows would-be LP_DEAD items to be made
+		 * LP_UNUSED instead. This is only possible if the relation has no
+		 * indexes. If there are any dead items, then mark_unused_now was not
+		 * true and every item being marked LP_UNUSED must refer to a
+		 * heap-only tuple.
 		 */
-		Assert(ItemIdHasStorage(lp) && ItemIdIsNormal(lp));
-		htup = (HeapTupleHeader) PageGetItem(page, lp);
-		Assert(HeapTupleHeaderIsHeapOnly(htup));
+		if (ndead > 0)
+		{
+			Assert(ItemIdHasStorage(lp) && ItemIdIsNormal(lp));
+			htup = (HeapTupleHeader) PageGetItem(page, lp);
+			Assert(HeapTupleHeaderIsHeapOnly(htup));
+		}
+		else
+		{
+			Assert(ItemIdIsUsed(lp));
+		}
+
 #endif
 
 		ItemIdSetUnused(lp);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index e7a942e183..2f530ad62c 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1036,69 +1036,6 @@ lazy_scan_heap(LVRelState *vacrel)
 
 		Assert(!prunestate.all_visible || !prunestate.has_lpdead_items);
 
-		if (vacrel->nindexes == 0)
-		{
-			/*
-			 * Consider the need to do page-at-a-time heap vacuuming when
-			 * using the one-pass strategy now.
-			 *
-			 * The one-pass strategy will never call lazy_vacuum().  The steps
-			 * performed here can be thought of as the one-pass equivalent of
-			 * a call to lazy_vacuum().
-			 */
-			if (prunestate.has_lpdead_items)
-			{
-				Size		freespace;
-
-				lazy_vacuum_heap_page(vacrel, blkno, buf, 0, vmbuffer);
-
-				/* Forget the LP_DEAD items that we just vacuumed */
-				dead_items->num_items = 0;
-
-				/*
-				 * Now perform FSM processing for blkno, and move on to next
-				 * page.
-				 *
-				 * Our call to lazy_vacuum_heap_page() will have considered if
-				 * it's possible to set all_visible/all_frozen independently
-				 * of lazy_scan_prune().  Note that prunestate was invalidated
-				 * by lazy_vacuum_heap_page() call.
-				 */
-				freespace = PageGetHeapFreeSpace(page);
-
-				UnlockReleaseBuffer(buf);
-				RecordPageWithFreeSpace(vacrel->rel, blkno, freespace);
-
-				/*
-				 * Periodically perform FSM vacuuming to make newly-freed
-				 * space visible on upper FSM pages. FreeSpaceMapVacuumRange()
-				 * vacuums the portion of the freespace map covering heap
-				 * pages from start to end - 1. Include the block we just
-				 * vacuumed by passing it blkno + 1. Overflow isn't an issue
-				 * because MaxBlockNumber + 1 is InvalidBlockNumber which
-				 * causes FreeSpaceMapVacuumRange() to vacuum freespace map
-				 * pages covering the remainder of the relation.
-				 */
-				if (blkno - next_fsm_block_to_vacuum >= VACUUM_FSM_EVERY_PAGES)
-				{
-					FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum,
-											blkno + 1);
-					next_fsm_block_to_vacuum = blkno + 1;
-				}
-
-				continue;
-			}
-
-			/*
-			 * There was no call to lazy_vacuum_heap_page() because pruning
-			 * didn't encounter/create any LP_DEAD items that needed to be
-			 * vacuumed.  Prune state has not been invalidated, so proceed
-			 * with prunestate-driven visibility map and FSM steps (just like
-			 * the two-pass strategy).
-			 */
-			Assert(dead_items->num_items == 0);
-		}
-
 		/*
 		 * Handle setting visibility map bit based on information from the VM
 		 * (as of last lazy_scan_skip() call), and from prunestate
@@ -1209,38 +1146,45 @@ lazy_scan_heap(LVRelState *vacrel)
 
 		/*
 		 * Final steps for block: drop cleanup lock, record free space in the
-		 * FSM
+		 * FSM.
+		 *
+		 * If we will likely do index vacuuming, wait until
+		 * lazy_vacuum_heap_rel() to save free space. This doesn't just save
+		 * us some cycles; it also allows us to record any additional free
+		 * space that lazy_vacuum_heap_page() will make available in cases
+		 * where it's possible to truncate the page's line pointer array.
+		 *
+		 * Note: It's not in fact 100% certain that we really will call
+		 * lazy_vacuum_heap_rel() -- lazy_vacuum() might yet opt to skip index
+		 * vacuuming (and so must skip heap vacuuming).  This is deemed okay
+		 * because it only happens in emergencies, or when there is very
+		 * little free space anyway. (Besides, we start recording free space
+		 * in the FSM once index vacuuming has been abandoned.)
 		 */
-		if (prunestate.has_lpdead_items && vacrel->do_index_vacuuming)
-		{
-			/*
-			 * Wait until lazy_vacuum_heap_rel() to save free space.  This
-			 * doesn't just save us some cycles; it also allows us to record
-			 * any additional free space that lazy_vacuum_heap_page() will
-			 * make available in cases where it's possible to truncate the
-			 * page's line pointer array.
-			 *
-			 * Note: It's not in fact 100% certain that we really will call
-			 * lazy_vacuum_heap_rel() -- lazy_vacuum() might yet opt to skip
-			 * index vacuuming (and so must skip heap vacuuming).  This is
-			 * deemed okay because it only happens in emergencies, or when
-			 * there is very little free space anyway. (Besides, we start
-			 * recording free space in the FSM once index vacuuming has been
-			 * abandoned.)
-			 *
-			 * Note: The one-pass (no indexes) case is only supposed to make
-			 * it this far when there were no LP_DEAD items during pruning.
-			 */
-			Assert(vacrel->nindexes > 0);
-			UnlockReleaseBuffer(buf);
-		}
-		else
+		if (vacrel->nindexes == 0
+			|| !vacrel->do_index_vacuuming
+			|| !prunestate.has_lpdead_items)
 		{
 			Size		freespace = PageGetHeapFreeSpace(page);
 
 			UnlockReleaseBuffer(buf);
 			RecordPageWithFreeSpace(vacrel->rel, blkno, freespace);
+
+			/*
+			 * Periodically perform FSM vacuuming to make newly-freed space
+			 * visible on upper FSM pages. This is done after vacuuming if the
+			 * table has indexes.
+			 */
+			if (vacrel->nindexes == 0 && prunestate.has_lpdead_items &&
+				blkno - next_fsm_block_to_vacuum >= VACUUM_FSM_EVERY_PAGES)
+			{
+				FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum,
+										blkno);
+				next_fsm_block_to_vacuum = blkno;
+			}
 		}
+		else
+			UnlockReleaseBuffer(buf);
 	}
 
 	vacrel->blkno = InvalidBlockNumber;
@@ -1596,8 +1540,13 @@ lazy_scan_prune(LVRelState *vacrel,
 	 * in presult.ndeleted. It should not be confused with lpdead_items;
 	 * lpdead_items's final value can be thought of as the number of tuples
 	 * that were deleted from indexes.
+	 *
+	 * If the relation has no indexes, we can immediately mark would-be dead
+	 * items LP_UNUSED, so mark_unused_now should be true if no indexes and
+	 * false otherwise.
 	 */
-	heap_page_prune(rel, buf, vacrel->vistest, &presult, &vacrel->offnum);
+	heap_page_prune(rel, buf, vacrel->vistest, vacrel->nindexes == 0,
+					&presult, &vacrel->offnum);
 
 	/*
 	 * Now scan the page to collect LP_DEAD items and check for tuples
@@ -2520,7 +2469,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	bool		all_frozen;
 	LVSavedErrInfo saved_err_info;
 
-	Assert(vacrel->nindexes == 0 || vacrel->do_index_vacuuming);
+	Assert(vacrel->do_index_vacuuming);
 
 	pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_VACUUMED, blkno);
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 932ec0d6f2..4b133f6859 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -320,6 +320,7 @@ struct GlobalVisState;
 extern void heap_page_prune_opt(Relation relation, Buffer buffer);
 extern void heap_page_prune(Relation relation, Buffer buffer,
 							struct GlobalVisState *vistest,
+							bool mark_unused_now,
 							PruneResult *presult,
 							OffsetNumber *off_loc);
 extern void heap_page_prune_execute(Buffer buffer,
-- 
2.37.2

#82Peter Geoghegan
pg@bowt.ie
In reply to: Robert Haas (#80)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Thu, Jan 18, 2024 at 8:52 AM Robert Haas <robertmhaas@gmail.com> wrote:

But I also said one more thing that I'd still like to hear your
thoughts about, which is: why is it right to update the FSM after the
second heap pass rather than the first one? I can't help but suspect
this is an algorithmic holdover from pre-HOT days, when VACUUM's first
heap pass was read-only and all the work happened in the second pass.
Now, nearly all of the free space that will ever become free becomes
free in the first pass, so why not advertise it then, instead of
waiting?

I don't think that doing everything FSM-related in the first heap pass
is a bad idea -- especially not if it buys you something elsewhere.

The problem with your justification for moving things in that
direction (if any) is that it is occasionally not quite true: there
are at least some cases where line pointer truncation after making a
page's LP_DEAD items -> LP_UNUSED will actually matter. Plus
PageGetHeapFreeSpace() will return 0 if and when
"PageGetMaxOffsetNumber(page) > MaxHeapTuplesPerPage &&
!PageHasFreeLinePointers(page)". Of course, nothing stops you from
compensating for this by anticipating what will happen later on, and
assuming that the page already has that much free space.
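
To make that "compensate by anticipating" idea concrete, here is a minimal
sketch -- illustrative only, nothing like this exists in vacuumlazy.c, and
the helper name is made up. It credits a pruned page with the space that a
later second heap pass would make usable:

#include "postgres.h"
#include "storage/bufpage.h"

/*
 * Hypothetical helper: estimate the free space a page will have once its
 * LP_DEAD stubs are eventually set LP_UNUSED.  PageGetHeapFreeSpace()
 * reports 0 while the line pointer array is full, so fall back to the raw
 * pd_upper - pd_lower hole as an optimistic anticipation.
 */
static Size
anticipated_free_space(Page page, int lpdead_count)
{
	Size		freespace = PageGetHeapFreeSpace(page);

	if (freespace == 0 && lpdead_count > 0)
		freespace = PageGetExactFreeSpace(page);

	return freespace;
}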

It might even be okay to just not try to compensate for anything,
PageGetHeapFreeSpace-wise -- just do all FSM stuff in the first heap
pass, and ignore all this. I happen to believe that a FSM_CATEGORIES
of 256 is way too much granularity to be useful in practice -- I just
don't have any faith in the idea that that kind of granularity is
useful (it's quite the opposite).

A further justification might be what we already do in the heapam.c
REDO routines: the way that we use XLogRecordPageWithFreeSpace already
operates with far less precision than the corresponding code from
vacuumlazy.c. heap_xlog_prune() already has recovery do what you
propose to do during original execution; it doesn't try to avoid
duplicating an anticipated call to XLogRecordPageWithFreeSpace that'll
take place when heap_xlog_vacuum() runs against the same page a bit
later on.

You'd likely prefer a simpler argument for doing this -- an argument
that doesn't require abandoning/discrediting the idea that a high
degree of FSM_CATEGORIES-wise precision is a valuable thing. Not sure
that that's possible -- the current design is at least correct on its
own terms. And what you propose to do will probably be less correct on
those same terms, silly though they are.

--
Peter Geoghegan

#83Robert Haas
robertmhaas@gmail.com
In reply to: Melanie Plageman (#81)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Thu, Jan 18, 2024 at 9:53 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:

I believe I've done this in attached v10.

Oh, I see. Good catch.

I've now committed 0001.

--
Robert Haas
EDB: http://www.enterprisedb.com

#84Robert Haas
robertmhaas@gmail.com
In reply to: Peter Geoghegan (#82)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Thu, Jan 18, 2024 at 10:09 AM Peter Geoghegan <pg@bowt.ie> wrote:

The problem with your justification for moving things in that
direction (if any) is that it is occasionally not quite true: there
are at least some cases where line pointer truncation after making a
page's LP_DEAD items -> LP_UNUSED will actually matter. Plus
PageGetHeapFreeSpace() will return 0 if and when
"PageGetMaxOffsetNumber(page) > MaxHeapTuplesPerPage &&
!PageHasFreeLinePointers(page)". Of course, nothing stops you from
compensating for this by anticipating what will happen later on, and
assuming that the page already has that much free space.

I think we're agreeing but I want to be sure. If we only set LP_DEAD
items to LP_UNUSED, that frees no space. But if doing so allows us to
truncate the line pointer array, then that frees a little bit of
space. Right?

One problem with using this as a justification for the status quo is
that truncating the line pointer array is a relatively recent
behavior. It's certainly much newer than the choice to have VACUUM
touch the FSM in the second heap pass rather than the first.

Another problem is that the amount of space that we're freeing up in
the second pass is really quite minimal even when it's >0. Any tuple
that actually contains any data at all is at least 32 bytes, and most
of them are quite a bit larger. Item pointers are 2 bytes. To save
enough space to fit even one additional tuple, we'd have to free *at
least* 16 line pointers. That's going to be really rare.

And even if it happens, is it even useful to advertise that free
space? Do we want to cram one more tuple into a page that has a
history of extremely heavy updates? Could it be that it's smarter to
just forget about that free space? You've written before about the
stupidity of cramming tuples of different generations into the same
page, and that concept seems to apply here. When we heap_page_prune(),
we don't know how much time has elapsed since the page was last
modified - but if we're lucky, it might not be very much. Updating the
FSM at that time gives us some shot of filling up the page with data
created around the same time as the existing page contents. By the
time we vacuum the indexes and come back, that temporal locality is
definitely lost.

You'd likely prefer a simpler argument for doing this -- an argument
that doesn't require abandoning/discrediting the idea that a high
degree of FSM_CATEGORIES-wise precision is a valuable thing. Not sure
that that's possible -- the current design is at least correct on its
own terms. And what you propose to do will probably be less correct on
those same terms, silly though they are.

I've never really understood why you think that the number of
FSM_CATEGORIES is the problem. I believe I recall you endorsing a
system where pages are open or closed, to try to achieve temporal
locality of data. I agree that such a system could work better than
what we have now. I think there's a risk that such a system could
create pathological cases where the behavior is much worse than what
we have today, and I think we'd need to consider carefully what such
cases might exist and what mitigation strategies might make sense.
However, I don't see a reason why such a system should intrinsically
want to reduce FSM_CATEGORIES. If we have two open pages and one of
them has enough space for the tuple we're now trying to insert and the
other doesn't, we'd still like to avoid having the FSM hand us the one
that doesn't.

Now, that said, I suspect that we actually could reduce FSM_CATEGORIES
somewhat without causing any real problems, because many tables are
going to have tuples that are all about the same size, and even in a
table where the sizes vary more than is typical, a single tuple can't
consume more than a quarter of the page, so granularity above that
point seems completely useless. So if we needed some bitspace to track
the open/closed status of pages or similar, I suspect we could find
that in the existing FSM byte per page without losing anything. But
all of that is just an argument that reducing the number of
FSM_CATEGORIES is *acceptable*; it doesn't amount to an argument that
it's better. My current belief is that it isn't better, just a vehicle
to do something else that maybe is better, like squeezing open/closed
tracking or similar into the existing bit space. My understanding is
that you think it would be better on its own terms, but I have not yet
been able to grasp why that would be so.
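
For scale, here is the back-of-the-envelope arithmetic behind the
granularity point above, assuming a stock build (BLCKSZ = 8192 and
FSM_CATEGORIES = 256). The little program is only a sketch, not PostgreSQL
code:

#include <stdio.h>

int
main(void)
{
	const int	blcksz = 8192;			/* default BLCKSZ */
	const int	fsm_categories = 256;	/* current FSM_CATEGORIES */
	const int	cat_step = blcksz / fsm_categories;

	/* Each FSM category distinguishes just 32 bytes of free space. */
	printf("bytes per FSM category: %d\n", cat_step);

	/*
	 * A single tuple can use at most roughly a quarter of the page, yet
	 * the map still devotes 192 of its 256 categories to amounts above
	 * that.
	 */
	printf("categories above a quarter page: %d\n",
		   fsm_categories - (blcksz / 4) / cat_step);

	return 0;
}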

--
Robert Haas
EDB: http://www.enterprisedb.com

#85Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#84)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Thu, Jan 18, 2024 at 10:42 AM Robert Haas <robertmhaas@gmail.com> wrote:

Now, that said, I suspect that we actually could reduce FSM_CATEGORIES
somewhat without causing any real problems, because many tables are
going to have tuples that are all about the same size, and even in a
table where the sizes vary more than is typical, a single tuple can't
consume more than a quarter of the page,

Actually, I think that's a soft limit, not a hard limit. But the
granularity above that level probably doesn't need to be very high, at
least.

--
Robert Haas
EDB: http://www.enterprisedb.com

#86Peter Geoghegan
pg@bowt.ie
In reply to: Robert Haas (#84)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Thu, Jan 18, 2024 at 10:43 AM Robert Haas <robertmhaas@gmail.com> wrote:

I think we're agreeing but I want to be sure. If we only set LP_DEAD
items to LP_UNUSED, that frees no space. But if doing so allows us to
truncate the line pointer array, then that frees a little bit of
space. Right?

That's part of it, yes.

One problem with using this as a justification for the status quo is
that truncating the line pointer array is a relatively recent
behavior. It's certainly much newer than the choice to have VACUUM
touch the FSM in the second heap pass rather than the first.

True. But the way that PageGetHeapFreeSpace() returns 0 for a page
with 291 LP_DEAD stubs is a much older behavior. When that happens it
is literally true that the page has lots of free space. And yet it's
not free space we can actually use. Not until those LP_DEAD items are
marked LP_UNUSED.

Another problem is that the amount of space that we're freeing up in
the second pass is really quite minimal even when it's >0. Any tuple
that actually contains any data at all is at least 32 bytes, and most
of them are quite a bit larger. Item pointers are 2 bytes. To save
enough space to fit even one additional tuple, we'd have to free *at
least* 16 line pointers. That's going to be really rare.

I basically agree with this. I would still worry about the "291
LP_DEAD stubs makes PageGetHeapFreeSpace return 0" thing specifically,
though. It's sort of a special case.

And even if it happens, is it even useful to advertise that free
space? Do we want to cram one more tuple into a page that has a
history of extremely heavy updates? Could it be that it's smarter to
just forget about that free space?

I think so, yes.

Another big source of inaccuracies here is that we don't credit
RECENTLY_DEAD tuple space with being free space. Maybe that isn't a
huge problem, but it makes it even harder to believe that precision in
FSM accounting is an intrinsic good.

You'd likely prefer a simpler argument for doing this -- an argument
that doesn't require abandoning/discrediting the idea that a high
degree of FSM_CATEGORIES-wise precision is a valuable thing. Not sure
that that's possible -- the current design is at least correct on its
own terms. And what you propose to do will probably be less correct on
those same terms, silly though they are.

I've never really understood why you think that the number of
FSM_CATEGORIES is the problem. I believe I recall you endorsing a
system where pages are open or closed, to try to achieve temporal
locality of data.

My remarks about "FSM_CATEGORIES-wise precision" were basically
remarks about the fundamental problem with the free space map. Which
is really that it's just a map of free space, that gives exactly zero
thought to various high level things that *obviously* matter. I wasn't
particularly planning on getting into the specifics of that with you
now, on this thread.

A brief recap might be useful: other systems with a heap table AM free
space management structure typically represent the free space
available on each page using a far more coarse grained counter.
Usually one with less than 10 distinct increments. The immediate
problem with FSM_CATEGORIES having such a fine granularity is that it
increases contention/competition among backends that need to find some
free space for a new tuple. They'll all diligently try to find the
page with the least free space that still satisfies their immediate
needs -- there is no thought for the second-order effects, which are
really important in practice.

But all of that is just an argument that reducing the number of
FSM_CATEGORIES is *acceptable*; it doesn't amount to an argument that
it's better. My current belief is that it isn't better, just a vehicle
to do something else that maybe is better, like squeezing open/closed
tracking or similar into the existing bit space. My understanding is
that you think it would be better on its own terms, but I have not yet
been able to grasp why that would be so.

I'm not really arguing that reducing FSM_CATEGORIES and changing
nothing else would be better on its own (it might be, but that's not
what I meant to convey).

What I really wanted to convey is this: if you're going to go the
route of ignoring LP_DEAD free space during vacuuming, you're
conceding that having a high degree of precision about available free
space isn't actually useful (or wouldn't be useful if it was actually
possible at all). Which is something that I generally agree with. I'd
just like it to be clear that you/Melanie are in fact taking one small
step in that direction. We don't need to discuss possible later steps
beyond that first step. Not right now.

--
Peter Geoghegan

#87Robert Haas
robertmhaas@gmail.com
In reply to: Peter Geoghegan (#86)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Thu, Jan 18, 2024 at 11:17 AM Peter Geoghegan <pg@bowt.ie> wrote:

True. But the way that PageGetHeapFreeSpace() returns 0 for a page
with 291 LP_DEAD stubs is a much older behavior. When that happens it
is literally true that the page has lots of free space. And yet it's
not free space we can actually use. Not until those LP_DEAD items are
marked LP_UNUSED.

To me, this is just accurate reporting. What we care about in this
context is the amount of free space on the page that can be used to
store a new tuple. When there are no line pointers available to be
allocated, that amount is 0.

Another big source of inaccuracies here is that we don't credit
RECENTLY_DEAD tuple space with being free space. Maybe that isn't a
huge problem, but it makes it even harder to believe that precision in
FSM accounting is an intrinsic good.

The difficulty here is that we don't know how long it will be before
that space can be reused. Those recently dead tuples could become dead
within a few milliseconds or stick around for hours. I've wondered
about the merits of some FSM that had built-in visibility awareness,
i.e. the capability to record something like "page X currently has Y
space free and after XID Z is all-visible it will have Y' space free".
That seems complex, but without it, we either have to bet that the
space will actually become free before anyone tries to use it, or that
it won't. If whatever guess we make is wrong, bad things happen.
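
Just to make the idea concrete, a purely hypothetical sketch of what such
an entry might record -- none of these names exist in PostgreSQL, and this
ignores the obvious WAL and consistency questions:

#include "postgres.h"
#include "storage/block.h"

/*
 * Imaginary "visibility-aware" FSM entry: free space usable now, plus the
 * free space that becomes usable once a given XID is all-visible.
 */
typedef struct VisibilityAwareFSMEntry
{
	BlockNumber	blkno;			/* heap page described by this entry */
	uint8		cur_cat;		/* free space usable right now (FSM category) */
	uint8		future_cat;		/* free space once cutoff_xid is all-visible */
	TransactionId cutoff_xid;	/* XID that must become all-visible first */
} VisibilityAwareFSMEntry;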

My remarks about "FSM_CATEGORIES-wise precision" were basically
remarks about the fundamental problem with the free space map. Which
is really that it's just a map of free space, that gives exactly zero
thought to various high level things that *obviously* matter. I wasn't
particularly planning on getting into the specifics of that with you
now, on this thread.

Fair.

A brief recap might be useful: other systems with a heap table AM free
space management structure typically represent the free space
available on each page using a far more coarse grained counter.
Usually one with less than 10 distinct increments. The immediate
problem with FSM_CATEGORIES having such a fine granularity is that it
increases contention/competition among backends that need to find some
free space for a new tuple. They'll all diligently try to find the
page with the least free space that still satisfies their immediate
needs -- there is no thought for the second-order effects, which are
really important in practice.

I think that the completely deterministic nature of the computation is
a mistake regardless of anything else. That serves to focus contention
rather than spreading it out, which is dumb, and would still be dumb
with any other number of FSM_CATEGORIES.

What I really wanted to convey is this: if you're going to go the
route of ignoring LP_DEAD free space during vacuuming, you're
conceding that having a high degree of precision about available free
space isn't actually useful (or wouldn't be useful if it was actually
possible at all). Which is something that I generally agree with. I'd
just like it to be clear that you/Melanie are in fact taking one small
step in that direction. We don't need to discuss possible later steps
beyond that first step. Not right now.

Yeah. I'm not sure we're actually going to change that right now, but
I agree with the high-level point regardless, which I would summarize
like this: The current system provides more precision about available
free space than we actually need, while failing to provide some other
things that we really do need. We need not agree today on exactly what
those other things are or how best to get them in order to agree that
the current system has significant flaws, and we do agree that it
does.

--
Robert Haas
EDB: http://www.enterprisedb.com

#88Peter Geoghegan
pg@bowt.ie
In reply to: Robert Haas (#87)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Thu, Jan 18, 2024 at 11:46 AM Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Jan 18, 2024 at 11:17 AM Peter Geoghegan <pg@bowt.ie> wrote:

True. But the way that PageGetHeapFreeSpace() returns 0 for a page
with 291 LP_DEAD stubs is a much older behavior. When that happens it
is literally true that the page has lots of free space. And yet it's
not free space we can actually use. Not until those LP_DEAD items are
marked LP_UNUSED.

To me, this is just accurate reporting. What we care about in this
context is the amount of free space on the page that can be used to
store a new tuple. When there are no line pointers available to be
allocated, that amount is 0.

I agree. All I'm saying is this (can't imagine you'll disagree):

It's not okay if you fail to update the FSM a second time in the
second heap pass -- at least in some cases. It's reasonably frequent
for a page that has 0 usable free space when lazy_scan_prune returns
to go on to have almost BLCKSZ free space once lazy_vacuum_heap_page()
is done with it.

While I am sympathetic to the argument that LP_DEAD item space just
isn't that important in general, that doesn't apply with this one
special case. This is a "step function" behavior, and is seen whenever
VACUUM runs following bulk deletes of tuples -- a rather common case.
Clearly the FSM shouldn't show that pages that are actually completely
empty at the end of VACUUM as having no available free space after a
VACUUM finishes (on account of how they looked immediately after
lazy_scan_prune ran). That'd just be wrong.
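
Rough numbers for that "step function", assuming a stock 8kB block
(BLCKSZ = 8192, 24-byte page header, 4-byte line pointers,
MaxHeapTuplesPerPage = 291). The arithmetic below is only a sketch, not
PostgreSQL code:

#include <stdio.h>

int
main(void)
{
	const int	blcksz = 8192;
	const int	page_header = 24;	/* SizeOfPageHeaderData */
	const int	lp_size = 4;		/* sizeof(ItemIdData) */
	const int	max_tuples = 291;	/* MaxHeapTuplesPerPage at 8kB */

	/*
	 * Bulk delete followed by pruning: every tuple becomes an LP_DEAD
	 * stub, so the line pointer array is full and PageGetHeapFreeSpace()
	 * says 0, even though most of the page is a hole.
	 */
	printf("hole after pruning: %d bytes (reported as 0 usable)\n",
		   blcksz - page_header - max_tuples * lp_size);

	/*
	 * After lazy_vacuum_heap_page() sets the stubs LP_UNUSED and truncates
	 * the line pointer array, nearly the whole page is usable again.
	 */
	printf("usable after second pass: ~%d bytes\n", blcksz - page_header);

	return 0;
}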

Another big source of inaccuracies here is that we don't credit
RECENTLY_DEAD tuple space with being free space. Maybe that isn't a
huge problem, but it makes it even harder to believe that precision in
FSM accounting is an intrinsic good.

The difficulty here is that we don't know how long it will be before
that space can be reused. Those recently dead tuples could become dead
within a few milliseconds or stick around for hours. I've wondered
about the merits of some FSM that had built-in visibility awareness,
i.e. the capability to record something like "page X currently has Y
space free and after XID Z is all-visible it will have Y' space free".
That seems complex, but without it, we either have to bet that the
space will actually become free before anyone tries to use it, or that
it won't. If whatever guess we make is wrong, bad things happen.

All true -- it is rather complex.

Other systems with a heap table access method based on a foundation of
2PL (Oracle, DB2) literally need a transactionally consistent FSM
structure. In fact I believe that Oracle literally requires the
equivalent of an MVCC snapshot read (a "consistent get") to be able to
access what seems like it ought to be strictly a physical data
structure correctly. Everything needs to work in the rollback path,
independent of whatever else may happen to the page before an xact
rolls back (i.e. independently of what other xacts might end up doing
with the page). This requires very tight coordination to avoid bugs
where a transaction cannot roll back due to not having enough free
space to restore the original tuple during UNDO.

I don't think it's desirable to have anything as delicate as that
here. But some rudimentary understanding of free space being
allocated/leased to certain transactions and/or backends does seem
like a good idea. There is some intrinsic value to these sorts of
behaviors, even in a system without any UNDO segments, where it is
never strictly necessary.

I think that the completely deterministic nature of the computation is
a mistake regardless of anything else. That serves to focus contention
rather than spreading it out, which is dumb, and would still be dumb
with any other number of FSM_CATEGORIES.

That's a part of the problem too, I guess.

The actual available free space on each page is literally changing all
the time, when measured at FSM_CATEGORIES-wise granularity -- which
leads to a mad dash among backends that all need the same amount of
free space for their new tuple. One reason why other systems pretty
much require coarse-grained increments of free space is the need to
manage the WAL overhead for a crash-safe FSM/free list structure.

Yeah. I'm not sure we're actually going to change that right now, but
I agree with the high-level point regardless, which I would summarize
like this: The current system provides more precision about available
free space than we actually need, while failing to provide some other
things that we really do need. We need not agree today on exactly what
those other things are or how best to get them in order to agree that
the current system has significant flaws, and we do agree that it
does.

I agree with this.

--
Peter Geoghegan

#89Robert Haas
robertmhaas@gmail.com
In reply to: Peter Geoghegan (#88)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Thu, Jan 18, 2024 at 12:15 PM Peter Geoghegan <pg@bowt.ie> wrote:

It's not okay if you fail to update the FSM a second time in the
second heap pass -- at least in some cases. It's reasonably frequent
for a page that has 0 usable free space when lazy_scan_prune returns
to go on to have almost BLCKSZ free space once lazy_vacuum_heap_page()
is done with it.

Oh, good point. That's an important subtlety I had missed.

That's a part of the problem too, I guess.

The actual available free space on each page is literally changing all
the time, when measured at FSM_CATEGORIES-wise granularity -- which
leads to a mad dash among backends that all need the same amount of
free space for their new tuple. One reason why other systems pretty
much require coarse-grained increments of free space is the need to
manage the WAL overhead for a crash-safe FSM/free list structure.

Interesting.

--
Robert Haas
EDB: http://www.enterprisedb.com

#90Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#83)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Thu, Jan 18, 2024 at 10:09 AM Robert Haas <robertmhaas@gmail.com> wrote:

Oh, I see. Good catch.

I've now committed 0001.

I have now also committed 0002 and 0003. I made some modifications to
0003. Specifically:

- I moved has_lpdead_items inside the loop over blocks instead of
putting it at the function toplevel.
- I adjusted the comments in lazy_scan_prune() because it seemed to me
that the comment about "Now scan the page..." had gotten too far
separated from the loop where that happens.
- I combined two lines in an if-test because one of them was kinda short.

Hope that's OK with you.

--
Robert Haas
EDB: http://www.enterprisedb.com

#91Melanie Plageman
melanieplageman@gmail.com
In reply to: Robert Haas (#90)
1 attachment(s)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Thu, Jan 18, 2024 at 3:20 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Jan 18, 2024 at 10:09 AM Robert Haas <robertmhaas@gmail.com> wrote:

Oh, I see. Good catch.

I've now committed 0001.

I have now also committed 0002 and 0003. I made some modifications to
0003. Specifically:

- I moved has_lpdead_items inside the loop over blocks instead of
putting it at the function toplevel.
- I adjusted the comments in lazy_scan_prune() because it seemed to me
that the comment about "Now scan the page..." had gotten too far
separated from the loop where that happens.
- I combined two lines in an if-test because one of them was kinda short.

Hope that's OK with you.

Awesome, thanks!

I have attached a rebased version of the former 0004 as v11-0001.

- Melanie

Attachments:

v11-0001-Combine-FSM-updates-for-prune-and-no-prune-cases.patchtext/x-patch; charset=US-ASCII; name=v11-0001-Combine-FSM-updates-for-prune-and-no-prune-cases.patchDownload
From e8028cce937a2f2b3e5fccb43c6a20868a3d3314 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 16 Jan 2024 17:28:07 -0500
Subject: [PATCH v11] Combine FSM updates for prune and no prune cases

lazy_scan_prune() and lazy_scan_noprune() update the freespace map with
identical conditions -- both with the goal of doing so once for each
page vacuumed. Combine both of their FSM updates. This consolidation is
easier now that cb970240f13df2 moved visibility map updates into
lazy_scan_prune().

While combining the FSM updates, simplify the logic for calling
lazy_scan_new_or_empty() and lazy_scan_noprune().

Discussion: https://postgr.es/m/CA%2BTgmoaLTvipm%3Dxx4rJLr07m908PCu%3DQH3uCjD1UOn8YaEuO2g%40mail.gmail.com
---
 src/backend/access/heap/vacuumlazy.c | 89 +++++++++++-----------------
 1 file changed, 33 insertions(+), 56 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 0fb3953513..f9ce555fde 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -838,6 +838,7 @@ lazy_scan_heap(LVRelState *vacrel)
 		Page		page;
 		bool		all_visible_according_to_vm;
 		bool		has_lpdead_items;
+		bool		got_cleanup_lock = false;
 
 		if (blkno == next_unskippable_block)
 		{
@@ -940,54 +941,28 @@ lazy_scan_heap(LVRelState *vacrel)
 		buf = ReadBufferExtended(vacrel->rel, MAIN_FORKNUM, blkno, RBM_NORMAL,
 								 vacrel->bstrategy);
 		page = BufferGetPage(buf);
-		if (!ConditionalLockBufferForCleanup(buf))
-		{
-			LockBuffer(buf, BUFFER_LOCK_SHARE);
 
-			/* Check for new or empty pages before lazy_scan_noprune call */
-			if (lazy_scan_new_or_empty(vacrel, buf, blkno, page, true,
-									   vmbuffer))
-			{
-				/* Processed as new/empty page (lock and pin released) */
-				continue;
-			}
+		got_cleanup_lock = ConditionalLockBufferForCleanup(buf);
 
-			/*
-			 * Collect LP_DEAD items in dead_items array, count tuples,
-			 * determine if rel truncation is safe
-			 */
-			if (lazy_scan_noprune(vacrel, buf, blkno, page, &has_lpdead_items))
-			{
-				Size		freespace = 0;
-				bool		recordfreespace;
+		if (!got_cleanup_lock)
+			LockBuffer(buf, BUFFER_LOCK_SHARE);
 
-				/*
-				 * We processed the page successfully (without a cleanup
-				 * lock).
-				 *
-				 * Update the FSM, just as we would in the case where
-				 * lazy_scan_prune() is called. Our goal is to update the
-				 * freespace map the last time we touch the page. If the
-				 * relation has no indexes, or if index vacuuming is disabled,
-				 * there will be no second heap pass; if this particular page
-				 * has no dead items, the second heap pass will not touch this
-				 * page. So, in those cases, update the FSM now.
-				 *
-				 * After a call to lazy_scan_prune(), we would also try to
-				 * adjust the page-level all-visible bit and the visibility
-				 * map, but we skip that step in this path.
-				 */
-				recordfreespace = vacrel->nindexes == 0
-					|| !vacrel->do_index_vacuuming
-					|| !has_lpdead_items;
-				if (recordfreespace)
-					freespace = PageGetHeapFreeSpace(page);
-				UnlockReleaseBuffer(buf);
-				if (recordfreespace)
-					RecordPageWithFreeSpace(vacrel->rel, blkno, freespace);
-				continue;
-			}
+		/* Check for new or empty pages before lazy_scan_[no]prune call */
+		if (lazy_scan_new_or_empty(vacrel, buf, blkno, page, !got_cleanup_lock,
+								   vmbuffer))
+		{
+			/* Processed as new/empty page (lock and pin released) */
+			continue;
+		}
 
+		/*
+		 * Collect LP_DEAD items in dead_items array, count tuples, determine
+		 * if rel truncation is safe. If lazy_scan_noprune() returns false, we
+		 * must get a cleanup lock and call lazy_scan_prune().
+		 */
+		if (!got_cleanup_lock &&
+			!lazy_scan_noprune(vacrel, buf, blkno, page, &has_lpdead_items))
+		{
 			/*
 			 * lazy_scan_noprune could not do all required processing.  Wait
 			 * for a cleanup lock, and call lazy_scan_prune in the usual way.
@@ -995,13 +970,7 @@ lazy_scan_heap(LVRelState *vacrel)
 			Assert(vacrel->aggressive);
 			LockBuffer(buf, BUFFER_LOCK_UNLOCK);
 			LockBufferForCleanup(buf);
-		}
-
-		/* Check for new or empty pages before lazy_scan_prune call */
-		if (lazy_scan_new_or_empty(vacrel, buf, blkno, page, false, vmbuffer))
-		{
-			/* Processed as new/empty page (lock and pin released) */
-			continue;
+			got_cleanup_lock = true;
 		}
 
 		/*
@@ -1014,9 +983,10 @@ lazy_scan_heap(LVRelState *vacrel)
 		 * tuple headers of remaining items with storage. It also determines
 		 * if truncating this block is safe.
 		 */
-		lazy_scan_prune(vacrel, buf, blkno, page,
-						vmbuffer, all_visible_according_to_vm,
-						&has_lpdead_items);
+		if (got_cleanup_lock)
+			lazy_scan_prune(vacrel, buf, blkno, page,
+							vmbuffer, all_visible_according_to_vm,
+							&has_lpdead_items);
 
 		/*
 		 * Final steps for block: drop cleanup lock, record free space in the
@@ -1028,6 +998,12 @@ lazy_scan_heap(LVRelState *vacrel)
 		 * space that lazy_vacuum_heap_page() will make available in cases
 		 * where it's possible to truncate the page's line pointer array.
 		 *
+		 * Our goal is to update the freespace map the last time we touch the
+		 * page. If the relation has no indexes, or if index vacuuming is
+		 * disabled, there will be no second heap pass; if this particular
+		 * page has no dead items, the second heap pass will not touch this
+		 * page. So, in those cases, update the FSM now.
+		 *
 		 * Note: It's not in fact 100% certain that we really will call
 		 * lazy_vacuum_heap_rel() -- lazy_vacuum() might yet opt to skip index
 		 * vacuuming (and so must skip heap vacuuming).  This is deemed okay
@@ -1047,9 +1023,10 @@ lazy_scan_heap(LVRelState *vacrel)
 			/*
 			 * Periodically perform FSM vacuuming to make newly-freed space
 			 * visible on upper FSM pages. This is done after vacuuming if the
-			 * table has indexes.
+			 * table has indexes. Space won't have been freed up if only
+			 * lazy_scan_noprune() was called, so check got_cleanup_lock.
 			 */
-			if (vacrel->nindexes == 0 && has_lpdead_items &&
+			if (got_cleanup_lock && vacrel->nindexes == 0 && has_lpdead_items &&
 				blkno - next_fsm_block_to_vacuum >= VACUUM_FSM_EVERY_PAGES)
 			{
 				FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum,
-- 
2.37.2

#92Robert Haas
robertmhaas@gmail.com
In reply to: Melanie Plageman (#91)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Thu, Jan 18, 2024 at 9:23 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

I have attached a rebased version of the former 0004 as v11-0001.

This looks correct to me, although I wouldn't mind some more eyes on
it. However, I think that the comments still need more work.

Specifically:

         /*
          * Prune, freeze, and count tuples.
          *
          * Accumulates details of remaining LP_DEAD line pointers on page in
          * dead_items array.  This includes LP_DEAD line pointers that we
          * pruned ourselves, as well as existing LP_DEAD line pointers that
          * were pruned some time earlier.  Also considers freezing XIDs in the
          * tuple headers of remaining items with storage. It also determines
          * if truncating this block is safe.
          */
-        lazy_scan_prune(vacrel, buf, blkno, page,
-                        vmbuffer, all_visible_according_to_vm,
-                        &has_lpdead_items);
+        if (got_cleanup_lock)
+            lazy_scan_prune(vacrel, buf, blkno, page,
+                            vmbuffer, all_visible_according_to_vm,
+                            &has_lpdead_items);

I think this comment needs adjusting. Possibly, the preceding calls to
lazy_scan_noprune() and lazy_scan_new_or_empty() could even use a bit
better comments, but in those cases, you're basically keeping the same
code with the same comment, so it's kinda defensible. Here, though,
you're making the call conditional without any comment update.

         /*
          * Final steps for block: drop cleanup lock, record free space in the
          * FSM.
          *
          * If we will likely do index vacuuming, wait until
          * lazy_vacuum_heap_rel() to save free space. This doesn't just save
          * us some cycles; it also allows us to record any additional free
          * space that lazy_vacuum_heap_page() will make available in cases
          * where it's possible to truncate the page's line pointer array.
          *
+         * Our goal is to update the freespace map the last time we touch the
+         * page. If the relation has no indexes, or if index vacuuming is
+         * disabled, there will be no second heap pass; if this particular
+         * page has no dead items, the second heap pass will not touch this
+         * page. So, in those cases, update the FSM now.
+         *
          * Note: It's not in fact 100% certain that we really will call
          * lazy_vacuum_heap_rel() -- lazy_vacuum() might yet opt to skip index
          * vacuuming (and so must skip heap vacuuming).  This is deemed okay
          * because it only happens in emergencies, or when there is very
          * little free space anyway. (Besides, we start recording free space
          * in the FSM once index vacuuming has been abandoned.)
          */

I think this comment needs a rewrite, not just sticking the other
comment in the middle of it. There's some duplication between these
two comments, and merging it all together should iron that out.
Personally, I think my comment (which was there before, this commit
only moves it here) is clearer than what's already here about the
intent, but it's lacking some details that are captured in the other
two paragraphs, and we probably don't want to lose those details.

If you'd like, I can try rewriting these comments to my satisfaction
and you can reverse-review the result. Or you can rewrite them and
I'll re-review the result. But I think this needs to be a little less
mechanical. It's not just about shuffling all the comments around so
that all the text ends up somewhere -- we also need to consider the
degree to which the meaning becomes duplicated when it all gets merged
together.

--
Robert Haas
EDB: http://www.enterprisedb.com

#93Melanie Plageman
melanieplageman@gmail.com
In reply to: Robert Haas (#92)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Wed, Jan 24, 2024 at 2:59 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Jan 18, 2024 at 9:23 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

I have attached a rebased version of the former 0004 as v11-0001.

This looks correct to me, although I wouldn't mind some more eyes on
it. However, I think that the comments still need more work.

Specifically:

/*
* Prune, freeze, and count tuples.
*
* Accumulates details of remaining LP_DEAD line pointers on page in
* dead_items array.  This includes LP_DEAD line pointers that we
* pruned ourselves, as well as existing LP_DEAD line pointers that
* were pruned some time earlier.  Also considers freezing XIDs in the
* tuple headers of remaining items with storage. It also determines
* if truncating this block is safe.
*/
-        lazy_scan_prune(vacrel, buf, blkno, page,
-                        vmbuffer, all_visible_according_to_vm,
-                        &has_lpdead_items);
+        if (got_cleanup_lock)
+            lazy_scan_prune(vacrel, buf, blkno, page,
+                            vmbuffer, all_visible_according_to_vm,
+                            &has_lpdead_items);

I think this comment needs adjusting. Possibly, the preceding calls to
lazy_scan_noprune() and lazy_scan_new_or_empty() could even use a bit
better comments, but in those cases, you're basically keeping the same
code with the same comment, so it's kinda defensible. Here, though,
you're making the call conditional without any comment update.

/*
* Final steps for block: drop cleanup lock, record free space in the
* FSM.
*
* If we will likely do index vacuuming, wait until
* lazy_vacuum_heap_rel() to save free space. This doesn't just save
* us some cycles; it also allows us to record any additional free
* space that lazy_vacuum_heap_page() will make available in cases
* where it's possible to truncate the page's line pointer array.
*
+         * Our goal is to update the freespace map the last time we touch the
+         * page. If the relation has no indexes, or if index vacuuming is
+         * disabled, there will be no second heap pass; if this particular
+         * page has no dead items, the second heap pass will not touch this
+         * page. So, in those cases, update the FSM now.
+         *
* Note: It's not in fact 100% certain that we really will call
* lazy_vacuum_heap_rel() -- lazy_vacuum() might yet opt to skip index
* vacuuming (and so must skip heap vacuuming).  This is deemed okay
* because it only happens in emergencies, or when there is very
* little free space anyway. (Besides, we start recording free space
* in the FSM once index vacuuming has been abandoned.)
*/

I think this comment needs a rewrite, not just sticking the other
comment in the middle of it. There's some duplication between these
two comments, and merging it all together should iron that out.
Personally, I think my comment (which was there before, this commit
only moves it here) is clearer than what's already here about the
intent, but it's lacking some details that are captured in the other
two paragraphs, and we probably don't want to lose those details.

If you'd like, I can try rewriting these comments to my satisfaction
and you can reverse-review the result. Or you can rewrite them and
I'll re-review the result. But I think this needs to be a little less
mechanical. It's not just about shuffling all the comments around so
that all the text ends up somewhere -- we also need to consider the
degree to which the meaning becomes duplicated when it all gets merged
together.

I will take a stab at rewriting the comments myself first. Usually, I
try to avoid changing comments if the code isn't functionally
different because I know it adds additional review overhead and I try
to reduce that to an absolute minimum. However, I see what you are
saying and agree that it would be better to have actually good
comments instead of frankenstein comments made up of parts that were
previously considered acceptable. I'll have a new version ready by
tomorrow.

- Melanie

#94Melanie Plageman
melanieplageman@gmail.com
In reply to: Melanie Plageman (#93)
1 attachment(s)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Wed, Jan 24, 2024 at 4:34 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

On Wed, Jan 24, 2024 at 2:59 PM Robert Haas <robertmhaas@gmail.com> wrote:

...

If you'd like, I can try rewriting these comments to my satisfaction
and you can reverse-review the result. Or you can rewrite them and
I'll re-review the result. But I think this needs to be a little less
mechanical. It's not just about shuffling all the comments around so
that all the text ends up somewhere -- we also need to consider the
degree to which the meaning becomes duplicated when it all gets merged
together.

I will take a stab at rewriting the comments myself first. Usually, I
try to avoid changing comments if the code isn't functionally
different because I know it adds additional review overhead and I try
to reduce that to an absolute minimum. However, I see what you are
saying and agree that it would be better to have actually good
comments instead of frankenstein comments made up of parts that were
previously considered acceptable. I'll have a new version ready by
tomorrow.

v12 attached has my attempt at writing better comments for this
section of lazy_scan_heap().

Above the combined FSM update code, I have written a comment that is a
revised version of your comment from above the lazy_scan_noprune() FSM
update code but with some of the additional details from the previous
comment above the lazy_scan_prune() FSM update code.

The one part that I did not incorporate was the point about how
sometimes we think we'll do a second pass on the block so we don't
update the FSM but then we end up not doing it but it's all okay.

* Note: It's not in fact 100% certain that we really will call
* lazy_vacuum_heap_rel() -- lazy_vacuum() might yet opt to skip index
* vacuuming (and so must skip heap vacuuming). This is deemed okay
* because it only happens in emergencies, or when there is very
* little free space anyway. (Besides, we start recording free space
* in the FSM once index vacuuming has been abandoned.)

I didn't incorporate it because I wasn't sure I understood the
situation. I can imagine us skipping updating the FSM after
lazy_scan_prune() because there are indexes on the relation and dead
items on the page and we think we'll do a second pass. Later, we end
up triggering a failsafe vacuum or, somehow, there are still too few
TIDs for the second pass, so we update do_index_vacuuming to false.
Then we wouldn't ever record this block's free space in the FSM. That
seems fine (which is what the comment says). So, what does the last
sentence mean? "Besides, we start recording..."

- Melanie

Attachments:

v12-0001-Combine-FSM-updates-for-prune-and-no-prune-cases.patchtext/x-patch; charset=US-ASCII; name=v12-0001-Combine-FSM-updates-for-prune-and-no-prune-cases.patchDownload
From fabf79ed6f5866634a324f740e2eb0a5530b3286 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 16 Jan 2024 17:28:07 -0500
Subject: [PATCH v12] Combine FSM updates for prune and no prune cases

lazy_scan_prune() and lazy_scan_noprune() update the freespace map with
identical conditions -- both with the goal of doing so once for each
page vacuumed. Combine both of their FSM updates. This consolidation is
easier now that cb970240f13df2 moved visibility map updates into
lazy_scan_prune().

While combining the FSM updates, simplify the logic for calling
lazy_scan_new_or_empty() and lazy_scan_noprune().

Discussion: https://postgr.es/m/CA%2BTgmoaLTvipm%3Dxx4rJLr07m908PCu%3DQH3uCjD1UOn8YaEuO2g%40mail.gmail.com
---
 src/backend/access/heap/vacuumlazy.c | 133 +++++++++++----------------
 1 file changed, 52 insertions(+), 81 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 0fb3953513d..432db4698bc 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -838,6 +838,7 @@ lazy_scan_heap(LVRelState *vacrel)
 		Page		page;
 		bool		all_visible_according_to_vm;
 		bool		has_lpdead_items;
+		bool		got_cleanup_lock = false;
 
 		if (blkno == next_unskippable_block)
 		{
@@ -931,63 +932,41 @@ lazy_scan_heap(LVRelState *vacrel)
 		 */
 		visibilitymap_pin(vacrel->rel, blkno, &vmbuffer);
 
+		buf = ReadBufferExtended(vacrel->rel, MAIN_FORKNUM, blkno, RBM_NORMAL,
+								 vacrel->bstrategy);
+		page = BufferGetPage(buf);
+
 		/*
 		 * We need a buffer cleanup lock to prune HOT chains and defragment
 		 * the page in lazy_scan_prune.  But when it's not possible to acquire
 		 * a cleanup lock right away, we may be able to settle for reduced
 		 * processing using lazy_scan_noprune.
 		 */
-		buf = ReadBufferExtended(vacrel->rel, MAIN_FORKNUM, blkno, RBM_NORMAL,
-								 vacrel->bstrategy);
-		page = BufferGetPage(buf);
-		if (!ConditionalLockBufferForCleanup(buf))
-		{
-			LockBuffer(buf, BUFFER_LOCK_SHARE);
+		got_cleanup_lock = ConditionalLockBufferForCleanup(buf);
 
-			/* Check for new or empty pages before lazy_scan_noprune call */
-			if (lazy_scan_new_or_empty(vacrel, buf, blkno, page, true,
-									   vmbuffer))
-			{
-				/* Processed as new/empty page (lock and pin released) */
-				continue;
-			}
-
-			/*
-			 * Collect LP_DEAD items in dead_items array, count tuples,
-			 * determine if rel truncation is safe
-			 */
-			if (lazy_scan_noprune(vacrel, buf, blkno, page, &has_lpdead_items))
-			{
-				Size		freespace = 0;
-				bool		recordfreespace;
+		if (!got_cleanup_lock)
+			LockBuffer(buf, BUFFER_LOCK_SHARE);
 
-				/*
-				 * We processed the page successfully (without a cleanup
-				 * lock).
-				 *
-				 * Update the FSM, just as we would in the case where
-				 * lazy_scan_prune() is called. Our goal is to update the
-				 * freespace map the last time we touch the page. If the
-				 * relation has no indexes, or if index vacuuming is disabled,
-				 * there will be no second heap pass; if this particular page
-				 * has no dead items, the second heap pass will not touch this
-				 * page. So, in those cases, update the FSM now.
-				 *
-				 * After a call to lazy_scan_prune(), we would also try to
-				 * adjust the page-level all-visible bit and the visibility
-				 * map, but we skip that step in this path.
-				 */
-				recordfreespace = vacrel->nindexes == 0
-					|| !vacrel->do_index_vacuuming
-					|| !has_lpdead_items;
-				if (recordfreespace)
-					freespace = PageGetHeapFreeSpace(page);
-				UnlockReleaseBuffer(buf);
-				if (recordfreespace)
-					RecordPageWithFreeSpace(vacrel->rel, blkno, freespace);
-				continue;
-			}
+		/* Check for new or empty pages before lazy_scan_[no]prune call */
+		if (lazy_scan_new_or_empty(vacrel, buf, blkno, page, !got_cleanup_lock,
+								   vmbuffer))
+		{
+			/* Processed as new/empty page (lock and pin released) */
+			continue;
+		}
 
+		/*
+		 * If we didn't get the cleanup lock and the page is not new or empty,
+		 * we can still collect LP_DEAD items in the dead_items array for
+		 * later vacuuming, count live and recently dead tuples for vacuum
+		 * logging, and determine if this block could later be truncated. If
+		 * we encounter any xid/mxids that require advancing the
+		 * relfrozenxid/relminxid, we'll have to wait for a cleanup lock and
+		 * call lazy_scan_prune().
+		 */
+		if (!got_cleanup_lock &&
+			!lazy_scan_noprune(vacrel, buf, blkno, page, &has_lpdead_items))
+		{
 			/*
 			 * lazy_scan_noprune could not do all required processing.  Wait
 			 * for a cleanup lock, and call lazy_scan_prune in the usual way.
@@ -995,45 +974,36 @@ lazy_scan_heap(LVRelState *vacrel)
 			Assert(vacrel->aggressive);
 			LockBuffer(buf, BUFFER_LOCK_UNLOCK);
 			LockBufferForCleanup(buf);
-		}
-
-		/* Check for new or empty pages before lazy_scan_prune call */
-		if (lazy_scan_new_or_empty(vacrel, buf, blkno, page, false, vmbuffer))
-		{
-			/* Processed as new/empty page (lock and pin released) */
-			continue;
+			got_cleanup_lock = true;
 		}
 
 		/*
-		 * Prune, freeze, and count tuples.
+		 * If we got a cleanup lock, we can prune and freeze tuples and
+		 * defragment the page. If we didn't get a cleanup lock, we will still
+		 * consider whether or not to update the FSM.
 		 *
-		 * Accumulates details of remaining LP_DEAD line pointers on page in
-		 * dead_items array.  This includes LP_DEAD line pointers that we
-		 * pruned ourselves, as well as existing LP_DEAD line pointers that
-		 * were pruned some time earlier.  Also considers freezing XIDs in the
-		 * tuple headers of remaining items with storage. It also determines
-		 * if truncating this block is safe.
+		 * Like lazy_scan_noprune(), lazy_scan_prune() will count
+		 * recently_dead_tuples and live tuples for vacuum logging, determine
+		 * if the block can later be truncated, and accumulate the details of
+		 * remaining LP_DEAD line pointers on the page in the dead_items
+		 * array. These dead items include those pruned by lazy_scan_prune()
+		 * as well we line pointers previously marked LP_DEAD.
 		 */
-		lazy_scan_prune(vacrel, buf, blkno, page,
-						vmbuffer, all_visible_according_to_vm,
-						&has_lpdead_items);
+		if (got_cleanup_lock)
+			lazy_scan_prune(vacrel, buf, blkno, page,
+							vmbuffer, all_visible_according_to_vm,
+							&has_lpdead_items);
 
 		/*
-		 * Final steps for block: drop cleanup lock, record free space in the
-		 * FSM.
-		 *
-		 * If we will likely do index vacuuming, wait until
-		 * lazy_vacuum_heap_rel() to save free space. This doesn't just save
-		 * us some cycles; it also allows us to record any additional free
-		 * space that lazy_vacuum_heap_page() will make available in cases
-		 * where it's possible to truncate the page's line pointer array.
+		 * Now drop the buffer lock and, potentially, update the FSM.
 		 *
-		 * Note: It's not in fact 100% certain that we really will call
-		 * lazy_vacuum_heap_rel() -- lazy_vacuum() might yet opt to skip index
-		 * vacuuming (and so must skip heap vacuuming).  This is deemed okay
-		 * because it only happens in emergencies, or when there is very
-		 * little free space anyway. (Besides, we start recording free space
-		 * in the FSM once index vacuuming has been abandoned.)
+		 * Our goal is to update the freespace map the last time we touch the
+		 * page. If we'll process a block in the second pass, we may free up
+		 * additional space on the page, so it is better to update the FSM
+		 * after the second pass. If the relation has no indexes, or if index
+		 * vacuuming is disabled, there will be no second heap pass; if this
+		 * particular page has no dead items, the second heap pass will not
+		 * touch this page. So, in those cases, update the FSM now.
 		 */
 		if (vacrel->nindexes == 0
 			|| !vacrel->do_index_vacuuming
@@ -1047,9 +1017,10 @@ lazy_scan_heap(LVRelState *vacrel)
 			/*
 			 * Periodically perform FSM vacuuming to make newly-freed space
 			 * visible on upper FSM pages. This is done after vacuuming if the
-			 * table has indexes.
+			 * table has indexes. There will only be newly-freed space if we
+			 * held the cleanup lock and lazy_scan_prune() was called.
 			 */
-			if (vacrel->nindexes == 0 && has_lpdead_items &&
+			if (got_cleanup_lock && vacrel->nindexes == 0 && has_lpdead_items &&
 				blkno - next_fsm_block_to_vacuum >= VACUUM_FSM_EVERY_PAGES)
 			{
 				FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum,
-- 
2.37.2

#95Peter Geoghegan
pg@bowt.ie
In reply to: Melanie Plageman (#94)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Wed, Jan 24, 2024 at 9:13 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

I didn't incorporate it because I wasn't sure I understood the
situation. I can imagine us skipping updating the FSM after
lazy_scan_prune() because there are indexes on the relation and dead
items on the page and we think we'll do a second pass. Later, we end
up triggering a failsafe vacuum or, somehow, there are still too few
TIDs for the second pass, so we update do_index_vacuuming to false.
Then we wouldn't ever record this block's free space in the FSM. That
seems fine (which is what the comment says). So, what does the last
sentence mean? "Besides, we start recording..."

It means: when the failsafe kicks in, from that point on we won't do
any more heap vacuuming. Clearly any pages that still need to be
scanned at that point won't ever be processed by
lazy_vacuum_heap_rel(). So from that point on we should record the
free space in every scanned heap page in the "first heap pass" --
including pages that have LP_DEAD stubs that aren't going to be made
LP_UNUSED in the ongoing VACUUM.

--
Peter Geoghegan

#96Robert Haas
robertmhaas@gmail.com
In reply to: Melanie Plageman (#94)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Wed, Jan 24, 2024 at 9:13 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

v12 attached has my attempt at writing better comments for this
section of lazy_scan_heap().

+ /*
+ * If we didn't get the cleanup lock and the page is not new or empty,
+ * we can still collect LP_DEAD items in the dead_items array for
+ * later vacuuming, count live and recently dead tuples for vacuum
+ * logging, and determine if this block could later be truncated. If
+ * we encounter any xid/mxids that require advancing the
+ * relfrozenxid/relminxid, we'll have to wait for a cleanup lock and
+ * call lazy_scan_prune().
+ */

I like this comment. I would probably drop "and the page is not new or
empty" from it since that's really speaking to the previous bit of
code, but it wouldn't be the end of the world to keep it, either.

  /*
- * Prune, freeze, and count tuples.
+ * If we got a cleanup lock, we can prune and freeze tuples and
+ * defragment the page. If we didn't get a cleanup lock, we will still
+ * consider whether or not to update the FSM.
  *
- * Accumulates details of remaining LP_DEAD line pointers on page in
- * dead_items array.  This includes LP_DEAD line pointers that we
- * pruned ourselves, as well as existing LP_DEAD line pointers that
- * were pruned some time earlier.  Also considers freezing XIDs in the
- * tuple headers of remaining items with storage. It also determines
- * if truncating this block is safe.
+ * Like lazy_scan_noprune(), lazy_scan_prune() will count
+ * recently_dead_tuples and live tuples for vacuum logging, determine
+ * if the block can later be truncated, and accumulate the details of
+ * remaining LP_DEAD line pointers on the page in the dead_items
+ * array. These dead items include those pruned by lazy_scan_prune()
+ * as well as line pointers previously marked LP_DEAD.
  */

To me, the first paragraph of this one misses the mark. What I thought
we should be saying here was something like "If we don't have a
cleanup lock, the code above has already processed this page to the
extent that is possible. Otherwise, we either got the cleanup lock
initially and have not processed the page yet, or we didn't get it
initially, attempted to process it without the cleanup lock, and
decided we needed one after all. Either way, if we now have the lock,
we must prune, freeze, and count tuples."

The second paragraph seems fine.

- * Note: It's not in fact 100% certain that we really will call
- * lazy_vacuum_heap_rel() -- lazy_vacuum() might yet opt to skip index
- * vacuuming (and so must skip heap vacuuming). This is deemed okay
- * because it only happens in emergencies, or when there is very
- * little free space anyway. (Besides, we start recording free space
- * in the FSM once index vacuuming has been abandoned.)

Here's a suggestion from me:

Note: In corner cases, it's possible to miss updating the FSM
entirely. If index vacuuming is currently enabled, we'll skip the FSM
update now. But if failsafe mode is later activated, disabling index
vacuuming, there will also be no opportunity to update the FSM later,
because we'll never revisit this page. Since updating the FSM is
desirable but not absolutely required, that's OK.

I think this expresses the same sentiment as the current comment, but
IMHO more clearly. The one part of the current comment that I don't
understand at all is the remark about "when there is very little
freespace anyway". I get that if the failsafe activates we won't come
back to the page, which is the "only happens in emergencies" part of
the existing comment. But the current phrasing makes it sound like
there is a second case where it can happen -- "when there is very
little free space anyway" -- and I don't know what that is talking
about. If it's important, we should try to make it clearer.

We could also just decide to keep this entire paragraph as it is for
purposes of the present patch. The part I really thought needed
adjusting was "Prune, freeze, and count tuples."

--
Robert Haas
EDB: http://www.enterprisedb.com

#97Melanie Plageman
melanieplageman@gmail.com
In reply to: Robert Haas (#96)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Thu, Jan 25, 2024 at 8:57 AM Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Jan 24, 2024 at 9:13 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

v12 attached has my attempt at writing better comments for this
section of lazy_scan_heap().

+ /*
+ * If we didn't get the cleanup lock and the page is not new or empty,
+ * we can still collect LP_DEAD items in the dead_items array for
+ * later vacuuming, count live and recently dead tuples for vacuum
+ * logging, and determine if this block could later be truncated. If
+ * we encounter any xid/mxids that require advancing the
+ * relfrozenxid/relminxid, we'll have to wait for a cleanup lock and
+ * call lazy_scan_prune().
+ */

I like this comment. I would probably drop "and the page is not new or
empty" from it since that's really speaking to the previous bit of
code, but it wouldn't be the end of the world to keep it, either.

Yes, probably best to get rid of the part about new or empty.

/*
- * Prune, freeze, and count tuples.
+ * If we got a cleanup lock, we can prune and freeze tuples and
+ * defragment the page. If we didn't get a cleanup lock, we will still
+ * consider whether or not to update the FSM.
*
- * Accumulates details of remaining LP_DEAD line pointers on page in
- * dead_items array.  This includes LP_DEAD line pointers that we
- * pruned ourselves, as well as existing LP_DEAD line pointers that
- * were pruned some time earlier.  Also considers freezing XIDs in the
- * tuple headers of remaining items with storage. It also determines
- * if truncating this block is safe.
+ * Like lazy_scan_noprune(), lazy_scan_prune() will count
+ * recently_dead_tuples and live tuples for vacuum logging, determine
+ * if the block can later be truncated, and accumulate the details of
+ * remaining LP_DEAD line pointers on the page in the dead_items
+ * array. These dead items include those pruned by lazy_scan_prune()
+ * as well as line pointers previously marked LP_DEAD.
*/

To me, the first paragraph of this one misses the mark. What I thought
we should be saying here was something like "If we don't have a
cleanup lock, the code above has already processed this page to the
extent that is possible. Otherwise, we either got the cleanup lock
initially and have not processed the page yet, or we didn't get it
initially, attempted to process it without the cleanup lock, and
decided we needed one after all. Either way, if we now have the lock,
we must prune, freeze, and count tuples."

I see. Your suggestion makes sense. The sentence starting with
"Otherwise" is a bit long. I started to lose the thread at "decided we
needed one after all". You previously referred to the cleanup lock as
"it" -- once you start referring to it as "one", I as the future
developer am no longer sure we are talking about the cleanup lock (as
opposed to the page or something else).

- * Note: It's not in fact 100% certain that we really will call
- * lazy_vacuum_heap_rel() -- lazy_vacuum() might yet opt to skip index
- * vacuuming (and so must skip heap vacuuming). This is deemed okay
- * because it only happens in emergencies, or when there is very
- * little free space anyway. (Besides, we start recording free space
- * in the FSM once index vacuuming has been abandoned.)

Here's a suggestion from me:

Note: In corner cases, it's possible to miss updating the FSM
entirely. If index vacuuming is currently enabled, we'll skip the FSM
update now. But if failsafe mode is later activated, disabling index
vacuuming, there will also be no opportunity to update the FSM later,
because we'll never revisit this page. Since updating the FSM is
desirable but not absolutely required, that's OK.

I think this expresses the same sentiment as the current comment, but
IMHO more clearly. The one part of the current comment that I don't
understand at all is the remark about "when there is very little
freespace anyway". I get that if the failsafe activates we won't come
back to the page, which is the "only happens in emergencies" part of
the existing comment. But the current phrasing makes it sound like
there is a second case where it can happen -- "when there is very
little free space anyway" -- and I don't know what that is talking
about. If it's important, we should try to make it clearer.

We could also just decide to keep this entire paragraph as it is for
purposes of the present patch. The part I really thought needed
adjusting was "Prune, freeze, and count tuples."

I think it would be nice to clarify this comment. I think the "when
there is little free space anyway" is referring to the case in
lazy_vacuum() where we set do_index_vacuuming to false because "there
are almost zero TIDs". I initially thought it was saying that in the
failsafe vacuum case the pages whose free space we wouldn't record in
the FSM have little free space anyway -- which I didn't get. But then
I looked at where we set do_index_vacuuming to false.
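
(Roughly, the check I'm referring to looks like this -- paraphrased,
and omitting the additional cap on dead_items memory, so don't take
it as the exact source:)

    /*
     * lazy_vacuum(), paraphrased: if only a tiny fraction of heap
     * pages have LP_DEAD items, bypass index vacuuming -- and with it
     * the second heap pass -- for this VACUUM.
     */
    if (vacrel->consider_bypass_optimization &&
        vacrel->lpdead_item_pages < vacrel->rel_pages * BYPASS_THRESHOLD_PAGES)
        vacrel->do_index_vacuuming = false;     /* "almost zero TIDs" */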

As for the last sentence starting with "Besides", even with Peter's
explanation I still am not sure what it should say. There are blocks
whose free space we don't record in the first heap pass. Then, due to
skipping index vacuuming and the second heap pass, we also don't
record their free space in the second heap pass. I think he is saying
that once we set do_index_vacuuming to false, we will stop skipping
updating the FSM after the first pass for future blocks. So, future
blocks will have their free space recorded in the FSM. But that feels
self-evident. The more salient point is that there are some blocks
whose free space is not recorded (those whose first pass happened
before unsetting do_index_vacuuming and whose second pass did not
happen before do_index_vacuuming is unset). The extra sentence made me
think there was some way we might go back and record free space for
those blocks, but that is not true.

- Melanie

#98Robert Haas
robertmhaas@gmail.com
In reply to: Melanie Plageman (#97)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Thu, Jan 25, 2024 at 9:18 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:

To me, the first paragraph of this one misses the mark. What I thought
we should be saying here was something like "If we don't have a
cleanup lock, the code above has already processed this page to the
extent that is possible. Otherwise, we either got the cleanup lock
initially and have not processed the page yet, or we didn't get it
initially, attempted to process it without the cleanup lock, and
decided we needed one after all. Either way, if we now have the lock,
we must prune, freeze, and count tuples."

I see. Your suggestion makes sense. The sentence starting with
"Otherwise" is a bit long. I started to lose the thread at "decided we
needed one after all". You previously referred to the cleanup lock as
"it" -- once you start referring to it as "one", I as the future
developer am no longer sure we are talking about the cleanup lock (as
opposed to the page or something else).

Ok... trying again:

If we have a cleanup lock, we must now prune, freeze, and count
tuples. We may have acquired the cleanup lock originally, or we may
have gone back and acquired it after lazy_scan_noprune() returned
false. Either way, the page hasn't been processed yet.
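
In outline, the flow I mean is something like this (a simplified
sketch, not verbatim from the patch; I'm assuming the usual
buffer-lock calls and eliding the other arguments):

    got_cleanup_lock = ConditionalLockBufferForCleanup(buf);
    if (!got_cleanup_lock)
    {
        /* Settle for a shared lock and do what we can without pruning */
        LockBuffer(buf, BUFFER_LOCK_SHARE);
        if (!lazy_scan_noprune(...))
        {
            /* Page needs full processing after all: wait for cleanup lock */
            LockBuffer(buf, BUFFER_LOCK_UNLOCK);
            LockBufferForCleanup(buf);
            got_cleanup_lock = true;
        }
    }

    if (got_cleanup_lock)
        lazy_scan_prune(...);       /* prune, freeze, and count tuples */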

I think it would be nice to clarify this comment. I think the "when
there is little free space anyway" is referring to the case in
lazy_vacuum() where we set do_index_vacuuming to false because "there
are almost zero TIDs". I initially thought it was saying that in the
failsafe vacuum case the pages whose free space we wouldn't record in
the FSM have little free space anyway -- which I didn't get. But then
I looked at where we set do_index_vacuuming to false.

Oh... wow. That's kind of confusing; somehow I was thinking we were
talking about free space on the disk, rather than newly free space in
pages that could be added to the FSM. And it seems really questionable
whether that case is OK. I mean, in the emergency case, fine,
whatever, we should do whatever it takes to get the system back up,
and it should barely ever happen on a well-configured system. But this
case could happen regularly, and losing track of free space could
easily cause bloat.

This might be another argument for moving FSM updates to the first
heap pass, but that's a separate task from fixing the comment.

As for the last sentence starting with "Besides", even with Peter's
explanation I still am not sure what it should say. There are blocks
whose free space we don't record in the first heap pass. Then, due to
skipping index vacuuming and the second heap pass, we also don't
record their free space in the second heap pass. I think he is saying
that once we set do_index_vacuuming to false, we will stop skipping
updating the FSM after the first pass for future blocks. So, future
blocks will have their free space recorded in the FSM. But that feels
self-evident.

Yes, I don't think that necessarily needs to be mentioned here.

The more salient point is that there are some blocks
whose free space is not recorded (those whose first pass happened
before unsetting do_index_vacuuming and whose second pass did not
happen before do_index_vacuuming is unset). The extra sentence made me
think there was some way we might go back and record free space for
those blocks, but that is not true.

I don't really see why that sentence made you think that, but it's not
important. I agree with you about what point we need to emphasize
here.

--
Robert Haas
EDB: http://www.enterprisedb.com

#99Melanie Plageman
melanieplageman@gmail.com
In reply to: Robert Haas (#98)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Thu, Jan 25, 2024 at 10:19 AM Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Jan 25, 2024 at 9:18 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:

To me, the first paragraph of this one misses the mark. What I thought
we should be saying here was something like "If we don't have a
cleanup lock, the code above has already processed this page to the
extent that is possible. Otherwise, we either got the cleanup lock
initially and have not processed the page yet, or we didn't get it
initially, attempted to process it without the cleanup lock, and
decided we needed one after all. Either way, if we now have the lock,
we must prune, freeze, and count tuples."

I see. Your suggestion makes sense. The sentence starting with
"Otherwise" is a bit long. I started to lose the thread at "decided we
needed one after all". You previously referred to the cleanup lock as
"it" -- once you start referring to it as "one", I as the future
developer am no longer sure we are talking about the cleanup lock (as
opposed to the page or something else).

Ok... trying again:

If we have a cleanup lock, we must now prune, freeze, and count
tuples. We may have acquired the cleanup lock originally, or we may
have gone back and acquired it after lazy_scan_noprune() returned
false. Either way, the page hasn't been processed yet.

Cool. I might add "successfully" or "fully" to "Either way, the page
hasn't been processed yet"

I think it would be nice to clarify this comment. I think the "when
there is little free space anyway" is referring to the case in
lazy_vacuum() where we set do_index_vacuuming to false because "there
are almost zero TIDs". I initially thought it was saying that in the
failsafe vacuum case the pages whose free space we wouldn't record in
the FSM have little free space anyway -- which I didn't get. But then
I looked at where we set do_index_vacuuming to false.

Oh... wow. That's kind of confusing; somehow I was thinking we were
talking about free space on the disk, rather than newly free space in
pages that could be added to the FSM.

Perhaps I misunderstood. I interpreted it to refer to the bypass optimization.

And it seems really questionable
whether that case is OK. I mean, in the emergency case, fine,
whatever, we should do whatever it takes to get the system back up,
and it should barely ever happen on a well-configured system. But this
case could happen regularly, and losing track of free space could
easily cause bloat.

This might be another argument for moving FSM updates to the first
heap pass, but that's a separate task from fixing the comment.

Yes, it seems we could miss recording space freed in the first pass if
we never end up doing a second pass. consider_bypass_optimization is
set to false only if index cleanup is explicitly enabled or there are
dead items accumulated for vacuum's second pass at some point.

- Melanie

#100Robert Haas
robertmhaas@gmail.com
In reply to: Melanie Plageman (#99)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Thu, Jan 25, 2024 at 11:19 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:

Cool. I might add "successfully" or "fully" to "Either way, the page
hasn't been processed yet"

I'm OK with that.

I think it would be nice to clarify this comment. I think the "when
there is little free space anyway" is referring to the case in
lazy_vacuum() where we set do_index_vacuuming to false because "there
are almost zero TIDs". I initially thought it was saying that in the
failsafe vacuum case the pages whose free space we wouldn't record in
the FSM have little free space anyway -- which I didn't get. But then
I looked at where we set do_index_vacuuming to false.

Oh... wow. That's kind of confusing; somehow I was thinking we were
talking about free space on the disk, rather than newly free space in
pages that could be added to the FSM.

Perhaps I misunderstood. I interpreted it to refer to the bypass optimization.

I think you're probably correct. I just didn't realize what was meant.

--
Robert Haas
EDB: http://www.enterprisedb.com

#101Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#100)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Thu, Jan 25, 2024 at 12:25 PM Robert Haas <robertmhaas@gmail.com> wrote:

I think you're probably correct. I just didn't realize what was meant.

I tweaked your v12 based on this discussion and committed the result.

Thanks to you for the patches, and to Peter for participating in the
discussion which, IMHO, was very helpful in clarifying things.

--
Robert Haas
EDB: http://www.enterprisedb.com

#102Melanie Plageman
melanieplageman@gmail.com
In reply to: Robert Haas (#101)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Fri, Jan 26, 2024 at 11:44 AM Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Jan 25, 2024 at 12:25 PM Robert Haas <robertmhaas@gmail.com> wrote:

I think you're probably correct. I just didn't realize what was meant.

I tweaked your v12 based on this discussion and committed the result.

Thanks to you for the patches, and to Peter for participating in the
discussion which, IMHO, was very helpful in clarifying things.

Thanks! I've marked the CF entry as committed.

- Melanie

#103Peter Geoghegan
pg@bowt.ie
In reply to: Robert Haas (#101)
Re: Emit fewer vacuum records by reaping removable tuples during pruning

On Fri, Jan 26, 2024 at 11:44 AM Robert Haas <robertmhaas@gmail.com> wrote:

Thanks to you for the patches, and to Peter for participating in the
discussion which, IMHO, was very helpful in clarifying things.

Glad I could help.

--
Peter Geoghegan