Rename dead_tuples to dead_items in vacuumlazy.c

Started by Peter Geoghegan about 4 years ago. 16 messages
1 attachment(s)

Attached patch performs polishing within vacuumlazy.c, as follow-up
work to the refactoring work in Postgres 14. This mainly consists of
changing references of dead tuples to dead items, which reflects the
fact that VACUUM no longer deals with TIDs that might point to
remaining heap tuples with storage -- the TIDs in the array must now
strictly point to LP_DEAD stub line pointers that remain in the heap,
following pruning.
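
Concretely, that is the invariant that the existing assertion in
lazy_vacuum_heap_page() checks for every TID it processes during the
second pass over the heap:

	Assert(ItemIdIsDead(itemid) && !ItemIdHasStorage(itemid));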

I've also simplified header comments, and comments above the main
entry point functions. These comments made much more sense back when
lazy_scan_heap() was simpler, but wasn't yet broken up into smaller,
better-scoped functions.

If there are no objections, I'll move on this soon. It's mostly just
mechanical changes.

--
Peter Geoghegan

Attachments:

v1-0001-vacuumlazy.c-Rename-dead_tuples-to-dead_items.patch (application/octet-stream)
From 55103a788fda6cc69b1452882bae9aefb510bfcc Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Tue, 23 Nov 2021 20:11:30 -0800
Subject: [PATCH v1] vacuumlazy.c: Rename dead_tuples to dead_items.

---
 src/include/commands/progress.h      |   4 +-
 src/backend/access/heap/vacuumlazy.c | 304 ++++++++++++---------------
 2 files changed, 142 insertions(+), 166 deletions(-)

diff --git a/src/include/commands/progress.h b/src/include/commands/progress.h
index d7bf16368..b5358e5a0 100644
--- a/src/include/commands/progress.h
+++ b/src/include/commands/progress.h
@@ -23,8 +23,8 @@
 #define PROGRESS_VACUUM_HEAP_BLKS_SCANNED		2
 #define PROGRESS_VACUUM_HEAP_BLKS_VACUUMED		3
 #define PROGRESS_VACUUM_NUM_INDEX_VACUUMS		4
-#define PROGRESS_VACUUM_MAX_DEAD_TUPLES			5
-#define PROGRESS_VACUUM_NUM_DEAD_TUPLES			6
+#define PROGRESS_VACUUM_MAX_DEAD_ITEMS			5
+#define PROGRESS_VACUUM_NUM_DEAD_ITEMS			6
 
 /* Phases of vacuum (as advertised via PROGRESS_VACUUM_PHASE) */
 #define PROGRESS_VACUUM_PHASE_SCAN_HEAP			1
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 88b9d1f41..95d328678 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -3,40 +3,19 @@
  * vacuumlazy.c
  *	  Concurrent ("lazy") vacuuming.
  *
- *
- * The major space usage for LAZY VACUUM is storage for the array of dead tuple
- * TIDs.  We want to ensure we can vacuum even the very largest relations with
- * finite memory space usage.  To do that, we set upper bounds on the number of
- * tuples we will keep track of at once.
+ * The major space usage for LAZY VACUUM is storage for the array of dead TIDs
+ * that are to be removed from indexes.  We want to ensure we can vacuum even
+ * the very largest relations with finite memory space usage.  To do that, we
+ * set upper bounds on the number of tuples we will keep track of at once.
  *
  * We are willing to use at most maintenance_work_mem (or perhaps
- * autovacuum_work_mem) memory space to keep track of dead tuples.  We
- * initially allocate an array of TIDs of that size, with an upper limit that
- * depends on table size (this limit ensures we don't allocate a huge area
- * uselessly for vacuuming small tables).  If the array threatens to overflow,
- * we suspend the heap scan phase and perform a pass of index cleanup and page
+ * autovacuum_work_mem) memory space to keep track of dead TIDs.  We initially
+ * allocate an array of TIDs of that size, with an upper limit that depends on
+ * table size (this limit ensures we don't allocate a huge area uselessly for
+ * vacuuming small tables).  If the array threatens to overflow, we suspend
+ * the heap scan phase and perform a pass of index cleanup and page
  * compaction, then resume the heap scan with an empty TID array.
  *
- * If we're processing a table with no indexes, we can just vacuum each page
- * as we go; there's no need to save up multiple tuples to minimize the number
- * of index scans performed.  So we don't use maintenance_work_mem memory for
- * the TID array, just enough to hold as many heap tuples as fit on one page.
- *
- * Lazy vacuum supports parallel execution with parallel worker processes.  In
- * a parallel vacuum, we perform both index vacuum and index cleanup with
- * parallel worker processes.  Individual indexes are processed by one vacuum
- * process.  At the beginning of a lazy vacuum (at lazy_scan_heap) we prepare
- * the parallel context and initialize the DSM segment that contains shared
- * information as well as the memory space for storing dead tuples.  When
- * starting either index vacuum or index cleanup, we launch parallel worker
- * processes.  Once all indexes are processed the parallel worker processes
- * exit.  After that, the leader process re-initializes the parallel context
- * so that it can use the same DSM for multiple passes of index vacuum and
- * for performing index cleanup.  For updating the index statistics, we need
- * to update the system table and since updates are not allowed during
- * parallel mode we update the index statistics after exiting from the
- * parallel mode.
- *
  * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
  *
@@ -124,13 +103,6 @@
 #define VACUUM_FSM_EVERY_PAGES \
 	((BlockNumber) (((uint64) 8 * 1024 * 1024 * 1024) / BLCKSZ))
 
-/*
- * Guesstimation of number of dead tuples per page.  This is used to
- * provide an upper limit to memory allocated when vacuuming small
- * tables.
- */
-#define LAZY_ALLOC_TUPLES		MaxHeapTuplesPerPage
-
 /*
  * Before we consider skipping a page that's marked as clean in
  * visibility map, we must've seen at least this many clean pages.
@@ -149,7 +121,7 @@
  * use small integers.
  */
 #define PARALLEL_VACUUM_KEY_SHARED			1
-#define PARALLEL_VACUUM_KEY_DEAD_TUPLES		2
+#define PARALLEL_VACUUM_KEY_DEAD_ITEMS		2
 #define PARALLEL_VACUUM_KEY_QUERY_TEXT		3
 #define PARALLEL_VACUUM_KEY_BUFFER_USAGE	4
 #define PARALLEL_VACUUM_KEY_WAL_USAGE		5
@@ -172,26 +144,22 @@ typedef enum
 } VacErrPhase;
 
 /*
- * LVDeadTuples stores the dead tuple TIDs collected during the heap scan.
- * This is allocated in the DSM segment in parallel mode and in local memory
- * in non-parallel mode.
+ * LVDeadItems is an array of TIDs, each of which points to an LP_DEAD
+ * item in a heap page
  */
-typedef struct LVDeadTuples
+typedef struct LVDeadItems
 {
-	int			max_tuples;		/* # slots allocated in array */
-	int			num_tuples;		/* current # of entries */
-	/* List of TIDs of tuples we intend to delete */
-	/* NB: this list is ordered by TID address */
-	ItemPointerData itemptrs[FLEXIBLE_ARRAY_MEMBER];	/* array of
-														 * ItemPointerData */
-} LVDeadTuples;
+	int			maxitems;		/* # slots allocated in array */
+	int			nitems;			/* current # of entries */
+	/* Sorted list of TIDs to delete from indexes */
+	ItemPointerData dead[FLEXIBLE_ARRAY_MEMBER];
+} LVDeadItems;
 
-/* The dead tuple space consists of LVDeadTuples and dead tuple TIDs */
-#define SizeOfDeadTuples(cnt) \
-	add_size(offsetof(LVDeadTuples, itemptrs), \
+#define SizeOfDeadItems(cnt) \
+	add_size(offsetof(LVDeadItems, dead), \
 			 mul_size(sizeof(ItemPointerData), cnt))
-#define MAXDEADTUPLES(max_size) \
-		(((max_size) - offsetof(LVDeadTuples, itemptrs)) / sizeof(ItemPointerData))
+#define MAXDEADITEMS(max_size) \
+		(((max_size) - offsetof(LVDeadItems, dead)) / sizeof(ItemPointerData))
 
 /*
  * Shared information among parallel workers.  So this is allocated in the DSM
@@ -345,7 +313,7 @@ typedef struct LVRelState
 	/*
 	 * State managed by lazy_scan_heap() follows
 	 */
-	LVDeadTuples *dead_tuples;	/* items to vacuum from indexes */
+	LVDeadItems *dead_items;	/* dead TIDs to vacuum from indexes */
 	BlockNumber rel_pages;		/* total number of pages */
 	BlockNumber scanned_pages;	/* number of pages we examined */
 	BlockNumber pinskipped_pages;	/* # of pages skipped due to a pin */
@@ -442,7 +410,7 @@ static bool should_attempt_truncation(LVRelState *vacrel);
 static void lazy_truncate_heap(LVRelState *vacrel);
 static BlockNumber count_nondeletable_pages(LVRelState *vacrel,
 											bool *lock_waiter_detected);
-static long compute_max_dead_tuples(BlockNumber relblocks, bool hasindex);
+static long compute_max_dead_items(BlockNumber relblocks, bool hasindex);
 static void lazy_space_alloc(LVRelState *vacrel, int nworkers,
 							 BlockNumber relblocks);
 static void lazy_space_free(LVRelState *vacrel);
@@ -472,8 +440,9 @@ static void restore_vacuum_error_info(LVRelState *vacrel,
 /*
  *	heap_vacuum_rel() -- perform VACUUM for one heap relation
  *
- *		This routine vacuums a single heap, cleans out its indexes, and
- *		updates its relpages and reltuples statistics.
+ *		This routine sets things up for and then calls lazy_scan_heap, which
+ *		is where the real work of vacuuming takes place.  Finalizes everything
+ *		after call returns by managing update to pg_class.
  *
  *		At entry, we have already established a transaction and opened
  *		and locked the relation.
@@ -631,7 +600,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 	errcallback.previous = error_context_stack;
 	error_context_stack = &errcallback;
 
-	/* Do the vacuuming */
+	/* Do the real work of vacuuming */
 	lazy_scan_heap(vacrel, params, aggressive);
 
 	/* Done with indexes */
@@ -714,8 +683,8 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 	 *
 	 * Deliberately avoid telling the stats collector about LP_DEAD items that
 	 * remain in the table due to VACUUM bypassing index and heap vacuuming.
-	 * ANALYZE will consider the remaining LP_DEAD items to be dead tuples. It
-	 * seems like a good idea to err on the side of not vacuuming again too
+	 * ANALYZE will consider the remaining LP_DEAD items to be dead "tuples".
+	 * It seems like a good idea to err on the side of not vacuuming again too
 	 * soon in cases where the failsafe prevented significant amounts of heap
 	 * vacuuming.
 	 */
@@ -875,25 +844,31 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 }
 
 /*
- *	lazy_scan_heap() -- scan an open heap relation
+ *	lazy_scan_heap() -- workhorse function for VACUUM
  *
- *		This routine prunes each page in the heap, which will among other
- *		things truncate dead tuples to dead line pointers, defragment the
- *		page, and set commit status bits (see heap_page_prune).  It also builds
- *		lists of dead tuples and pages with free space, calculates statistics
- *		on the number of live tuples in the heap, and marks pages as
- *		all-visible if appropriate.  When done, or when we run low on space
- *		for dead-tuple TIDs, invoke lazy_vacuum to vacuum indexes and vacuum
- *		heap relation during its own second pass over the heap.
+ *		This routine prunes each page in the heap, and considers the need to
+ *		freeze remaining tuples with storage (not including pages that can be
+ *		skipped using the visibility map).  This takes place during an initial
+ *		first pass over the heap.
  *
- *		If there are no indexes then we can reclaim line pointers on the fly;
- *		dead line pointers need only be retained until all index pointers that
- *		reference them have been killed.
+ *		Also invokes lazy_vacuum_all_indexes to vacuum indexes, which is
+ *		largely a matter of deleting index tuples that point to LP_DEAD items
+ *		left in heap pages following pruning.  First pass over the heap
+ *		collects these in the dead_items array.
+ *
+ *		Finally, invokes lazy_vacuum_heap_rel to vacuum heap pages, which
+ *		consists of marking LP_DEAD items LP_UNUSED.  This has to happen in a
+ *		second, final pass over the heap, to preserve a basic invariant that
+ *		all index AMs rely on.  No extant index tuple can ever be allowed to
+ *		contain a TID that points to a line pointer that becomes LP_UNUSED.
+ *		(Actually, it's safe to do everything in one single pass over the heap
+ *		when there happen to be no indexes -- no danger from line pointers
+ *		being recycled for unrelated logical rows when there are no indexes.)
  */
 static void
 lazy_scan_heap(LVRelState *vacrel, VacuumParams *params, bool aggressive)
 {
-	LVDeadTuples *dead_tuples;
+	LVDeadItems *dead_items;
 	BlockNumber nblocks,
 				blkno,
 				next_unskippable_block,
@@ -906,7 +881,7 @@ lazy_scan_heap(LVRelState *vacrel, VacuumParams *params, bool aggressive)
 	const int	initprog_index[] = {
 		PROGRESS_VACUUM_PHASE,
 		PROGRESS_VACUUM_TOTAL_HEAP_BLKS,
-		PROGRESS_VACUUM_MAX_DEAD_TUPLES
+		PROGRESS_VACUUM_MAX_DEAD_ITEMS
 	};
 	int64		initprog_val[3];
 	GlobalVisState *vistest;
@@ -959,15 +934,15 @@ lazy_scan_heap(LVRelState *vacrel, VacuumParams *params, bool aggressive)
 	/*
 	 * Allocate the space for dead tuples.  Note that this handles parallel
 	 * VACUUM initialization as part of allocating shared memory space used
-	 * for dead_tuples.
+	 * for dead_items.
 	 */
 	lazy_space_alloc(vacrel, params->nworkers, nblocks);
-	dead_tuples = vacrel->dead_tuples;
+	dead_items = vacrel->dead_items;
 
 	/* Report that we're scanning the heap, advertising total # of blocks */
 	initprog_val[0] = PROGRESS_VACUUM_PHASE_SCAN_HEAP;
 	initprog_val[1] = nblocks;
-	initprog_val[2] = dead_tuples->max_tuples;
+	initprog_val[2] = dead_items->maxitems;
 	pgstat_progress_update_multi_param(3, initprog_index, initprog_val);
 
 	/*
@@ -1158,8 +1133,8 @@ lazy_scan_heap(LVRelState *vacrel, VacuumParams *params, bool aggressive)
 		 * dead-tuple TIDs, pause and do a cycle of vacuuming before we tackle
 		 * this page.
 		 */
-		if ((dead_tuples->max_tuples - dead_tuples->num_tuples) < MaxHeapTuplesPerPage &&
-			dead_tuples->num_tuples > 0)
+		if (dead_items->maxitems - dead_items->nitems < MaxHeapTuplesPerPage &&
+			dead_items->nitems > 0)
 		{
 			/*
 			 * Before beginning index vacuuming, we release any pin we may
@@ -1270,7 +1245,7 @@ lazy_scan_heap(LVRelState *vacrel, VacuumParams *params, bool aggressive)
 		}
 
 		/*
-		 * By here we definitely have enough dead_tuples space for whatever
+		 * By here we definitely have enough dead_items space for whatever
 		 * LP_DEAD tids are on this page, we have the visibility map page set
 		 * up in case we need to set this page's all_visible/all_frozen bit,
 		 * and we have a super-exclusive lock.  Any tuples on this page are
@@ -1391,7 +1366,7 @@ lazy_scan_heap(LVRelState *vacrel, VacuumParams *params, bool aggressive)
 				lazy_vacuum_heap_page(vacrel, blkno, buf, 0, &vmbuffer);
 
 				/* Forget the now-vacuumed tuples */
-				dead_tuples->num_tuples = 0;
+				dead_items->nitems = 0;
 
 				/*
 				 * Periodically perform FSM vacuuming to make newly-freed
@@ -1428,7 +1403,7 @@ lazy_scan_heap(LVRelState *vacrel, VacuumParams *params, bool aggressive)
 			 * with prunestate-driven visibility map and FSM steps (just like
 			 * the two-pass strategy).
 			 */
-			Assert(dead_tuples->num_tuples == 0);
+			Assert(dead_items->nitems == 0);
 		}
 
 		/*
@@ -1490,12 +1465,12 @@ lazy_scan_heap(LVRelState *vacrel, VacuumParams *params, bool aggressive)
 		 * visible to everyone yet actually are, and the PD_ALL_VISIBLE flag
 		 * is correct.
 		 *
-		 * There should never be dead tuples on a page with PD_ALL_VISIBLE
+		 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE
 		 * set, however.
 		 */
 		else if (prunestate.has_lpdead_items && PageIsAllVisible(page))
 		{
-			elog(WARNING, "page containing dead tuples is marked as all-visible in relation \"%s\" page %u",
+			elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
 				 vacrel->relname, blkno);
 			PageClearAllVisible(page);
 			MarkBufferDirty(buf);
@@ -1585,8 +1560,8 @@ lazy_scan_heap(LVRelState *vacrel, VacuumParams *params, bool aggressive)
 		vmbuffer = InvalidBuffer;
 	}
 
-	/* If any tuples need to be deleted, perform final vacuum cycle */
-	if (dead_tuples->num_tuples > 0)
+	/* Perform a final round of index and heap vacuuming */
+	if (dead_items->nitems > 0)
 		lazy_vacuum(vacrel);
 
 	/*
@@ -1677,7 +1652,7 @@ lazy_scan_heap(LVRelState *vacrel, VacuumParams *params, bool aggressive)
  * The approach we take now is to restart pruning when the race condition is
  * detected.  This allows heap_page_prune() to prune the tuples inserted by
  * the now-aborted transaction.  This is a little crude, but it guarantees
- * that any items that make it into the dead_tuples array are simple LP_DEAD
+ * that any items that make it into the dead_items array are simple LP_DEAD
  * line pointers, and that every remaining item with tuple storage is
  * considered as a candidate for freezing.
  */
@@ -1817,8 +1792,8 @@ retry:
 		 * impossible (e.g. in-progress insert from the same transaction).
 		 *
 		 * We treat LP_DEAD items a little differently, too -- we don't count
-		 * them as dead_tuples at all (we only consider new_dead_tuples).  The
-		 * outcome is no different because we assume that any LP_DEAD items we
+		 * them at all (we only consider new_dead_tuples).  The outcome is
+		 * really no different, since we assume that any LP_DEAD items we
 		 * encounter here will become LP_UNUSED inside lazy_vacuum_heap_page()
 		 * before we report anything to the stats collector. (Cases where we
 		 * bypass index vacuuming will violate our assumption, but the overall
@@ -2023,12 +1998,11 @@ retry:
 #endif
 
 	/*
-	 * Now save details of the LP_DEAD items from the page in the dead_tuples
-	 * array
+	 * Now save details of the LP_DEAD items from the page in vacrel
 	 */
 	if (lpdead_items > 0)
 	{
-		LVDeadTuples *dead_tuples = vacrel->dead_tuples;
+		LVDeadItems *dead_items = vacrel->dead_items;
 		ItemPointerData tmp;
 
 		Assert(!prunestate->all_visible);
@@ -2041,12 +2015,12 @@ retry:
 		for (int i = 0; i < lpdead_items; i++)
 		{
 			ItemPointerSetOffsetNumber(&tmp, deadoffsets[i]);
-			dead_tuples->itemptrs[dead_tuples->num_tuples++] = tmp;
+			dead_items->dead[dead_items->nitems++] = tmp;
 		}
 
-		Assert(dead_tuples->num_tuples <= dead_tuples->max_tuples);
-		pgstat_progress_update_param(PROGRESS_VACUUM_NUM_DEAD_TUPLES,
-									 dead_tuples->num_tuples);
+		Assert(dead_items->nitems <= dead_items->maxitems);
+		pgstat_progress_update_param(PROGRESS_VACUUM_NUM_DEAD_ITEMS,
+									 dead_items->nitems);
 	}
 
 	/* Finally, add page-local counts to whole-VACUUM counts */
@@ -2077,7 +2051,7 @@ lazy_vacuum(LVRelState *vacrel)
 	if (!vacrel->do_index_vacuuming)
 	{
 		Assert(!vacrel->do_index_cleanup);
-		vacrel->dead_tuples->num_tuples = 0;
+		vacrel->dead_items->nitems = 0;
 		return;
 	}
 
@@ -2106,7 +2080,7 @@ lazy_vacuum(LVRelState *vacrel)
 		BlockNumber threshold;
 
 		Assert(vacrel->num_index_scans == 0);
-		Assert(vacrel->lpdead_items == vacrel->dead_tuples->num_tuples);
+		Assert(vacrel->lpdead_items == vacrel->dead_items->nitems);
 		Assert(vacrel->do_index_vacuuming);
 		Assert(vacrel->do_index_cleanup);
 
@@ -2122,7 +2096,7 @@ lazy_vacuum(LVRelState *vacrel)
 		 * to store the TIDs (TIDs that now all point to LP_DEAD items) must
 		 * not exceed 32MB.  This limits the risk that we will bypass index
 		 * vacuuming again and again until eventually there is a VACUUM whose
-		 * dead_tuples space is not CPU cache resident.
+		 * dead_items space is not CPU cache resident.
 		 *
 		 * We don't take any special steps to remember the LP_DEAD items (such
 		 * as counting them in new_dead_tuples report to the stats collector)
@@ -2134,7 +2108,7 @@ lazy_vacuum(LVRelState *vacrel)
 		 */
 		threshold = (double) vacrel->rel_pages * BYPASS_THRESHOLD_PAGES;
 		bypass = (vacrel->lpdead_item_pages < threshold &&
-				  vacrel->lpdead_items < MAXDEADTUPLES(32L * 1024L * 1024L));
+				  vacrel->lpdead_items < MAXDEADITEMS(32L * 1024L * 1024L));
 	}
 
 	if (bypass)
@@ -2184,7 +2158,7 @@ lazy_vacuum(LVRelState *vacrel)
 	 * Forget the LP_DEAD items that we just vacuumed (or just decided to not
 	 * vacuum)
 	 */
-	vacrel->dead_tuples->num_tuples = 0;
+	vacrel->dead_items->nitems = 0;
 }
 
 /*
@@ -2258,7 +2232,7 @@ lazy_vacuum_all_indexes(LVRelState *vacrel)
 	 * place).
 	 */
 	Assert(vacrel->num_index_scans > 0 ||
-		   vacrel->dead_tuples->num_tuples == vacrel->lpdead_items);
+		   vacrel->dead_items->nitems == vacrel->lpdead_items);
 	Assert(allindexes || vacrel->failsafe_active);
 
 	/*
@@ -2277,7 +2251,7 @@ lazy_vacuum_all_indexes(LVRelState *vacrel)
 /*
  *	lazy_vacuum_heap_rel() -- second pass over the heap for two pass strategy
  *
- * This routine marks LP_DEAD items in vacrel->dead_tuples array as LP_UNUSED.
+ * This routine marks LP_DEAD items in vacrel->dead_items array as LP_UNUSED.
  * Pages that never had lazy_scan_prune record LP_DEAD items are not visited
  * at all.
  *
@@ -2318,7 +2292,7 @@ lazy_vacuum_heap_rel(LVRelState *vacrel)
 	vacuumed_pages = 0;
 
 	tupindex = 0;
-	while (tupindex < vacrel->dead_tuples->num_tuples)
+	while (tupindex < vacrel->dead_items->nitems)
 	{
 		BlockNumber tblk;
 		Buffer		buf;
@@ -2327,7 +2301,7 @@ lazy_vacuum_heap_rel(LVRelState *vacrel)
 
 		vacuum_delay_point();
 
-		tblk = ItemPointerGetBlockNumber(&vacrel->dead_tuples->itemptrs[tupindex]);
+		tblk = ItemPointerGetBlockNumber(&vacrel->dead_items->dead[tupindex]);
 		vacrel->blkno = tblk;
 		buf = ReadBufferExtended(vacrel->rel, MAIN_FORKNUM, tblk, RBM_NORMAL,
 								 vacrel->bstrategy);
@@ -2373,27 +2347,27 @@ lazy_vacuum_heap_rel(LVRelState *vacrel)
 
 /*
  *	lazy_vacuum_heap_page() -- free page's LP_DEAD items listed in the
- *						  vacrel->dead_tuples array.
+ *						  vacrel->dead_items array.
  *
  * Caller must have an exclusive buffer lock on the buffer (though a
  * super-exclusive lock is also acceptable).
  *
- * tupindex is the index in vacrel->dead_tuples of the first dead tuple for
- * this page.  We assume the rest follow sequentially.  The return value is
- * the first tupindex after the tuples of this page.
+ * tupindex is an offset into the vacrel->dead_items array for the first
+ * listed LP_DEAD item on the page.  The return value is the first tupindex
+ * immediately after all LP_DEAD items for the same page in the array.
  *
  * Prior to PostgreSQL 14 there were rare cases where this routine had to set
  * tuples with storage to unused.  These days it is strictly responsible for
  * marking LP_DEAD stub line pointers as unused.  This only happens for those
  * LP_DEAD items on the page that were determined to be LP_DEAD items back
  * when the same page was visited by lazy_scan_prune() (i.e. those whose TID
- * was recorded in the dead_tuples array).
+ * was recorded in the dead_items array at that time).
  */
 static int
 lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 					  int tupindex, Buffer *vmbuffer)
 {
-	LVDeadTuples *dead_tuples = vacrel->dead_tuples;
+	LVDeadItems *dead_items = vacrel->dead_items;
 	Page		page = BufferGetPage(buffer);
 	OffsetNumber unused[MaxHeapTuplesPerPage];
 	int			uncnt = 0;
@@ -2412,16 +2386,16 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 
 	START_CRIT_SECTION();
 
-	for (; tupindex < dead_tuples->num_tuples; tupindex++)
+	for (; tupindex < dead_items->nitems; tupindex++)
 	{
 		BlockNumber tblk;
 		OffsetNumber toff;
 		ItemId		itemid;
 
-		tblk = ItemPointerGetBlockNumber(&dead_tuples->itemptrs[tupindex]);
+		tblk = ItemPointerGetBlockNumber(&dead_items->dead[tupindex]);
 		if (tblk != blkno)
 			break;				/* past end of tuples for this block */
-		toff = ItemPointerGetOffsetNumber(&dead_tuples->itemptrs[tupindex]);
+		toff = ItemPointerGetOffsetNumber(&dead_items->dead[tupindex]);
 		itemid = PageGetItemId(page, toff);
 
 		Assert(ItemIdIsDead(itemid) && !ItemIdHasStorage(itemid));
@@ -2998,8 +2972,9 @@ lazy_cleanup_all_indexes(LVRelState *vacrel)
 /*
  *	lazy_vacuum_one_index() -- vacuum index relation.
  *
- *		Delete all the index entries pointing to tuples listed in
- *		dead_tuples, and update running statistics.
+ *		Delete all the index entries that point to items in the heap
+ *		that are now listed in dead_items.  Also, update running
+ *		statistics.
  *
  *		reltuples is the number of heap tuples to be passed to the
  *		bulkdelete callback.  It's always assumed to be estimated.
@@ -3038,11 +3013,11 @@ lazy_vacuum_one_index(Relation indrel, IndexBulkDeleteResult *istat,
 
 	/* Do bulk deletion */
 	istat = index_bulk_delete(&ivinfo, istat, lazy_tid_reaped,
-							  (void *) vacrel->dead_tuples);
+							  (void *) vacrel->dead_items);
 
 	ereport(elevel,
 			(errmsg("scanned index \"%s\" to remove %d row versions",
-					vacrel->indname, vacrel->dead_tuples->num_tuples),
+					vacrel->indname, vacrel->dead_items->nitems),
 			 errdetail_internal("%s", pg_rusage_show(&ru0))));
 
 	/* Revert to the previous phase information for error traceback */
@@ -3431,33 +3406,34 @@ count_nondeletable_pages(LVRelState *vacrel, bool *lock_waiter_detected)
 }
 
 /*
- * Return the maximum number of dead tuples we can record.
+ * Computes the number of dead TIDs that VACUUM will have to store in the
+ * worst case, where all line pointers are allocated, and all are LP_DEAD
  */
 static long
-compute_max_dead_tuples(BlockNumber relblocks, bool hasindex)
+compute_max_dead_items(BlockNumber relblocks, bool hasindex)
 {
-	long		maxtuples;
+	long		maxitems;
 	int			vac_work_mem = IsAutoVacuumWorkerProcess() &&
 	autovacuum_work_mem != -1 ?
 	autovacuum_work_mem : maintenance_work_mem;
 
 	if (hasindex)
 	{
-		maxtuples = MAXDEADTUPLES(vac_work_mem * 1024L);
-		maxtuples = Min(maxtuples, INT_MAX);
-		maxtuples = Min(maxtuples, MAXDEADTUPLES(MaxAllocSize));
+		maxitems = MAXDEADITEMS(vac_work_mem * 1024L);
+		maxitems = Min(maxitems, INT_MAX);
+		maxitems = Min(maxitems, MAXDEADITEMS(MaxAllocSize));
 
 		/* curious coding here to ensure the multiplication can't overflow */
-		if ((BlockNumber) (maxtuples / LAZY_ALLOC_TUPLES) > relblocks)
-			maxtuples = relblocks * LAZY_ALLOC_TUPLES;
+		if ((BlockNumber) (maxitems / MaxHeapTuplesPerPage) > relblocks)
+			maxitems = relblocks * MaxHeapTuplesPerPage;
 
 		/* stay sane if small maintenance_work_mem */
-		maxtuples = Max(maxtuples, MaxHeapTuplesPerPage);
+		maxitems = Max(maxitems, MaxHeapTuplesPerPage);
 	}
 	else
-		maxtuples = MaxHeapTuplesPerPage;
+		maxitems = MaxHeapTuplesPerPage;
 
-	return maxtuples;
+	return maxitems;
 }
 
 /*
@@ -3468,8 +3444,8 @@ compute_max_dead_tuples(BlockNumber relblocks, bool hasindex)
 static void
 lazy_space_alloc(LVRelState *vacrel, int nworkers, BlockNumber nblocks)
 {
-	LVDeadTuples *dead_tuples;
-	long		maxtuples;
+	LVDeadItems *dead_items;
+	long		maxitems;
 
 	/*
 	 * Initialize state for a parallel vacuum.  As of now, only one worker can
@@ -3501,13 +3477,13 @@ lazy_space_alloc(LVRelState *vacrel, int nworkers, BlockNumber nblocks)
 			return;
 	}
 
-	maxtuples = compute_max_dead_tuples(nblocks, vacrel->nindexes > 0);
+	maxitems = compute_max_dead_items(nblocks, vacrel->nindexes > 0);
 
-	dead_tuples = (LVDeadTuples *) palloc(SizeOfDeadTuples(maxtuples));
-	dead_tuples->num_tuples = 0;
-	dead_tuples->max_tuples = (int) maxtuples;
+	dead_items = (LVDeadItems *) palloc(SizeOfDeadItems(maxitems));
+	dead_items->nitems = 0;
+	dead_items->maxitems = (int) maxitems;
 
-	vacrel->dead_tuples = dead_tuples;
+	vacrel->dead_items = dead_items;
 }
 
 /*
@@ -3531,24 +3507,24 @@ lazy_space_free(LVRelState *vacrel)
  *
  *		This has the right signature to be an IndexBulkDeleteCallback.
  *
- *		Assumes dead_tuples array is in sorted order.
+ *		Assumes dead_items array is sorted (in TID order).
  */
 static bool
 lazy_tid_reaped(ItemPointer itemptr, void *state)
 {
-	LVDeadTuples *dead_tuples = (LVDeadTuples *) state;
+	LVDeadItems *dead_items = (LVDeadItems *) state;
 	int64		litem,
 				ritem,
 				item;
 	ItemPointer res;
 
-	litem = itemptr_encode(&dead_tuples->itemptrs[0]);
-	ritem = itemptr_encode(&dead_tuples->itemptrs[dead_tuples->num_tuples - 1]);
+	litem = itemptr_encode(&dead_items->dead[0]);
+	ritem = itemptr_encode(&dead_items->dead[dead_items->nitems - 1]);
 	item = itemptr_encode(itemptr);
 
 	/*
 	 * Doing a simple bound check before bsearch() is useful to avoid the
-	 * extra cost of bsearch(), especially if dead tuples on the heap are
+	 * extra cost of bsearch(), especially if dead items on the heap are
 	 * concentrated in a certain range.  Since this function is called for
 	 * every index tuple, it pays to be really fast.
 	 */
@@ -3556,8 +3532,8 @@ lazy_tid_reaped(ItemPointer itemptr, void *state)
 		return false;
 
 	res = (ItemPointer) bsearch((void *) itemptr,
-								(void *) dead_tuples->itemptrs,
-								dead_tuples->num_tuples,
+								(void *) dead_items->dead,
+								dead_items->nitems,
 								sizeof(ItemPointerData),
 								vac_cmp_itemptr);
 
@@ -3831,13 +3807,13 @@ begin_parallel_vacuum(LVRelState *vacrel, BlockNumber nblocks,
 	int			nindexes = vacrel->nindexes;
 	ParallelContext *pcxt;
 	LVShared   *shared;
-	LVDeadTuples *dead_tuples;
+	LVDeadItems *dead_items;
 	BufferUsage *buffer_usage;
 	WalUsage   *wal_usage;
 	bool	   *will_parallel_vacuum;
-	long		maxtuples;
+	long		maxitems;
 	Size		est_shared;
-	Size		est_deadtuples;
+	Size		est_deaditems;
 	int			nindexes_mwm = 0;
 	int			parallel_workers = 0;
 	int			querylen;
@@ -3910,10 +3886,10 @@ begin_parallel_vacuum(LVRelState *vacrel, BlockNumber nblocks,
 	shm_toc_estimate_chunk(&pcxt->estimator, est_shared);
 	shm_toc_estimate_keys(&pcxt->estimator, 1);
 
-	/* Estimate size for dead tuples -- PARALLEL_VACUUM_KEY_DEAD_TUPLES */
-	maxtuples = compute_max_dead_tuples(nblocks, true);
-	est_deadtuples = MAXALIGN(SizeOfDeadTuples(maxtuples));
-	shm_toc_estimate_chunk(&pcxt->estimator, est_deadtuples);
+	/* Estimate size for dead items -- PARALLEL_VACUUM_KEY_DEAD_ITEMS */
+	maxitems = compute_max_dead_items(nblocks, true);
+	est_deaditems = MAXALIGN(SizeOfDeadItems(maxitems));
+	shm_toc_estimate_chunk(&pcxt->estimator, est_deaditems);
 	shm_toc_estimate_keys(&pcxt->estimator, 1);
 
 	/*
@@ -3975,13 +3951,13 @@ begin_parallel_vacuum(LVRelState *vacrel, BlockNumber nblocks,
 	shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_SHARED, shared);
 	lps->lvshared = shared;
 
-	/* Prepare the dead tuple space */
-	dead_tuples = (LVDeadTuples *) shm_toc_allocate(pcxt->toc, est_deadtuples);
-	dead_tuples->max_tuples = maxtuples;
-	dead_tuples->num_tuples = 0;
-	MemSet(dead_tuples->itemptrs, 0, sizeof(ItemPointerData) * maxtuples);
-	shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_DEAD_TUPLES, dead_tuples);
-	vacrel->dead_tuples = dead_tuples;
+	/* Prepare the dead item space */
+	dead_items = (LVDeadItems *) shm_toc_allocate(pcxt->toc, est_deaditems);
+	dead_items->maxitems = maxitems;
+	dead_items->nitems = 0;
+	MemSet(dead_items->dead, 0, sizeof(ItemPointerData) * maxitems);
+	shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_DEAD_ITEMS, dead_items);
+	vacrel->dead_items = dead_items;
 
 	/*
 	 * Allocate space for each worker's BufferUsage and WalUsage; no need to
@@ -4139,7 +4115,7 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
 	Relation	rel;
 	Relation   *indrels;
 	LVShared   *lvshared;
-	LVDeadTuples *dead_tuples;
+	LVDeadItems *dead_items;
 	BufferUsage *buffer_usage;
 	WalUsage   *wal_usage;
 	int			nindexes;
@@ -4181,10 +4157,10 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
 	vac_open_indexes(rel, RowExclusiveLock, &nindexes, &indrels);
 	Assert(nindexes > 0);
 
-	/* Set dead tuple space */
-	dead_tuples = (LVDeadTuples *) shm_toc_lookup(toc,
-												  PARALLEL_VACUUM_KEY_DEAD_TUPLES,
-												  false);
+	/* Lookup dead item space */
+	dead_items = (LVDeadItems *) shm_toc_lookup(toc,
+												PARALLEL_VACUUM_KEY_DEAD_ITEMS,
+												false);
 
 	/* Set cost-based vacuum delay */
 	VacuumCostActive = (VacuumCostDelay > 0);
@@ -4214,7 +4190,7 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
 	vacrel.relname = pstrdup(RelationGetRelationName(rel));
 	vacrel.indname = NULL;
 	vacrel.phase = VACUUM_ERRCB_PHASE_UNKNOWN;	/* Not yet processing */
-	vacrel.dead_tuples = dead_tuples;
+	vacrel.dead_items = dead_items;
 
 	/* Setup error traceback support for ereport() */
 	errcallback.callback = vacuum_error_callback;
-- 
2.30.2

#2 Dilip Kumar
dilipbalaut@gmail.com
In reply to: Peter Geoghegan (#1)
Re: Rename dead_tuples to dead_items in vacuumlazy.c

On Wed, Nov 24, 2021 at 11:16 AM Peter Geoghegan <pg@bowt.ie> wrote:

Attached patch performs polishing within vacuumlazy.c, as follow-up
work to the refactoring work in Postgres 14. This mainly consists of
changing references of dead tuples to dead items, which reflects the
fact that VACUUM no longer deals with TIDs that might point to
remaining heap tuples with storage -- the TIDs in the array must now
strictly point to LP_DEAD stub line pointers that remain in the heap,
following pruning.

I've also simplified header comments, and comments above the main
entry point functions. These comments made much more sense back when
lazy_scan_heap() was simpler, but wasn't yet broken up into smaller,
better-scoped functions.

If there are no objections, I'll move on this soon. It's mostly just
mechanical changes.

-#define PROGRESS_VACUUM_NUM_DEAD_TUPLES 6
+#define PROGRESS_VACUUM_MAX_DEAD_ITEMS 5
+#define PROGRESS_VACUUM_NUM_DEAD_ITEMS 6

Wouldn't it be more logical to change this to DEAD_TIDS instead of DEAD_ITEMS?

+ /* Sorted list of TIDs to delete from indexes */
+ ItemPointerData dead[FLEXIBLE_ARRAY_MEMBER];

Instead of just "dead", why not "deadtid" or "deaditem"?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#3 Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Peter Geoghegan (#1)
Re: Rename dead_tuples to dead_items in vacuumlazy.c

On Wed, Nov 24, 2021 at 2:46 PM Peter Geoghegan <pg@bowt.ie> wrote:

Attached patch performs polishing within vacuumlazy.c, as follow-up
work to the refactoring work in Postgres 14. This mainly consists of
changing references of dead tuples to dead items, which reflects the
fact that VACUUM no longer deals with TIDs that might point to
remaining heap tuples with storage -- the TIDs in the array must now
strictly point to LP_DEAD stub line pointers that remain in the heap,
following pruning.

+1

If there are no objections, I'll move on this soon. It's mostly just
mechanical changes.

The patch renames dead tuples to dead items in some places and to
dead TIDs in others. For instance, it renames dead tuples to dead
TIDs here:

- * Return the maximum number of dead tuples we can record.
+ * Computes the number of dead TIDs that VACUUM will have to store in the
+ * worst case, where all line pointers are allocated, and all are LP_DEAD

whereas renames to dead items here:

-         * extra cost of bsearch(), especially if dead tuples on the heap are
+         * extra cost of bsearch(), especially if dead items on the heap are

I think it's more consistent if we change it to one side. I prefer "dead items".

---
There is one more place where we can rename "dead tuples":

/*
* Allocate the space for dead tuples. Note that this handles parallel
* VACUUM initialization as part of allocating shared memory space used
* for dead_items.
*/

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#4 Robert Haas
robertmhaas@gmail.com
In reply to: Masahiko Sawada (#3)
Re: Rename dead_tuples to dead_items in vacuumlazy.c

On Wed, Nov 24, 2021 at 7:48 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I think it's more consistent if we change it to one side. I prefer "dead items".

I feel like "items" is quite a generic word, so I think I would prefer
TIDs. But it's probably not a big deal.

--
Robert Haas
EDB: http://www.enterprisedb.com

#5 Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Robert Haas (#4)
Re: Rename dead_tuples to dead_items in vacuumlazy.c

On 2021-Nov-24, Robert Haas wrote:

On Wed, Nov 24, 2021 at 7:48 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I think it's more consistent if we change it to one side. I prefer
"dead items".

I feel like "items" is quite a generic word, so I think I would prefer
TIDs. But it's probably not a big deal.

Is there clarity on what each term means?

Since this patch only changes things that are specific to heap
vacuuming, it seems OK to rely on the convention that "item" means "heap
item" (not just any generic item). However, I'm not sure that we fully
agree on exactly what a heap item is. Maybe if we agree on a single
unambiguous definition for each of those terms, we can agree on what
terminology to use.

It seems to me we have the following terms:

- tuple
- line pointer
- [heap] item
- TID

My mental model is that "tuple" (in the narrow context of heap vacuum)
is the variable-size on-disk representation of a row in a page; "line
pointer" is the fixed-size struct at the bottom of each page that
contains location, size and flags of a tuple: struct ItemIdData. The
TID is the address of a line pointer -- an ItemPointerData.
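
For reference, the relevant declarations, lightly abridged and annotated
from storage/itemid.h and storage/itemptr.h, are:

typedef struct ItemIdData			/* a "line pointer" */
{
	unsigned	lp_off:15,		/* offset to tuple (from start of page) */
				lp_flags:2,		/* state of line pointer, see below */
				lp_len:15;		/* byte length of tuple */
} ItemIdData;

#define LP_UNUSED		0		/* unused (should always have lp_len=0) */
#define LP_NORMAL		1		/* used (should always have lp_len>0) */
#define LP_REDIRECT		2		/* HOT redirect (should have lp_len=0) */
#define LP_DEAD			3		/* dead, may or may not have storage */

typedef struct ItemPointerData		/* a "TID" */
{
	BlockIdData ip_blkid;		/* block number */
	OffsetNumber ip_posid;		/* offset into the line pointer array */
} ItemPointerData;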

What is an item? Is an item the same as a line pointer? That seems
confusing. I think "item" means the tuple as a whole. In that light,
using the term TID for some of the things that the patch renames to
"item" seems more appropriate.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/

#6 Robert Haas
robertmhaas@gmail.com
In reply to: Alvaro Herrera (#5)
Re: Rename dead_tuples to dead_items in vacuumlazy.c

On Wed, Nov 24, 2021 at 9:37 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

My mental model is that "tuple" (in the narrow context of heap vacuum)
is the variable-size on-disk representation of a row in a page; "line
pointer" is the fixed-size struct at the bottom of each page that
contains location, size and flags of a tuple: struct ItemIdData. The
TID is the address of a line pointer -- an ItemPointerData.

What is an item? Is an item the same as a line pointer? That seems
confusing. I think "item" means the tuple as a whole. In that light,
using the term TID for some of the things that the patch renames to
"item" seems more appropriate.

Hmm. I think in my model an item and an item pointer and a line
pointer are all the same thing, but a TID is different. When I talk
about a TID, I mean the location of an item pointer, not its contents.
So a TID is what tells me that I want block 5 and the 4th slot in the
item pointer array. The item pointer tells me that the associated tuple
is at a certain position in the page and has a certain length.
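
In code terms, that distinction looks roughly like this (just a sketch;
"page" here stands for some heap page that we have pinned and locked,
and the macros are the existing ones from itemptr.h, itemid.h and
bufpage.h):

	ItemPointerData tid;
	ItemId		lp;

	/* A TID: block 5, 4th slot in that page's line pointer array */
	ItemPointerSet(&tid, 5, 4);

	/* The line pointer ("item pointer") that the TID addresses */
	lp = PageGetItemId(page, ItemPointerGetOffsetNumber(&tid));

	/* Only an LP_NORMAL line pointer has an associated tuple */
	if (ItemIdIsNormal(lp))
	{
		HeapTupleHeader htup = (HeapTupleHeader) PageGetItem(page, lp);

		/* tuple is at ItemIdGetOffset(lp), ItemIdGetLength(lp) bytes */
	}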

--
Robert Haas
EDB: http://www.enterprisedb.com

#7 Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Robert Haas (#6)
Re: Rename dead_tuples to dead_items in vacuumlazy.c

On 2021-Nov-24, Robert Haas wrote:

Hmm. I think in my model an item and an item pointer and a line
pointer are all the same thing, but a TID is different. When I talk
about a TID, I mean the location of an item pointer, not its contents.
So a TID is what tells me that I want block 5 and the 4th slot in the
item pointer array. The item pointer tells me that the associated tuple
is at a certain position in the page and has a certain length.

OK, but you can have item pointers that don't have any item.
LP_REDIRECT, LP_DEAD, LP_UNUSED item pointers don't have items.

--
Álvaro Herrera Valdivia, Chile — https://www.EnterpriseDB.com/
"Nunca confiaré en un traidor. Ni siquiera si el traidor lo he creado yo"
(Barón Vladimir Harkonnen)

#8 Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Alvaro Herrera (#7)
Re: Rename dead_tuples to dead_items in vacuumlazy.c

On 2021-Nov-24, Alvaro Herrera wrote:

On 2021-Nov-24, Robert Haas wrote:

Hmm. I think in my model an item and an item pointer and a line
pointer are all the same thing, but a TID is different. When I talk
about a TID, I mean the location of an item pointer, not its contents.
So a TID is what tells me that I want block 5 and the 4th slot in the
item pointer array. The item pointer tells me that the associated tuple
is at a certain position in the page and has a certain length.

OK, but you can have item pointers that don't have any item.
LP_REDIRECT, LP_DEAD, LP_UNUSED item pointers don't have items.

Sorry to reply to myself, but I realized that I forgot to return to the
main point of this thread. If we agree that "an LP_DEAD item pointer
does not point to any item" (an assertion that gives a precise meaning
to both those terms), then a patch that renames "tuples" to "items" is
not doing anything useful IMO, because those two terms are synonyms.

Now maybe Peter doesn't agree with the definitions I suggest, in which
case I would like to know what his definitions are.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"How strange it is to find the words "Perl" and "saner" in such close
proximity, with no apparent sense of irony. I doubt that Larry himself
could have managed it." (ncm, http://lwn.net/Articles/174769/)

#9 Robert Haas
robertmhaas@gmail.com
In reply to: Alvaro Herrera (#7)
Re: Rename dead_tuples to dead_items in vacuumlazy.c

On Wed, Nov 24, 2021 at 9:51 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2021-Nov-24, Robert Haas wrote:

Hmm. I think in my model an item and an item pointer and a line
pointer are all the same thing, but a TID is different. When I talk
about a TID, I mean the location of an item pointer, not its contents.
So a TID is what tells me that I want block 5 and the 4th slot in the
item pointer array. The item pointer tells me that the associated tuple
is at a certain position in the page and has a certain length.

OK, but you can have item pointers that don't have any item.
LP_REDIRECT, LP_DEAD, LP_UNUSED item pointers don't have items.

I guess so. I said before that I thought an item and an item pointer
were the same, but on reflection, that doesn't entirely make sense.
But I don't know that I like making item and tuple synonymous either.
I think perhaps the term "item" by itself is not very clear.

--
Robert Haas
EDB: http://www.enterprisedb.com

#10 Peter Geoghegan
pg@bowt.ie
In reply to: Alvaro Herrera (#8)
Re: Rename dead_tuples to dead_items in vacuumlazy.c

On Wed, Nov 24, 2021 at 7:16 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

Sorry to reply to myself, but I realized that I forgot to return to the
main point of this thread. If we agree that "an LP_DEAD item pointer
does not point to any item" (an assertion that gives a precise meaning
to both those terms), then a patch that renames "tuples" to "items" is
not doing anything useful IMO, because those two terms are synonyms.

TIDs (ItemPointerData structs) are of course not the same thing as
line pointers (ItemIdData structs). There is a tendency to refer to
the latter as "item pointers" all the same, which was confusing. I
personally corrected/normalized this in commit ae7291ac in 2019. I
think that it's worth being careful about precisely because they're
closely related (but distinct) concepts. And so FWIW "LP_DEAD item
pointer" is not a thing. I agree that an LP_DEAD item pointer has no
tuple storage, and so you could say that it points to nothing (though
only in heapam). I probably would just say that it has no tuple
storage, though.
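
To put "has no tuple storage" in terms of the macros from
storage/itemid.h: what lazy_vacuum_heap_page() asserts for every entry
in the dead_items array is ItemIdIsDead(itemid) &&
!ItemIdHasStorage(itemid), where

#define ItemIdIsDead(itemId) \
	((itemId)->lp_flags == LP_DEAD)

#define ItemIdHasStorage(itemId) \
	((itemId)->lp_len != 0)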

Now maybe Peter doesn't agree with the definitions I suggest, in which
case I would like to know what his definitions are.

I agree with others that the term "item" is vague, but I don't think
that that's necessarily a bad thing here -- I deliberately changed the
comments to say either "TIDs" or "LP_DEAD items", emphasizing whatever
the important aspect seemed to be in each context (they're LP_DEAD
items to the heap structure, TIDs to index structures).

I'm not attached to the term "item". To me the truly important point
is what these items are *not*: they're not tuples. The renaming is
intended to enforce the concepts that I went into at the end of the
commit message for commit 8523492d. Now the pruning steps in
lazy_scan_prune always avoid keeping around a DEAD tuple with tuple
storage on return to lazy_scan_heap (only LP_DEAD items can remain),
since (as of that commit) lazy_scan_prune alone is responsible for
things involving the "logical database".

This means that index vacuuming and heap vacuuming can now be thought
of as removing garbage items from physical data structures (they're
purely "physical database" concepts), and nothing else. They don't
need recovery conflicts. How could they? Where are you supposed to get
the XIDs for that from, when you've only got LP_DEAD items?

This is also related to the idea that pruning by VACUUM isn't
necessarily all that special compared to earlier pruning or concurrent
opportunistic pruning. As I go into on the other recent thread on
removing special cases in vacuumlazy.c, ISTM that we ought to do
everything except pruning itself (and freezing tuples, which
effectively depends on pruning) without even acquiring a cleanup lock.
Which is actually quite a lot of things.

--
Peter Geoghegan

#11 Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Peter Geoghegan (#10)
Re: Rename dead_tuples to dead_items in vacuumlazy.c

On 2021-Nov-24, Peter Geoghegan wrote:

TIDs (ItemPointerData structs) are of course not the same thing as
line pointers (ItemIdData structs). There is a tendency to refer to
the latter as "item pointers" all the same, which was confusing. I
personally corrected/normalized this in commit ae7291ac in 2019. I
think that it's worth being careful about precisely because they're
closely related (but distinct) concepts. And so FWIW "LP_DEAD item
pointer" is not a thing. I agree that an LP_DEAD item pointer has no
tuple storage, and so you could say that it points to nothing (though
only in heapam). I probably would just say that it has no tuple
storage, though.

OK, this makes a lot more sense. I wasn't aware of ae7291ac (and I
wasn't aware of the significance of 8523492d either, but that's not
really relevant here.)

I agree with others that the term "item" is vague, but I don't think
that that's necessarily a bad thing here -- I deliberately changed the
comments to say either "TIDs" or "LP_DEAD items", emphasizing whatever
the important aspect seemed to be in each context (they're LP_DEAD
items to the heap structure, TIDs to index structures).

I think we could say "LP_DEAD line pointer" and that would be perfectly
clear. Given how nuanced we have to be if we want to be clear about
this, I would rather not use "LP_DEAD item"; that seems slightly
contradictory, since the item is the storage and such a line pointer
does not have storage. Perhaps change that define in progress.h to
PROGRESS_VACUUM_NUM_DEAD_LPS, and, in the first comment in vacuumlazy.c,
use wording such as

+ * The major space usage for LAZY VACUUM is storage for the array of TIDs
+ * of dead line pointers that are to be removed from indexes.

or

+ * The major space usage for LAZY VACUUM is storage for the array of TIDs
+ * of LP_DEAD line pointers that are to be removed from indexes.

(The point being that TIDs are not dead themselves, only the line
pointers that they refer to.)

--
Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/
"Most hackers will be perfectly comfortable conceptualizing users as entropy
sources, so let's move on." (Nathaniel Smith)

#12 Peter Geoghegan
pg@bowt.ie
In reply to: Alvaro Herrera (#11)
Re: Rename dead_tuples to dead_items in vacuumlazy.c

On Wed, Nov 24, 2021 at 9:53 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

OK, this makes a lot more sense. I wasn't aware of ae7291ac (and I
wasn't aware of the significance of 8523492d either, but that's not
really relevant here.)

Thanks for hearing me out about the significance of 8523492d.

Having the right formalisms seems to really matter here, because they
enable decoupling, which is generally very useful. This makes it easy
to understand (just for example) that index vacuuming and heap
vacuuming are just additive, optional steps (in principle) -- an idea
that will become even more important once we get Robert's pending TID
conveyor belt design. I believe that that design goes one step further
than what we have today, by making index vacuuming and heap vacuuming
occur in a distinct operation to VACUUM proper (VACUUM would only need
to set up the LP_DEAD item list for index vacuuming and heap
vacuuming, which may or may not happen immediately after).

An interesting question (at least to me) is: within a non-aggressive
VACUUM, what remaining steps are *not* technically optional?

I am pretty sure that they're all optional in principle (or will be
soon), because soon we will be able to independently advance
relfrozenxid without freezing all tuples with XIDs before our original
FreezeLimit (FreezeLimit should only be used to decide which tuples to
freeze, not to decide on a new relfrozenxid). Soon almost everything
will be decoupled, without changing the basic invariants that we've
had for many years. This flexibility seems really important to me.

That just leaves avoiding pruning without necessarily avoiding
ostensibly related processing for indexes. We can already
independently prune without doing index/heap vacuuming (the bypass
indexes optimization). We will also be able to do the opposite thing,
with my new patch: we can perform index/heap vacuuming *without*
pruning ourselves. This makes sense in the case where we cannot
acquire a cleanup lock on a heap page with preexisting LP_DEAD items.

I think we could say "LP_DEAD line pointer" and that would be perfectly
clear. Given how nuanced we have to be if we want to be clear about
this, I would rather not use "LP_DEAD item"; that seems slightly
contradictory, since the item is the storage and such a line pointer
does not have storage. Perhaps change that define in progress.h to
PROGRESS_VACUUM_NUM_DEAD_LPS, and, in the first comment in vacuumlazy.c,
use wording such as

I agree with all that, I think. But it's still not clear what the
variable dead_tuples should be renamed to within the structure that
you lay out (I imagine that you agree with me that dead_tuples is now
a bad name). This one detail affects more individual lines of code
than the restructuring of comments.

--
Peter Geoghegan

#13 Peter Geoghegan
pg@bowt.ie
In reply to: Masahiko Sawada (#3)
Re: Rename dead_tuples to dead_items in vacuumlazy.c

On Wed, Nov 24, 2021 at 4:48 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

The patch renames dead tuples to dead items in some places and to
dead TIDs in others.

I think it's more consistent if we change it to one side. I prefer "dead items".

I just pushed a version of the patch that still uses both terms when
talking about dead_items. But the final commit actually makes it clear
why, in comments above the LVDeadItems struct itself: LVDeadItems is
used by both index vacuuming and heap vacuuming.

Thanks
--
Peter Geoghegan

#14 Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Peter Geoghegan (#13)
Re: Rename dead_tuples to dead_items in vacuumlazy.c

On Tue, Nov 30, 2021 at 3:00 AM Peter Geoghegan <pg@bowt.ie> wrote:

On Wed, Nov 24, 2021 at 4:48 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

The patch renames dead tuples to dead items at some places and to
dead TIDs at some places.

I think it's more consistent if we change it to one side. I prefer "dead items".

I just pushed a version of the patch that still uses both terms when
talking about dead_items.

Thanks! I'll change my parallel vacuum refactoring patch accordingly.

Regarding the commit, I think that there is still one place in
vacuumlazy.c where we can change "dead tuples" to "dead items":

/*
* Allocate the space for dead tuples. Note that this handles parallel
* VACUUM initialization as part of allocating shared memory space used
* for dead_items.
*/
dead_items_alloc(vacrel, params->nworkers);
dead_items = vacrel->dead_items;

Also, the commit doesn't change either PROGRESS_VACUUM_MAX_DEAD_TUPLES
or PROGRESS_VACUUM_NUM_DEAD_TUPLES. Did you leave them unchanged on purpose?

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#15 Peter Geoghegan
pg@bowt.ie
In reply to: Masahiko Sawada (#14)
Re: Rename dead_tuples to dead_items in vacuumlazy.c

On Mon, Nov 29, 2021 at 7:00 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Thanks! I'll change my parallel vacuum refactoring patch accordingly.

Thanks again for working on that.

Regarding the commit, I think that there is still one place in
vacuumlazy.c where we can change "dead tuples" to "dead items":

/*
* Allocate the space for dead tuples. Note that this handles parallel
* VACUUM initialization as part of allocating shared memory space used
* for dead_items.
*/
dead_items_alloc(vacrel, params->nworkers);
dead_items = vacrel->dead_items;

Oops. Pushed a fixup for that just now.

Also, the commit doesn't change either PROGRESS_VACUUM_MAX_DEAD_TUPLES
or PROGRESS_VACUUM_NUM_DEAD_TUPLES. Did you leave them unchanged on purpose?

That was deliberate.

It would be a bit strange to alter these constants without also
updating the corresponding column names for the
pg_stat_progress_vacuum system view. But if I kept the definition from
system_views.sql in sync, then I would break user scripts -- for
reasons that users don't care about. That didn't seem like the right
approach.

Also, the system as a whole still assumes "DEAD tuples and LP_DEAD
items are the same, and are just as much of a problem in the table as
they are in each index". As you know, this is not really true, which
is an important problem for us. Fixing it (perhaps as part of adding
something like Robert's conveyor belt design) will likely require
revising this model quite fundamentally (e.g., the vacthresh
calculation in autovacuum.c:relation_needs_vacanalyze() would be
replaced). When this happens, we'll probably need to update system
views that have columns with names like "dead_tuples" -- because maybe
we no longer specifically count dead items/tuples at all. I strongly
suspect that the approach to statistics that we take for pg_statistic
optimizer stats just doesn't work for dead items/tuples -- statistical
sampling only produces useful statistics for the optimizer because
certain delicate assumptions are met (even these assumptions only
really work with a properly normalized database schema).
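
(For reference, the model in question is roughly this, abridged from
autovacuum.c:relation_needs_vacanalyze(), with vactuples taken from the
stats collector's n_dead_tuples counter:

	vacthresh = (float4) vac_base_thresh + vac_scale_factor * reltuples;
	...
	dovacuum = force_vacuum || (vactuples > vacthresh);

So dead tuples and LP_DEAD items are weighed identically when deciding
whether the table needs to be vacuumed at all.)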

Maybe revising the model used for autovacuum scheduling wouldn't
include changing pg_stat_progress_vacuum, since that isn't technically
"part of the model" --- I'm not sure. But it's not something that I am
in a hurry to fix.

--
Peter Geoghegan

#16 Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Peter Geoghegan (#15)
Re: Rename dead_tuples to dead_items in vacuumlazy.c

On Wed, Dec 1, 2021 at 4:42 AM Peter Geoghegan <pg@bowt.ie> wrote:

On Mon, Nov 29, 2021 at 7:00 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Thanks! I'll change my parallel vacuum refactoring patch accordingly.

Thanks again for working on that.

Regarding the commit, I think that there is still one place in
vacuumlazy.c where we can change "dead tuples" to "dead items":

/*
* Allocate the space for dead tuples. Note that this handles parallel
* VACUUM initialization as part of allocating shared memory space used
* for dead_items.
*/
dead_items_alloc(vacrel, params->nworkers);
dead_items = vacrel->dead_items;

Oops. Pushed a fixup for that just now.

Thanks!

Also, the commit doesn't change either PROGRESS_VACUUM_MAX_DEAD_TUPLES
or PROGRESS_VACUUM_NUM_DEAD_TUPLES. Did you leave them unchanged on purpose?

That was deliberate.

It would be a bit strange to alter these constants without also
updating the corresponding column names for the
pg_stat_progress_vacuum system view. But if I kept the definition from
system_views.sql in sync, then I would break user scripts -- for
reasons that users don't care about. That didn't seem like the right
approach.

Agreed.

Also, the system as a whole still assumes "DEAD tuples and LP_DEAD
items are the same, and are just as much of a problem in the table as
they are in each index". As you know, this is not really true, which
is an important problem for us. Fixing it (perhaps as part of adding
something like Robert's conveyor belt design) will likely require
revising this model quite fundamentally (e.g., the vacthresh
calculation in autovacuum.c:relation_needs_vacanalyze() would be
replaced). When this happens, we'll probably need to update system
views that have columns with names like "dead_tuples" -- because maybe
we no longer specifically count dead items/tuples at all. I strongly
suspect that the approach to statistics that we take for pg_statistic
optimizer stats just doesn't work for dead items/tuples -- statistical
sampling only produces useful statistics for the optimizer because
certain delicate assumptions are met (even these assumptions only
really work with a properly normalized database schema).

Maybe revising the model used for autovacuum scheduling wouldn't
include changing pg_stat_progress_vacuum, since that isn't technically
"part of the model" --- I'm not sure. But it's not something that I am
in a hurry to fix.

Understood.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/