From b544d0de0f98f8ba1ea36c4e50032016d6561844 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 11 Dec 2024 14:13:34 -0500
Subject: [PATCH v5 2/3] Add more general summary to vacuumlazy.c

Add more comments at the top of vacuumlazy.c on heap relation vacuuming
implementation.

Previously vacuumlazy.c only had details related to the dead TID storage
added in Postgres 17. This commit adds a more general summary to help
future developers understand the heap relation vacuum design and
implementation at a high level.

Reviewed-by: Robert Haas, Bilal Yavuz
Discussion: https://postgr.es/m/flat/CAAKRu_ZF_KCzZuOrPrOqjGVe8iRVWEAJSpzMgRQs%3D5-v84cXUg%40mail.gmail.com
---
 src/backend/access/heap/vacuumlazy.c | 42 ++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 09fab08b8e1..c047fb20f7a 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -3,6 +3,48 @@
  * vacuumlazy.c
  *	  Concurrent ("lazy") vacuuming.
  *
+ * Heap relations are vacuumed in three main phases. In phase I, vacuum scans
+ * relation pages, pruning and freezing tuples and saving dead tuples' TIDs in
+ * a TID store. If that TID store fills up or vacuum finishes scanning the
+ * relation, it progresses to phase II: index vacuuming. Index vacuuming
+ * deletes the dead index entries referenced in the TID store. In phase III,
+ * vacuum scans the blocks of the relation indicated by the TIDs in the TID
+ * store and reaps the dead tuples, freeing that space for future tuples.
+ *
+ * If there are no indexes or index scanning is disabled, phase II may be
+ * skipped. If phase I identified very few dead index entries, vacuum may skip
+ * phases II and III. If the TID store fills up in phase I, vacuum suspends
+ * phase I, proceeds to phases II and II and cleans up the dead tuples
+ * referenced in the current TID store. This empties the TID store and allows
+ * vacuum to resume phase I. In this sense, the phases are more like states in
+ * a state machine, but they have been referred to colloquially as phases for
+ * long enough that it makes sense to refer to them in that way here.
+ *
+ * Finally, vacuum may truncate the relation if it has emptied pages at the
+ * end. After finishing all phases of work, vacuum updates relation statistics
+ * in pg_class and the cumulative statistics subsystem.
+ *
+ * Relation Scanning:
+ *
+ * Vacuum scans the heap relation, starting at the beginning and progressing
+ * to the end, skipping pages as permitted by their visibility status, vacuum
+ * options, and the eagerness level of the vacuum.
+ *
+ * When page skipping is enabled, non-aggressive vacuums may skip scanning
+ * pages that are marked all-visible in the visibility map. We may choose not
+ * to skip pages if the range of skippable pages is below
+ * SKIP_PAGES_THRESHOLD.
+ *
+ * Once vacuum has decided to scan a given block, it must read in the block
+ * and obtain a cleanup lock to prune tuples on the page. A non-aggressive
+ * vacuum may choose to skip pruning and freezing if it cannot acquire a
+ * cleanup lock on the buffer right away.
+ *
+ * After pruning and freezing, pages that are newly all-visible and all-frozen
+ * are marked as such in the visibility map.
+ *
+ * Dead TID Storage:
+ *
  * The major space usage for vacuuming is storage for the dead tuple IDs that
  * are to be removed from indexes.  We want to ensure we can vacuum even the
  * very largest relations with finite memory space usage.  To do that, we set
-- 
2.34.1

