From a560d52a6cc212f7de30b085e777b955078f4aa2 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 11 Dec 2024 14:13:34 -0500
Subject: [PATCH v7 1/2] Add more general summary to vacuumlazy.c

Add more comments at the top of vacuumlazy.c on heap relation vacuuming
implementation.

Previously vacuumlazy.c only had details related to the dead TID storage
added in Postgres 17. This commit adds a more general summary to help
future developers understand the heap relation vacuum design and
implementation at a high level.

Reviewed-by: Alena Rybakina, Robert Haas, Bilal Yavuz
Discussion: https://postgr.es/m/flat/CAAKRu_ZF_KCzZuOrPrOqjGVe8iRVWEAJSpzMgRQs%3D5-v84cXUg%40mail.gmail.com
---
 src/backend/access/heap/vacuumlazy.c | 54 ++++++++++++++++++++++++++++
 1 file changed, 54 insertions(+)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 09fab08b8e1..5028759024b 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -3,6 +3,60 @@
  * vacuumlazy.c
  *	  Concurrent ("lazy") vacuuming.
  *
+ * Heap relations are vacuumed in three main phases. In phase I, vacuum scans
+ * relation pages, pruning and freezing tuples and saving dead tuples' TIDs in
+ * a TID store. If that TID store fills up or vacuum finishes scanning the
+ * relation, it progresses to phase II: index vacuuming. Index vacuuming
+ * deletes the dead index entries referenced in the TID store. In phase III,
+ * vacuum scans the blocks of the relation indicated by the TIDs in the TID
+ * store and reaps the dead tuples, freeing that space for future tuples.
+ *
+ * If there are no indexes or index scanning is disabled, phase II may be
+ * skipped. If phase I identified very few dead index entries or if vacuum's
+ * failsafe mechanism has triggered (to avoid transaction ID wraparound),
+ * vacuum may skip phases II and III.
+ *
+ * If the TID store fills up in phase I, vacuum suspends phase I, proceeds to
+ * phases II and II and cleans up the dead tuples referenced in the current
+ * TID store. This empties the TID store and allows vacuum to resume phase I.
+ *
+ * In a way, the phases are more like states in a state machine, but they have
+ * been referred to colloquially as phases for so long that they are referred
+ * to as such here.
+ *
+ * Manually invoked VACUUMs may scan indexes during phase II in parallel. For
+ * more information on this, see the comment at the top of vacuumparallel.c.
+ *
+ * In between phases, vacuum updates the freespace map (every
+ * VACUUM_FSM_EVERY_PAGES).
+ *
+ * Finally, vacuum may truncate the relation if it has emptied pages at the
+ * end. After finishing all phases of work, vacuum updates relation statistics
+ * in pg_class and the cumulative statistics subsystem.
+ *
+ * Relation Scanning:
+ *
+ * Vacuum scans the heap relation, starting at the beginning and progressing
+ * to the end, skipping pages as permitted by their visibility status, vacuum
+ * options, and the eagerness level of the vacuum.
+ *
+ * When page skipping is enabled, non-aggressive vacuums may skip scanning
+ * pages that are marked all-visible in the visibility map. It may choose not
+ * to skip pages if the range of skippable pages is below
+ * SKIP_PAGES_THRESHOLD.
+ *
+ * Once vacuum has decided to scan a given block, it must read the block and
+ * obtain a cleanup lock to prune tuples on the page. A non-aggressive vacuum
+ * may choose to skip pruning and freezing if it cannot acquire a cleanup lock
+ * on the buffer right away. In this case, it may miss cleaning up dead tuples
+ * (and their associated index entries); though it is free to reap any
+ * existing dead items on the page.
+ *
+ * After pruning and freezing, pages that are newly all-visible and all-frozen
+ * are marked as such in the visibility map.
+ *
+ * Dead TID Storage:
+ *
  * The major space usage for vacuuming is storage for the dead tuple IDs that
  * are to be removed from indexes.  We want to ensure we can vacuum even the
  * very largest relations with finite memory space usage.  To do that, we set
-- 
2.34.1

