From 03e98bcbd1e30815f2219f0d21da0da1758fe0a0 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Thu, 20 Jun 2024 17:21:11 -0400
Subject: [PATCH v1 2/2] Ensure vacuum removes all dead tuples older than
 OldestXmin

If vacuum fails to remove a tuple with xmax older than
VacuumCutoffs->OldestXmin and younger than GlobalVisState->maybe_needed,
it will ERROR out when determining whether or not to freeze the tuple
with "cannot freeze committed xmax".

Fix this by having vacuum alway remove tuples older than OldestXmin.

Further explanation:

Pruning uses the GlobalVisState (via GlobalVisTestIsRemovableXid()) to
determine which dead tuples are removable. Pruning initiated by vacuum
also considers whether or not tuples should be frozen. Tuples which are
HEAPTUPLE_RECENTLY_DEAD may be frozen if their xmax is older than
VacuumCutoffs->OldestXmin -- which is determined at the beginning of
vacuuming the relation.

For vacuum, the GlobalVisState is updated at the beginning of vacuuming
the relation -- at the same time and with the same value as
VacuumCutoffs->OldestXmin.

A backend's GlobalVisState may be updated when it is accessed if a new
snapshot is taken or if something caused ComputeXidHorizons() to be
called.

This can happen during vacuum at the end of a round of index vacuuming
when GetOldestNonRemovableTransactionId() is called.

If a disconnected standby with a running transaction older than
VacuumCutoffs->OldestXmin reconnects to the primary after vacuum
initially calculates GlobalVisState and OldestXmin but before
GlobalVisState is updated, the value of GlobalVisState->maybe_needed
could go backwards.

If this happens in the middle of vacuum's first pruning and freezing
pass, it is possible that pruning/freezing could then encounter a tuple
whose xmax is younger than GlobalVisState->maybe_needed and older than
VacuumCutoffs->OldestXmin. heap_prune_satisfies_vacuum() would deem the
tuple HEAPTUPLE_RECENTLY_DEAD and would not remove it. But the
heap_pre_freeze_checks() would ERROR out with "cannot freeze committed
xmax". This check is to avoid ever freezing dead tuples.

In back branches starting with 14, failing to remove tuples older than
OldestXmin during pruning caused vacuum to infinitely loop in
lazy_scan_prune(), as investigated on this [1] thread. After 1ccc1e05ae
removed the retry loop in lazy_scan_prune() and stopped comparing tuples
to OldestXmin, the hang could no longer happen, but we could still
attempt to freeze dead tuples with xmax older than OldestXmin --
resulting in an ERROR.

Fix this by always removing dead tuples with xmax older than
VacuumCutoffs->OldestXmin. This is okay because the standby won't replay
the tuple removal until that tuple is removable on the standby. Thus,
the worst that can happen is a recovery conflict.

[1] https://postgr.es/m/20240415173913.4zyyrwaftujxthf2%40awork3.anarazel.de#1b216b7768b5bd577a3d3d51bd5aadee
---
 src/backend/access/heap/pruneheap.c  | 22 +++++++++++++++++++++-
 src/backend/access/heap/vacuumlazy.c | 14 +++++++-------
 2 files changed, 28 insertions(+), 8 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 3cdfc5b7f1b..72e5fe6dad8 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -325,6 +325,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
  *
  * cutoffs contains the freeze cutoffs, established by VACUUM at the beginning
  * of vacuuming the relation.  Required if HEAP_PRUNE_FREEZE option is set.
+ * cutoffs->OldestXmin is also used to determine if dead tuples are
+ * HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
@@ -922,8 +924,26 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
 	if (res != HEAPTUPLE_RECENTLY_DEAD)
 		return res;
 
+	/*
+	 * For VACUUM, we must be sure to prune tuples whose xmax is older than
+	 * OldestXmin -- a visibility cutoff determined at the beginning of
+	 * vacuuming the relation. OldestXmin is used to decide whether or not to
+	 * freeze a tuple, and we should not freeze dead tuples.
+	 */
+	if (prstate->cutoffs &&
+		TransactionIdIsValid(prstate->cutoffs->OldestXmin) &&
+		NormalTransactionIdPrecedes(dead_after, prstate->cutoffs->OldestXmin))
+		return HEAPTUPLE_DEAD;
+
+	/*
+	 * HOT pruning uses the GlobalVisState to determine if the tuple is dead.
+	 * And for vacuum, even if the tuple's xmax is not older than OldestXmin,
+	 * GlobalVisTestIsRemovableXid() could find the row dead if the
+	 * GlobalVisState has been updated since the beginning of vacuuming the
+	 * relation.
+	 */
 	if (GlobalVisTestIsRemovableXid(prstate->vistest, dead_after))
-		res = HEAPTUPLE_DEAD;
+		return HEAPTUPLE_DEAD;
 
 	return res;
 }
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 3f88cf1e8ef..abb47ae5960 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -438,13 +438,13 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 	 * as an upper bound on the XIDs stored in the pages we'll actually scan
 	 * (NewRelfrozenXid tracking must never be allowed to miss unfrozen XIDs).
 	 *
-	 * Next acquire vistest, a related cutoff that's used in pruning.  We
-	 * expect vistest will always make heap_page_prune_and_freeze() remove any
-	 * deleted tuple whose xmax is < OldestXmin.  lazy_scan_prune must never
-	 * become confused about whether a tuple should be frozen or removed.  (In
-	 * the future we might want to teach lazy_scan_prune to recompute vistest
-	 * from time to time, to increase the number of dead tuples it can prune
-	 * away.)
+	 * Next acquire vistest, a related cutoff that's used in pruning.  We use
+	 * vistest in combination with OldestXmin to ensure that
+	 * heap_page_prune_and_freeze() always removes any deleted tuple whose
+	 * xmax is < OldestXmin.  lazy_scan_prune must never become confused about
+	 * whether a tuple should be frozen or removed.  (In the future we might
+	 * want to teach lazy_scan_prune to recompute vistest from time to time,
+	 * to increase the number of dead tuples it can prune away.)
 	 */
 	vacrel->aggressive = vacuum_get_cutoffs(rel, params, &vacrel->cutoffs);
 	vacrel->rel_pages = orig_rel_pages = RelationGetNumberOfBlocks(rel);
-- 
2.34.1

