Combine Prune and Freeze records emitted by vacuum
Hi,
Attached is a patch set which combines the freeze and prune records
for vacuum -- eliminating the overhead of a separate freeze WAL record
for every block frozen by vacuum. The contents of the freeze record
are added to the PRUNE record.
In cases where vacuum does freeze and prune, combining the WAL records
can reduce WAL bytes substantially and, as a consequence, reduce WAL
sync and write time.
For example:
psql -c "CREATE TABLE foo(id INT, a INT, b INT, c INT, d INT, e INT, f
INT, g INT, h TEXT) WITH (autovacuum_enabled=false);"
for x in $(seq 1 16);
do
psql -c "INSERT INTO foo SELECT i, i, i, i, i, i, i, i, repeat('b',
30) FROM generate_series(1,2000000)i;"
done
psql -c "UPDATE foo SET a = 2 WHERE id % 7 = 0;"
psql -c "VACUUM (FREEZE) foo;"
With the patch set applied, this generates 30% fewer WAL records and 12% fewer WAL bytes -- which,
depending on what else is happening on the system, can lead to vacuum
spending substantially less time on WAL writing and syncing (often 15%
less time on WAL writes and 10% less time on syncing WAL in my
testing).
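If you want to reproduce or sanity-check numbers like these, pg_waldump's
--stats mode is one way to get per-record-type counts and bytes for the WAL
the VACUUM generated. The LSNs below are placeholders -- capture them with
pg_current_wal_insert_lsn() immediately before and after the VACUUM:
# Summarize the WAL emitted between the two LSNs, broken down by record type.
# Look at the Heap2 resource manager rows: PRUNE, FREEZE_PAGE, and VISIBLE.
pg_waldump -p $PGDATA/pg_wal --stats=record \
    --start=<lsn_before_vacuum> --end=<lsn_after_vacuum>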
Though heap_page_prune() is also used by on-access pruning, on-access
pruning does not pass in the parameter used for freezing, so it should
incur limited additional overhead. The primary additional overhead
would be checking tuples' xmins against the GlobalVisState to
determine if the page would be all_visible and identify the visibility
cutoff xid. This is required to determine whether or not to
opportunistically freeze. We could condition this on the caller being
vacuum if needed.
Though, in the future, we may want to consider opportunistic/eager
freezing on access. This could allow us to, for example, freeze
bulk-loaded read-only data before it goes cold and avoid expensive
wraparound vacuuming.
There are other steps that we can take to decrease vacuum WAL volume
even further. Many of those are natural follow-ons to combining the
prune and freeze records. For example, I intend to propose combining
the visibility map update record into the Prune/Freeze and Vacuum
records -- eliminating an extra visibility map update record. This
would mean a single WAL record emitted per block for vacuum's first
pass.
On master, for my example above, of the roughly 1 million WAL records
emitted by vacuum, about 1/3 of them are prune records, 1/3 are freeze
records, and 1/3 are visibility map update records. So we will achieve
another substantial reduction in the number of WAL records and bytes
of WAL record overhead by eliminating a separate record for updating
the VM.
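To see that per-block pattern directly, the same LSN range can be listed
with only the Heap2 records shown (again, placeholder LSNs):
# List individual Heap2 records; on master, vacuum's first pass emits a
# PRUNE, a FREEZE_PAGE, and a VISIBLE record for each page it freezes.
pg_waldump -p $PGDATA/pg_wal --rmgr=Heap2 \
    --start=<lsn_before_vacuum> --end=<lsn_after_vacuum>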
The attached patch set is broken up into many separate commits for
ease of review. Each patch does a single thing which can be explained
plainly in the commit message. Every commit passes tests and works on
its own.
0001 - 0003 cover checking tuples' xmins against the GlobalVisState in
heap_page_prune().
0004 - 0007 execute freezing in heap_page_prune() (for vacuum only).
0008 translates the eager/opportunistic freeze heuristic into one that
will work without relying on having a separate prune record. Elsewhere,
in [1], we are discussing how to improve this heuristic.
0009 - 0012 merge the freeze record into the prune record.
0013 - 0015 remove the loop through the page in lazy_scan_prune() by
doing the accounting it did in heap_page_prune() instead. A nice bonus of this
patch set is that we can eliminate one of vacuum's loops through the
page.
- Melanie
[1]: /messages/by-id/CAAKRu_ZTDm1d9M+ENf6oXhW9nRT3J76vOL1ianiCW4+4M6hMoA@mail.gmail.com
Attachments:
v1-0004-Add-reference-to-VacuumCutoffs-in-HeapPageFreeze.patch (text/x-patch)
From 7e9ca57da9c80918c3b4c391874869cca8939456 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 6 Jan 2024 16:22:17 -0500
Subject: [PATCH v1 04/15] Add reference to VacuumCutoffs in HeapPageFreeze
Future commits will move opportunistic freezing into the main path of
pruning in heap_page_prune(). Because on-access pruning will not do
opportunistic freezing, it is cleaner to keep the visibility information
required for calling heap_prepare_freeze_tuple() inside of the
HeapPageFreeze structure itself by saving a reference to VacuumCutoffs.
---
src/backend/access/heap/heapam.c | 32 ++++++++++++++--------------
src/backend/access/heap/vacuumlazy.c | 3 ++-
src/include/access/heapam.h | 2 +-
3 files changed, 19 insertions(+), 18 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 707460a5364..76eb67f746a 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6377,7 +6377,6 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
*/
bool
heap_prepare_freeze_tuple(HeapTupleHeader tuple,
- const struct VacuumCutoffs *cutoffs,
HeapPageFreeze *pagefrz,
HeapTupleFreeze *frz, bool *totally_frozen)
{
@@ -6405,14 +6404,14 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
xmin_already_frozen = true;
else
{
- if (TransactionIdPrecedes(xid, cutoffs->relfrozenxid))
+ if (TransactionIdPrecedes(xid, pagefrz->cutoffs->relfrozenxid))
ereport(ERROR,
(errcode(ERRCODE_DATA_CORRUPTED),
errmsg_internal("found xmin %u from before relfrozenxid %u",
- xid, cutoffs->relfrozenxid)));
+ xid, pagefrz->cutoffs->relfrozenxid)));
/* Will set freeze_xmin flags in freeze plan below */
- freeze_xmin = TransactionIdPrecedes(xid, cutoffs->OldestXmin);
+ freeze_xmin = TransactionIdPrecedes(xid, pagefrz->cutoffs->OldestXmin);
/* Verify that xmin committed if and when freeze plan is executed */
if (freeze_xmin)
@@ -6426,8 +6425,8 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
xid = HeapTupleHeaderGetXvac(tuple);
if (TransactionIdIsNormal(xid))
{
- Assert(TransactionIdPrecedesOrEquals(cutoffs->relfrozenxid, xid));
- Assert(TransactionIdPrecedes(xid, cutoffs->OldestXmin));
+ Assert(TransactionIdPrecedesOrEquals(pagefrz->cutoffs->relfrozenxid, xid));
+ Assert(TransactionIdPrecedes(xid, pagefrz->cutoffs->OldestXmin));
/*
* For Xvac, we always freeze proactively. This allows totally_frozen
@@ -6452,7 +6451,7 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
* perform no-op xmax processing. The only constraint is that the
* FreezeLimit/MultiXactCutoff postcondition must never be violated.
*/
- newxmax = FreezeMultiXactId(xid, tuple->t_infomask, cutoffs,
+ newxmax = FreezeMultiXactId(xid, tuple->t_infomask, pagefrz->cutoffs,
&flags, pagefrz);
if (flags & FRM_NOOP)
@@ -6476,7 +6475,7 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
* (This repeats work from FreezeMultiXactId, but allows "no
* freeze" tracker maintenance to happen in only one place.)
*/
- Assert(!MultiXactIdPrecedes(newxmax, cutoffs->MultiXactCutoff));
+ Assert(!MultiXactIdPrecedes(newxmax, pagefrz->cutoffs->MultiXactCutoff));
Assert(MultiXactIdIsValid(newxmax) && xid == newxmax);
}
else if (flags & FRM_RETURN_IS_XID)
@@ -6485,7 +6484,7 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
* xmax will become an updater Xid (original MultiXact's updater
* member Xid will be carried forward as a simple Xid in Xmax).
*/
- Assert(!TransactionIdPrecedes(newxmax, cutoffs->OldestXmin));
+ Assert(!TransactionIdPrecedes(newxmax, pagefrz->cutoffs->OldestXmin));
/*
* NB -- some of these transformations are only valid because we
@@ -6509,7 +6508,7 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
* xmax is an old MultiXactId that we have to replace with a new
* MultiXactId, to carry forward two or more original member XIDs.
*/
- Assert(!MultiXactIdPrecedes(newxmax, cutoffs->OldestMxact));
+ Assert(!MultiXactIdPrecedes(newxmax, pagefrz->cutoffs->OldestMxact));
/*
* We can't use GetMultiXactIdHintBits directly on the new multi
@@ -6544,14 +6543,14 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
else if (TransactionIdIsNormal(xid))
{
/* Raw xmax is normal XID */
- if (TransactionIdPrecedes(xid, cutoffs->relfrozenxid))
+ if (TransactionIdPrecedes(xid, pagefrz->cutoffs->relfrozenxid))
ereport(ERROR,
(errcode(ERRCODE_DATA_CORRUPTED),
errmsg_internal("found xmax %u from before relfrozenxid %u",
- xid, cutoffs->relfrozenxid)));
+ xid, pagefrz->cutoffs->relfrozenxid)));
/* Will set freeze_xmax flags in freeze plan below */
- freeze_xmax = TransactionIdPrecedes(xid, cutoffs->OldestXmin);
+ freeze_xmax = TransactionIdPrecedes(xid, pagefrz->cutoffs->OldestXmin);
/*
* Verify that xmax aborted if and when freeze plan is executed,
@@ -6631,7 +6630,7 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
* Does this tuple force caller to freeze the entire page?
*/
pagefrz->freeze_required =
- heap_tuple_should_freeze(tuple, cutoffs,
+ heap_tuple_should_freeze(tuple, pagefrz->cutoffs,
&pagefrz->NoFreezePageRelfrozenXid,
&pagefrz->NoFreezePageRelminMxid);
}
@@ -6953,8 +6952,9 @@ heap_freeze_tuple(HeapTupleHeader tuple,
pagefrz.NoFreezePageRelfrozenXid = FreezeLimit;
pagefrz.NoFreezePageRelminMxid = MultiXactCutoff;
- do_freeze = heap_prepare_freeze_tuple(tuple, &cutoffs,
- &pagefrz, &frz, &totally_frozen);
+ pagefrz.cutoffs = &cutoffs;
+
+ do_freeze = heap_prepare_freeze_tuple(tuple, &pagefrz, &frz, &totally_frozen);
/*
* Note that because this is not a WAL-logged operation, we don't need to
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 94c4a4cf1da..8651040f8de 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1413,6 +1413,7 @@ lazy_scan_prune(LVRelState *vacrel,
pagefrz.FreezePageRelminMxid = vacrel->NewRelminMxid;
pagefrz.NoFreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
pagefrz.NoFreezePageRelminMxid = vacrel->NewRelminMxid;
+ pagefrz.cutoffs = &vacrel->cutoffs;
tuples_frozen = 0;
lpdead_items = 0;
live_tuples = 0;
@@ -1558,7 +1559,7 @@ lazy_scan_prune(LVRelState *vacrel,
hastup = true; /* page makes rel truncation unsafe */
/* Tuple with storage -- consider need to freeze */
- if (heap_prepare_freeze_tuple(htup, &vacrel->cutoffs, &pagefrz,
+ if (heap_prepare_freeze_tuple(htup, &pagefrz,
&frozen[tuples_frozen], &totally_frozen))
{
/* Save prepared freeze plan for later */
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 4cfaf9ea46c..6823ab8b658 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -189,6 +189,7 @@ typedef struct HeapPageFreeze
TransactionId NoFreezePageRelfrozenXid;
MultiXactId NoFreezePageRelminMxid;
+ struct VacuumCutoffs *cutoffs;
} HeapPageFreeze;
/*
@@ -295,7 +296,6 @@ extern TM_Result heap_lock_tuple(Relation relation, HeapTuple tuple,
extern void heap_inplace_update(Relation relation, HeapTuple tuple);
extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
- const struct VacuumCutoffs *cutoffs,
HeapPageFreeze *pagefrz,
HeapTupleFreeze *frz, bool *totally_frozen);
extern void heap_freeze_execute_prepared(Relation rel, Buffer buffer,
--
2.37.2
v1-0001-lazy_scan_prune-tests-tuple-vis-with-GlobalVisTes.patch (text/x-patch)
From 91dc3d05c56b7587f94ac0220510d565d61557e0 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 6 Jan 2024 13:14:47 -0500
Subject: [PATCH v1 01/15] lazy_scan_prune tests tuple vis with GlobalVisTest
One requirement for eventually combining the prune and freeze records
is that we must check during pruning whether live tuples on the page are
visible to everyone and thus whether or not the page is all visible. We
only consider opportunistically freezing tuples if the whole page is all
visible and could be set all frozen.
During pruning (in heap_page_prune()), we do not have access to
VacuumCutoffs -- as on-access pruning also calls heap_page_prune(). We
do, however, have access to a GlobalVisState. This can be used to
determine whether or not the tuple is visible to everyone. It also has
the potential of being more up-to-date than VacuumCutoffs->OldestXmin.
This commit simply modifies lazy_scan_prune() to use GlobalVisState
instead of OldestXmin. Future commits will move the
all_visible/all_frozen calculation into heap_page_prune().
---
src/backend/access/heap/vacuumlazy.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 0fb3953513d..7428c83bdfd 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1553,8 +1553,7 @@ lazy_scan_prune(LVRelState *vacrel,
* that everyone sees it as committed?
*/
xmin = HeapTupleHeaderGetXmin(htup);
- if (!TransactionIdPrecedes(xmin,
- vacrel->cutoffs.OldestXmin))
+ if (!GlobalVisTestIsRemovableXid(vacrel->vistest, xmin))
{
all_visible = false;
break;
--
2.37.2
v1-0002-Pass-heap_prune_chain-PruneResult-output-paramete.patch (text/x-patch)
From 61fe6147df9127022b8a1030ab6717b1d5d17f05 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 6 Jan 2024 13:39:59 -0500
Subject: [PATCH v1 02/15] Pass heap_prune_chain() PruneResult output parameter
Future commits will set other members of PruneResult in
heap_prune_chain(), so start passing it as an output parameter now. This
eliminates the output parameter htsv -- the array of HTSV_Results --
since that is a member of the PruneResult.
---
src/backend/access/heap/pruneheap.c | 15 +++++++--------
1 file changed, 7 insertions(+), 8 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 59176335676..3e968f9d9b7 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -63,8 +63,7 @@ static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
Buffer buffer);
static int heap_prune_chain(Buffer buffer,
OffsetNumber rootoffnum,
- int8 *htsv,
- PruneState *prstate);
+ PruneState *prstate, PruneResult *presult);
static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
static void heap_prune_record_redirect(PruneState *prstate,
OffsetNumber offnum, OffsetNumber rdoffnum);
@@ -327,7 +326,7 @@ heap_page_prune(Relation relation, Buffer buffer,
/* Process this item or chain of items */
presult->ndeleted += heap_prune_chain(buffer, offnum,
- presult->htsv, &prstate);
+ &prstate, presult);
}
/* Clear the offset information once we have processed the given page. */
@@ -456,7 +455,7 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
/*
* Prune specified line pointer or a HOT chain originating at line pointer.
*
- * Tuple visibility information is provided in htsv.
+ * Tuple visibility information is provided in presult->htsv.
*
* If the item is an index-referenced tuple (i.e. not a heap-only tuple),
* the HOT chain is pruned by removing all DEAD tuples at the start of the HOT
@@ -486,7 +485,7 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
*/
static int
heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
- int8 *htsv, PruneState *prstate)
+ PruneState *prstate, PruneResult *presult)
{
int ndeleted = 0;
Page dp = (Page) BufferGetPage(buffer);
@@ -507,7 +506,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
*/
if (ItemIdIsNormal(rootlp))
{
- Assert(htsv[rootoffnum] != -1);
+ Assert(presult->htsv[rootoffnum] != -1);
htup = (HeapTupleHeader) PageGetItem(dp, rootlp);
if (HeapTupleHeaderIsHeapOnly(htup))
@@ -530,7 +529,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
* either here or while following a chain below. Whichever path
* gets there first will mark the tuple unused.
*/
- if (htsv[rootoffnum] == HEAPTUPLE_DEAD &&
+ if (presult->htsv[rootoffnum] == HEAPTUPLE_DEAD &&
!HeapTupleHeaderIsHotUpdated(htup))
{
heap_prune_record_unused(prstate, rootoffnum);
@@ -627,7 +626,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
*/
tupdead = recent_dead = false;
- switch (htsv_get_valid_status(htsv[offnum]))
+ switch (htsv_get_valid_status(presult->htsv[offnum]))
{
case HEAPTUPLE_DEAD:
tupdead = true;
--
2.37.2
v1-0003-heap_page_prune-sets-all_visible-and-frz_conflict.patch (text/x-patch)
From edf6b8938ce550a34bb63a697f72f31cddd7d0cb Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 6 Jan 2024 14:01:37 -0500
Subject: [PATCH v1 03/15] heap_page_prune sets all_visible and
frz_conflict_horizon
In order to combine the prune and freeze records, we must know if the
page is eligible to be opportunistically frozen before finishing
pruning. Save all_visible in the PruneResult and set it to false when we
see non-removable tuples which are not visible to everyone.
We will also need to ensure that the snapshotConflictHorizon for the combined
prune + freeze record is the more conservative of that calculated for each of
pruning and freezing. Calculate the visibility_cutoff_xid for the purposes of
freezing -- the newest xmin on the page -- in heap_page_prune() and save it in
PruneResult.frz_conflict_horizon.
---
src/backend/access/heap/pruneheap.c | 122 +++++++++++++++++++++++++--
src/backend/access/heap/vacuumlazy.c | 116 +++++++------------------
src/include/access/heapam.h | 3 +
3 files changed, 146 insertions(+), 95 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 3e968f9d9b7..5b2a27d5366 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -67,8 +67,10 @@ static int heap_prune_chain(Buffer buffer,
static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
static void heap_prune_record_redirect(PruneState *prstate,
OffsetNumber offnum, OffsetNumber rdoffnum);
-static void heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum);
-static void heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum);
+static void heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
+ PruneResult *presult);
+static void heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
+ PruneResult *presult);
static void heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum);
static void page_verify_redirects(Page page);
@@ -251,6 +253,14 @@ heap_page_prune(Relation relation, Buffer buffer,
presult->ndeleted = 0;
presult->nnewlpdead = 0;
+ /*
+ * Keep track of whether or not the page is all_visible in case the caller
+ * wants to use this information to update the VM.
+ */
+ presult->all_visible = true;
+ /* for recovery conflicts */
+ presult->frz_conflict_horizon = InvalidTransactionId;
+
maxoff = PageGetMaxOffsetNumber(page);
tup.t_tableOid = RelationGetRelid(prstate.rel);
@@ -302,8 +312,92 @@ heap_page_prune(Relation relation, Buffer buffer,
presult->htsv[offnum] = heap_prune_satisfies_vacuum(&prstate, &tup,
buffer);
+ switch (presult->htsv[offnum])
+ {
+ case HEAPTUPLE_DEAD:
+
+ /*
+ * Deliberately delay unsetting all_visible until later during
+ * pruning. Removable dead tuples shouldn't preclude freezing
+ * the page. After finishing this first pass of tuple
+ * visibility checks, initialize all_visible_except_removable
+ * with the current value of all_visible to indicate whether
+ * or not the page is all visible except for dead tuples. This
+ * will allow us to attempt to freeze the page after pruning.
+ * Later during pruning, if we encounter an LP_DEAD item or
+ * are setting an item LP_DEAD, we will unset all_visible. As
+ * long as we unset it before updating the visibility map,
+ * this will be correct.
+ */
+ break;
+ case HEAPTUPLE_LIVE:
+
+ /*
+ * Is the tuple definitely visible to all transactions?
+ *
+ * NB: Like with per-tuple hint bits, we can't set the
+ * PD_ALL_VISIBLE flag if the inserter committed
+ * asynchronously. See SetHintBits for more info. Check that
+ * the tuple is hinted xmin-committed because of that.
+ */
+ if (presult->all_visible)
+ {
+ TransactionId xmin;
+
+ if (!HeapTupleHeaderXminCommitted(htup))
+ {
+ presult->all_visible = false;
+ break;
+ }
+
+ /*
+ * The inserter definitely committed. But is it old enough
+ * that everyone sees it as committed?
+ */
+ xmin = HeapTupleHeaderGetXmin(htup);
+ if (!GlobalVisTestIsRemovableXid(vistest, xmin))
+ {
+ presult->all_visible = false;
+ break;
+ }
+
+ /* Track newest xmin on page. */
+ if (TransactionIdFollows(xmin, presult->frz_conflict_horizon) &&
+ TransactionIdIsNormal(xmin))
+ presult->frz_conflict_horizon = xmin;
+ }
+ break;
+ case HEAPTUPLE_RECENTLY_DEAD:
+ presult->all_visible = false;
+ break;
+ case HEAPTUPLE_INSERT_IN_PROGRESS:
+ presult->all_visible = false;
+ break;
+ case HEAPTUPLE_DELETE_IN_PROGRESS:
+ /* This is an expected case during concurrent vacuum */
+ presult->all_visible = false;
+ break;
+ default:
+ elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
+ break;
+ }
}
+ /*
+ * For vacuum, if the whole page will become frozen, we consider
+ * opportunistically freezing tuples. Dead tuples which will be removed by
+ * the end of vacuuming should not preclude us from opportunistically
+ * freezing. We will not be able to freeze the whole page if there are
+ * tuples present which are not visible to everyone or if there are dead
+ * tuples which are not yet removable. We need all_visible to be false if
+ * LP_DEAD tuples remain after pruning so that we do not incorrectly
+ * update the visibility map or page hint bit. So, we will update
+ * presult->all_visible to reflect the presence of LP_DEAD items while
+ * pruning and keep all_visible_except_removable to permit freezing if the
+ * whole page will eventually become all visible after removing tuples.
+ */
+ presult->all_visible_except_removable = presult->all_visible;
+
/* Scan the page */
for (offnum = FirstOffsetNumber;
offnum <= maxoff;
@@ -598,10 +692,14 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
/*
* If the caller set mark_unused_now true, we can set dead line
* pointers LP_UNUSED now. We don't increment ndeleted here since
- * the LP was already marked dead.
+ * the LP was already marked dead. If it will not be marked
+ * LP_UNUSED, it will remain LP_DEAD, making the page not
+ * all_visible.
*/
if (unlikely(prstate->mark_unused_now))
heap_prune_record_unused(prstate, offnum);
+ else
+ presult->all_visible = false;
break;
}
@@ -738,7 +836,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
* redirect the root to the correct chain member.
*/
if (i >= nchain)
- heap_prune_record_dead_or_unused(prstate, rootoffnum);
+ heap_prune_record_dead_or_unused(prstate, rootoffnum, presult);
else
heap_prune_record_redirect(prstate, rootoffnum, chainitems[i]);
}
@@ -751,7 +849,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
* redirect item. We can clean up by setting the redirect item to
* DEAD state or LP_UNUSED if the caller indicated.
*/
- heap_prune_record_dead_or_unused(prstate, rootoffnum);
+ heap_prune_record_dead_or_unused(prstate, rootoffnum, presult);
}
return ndeleted;
@@ -788,13 +886,20 @@ heap_prune_record_redirect(PruneState *prstate,
/* Record line pointer to be marked dead */
static void
-heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum)
+heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
+ PruneResult *presult)
{
Assert(prstate->ndead < MaxHeapTuplesPerPage);
prstate->nowdead[prstate->ndead] = offnum;
prstate->ndead++;
Assert(!prstate->marked[offnum]);
prstate->marked[offnum] = true;
+
+ /*
+ * Setting the line pointer LP_DEAD means the page will definitely not be
+ * all_visible.
+ */
+ presult->all_visible = false;
}
/*
@@ -804,7 +909,8 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum)
* pointers LP_DEAD if mark_unused_now is true.
*/
static void
-heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum)
+heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
+ PruneResult *presult)
{
/*
* If the caller set mark_unused_now to true, we can remove dead tuples
@@ -815,7 +921,7 @@ heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum)
if (unlikely(prstate->mark_unused_now))
heap_prune_record_unused(prstate, offnum);
else
- heap_prune_record_dead(prstate, offnum);
+ heap_prune_record_dead(prstate, offnum, presult);
}
/* Record line pointer to be marked unused */
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 7428c83bdfd..94c4a4cf1da 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1393,9 +1393,7 @@ lazy_scan_prune(LVRelState *vacrel,
recently_dead_tuples;
HeapPageFreeze pagefrz;
bool hastup = false;
- bool all_visible,
- all_frozen;
- TransactionId visibility_cutoff_xid;
+ bool all_frozen;
int64 fpi_before = pgWalUsage.wal_fpi;
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
@@ -1436,17 +1434,16 @@ lazy_scan_prune(LVRelState *vacrel,
&presult, &vacrel->offnum);
/*
- * We will update the VM after collecting LP_DEAD items and freezing
- * tuples. Keep track of whether or not the page is all_visible and
- * all_frozen and use this information to update the VM. all_visible
- * implies 0 lpdead_items, but don't trust all_frozen result unless
- * all_visible is also set to true.
+ * Now scan the page to collect LP_DEAD items and check for tuples
+ * requiring freezing among remaining tuples with storage. We will update
+ * the VM after collecting LP_DEAD items and freezing tuples. Pruning will
+ * have determined whether or not the page is all_visible. Keep track of
+ * whether or not the page is all_frozen and use this information to
+ * update the VM. all_visible implies lpdead_items == 0, but don't trust
+ * all_frozen result unless all_visible is also set to true.
*
- * Also keep track of the visibility cutoff xid for recovery conflicts.
*/
- all_visible = true;
all_frozen = true;
- visibility_cutoff_xid = InvalidTransactionId;
/*
* Now scan the page to collect LP_DEAD items and update the variables set
@@ -1487,11 +1484,6 @@ lazy_scan_prune(LVRelState *vacrel,
* will only happen every other VACUUM, at most. Besides, VACUUM
* must treat hastup/nonempty_pages as provisional no matter how
* LP_DEAD items are handled (handled here, or handled later on).
- *
- * Also deliberately delay unsetting all_visible until just before
- * we return to lazy_scan_heap caller, as explained in full below.
- * (This is another case where it's useful to anticipate that any
- * LP_DEAD items will become LP_UNUSED during the ongoing VACUUM.)
*/
deadoffsets[lpdead_items++] = offnum;
continue;
@@ -1529,41 +1521,6 @@ lazy_scan_prune(LVRelState *vacrel,
* what acquire_sample_rows() does.
*/
live_tuples++;
-
- /*
- * Is the tuple definitely visible to all transactions?
- *
- * NB: Like with per-tuple hint bits, we can't set the
- * PD_ALL_VISIBLE flag if the inserter committed
- * asynchronously. See SetHintBits for more info. Check that
- * the tuple is hinted xmin-committed because of that.
- */
- if (all_visible)
- {
- TransactionId xmin;
-
- if (!HeapTupleHeaderXminCommitted(htup))
- {
- all_visible = false;
- break;
- }
-
- /*
- * The inserter definitely committed. But is it old enough
- * that everyone sees it as committed?
- */
- xmin = HeapTupleHeaderGetXmin(htup);
- if (!GlobalVisTestIsRemovableXid(vacrel->vistest, xmin))
- {
- all_visible = false;
- break;
- }
-
- /* Track newest xmin on page. */
- if (TransactionIdFollows(xmin, visibility_cutoff_xid) &&
- TransactionIdIsNormal(xmin))
- visibility_cutoff_xid = xmin;
- }
break;
case HEAPTUPLE_RECENTLY_DEAD:
@@ -1573,7 +1530,6 @@ lazy_scan_prune(LVRelState *vacrel,
* pruning.)
*/
recently_dead_tuples++;
- all_visible = false;
break;
case HEAPTUPLE_INSERT_IN_PROGRESS:
@@ -1584,16 +1540,13 @@ lazy_scan_prune(LVRelState *vacrel,
* results. This assumption is a bit shaky, but it is what
* acquire_sample_rows() does, so be consistent.
*/
- all_visible = false;
break;
case HEAPTUPLE_DELETE_IN_PROGRESS:
- /* This is an expected case during concurrent vacuum */
- all_visible = false;
/*
- * Count such rows as live. As above, we assume the deleting
- * transaction will commit and update the counters after we
- * report.
+ * This an expected case during concurrent vacuum. Count such
+ * rows as live. As above, we assume the deleting transaction
+ * will commit and update the counters after we report.
*/
live_tuples++;
break;
@@ -1636,7 +1589,7 @@ lazy_scan_prune(LVRelState *vacrel,
* page all-frozen afterwards (might not happen until final heap pass).
*/
if (pagefrz.freeze_required || tuples_frozen == 0 ||
- (all_visible && all_frozen &&
+ (presult.all_visible_except_removable && all_frozen &&
fpi_before != pgWalUsage.wal_fpi))
{
/*
@@ -1669,16 +1622,16 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->frozen_pages++;
/*
- * We can use visibility_cutoff_xid as our cutoff for conflicts
+ * We can use frz_conflict_horizon as our cutoff for conflicts
* when the whole page is eligible to become all-frozen in the VM
* once we're done with it. Otherwise we generate a conservative
* cutoff by stepping back from OldestXmin.
*/
- if (all_visible && all_frozen)
+ if (presult.all_visible_except_removable && all_frozen)
{
/* Using same cutoff when setting VM is now unnecessary */
- snapshotConflictHorizon = visibility_cutoff_xid;
- visibility_cutoff_xid = InvalidTransactionId;
+ snapshotConflictHorizon = presult.frz_conflict_horizon;
+ presult.frz_conflict_horizon = InvalidTransactionId;
}
else
{
@@ -1714,17 +1667,19 @@ lazy_scan_prune(LVRelState *vacrel,
*/
#ifdef USE_ASSERT_CHECKING
/* Note that all_frozen value does not matter when !all_visible */
- if (all_visible && lpdead_items == 0)
+ if (presult.all_visible)
{
TransactionId debug_cutoff;
bool debug_all_frozen;
+ Assert(lpdead_items == 0);
+
if (!heap_page_is_all_visible(vacrel, buf,
&debug_cutoff, &debug_all_frozen))
Assert(false);
Assert(!TransactionIdIsValid(debug_cutoff) ||
- debug_cutoff == visibility_cutoff_xid);
+ debug_cutoff == presult.frz_conflict_horizon);
}
#endif
@@ -1749,19 +1704,6 @@ lazy_scan_prune(LVRelState *vacrel,
Assert(dead_items->num_items <= dead_items->max_items);
pgstat_progress_update_param(PROGRESS_VACUUM_NUM_DEAD_TUPLES,
dead_items->num_items);
-
- /*
- * It was convenient to ignore LP_DEAD items in all_visible earlier on
- * to make the choice of whether or not to freeze the page unaffected
- * by the short-term presence of LP_DEAD items. These LP_DEAD items
- * were effectively assumed to be LP_UNUSED items in the making. It
- * doesn't matter which heap pass (initial pass or final pass) ends up
- * setting the page all-frozen, as long as the ongoing VACUUM does it.
- *
- * Now that freezing has been finalized, unset all_visible. It needs
- * to reflect the present state of things, as expected by our caller.
- */
- all_visible = false;
}
/* Finally, add page-local counts to whole-VACUUM counts */
@@ -1778,20 +1720,20 @@ lazy_scan_prune(LVRelState *vacrel,
/* Did we find LP_DEAD items? */
*has_lpdead_items = (lpdead_items > 0);
- Assert(!all_visible || !(*has_lpdead_items));
+ Assert(!presult.all_visible || !(*has_lpdead_items));
/*
* Handle setting visibility map bit based on information from the VM (as
* of last lazy_scan_skip() call), and from all_visible and all_frozen
* variables
*/
- if (!all_visible_according_to_vm && all_visible)
+ if (!all_visible_according_to_vm && presult.all_visible)
{
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
if (all_frozen)
{
- Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+ Assert(!TransactionIdIsValid(presult.frz_conflict_horizon));
flags |= VISIBILITYMAP_ALL_FROZEN;
}
@@ -1811,7 +1753,7 @@ lazy_scan_prune(LVRelState *vacrel,
PageSetAllVisible(page);
MarkBufferDirty(buf);
visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
- vmbuffer, visibility_cutoff_xid,
+ vmbuffer, presult.frz_conflict_horizon,
flags);
}
@@ -1859,7 +1801,7 @@ lazy_scan_prune(LVRelState *vacrel,
* it as all-frozen. Note that all_frozen is only valid if all_visible is
* true, so we must check both all_visible and all_frozen.
*/
- else if (all_visible_according_to_vm && all_visible &&
+ else if (all_visible_according_to_vm && presult.all_visible &&
all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
{
/*
@@ -1876,11 +1818,11 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* Set the page all-frozen (and all-visible) in the VM.
*
- * We can pass InvalidTransactionId as our visibility_cutoff_xid,
- * since a snapshotConflictHorizon sufficient to make everything safe
- * for REDO was logged when the page's tuples were frozen.
+ * We can pass InvalidTransactionId as our frz_conflict_horizon, since
+ * a snapshotConflictHorizon sufficient to make everything safe for
+ * REDO was logged when the page's tuples were frozen.
*/
- Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+ Assert(!TransactionIdIsValid(presult.frz_conflict_horizon));
visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
vmbuffer, InvalidTransactionId,
VISIBILITYMAP_ALL_VISIBLE |
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 4b133f68593..4cfaf9ea46c 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -198,6 +198,8 @@ typedef struct PruneResult
{
int ndeleted; /* Number of tuples deleted from the page */
int nnewlpdead; /* Number of newly LP_DEAD items */
+ bool all_visible; /* Whether or not the page is all visible */
+ TransactionId frz_conflict_horizon; /* Newest xmin on the page */
/*
* Tuple visibility is only computed once for each tuple, for correctness
@@ -209,6 +211,7 @@ typedef struct PruneResult
* 1. Otherwise every access would need to subtract 1.
*/
int8 htsv[MaxHeapTuplesPerPage + 1];
+ bool all_visible_except_removable;
} PruneResult;
/*
--
2.37.2
v1-0005-Prepare-freeze-tuples-in-heap_page_prune.patch (text/x-patch)
From d12d129702ec254f4c794cc1c60e466cc3eecf18 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sun, 7 Jan 2024 11:18:52 -0500
Subject: [PATCH v1 05/15] Prepare freeze tuples in heap_page_prune()
In order to combine the freeze and prune records, we must determine
which tuples are freezable before actually executing pruning. All of the
page modifications should be made in the same critical section along
with emitting the combined WAL record. While pruning, determine whether
tuples should or must be frozen and whether the page will become all
frozen as a consequence.
---
src/backend/access/heap/pruneheap.c | 78 ++++++++++++++++++++++++++--
src/backend/access/heap/vacuumlazy.c | 68 ++++++------------------
src/include/access/heapam.h | 13 +++++
3 files changed, 102 insertions(+), 57 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 5b2a27d5366..d05bd5c0723 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -64,6 +64,9 @@ static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
static int heap_prune_chain(Buffer buffer,
OffsetNumber rootoffnum,
PruneState *prstate, PruneResult *presult);
+
+static void prune_prepare_freeze_tuple(Page page, OffsetNumber offnum,
+ HeapPageFreeze *pagefrz, PruneResult *presult);
static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
static void heap_prune_record_redirect(PruneState *prstate,
OffsetNumber offnum, OffsetNumber rdoffnum);
@@ -157,7 +160,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* not the relation has indexes, since we cannot safely determine
* that during on-access pruning with the current implementation.
*/
- heap_page_prune(relation, buffer, vistest, false,
+ heap_page_prune(relation, buffer, vistest, false, NULL,
&presult, NULL);
/*
@@ -206,6 +209,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* mark_unused_now indicates whether or not dead items can be set LP_UNUSED during
* pruning.
*
+ * pagefrz contains both input and output parameters used if the caller is
+ * interested in potentially freezing tuples on the page.
+ *
* off_loc is the offset location required by the caller to use in error
* callback.
*
@@ -217,6 +223,7 @@ void
heap_page_prune(Relation relation, Buffer buffer,
GlobalVisState *vistest,
bool mark_unused_now,
+ HeapPageFreeze *pagefrz,
PruneResult *presult,
OffsetNumber *off_loc)
{
@@ -252,6 +259,7 @@ heap_page_prune(Relation relation, Buffer buffer,
*/
presult->ndeleted = 0;
presult->nnewlpdead = 0;
+ presult->nfrozen = 0;
/*
* Keep track of whether or not the page is all_visible in case the caller
@@ -398,6 +406,15 @@ heap_page_prune(Relation relation, Buffer buffer,
*/
presult->all_visible_except_removable = presult->all_visible;
+ /*
+ * We will update the VM after pruning, collecting LP_DEAD items, and
+ * freezing tuples. Keep track of whether or not the page is all_visible
+ * and all_frozen and use this information to update the VM. all_visible
+ * implies lpdead_items == 0, but don't trust all_frozen result unless
+ * all_visible is also set to true.
+ */
+ presult->all_frozen = true;
+
/* Scan the page */
for (offnum = FirstOffsetNumber;
offnum <= maxoff;
@@ -405,14 +422,18 @@ heap_page_prune(Relation relation, Buffer buffer,
{
ItemId itemid;
- /* Ignore items already processed as part of an earlier chain */
- if (prstate.marked[offnum])
- continue;
-
/* see preceding loop */
if (off_loc)
*off_loc = offnum;
+ if (pagefrz)
+ prune_prepare_freeze_tuple(page, offnum,
+ pagefrz, presult);
+
+ /* Ignore items already processed as part of an earlier chain */
+ if (prstate.marked[offnum])
+ continue;
+
/* Nothing to do if slot is empty */
itemid = PageGetItemId(page, offnum);
if (!ItemIdIsUsed(itemid))
@@ -855,6 +876,53 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
return ndeleted;
}
+/*
+ * While pruning, before actually executing pruning and updating the line
+ * pointers, we may consider freezing tuples referred to by LP_NORMAL line
+ * pointers whose visibility status is not HEAPTUPLE_DEAD. That is to say, we
+ * want to consider freezing normal tuples which will not be removed.
+*/
+static void
+prune_prepare_freeze_tuple(Page page, OffsetNumber offnum,
+ HeapPageFreeze *pagefrz,
+ PruneResult *presult)
+{
+ bool totally_frozen;
+ HeapTupleHeader htup;
+ ItemId itemid;
+
+ Assert(pagefrz);
+
+ itemid = PageGetItemId(page, offnum);
+
+ if (!ItemIdIsNormal(itemid))
+ return;
+
+ /* We do not consider freezing tuples which will be removed. */
+ if (presult->htsv[offnum] == HEAPTUPLE_DEAD ||
+ presult->htsv[offnum] == -1)
+ return;
+
+ htup = (HeapTupleHeader) PageGetItem(page, itemid);
+
+ /* Tuple with storage -- consider need to freeze */
+ if ((heap_prepare_freeze_tuple(htup, pagefrz,
+ &presult->frozen[presult->nfrozen],
+ &totally_frozen)))
+ {
+ /* Save prepared freeze plan for later */
+ presult->frozen[presult->nfrozen++].offset = offnum;
+ }
+
+ /*
+ * If any tuple isn't either totally frozen already or eligible to become
+ * totally frozen (according to its freeze plan), then the page definitely
+ * cannot be set all-frozen in the visibility map later on
+ */
+ if (!totally_frozen)
+ presult->all_frozen = false;
+}
+
/* Record lowest soon-prunable XID */
static void
heap_prune_record_prunable(PruneState *prstate, TransactionId xid)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8651040f8de..f4ea4d603c0 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1387,16 +1387,13 @@ lazy_scan_prune(LVRelState *vacrel,
maxoff;
ItemId itemid;
PruneResult presult;
- int tuples_frozen,
- lpdead_items,
+ int lpdead_items,
live_tuples,
recently_dead_tuples;
HeapPageFreeze pagefrz;
bool hastup = false;
- bool all_frozen;
int64 fpi_before = pgWalUsage.wal_fpi;
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
- HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
Assert(BufferGetBlockNumber(buf) == blkno);
@@ -1414,7 +1411,6 @@ lazy_scan_prune(LVRelState *vacrel,
pagefrz.NoFreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
pagefrz.NoFreezePageRelminMxid = vacrel->NewRelminMxid;
pagefrz.cutoffs = &vacrel->cutoffs;
- tuples_frozen = 0;
lpdead_items = 0;
live_tuples = 0;
recently_dead_tuples = 0;
@@ -1432,31 +1428,20 @@ lazy_scan_prune(LVRelState *vacrel,
* false otherwise.
*/
heap_page_prune(rel, buf, vacrel->vistest, vacrel->nindexes == 0,
- &presult, &vacrel->offnum);
+ &pagefrz, &presult, &vacrel->offnum);
/*
* Now scan the page to collect LP_DEAD items and check for tuples
* requiring freezing among remaining tuples with storage. We will update
* the VM after collecting LP_DEAD items and freezing tuples. Pruning will
- * have determined whether or not the page is all_visible. Keep track of
- * whether or not the page is all_frozen and use this information to
- * update the VM. all_visible implies lpdead_items == 0, but don't trust
- * all_frozen result unless all_visible is also set to true.
+ * have determined whether or not the page is all_visible and able to
+ * become all_frozen.
*
*/
- all_frozen = true;
-
- /*
- * Now scan the page to collect LP_DEAD items and update the variables set
- * just above.
- */
for (offnum = FirstOffsetNumber;
offnum <= maxoff;
offnum = OffsetNumberNext(offnum))
{
- HeapTupleHeader htup;
- bool totally_frozen;
-
/*
* Set the offset number so that we can display it along with any
* error that occurred while processing this tuple.
@@ -1492,8 +1477,6 @@ lazy_scan_prune(LVRelState *vacrel,
Assert(ItemIdIsNormal(itemid));
- htup = (HeapTupleHeader) PageGetItem(page, itemid);
-
/*
* The criteria for counting a tuple as live in this block need to
* match what analyze.c's acquire_sample_rows() does, otherwise VACUUM
@@ -1558,29 +1541,8 @@ lazy_scan_prune(LVRelState *vacrel,
hastup = true; /* page makes rel truncation unsafe */
- /* Tuple with storage -- consider need to freeze */
- if (heap_prepare_freeze_tuple(htup, &pagefrz,
- &frozen[tuples_frozen], &totally_frozen))
- {
- /* Save prepared freeze plan for later */
- frozen[tuples_frozen++].offset = offnum;
- }
-
- /*
- * If any tuple isn't either totally frozen already or eligible to
- * become totally frozen (according to its freeze plan), then the page
- * definitely cannot be set all-frozen in the visibility map later on
- */
- if (!totally_frozen)
- all_frozen = false;
}
- /*
- * We have now divided every item on the page into either an LP_DEAD item
- * that will need to be vacuumed in indexes later, or a LP_NORMAL tuple
- * that remains and needs to be considered for freezing now (LP_UNUSED and
- * LP_REDIRECT items also remain, but are of no further interest to us).
- */
vacrel->offnum = InvalidOffsetNumber;
/*
@@ -1589,8 +1551,8 @@ lazy_scan_prune(LVRelState *vacrel,
* freeze when pruning generated an FPI, if doing so means that we set the
* page all-frozen afterwards (might not happen until final heap pass).
*/
- if (pagefrz.freeze_required || tuples_frozen == 0 ||
- (presult.all_visible_except_removable && all_frozen &&
+ if (pagefrz.freeze_required || presult.nfrozen == 0 ||
+ (presult.all_visible_except_removable && presult.all_frozen &&
fpi_before != pgWalUsage.wal_fpi))
{
/*
@@ -1600,7 +1562,7 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->NewRelfrozenXid = pagefrz.FreezePageRelfrozenXid;
vacrel->NewRelminMxid = pagefrz.FreezePageRelminMxid;
- if (tuples_frozen == 0)
+ if (presult.nfrozen == 0)
{
/*
* We have no freeze plans to execute, so there's no added cost
@@ -1628,7 +1590,7 @@ lazy_scan_prune(LVRelState *vacrel,
* once we're done with it. Otherwise we generate a conservative
* cutoff by stepping back from OldestXmin.
*/
- if (presult.all_visible_except_removable && all_frozen)
+ if (presult.all_visible_except_removable && presult.all_frozen)
{
/* Using same cutoff when setting VM is now unnecessary */
snapshotConflictHorizon = presult.frz_conflict_horizon;
@@ -1644,7 +1606,7 @@ lazy_scan_prune(LVRelState *vacrel,
/* Execute all freeze plans for page as a single atomic action */
heap_freeze_execute_prepared(vacrel->rel, buf,
snapshotConflictHorizon,
- frozen, tuples_frozen);
+ presult.frozen, presult.nfrozen);
}
}
else
@@ -1655,8 +1617,8 @@ lazy_scan_prune(LVRelState *vacrel,
*/
vacrel->NewRelfrozenXid = pagefrz.NoFreezePageRelfrozenXid;
vacrel->NewRelminMxid = pagefrz.NoFreezePageRelminMxid;
- all_frozen = false;
- tuples_frozen = 0; /* avoid miscounts in instrumentation */
+ presult.all_frozen = false;
+ presult.nfrozen = 0; /* avoid miscounts in instrumentation */
}
/*
@@ -1679,6 +1641,8 @@ lazy_scan_prune(LVRelState *vacrel,
&debug_cutoff, &debug_all_frozen))
Assert(false);
+ Assert(presult.all_frozen == debug_all_frozen);
+
Assert(!TransactionIdIsValid(debug_cutoff) ||
debug_cutoff == presult.frz_conflict_horizon);
}
@@ -1709,7 +1673,7 @@ lazy_scan_prune(LVRelState *vacrel,
/* Finally, add page-local counts to whole-VACUUM counts */
vacrel->tuples_deleted += presult.ndeleted;
- vacrel->tuples_frozen += tuples_frozen;
+ vacrel->tuples_frozen += presult.nfrozen;
vacrel->lpdead_items += lpdead_items;
vacrel->live_tuples += live_tuples;
vacrel->recently_dead_tuples += recently_dead_tuples;
@@ -1732,7 +1696,7 @@ lazy_scan_prune(LVRelState *vacrel,
{
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
- if (all_frozen)
+ if (presult.all_frozen)
{
Assert(!TransactionIdIsValid(presult.frz_conflict_horizon));
flags |= VISIBILITYMAP_ALL_FROZEN;
@@ -1803,7 +1767,7 @@ lazy_scan_prune(LVRelState *vacrel,
* true, so we must check both all_visible and all_frozen.
*/
else if (all_visible_according_to_vm && presult.all_visible &&
- all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
+ presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
{
/*
* Avoid relying on all_visible_according_to_vm as a proxy for the
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 6823ab8b658..bea35afc4bd 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -212,7 +212,19 @@ typedef struct PruneResult
* 1. Otherwise every access would need to subtract 1.
*/
int8 htsv[MaxHeapTuplesPerPage + 1];
+
bool all_visible_except_removable;
+
+ /* Whether or not the page can be set all frozen in the VM */
+ bool all_frozen;
+
+ /* Number of newly frozen tuples */
+ int nfrozen;
+
+ /*
+ * One entry for every tuple that we may freeze.
+ */
+ HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
} PruneResult;
/*
@@ -324,6 +336,7 @@ extern void heap_page_prune_opt(Relation relation, Buffer buffer);
extern void heap_page_prune(Relation relation, Buffer buffer,
struct GlobalVisState *vistest,
bool mark_unused_now,
+ HeapPageFreeze *pagefrz,
PruneResult *presult,
OffsetNumber *off_loc);
extern void heap_page_prune_execute(Buffer buffer,
--
2.37.2
v1-0006-lazy_scan_prune-reorder-freeze-execution-logic.patch (text/x-patch)
From 058dc9caf0d9b84d4a05c6c01194dfa1ef6db543 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sun, 7 Jan 2024 14:50:12 -0500
Subject: [PATCH v1 06/15] lazy_scan_prune reorder freeze execution logic
To combine the prune and freeze records, freezing must be done before a
pruning WAL record is emitted. We will move the freeze execution into
heap_page_prune() in future commits. lazy_scan_prune() currently
executes freezing, updates vacrel->NewRelfrozenXid and
vacrel->NewRelminMxid, and resets the snapshotConflictHorizon that the
visibility map update record may use, all in the same block of if
statements. This commit starts reordering that logic so that the freeze
execution can be separated from the other updates, which should not be
done in pruning. It also adds a helper for calculating the freeze snapshot conflict
horizon. This will be useful when the freeze execution is moved into
pruning because not all callers of heap_page_prune() have access to
VacuumCutoffs.
---
src/backend/access/heap/vacuumlazy.c | 112 ++++++++++++++++-----------
1 file changed, 67 insertions(+), 45 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index f4ea4d603c0..90741aad17b 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -270,6 +270,8 @@ static void update_vacuum_error_info(LVRelState *vacrel,
static void restore_vacuum_error_info(LVRelState *vacrel,
const LVSavedErrInfo *saved_vacrel);
+static TransactionId heap_frz_conflict_horizon(PruneResult *presult,
+ HeapPageFreeze *pagefrz);
/*
* heap_vacuum_rel() -- perform VACUUM for one heap relation
@@ -1344,6 +1346,33 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
return false;
}
+/*
+ * Determine the snapshotConflictHorizon for freezing. Must only be called
+ * after pruning and determining if the page is freezable.
+ */
+static TransactionId
+heap_frz_conflict_horizon(PruneResult *presult, HeapPageFreeze *pagefrz)
+{
+ TransactionId result;
+
+ /*
+ * We can use frz_conflict_horizon as our cutoff for conflicts when the
+ * whole page is eligible to become all-frozen in the VM once we're done
+ * with it. Otherwise we generate a conservative cutoff by stepping back
+ * from OldestXmin.
+ */
+ if (presult->all_visible_except_removable && presult->all_frozen)
+ result = presult->frz_conflict_horizon;
+ else
+ {
+ /* Avoids false conflicts when hot_standby_feedback in use */
+ result = pagefrz->cutoffs->OldestXmin;
+ TransactionIdRetreat(result);
+ }
+
+ return result;
+}
+
/*
* lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
*
@@ -1392,6 +1421,7 @@ lazy_scan_prune(LVRelState *vacrel,
recently_dead_tuples;
HeapPageFreeze pagefrz;
bool hastup = false;
+ bool do_freeze;
int64 fpi_before = pgWalUsage.wal_fpi;
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
@@ -1551,10 +1581,15 @@ lazy_scan_prune(LVRelState *vacrel,
* freeze when pruning generated an FPI, if doing so means that we set the
* page all-frozen afterwards (might not happen until final heap pass).
*/
- if (pagefrz.freeze_required || presult.nfrozen == 0 ||
+ do_freeze = pagefrz.freeze_required ||
(presult.all_visible_except_removable && presult.all_frozen &&
- fpi_before != pgWalUsage.wal_fpi))
+ presult.nfrozen > 0 &&
+ fpi_before != pgWalUsage.wal_fpi);
+
+ if (do_freeze)
{
+ TransactionId snapshotConflictHorizon;
+
/*
* We're freezing the page. Our final NewRelfrozenXid doesn't need to
* be affected by the XIDs that are just about to be frozen anyway.
@@ -1562,52 +1597,39 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->NewRelfrozenXid = pagefrz.FreezePageRelfrozenXid;
vacrel->NewRelminMxid = pagefrz.FreezePageRelminMxid;
- if (presult.nfrozen == 0)
- {
- /*
- * We have no freeze plans to execute, so there's no added cost
- * from following the freeze path. That's why it was chosen. This
- * is important in the case where the page only contains totally
- * frozen tuples at this point (perhaps only following pruning).
- * Such pages can be marked all-frozen in the VM by our caller,
- * even though none of its tuples were newly frozen here (note
- * that the "no freeze" path never sets pages all-frozen).
- *
- * We never increment the frozen_pages instrumentation counter
- * here, since it only counts pages with newly frozen tuples
- * (don't confuse that with pages newly set all-frozen in VM).
- */
- }
- else
- {
- TransactionId snapshotConflictHorizon;
+ vacrel->frozen_pages++;
- vacrel->frozen_pages++;
+ snapshotConflictHorizon = heap_frz_conflict_horizon(&presult, &pagefrz);
- /*
- * We can use frz_conflict_horizon as our cutoff for conflicts
- * when the whole page is eligible to become all-frozen in the VM
- * once we're done with it. Otherwise we generate a conservative
- * cutoff by stepping back from OldestXmin.
- */
- if (presult.all_visible_except_removable && presult.all_frozen)
- {
- /* Using same cutoff when setting VM is now unnecessary */
- snapshotConflictHorizon = presult.frz_conflict_horizon;
- presult.frz_conflict_horizon = InvalidTransactionId;
- }
- else
- {
- /* Avoids false conflicts when hot_standby_feedback in use */
- snapshotConflictHorizon = vacrel->cutoffs.OldestXmin;
- TransactionIdRetreat(snapshotConflictHorizon);
- }
+ /* Using same cutoff when setting VM is now unnecessary */
+ if (presult.all_visible_except_removable && presult.all_frozen)
+ presult.frz_conflict_horizon = InvalidTransactionId;
- /* Execute all freeze plans for page as a single atomic action */
- heap_freeze_execute_prepared(vacrel->rel, buf,
- snapshotConflictHorizon,
- presult.frozen, presult.nfrozen);
- }
+ /* Execute all freeze plans for page as a single atomic action */
+ heap_freeze_execute_prepared(vacrel->rel, buf,
+ snapshotConflictHorizon,
+ presult.frozen, presult.nfrozen);
+ }
+ else if (presult.all_frozen && presult.nfrozen == 0)
+ {
+ /* Page should be all visible except to-be-removed tuples */
+ Assert(presult.all_visible_except_removable);
+
+ /*
+ * We have no freeze plans to execute, so there's no added cost from
+ * following the freeze path. That's why it was chosen. This is
+ * important in the case where the page only contains totally frozen
+ * tuples at this point (perhaps only following pruning). Such pages
+ * can be marked all-frozen in the VM by our caller, even though none
+ * of its tuples were newly frozen here (note that the "no freeze"
+ * path never sets pages all-frozen).
+ *
+ * We never increment the frozen_pages instrumentation counter here,
+ * since it only counts pages with newly frozen tuples (don't confuse
+ * that with pages newly set all-frozen in VM).
+ */
+ vacrel->NewRelfrozenXid = pagefrz.FreezePageRelfrozenXid;
+ vacrel->NewRelminMxid = pagefrz.FreezePageRelminMxid;
}
else
{
--
2.37.2
v1-0009-Separate-tuple-pre-freeze-checks-and-invoke-earli.patch (text/x-patch)
From aa1b1d236fec0787dadeba8e3966ba5b772ab7a2 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sun, 7 Jan 2024 16:53:45 -0500
Subject: [PATCH v1 09/15] Separate tuple pre freeze checks and invoke earlier
When combining the prune and freeze records, their critical sections will
have to be combined. heap_freeze_execute_prepared() does a set of pre-freeze
validations before starting its critical section. Move these
validations into a helper function, heap_pre_freeze_checks(), and invoke
it in heap_page_prune() before the pruning critical section.
Also move up the calculation of the freeze snapshot conflict horizon.
---
src/backend/access/heap/heapam.c | 58 ++++++++++++++++-------------
src/backend/access/heap/pruneheap.c | 9 +++--
src/include/access/heapam.h | 3 ++
3 files changed, 42 insertions(+), 28 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 76eb67f746a..91f8a0f3a9e 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6664,35 +6664,19 @@ heap_execute_freeze_tuple(HeapTupleHeader tuple, HeapTupleFreeze *frz)
}
/*
- * heap_freeze_execute_prepared
- *
- * Executes freezing of one or more heap tuples on a page on behalf of caller.
- * Caller passes an array of tuple plans from heap_prepare_freeze_tuple.
- * Caller must set 'offset' in each plan for us. Note that we destructively
- * sort caller's tuples array in-place, so caller had better be done with it.
- *
- * WAL-logs the changes so that VACUUM can advance the rel's relfrozenxid
- * later on without any risk of unsafe pg_xact lookups, even following a hard
- * crash (or when querying from a standby). We represent freezing by setting
- * infomask bits in tuple headers, but this shouldn't be thought of as a hint.
- * See section on buffer access rules in src/backend/storage/buffer/README.
- */
+* Perform xmin/xmax XID status sanity checks before calling
+* heap_freeze_execute_prepared().
+*
+* heap_prepare_freeze_tuple doesn't perform these checks directly because
+* pg_xact lookups are relatively expensive. They shouldn't be repeated
+* by successive VACUUMs that each decide against freezing the same page.
+*/
void
-heap_freeze_execute_prepared(Relation rel, Buffer buffer,
- TransactionId snapshotConflictHorizon,
- HeapTupleFreeze *tuples, int ntuples)
+heap_pre_freeze_checks(Buffer buffer,
+ HeapTupleFreeze *tuples, int ntuples)
{
Page page = BufferGetPage(buffer);
- Assert(ntuples > 0);
-
- /*
- * Perform xmin/xmax XID status sanity checks before critical section.
- *
- * heap_prepare_freeze_tuple doesn't perform these checks directly because
- * pg_xact lookups are relatively expensive. They shouldn't be repeated
- * by successive VACUUMs that each decide against freezing the same page.
- */
for (int i = 0; i < ntuples; i++)
{
HeapTupleFreeze *frz = tuples + i;
@@ -6731,6 +6715,30 @@ heap_freeze_execute_prepared(Relation rel, Buffer buffer,
xmax)));
}
}
+}
+
+/*
+ * heap_freeze_execute_prepared
+ *
+ * Executes freezing of one or more heap tuples on a page on behalf of caller.
+ * Caller passes an array of tuple plans from heap_prepare_freeze_tuple.
+ * Caller must set 'offset' in each plan for us. Note that we destructively
+ * sort caller's tuples array in-place, so caller had better be done with it.
+ *
+ * WAL-logs the changes so that VACUUM can advance the rel's relfrozenxid
+ * later on without any risk of unsafe pg_xact lookups, even following a hard
+ * crash (or when querying from a standby). We represent freezing by setting
+ * infomask bits in tuple headers, but this shouldn't be thought of as a hint.
+ * See section on buffer access rules in src/backend/storage/buffer/README.
+ */
+void
+heap_freeze_execute_prepared(Relation rel, Buffer buffer,
+ TransactionId snapshotConflictHorizon,
+ HeapTupleFreeze *tuples, int ntuples)
+{
+ Page page = BufferGetPage(buffer);
+
+ Assert(ntuples > 0);
START_CRIT_SECTION();
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 2d697ab9eaf..211f24f1d42 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -500,6 +500,12 @@ heap_page_prune(Relation relation, Buffer buffer,
(pagefrz->freeze_required ||
(whole_page_freezable && presult->nfrozen > 0 && (prune_fpi || hint_bit_fpi)));
+ if (do_freeze)
+ {
+ heap_pre_freeze_checks(buffer, frozen, presult->nfrozen);
+ frz_conflict_horizon = heap_frz_conflict_horizon(presult, pagefrz);
+ }
+
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
@@ -598,9 +604,6 @@ heap_page_prune(Relation relation, Buffer buffer,
if (do_freeze)
{
-
- frz_conflict_horizon = heap_frz_conflict_horizon(presult, pagefrz);
-
/* Execute all freeze plans for page as a single atomic action */
heap_freeze_execute_prepared(relation, buffer,
frz_conflict_horizon,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index e89ebc8cace..9ce5bf6e513 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -308,6 +308,9 @@ extern TransactionId heap_frz_conflict_horizon(PruneResult *presult,
extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
HeapPageFreeze *pagefrz,
HeapTupleFreeze *frz, bool *totally_frozen);
+
+extern void heap_pre_freeze_checks(Buffer buffer,
+ HeapTupleFreeze *tuples, int ntuples);
extern void heap_freeze_execute_prepared(Relation rel, Buffer buffer,
TransactionId snapshotConflictHorizon,
HeapTupleFreeze *tuples, int ntuples);
--
2.37.2
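To make the new ordering concrete, here is a simplified paraphrase of
the heap_page_prune() flow after 0009 (not the literal patch code; at
this point the prune and freeze critical sections are still separate):

    if (do_freeze)
    {
        /*
         * These can ERROR out on xmin/xmax status problems, so they run
         * before any critical section is entered.
         */
        heap_pre_freeze_checks(buffer, frozen, presult->nfrozen);
        frz_conflict_horizon = heap_frz_conflict_horizon(presult, pagefrz);
    }

    /* Any error while applying the page changes is critical */
    START_CRIT_SECTION();
    /* ... apply the planned item changes and WAL-log the prune ... */
    END_CRIT_SECTION();

    if (do_freeze)
        heap_freeze_execute_prepared(relation, buffer,
                                     frz_conflict_horizon,
                                     frozen, presult->nfrozen);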
Attachment: v1-0010-Inline-heap_freeze_execute_prepared.patch (text/x-patch; charset=US-ASCII)
From a52326f3b1a279826685d01a63c223591e77ca03 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sun, 7 Jan 2024 17:03:17 -0500
Subject: [PATCH v1 10/15] Inline heap_freeze_execute_prepared()
In order to merge the freeze and prune records, the execution of tuple
freezing and the WAL logging of the page changes must be separated so
that the WAL logging can be combined with prune WAL logging. This commit
adds a helper for the tuple freezing and then inlines the contents of
heap_freeze_execute_prepared() where it is called in heap_page_prune().
The original function, heap_freeze_execute_prepared(), is retained
because the "no prune" case in heap_page_prune() must still be able to
emit a freeze record. A simplified sketch of the split follows the
patch.
---
src/backend/access/heap/heapam.c | 61 +++++++++++++++++------------
src/backend/access/heap/pruneheap.c | 51 ++++++++++++++++++++++--
src/include/access/heapam.h | 8 ++++
3 files changed, 90 insertions(+), 30 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 91f8a0f3a9e..e1e3b454964 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -95,9 +95,6 @@ static void compute_new_xmax_infomask(TransactionId xmax, uint16 old_infomask,
static TM_Result heap_lock_updated_tuple(Relation rel, HeapTuple tuple,
ItemPointer ctid, TransactionId xid,
LockTupleMode mode);
-static int heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
- xl_heap_freeze_plan *plans_out,
- OffsetNumber *offsets_out);
static void GetMultiXactIdHintBits(MultiXactId multi, uint16 *new_infomask,
uint16 *new_infomask2);
static TransactionId MultiXactIdGetUpdateXid(TransactionId xmax,
@@ -6718,30 +6715,17 @@ heap_pre_freeze_checks(Buffer buffer,
}
/*
- * heap_freeze_execute_prepared
- *
- * Executes freezing of one or more heap tuples on a page on behalf of caller.
- * Caller passes an array of tuple plans from heap_prepare_freeze_tuple.
- * Caller must set 'offset' in each plan for us. Note that we destructively
- * sort caller's tuples array in-place, so caller had better be done with it.
- *
- * WAL-logs the changes so that VACUUM can advance the rel's relfrozenxid
- * later on without any risk of unsafe pg_xact lookups, even following a hard
- * crash (or when querying from a standby). We represent freezing by setting
- * infomask bits in tuple headers, but this shouldn't be thought of as a hint.
- * See section on buffer access rules in src/backend/storage/buffer/README.
+ * Helper which executes freezing of one or more heap tuples on a page on
+ * behalf of caller. Caller passes an array of tuple plans from
+ * heap_prepare_freeze_tuple. Caller must set 'offset' in each plan for us.
+ * Must be called in a critical section that also marks the buffer dirty and,
+ * if needed, emits WAL.
*/
void
-heap_freeze_execute_prepared(Relation rel, Buffer buffer,
- TransactionId snapshotConflictHorizon,
- HeapTupleFreeze *tuples, int ntuples)
+heap_freeze_prepared_tuples(Buffer buffer, HeapTupleFreeze *tuples, int ntuples)
{
Page page = BufferGetPage(buffer);
- Assert(ntuples > 0);
-
- START_CRIT_SECTION();
-
for (int i = 0; i < ntuples; i++)
{
HeapTupleFreeze *frz = tuples + i;
@@ -6751,6 +6735,29 @@ heap_freeze_execute_prepared(Relation rel, Buffer buffer,
htup = (HeapTupleHeader) PageGetItem(page, itemid);
heap_execute_freeze_tuple(htup, frz);
}
+}
+
+/*
+ * heap_freeze_execute_prepared
+ *
+ * Execute freezing of prepared tuples and WAL-logs the changes so that VACUUM
+ * can advance the rel's relfrozenxid later on without any risk of unsafe
+ * pg_xact lookups, even following a hard crash (or when querying from a
+ * standby). We represent freezing by setting infomask bits in tuple headers,
+ * but this shouldn't be thought of as a hint. See section on buffer access
+ * rules in src/backend/storage/buffer/README. Must be called from within a
+ * critical section.
+ */
+void
+heap_freeze_execute_prepared(Relation rel, Buffer buffer,
+ TransactionId snapshotConflictHorizon,
+ HeapTupleFreeze *tuples, int ntuples)
+{
+ Page page = BufferGetPage(buffer);
+
+ Assert(ntuples > 0);
+
+ heap_freeze_prepared_tuples(buffer, tuples, ntuples);
MarkBufferDirty(buffer);
@@ -6763,7 +6770,11 @@ heap_freeze_execute_prepared(Relation rel, Buffer buffer,
xl_heap_freeze_page xlrec;
XLogRecPtr recptr;
- /* Prepare deduplicated representation for use in WAL record */
+ /*
+ * Prepare deduplicated representation for use in WAL record
+ * Destructively sorts tuples array in-place, so caller had better be
+ * done with it.
+ */
nplans = heap_log_freeze_plan(tuples, ntuples, plans, offsets);
xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
@@ -6788,8 +6799,6 @@ heap_freeze_execute_prepared(Relation rel, Buffer buffer,
PageSetLSN(page, recptr);
}
-
- END_CRIT_SECTION();
}
/*
@@ -6879,7 +6888,7 @@ heap_log_freeze_new_plan(xl_heap_freeze_plan *plan, HeapTupleFreeze *frz)
* (actually there is one array per freeze plan, but that's not of immediate
* concern to our caller).
*/
-static int
+int
heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
xl_heap_freeze_plan *plans_out,
OffsetNumber *offsets_out)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 211f24f1d42..758ab9a0404 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -604,10 +604,53 @@ heap_page_prune(Relation relation, Buffer buffer,
if (do_freeze)
{
- /* Execute all freeze plans for page as a single atomic action */
- heap_freeze_execute_prepared(relation, buffer,
- frz_conflict_horizon,
- frozen, presult->nfrozen);
+ START_CRIT_SECTION();
+
+ Assert(presult->nfrozen > 0);
+
+ heap_freeze_prepared_tuples(buffer, frozen, presult->nfrozen);
+
+ MarkBufferDirty(buffer);
+
+ /* Now WAL-log freezing if necessary */
+ if (RelationNeedsWAL(relation))
+ {
+ xl_heap_freeze_plan plans[MaxHeapTuplesPerPage];
+ OffsetNumber offsets[MaxHeapTuplesPerPage];
+ int nplans;
+ xl_heap_freeze_page xlrec;
+ XLogRecPtr recptr;
+
+ /*
+ * Prepare deduplicated representation for use in WAL record
+ * Destructively sorts tuples array in-place.
+ */
+ nplans = heap_log_freeze_plan(frozen, presult->nfrozen, plans, offsets);
+
+ xlrec.snapshotConflictHorizon = frz_conflict_horizon;
+ xlrec.isCatalogRel = RelationIsAccessibleInLogicalDecoding(relation);
+ xlrec.nplans = nplans;
+
+ XLogBeginInsert();
+ XLogRegisterData((char *) &xlrec, SizeOfHeapFreezePage);
+
+ /*
+ * The freeze plan array and offset array are not actually in the
+ * buffer, but pretend that they are. When XLogInsert stores the
+ * whole buffer, the arrays need not be stored too.
+ */
+ XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
+ XLogRegisterBufData(0, (char *) plans,
+ nplans * sizeof(xl_heap_freeze_plan));
+ XLogRegisterBufData(0, (char *) offsets,
+ presult->nfrozen * sizeof(OffsetNumber));
+
+ recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_FREEZE_PAGE);
+
+ PageSetLSN(page, recptr);
+ }
+
+ END_CRIT_SECTION();
}
else if (!pagefrz || !presult->all_frozen || presult->nfrozen > 0)
{
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 9ce5bf6e513..1ea8b87d627 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -14,6 +14,7 @@
#ifndef HEAPAM_H
#define HEAPAM_H
+#include "access/heapam_xlog.h"
#include "access/relation.h" /* for backward compatibility */
#include "access/relscan.h"
#include "access/sdir.h"
@@ -314,9 +315,16 @@ extern void heap_pre_freeze_checks(Buffer buffer,
extern void heap_freeze_execute_prepared(Relation rel, Buffer buffer,
TransactionId snapshotConflictHorizon,
HeapTupleFreeze *tuples, int ntuples);
+
+extern void heap_freeze_prepared_tuples(Buffer buffer,
+ HeapTupleFreeze *tuples, int ntuples);
extern bool heap_freeze_tuple(HeapTupleHeader tuple,
TransactionId relfrozenxid, TransactionId relminmxid,
TransactionId FreezeLimit, TransactionId MultiXactCutoff);
+
+extern int heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
+ xl_heap_freeze_plan *plans_out,
+ OffsetNumber *offsets_out);
extern bool heap_tuple_should_freeze(HeapTupleHeader tuple,
const struct VacuumCutoffs *cutoffs,
TransactionId *NoFreezePageRelfrozenXid,
--
2.37.2
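The resulting division of labor, paraphrased (the caller now owns the
critical section and the WAL record; the helper only modifies the page):

    START_CRIT_SECTION();

    /* Apply each prepared freeze plan to its tuple header */
    heap_freeze_prepared_tuples(buffer, frozen, presult->nfrozen);

    MarkBufferDirty(buffer);

    if (RelationNeedsWAL(relation))
    {
        xl_heap_freeze_plan plans[MaxHeapTuplesPerPage];
        OffsetNumber offsets[MaxHeapTuplesPerPage];
        int         nplans;

        /* Deduplicate the plans; destructively sorts the tuples array */
        nplans = heap_log_freeze_plan(frozen, presult->nfrozen,
                                      plans, offsets);

        /*
         * ... fill in xl_heap_freeze_page, register the buffer and the
         * plan/offset arrays, XLogInsert(), PageSetLSN() ...
         */
    }

    END_CRIT_SECTION();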
Attachment: v1-0007-Execute-freezing-in-heap_page_prune.patch (text/x-patch; charset=US-ASCII)
From c174ee1a38459d0b84ae92e1dcf21e12d0aadb1d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sun, 7 Jan 2024 15:35:58 -0500
Subject: [PATCH v1 07/15] Execute freezing in heap_page_prune()
As a step toward combining the prune and freeze WAL records, execute
freezing in heap_page_prune(). The logic that decides whether to execute
the freeze plans was moved from lazy_scan_prune() over to
heap_page_prune() with little modification. Updating
vacrel->NewRelfrozenXid and NewRelminMxid remains in lazy_scan_prune().
A simplified sketch of the new division of responsibility follows the
patch.
---
src/backend/access/heap/pruneheap.c | 54 +++++++++++++++++++++--
src/backend/access/heap/vacuumlazy.c | 66 +++++-----------------------
src/include/access/heapam.h | 8 ++--
3 files changed, 65 insertions(+), 63 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index d05bd5c0723..7770da38d84 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -21,6 +21,7 @@
#include "access/xlog.h"
#include "access/xloginsert.h"
#include "catalog/catalog.h"
+#include "executor/instrument.h"
#include "miscadmin.h"
#include "pgstat.h"
#include "storage/bufmgr.h"
@@ -66,7 +67,8 @@ static int heap_prune_chain(Buffer buffer,
PruneState *prstate, PruneResult *presult);
static void prune_prepare_freeze_tuple(Page page, OffsetNumber offnum,
- HeapPageFreeze *pagefrz, PruneResult *presult);
+ HeapPageFreeze *pagefrz, HeapTupleFreeze *frozen,
+ PruneResult *presult);
static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
static void heap_prune_record_redirect(PruneState *prstate,
OffsetNumber offnum, OffsetNumber rdoffnum);
@@ -233,6 +235,14 @@ heap_page_prune(Relation relation, Buffer buffer,
maxoff;
PruneState prstate;
HeapTupleData tup;
+ bool do_freeze;
+ int64 fpi_before = pgWalUsage.wal_fpi;
+ TransactionId frz_conflict_horizon = InvalidTransactionId;
+
+ /*
+ * One entry for every tuple that we may freeze.
+ */
+ HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
/*
* Our strategy is to scan the page and make lists of items to change,
@@ -428,7 +438,7 @@ heap_page_prune(Relation relation, Buffer buffer,
if (pagefrz)
prune_prepare_freeze_tuple(page, offnum,
- pagefrz, presult);
+ pagefrz, frozen, presult);
/* Ignore items already processed as part of an earlier chain */
if (prstate.marked[offnum])
@@ -543,6 +553,41 @@ heap_page_prune(Relation relation, Buffer buffer,
/* Record number of newly-set-LP_DEAD items for caller */
presult->nnewlpdead = prstate.ndead;
+
+ /*
+ * Freeze the page when heap_prepare_freeze_tuple indicates that at least
+ * one XID/MXID from before FreezeLimit/MultiXactCutoff is present. Also
+ * freeze when pruning generated an FPI, if doing so means that we set the
+ * page all-frozen afterwards (might not happen until final heap pass).
+ */
+ if (pagefrz)
+ do_freeze = pagefrz->freeze_required ||
+ (presult->all_visible_except_removable && presult->all_frozen &&
+ presult->nfrozen > 0 &&
+ fpi_before != pgWalUsage.wal_fpi);
+ else
+ do_freeze = false;
+
+ if (do_freeze)
+ {
+
+ frz_conflict_horizon = heap_frz_conflict_horizon(presult, pagefrz);
+
+ /* Execute all freeze plans for page as a single atomic action */
+ heap_freeze_execute_prepared(relation, buffer,
+ frz_conflict_horizon,
+ frozen, presult->nfrozen);
+ }
+ else if (!pagefrz || !presult->all_frozen || presult->nfrozen > 0)
+ {
+ /*
+ * If we will neither freeze tuples on the page nor set the page all
+ * frozen in the visibility map, the page is not all frozen and there
+ * will be no newly frozen tuples.
+ */
+ presult->all_frozen = false;
+ presult->nfrozen = 0; /* avoid miscounts in instrumentation */
+ }
}
@@ -885,6 +930,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
static void
prune_prepare_freeze_tuple(Page page, OffsetNumber offnum,
HeapPageFreeze *pagefrz,
+ HeapTupleFreeze *frozen,
PruneResult *presult)
{
bool totally_frozen;
@@ -907,11 +953,11 @@ prune_prepare_freeze_tuple(Page page, OffsetNumber offnum,
/* Tuple with storage -- consider need to freeze */
if ((heap_prepare_freeze_tuple(htup, pagefrz,
- &presult->frozen[presult->nfrozen],
+ &frozen[presult->nfrozen],
&totally_frozen)))
{
/* Save prepared freeze plan for later */
- presult->frozen[presult->nfrozen++].offset = offnum;
+ frozen[presult->nfrozen++].offset = offnum;
}
/*
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 90741aad17b..ef9abeb9c87 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -270,9 +270,6 @@ static void update_vacuum_error_info(LVRelState *vacrel,
static void restore_vacuum_error_info(LVRelState *vacrel,
const LVSavedErrInfo *saved_vacrel);
-static TransactionId heap_frz_conflict_horizon(PruneResult *presult,
- HeapPageFreeze *pagefrz);
-
/*
* heap_vacuum_rel() -- perform VACUUM for one heap relation
*
@@ -1350,7 +1347,7 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
* Determine the snapshotConflictHorizon for freezing. Must only be called
* after pruning and determining if the page is freezable.
*/
-static TransactionId
+TransactionId
heap_frz_conflict_horizon(PruneResult *presult, HeapPageFreeze *pagefrz)
{
TransactionId result;
@@ -1421,8 +1418,6 @@ lazy_scan_prune(LVRelState *vacrel,
recently_dead_tuples;
HeapPageFreeze pagefrz;
bool hastup = false;
- bool do_freeze;
- int64 fpi_before = pgWalUsage.wal_fpi;
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
Assert(BufferGetBlockNumber(buf) == blkno);
@@ -1575,21 +1570,8 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->offnum = InvalidOffsetNumber;
- /*
- * Freeze the page when heap_prepare_freeze_tuple indicates that at least
- * one XID/MXID from before FreezeLimit/MultiXactCutoff is present. Also
- * freeze when pruning generated an FPI, if doing so means that we set the
- * page all-frozen afterwards (might not happen until final heap pass).
- */
- do_freeze = pagefrz.freeze_required ||
- (presult.all_visible_except_removable && presult.all_frozen &&
- presult.nfrozen > 0 &&
- fpi_before != pgWalUsage.wal_fpi);
-
- if (do_freeze)
+ if (presult.all_frozen || presult.nfrozen > 0)
{
- TransactionId snapshotConflictHorizon;
-
/*
* We're freezing the page. Our final NewRelfrozenXid doesn't need to
* be affected by the XIDs that are just about to be frozen anyway.
@@ -1597,50 +1579,26 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->NewRelfrozenXid = pagefrz.FreezePageRelfrozenXid;
vacrel->NewRelminMxid = pagefrz.FreezePageRelminMxid;
- vacrel->frozen_pages++;
-
- snapshotConflictHorizon = heap_frz_conflict_horizon(&presult, &pagefrz);
+ /*
+ * We never increment the frozen_pages instrumentation counter when
+ * nfrozen == 0, since it only counts pages with newly frozen tuples
+ * (don't confuse that with pages newly set all-frozen in VM).
+ */
+ if (presult.nfrozen > 0)
+ vacrel->frozen_pages++;
/* Using same cutoff when setting VM is now unnecessary */
- if (presult.all_visible_except_removable && presult.all_frozen)
+ if (presult.nfrozen > 0 && presult.all_frozen)
presult.frz_conflict_horizon = InvalidTransactionId;
-
- /* Execute all freeze plans for page as a single atomic action */
- heap_freeze_execute_prepared(vacrel->rel, buf,
- snapshotConflictHorizon,
- presult.frozen, presult.nfrozen);
- }
- else if (presult.all_frozen && presult.nfrozen == 0)
- {
- /* Page should be all visible except to-be-removed tuples */
- Assert(presult.all_visible_except_removable);
-
- /*
- * We have no freeze plans to execute, so there's no added cost from
- * following the freeze path. That's why it was chosen. This is
- * important in the case where the page only contains totally frozen
- * tuples at this point (perhaps only following pruning). Such pages
- * can be marked all-frozen in the VM by our caller, even though none
- * of its tuples were newly frozen here (note that the "no freeze"
- * path never sets pages all-frozen).
- *
- * We never increment the frozen_pages instrumentation counter here,
- * since it only counts pages with newly frozen tuples (don't confuse
- * that with pages newly set all-frozen in VM).
- */
- vacrel->NewRelfrozenXid = pagefrz.FreezePageRelfrozenXid;
- vacrel->NewRelminMxid = pagefrz.FreezePageRelminMxid;
}
else
{
/*
- * Page requires "no freeze" processing. It might be set all-visible
- * in the visibility map, but it can never be set all-frozen.
+ * Page was "no freeze" processed. It might be set all-visible in the
+ * visibility map, but it can never be set all-frozen.
*/
vacrel->NewRelfrozenXid = pagefrz.NoFreezePageRelfrozenXid;
vacrel->NewRelminMxid = pagefrz.NoFreezePageRelminMxid;
- presult.all_frozen = false;
- presult.nfrozen = 0; /* avoid miscounts in instrumentation */
}
/*
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index bea35afc4bd..e89ebc8cace 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -220,11 +220,6 @@ typedef struct PruneResult
/* Number of newly frozen tuples */
int nfrozen;
-
- /*
- * One entry for every tuple that we may freeze.
- */
- HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
} PruneResult;
/*
@@ -307,6 +302,9 @@ extern TM_Result heap_lock_tuple(Relation relation, HeapTuple tuple,
Buffer *buffer, struct TM_FailureData *tmfd);
extern void heap_inplace_update(Relation relation, HeapTuple tuple);
+
+extern TransactionId heap_frz_conflict_horizon(PruneResult *presult,
+ HeapPageFreeze *pagefrz);
extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
HeapPageFreeze *pagefrz,
HeapTupleFreeze *frz, bool *totally_frozen);
--
2.37.2
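In rough outline, the division of responsibility after 0007 looks like
this (paraphrased, not the literal code):

    /* heap_page_prune(), when called by vacuum (pagefrz != NULL): */
    do_freeze = pagefrz->freeze_required ||
        (presult->all_visible_except_removable && presult->all_frozen &&
         presult->nfrozen > 0 &&
         fpi_before != pgWalUsage.wal_fpi);    /* pruning emitted an FPI */

    if (do_freeze)
        heap_freeze_execute_prepared(relation, buffer,
                                     heap_frz_conflict_horizon(presult, pagefrz),
                                     frozen, presult->nfrozen);

    /* lazy_scan_prune() afterwards only does the relfrozenxid bookkeeping: */
    if (presult.all_frozen || presult.nfrozen > 0)
    {
        vacrel->NewRelfrozenXid = pagefrz.FreezePageRelfrozenXid;
        vacrel->NewRelminMxid = pagefrz.FreezePageRelminMxid;
    }
    else
    {
        vacrel->NewRelfrozenXid = pagefrz.NoFreezePageRelfrozenXid;
        vacrel->NewRelminMxid = pagefrz.NoFreezePageRelminMxid;
    }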
Attachment: v1-0008-Make-opp-freeze-heuristic-compatible-with-prune-f.patch (text/x-patch; charset=US-ASCII)
From 4ae48cc792f8ec2113b4dbf4155aaa327758a731 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sun, 7 Jan 2024 16:11:35 -0500
Subject: [PATCH v1 08/15] Make opp freeze heuristic compatible with
prune+freeze record
Once the prune and freeze records are combined, we will no longer be
able to test whether pruning emitted an FPI in order to decide whether
to opportunistically freeze a freezable page.
While this heuristic should be improved, for now, approximate the
previous logic by keeping track of whether a hint-bit FPI was emitted
during the visibility checks (when checksums are on) and combining that
with the result of XLogCheckBufferNeedsBackup(). If we have just
finished deciding whether to prune and the current buffer appears to
need an FPI when modified, it is likely that pruning would have emitted
one. A simplified sketch of the new heuristic follows the patch.
---
src/backend/access/heap/pruneheap.c | 58 +++++++++++++++++++++--------
1 file changed, 43 insertions(+), 15 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 7770da38d84..2d697ab9eaf 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -236,6 +236,10 @@ heap_page_prune(Relation relation, Buffer buffer,
PruneState prstate;
HeapTupleData tup;
bool do_freeze;
+ bool do_prune;
+ bool whole_page_freezable;
+ bool hint_bit_fpi;
+ bool prune_fpi = false;
int64 fpi_before = pgWalUsage.wal_fpi;
TransactionId frz_conflict_horizon = InvalidTransactionId;
@@ -401,6 +405,13 @@ heap_page_prune(Relation relation, Buffer buffer,
}
}
+ /*
+ * If checksums are enabled, heap_prune_satisfies_vacuum() may have caused
+ * an FPI to be emitted. Then reset fpi_before for no prune case.
+ */
+ hint_bit_fpi = fpi_before != pgWalUsage.wal_fpi;
+ fpi_before = pgWalUsage.wal_fpi;
+
/*
* For vacuum, if the whole page will become frozen, we consider
* opportunistically freezing tuples. Dead tuples which will be removed by
@@ -458,11 +469,42 @@ heap_page_prune(Relation relation, Buffer buffer,
if (off_loc)
*off_loc = InvalidOffsetNumber;
+ do_prune = prstate.nredirected > 0 ||
+ prstate.ndead > 0 ||
+ prstate.nunused > 0;
+
+ /*
+ * Only incur overhead of checking if we will do an FPI if we might use
+ * the information.
+ */
+ if (do_prune && pagefrz)
+ prune_fpi = XLogCheckBufferNeedsBackup(buffer);
+
+ /* Is the whole page freezable? And is there something to freeze */
+ whole_page_freezable = presult->all_visible_except_removable &&
+ presult->all_frozen;
+
+ /*
+ * Freeze the page when heap_prepare_freeze_tuple indicates that at least
+ * one XID/MXID from before FreezeLimit/MultiXactCutoff is present. Also
+ * freeze when pruning generated an FPI, if doing so means that we set the
+ * page all-frozen afterwards (might not happen until final heap pass).
+ * XXX: Previously, we knew if pruning emitted an FPI by checking
+ * pgWalUsage.wal_fpi before and after pruning. Once the freeze and prune
+ * records are combined, this heuristic couldn't be used anymore. The
+ * opportunistic freeze heuristic must be improved; however, for now, try
+ * to approximate it.
+ */
+
+ do_freeze = pagefrz &&
+ (pagefrz->freeze_required ||
+ (whole_page_freezable && presult->nfrozen > 0 && (prune_fpi || hint_bit_fpi)));
+
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
/* Have we found any prunable items? */
- if (prstate.nredirected > 0 || prstate.ndead > 0 || prstate.nunused > 0)
+ if (do_prune)
{
/*
* Apply the planned item changes, then repair page fragmentation, and
@@ -554,20 +596,6 @@ heap_page_prune(Relation relation, Buffer buffer,
/* Record number of newly-set-LP_DEAD items for caller */
presult->nnewlpdead = prstate.ndead;
- /*
- * Freeze the page when heap_prepare_freeze_tuple indicates that at least
- * one XID/MXID from before FreezeLimit/MultiXactCutoff is present. Also
- * freeze when pruning generated an FPI, if doing so means that we set the
- * page all-frozen afterwards (might not happen until final heap pass).
- */
- if (pagefrz)
- do_freeze = pagefrz->freeze_required ||
- (presult->all_visible_except_removable && presult->all_frozen &&
- presult->nfrozen > 0 &&
- fpi_before != pgWalUsage.wal_fpi);
- else
- do_freeze = false;
-
if (do_freeze)
{
--
2.37.2
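Paraphrasing the approximated heuristic (prune_fpi asks whether
WAL-logging the upcoming prune would force a full-page image;
hint_bit_fpi records whether the visibility checks already emitted one
while setting hint bits with checksums enabled):

    /* After computing tuple visibility for the whole page */
    hint_bit_fpi = fpi_before != pgWalUsage.wal_fpi;

    do_prune = prstate.nredirected > 0 ||
        prstate.ndead > 0 ||
        prstate.nunused > 0;

    /* Only pay for the check if the answer could matter */
    if (do_prune && pagefrz)
        prune_fpi = XLogCheckBufferNeedsBackup(buffer);

    whole_page_freezable = presult->all_visible_except_removable &&
        presult->all_frozen;

    do_freeze = pagefrz &&
        (pagefrz->freeze_required ||
         (whole_page_freezable && presult->nfrozen > 0 &&
          (prune_fpi || hint_bit_fpi)));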
Attachment: v1-0013-Set-hastup-in-heap_page_prune.patch (text/x-patch; charset=US-ASCII)
From 200abbaebb8b5abe1dec6958d35360b8ea795167 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 22 Jan 2024 14:53:36 -0500
Subject: [PATCH v1 13/15] Set hastup in heap_page_prune
lazy_scan_prune() loops through the line pointers and tuple visibility
information for each tuple on a page, setting hastup to true if there
are any LP_REDIRECT line pointers or tuples with storage which will not
be removed. We want to remove this extra loop from lazy_scan_prune(),
and we know about non-removable tuples during heap_page_prune() anyway.
Set hastup when recording LP_REDIRECT line pointers in
heap_prune_chain() and when LP_NORMAL line pointers refer to tuples
whose visibility status is not HEAPTUPLE_DEAD.
---
src/backend/access/heap/pruneheap.c | 33 ++++++++++++++++++++++++----
src/backend/access/heap/vacuumlazy.c | 25 ++-------------------
src/include/access/heapam.h | 1 +
3 files changed, 32 insertions(+), 27 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 09ac49d84a8..157ee4dc170 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -71,7 +71,8 @@ static void prune_prepare_freeze_tuple(Page page, OffsetNumber offnum,
PruneResult *presult);
static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
static void heap_prune_record_redirect(PruneState *prstate,
- OffsetNumber offnum, OffsetNumber rdoffnum);
+ OffsetNumber offnum, OffsetNumber rdoffnum,
+ PruneResult *presult);
static void heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
PruneResult *presult);
static void heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
@@ -275,6 +276,8 @@ heap_page_prune(Relation relation, Buffer buffer,
presult->nnewlpdead = 0;
presult->nfrozen = 0;
+ presult->hastup = false;
+
/*
* Keep track of whether or not the page is all_visible in case the caller
* wants to use this information to update the VM.
@@ -451,18 +454,37 @@ heap_page_prune(Relation relation, Buffer buffer,
prune_prepare_freeze_tuple(page, offnum,
pagefrz, frozen, presult);
+ itemid = PageGetItemId(page, offnum);
+
+ if (ItemIdIsNormal(itemid) &&
+ presult->htsv[offnum] != HEAPTUPLE_DEAD)
+ {
+ Assert(presult->htsv[offnum] != -1);
+
+ /*
+ * Deliberately don't set hastup for LP_DEAD items. We make the
+ * soft assumption that any LP_DEAD items encountered here will
+ * become LP_UNUSED later on, before count_nondeletable_pages is
+ * reached. If we don't make this assumption then rel truncation
+ * will only happen every other VACUUM, at most. Besides, VACUUM
+ * must treat hastup/nonempty_pages as provisional no matter how
+ * LP_DEAD items are handled (handled here, or handled later on).
+ */
+ presult->hastup = true;
+ }
+
/* Ignore items already processed as part of an earlier chain */
if (prstate.marked[offnum])
continue;
/* Nothing to do if slot is empty */
- itemid = PageGetItemId(page, offnum);
if (!ItemIdIsUsed(itemid))
continue;
/* Process this item or chain of items */
presult->ndeleted += heap_prune_chain(buffer, offnum,
&prstate, presult);
+
}
/* Clear the offset information once we have processed the given page. */
@@ -993,7 +1015,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
if (i >= nchain)
heap_prune_record_dead_or_unused(prstate, rootoffnum, presult);
else
- heap_prune_record_redirect(prstate, rootoffnum, chainitems[i]);
+ heap_prune_record_redirect(prstate, rootoffnum, chainitems[i], presult);
}
else if (nchain < 2 && ItemIdIsRedirected(rootlp))
{
@@ -1075,7 +1097,8 @@ heap_prune_record_prunable(PruneState *prstate, TransactionId xid)
/* Record line pointer to be redirected */
static void
heap_prune_record_redirect(PruneState *prstate,
- OffsetNumber offnum, OffsetNumber rdoffnum)
+ OffsetNumber offnum, OffsetNumber rdoffnum,
+ PruneResult *presult)
{
Assert(prstate->nredirected < MaxHeapTuplesPerPage);
prstate->redirected[prstate->nredirected * 2] = offnum;
@@ -1085,6 +1108,8 @@ heap_prune_record_redirect(PruneState *prstate,
prstate->marked[offnum] = true;
Assert(!prstate->marked[rdoffnum]);
prstate->marked[rdoffnum] = true;
+
+ presult->hastup = true;
}
/* Record line pointer to be marked dead */
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index ef9abeb9c87..eb415a0aa6f 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1417,7 +1417,6 @@ lazy_scan_prune(LVRelState *vacrel,
live_tuples,
recently_dead_tuples;
HeapPageFreeze pagefrz;
- bool hastup = false;
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
Assert(BufferGetBlockNumber(buf) == blkno);
@@ -1461,7 +1460,6 @@ lazy_scan_prune(LVRelState *vacrel,
* the VM after collecting LP_DEAD items and freezing tuples. Pruning will
* have determined whether or not the page is all_visible and able to
* become all_frozen.
- *
*/
for (offnum = FirstOffsetNumber;
offnum <= maxoff;
@@ -1474,28 +1472,12 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->offnum = offnum;
itemid = PageGetItemId(page, offnum);
- if (!ItemIdIsUsed(itemid))
- continue;
-
/* Redirect items mustn't be touched */
- if (ItemIdIsRedirected(itemid))
- {
- /* page makes rel truncation unsafe */
- hastup = true;
+ if (ItemIdIsRedirected(itemid) || !ItemIdIsUsed(itemid))
continue;
- }
if (ItemIdIsDead(itemid))
{
- /*
- * Deliberately don't set hastup for LP_DEAD items. We make the
- * soft assumption that any LP_DEAD items encountered here will
- * become LP_UNUSED later on, before count_nondeletable_pages is
- * reached. If we don't make this assumption then rel truncation
- * will only happen every other VACUUM, at most. Besides, VACUUM
- * must treat hastup/nonempty_pages as provisional no matter how
- * LP_DEAD items are handled (handled here, or handled later on).
- */
deadoffsets[lpdead_items++] = offnum;
continue;
}
@@ -1563,9 +1545,6 @@ lazy_scan_prune(LVRelState *vacrel,
elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
break;
}
-
- hastup = true; /* page makes rel truncation unsafe */
-
}
vacrel->offnum = InvalidOffsetNumber;
@@ -1659,7 +1638,7 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->recently_dead_tuples += recently_dead_tuples;
/* Can't truncate this page */
- if (hastup)
+ if (presult.hastup)
vacrel->nonempty_pages = blkno + 1;
/* Did we find LP_DEAD items? */
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 1ea8b87d627..4e41cf68957 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -201,6 +201,7 @@ typedef struct PruneResult
int ndeleted; /* Number of tuples deleted from the page */
int nnewlpdead; /* Number of newly LP_DEAD items */
bool all_visible; /* Whether or not the page is all visible */
+ bool hastup; /* Does page make rel truncation unsafe */
TransactionId frz_conflict_horizon; /* Newest xmin on the page */
/*
--
2.37.2
Attachment: v1-0014-Count-tuples-for-vacuum-logging-in-heap_page_prun.patch (text/x-patch; charset=US-ASCII)
From e642b2fa85fd21626fa6830e769867ce8aafe54e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 22 Jan 2024 17:25:56 -0500
Subject: [PATCH v1 14/15] Count tuples for vacuum logging in heap_page_prune
lazy_scan_prune() loops through all of the tuple visibility information
that was recorded in heap_page_prune() and then counts live and recently
dead tuples. That information is available in heap_page_prune(), so just
record it there. Add live and recently dead tuple counters to the
PruneResult. Doing this counting in heap_page_prune() eliminates the
need for saving the tuple visibility status information in the
PruneResult. Instead, save it in the PruneState where it can be
referenced by heap_prune_chain().
---
src/backend/access/heap/pruneheap.c | 109 +++++++++++++++++++++++----
src/backend/access/heap/vacuumlazy.c | 75 +-----------------
src/include/access/heapam.h | 27 +------
3 files changed, 97 insertions(+), 114 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 157ee4dc170..88adad99c39 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -56,6 +56,17 @@ typedef struct
* 1. Otherwise every access would need to subtract 1.
*/
bool marked[MaxHeapTuplesPerPage + 1];
+
+ /*
+ * Tuple visibility is only computed once for each tuple, for correctness
+ * and efficiency reasons; see comment in heap_page_prune() for details.
+ * This is of type int8[], instead of HTSV_Result[], so we can use -1 to
+ * indicate no visibility has been computed, e.g. for LP_DEAD items.
+ *
+ * This needs to be MaxHeapTuplesPerPage + 1 long as FirstOffsetNumber is
+ * 1. Otherwise every access would need to subtract 1.
+ */
+ int8 htsv[MaxHeapTuplesPerPage + 1];
} PruneState;
/* Local functions */
@@ -66,7 +77,8 @@ static int heap_prune_chain(Buffer buffer,
OffsetNumber rootoffnum,
PruneState *prstate, PruneResult *presult);
-static void prune_prepare_freeze_tuple(Page page, OffsetNumber offnum,
+static inline HTSV_Result htsv_get_valid_status(int status);
+static void prune_prepare_freeze_tuple(Page page, OffsetNumber offnum, PruneState *prstate,
HeapPageFreeze *pagefrz, HeapTupleFreeze *frozen,
PruneResult *presult);
static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
@@ -278,6 +290,9 @@ heap_page_prune(Relation relation, Buffer buffer,
presult->hastup = false;
+ presult->live_tuples = 0;
+ presult->recently_dead_tuples = 0;
+
/*
* Keep track of whether or not the page is all_visible in case the caller
* wants to use this information to update the VM.
@@ -319,7 +334,7 @@ heap_page_prune(Relation relation, Buffer buffer,
/* Nothing to do if slot doesn't contain a tuple */
if (!ItemIdIsNormal(itemid))
{
- presult->htsv[offnum] = -1;
+ prstate.htsv[offnum] = -1;
continue;
}
@@ -335,9 +350,30 @@ heap_page_prune(Relation relation, Buffer buffer,
if (off_loc)
*off_loc = offnum;
- presult->htsv[offnum] = heap_prune_satisfies_vacuum(&prstate, &tup,
- buffer);
- switch (presult->htsv[offnum])
+ prstate.htsv[offnum] = heap_prune_satisfies_vacuum(&prstate, &tup,
+ buffer);
+ Assert(ItemIdIsNormal(itemid));
+
+ /*
+ * The criteria for counting a tuple as live in this block need to
+ * match what analyze.c's acquire_sample_rows() does, otherwise VACUUM
+ * and ANALYZE may produce wildly different reltuples values, e.g.
+ * when there are many recently-dead tuples.
+ *
+ * The logic here is a bit simpler than acquire_sample_rows(), as
+ * VACUUM can't run inside a transaction block, which makes some cases
+ * impossible (e.g. in-progress insert from the same transaction).
+ *
+ * We treat LP_DEAD items (which are the closest thing to DEAD tuples
+ * that might be seen here) differently, too: we assume that they'll
+ * become LP_UNUSED before VACUUM finishes. This difference is only
+ * superficial. VACUUM effectively agrees with ANALYZE about DEAD
+ * items, in the end. VACUUM won't remember LP_DEAD items, but only
+ * because they're not supposed to be left behind when it is done.
+ * (Cases where we bypass index vacuuming will violate this optimistic
+ * assumption, but the overall impact of that should be negligible.)
+ */
+ switch (prstate.htsv[offnum])
{
case HEAPTUPLE_DEAD:
@@ -357,6 +393,12 @@ heap_page_prune(Relation relation, Buffer buffer,
break;
case HEAPTUPLE_LIVE:
+ /*
+ * Count it as live. Not only is this natural, but it's also
+ * what acquire_sample_rows() does.
+ */
+ presult->live_tuples++;
+
/*
* Is the tuple definitely visible to all transactions?
*
@@ -393,13 +435,34 @@ heap_page_prune(Relation relation, Buffer buffer,
}
break;
case HEAPTUPLE_RECENTLY_DEAD:
+
+ /*
+ * If tuple is recently dead then we must not remove it from
+ * the relation. (We only remove items that are LP_DEAD from
+ * pruning.)
+ */
+ presult->recently_dead_tuples++;
presult->all_visible = false;
break;
case HEAPTUPLE_INSERT_IN_PROGRESS:
+
+ /*
+ * We do not count these rows as live, because we expect the
+ * inserting transaction to update the counters at commit, and
+ * we assume that will happen only after we report our
+ * results. This assumption is a bit shaky, but it is what
+ * acquire_sample_rows() does, so be consistent.
+ */
presult->all_visible = false;
break;
case HEAPTUPLE_DELETE_IN_PROGRESS:
- /* This is an expected case during concurrent vacuum */
+
+ /*
+ * This an expected case during concurrent vacuum. Count such
+ * rows as live. As above, we assume the deleting transaction
+ * will commit and update the counters after we report.
+ */
+ presult->live_tuples++;
presult->all_visible = false;
break;
default:
@@ -451,15 +514,15 @@ heap_page_prune(Relation relation, Buffer buffer,
*off_loc = offnum;
if (pagefrz)
- prune_prepare_freeze_tuple(page, offnum,
+ prune_prepare_freeze_tuple(page, offnum, &prstate,
pagefrz, frozen, presult);
itemid = PageGetItemId(page, offnum);
if (ItemIdIsNormal(itemid) &&
- presult->htsv[offnum] != HEAPTUPLE_DEAD)
+ prstate.htsv[offnum] != HEAPTUPLE_DEAD)
{
- Assert(presult->htsv[offnum] != -1);
+ Assert(prstate.htsv[offnum] != -1);
/*
* Deliberately don't set hastup for LP_DEAD items. We make the
@@ -723,10 +786,24 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
}
+/*
+ * Pruning calculates tuple visibility once and saves the results in an array
+ * of int8. See PruneState.htsv for details. This helper function is meant to
+ * guard against examining visibility status array members which have not yet
+ * been computed.
+ */
+static inline HTSV_Result
+htsv_get_valid_status(int status)
+{
+ Assert(status >= HEAPTUPLE_DEAD &&
+ status <= HEAPTUPLE_DELETE_IN_PROGRESS);
+ return (HTSV_Result) status;
+}
+
/*
* Prune specified line pointer or a HOT chain originating at line pointer.
*
- * Tuple visibility information is provided in presult->htsv.
+ * Tuple visibility information is provided in prstate->htsv.
*
* If the item is an index-referenced tuple (i.e. not a heap-only tuple),
* the HOT chain is pruned by removing all DEAD tuples at the start of the HOT
@@ -777,7 +854,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
*/
if (ItemIdIsNormal(rootlp))
{
- Assert(presult->htsv[rootoffnum] != -1);
+ Assert(prstate->htsv[rootoffnum] != -1);
htup = (HeapTupleHeader) PageGetItem(dp, rootlp);
if (HeapTupleHeaderIsHeapOnly(htup))
@@ -800,7 +877,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
* either here or while following a chain below. Whichever path
* gets there first will mark the tuple unused.
*/
- if (presult->htsv[rootoffnum] == HEAPTUPLE_DEAD &&
+ if (prstate->htsv[rootoffnum] == HEAPTUPLE_DEAD &&
!HeapTupleHeaderIsHotUpdated(htup))
{
heap_prune_record_unused(prstate, rootoffnum);
@@ -901,7 +978,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
*/
tupdead = recent_dead = false;
- switch (htsv_get_valid_status(presult->htsv[offnum]))
+ switch (htsv_get_valid_status(prstate->htsv[offnum]))
{
case HEAPTUPLE_DEAD:
tupdead = true;
@@ -1039,7 +1116,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
* want to consider freezing normal tuples which will not be removed.
*/
static void
-prune_prepare_freeze_tuple(Page page, OffsetNumber offnum,
+prune_prepare_freeze_tuple(Page page, OffsetNumber offnum, PruneState *prstate,
HeapPageFreeze *pagefrz,
HeapTupleFreeze *frozen,
PruneResult *presult)
@@ -1056,8 +1133,8 @@ prune_prepare_freeze_tuple(Page page, OffsetNumber offnum,
return;
/* We do not consider freezing tuples which will be removed. */
- if (presult->htsv[offnum] == HEAPTUPLE_DEAD ||
- presult->htsv[offnum] == -1)
+ if (prstate->htsv[offnum] == HEAPTUPLE_DEAD ||
+ prstate->htsv[offnum] == -1)
return;
htup = (HeapTupleHeader) PageGetItem(page, itemid);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index eb415a0aa6f..4770fcea021 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1413,9 +1413,6 @@ lazy_scan_prune(LVRelState *vacrel,
maxoff;
ItemId itemid;
PruneResult presult;
- int lpdead_items,
- live_tuples,
- recently_dead_tuples;
HeapPageFreeze pagefrz;
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
@@ -1436,8 +1433,6 @@ lazy_scan_prune(LVRelState *vacrel,
pagefrz.NoFreezePageRelminMxid = vacrel->NewRelminMxid;
pagefrz.cutoffs = &vacrel->cutoffs;
lpdead_items = 0;
- live_tuples = 0;
- recently_dead_tuples = 0;
/*
* Prune all HOT-update chains in this page.
@@ -1472,9 +1467,6 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->offnum = offnum;
itemid = PageGetItemId(page, offnum);
- /* Redirect items mustn't be touched */
- if (ItemIdIsRedirected(itemid) || !ItemIdIsUsed(itemid))
- continue;
if (ItemIdIsDead(itemid))
{
@@ -1482,69 +1474,6 @@ lazy_scan_prune(LVRelState *vacrel,
continue;
}
- Assert(ItemIdIsNormal(itemid));
-
- /*
- * The criteria for counting a tuple as live in this block need to
- * match what analyze.c's acquire_sample_rows() does, otherwise VACUUM
- * and ANALYZE may produce wildly different reltuples values, e.g.
- * when there are many recently-dead tuples.
- *
- * The logic here is a bit simpler than acquire_sample_rows(), as
- * VACUUM can't run inside a transaction block, which makes some cases
- * impossible (e.g. in-progress insert from the same transaction).
- *
- * We treat LP_DEAD items (which are the closest thing to DEAD tuples
- * that might be seen here) differently, too: we assume that they'll
- * become LP_UNUSED before VACUUM finishes. This difference is only
- * superficial. VACUUM effectively agrees with ANALYZE about DEAD
- * items, in the end. VACUUM won't remember LP_DEAD items, but only
- * because they're not supposed to be left behind when it is done.
- * (Cases where we bypass index vacuuming will violate this optimistic
- * assumption, but the overall impact of that should be negligible.)
- */
- switch (htsv_get_valid_status(presult.htsv[offnum]))
- {
- case HEAPTUPLE_LIVE:
-
- /*
- * Count it as live. Not only is this natural, but it's also
- * what acquire_sample_rows() does.
- */
- live_tuples++;
- break;
- case HEAPTUPLE_RECENTLY_DEAD:
-
- /*
- * If tuple is recently dead then we must not remove it from
- * the relation. (We only remove items that are LP_DEAD from
- * pruning.)
- */
- recently_dead_tuples++;
- break;
- case HEAPTUPLE_INSERT_IN_PROGRESS:
-
- /*
- * We do not count these rows as live, because we expect the
- * inserting transaction to update the counters at commit, and
- * we assume that will happen only after we report our
- * results. This assumption is a bit shaky, but it is what
- * acquire_sample_rows() does, so be consistent.
- */
- break;
- case HEAPTUPLE_DELETE_IN_PROGRESS:
-
- /*
- * This an expected case during concurrent vacuum. Count such
- * rows as live. As above, we assume the deleting transaction
- * will commit and update the counters after we report.
- */
- live_tuples++;
- break;
- default:
- elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
- break;
- }
}
vacrel->offnum = InvalidOffsetNumber;
@@ -1634,8 +1563,8 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->tuples_deleted += presult.ndeleted;
vacrel->tuples_frozen += presult.nfrozen;
vacrel->lpdead_items += lpdead_items;
- vacrel->live_tuples += live_tuples;
- vacrel->recently_dead_tuples += recently_dead_tuples;
+ vacrel->live_tuples += presult.live_tuples;
+ vacrel->recently_dead_tuples += presult.recently_dead_tuples;
/* Can't truncate this page */
if (presult.hastup)
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 4e41cf68957..989515a628d 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -198,23 +198,14 @@ typedef struct HeapPageFreeze
*/
typedef struct PruneResult
{
+ int live_tuples;
+ int recently_dead_tuples;
int ndeleted; /* Number of tuples deleted from the page */
int nnewlpdead; /* Number of newly LP_DEAD items */
bool all_visible; /* Whether or not the page is all visible */
bool hastup; /* Does page make rel truncation unsafe */
TransactionId frz_conflict_horizon; /* Newest xmin on the page */
- /*
- * Tuple visibility is only computed once for each tuple, for correctness
- * and efficiency reasons; see comment in heap_page_prune() for details.
- * This is of type int8[], instead of HTSV_Result[], so we can use -1 to
- * indicate no visibility has been computed, e.g. for LP_DEAD items.
- *
- * This needs to be MaxHeapTuplesPerPage + 1 long as FirstOffsetNumber is
- * 1. Otherwise every access would need to subtract 1.
- */
- int8 htsv[MaxHeapTuplesPerPage + 1];
-
bool all_visible_except_removable;
/* Whether or not the page can be set all frozen in the VM */
@@ -224,20 +215,6 @@ typedef struct PruneResult
int nfrozen;
} PruneResult;
-/*
- * Pruning calculates tuple visibility once and saves the results in an array
- * of int8. See PruneResult.htsv for details. This helper function is meant to
- * guard against examining visibility status array members which have not yet
- * been computed.
- */
-static inline HTSV_Result
-htsv_get_valid_status(int status)
-{
- Assert(status >= HEAPTUPLE_DEAD &&
- status <= HEAPTUPLE_DELETE_IN_PROGRESS);
- return (HTSV_Result) status;
-}
-
/* ----------------
* function prototypes for heap access method
*
--
2.37.2
Attachment: v1-0011-Exit-heap_page_prune-early-if-no-prune.patch (text/x-patch; charset=US-ASCII)
From e7d3d2842e59fc7f556e9042667b3558923f0180 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sun, 7 Jan 2024 17:42:05 -0500
Subject: [PATCH v1 11/15] Exit heap_page_prune() early if no prune
If there is nothing to be pruned on the page, heap_page_prune() will
consider whether to update the page's pd_prune_xid and whether to freeze
the page. In this case, if we decide to freeze the page, we still need
to emit a freeze record.
Future commits will emit a combined freeze+prune record for cases in
which we are both pruning and freezing. In the no-prune case, we are
done with heap_page_prune() after checking whether to set pd_prune_xid.
By reversing the prune and no-prune cases so that the no-prune case
comes first, we can exit early in the no-prune case. This reduces the
indentation level of the remaining code and avoids having to recheck
whether we are, in fact, pruning.
Since we now exit early in the no-prune case, we must set nfrozen and
all_frozen to their final values before executing pruning or freezing.
A simplified sketch of the restructured control flow follows the patch.
---
src/backend/access/heap/pruneheap.c | 192 ++++++++++++++++------------
1 file changed, 108 insertions(+), 84 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 758ab9a0404..62dc85fd77c 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -505,80 +505,27 @@ heap_page_prune(Relation relation, Buffer buffer,
heap_pre_freeze_checks(buffer, frozen, presult->nfrozen);
frz_conflict_horizon = heap_frz_conflict_horizon(presult, pagefrz);
}
-
- /* Any error while applying the changes is critical */
- START_CRIT_SECTION();
-
- /* Have we found any prunable items? */
- if (do_prune)
+ else if (!pagefrz || !presult->all_frozen || presult->nfrozen > 0)
{
/*
- * Apply the planned item changes, then repair page fragmentation, and
- * update the page's hint bit about whether it has free line pointers.
- */
- heap_page_prune_execute(buffer,
- prstate.redirected, prstate.nredirected,
- prstate.nowdead, prstate.ndead,
- prstate.nowunused, prstate.nunused);
-
- /*
- * Update the page's pd_prune_xid field to either zero, or the lowest
- * XID of any soon-prunable tuple.
- */
- ((PageHeader) page)->pd_prune_xid = prstate.new_prune_xid;
-
- /*
- * Also clear the "page is full" flag, since there's no point in
- * repeating the prune/defrag process until something else happens to
- * the page.
- */
- PageClearFull(page);
-
- MarkBufferDirty(buffer);
-
- /*
- * Emit a WAL XLOG_HEAP2_PRUNE record showing what we did
+ * If we will neither freeze tuples on the page nor set the page all
+ * frozen in the visibility map, the page is not all frozen and there
+ * will be no newly frozen tuples.
*/
- if (RelationNeedsWAL(relation))
- {
- xl_heap_prune xlrec;
- XLogRecPtr recptr;
-
- xlrec.isCatalogRel = RelationIsAccessibleInLogicalDecoding(relation);
- xlrec.snapshotConflictHorizon = prstate.snapshotConflictHorizon;
- xlrec.nredirected = prstate.nredirected;
- xlrec.ndead = prstate.ndead;
-
- XLogBeginInsert();
- XLogRegisterData((char *) &xlrec, SizeOfHeapPrune);
-
- XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
-
- /*
- * The OffsetNumber arrays are not actually in the buffer, but we
- * pretend that they are. When XLogInsert stores the whole
- * buffer, the offset arrays need not be stored too.
- */
- if (prstate.nredirected > 0)
- XLogRegisterBufData(0, (char *) prstate.redirected,
- prstate.nredirected *
- sizeof(OffsetNumber) * 2);
-
- if (prstate.ndead > 0)
- XLogRegisterBufData(0, (char *) prstate.nowdead,
- prstate.ndead * sizeof(OffsetNumber));
+ presult->all_frozen = false;
+ presult->nfrozen = 0; /* avoid miscounts in instrumentation */
+ }
- if (prstate.nunused > 0)
- XLogRegisterBufData(0, (char *) prstate.nowunused,
- prstate.nunused * sizeof(OffsetNumber));
+ /* Record number of newly-set-LP_DEAD items for caller */
+ presult->nnewlpdead = prstate.ndead;
- recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_PRUNE);
- PageSetLSN(BufferGetPage(buffer), recptr);
- }
- }
- else
+ /* Have we found any prunable items? */
+ if (!do_prune)
{
+ /* Any error while applying the changes is critical */
+ START_CRIT_SECTION();
+
/*
* If we didn't prune anything, but have found a new value for the
* pd_prune_xid field, update it and mark the buffer dirty. This is
@@ -595,17 +542,104 @@ heap_page_prune(Relation relation, Buffer buffer,
PageClearFull(page);
MarkBufferDirtyHint(buffer, true);
}
+
+ hint_bit_fpi = fpi_before != pgWalUsage.wal_fpi;
+
+ /*
+ * We may have decided not to opportunistically freeze above because
+ * pruning would not emit an FPI. Now, however, if checksums are
+ * enabled, setting the hint bit may have emitted an FPI. Check again
+ * if we should freeze.
+ */
+ if (!do_freeze && hint_bit_fpi)
+ do_freeze = pagefrz &&
+ (pagefrz->freeze_required ||
+ (whole_page_freezable && presult->nfrozen > 0));
+
+ if (do_freeze)
+ {
+ heap_freeze_execute_prepared(relation, buffer,
+ frz_conflict_horizon,
+ frozen, presult->nfrozen);
+ }
+ else if (!pagefrz || !presult->all_frozen || presult->nfrozen > 0)
+ {
+ presult->all_frozen = false;
+ presult->nfrozen = 0;
+ }
+
+ END_CRIT_SECTION();
+ return;
}
- END_CRIT_SECTION();
+ START_CRIT_SECTION();
- /* Record number of newly-set-LP_DEAD items for caller */
- presult->nnewlpdead = prstate.ndead;
+ /*
+ * Apply the planned item changes, then repair page fragmentation, and
+ * update the page's hint bit about whether it has free line pointers.
+ */
+ heap_page_prune_execute(buffer,
+ prstate.redirected, prstate.nredirected,
+ prstate.nowdead, prstate.ndead,
+ prstate.nowunused, prstate.nunused);
- if (do_freeze)
+ /*
+ * Update the page's pd_prune_xid field to either zero, or the lowest XID
+ * of any soon-prunable tuple.
+ */
+ ((PageHeader) page)->pd_prune_xid = prstate.new_prune_xid;
+
+ /*
+ * Also clear the "page is full" flag, since there's no point in repeating
+ * the prune/defrag process until something else happens to the page.
+ */
+ PageClearFull(page);
+
+ MarkBufferDirty(buffer);
+
+ /*
+ * Emit a WAL XLOG_HEAP2_PRUNE record showing what we did
+ */
+ if (RelationNeedsWAL(relation))
{
- START_CRIT_SECTION();
+ xl_heap_prune xlrec;
+ XLogRecPtr recptr;
+
+ xlrec.isCatalogRel = RelationIsAccessibleInLogicalDecoding(relation);
+ xlrec.snapshotConflictHorizon = prstate.snapshotConflictHorizon;
+ xlrec.nredirected = prstate.nredirected;
+ xlrec.ndead = prstate.ndead;
+
+ XLogBeginInsert();
+ XLogRegisterData((char *) &xlrec, SizeOfHeapPrune);
+
+ XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
+ /*
+ * The OffsetNumber arrays are not actually in the buffer, but we
+ * pretend that they are. When XLogInsert stores the whole buffer,
+ * the offset arrays need not be stored too.
+ */
+ if (prstate.nredirected > 0)
+ XLogRegisterBufData(0, (char *) prstate.redirected,
+ prstate.nredirected *
+ sizeof(OffsetNumber) * 2);
+
+ if (prstate.ndead > 0)
+ XLogRegisterBufData(0, (char *) prstate.nowdead,
+ prstate.ndead * sizeof(OffsetNumber));
+
+ if (prstate.nunused > 0)
+ XLogRegisterBufData(0, (char *) prstate.nowunused,
+ prstate.nunused * sizeof(OffsetNumber));
+
+ recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_PRUNE);
+
+ PageSetLSN(BufferGetPage(buffer), recptr);
+ }
+
+ if (do_freeze)
+ {
Assert(presult->nfrozen > 0);
heap_freeze_prepared_tuples(buffer, frozen, presult->nfrozen);
@@ -649,19 +683,9 @@ heap_page_prune(Relation relation, Buffer buffer,
PageSetLSN(page, recptr);
}
-
- END_CRIT_SECTION();
- }
- else if (!pagefrz || !presult->all_frozen || presult->nfrozen > 0)
- {
- /*
- * If we will neither freeze tuples on the page nor set the page all
- * frozen in the visibility map, the page is not all frozen and there
- * will be no newly frozen tuples.
- */
- presult->all_frozen = false;
- presult->nfrozen = 0; /* avoid miscounts in instrumentation */
}
+
+ END_CRIT_SECTION();
}
--
2.37.2
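The restructured control flow, in outline (paraphrased, not the literal
patch code):

    if (!do_prune)
    {
        START_CRIT_SECTION();
        /* maybe update pd_prune_xid, clear PD_PAGE_FULL, dirty the hint */

        /* Setting hint bits above may itself have emitted an FPI */
        hint_bit_fpi = fpi_before != pgWalUsage.wal_fpi;
        if (!do_freeze && hint_bit_fpi)
            do_freeze = pagefrz &&
                (pagefrz->freeze_required ||
                 (whole_page_freezable && presult->nfrozen > 0));

        if (do_freeze)
            heap_freeze_execute_prepared(relation, buffer,
                                         frz_conflict_horizon,
                                         frozen, presult->nfrozen);

        END_CRIT_SECTION();
        return;
    }

    /* Prune (and, after the next patch, freeze) path continues here */
    START_CRIT_SECTION();
    /* apply item changes, WAL-log the prune, execute any freeze plans */
    END_CRIT_SECTION();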
Attachment: v1-0012-Merge-prune-and-freeze-records.patch (text/x-patch; charset=US-ASCII)
From 910481747232a42f8fb5ff5f95d6438ed0244301 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sun, 7 Jan 2024 17:55:31 -0500
Subject: [PATCH v1 12/15] Merge prune and freeze records
When a page has both tuples to prune and tuples to freeze, emit a single
combined prune record containing the offsets for pruning as well as the
freeze plans and offsets for freezing. This reduces the number of WAL
records emitted.
---
src/backend/access/heap/heapam.c | 42 ++++++++++++--
src/backend/access/heap/pruneheap.c | 85 +++++++++++++----------------
src/include/access/heapam_xlog.h | 20 +++++--
3 files changed, 90 insertions(+), 57 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index e1e3b454964..90feca1d3b2 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -8808,24 +8808,28 @@ heap_xlog_prune(XLogReaderState *record)
if (action == BLK_NEEDS_REDO)
{
Page page = (Page) BufferGetPage(buffer);
- OffsetNumber *end;
OffsetNumber *redirected;
OffsetNumber *nowdead;
OffsetNumber *nowunused;
int nredirected;
int ndead;
int nunused;
+ int nplans;
Size datalen;
+ xl_heap_freeze_plan *plans;
+ OffsetNumber *frz_offsets;
+ int curoff = 0;
- redirected = (OffsetNumber *) XLogRecGetBlockData(record, 0, &datalen);
-
+ nplans = xlrec->nplans;
nredirected = xlrec->nredirected;
ndead = xlrec->ndead;
- end = (OffsetNumber *) ((char *) redirected + datalen);
+ nunused = xlrec->nunused;
+
+ plans = (xl_heap_freeze_plan *) XLogRecGetBlockData(record, 0, &datalen);
+ redirected = (OffsetNumber *) &plans[nplans];
nowdead = redirected + (nredirected * 2);
nowunused = nowdead + ndead;
- nunused = (end - nowunused);
- Assert(nunused >= 0);
+ frz_offsets = nowunused + nunused;
/* Update all line pointers per the record, and repair fragmentation */
heap_page_prune_execute(buffer,
@@ -8833,6 +8837,32 @@ heap_xlog_prune(XLogReaderState *record)
nowdead, ndead,
nowunused, nunused);
+ for (int p = 0; p < nplans; p++)
+ {
+ HeapTupleFreeze frz;
+
+ /*
+ * Convert freeze plan representation from WAL record into
+ * per-tuple format used by heap_execute_freeze_tuple
+ */
+ frz.xmax = plans[p].xmax;
+ frz.t_infomask2 = plans[p].t_infomask2;
+ frz.t_infomask = plans[p].t_infomask;
+ frz.frzflags = plans[p].frzflags;
+ frz.offset = InvalidOffsetNumber; /* unused, but be tidy */
+
+ for (int i = 0; i < plans[p].ntuples; i++)
+ {
+ OffsetNumber offset = frz_offsets[curoff++];
+ ItemId lp;
+ HeapTupleHeader tuple;
+
+ lp = PageGetItemId(page, offset);
+ tuple = (HeapTupleHeader) PageGetItem(page, lp);
+ heap_execute_freeze_tuple(tuple, &frz);
+ }
+ }
+
/*
* Note: we don't worry about updating the page's prunability hints.
* At worst this will cause an extra prune cycle to occur soon.
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 62dc85fd77c..09ac49d84a8 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -595,6 +595,9 @@ heap_page_prune(Relation relation, Buffer buffer,
*/
PageClearFull(page);
+ if (do_freeze)
+ heap_freeze_prepared_tuples(buffer, frozen, presult->nfrozen);
+
MarkBufferDirty(buffer);
/*
@@ -605,10 +608,37 @@ heap_page_prune(Relation relation, Buffer buffer,
xl_heap_prune xlrec;
XLogRecPtr recptr;
+ xl_heap_freeze_plan plans[MaxHeapTuplesPerPage];
+ OffsetNumber offsets[MaxHeapTuplesPerPage];
+
xlrec.isCatalogRel = RelationIsAccessibleInLogicalDecoding(relation);
- xlrec.snapshotConflictHorizon = prstate.snapshotConflictHorizon;
xlrec.nredirected = prstate.nredirected;
xlrec.ndead = prstate.ndead;
+ xlrec.nunused = prstate.nunused;
+ xlrec.nplans = 0;
+
+ /*
+ * The snapshotConflictHorizon for the whole record should be the most
+ * conservative of all the horizons calculated for any of the possible
+ * modifications. If this record will prune tuples, any transactions
+ * on the standby older than the youngest xmax of the most recently
+ * removed tuple this record will prune will conflict. If this record
+ * will freeze tuples, any transactions on the standby with xids older
+ * than the youngest tuple this record will freeze will conflict.
+ */
+ if (do_freeze)
+ xlrec.snapshotConflictHorizon = Max(prstate.snapshotConflictHorizon,
+ frz_conflict_horizon);
+ else
+ xlrec.snapshotConflictHorizon = prstate.snapshotConflictHorizon;
+
+ /*
+ * Prepare deduplicated representation for use in WAL record
+ * Destructively sorts tuples array in-place.
+ */
+ if (do_freeze)
+ xlrec.nplans = heap_log_freeze_plan(frozen,
+ presult->nfrozen, plans, offsets);
XLogBeginInsert();
XLogRegisterData((char *) &xlrec, SizeOfHeapPrune);
@@ -620,6 +650,10 @@ heap_page_prune(Relation relation, Buffer buffer,
* pretend that they are. When XLogInsert stores the whole buffer,
* the offset arrays need not be stored too.
*/
+ if (xlrec.nplans > 0)
+ XLogRegisterBufData(0, (char *) plans,
+ xlrec.nplans * sizeof(xl_heap_freeze_plan));
+
if (prstate.nredirected > 0)
XLogRegisterBufData(0, (char *) prstate.redirected,
prstate.nredirected *
@@ -633,56 +667,13 @@ heap_page_prune(Relation relation, Buffer buffer,
XLogRegisterBufData(0, (char *) prstate.nowunused,
prstate.nunused * sizeof(OffsetNumber));
- recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_PRUNE);
-
- PageSetLSN(BufferGetPage(buffer), recptr);
- }
-
- if (do_freeze)
- {
- Assert(presult->nfrozen > 0);
-
- heap_freeze_prepared_tuples(buffer, frozen, presult->nfrozen);
-
- MarkBufferDirty(buffer);
-
- /* Now WAL-log freezing if necessary */
- if (RelationNeedsWAL(relation))
- {
- xl_heap_freeze_plan plans[MaxHeapTuplesPerPage];
- OffsetNumber offsets[MaxHeapTuplesPerPage];
- int nplans;
- xl_heap_freeze_page xlrec;
- XLogRecPtr recptr;
-
- /*
- * Prepare deduplicated representation for use in WAL record
- * Destructively sorts tuples array in-place.
- */
- nplans = heap_log_freeze_plan(frozen, presult->nfrozen, plans, offsets);
-
- xlrec.snapshotConflictHorizon = frz_conflict_horizon;
- xlrec.isCatalogRel = RelationIsAccessibleInLogicalDecoding(relation);
- xlrec.nplans = nplans;
-
- XLogBeginInsert();
- XLogRegisterData((char *) &xlrec, SizeOfHeapFreezePage);
-
- /*
- * The freeze plan array and offset array are not actually in the
- * buffer, but pretend that they are. When XLogInsert stores the
- * whole buffer, the arrays need not be stored too.
- */
- XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
- XLogRegisterBufData(0, (char *) plans,
- nplans * sizeof(xl_heap_freeze_plan));
+ if (xlrec.nplans > 0)
XLogRegisterBufData(0, (char *) offsets,
presult->nfrozen * sizeof(OffsetNumber));
- recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_FREEZE_PAGE);
+ recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_PRUNE);
- PageSetLSN(page, recptr);
- }
+ PageSetLSN(BufferGetPage(buffer), recptr);
}
END_CRIT_SECTION();
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 6488dad5e64..22f236bb52a 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -231,23 +231,35 @@ typedef struct xl_heap_update
* during opportunistic pruning)
*
* The array of OffsetNumbers following the fixed part of the record contains:
+ * * for each freeze plan: the freeze plan
* * for each redirected item: the item offset, then the offset redirected to
* * for each now-dead item: the item offset
* * for each now-unused item: the item offset
- * The total number of OffsetNumbers is therefore 2*nredirected+ndead+nunused.
- * Note that nunused is not explicitly stored, but may be found by reference
- * to the total record length.
+ * * for each tuple frozen by the freeze plans: the offset of the item corresponding to that tuple
+ * The total number of OffsetNumbers is therefore
+ * (2*nredirected) + ndead + nunused + (sum[plan.ntuples for plan in plans])
*
* Acquires a full cleanup lock.
*/
typedef struct xl_heap_prune
{
TransactionId snapshotConflictHorizon;
+ uint16 nplans;
uint16 nredirected;
uint16 ndead;
+ uint16 nunused;
bool isCatalogRel; /* to handle recovery conflict during logical
* decoding on standby */
- /* OFFSET NUMBERS are in the block reference 0 */
+ /*
+ * OFFSET NUMBERS and freeze plans are in the block reference 0 in the
+ * following order:
+ *
+ * * xl_heap_freeze_plan plans[nplans];
+ * * OffsetNumber redirected[2 * nredirected];
+ * * OffsetNumber nowdead[ndead];
+ * * OffsetNumber nowunused[nunused];
+ * * OffsetNumber frz_offsets[...];
+ */
} xl_heap_prune;
#define SizeOfHeapPrune (offsetof(xl_heap_prune, isCatalogRel) + sizeof(bool))
--
2.37.2
v1-0015-Save-dead-tuple-offsets-during-heap_page_prune.patch (text/x-patch)
From 73d434268322f041722f96014568ac034c323dd0 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 22 Jan 2024 16:55:28 -0500
Subject: [PATCH v1 15/15] Save dead tuple offsets during heap_page_prune
After heap_page_prune() returned, lazy_scan_prune() looped through all
of the offsets of LP_DEAD items which it later added to
LVRelState->dead_items. Instead take care of this when marking a line
pointer or when an existing non-removable LP_DEAD item is encountered in
heap_prune_chain().
---
src/backend/access/heap/pruneheap.c | 7 ++++
src/backend/access/heap/vacuumlazy.c | 60 ++++++----------------------
src/include/access/heapam.h | 2 +
3 files changed, 22 insertions(+), 47 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 88adad99c39..8d531e37606 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -292,6 +292,7 @@ heap_page_prune(Relation relation, Buffer buffer,
presult->live_tuples = 0;
presult->recently_dead_tuples = 0;
+ presult->lpdead_items = 0;
/*
* Keep track of whether or not the page is all_visible in case the caller
@@ -953,7 +954,10 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
if (unlikely(prstate->mark_unused_now))
heap_prune_record_unused(prstate, offnum);
else
+ {
presult->all_visible = false;
+ presult->deadoffsets[presult->lpdead_items++] = offnum;
+ }
break;
}
@@ -1205,6 +1209,9 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
* all_visible.
*/
presult->all_visible = false;
+
+ /* Record the dead offset for vacuum */
+ presult->deadoffsets[presult->lpdead_items++] = offnum;
}
/*
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 4770fcea021..7cacde3f852 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1409,22 +1409,11 @@ lazy_scan_prune(LVRelState *vacrel,
bool *has_lpdead_items)
{
Relation rel = vacrel->rel;
- OffsetNumber offnum,
- maxoff;
- ItemId itemid;
PruneResult presult;
HeapPageFreeze pagefrz;
- OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
Assert(BufferGetBlockNumber(buf) == blkno);
- /*
- * maxoff might be reduced following line pointer array truncation in
- * heap_page_prune. That's safe for us to ignore, since the reclaimed
- * space will continue to look like LP_UNUSED items below.
- */
- maxoff = PageGetMaxOffsetNumber(page);
-
/* Initialize (or reset) page-level state */
pagefrz.freeze_required = false;
pagefrz.FreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
@@ -1432,15 +1421,14 @@ lazy_scan_prune(LVRelState *vacrel,
pagefrz.NoFreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
pagefrz.NoFreezePageRelminMxid = vacrel->NewRelminMxid;
pagefrz.cutoffs = &vacrel->cutoffs;
- lpdead_items = 0;
/*
* Prune all HOT-update chains in this page.
*
* We count the number of tuples removed from the page by the pruning step
- * in presult.ndeleted. It should not be confused with lpdead_items;
- * lpdead_items's final value can be thought of as the number of tuples
- * that were deleted from indexes.
+ * in presult.ndeleted. It should not be confused with
+ * presult.lpdead_items; presult.lpdead_items's final value can be thought
+ * of as the number of tuples that were deleted from indexes.
*
* If the relation has no indexes, we can immediately mark would-be dead
* items LP_UNUSED, so mark_unused_now should be true if no indexes and
@@ -1450,32 +1438,10 @@ lazy_scan_prune(LVRelState *vacrel,
&pagefrz, &presult, &vacrel->offnum);
/*
- * Now scan the page to collect LP_DEAD items and check for tuples
- * requiring freezing among remaining tuples with storage. We will update
- * the VM after collecting LP_DEAD items and freezing tuples. Pruning will
- * have determined whether or not the page is all_visible and able to
- * become all_frozen.
+ * We will update the VM after collecting LP_DEAD items and freezing
+ * tuples. Pruning will have determined whether or not the page is
+ * all_visible.
*/
- for (offnum = FirstOffsetNumber;
- offnum <= maxoff;
- offnum = OffsetNumberNext(offnum))
- {
- /*
- * Set the offset number so that we can display it along with any
- * error that occurred while processing this tuple.
- */
- vacrel->offnum = offnum;
- itemid = PageGetItemId(page, offnum);
-
-
- if (ItemIdIsDead(itemid))
- {
- deadoffsets[lpdead_items++] = offnum;
- continue;
- }
-
- }
-
vacrel->offnum = InvalidOffsetNumber;
if (presult.all_frozen || presult.nfrozen > 0)
@@ -1523,7 +1489,7 @@ lazy_scan_prune(LVRelState *vacrel,
TransactionId debug_cutoff;
bool debug_all_frozen;
- Assert(lpdead_items == 0);
+ Assert(presult.lpdead_items == 0);
if (!heap_page_is_all_visible(vacrel, buf,
&debug_cutoff, &debug_all_frozen))
@@ -1539,7 +1505,7 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* Now save details of the LP_DEAD items from the page in vacrel
*/
- if (lpdead_items > 0)
+ if (presult.lpdead_items > 0)
{
VacDeadItems *dead_items = vacrel->dead_items;
ItemPointerData tmp;
@@ -1548,9 +1514,9 @@ lazy_scan_prune(LVRelState *vacrel,
ItemPointerSetBlockNumber(&tmp, blkno);
- for (int i = 0; i < lpdead_items; i++)
+ for (int i = 0; i < presult.lpdead_items; i++)
{
- ItemPointerSetOffsetNumber(&tmp, deadoffsets[i]);
+ ItemPointerSetOffsetNumber(&tmp, presult.deadoffsets[i]);
dead_items->items[dead_items->num_items++] = tmp;
}
@@ -1562,7 +1528,7 @@ lazy_scan_prune(LVRelState *vacrel,
/* Finally, add page-local counts to whole-VACUUM counts */
vacrel->tuples_deleted += presult.ndeleted;
vacrel->tuples_frozen += presult.nfrozen;
- vacrel->lpdead_items += lpdead_items;
+ vacrel->lpdead_items += presult.lpdead_items;
vacrel->live_tuples += presult.live_tuples;
vacrel->recently_dead_tuples += presult.recently_dead_tuples;
@@ -1571,7 +1537,7 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->nonempty_pages = blkno + 1;
/* Did we find LP_DEAD items? */
- *has_lpdead_items = (lpdead_items > 0);
+ *has_lpdead_items = (presult.lpdead_items > 0);
Assert(!presult.all_visible || !(*has_lpdead_items));
@@ -1639,7 +1605,7 @@ lazy_scan_prune(LVRelState *vacrel,
* There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
* however.
*/
- else if (lpdead_items > 0 && PageIsAllVisible(page))
+ else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
{
elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
vacrel->relname, blkno);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 989515a628d..52bd5fc1d92 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -213,6 +213,8 @@ typedef struct PruneResult
/* Number of newly frozen tuples */
int nfrozen;
+ OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
+ int lpdead_items; /* includes existing LP_DEAD items */
} PruneResult;
/* ----------------
--
2.37.2
On 25/01/2024 00:49, Melanie Plageman wrote:
Generates 30% fewer WAL records and 12% fewer WAL bytes -- which,
depending on what else is happening on the system, can lead to vacuum
spending substantially less time on WAL writing and syncing (often 15%
less time on WAL writes and 10% less time on syncing WAL in my
testing).
Nice!
The attached patch set is broken up into many separate commits for
ease of review. Each patch does a single thing which can be explained
plainly in the commit message. Every commit passes tests and works on
its own.
About this very first change:
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1526,8 +1526,7 @@ lazy_scan_prune(LVRelState *vacrel,
* that everyone sees it as committed?
*/
xmin = HeapTupleHeaderGetXmin(htup);
- if (!TransactionIdPrecedes(xmin,
- vacrel->cutoffs.OldestXmin))
+ if (!GlobalVisTestIsRemovableXid(vacrel->vistest, xmin))
{
all_visible = false;
break;
Does GlobalVisTestIsRemovableXid() handle FrozenTransactionId correctly?
I read through all the patches in order, and aside from the above they
all look correct to me. Some comments on the set as a whole:
I don't think we need XLOG_HEAP2_FREEZE_PAGE as a separate record type
anymore. By removing that, you also get rid of the freeze-only codepath
near the end of heap_page_prune(), and the
heap_freeze_execute_prepared() function.
The XLOG_HEAP2_FREEZE_PAGE record is a little smaller than
XLOG_HEAP2_PRUNE. But we could optimize the XLOG_HEAP2_PRUNE format for
the case that there's no pruning, just freezing. The record format
(xl_heap_prune) looks pretty complex as it is, I think it could be made
both more compact and more clear with some refactoring.
FreezeMultiXactId still takes a separate 'cutoffs' arg, but it could use
pagefrz->cutoffs now.
HeapPageFreeze has two "trackers", for the "freeze" and "no freeze"
cases. heap_page_prune() needs to track both, until it decides whether
to freeze or not. But it doesn't make much sense that the caller
(lazy_scan_prune()) has to initialize both, and has to choose which of
the values to use after the call depending on whether heap_page_prune()
froze or not. The two trackers should be just heap_page_prune()'s
internal business.
HeapPageFreeze is a bit confusing in general, as it's both an input and
an output to heap_page_prune(). Not sure what exactly to do there, but I
feel that we should make heap_page_prune()'s interface more clear.
Perhaps move the output fields to PruneResult.
Let's rename heap_page_prune() to heap_page_prune_and_freeze(), as
freezing is now an integral part of the function. And mention it in the
function comment, too.
In heap_prune_chain:
* Tuple visibility information is provided in presult->htsv.
Not this patch's fault directly, but it's not immediately clear what "is
provided" means here. Does the caller provide it, or does the function
set it, ie. is it an input or output argument? Looking at the code, it's
an input, but now it looks a bit weird that an input argument is called
'presult'.
--
Heikki Linnakangas
Neon (https://neon.tech)
Thanks so much for the review!
On Wed, Mar 6, 2024 at 7:59 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
On 25/01/2024 00:49, Melanie Plageman wrote:
The attached patch set is broken up into many separate commits for
ease of review. Each patch does a single thing which can be explained
plainly in the commit message. Every commit passes tests and works on
its own.
About this very first change:
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1526,8 +1526,7 @@ lazy_scan_prune(LVRelState *vacrel,
* that everyone sees it as committed?
*/
xmin = HeapTupleHeaderGetXmin(htup);
- if (!TransactionIdPrecedes(xmin,
- vacrel->cutoffs.OldestXmin))
+ if (!GlobalVisTestIsRemovableXid(vacrel->vistest, xmin))
{
all_visible = false;
break;
Does GlobalVisTestIsRemovableXid() handle FrozenTransactionId correctly?
Okay, so I thought a lot about this, and I don't understand how
GlobalVisTestIsRemovableXid() would not handle FrozenTransactionId
correctly.
vacrel->cutoffs.OldestXmin is computed initially from
GetOldestNonRemovableTransactionId() which uses ComputeXidHorizons().
GlobalVisState is updated by ComputeXidHorizons() (when it is
updated).
I do see that the comment above GlobalVisTestIsRemovableXid() says
* It is crucial that this only gets called for xids from a source that
* protects against xid wraparounds (e.g. from a table and thus protected by
* relfrozenxid).
and then in
* Convert 32 bit argument to FullTransactionId. We can do so safely
* because we know the xid has to, at the very least, be between
* [oldestXid, nextXid), i.e. within 2 billion of xid.
I'm not sure what oldestXid is here.
It's true that I don't see GlobalVisTestIsRemovableXid() being called
anywhere else with an xmin as an input. I think that hints that it is
not safe with FrozenTransactionId. But I don't see what could go
wrong.
Maybe it has something to do with converting it to a FullTransactionId?
FullTransactionIdFromU64(U64FromFullTransactionId(rel) + (int32)
(xid - rel_xid));
Sorry, I couldn't quite figure it out :(
I read through all the patches in order, and aside from the above they
all look correct to me. Some comments on the set as a whole:
I don't think we need XLOG_HEAP2_FREEZE_PAGE as a separate record type
anymore. By removing that, you also get rid of the freeze-only codepath
near the end of heap_page_prune(), and the
heap_freeze_execute_prepared() function.
That makes sense to me.
The XLOG_HEAP2_FREEZE_PAGE record is a little smaller than
XLOG_HEAP2_PRUNE. But we could optimize the XLOG_HEAP2_PRUNE format for
the case that there's no pruning, just freezing. The record format
(xl_heap_prune) looks pretty complex as it is, I think it could be made
both more compact and more clear with some refactoring.
I'm happy to change up xl_heap_prune format. In its current form,
according to pahole, it has no holes and just 3 bytes of padding at
the end.
One way we could make it smaller is by moving the isCatalogRel member
to directly after snapshotConflictHorizon and then adding a flags
field and defining flags to indicate whether or not other members
exist at all. We could set bits for HAS_FREEZE_PLANS, HAS_REDIRECTED,
HAS_UNUSED, HAS_DEAD. Then I would remove the non-optional uint16
nredirected, nunused, nplans, ndead and just put the number of
redirected/unused/etc at the beginning of the arrays containing the
offsets. Then I could write various macros for accessing them. That
would make it smaller, but it definitely wouldn't make it less complex
(IMO).
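To sketch what that might look like (the HAS_* names are the ones above; the
exact member layout and the count-prefixed arrays are only illustrative, not
a settled design):
#define HAS_FREEZE_PLANS	(1 << 0)
#define HAS_REDIRECTED		(1 << 1)
#define HAS_DEAD			(1 << 2)
#define HAS_UNUSED			(1 << 3)
typedef struct xl_heap_prune
{
	TransactionId snapshotConflictHorizon;
	bool		isCatalogRel;	/* moved up next to the conflict horizon */
	uint8		flags;			/* HAS_* bits above */
	/*
	 * Everything else lives in block reference 0 and is present only when
	 * the corresponding flag is set, each sub-array carrying its own count:
	 *
	 *   uint16 nplans;      xl_heap_freeze_plan plans[nplans];
	 *   uint16 nredirected; OffsetNumber redirected[2 * nredirected];
	 *   uint16 ndead;       OffsetNumber nowdead[ndead];
	 *   uint16 nunused;     OffsetNumber nowunused[nunused];
	 *                       OffsetNumber frz_offsets[...];
	 */
} xl_heap_prune;
#define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint8))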
FreezeMultiXactId still takes a separate 'cutoffs' arg, but it could use
pagefrz->cutoffs now.
Yep, I forgot to add a commit to do this. Thanks!
HeapPageFreeze has two "trackers", for the "freeze" and "no freeze"
cases. heap_page_prune() needs to track both, until it decides whether
to freeze or not. But it doesn't make much sense that the caller
(lazy_scan_prune()) has to initialize both, and has to choose which of
the values to use after the call depending on whether heap_page_prune()
froze or not. The two trackers should be just heap_page_prune()'s
internal business.
HeapPageFreeze is a bit confusing in general, as it's both an input and
an output to heap_page_prune(). Not sure what exactly to do there, but I
feel that we should make heap_page_prune()'s interface more clear.
Perhaps move the output fields to PruneResult.
Great point. Perhaps I just add NewRelfrozenXid and NewRelminMxid to
PruneResult (and call it PruneFreezeResult) and then make
VacuumCutoffs an optional argument to heap_page_prune() (used by
vacuum and not on-access pruning). Then I eliminate HeapPageFreeze as
a parameter altogether.
Let's rename heap_page_prune() to heap_page_prune_and_freeze(), as
freezing is now an integral part of the function. And mention it in the
function comment, too.
Agreed. Will do in the next version. I want to get some consensus on
what to do with xl_heap_prune before going on my rebase journey with
these 15 patches.
In heap_prune_chain:
* Tuple visibility information is provided in presult->htsv.
Not this patch's fault directly, but it's not immediately clear what "is
provided" means here. Does the caller provide it, or does the function
set it, ie. is it an input or output argument? Looking at the code, it's
an input, but now it looks a bit weird that an input argument is called
'presult'.
So, htsv is a member of PruneResult on master because
heap_page_prune() populates PruneResult->htsv for use in
lazy_scan_prune(). heap_prune_chain() doesn't have access to
PruneResult on master. Once I move PruneResult to being populated both
by heap_page_prune() and heap_prune_chain(), it gets more confusing.
htsv is always populated in heap_page_prune(), but it is not until
later patches in the set that I stop accessing it in
lazy_scan_prune(). Once I do so, I move htsv from PruneResult into
PruneState -- which fixes the heap_prune_chain() confusion.
So, only intermediate commits in the set have PruneResult->htsv used
in heap_prune_chain(). The end state is that heap_prune_chain()
accesses PruneState->htsv. However, I can document how it is used more
clearly in the function comment in the intermediate commits. Or, I can
simply leave htsv as a separate input argument to heap_prune_chain()
in the intermediate commits.
- Melanie
On 09/03/2024 22:41, Melanie Plageman wrote:
On Wed, Mar 6, 2024 at 7:59 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
Does GlobalVisTestIsRemovableXid() handle FrozenTransactionId correctly?
Okay, so I thought a lot about this, and I don't understand how
GlobalVisTestIsRemovableXid() would not handle FrozenTransactionId
correctly.
vacrel->cutoffs.OldestXmin is computed initially from
GetOldestNonRemovableTransactionId() which uses ComputeXidHorizons().
GlobalVisState is updated by ComputeXidHorizons() (when it is
updated).
I do see that the comment above GlobalVisTestIsRemovableXid() says
* It is crucial that this only gets called for xids from a source that
* protects against xid wraparounds (e.g. from a table and thus protected by
* relfrozenxid).
and then in
* Convert 32 bit argument to FullTransactionId. We can do so safely
* because we know the xid has to, at the very least, be between
* [oldestXid, nextXid), i.e. within 2 billion of xid.
I'm not sure what oldestXid is here.
It's true that I don't see GlobalVisTestIsRemovableXid() being called
anywhere else with an xmin as an input. I think that hints that it is
not safe with FrozenTransactionId. But I don't see what could go
wrong.
Maybe it has something to do with converting it to a FullTransactionId?
FullTransactionIdFromU64(U64FromFullTransactionId(rel) + (int32)
(xid - rel_xid));
Sorry, I couldn't quite figure it out :(
I just tested it. No, GlobalVisTestIsRemovableXid does not work for
FrozenTransactionId. I just tested it with state->definitely_needed ==
{0, 4000000000} and xid == FrozenTransactionId, and it incorrectly
returned 'false'. It treats FrozenTransactionId as if it were a regular xid '2'.
The XLOG_HEAP2_FREEZE_PAGE record is a little smaller than
XLOG_HEAP2_PRUNE. But we could optimize the XLOG_HEAP2_PRUNE format for
the case that there's no pruning, just freezing. The record format
(xl_heap_prune) looks pretty complex as it is, I think it could be made
both more compact and more clear with some refactoring.
I'm happy to change up xl_heap_prune format. In its current form,
according to pahole, it has no holes and just 3 bytes of padding at
the end.
One way we could make it smaller is by moving the isCatalogRel member
to directly after snapshotConflictHorizon and then adding a flags
field and defining flags to indicate whether or not other members
exist at all. We could set bits for HAS_FREEZE_PLANS, HAS_REDIRECTED,
HAS_UNUSED, HAS_DEAD. Then I would remove the non-optional uint16
nredirected, nunused, nplans, ndead and just put the number of
redirected/unused/etc at the beginning of the arrays containing the
offsets.
Sounds good.
Then I could write various macros for accessing them. That
would make it smaller, but it definitely wouldn't make it less complex
(IMO).
I don't know, it might turn out not that complex. If you define the
formats of each of those "sub-record types" as explicit structs, per
attached sketch, you won't need so many macros. Some care is still
needed with alignment though.
--
Heikki Linnakangas
Neon (https://neon.tech)
Attachments:
On Mon, Mar 11, 2024 at 6:38 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
On 09/03/2024 22:41, Melanie Plageman wrote:
On Wed, Mar 6, 2024 at 7:59 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
Does GlobalVisTestIsRemovableXid() handle FrozenTransactionId correctly?
Okay, so I thought a lot about this, and I don't understand how
GlobalVisTestIsRemovableXid() would not handle FrozenTransactionId
correctly.
vacrel->cutoffs.OldestXmin is computed initially from
GetOldestNonRemovableTransactionId() which uses ComputeXidHorizons().
GlobalVisState is updated by ComputeXidHorizons() (when it is
updated).
I do see that the comment above GlobalVisTestIsRemovableXid() says
* It is crucial that this only gets called for xids from a source that
* protects against xid wraparounds (e.g. from a table and thus protected by
* relfrozenxid).
and then in
* Convert 32 bit argument to FullTransactionId. We can do so safely
* because we know the xid has to, at the very least, be between
* [oldestXid, nextXid), i.e. within 2 billion of xid.
I'm not sure what oldestXid is here.
It's true that I don't see GlobalVisTestIsRemovableXid() being called
anywhere else with an xmin as an input. I think that hints that it is
not safe with FrozenTransactionId. But I don't see what could go
wrong.
Maybe it has something to do with converting it to a FullTransactionId?
FullTransactionIdFromU64(U64FromFullTransactionId(rel) + (int32)
(xid - rel_xid));
Sorry, I couldn't quite figure it out :(
I just tested it. No, GlobalVisTestIsRemovableXid does not work for
FrozenTransactionId. I just tested it with state->definitely_needed ==
{0, 4000000000} and xid == FrozenTransactionId, and it incorrectly
returned 'false'. It treats FrozenTransactionId as if it were a regular xid '2'.
I see. Looking at the original code:
if (!TransactionIdPrecedes(xmin,
vacrel->cutoffs.OldestXmin))
And the source code for TransactionIdPrecedes:
if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
return (id1 < id2);
diff = (int32) (id1 - id2);
return (diff < 0);
In your example, it seems like you mean GlobalVisState->maybe_needed is
0 and GlobalVisState->definitely_needed = 4000000000. In this example,
if vacrel->cutoffs.OldestXmin were 0, we would get a wrong answer also.
But I do see that the comparison done by TransactionIdPrecedes() is
quite different from that done by FullTransactionIdPrecedes() because of
the modulo 2**32 arithmetic.
Handling FrozenTransactionId specifically seems like it could be solved
easily in our case by wrapping GlobalVisTestIsRemovableXid() in a function
which first checks TransactionIdIsNormal() and returns true if the xid is
not normal. But I'm not sure whether I am missing a larger problem.
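Something like this, perhaps (the wrapper name is made up):
static inline bool
GlobalVisTestXminVisibleToAll(GlobalVisState *vistest, TransactionId xmin)
{
	/*
	 * FrozenTransactionId and BootstrapTransactionId are visible to
	 * everyone, so treat them as removable without consulting the
	 * GlobalVisState, mirroring the special case at the top of
	 * TransactionIdPrecedes() quoted above.
	 */
	if (!TransactionIdIsNormal(xmin))
		return true;
	return GlobalVisTestIsRemovableXid(vistest, xmin);
}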
The XLOG_HEAP2_FREEZE_PAGE record is a little smaller than
XLOG_HEAP2_PRUNE. But we could optimize the XLOG_HEAP2_PRUNE format for
the case that there's no pruning, just freezing. The record format
(xl_heap_prune) looks pretty complex as it is, I think it could be made
both more compact and more clear with some refactoring.I'm happy to change up xl_heap_prune format. In its current form,
according to pahole, it has no holes and just 3 bytes of padding at
the end.
One way we could make it smaller is by moving the isCatalogRel member
to directly after snapshotConflictHorizon and then adding a flags
field and defining flags to indicate whether or not other members
exist at all. We could set bits for HAS_FREEZE_PLANS, HAS_REDIRECTED,
HAS_UNUSED, HAS_DEAD. Then I would remove the non-optional uint16
nredirected, nunused, nplans, ndead and just put the number of
redirected/unused/etc at the beginning of the arrays containing the
offsets.
Sounds good.
Then I could write various macros for accessing them. That
would make it smaller, but it definitely wouldn't make it less complex
(IMO).
I don't know, it might turn out not that complex. If you define the
formats of each of those "sub-record types" as explicit structs, per
attached sketch, you won't need so many macros. Some care is still
needed with alignment though.
In the attached v2, I've done as you suggested and made all members
except flags and snapshotConflictHorizon optional in the xl_heap_prune
struct and obsoleted the xl_heap_freeze struct. I've kept the actual
xl_heap_freeze_page struct and heap_xlog_freeze_page() function so that
we can replay previously made XLOG_HEAP2_FREEZE_PAGE records.
Because we may set line pointers unused during vacuum's first pass, I
couldn't use the presence of nowunused without redirected or dead items
to indicate that this was a record emitted by vacuum's second pass. As
such, I haven't obsoleted the xl_heap_vacuum struct. I was thinking I
could add a flag that indicates the record was emitted by vacuum's
second pass? We would want to distinguish this so that we could set the
items unused without calling heap_page_prune_execute() -- because that
calls PageRepairFragmentation() which requires a full cleanup lock.
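For example, something like this (flag name invented here, continuing the
HAS_* numbering sketched earlier in the thread):
/*
 * Hypothetical flag marking a record emitted by vacuum's second pass, which
 * only sets already-dead items LP_UNUSED. Redo could then mark those items
 * unused directly instead of going through heap_page_prune_execute() and
 * PageRepairFragmentation(), so no full cleanup lock would be needed.
 */
#define PRUNE_BY_VACUUM_SECOND_PASS	(1 << 4)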
I introduced a few sub-record types similar to what you suggested --
they help a bit with alignment, so I think they are worth keeping. There
are comments around them, but perhaps a larger diagram of the layout of
the new XLOG_HEAP2_PRUNE record would be helpful.
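In the meantime, roughly something like this (ordering only; the v2
sub-record names and alignment details may differ):
XLOG_HEAP2_PRUNE, block reference 0 payload, each piece present only when
its flag is set:
  +------------------------------+
  | freeze plans                 |  count + xl_heap_freeze_plan[nplans]
  +------------------------------+
  | redirected item offsets      |  count + OffsetNumber[2 * nredirected]
  +------------------------------+
  | now-dead item offsets        |  count + OffsetNumber[ndead]
  +------------------------------+
  | now-unused item offsets      |  count + OffsetNumber[nunused]
  +------------------------------+
  | freeze plan offsets          |  OffsetNumber[sum of plans[i].ntuples]
  +------------------------------+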
There is a bit of duplicated code between heap_xlog_prune() and
heap2_desc() since they both need to deserialize the record. Before, the
code to do this was small and it didn't matter, but it might be worth
factoring that deserialization out into a shared helper now.
Note that I've made all of the changes related to obsoleting the
XLOG_HEAP2_FREEZE_PAGE record in separate commits on top of the rest of
the set for ease of review. However, I've rebased the other review
feedback into the relevant commits.
On Wed, Mar 6, 2024 at 7:59 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
I don't think we need XLOG_HEAP2_FREEZE_PAGE as a separate record type
anymore. By removing that, you also get rid of the freeze-only codepath
near the end of heap_page_prune(), and the
heap_freeze_execute_prepared() function.
The XLOG_HEAP2_FREEZE_PAGE record is a little smaller than
XLOG_HEAP2_PRUNE. But we could optimize the XLOG_HEAP2_PRUNE format for
the case that there's no pruning, just freezing. The record format
(xl_heap_prune) looks pretty complex as it is, I think it could be made
both more compact and more clear with some refactoring.
On the point of removing the freeze-only code path from
heap_page_prune() (now heap_page_prune_and_freeze()): while doing this,
I realized that heap_pre_freeze_checks() was not being called in the
case that we decide to freeze because we emitted an FPI while setting
the hint bit. I've fixed that; however, I've done so by moving
heap_pre_freeze_checks() into the critical section. I think that is not
okay? I could move it earlier and not call it when the hint bit FPI
leads us to freeze tuples. But I think that would lead to us doing a
lot less validation of tuples being frozen when checksums are enabled.
Or, I could make two critical sections?
FreezeMultiXactId still takes a separate 'cutoffs' arg, but it could use
pagefrz->cutoffs now.
Fixed this.
HeapPageFreeze has two "trackers", for the "freeze" and "no freeze"
cases. heap_page_prune() needs to track both, until it decides whether
to freeze or not. But it doesn't make much sense that the caller
(lazy_scan_prune()) has to initialize both, and has to choose which of
the values to use after the call depending on whether heap_page_prune()
froze or not. The two trackers should be just heap_page_prune()'s
internal business.
I've added new_relminmxid and new_relfrozenxid to PruneFreezeResult and
set them appropriately in heap_page_prune_and_freeze().
It's a bit sad because if it wasn't for vacrel->skippedallvis,
vacrel->NewRelfrozenXid and vacrel->NewRelminMxid would be
vacrel->cutoffs.OldestXmin and vacrel->cutoffs.OldestMxact respectively
and we could avoid having lazy_scan_prune() initializing the
HeapPageFreeze altogether and just pass VacuumCutoffs (and
heap_page_prune_opt() could pass NULL) to heap_page_prune_and_freeze().
I think it is probably worse to add both of them as additional optional
arguments, so I've just left lazy_scan_prune() with the job of
initializing them.
In heap_page_prune_and_freeze(), I initialize new_relminmxid and
new_relfrozenxid to InvalidMultiXactId and InvalidTransactionId
respectively because on-access pruning doesn't have a value to set them
to. But I wasn't sure if this was okay -- since I don't see any
TransactionIdIsValid() check in vac_update_relstats(). It will work now
-- just worried about future issues. I could add an assert there?
HeapPageFreeze is a bit confusing in general, as it's both an input and
an output to heap_page_prune(). Not sure what exactly to do there, but I
feel that we should make heap_page_prune()'s interface more clear.
Perhaps move the output fields to PruneResult.
HeapPageFreeze is now only an input argument to
heap_page_prune_and_freeze() as of the commit in which
heap_page_prune_and_freeze() becomes responsible for executing freezing.
It is still an in/out parameter in earlier commits because we still need
info from it to execute freezing in lazy_scan_prune().
Let's rename heap_page_prune() to heap_page_prune_and_freeze(), as
freezing is now an integral part of the function. And mention it in the
function comment, too.
I've done this.
In heap_prune_chain:
* Tuple visibility information is provided in presult->htsv.
Not this patch's fault directly, but it's not immediately clear what "is
provided" means here. Does the caller provide it, or does the function
set it, ie. is it an input or output argument? Looking at the code, it's
an input, but now it looks a bit weird that an input argument is called
'presult'.
I haven't updated the comments about this in the intermediate commits
since it ends up in the PruneState as an input.
- Melanie
Attachments:
v2-0001-lazy_scan_prune-tests-tuple-vis-with-GlobalVisTes.patch (text/x-diff)
From baf4b34826c0a4726baf404501ea4e37f9522ab8 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 6 Jan 2024 13:14:47 -0500
Subject: [PATCH v2 01/17] lazy_scan_prune tests tuple vis with GlobalVisTest
One requirement for eventually combining the prune and freeze records,
is that we must check during pruning if live tuples on the page are
visible to everyone and thus, whether or not the page is all visible. We
only consider opportunistically freezing tuples if the whole page is all
visible and could be set all frozen.
During pruning (in heap_page_prune()), we do not have access to
VacuumCutoffs -- as on access pruning also calls heap_page_prune(). We
do, however, have access to a GlobalVisState. This can be used to
determine whether or not the tuple is visible to everyone. It also has
the potential of being more up-to-date than VacuumCutoffs->OldestXmin.
This commit simply modifies lazy_scan_prune() to use GlobalVisState
instead of OldestXmin. Future commits will move the
all_visible/all_frozen calculation into heap_page_prune().
---
src/backend/access/heap/vacuumlazy.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 18004907750..d1efd885c88 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1582,8 +1582,7 @@ lazy_scan_prune(LVRelState *vacrel,
* that everyone sees it as committed?
*/
xmin = HeapTupleHeaderGetXmin(htup);
- if (!TransactionIdPrecedes(xmin,
- vacrel->cutoffs.OldestXmin))
+ if (!GlobalVisTestIsRemovableXid(vacrel->vistest, xmin))
{
all_visible = false;
break;
--
2.40.1
v2-0002-Pass-heap_prune_chain-PruneResult-output-paramete.patch (text/x-diff)
From 2a72d92dbfb4b73c02b644e258ea924ea3111087 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 6 Jan 2024 13:39:59 -0500
Subject: [PATCH v2 02/17] Pass heap_prune_chain() PruneResult output parameter
Future commits will set other members of PruneResult in
heap_prune_chain(), so start passing it as an output parameter now. This
eliminates the output parameter htsv -- the array of HTSV_Results --
since that is a member of the PruneResult.
---
src/backend/access/heap/pruneheap.c | 15 +++++++--------
1 file changed, 7 insertions(+), 8 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index e2f2c37f4d6..4600ee53751 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -61,8 +61,7 @@ static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
Buffer buffer);
static int heap_prune_chain(Buffer buffer,
OffsetNumber rootoffnum,
- int8 *htsv,
- PruneState *prstate);
+ PruneState *prstate, PruneResult *presult);
static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
static void heap_prune_record_redirect(PruneState *prstate,
OffsetNumber offnum, OffsetNumber rdoffnum);
@@ -325,7 +324,7 @@ heap_page_prune(Relation relation, Buffer buffer,
/* Process this item or chain of items */
presult->ndeleted += heap_prune_chain(buffer, offnum,
- presult->htsv, &prstate);
+ &prstate, presult);
}
/* Clear the offset information once we have processed the given page. */
@@ -454,7 +453,7 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
/*
* Prune specified line pointer or a HOT chain originating at line pointer.
*
- * Tuple visibility information is provided in htsv.
+ * Tuple visibility information is provided in presult->htsv.
*
* If the item is an index-referenced tuple (i.e. not a heap-only tuple),
* the HOT chain is pruned by removing all DEAD tuples at the start of the HOT
@@ -484,7 +483,7 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
*/
static int
heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
- int8 *htsv, PruneState *prstate)
+ PruneState *prstate, PruneResult *presult)
{
int ndeleted = 0;
Page dp = (Page) BufferGetPage(buffer);
@@ -505,7 +504,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
*/
if (ItemIdIsNormal(rootlp))
{
- Assert(htsv[rootoffnum] != -1);
+ Assert(presult->htsv[rootoffnum] != -1);
htup = (HeapTupleHeader) PageGetItem(dp, rootlp);
if (HeapTupleHeaderIsHeapOnly(htup))
@@ -528,7 +527,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
* either here or while following a chain below. Whichever path
* gets there first will mark the tuple unused.
*/
- if (htsv[rootoffnum] == HEAPTUPLE_DEAD &&
+ if (presult->htsv[rootoffnum] == HEAPTUPLE_DEAD &&
!HeapTupleHeaderIsHotUpdated(htup))
{
heap_prune_record_unused(prstate, rootoffnum);
@@ -625,7 +624,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
*/
tupdead = recent_dead = false;
- switch (htsv_get_valid_status(htsv[offnum]))
+ switch (htsv_get_valid_status(presult->htsv[offnum]))
{
case HEAPTUPLE_DEAD:
tupdead = true;
--
2.40.1
v2-0003-heap_page_prune-sets-all_visible-and-frz_conflict.patch (text/x-diff)
From 635bac1f6bd4747288656097c9423688f8ff1e0e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 6 Jan 2024 14:01:37 -0500
Subject: [PATCH v2 03/17] heap_page_prune sets all_visible and
frz_conflict_horizon
In order to combine the prune and freeze records, we must know if the
page is eligible to be opportunistically frozen before finishing
pruning. Save all_visible in the PruneResult and set it to false when we
see non-removable tuples which are not visible to everyone.
We will also need to ensure that the snapshotConflictHorizon for the combined
prune + freeze record is the more conservative of that calculated for each of
pruning and freezing. Calculate the visibility_cutoff_xid for the purposes of
freezing -- the newest xmin on the page -- in heap_page_prune() and save it in
PruneResult.frz_conflict_horizon.
---
src/backend/access/heap/pruneheap.c | 122 +++++++++++++++++++++++++--
src/backend/access/heap/vacuumlazy.c | 116 +++++++------------------
src/include/access/heapam.h | 3 +
3 files changed, 146 insertions(+), 95 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 4600ee53751..b3a7ce06699 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -65,8 +65,10 @@ static int heap_prune_chain(Buffer buffer,
static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
static void heap_prune_record_redirect(PruneState *prstate,
OffsetNumber offnum, OffsetNumber rdoffnum);
-static void heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum);
-static void heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum);
+static void heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
+ PruneResult *presult);
+static void heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
+ PruneResult *presult);
static void heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum);
static void page_verify_redirects(Page page);
@@ -249,6 +251,14 @@ heap_page_prune(Relation relation, Buffer buffer,
presult->ndeleted = 0;
presult->nnewlpdead = 0;
+ /*
+ * Keep track of whether or not the page is all_visible in case the caller
+ * wants to use this information to update the VM.
+ */
+ presult->all_visible = true;
+ /* for recovery conflicts */
+ presult->frz_conflict_horizon = InvalidTransactionId;
+
maxoff = PageGetMaxOffsetNumber(page);
tup.t_tableOid = RelationGetRelid(prstate.rel);
@@ -300,8 +310,92 @@ heap_page_prune(Relation relation, Buffer buffer,
presult->htsv[offnum] = heap_prune_satisfies_vacuum(&prstate, &tup,
buffer);
+ switch (presult->htsv[offnum])
+ {
+ case HEAPTUPLE_DEAD:
+
+ /*
+ * Deliberately delay unsetting all_visible until later during
+ * pruning. Removable dead tuples shouldn't preclude freezing
+ * the page. After finishing this first pass of tuple
+ * visibility checks, initialize all_visible_except_removable
+ * with the current value of all_visible to indicate whether
+ * or not the page is all visible except for dead tuples. This
+ * will allow us to attempt to freeze the page after pruning.
+ * Later during pruning, if we encounter an LP_DEAD item or
+ * are setting an item LP_DEAD, we will unset all_visible. As
+ * long as we unset it before updating the visibility map,
+ * this will be correct.
+ */
+ break;
+ case HEAPTUPLE_LIVE:
+
+ /*
+ * Is the tuple definitely visible to all transactions?
+ *
+ * NB: Like with per-tuple hint bits, we can't set the
+ * PD_ALL_VISIBLE flag if the inserter committed
+ * asynchronously. See SetHintBits for more info. Check that
+ * the tuple is hinted xmin-committed because of that.
+ */
+ if (presult->all_visible)
+ {
+ TransactionId xmin;
+
+ if (!HeapTupleHeaderXminCommitted(htup))
+ {
+ presult->all_visible = false;
+ break;
+ }
+
+ /*
+ * The inserter definitely committed. But is it old enough
+ * that everyone sees it as committed?
+ */
+ xmin = HeapTupleHeaderGetXmin(htup);
+ if (!GlobalVisTestIsRemovableXid(vistest, xmin))
+ {
+ presult->all_visible = false;
+ break;
+ }
+
+ /* Track newest xmin on page. */
+ if (TransactionIdFollows(xmin, presult->frz_conflict_horizon) &&
+ TransactionIdIsNormal(xmin))
+ presult->frz_conflict_horizon = xmin;
+ }
+ break;
+ case HEAPTUPLE_RECENTLY_DEAD:
+ presult->all_visible = false;
+ break;
+ case HEAPTUPLE_INSERT_IN_PROGRESS:
+ presult->all_visible = false;
+ break;
+ case HEAPTUPLE_DELETE_IN_PROGRESS:
+ /* This is an expected case during concurrent vacuum */
+ presult->all_visible = false;
+ break;
+ default:
+ elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
+ break;
+ }
}
+ /*
+ * For vacuum, if the whole page will become frozen, we consider
+ * opportunistically freezing tuples. Dead tuples which will be removed by
+ * the end of vacuuming should not preclude us from opportunistically
+ * freezing. We will not be able to freeze the whole page if there are
+ * tuples present which are not visible to everyone or if there are dead
+ * tuples which are not yet removable. We need all_visible to be false if
+ * LP_DEAD tuples remain after pruning so that we do not incorrectly
+ * update the visibility map or page hint bit. So, we will update
+ * presult->all_visible to reflect the presence of LP_DEAD items while
+ * pruning and keep all_visible_except_removable to permit freezing if the
+ * whole page will eventually become all visible after removing tuples.
+ */
+ presult->all_visible_except_removable = presult->all_visible;
+
/* Scan the page */
for (offnum = FirstOffsetNumber;
offnum <= maxoff;
@@ -596,10 +690,14 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
/*
* If the caller set mark_unused_now true, we can set dead line
* pointers LP_UNUSED now. We don't increment ndeleted here since
- * the LP was already marked dead.
+ * the LP was already marked dead. If it will not be marked
+ * LP_UNUSED, it will remain LP_DEAD, making the page not
+ * all_visible.
*/
if (unlikely(prstate->mark_unused_now))
heap_prune_record_unused(prstate, offnum);
+ else
+ presult->all_visible = false;
break;
}
@@ -736,7 +834,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
* redirect the root to the correct chain member.
*/
if (i >= nchain)
- heap_prune_record_dead_or_unused(prstate, rootoffnum);
+ heap_prune_record_dead_or_unused(prstate, rootoffnum, presult);
else
heap_prune_record_redirect(prstate, rootoffnum, chainitems[i]);
}
@@ -749,7 +847,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
* redirect item. We can clean up by setting the redirect item to
* DEAD state or LP_UNUSED if the caller indicated.
*/
- heap_prune_record_dead_or_unused(prstate, rootoffnum);
+ heap_prune_record_dead_or_unused(prstate, rootoffnum, presult);
}
return ndeleted;
@@ -786,13 +884,20 @@ heap_prune_record_redirect(PruneState *prstate,
/* Record line pointer to be marked dead */
static void
-heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum)
+heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
+ PruneResult *presult)
{
Assert(prstate->ndead < MaxHeapTuplesPerPage);
prstate->nowdead[prstate->ndead] = offnum;
prstate->ndead++;
Assert(!prstate->marked[offnum]);
prstate->marked[offnum] = true;
+
+ /*
+ * Setting the line pointer LP_DEAD means the page will definitely not be
+ * all_visible.
+ */
+ presult->all_visible = false;
}
/*
@@ -802,7 +907,8 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum)
* pointers LP_DEAD if mark_unused_now is true.
*/
static void
-heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum)
+heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
+ PruneResult *presult)
{
/*
* If the caller set mark_unused_now to true, we can remove dead tuples
@@ -813,7 +919,7 @@ heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum)
if (unlikely(prstate->mark_unused_now))
heap_prune_record_unused(prstate, offnum);
else
- heap_prune_record_dead(prstate, offnum);
+ heap_prune_record_dead(prstate, offnum, presult);
}
/* Record line pointer to be marked unused */
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index d1efd885c88..f9892f4cd08 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1422,9 +1422,7 @@ lazy_scan_prune(LVRelState *vacrel,
recently_dead_tuples;
HeapPageFreeze pagefrz;
bool hastup = false;
- bool all_visible,
- all_frozen;
- TransactionId visibility_cutoff_xid;
+ bool all_frozen;
int64 fpi_before = pgWalUsage.wal_fpi;
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
@@ -1465,17 +1463,16 @@ lazy_scan_prune(LVRelState *vacrel,
&presult, &vacrel->offnum);
/*
- * We will update the VM after collecting LP_DEAD items and freezing
- * tuples. Keep track of whether or not the page is all_visible and
- * all_frozen and use this information to update the VM. all_visible
- * implies 0 lpdead_items, but don't trust all_frozen result unless
- * all_visible is also set to true.
+ * Now scan the page to collect LP_DEAD items and check for tuples
+ * requiring freezing among remaining tuples with storage. We will update
+ * the VM after collecting LP_DEAD items and freezing tuples. Pruning will
+ * have determined whether or not the page is all_visible. Keep track of
+ * whether or not the page is all_frozen and use this information to
+ * update the VM. all_visible implies lpdead_items == 0, but don't trust
+ * all_frozen result unless all_visible is also set to true.
*
- * Also keep track of the visibility cutoff xid for recovery conflicts.
*/
- all_visible = true;
all_frozen = true;
- visibility_cutoff_xid = InvalidTransactionId;
/*
* Now scan the page to collect LP_DEAD items and update the variables set
@@ -1516,11 +1513,6 @@ lazy_scan_prune(LVRelState *vacrel,
* will only happen every other VACUUM, at most. Besides, VACUUM
* must treat hastup/nonempty_pages as provisional no matter how
* LP_DEAD items are handled (handled here, or handled later on).
- *
- * Also deliberately delay unsetting all_visible until just before
- * we return to lazy_scan_heap caller, as explained in full below.
- * (This is another case where it's useful to anticipate that any
- * LP_DEAD items will become LP_UNUSED during the ongoing VACUUM.)
*/
deadoffsets[lpdead_items++] = offnum;
continue;
@@ -1558,41 +1550,6 @@ lazy_scan_prune(LVRelState *vacrel,
* what acquire_sample_rows() does.
*/
live_tuples++;
-
- /*
- * Is the tuple definitely visible to all transactions?
- *
- * NB: Like with per-tuple hint bits, we can't set the
- * PD_ALL_VISIBLE flag if the inserter committed
- * asynchronously. See SetHintBits for more info. Check that
- * the tuple is hinted xmin-committed because of that.
- */
- if (all_visible)
- {
- TransactionId xmin;
-
- if (!HeapTupleHeaderXminCommitted(htup))
- {
- all_visible = false;
- break;
- }
-
- /*
- * The inserter definitely committed. But is it old enough
- * that everyone sees it as committed?
- */
- xmin = HeapTupleHeaderGetXmin(htup);
- if (!GlobalVisTestIsRemovableXid(vacrel->vistest, xmin))
- {
- all_visible = false;
- break;
- }
-
- /* Track newest xmin on page. */
- if (TransactionIdFollows(xmin, visibility_cutoff_xid) &&
- TransactionIdIsNormal(xmin))
- visibility_cutoff_xid = xmin;
- }
break;
case HEAPTUPLE_RECENTLY_DEAD:
@@ -1602,7 +1559,6 @@ lazy_scan_prune(LVRelState *vacrel,
* pruning.)
*/
recently_dead_tuples++;
- all_visible = false;
break;
case HEAPTUPLE_INSERT_IN_PROGRESS:
@@ -1613,16 +1569,13 @@ lazy_scan_prune(LVRelState *vacrel,
* results. This assumption is a bit shaky, but it is what
* acquire_sample_rows() does, so be consistent.
*/
- all_visible = false;
break;
case HEAPTUPLE_DELETE_IN_PROGRESS:
- /* This is an expected case during concurrent vacuum */
- all_visible = false;
/*
- * Count such rows as live. As above, we assume the deleting
- * transaction will commit and update the counters after we
- * report.
+ * This an expected case during concurrent vacuum. Count such
+ * rows as live. As above, we assume the deleting transaction
+ * will commit and update the counters after we report.
*/
live_tuples++;
break;
@@ -1665,7 +1618,7 @@ lazy_scan_prune(LVRelState *vacrel,
* page all-frozen afterwards (might not happen until final heap pass).
*/
if (pagefrz.freeze_required || tuples_frozen == 0 ||
- (all_visible && all_frozen &&
+ (presult.all_visible_except_removable && all_frozen &&
fpi_before != pgWalUsage.wal_fpi))
{
/*
@@ -1698,16 +1651,16 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->frozen_pages++;
/*
- * We can use visibility_cutoff_xid as our cutoff for conflicts
+ * We can use frz_conflict_horizon as our cutoff for conflicts
* when the whole page is eligible to become all-frozen in the VM
* once we're done with it. Otherwise we generate a conservative
* cutoff by stepping back from OldestXmin.
*/
- if (all_visible && all_frozen)
+ if (presult.all_visible_except_removable && all_frozen)
{
/* Using same cutoff when setting VM is now unnecessary */
- snapshotConflictHorizon = visibility_cutoff_xid;
- visibility_cutoff_xid = InvalidTransactionId;
+ snapshotConflictHorizon = presult.frz_conflict_horizon;
+ presult.frz_conflict_horizon = InvalidTransactionId;
}
else
{
@@ -1743,17 +1696,19 @@ lazy_scan_prune(LVRelState *vacrel,
*/
#ifdef USE_ASSERT_CHECKING
/* Note that all_frozen value does not matter when !all_visible */
- if (all_visible && lpdead_items == 0)
+ if (presult.all_visible)
{
TransactionId debug_cutoff;
bool debug_all_frozen;
+ Assert(lpdead_items == 0);
+
if (!heap_page_is_all_visible(vacrel, buf,
&debug_cutoff, &debug_all_frozen))
Assert(false);
Assert(!TransactionIdIsValid(debug_cutoff) ||
- debug_cutoff == visibility_cutoff_xid);
+ debug_cutoff == presult.frz_conflict_horizon);
}
#endif
@@ -1778,19 +1733,6 @@ lazy_scan_prune(LVRelState *vacrel,
Assert(dead_items->num_items <= dead_items->max_items);
pgstat_progress_update_param(PROGRESS_VACUUM_NUM_DEAD_TUPLES,
dead_items->num_items);
-
- /*
- * It was convenient to ignore LP_DEAD items in all_visible earlier on
- * to make the choice of whether or not to freeze the page unaffected
- * by the short-term presence of LP_DEAD items. These LP_DEAD items
- * were effectively assumed to be LP_UNUSED items in the making. It
- * doesn't matter which heap pass (initial pass or final pass) ends up
- * setting the page all-frozen, as long as the ongoing VACUUM does it.
- *
- * Now that freezing has been finalized, unset all_visible. It needs
- * to reflect the present state of things, as expected by our caller.
- */
- all_visible = false;
}
/* Finally, add page-local counts to whole-VACUUM counts */
@@ -1807,20 +1749,20 @@ lazy_scan_prune(LVRelState *vacrel,
/* Did we find LP_DEAD items? */
*has_lpdead_items = (lpdead_items > 0);
- Assert(!all_visible || !(*has_lpdead_items));
+ Assert(!presult.all_visible || !(*has_lpdead_items));
/*
* Handle setting visibility map bit based on information from the VM (as
* of last heap_vac_scan_next_block() call), and from all_visible and
* all_frozen variables
*/
- if (!all_visible_according_to_vm && all_visible)
+ if (!all_visible_according_to_vm && presult.all_visible)
{
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
if (all_frozen)
{
- Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+ Assert(!TransactionIdIsValid(presult.frz_conflict_horizon));
flags |= VISIBILITYMAP_ALL_FROZEN;
}
@@ -1840,7 +1782,7 @@ lazy_scan_prune(LVRelState *vacrel,
PageSetAllVisible(page);
MarkBufferDirty(buf);
visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
- vmbuffer, visibility_cutoff_xid,
+ vmbuffer, presult.frz_conflict_horizon,
flags);
}
@@ -1888,7 +1830,7 @@ lazy_scan_prune(LVRelState *vacrel,
* it as all-frozen. Note that all_frozen is only valid if all_visible is
* true, so we must check both all_visible and all_frozen.
*/
- else if (all_visible_according_to_vm && all_visible &&
+ else if (all_visible_according_to_vm && presult.all_visible &&
all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
{
/*
@@ -1905,11 +1847,11 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* Set the page all-frozen (and all-visible) in the VM.
*
- * We can pass InvalidTransactionId as our visibility_cutoff_xid,
- * since a snapshotConflictHorizon sufficient to make everything safe
- * for REDO was logged when the page's tuples were frozen.
+ * We can pass InvalidTransactionId as our frz_conflict_horizon, since
+ * a snapshotConflictHorizon sufficient to make everything safe for
+ * REDO was logged when the page's tuples were frozen.
*/
- Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+ Assert(!TransactionIdIsValid(presult.frz_conflict_horizon));
visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
vmbuffer, InvalidTransactionId,
VISIBILITYMAP_ALL_VISIBLE |
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 4b133f68593..4cfaf9ea46c 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -198,6 +198,8 @@ typedef struct PruneResult
{
int ndeleted; /* Number of tuples deleted from the page */
int nnewlpdead; /* Number of newly LP_DEAD items */
+ bool all_visible; /* Whether or not the page is all visible */
+ TransactionId frz_conflict_horizon; /* Newest xmin on the page */
/*
* Tuple visibility is only computed once for each tuple, for correctness
@@ -209,6 +211,7 @@ typedef struct PruneResult
* 1. Otherwise every access would need to subtract 1.
*/
int8 htsv[MaxHeapTuplesPerPage + 1];
+ bool all_visible_except_removable;
} PruneResult;
/*
--
2.40.1
Attachment: v2-0004-Add-reference-to-VacuumCutoffs-in-HeapPageFreeze.patch (text/x-diff; charset=us-ascii)
From 2f2aac4054db86b02d780801e94cf48897ee2280 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 6 Jan 2024 16:22:17 -0500
Subject: [PATCH v2 04/17] Add reference to VacuumCutoffs in HeapPageFreeze
Future commits will move opportunistic freezing into the main path of
pruning in heap_page_prune(). Because on-access pruning will not do
opportunistic freezing, it is cleaner to keep the visibility information
required for calling heap_prepare_freeze_tuple() inside the
HeapPageFreeze structure itself, by saving a reference to VacuumCutoffs.
---
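To make the shape of this change concrete, here is a minimal, standalone
sketch of the idea: the page-level freeze state carries a pointer to the
vacuum cutoffs, so per-tuple helpers only need the page-freeze argument.
The types, fields, and cutoff values below are simplified stand-ins rather
than the real PostgreSQL definitions, and the plain "<" comparisons stand
in for the wraparound-aware TransactionIdPrecedes() checks.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint32_t TransactionId;

typedef struct VacuumCutoffs
{
    TransactionId OldestXmin;   /* XIDs older than this may be frozen */
    TransactionId FreezeLimit;  /* XIDs older than this must be frozen */
} VacuumCutoffs;

typedef struct HeapPageFreeze
{
    bool           freeze_required;
    VacuumCutoffs *cutoffs;     /* reference saved once per page */
} HeapPageFreeze;

/* Per-tuple helper: only needs pagefrz, which carries the cutoffs. */
static bool
prepare_freeze_tuple(TransactionId xmin, HeapPageFreeze *pagefrz)
{
    if (xmin < pagefrz->cutoffs->FreezeLimit)
        pagefrz->freeze_required = true;

    return xmin < pagefrz->cutoffs->OldestXmin;
}

int
main(void)
{
    VacuumCutoffs  cutoffs = {.OldestXmin = 1000, .FreezeLimit = 500};
    HeapPageFreeze pagefrz = {.freeze_required = false, .cutoffs = &cutoffs};
    bool           freezable = prepare_freeze_tuple(400, &pagefrz);

    printf("freezable: %d, freeze required: %d\n",
           freezable, (int) pagefrz.freeze_required);
    return 0;
}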
src/backend/access/heap/heapam.c | 67 ++++++++++++++--------------
src/backend/access/heap/vacuumlazy.c | 3 +-
src/include/access/heapam.h | 2 +-
3 files changed, 36 insertions(+), 36 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 34bc60f625f..7261c4988d7 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6023,7 +6023,7 @@ heap_inplace_update(Relation relation, HeapTuple tuple)
*/
static TransactionId
FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
- const struct VacuumCutoffs *cutoffs, uint16 *flags,
+ uint16 *flags,
HeapPageFreeze *pagefrz)
{
TransactionId newxmax;
@@ -6049,12 +6049,12 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
pagefrz->freeze_required = true;
return InvalidTransactionId;
}
- else if (MultiXactIdPrecedes(multi, cutoffs->relminmxid))
+ else if (MultiXactIdPrecedes(multi, pagefrz->cutoffs->relminmxid))
ereport(ERROR,
(errcode(ERRCODE_DATA_CORRUPTED),
errmsg_internal("found multixact %u from before relminmxid %u",
- multi, cutoffs->relminmxid)));
- else if (MultiXactIdPrecedes(multi, cutoffs->OldestMxact))
+ multi, pagefrz->cutoffs->relminmxid)));
+ else if (MultiXactIdPrecedes(multi, pagefrz->cutoffs->OldestMxact))
{
TransactionId update_xact;
@@ -6069,7 +6069,7 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
ereport(ERROR,
(errcode(ERRCODE_DATA_CORRUPTED),
errmsg_internal("multixact %u from before multi freeze cutoff %u found to be still running",
- multi, cutoffs->OldestMxact)));
+ multi, pagefrz->cutoffs->OldestMxact)));
if (HEAP_XMAX_IS_LOCKED_ONLY(t_infomask))
{
@@ -6080,13 +6080,13 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
/* replace multi with single XID for its updater? */
update_xact = MultiXactIdGetUpdateXid(multi, t_infomask);
- if (TransactionIdPrecedes(update_xact, cutoffs->relfrozenxid))
+ if (TransactionIdPrecedes(update_xact, pagefrz->cutoffs->relfrozenxid))
ereport(ERROR,
(errcode(ERRCODE_DATA_CORRUPTED),
errmsg_internal("multixact %u contains update XID %u from before relfrozenxid %u",
multi, update_xact,
- cutoffs->relfrozenxid)));
- else if (TransactionIdPrecedes(update_xact, cutoffs->OldestXmin))
+ pagefrz->cutoffs->relfrozenxid)));
+ else if (TransactionIdPrecedes(update_xact, pagefrz->cutoffs->OldestXmin))
{
/*
* Updater XID has to have aborted (otherwise the tuple would have
@@ -6098,7 +6098,7 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
(errcode(ERRCODE_DATA_CORRUPTED),
errmsg_internal("multixact %u contains committed update XID %u from before removable cutoff %u",
multi, update_xact,
- cutoffs->OldestXmin)));
+ pagefrz->cutoffs->OldestXmin)));
*flags |= FRM_INVALIDATE_XMAX;
pagefrz->freeze_required = true;
return InvalidTransactionId;
@@ -6150,9 +6150,9 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
{
TransactionId xid = members[i].xid;
- Assert(!TransactionIdPrecedes(xid, cutoffs->relfrozenxid));
+ Assert(!TransactionIdPrecedes(xid, pagefrz->cutoffs->relfrozenxid));
- if (TransactionIdPrecedes(xid, cutoffs->FreezeLimit))
+ if (TransactionIdPrecedes(xid, pagefrz->cutoffs->FreezeLimit))
{
/* Can't violate the FreezeLimit postcondition */
need_replace = true;
@@ -6164,7 +6164,7 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
/* Can't violate the MultiXactCutoff postcondition, either */
if (!need_replace)
- need_replace = MultiXactIdPrecedes(multi, cutoffs->MultiXactCutoff);
+ need_replace = MultiXactIdPrecedes(multi, pagefrz->cutoffs->MultiXactCutoff);
if (!need_replace)
{
@@ -6203,7 +6203,7 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
TransactionId xid = members[i].xid;
MultiXactStatus mstatus = members[i].status;
- Assert(!TransactionIdPrecedes(xid, cutoffs->relfrozenxid));
+ Assert(!TransactionIdPrecedes(xid, pagefrz->cutoffs->relfrozenxid));
if (!ISUPDATE_from_mxstatus(mstatus))
{
@@ -6214,12 +6214,12 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
if (TransactionIdIsCurrentTransactionId(xid) ||
TransactionIdIsInProgress(xid))
{
- if (TransactionIdPrecedes(xid, cutoffs->OldestXmin))
+ if (TransactionIdPrecedes(xid, pagefrz->cutoffs->OldestXmin))
ereport(ERROR,
(errcode(ERRCODE_DATA_CORRUPTED),
errmsg_internal("multixact %u contains running locker XID %u from before removable cutoff %u",
multi, xid,
- cutoffs->OldestXmin)));
+ pagefrz->cutoffs->OldestXmin)));
newmembers[nnewmembers++] = members[i];
has_lockers = true;
}
@@ -6277,11 +6277,11 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
* We determined that updater must be kept -- add it to pending new
* members list
*/
- if (TransactionIdPrecedes(xid, cutoffs->OldestXmin))
+ if (TransactionIdPrecedes(xid, pagefrz->cutoffs->OldestXmin))
ereport(ERROR,
(errcode(ERRCODE_DATA_CORRUPTED),
errmsg_internal("multixact %u contains committed update XID %u from before removable cutoff %u",
- multi, xid, cutoffs->OldestXmin)));
+ multi, xid, pagefrz->cutoffs->OldestXmin)));
newmembers[nnewmembers++] = members[i];
}
@@ -6373,7 +6373,6 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
*/
bool
heap_prepare_freeze_tuple(HeapTupleHeader tuple,
- const struct VacuumCutoffs *cutoffs,
HeapPageFreeze *pagefrz,
HeapTupleFreeze *frz, bool *totally_frozen)
{
@@ -6401,14 +6400,14 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
xmin_already_frozen = true;
else
{
- if (TransactionIdPrecedes(xid, cutoffs->relfrozenxid))
+ if (TransactionIdPrecedes(xid, pagefrz->cutoffs->relfrozenxid))
ereport(ERROR,
(errcode(ERRCODE_DATA_CORRUPTED),
errmsg_internal("found xmin %u from before relfrozenxid %u",
- xid, cutoffs->relfrozenxid)));
+ xid, pagefrz->cutoffs->relfrozenxid)));
/* Will set freeze_xmin flags in freeze plan below */
- freeze_xmin = TransactionIdPrecedes(xid, cutoffs->OldestXmin);
+ freeze_xmin = TransactionIdPrecedes(xid, pagefrz->cutoffs->OldestXmin);
/* Verify that xmin committed if and when freeze plan is executed */
if (freeze_xmin)
@@ -6422,8 +6421,8 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
xid = HeapTupleHeaderGetXvac(tuple);
if (TransactionIdIsNormal(xid))
{
- Assert(TransactionIdPrecedesOrEquals(cutoffs->relfrozenxid, xid));
- Assert(TransactionIdPrecedes(xid, cutoffs->OldestXmin));
+ Assert(TransactionIdPrecedesOrEquals(pagefrz->cutoffs->relfrozenxid, xid));
+ Assert(TransactionIdPrecedes(xid, pagefrz->cutoffs->OldestXmin));
/*
* For Xvac, we always freeze proactively. This allows totally_frozen
@@ -6448,8 +6447,7 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
* perform no-op xmax processing. The only constraint is that the
* FreezeLimit/MultiXactCutoff postcondition must never be violated.
*/
- newxmax = FreezeMultiXactId(xid, tuple->t_infomask, cutoffs,
- &flags, pagefrz);
+ newxmax = FreezeMultiXactId(xid, tuple->t_infomask, &flags, pagefrz);
if (flags & FRM_NOOP)
{
@@ -6472,7 +6470,7 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
* (This repeats work from FreezeMultiXactId, but allows "no
* freeze" tracker maintenance to happen in only one place.)
*/
- Assert(!MultiXactIdPrecedes(newxmax, cutoffs->MultiXactCutoff));
+ Assert(!MultiXactIdPrecedes(newxmax, pagefrz->cutoffs->MultiXactCutoff));
Assert(MultiXactIdIsValid(newxmax) && xid == newxmax);
}
else if (flags & FRM_RETURN_IS_XID)
@@ -6481,7 +6479,7 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
* xmax will become an updater Xid (original MultiXact's updater
* member Xid will be carried forward as a simple Xid in Xmax).
*/
- Assert(!TransactionIdPrecedes(newxmax, cutoffs->OldestXmin));
+ Assert(!TransactionIdPrecedes(newxmax, pagefrz->cutoffs->OldestXmin));
/*
* NB -- some of these transformations are only valid because we
@@ -6505,7 +6503,7 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
* xmax is an old MultiXactId that we have to replace with a new
* MultiXactId, to carry forward two or more original member XIDs.
*/
- Assert(!MultiXactIdPrecedes(newxmax, cutoffs->OldestMxact));
+ Assert(!MultiXactIdPrecedes(newxmax, pagefrz->cutoffs->OldestMxact));
/*
* We can't use GetMultiXactIdHintBits directly on the new multi
@@ -6540,14 +6538,14 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
else if (TransactionIdIsNormal(xid))
{
/* Raw xmax is normal XID */
- if (TransactionIdPrecedes(xid, cutoffs->relfrozenxid))
+ if (TransactionIdPrecedes(xid, pagefrz->cutoffs->relfrozenxid))
ereport(ERROR,
(errcode(ERRCODE_DATA_CORRUPTED),
errmsg_internal("found xmax %u from before relfrozenxid %u",
- xid, cutoffs->relfrozenxid)));
+ xid, pagefrz->cutoffs->relfrozenxid)));
/* Will set freeze_xmax flags in freeze plan below */
- freeze_xmax = TransactionIdPrecedes(xid, cutoffs->OldestXmin);
+ freeze_xmax = TransactionIdPrecedes(xid, pagefrz->cutoffs->OldestXmin);
/*
* Verify that xmax aborted if and when freeze plan is executed,
@@ -6627,7 +6625,7 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
* Does this tuple force caller to freeze the entire page?
*/
pagefrz->freeze_required =
- heap_tuple_should_freeze(tuple, cutoffs,
+ heap_tuple_should_freeze(tuple, pagefrz->cutoffs,
&pagefrz->NoFreezePageRelfrozenXid,
&pagefrz->NoFreezePageRelminMxid);
}
@@ -6949,8 +6947,9 @@ heap_freeze_tuple(HeapTupleHeader tuple,
pagefrz.NoFreezePageRelfrozenXid = FreezeLimit;
pagefrz.NoFreezePageRelminMxid = MultiXactCutoff;
- do_freeze = heap_prepare_freeze_tuple(tuple, &cutoffs,
- &pagefrz, &frz, &totally_frozen);
+ pagefrz.cutoffs = &cutoffs;
+
+ do_freeze = heap_prepare_freeze_tuple(tuple, &pagefrz, &frz, &totally_frozen);
/*
* Note that because this is not a WAL-logged operation, we don't need to
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index f9892f4cd08..06e0e841582 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1442,6 +1442,7 @@ lazy_scan_prune(LVRelState *vacrel,
pagefrz.FreezePageRelminMxid = vacrel->NewRelminMxid;
pagefrz.NoFreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
pagefrz.NoFreezePageRelminMxid = vacrel->NewRelminMxid;
+ pagefrz.cutoffs = &vacrel->cutoffs;
tuples_frozen = 0;
lpdead_items = 0;
live_tuples = 0;
@@ -1587,7 +1588,7 @@ lazy_scan_prune(LVRelState *vacrel,
hastup = true; /* page makes rel truncation unsafe */
/* Tuple with storage -- consider need to freeze */
- if (heap_prepare_freeze_tuple(htup, &vacrel->cutoffs, &pagefrz,
+ if (heap_prepare_freeze_tuple(htup, &pagefrz,
&frozen[tuples_frozen], &totally_frozen))
{
/* Save prepared freeze plan for later */
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 4cfaf9ea46c..6823ab8b658 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -189,6 +189,7 @@ typedef struct HeapPageFreeze
TransactionId NoFreezePageRelfrozenXid;
MultiXactId NoFreezePageRelminMxid;
+ struct VacuumCutoffs *cutoffs;
} HeapPageFreeze;
/*
@@ -295,7 +296,6 @@ extern TM_Result heap_lock_tuple(Relation relation, HeapTuple tuple,
extern void heap_inplace_update(Relation relation, HeapTuple tuple);
extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
- const struct VacuumCutoffs *cutoffs,
HeapPageFreeze *pagefrz,
HeapTupleFreeze *frz, bool *totally_frozen);
extern void heap_freeze_execute_prepared(Relation rel, Buffer buffer,
--
2.40.1
Attachment: v2-0005-Prepare-freeze-tuples-in-heap_page_prune.patch (text/x-diff; charset=us-ascii)
From 6f1fbc5b4833fb5041fa289b2644378f40904248 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sun, 7 Jan 2024 11:18:52 -0500
Subject: [PATCH v2 05/17] Prepare freeze tuples in heap_page_prune()
In order to combine the freeze and prune records, we must determine
which tuples are freezable before actually executing pruning. All of the
page modifications should be made in the same critical section that
emits the combined WAL record. So, during pruning, determine whether
each tuple should or must be frozen and whether the page will be
all-frozen as a consequence.
---
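Here is a rough, standalone illustration of the flow this patch sets up:
freeze plans are collected during the same per-offset scan that pruning
already performs, so the caller no longer needs a second pass over the
page. All types and names below are simplified stand-ins, and
prepare_freeze() is only a toy version of heap_prepare_freeze_tuple().

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_TUPLES 8

typedef uint32_t TransactionId;

typedef enum {SLOT_UNUSED, SLOT_NORMAL, SLOT_DEAD} SlotState;

typedef struct PruneFreezeResult
{
    int   nfrozen;              /* number of prepared freeze plans */
    int   frozen[MAX_TUPLES];   /* offsets with freeze plans */
    bool  all_frozen;           /* page could become all-frozen */
} PruneFreezeResult;

/* Stand-in for heap_prepare_freeze_tuple(): freeze xmins below the cutoff. */
static bool
prepare_freeze(TransactionId xmin, TransactionId cutoff, bool *totally_frozen)
{
    *totally_frozen = (xmin < cutoff);
    return xmin < cutoff;
}

int
main(void)
{
    SlotState     state[MAX_TUPLES] = {SLOT_NORMAL, SLOT_DEAD, SLOT_UNUSED, SLOT_NORMAL};
    TransactionId xmin[MAX_TUPLES] = {400, 900, 0, 1200};
    TransactionId cutoff = 1000;
    PruneFreezeResult presult = {.nfrozen = 0, .all_frozen = true};

    for (int off = 0; off < MAX_TUPLES; off++)
    {
        bool  totally_frozen;

        /* Tuples that pruning will remove are never considered for freezing. */
        if (state[off] != SLOT_NORMAL)
            continue;

        if (prepare_freeze(xmin[off], cutoff, &totally_frozen))
            presult.frozen[presult.nfrozen++] = off;

        /* A single unfreezable tuple means the page cannot be all-frozen. */
        if (!totally_frozen)
            presult.all_frozen = false;
    }

    printf("plans: %d, all_frozen: %d\n", presult.nfrozen, (int) presult.all_frozen);
    return 0;
}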
src/backend/access/heap/pruneheap.c | 78 ++++++++++++++++++++++++++--
src/backend/access/heap/vacuumlazy.c | 68 ++++++------------------
src/include/access/heapam.h | 13 +++++
3 files changed, 102 insertions(+), 57 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index b3a7ce06699..44a5c0a917b 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -62,6 +62,9 @@ static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
static int heap_prune_chain(Buffer buffer,
OffsetNumber rootoffnum,
PruneState *prstate, PruneResult *presult);
+
+static void prune_prepare_freeze_tuple(Page page, OffsetNumber offnum,
+ HeapPageFreeze *pagefrz, PruneResult *presult);
static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
static void heap_prune_record_redirect(PruneState *prstate,
OffsetNumber offnum, OffsetNumber rdoffnum);
@@ -155,7 +158,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* not the relation has indexes, since we cannot safely determine
* that during on-access pruning with the current implementation.
*/
- heap_page_prune(relation, buffer, vistest, false,
+ heap_page_prune(relation, buffer, vistest, false, NULL,
&presult, NULL);
/*
@@ -204,6 +207,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* mark_unused_now indicates whether or not dead items can be set LP_UNUSED during
* pruning.
*
+ * pagefrz contains both input and output parameters used if the caller is
+ * interested in potentially freezing tuples on the page.
+ *
* off_loc is the offset location required by the caller to use in error
* callback.
*
@@ -215,6 +221,7 @@ void
heap_page_prune(Relation relation, Buffer buffer,
GlobalVisState *vistest,
bool mark_unused_now,
+ HeapPageFreeze *pagefrz,
PruneResult *presult,
OffsetNumber *off_loc)
{
@@ -250,6 +257,7 @@ heap_page_prune(Relation relation, Buffer buffer,
*/
presult->ndeleted = 0;
presult->nnewlpdead = 0;
+ presult->nfrozen = 0;
/*
* Keep track of whether or not the page is all_visible in case the caller
@@ -396,6 +404,15 @@ heap_page_prune(Relation relation, Buffer buffer,
*/
presult->all_visible_except_removable = presult->all_visible;
+ /*
+ * We will update the VM after pruning, collecting LP_DEAD items, and
+ * freezing tuples. Keep track of whether or not the page is all_visible
+ * and all_frozen and use this information to update the VM. all_visible
+ * implies lpdead_items == 0, but don't trust all_frozen result unless
+ * all_visible is also set to true.
+ */
+ presult->all_frozen = true;
+
/* Scan the page */
for (offnum = FirstOffsetNumber;
offnum <= maxoff;
@@ -403,14 +420,18 @@ heap_page_prune(Relation relation, Buffer buffer,
{
ItemId itemid;
- /* Ignore items already processed as part of an earlier chain */
- if (prstate.marked[offnum])
- continue;
-
/* see preceding loop */
if (off_loc)
*off_loc = offnum;
+ if (pagefrz)
+ prune_prepare_freeze_tuple(page, offnum,
+ pagefrz, presult);
+
+ /* Ignore items already processed as part of an earlier chain */
+ if (prstate.marked[offnum])
+ continue;
+
/* Nothing to do if slot is empty */
itemid = PageGetItemId(page, offnum);
if (!ItemIdIsUsed(itemid))
@@ -853,6 +874,53 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
return ndeleted;
}
+/*
+ * While pruning, before actually executing pruning and updating the line
+ * pointers, we may consider freezing tuples referred to by LP_NORMAL line
+ * pointers whose visibility status is not HEAPTUPLE_DEAD. That is to say, we
+ * want to consider freezing normal tuples which will not be removed.
+ */
+static void
+prune_prepare_freeze_tuple(Page page, OffsetNumber offnum,
+ HeapPageFreeze *pagefrz,
+ PruneResult *presult)
+{
+ bool totally_frozen;
+ HeapTupleHeader htup;
+ ItemId itemid;
+
+ Assert(pagefrz);
+
+ itemid = PageGetItemId(page, offnum);
+
+ if (!ItemIdIsNormal(itemid))
+ return;
+
+ /* We do not consider freezing tuples which will be removed. */
+ if (presult->htsv[offnum] == HEAPTUPLE_DEAD ||
+ presult->htsv[offnum] == -1)
+ return;
+
+ htup = (HeapTupleHeader) PageGetItem(page, itemid);
+
+ /* Tuple with storage -- consider need to freeze */
+ if ((heap_prepare_freeze_tuple(htup, pagefrz,
+ &presult->frozen[presult->nfrozen],
+ &totally_frozen)))
+ {
+ /* Save prepared freeze plan for later */
+ presult->frozen[presult->nfrozen++].offset = offnum;
+ }
+
+ /*
+ * If any tuple isn't either totally frozen already or eligible to become
+ * totally frozen (according to its freeze plan), then the page definitely
+ * cannot be set all-frozen in the visibility map later on
+ */
+ if (!totally_frozen)
+ presult->all_frozen = false;
+}
+
/* Record lowest soon-prunable XID */
static void
heap_prune_record_prunable(PruneState *prstate, TransactionId xid)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 06e0e841582..4187c998d25 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1416,16 +1416,13 @@ lazy_scan_prune(LVRelState *vacrel,
maxoff;
ItemId itemid;
PruneResult presult;
- int tuples_frozen,
- lpdead_items,
+ int lpdead_items,
live_tuples,
recently_dead_tuples;
HeapPageFreeze pagefrz;
bool hastup = false;
- bool all_frozen;
int64 fpi_before = pgWalUsage.wal_fpi;
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
- HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
Assert(BufferGetBlockNumber(buf) == blkno);
@@ -1443,7 +1440,6 @@ lazy_scan_prune(LVRelState *vacrel,
pagefrz.NoFreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
pagefrz.NoFreezePageRelminMxid = vacrel->NewRelminMxid;
pagefrz.cutoffs = &vacrel->cutoffs;
- tuples_frozen = 0;
lpdead_items = 0;
live_tuples = 0;
recently_dead_tuples = 0;
@@ -1461,31 +1457,20 @@ lazy_scan_prune(LVRelState *vacrel,
* false otherwise.
*/
heap_page_prune(rel, buf, vacrel->vistest, vacrel->nindexes == 0,
- &presult, &vacrel->offnum);
+ &pagefrz, &presult, &vacrel->offnum);
/*
* Now scan the page to collect LP_DEAD items and check for tuples
* requiring freezing among remaining tuples with storage. We will update
* the VM after collecting LP_DEAD items and freezing tuples. Pruning will
- * have determined whether or not the page is all_visible. Keep track of
- * whether or not the page is all_frozen and use this information to
- * update the VM. all_visible implies lpdead_items == 0, but don't trust
- * all_frozen result unless all_visible is also set to true.
+ * have determined whether or not the page is all_visible and able to
+ * become all_frozen.
*
*/
- all_frozen = true;
-
- /*
- * Now scan the page to collect LP_DEAD items and update the variables set
- * just above.
- */
for (offnum = FirstOffsetNumber;
offnum <= maxoff;
offnum = OffsetNumberNext(offnum))
{
- HeapTupleHeader htup;
- bool totally_frozen;
-
/*
* Set the offset number so that we can display it along with any
* error that occurred while processing this tuple.
@@ -1521,8 +1506,6 @@ lazy_scan_prune(LVRelState *vacrel,
Assert(ItemIdIsNormal(itemid));
- htup = (HeapTupleHeader) PageGetItem(page, itemid);
-
/*
* The criteria for counting a tuple as live in this block need to
* match what analyze.c's acquire_sample_rows() does, otherwise VACUUM
@@ -1587,29 +1570,8 @@ lazy_scan_prune(LVRelState *vacrel,
hastup = true; /* page makes rel truncation unsafe */
- /* Tuple with storage -- consider need to freeze */
- if (heap_prepare_freeze_tuple(htup, &pagefrz,
- &frozen[tuples_frozen], &totally_frozen))
- {
- /* Save prepared freeze plan for later */
- frozen[tuples_frozen++].offset = offnum;
- }
-
- /*
- * If any tuple isn't either totally frozen already or eligible to
- * become totally frozen (according to its freeze plan), then the page
- * definitely cannot be set all-frozen in the visibility map later on
- */
- if (!totally_frozen)
- all_frozen = false;
}
- /*
- * We have now divided every item on the page into either an LP_DEAD item
- * that will need to be vacuumed in indexes later, or a LP_NORMAL tuple
- * that remains and needs to be considered for freezing now (LP_UNUSED and
- * LP_REDIRECT items also remain, but are of no further interest to us).
- */
vacrel->offnum = InvalidOffsetNumber;
/*
@@ -1618,8 +1580,8 @@ lazy_scan_prune(LVRelState *vacrel,
* freeze when pruning generated an FPI, if doing so means that we set the
* page all-frozen afterwards (might not happen until final heap pass).
*/
- if (pagefrz.freeze_required || tuples_frozen == 0 ||
- (presult.all_visible_except_removable && all_frozen &&
+ if (pagefrz.freeze_required || presult.nfrozen == 0 ||
+ (presult.all_visible_except_removable && presult.all_frozen &&
fpi_before != pgWalUsage.wal_fpi))
{
/*
@@ -1629,7 +1591,7 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->NewRelfrozenXid = pagefrz.FreezePageRelfrozenXid;
vacrel->NewRelminMxid = pagefrz.FreezePageRelminMxid;
- if (tuples_frozen == 0)
+ if (presult.nfrozen == 0)
{
/*
* We have no freeze plans to execute, so there's no added cost
@@ -1657,7 +1619,7 @@ lazy_scan_prune(LVRelState *vacrel,
* once we're done with it. Otherwise we generate a conservative
* cutoff by stepping back from OldestXmin.
*/
- if (presult.all_visible_except_removable && all_frozen)
+ if (presult.all_visible_except_removable && presult.all_frozen)
{
/* Using same cutoff when setting VM is now unnecessary */
snapshotConflictHorizon = presult.frz_conflict_horizon;
@@ -1673,7 +1635,7 @@ lazy_scan_prune(LVRelState *vacrel,
/* Execute all freeze plans for page as a single atomic action */
heap_freeze_execute_prepared(vacrel->rel, buf,
snapshotConflictHorizon,
- frozen, tuples_frozen);
+ presult.frozen, presult.nfrozen);
}
}
else
@@ -1684,8 +1646,8 @@ lazy_scan_prune(LVRelState *vacrel,
*/
vacrel->NewRelfrozenXid = pagefrz.NoFreezePageRelfrozenXid;
vacrel->NewRelminMxid = pagefrz.NoFreezePageRelminMxid;
- all_frozen = false;
- tuples_frozen = 0; /* avoid miscounts in instrumentation */
+ presult.all_frozen = false;
+ presult.nfrozen = 0; /* avoid miscounts in instrumentation */
}
/*
@@ -1708,6 +1670,8 @@ lazy_scan_prune(LVRelState *vacrel,
&debug_cutoff, &debug_all_frozen))
Assert(false);
+ Assert(presult.all_frozen == debug_all_frozen);
+
Assert(!TransactionIdIsValid(debug_cutoff) ||
debug_cutoff == presult.frz_conflict_horizon);
}
@@ -1738,7 +1702,7 @@ lazy_scan_prune(LVRelState *vacrel,
/* Finally, add page-local counts to whole-VACUUM counts */
vacrel->tuples_deleted += presult.ndeleted;
- vacrel->tuples_frozen += tuples_frozen;
+ vacrel->tuples_frozen += presult.nfrozen;
vacrel->lpdead_items += lpdead_items;
vacrel->live_tuples += live_tuples;
vacrel->recently_dead_tuples += recently_dead_tuples;
@@ -1761,7 +1725,7 @@ lazy_scan_prune(LVRelState *vacrel,
{
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
- if (all_frozen)
+ if (presult.all_frozen)
{
Assert(!TransactionIdIsValid(presult.frz_conflict_horizon));
flags |= VISIBILITYMAP_ALL_FROZEN;
@@ -1832,7 +1796,7 @@ lazy_scan_prune(LVRelState *vacrel,
* true, so we must check both all_visible and all_frozen.
*/
else if (all_visible_according_to_vm && presult.all_visible &&
- all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
+ presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
{
/*
* Avoid relying on all_visible_according_to_vm as a proxy for the
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 6823ab8b658..bea35afc4bd 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -212,7 +212,19 @@ typedef struct PruneResult
* 1. Otherwise every access would need to subtract 1.
*/
int8 htsv[MaxHeapTuplesPerPage + 1];
+
bool all_visible_except_removable;
+
+ /* Whether or not the page can be set all frozen in the VM */
+ bool all_frozen;
+
+ /* Number of newly frozen tuples */
+ int nfrozen;
+
+ /*
+ * One entry for every tuple that we may freeze.
+ */
+ HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
} PruneResult;
/*
@@ -324,6 +336,7 @@ extern void heap_page_prune_opt(Relation relation, Buffer buffer);
extern void heap_page_prune(Relation relation, Buffer buffer,
struct GlobalVisState *vistest,
bool mark_unused_now,
+ HeapPageFreeze *pagefrz,
PruneResult *presult,
OffsetNumber *off_loc);
extern void heap_page_prune_execute(Buffer buffer,
--
2.40.1
Attachment: v2-0006-lazy_scan_prune-reorder-freeze-execution-logic.patch (text/x-diff; charset=us-ascii)
From 88826eb5a1c107a35de9b86bd468e88700940c58 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sun, 7 Jan 2024 14:50:12 -0500
Subject: [PATCH v2 06/17] lazy_scan_prune reorder freeze execution logic
To combine the prune and freeze records, freezing must be done before a
pruning WAL record is emitted. We will move the freeze execution into
heap_page_prune() in future commits. lazy_scan_prune() currently
executes freezing, updates vacrel->NewRelfrozenXid and
vacrel->NewRelminMxid, and resets the snapshotConflictHorizon that the
visibility map update record may use, all in the same block of if
statements. This commit starts reordering that logic so that the freeze
execution can be separated from the other updates, which should not be
done during pruning. It also adds a helper for calculating the freeze
snapshot conflict horizon. This will be useful once freeze execution is
moved into pruning, because not all callers of heap_page_prune() have
access to VacuumCutoffs.
---
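The decision the new helper encodes can be sketched in a few lines of
standalone C. This is a simplification: the real code uses
TransactionIdRetreat() and wraparound-aware XID arithmetic, and the
parameter names below are stand-ins rather than the actual fields.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint32_t TransactionId;

static TransactionId
frz_conflict_horizon(bool page_will_be_all_frozen,
                     TransactionId page_visibility_cutoff,
                     TransactionId oldest_xmin)
{
    /*
     * If the whole page becomes all-frozen, the newest xmin preserved on
     * the page is a precise conflict horizon.  Otherwise fall back to a
     * conservative value just before OldestXmin, which avoids false
     * recovery conflicts when hot_standby_feedback is in use.
     */
    if (page_will_be_all_frozen)
        return page_visibility_cutoff;

    return oldest_xmin - 1;
}

int
main(void)
{
    printf("%u\n", frz_conflict_horizon(true, 700, 1000));   /* prints 700 */
    printf("%u\n", frz_conflict_horizon(false, 700, 1000));  /* prints 999 */
    return 0;
}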
src/backend/access/heap/vacuumlazy.c | 112 ++++++++++++++++-----------
1 file changed, 67 insertions(+), 45 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 4187c998d25..abbb7ab3ada 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -269,6 +269,8 @@ static void update_vacuum_error_info(LVRelState *vacrel,
static void restore_vacuum_error_info(LVRelState *vacrel,
const LVSavedErrInfo *saved_vacrel);
+static TransactionId heap_frz_conflict_horizon(PruneResult *presult,
+ HeapPageFreeze *pagefrz);
/*
* heap_vacuum_rel() -- perform VACUUM for one heap relation
@@ -1373,6 +1375,33 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
return false;
}
+/*
+ * Determine the snapshotConflictHorizon for freezing. Must only be called
+ * after pruning and determining if the page is freezable.
+ */
+static TransactionId
+heap_frz_conflict_horizon(PruneResult *presult, HeapPageFreeze *pagefrz)
+{
+ TransactionId result;
+
+ /*
+ * We can use frz_conflict_horizon as our cutoff for conflicts when the
+ * whole page is eligible to become all-frozen in the VM once we're done
+ * with it. Otherwise we generate a conservative cutoff by stepping back
+ * from OldestXmin.
+ */
+ if (presult->all_visible_except_removable && presult->all_frozen)
+ result = presult->frz_conflict_horizon;
+ else
+ {
+ /* Avoids false conflicts when hot_standby_feedback in use */
+ result = pagefrz->cutoffs->OldestXmin;
+ TransactionIdRetreat(result);
+ }
+
+ return result;
+}
+
/*
* lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
*
@@ -1421,6 +1450,7 @@ lazy_scan_prune(LVRelState *vacrel,
recently_dead_tuples;
HeapPageFreeze pagefrz;
bool hastup = false;
+ bool do_freeze;
int64 fpi_before = pgWalUsage.wal_fpi;
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
@@ -1580,10 +1610,15 @@ lazy_scan_prune(LVRelState *vacrel,
* freeze when pruning generated an FPI, if doing so means that we set the
* page all-frozen afterwards (might not happen until final heap pass).
*/
- if (pagefrz.freeze_required || presult.nfrozen == 0 ||
+ do_freeze = pagefrz.freeze_required ||
(presult.all_visible_except_removable && presult.all_frozen &&
- fpi_before != pgWalUsage.wal_fpi))
+ presult.nfrozen > 0 &&
+ fpi_before != pgWalUsage.wal_fpi);
+
+ if (do_freeze)
{
+ TransactionId snapshotConflictHorizon;
+
/*
* We're freezing the page. Our final NewRelfrozenXid doesn't need to
* be affected by the XIDs that are just about to be frozen anyway.
@@ -1591,52 +1626,39 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->NewRelfrozenXid = pagefrz.FreezePageRelfrozenXid;
vacrel->NewRelminMxid = pagefrz.FreezePageRelminMxid;
- if (presult.nfrozen == 0)
- {
- /*
- * We have no freeze plans to execute, so there's no added cost
- * from following the freeze path. That's why it was chosen. This
- * is important in the case where the page only contains totally
- * frozen tuples at this point (perhaps only following pruning).
- * Such pages can be marked all-frozen in the VM by our caller,
- * even though none of its tuples were newly frozen here (note
- * that the "no freeze" path never sets pages all-frozen).
- *
- * We never increment the frozen_pages instrumentation counter
- * here, since it only counts pages with newly frozen tuples
- * (don't confuse that with pages newly set all-frozen in VM).
- */
- }
- else
- {
- TransactionId snapshotConflictHorizon;
+ vacrel->frozen_pages++;
- vacrel->frozen_pages++;
+ snapshotConflictHorizon = heap_frz_conflict_horizon(&presult, &pagefrz);
- /*
- * We can use frz_conflict_horizon as our cutoff for conflicts
- * when the whole page is eligible to become all-frozen in the VM
- * once we're done with it. Otherwise we generate a conservative
- * cutoff by stepping back from OldestXmin.
- */
- if (presult.all_visible_except_removable && presult.all_frozen)
- {
- /* Using same cutoff when setting VM is now unnecessary */
- snapshotConflictHorizon = presult.frz_conflict_horizon;
- presult.frz_conflict_horizon = InvalidTransactionId;
- }
- else
- {
- /* Avoids false conflicts when hot_standby_feedback in use */
- snapshotConflictHorizon = vacrel->cutoffs.OldestXmin;
- TransactionIdRetreat(snapshotConflictHorizon);
- }
+ /* Using same cutoff when setting VM is now unnecessary */
+ if (presult.all_visible_except_removable && presult.all_frozen)
+ presult.frz_conflict_horizon = InvalidTransactionId;
- /* Execute all freeze plans for page as a single atomic action */
- heap_freeze_execute_prepared(vacrel->rel, buf,
- snapshotConflictHorizon,
- presult.frozen, presult.nfrozen);
- }
+ /* Execute all freeze plans for page as a single atomic action */
+ heap_freeze_execute_prepared(vacrel->rel, buf,
+ snapshotConflictHorizon,
+ presult.frozen, presult.nfrozen);
+ }
+ else if (presult.all_frozen && presult.nfrozen == 0)
+ {
+ /* Page should be all visible except to-be-removed tuples */
+ Assert(presult.all_visible_except_removable);
+
+ /*
+ * We have no freeze plans to execute, so there's no added cost from
+ * following the freeze path. That's why it was chosen. This is
+ * important in the case where the page only contains totally frozen
+ * tuples at this point (perhaps only following pruning). Such pages
+ * can be marked all-frozen in the VM by our caller, even though none
+ * of its tuples were newly frozen here (note that the "no freeze"
+ * path never sets pages all-frozen).
+ *
+ * We never increment the frozen_pages instrumentation counter here,
+ * since it only counts pages with newly frozen tuples (don't confuse
+ * that with pages newly set all-frozen in VM).
+ */
+ vacrel->NewRelfrozenXid = pagefrz.FreezePageRelfrozenXid;
+ vacrel->NewRelminMxid = pagefrz.FreezePageRelminMxid;
}
else
{
--
2.40.1
Attachment: v2-0007-Execute-freezing-in-heap_page_prune.patch (text/x-diff; charset=us-ascii)
From 4fbcb6b64c99649f76356b27c4ac39b735307585 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Fri, 8 Mar 2024 16:45:57 -0500
Subject: [PATCH v2 07/17] Execute freezing in heap_page_prune()
As a step toward combining the prune and freeze WAL records, execute
freezing in heap_page_prune(). The logic to determine whether or not to
execute freeze plans was moved from lazy_scan_prune() over to
heap_page_prune() with little modification.
---
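The page-level freeze decision, as it now lives in pruning, can be
summarized by the following standalone sketch. The booleans are stand-ins
for the real checks; in particular, pruning_emitted_fpi stands in for
comparing pgWalUsage.wal_fpi before and after pruning.

#include <stdbool.h>
#include <stdio.h>

static bool
decide_freeze(bool caller_wants_freezing,     /* pagefrz was supplied */
              bool freeze_required,           /* XID/MXID older than limit */
              bool page_would_be_all_frozen,
              int nfrozen,                    /* prepared freeze plans */
              bool pruning_emitted_fpi)
{
    /* On-access pruning passes no freeze state and never freezes. */
    if (!caller_wants_freezing)
        return false;

    /* Some XID/MXID must be frozen to respect FreezeLimit/MultiXactCutoff. */
    if (freeze_required)
        return true;

    /* Opportunistic case: freezing pays off and is cheap relative to the FPI. */
    return page_would_be_all_frozen && nfrozen > 0 && pruning_emitted_fpi;
}

int
main(void)
{
    printf("%d\n", decide_freeze(true, false, true, 3, true));  /* prints 1 */
    printf("%d\n", decide_freeze(true, false, true, 0, true));  /* prints 0 */
    return 0;
}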
src/backend/access/heap/heapam_handler.c | 2 +-
src/backend/access/heap/pruneheap.c | 151 +++++++++++++++++------
src/backend/access/heap/vacuumlazy.c | 129 ++++++-------------
src/backend/storage/ipc/procarray.c | 6 +-
src/include/access/heapam.h | 41 +++---
src/tools/pgindent/typedefs.list | 2 +-
6 files changed, 180 insertions(+), 151 deletions(-)
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 680a50bf8b1..5e522f5b0ba 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -1046,7 +1046,7 @@ heapam_scan_analyze_next_tuple(TableScanDesc scan, TransactionId OldestXmin,
* We ignore unused and redirect line pointers. DEAD line pointers
* should be counted as dead, because we need vacuum to run to get rid
* of them. Note that this rule agrees with the way that
- * heap_page_prune() counts things.
+ * heap_page_prune_and_freeze() counts things.
*/
if (!ItemIdIsNormal(itemid))
{
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 44a5c0a917b..9c709315192 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -17,16 +17,18 @@
#include "access/heapam.h"
#include "access/heapam_xlog.h"
#include "access/htup_details.h"
+#include "access/multixact.h"
#include "access/transam.h"
#include "access/xlog.h"
#include "access/xloginsert.h"
+#include "executor/instrument.h"
#include "miscadmin.h"
#include "pgstat.h"
#include "storage/bufmgr.h"
#include "utils/snapmgr.h"
#include "utils/rel.h"
-/* Working data for heap_page_prune and subroutines */
+/* Working data for heap_page_prune_and_freeze() and subroutines */
typedef struct
{
Relation rel;
@@ -61,17 +63,18 @@ static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
Buffer buffer);
static int heap_prune_chain(Buffer buffer,
OffsetNumber rootoffnum,
- PruneState *prstate, PruneResult *presult);
+ PruneState *prstate, PruneFreezeResult *presult);
static void prune_prepare_freeze_tuple(Page page, OffsetNumber offnum,
- HeapPageFreeze *pagefrz, PruneResult *presult);
+ HeapPageFreeze *pagefrz, HeapTupleFreeze *frozen,
+ PruneFreezeResult *presult);
static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
static void heap_prune_record_redirect(PruneState *prstate,
OffsetNumber offnum, OffsetNumber rdoffnum);
static void heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
- PruneResult *presult);
+ PruneFreezeResult *presult);
static void heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
- PruneResult *presult);
+ PruneFreezeResult *presult);
static void heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum);
static void page_verify_redirects(Page page);
@@ -151,15 +154,15 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
*/
if (PageIsFull(page) || PageGetHeapFreeSpace(page) < minfree)
{
- PruneResult presult;
+ PruneFreezeResult presult;
/*
* For now, pass mark_unused_now as false regardless of whether or
* not the relation has indexes, since we cannot safely determine
* that during on-access pruning with the current implementation.
*/
- heap_page_prune(relation, buffer, vistest, false, NULL,
- &presult, NULL);
+ heap_page_prune_and_freeze(relation, buffer, vistest, false, NULL,
+ &presult, NULL);
/*
* Report the number of tuples reclaimed to pgstats. This is
@@ -193,7 +196,12 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
/*
- * Prune and repair fragmentation in the specified page.
+ * Prune and repair fragmentation and potentially freeze tuples on the
+ * specified page.
+ *
+ * If the page can be marked all-frozen in the visibility map, we may
+ * opportunistically freeze tuples on the page if either its tuples are old
+ * enough or freezing will be cheap enough.
*
* Caller must have pin and buffer cleanup lock on the page. Note that we
* don't update the FSM information for page on caller's behalf. Caller might
@@ -207,23 +215,24 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* mark_unused_now indicates whether or not dead items can be set LP_UNUSED during
* pruning.
*
- * pagefrz contains both input and output parameters used if the caller is
- * interested in potentially freezing tuples on the page.
+ * pagefrz is an input parameter containing visibility cutoff information and
+ * the current relfrozenxid and relminmxids used if the caller is interested in
+ * freezing tuples on the page.
*
* off_loc is the offset location required by the caller to use in error
* callback.
*
* presult contains output parameters needed by callers such as the number of
* tuples removed and the number of line pointers newly marked LP_DEAD.
- * heap_page_prune() is responsible for initializing it.
+ * heap_page_prune_and_freeze() is responsible for initializing it.
*/
void
-heap_page_prune(Relation relation, Buffer buffer,
- GlobalVisState *vistest,
- bool mark_unused_now,
- HeapPageFreeze *pagefrz,
- PruneResult *presult,
- OffsetNumber *off_loc)
+heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ GlobalVisState *vistest,
+ bool mark_unused_now,
+ HeapPageFreeze *pagefrz,
+ PruneFreezeResult *presult,
+ OffsetNumber *off_loc)
{
Page page = BufferGetPage(buffer);
BlockNumber blockno = BufferGetBlockNumber(buffer);
@@ -231,6 +240,14 @@ heap_page_prune(Relation relation, Buffer buffer,
maxoff;
PruneState prstate;
HeapTupleData tup;
+ bool do_freeze;
+ int64 fpi_before = pgWalUsage.wal_fpi;
+ TransactionId frz_conflict_horizon = InvalidTransactionId;
+
+ /*
+ * One entry for every tuple that we may freeze.
+ */
+ HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
/*
* Our strategy is to scan the page and make lists of items to change,
@@ -267,6 +284,10 @@ heap_page_prune(Relation relation, Buffer buffer,
/* for recovery conflicts */
presult->frz_conflict_horizon = InvalidTransactionId;
+ /* For advancing relfrozenxid and relminmxid */
+ presult->new_relfrozenxid = InvalidTransactionId;
+ presult->new_relminmxid = InvalidMultiXactId;
+
maxoff = PageGetMaxOffsetNumber(page);
tup.t_tableOid = RelationGetRelid(prstate.rel);
@@ -426,7 +447,7 @@ heap_page_prune(Relation relation, Buffer buffer,
if (pagefrz)
prune_prepare_freeze_tuple(page, offnum,
- pagefrz, presult);
+ pagefrz, frozen, presult);
/* Ignore items already processed as part of an earlier chain */
if (prstate.marked[offnum])
@@ -541,6 +562,61 @@ heap_page_prune(Relation relation, Buffer buffer,
/* Record number of newly-set-LP_DEAD items for caller */
presult->nnewlpdead = prstate.ndead;
+
+ /*
+ * Freeze the page when heap_prepare_freeze_tuple indicates that at least
+ * one XID/MXID from before FreezeLimit/MultiXactCutoff is present. Also
+ * freeze when pruning generated an FPI, if doing so means that we set the
+ * page all-frozen afterwards (might not happen until final heap pass).
+ */
+ if (pagefrz)
+ do_freeze = pagefrz->freeze_required ||
+ (presult->all_visible_except_removable && presult->all_frozen &&
+ presult->nfrozen > 0 &&
+ fpi_before != pgWalUsage.wal_fpi);
+ else
+ do_freeze = false;
+
+ if (do_freeze)
+ {
+ frz_conflict_horizon = heap_frz_conflict_horizon(presult, pagefrz);
+
+ /* Execute all freeze plans for page as a single atomic action */
+ heap_freeze_execute_prepared(relation, buffer,
+ frz_conflict_horizon,
+ frozen, presult->nfrozen);
+ }
+ else if (!pagefrz || !presult->all_frozen || presult->nfrozen > 0)
+ {
+ /*
+ * If we will neither freeze tuples on the page nor set the page all
+ * frozen in the visibility map, the page is not all frozen and there
+ * will be no newly frozen tuples.
+ */
+ presult->all_frozen = false;
+ presult->nfrozen = 0; /* avoid miscounts in instrumentation */
+ }
+
+ /* Caller won't update new_relfrozenxid and new_relminmxid */
+ if (!pagefrz)
+ return;
+
+ /*
+ * If we will freeze tuples on the page or, even if we don't freeze tuples
+ * on the page, if we will set the page all-frozen in the visibility map,
+ * we can advance relfrozenxid and relminmxid to the values in
+ * pagefrz->FreezePageRelfrozenXid and pagefrz->FreezePageRelminMxid.
+ */
+ if (presult->all_frozen || presult->nfrozen > 0)
+ {
+ presult->new_relfrozenxid = pagefrz->FreezePageRelfrozenXid;
+ presult->new_relminmxid = pagefrz->FreezePageRelminMxid;
+ }
+ else
+ {
+ presult->new_relfrozenxid = pagefrz->NoFreezePageRelfrozenXid;
+ presult->new_relminmxid = pagefrz->NoFreezePageRelminMxid;
+ }
}
@@ -598,7 +674,7 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
*/
static int
heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
- PruneState *prstate, PruneResult *presult)
+ PruneState *prstate, PruneFreezeResult *presult)
{
int ndeleted = 0;
Page dp = (Page) BufferGetPage(buffer);
@@ -863,10 +939,10 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
{
/*
* We found a redirect item that doesn't point to a valid follow-on
- * item. This can happen if the loop in heap_page_prune caused us to
- * visit the dead successor of a redirect item before visiting the
- * redirect item. We can clean up by setting the redirect item to
- * DEAD state or LP_UNUSED if the caller indicated.
+ * item. This can happen if the loop in heap_page_prune_and_freeze()
+ * caused us to visit the dead successor of a redirect item before
+ * visiting the redirect item. We can clean up by setting the
+ * redirect item to DEAD state or LP_UNUSED if the caller indicated.
*/
heap_prune_record_dead_or_unused(prstate, rootoffnum, presult);
}
@@ -883,7 +959,8 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
static void
prune_prepare_freeze_tuple(Page page, OffsetNumber offnum,
HeapPageFreeze *pagefrz,
- PruneResult *presult)
+ HeapTupleFreeze *frozen,
+ PruneFreezeResult *presult)
{
bool totally_frozen;
HeapTupleHeader htup;
@@ -905,11 +982,11 @@ prune_prepare_freeze_tuple(Page page, OffsetNumber offnum,
/* Tuple with storage -- consider need to freeze */
if ((heap_prepare_freeze_tuple(htup, pagefrz,
- &presult->frozen[presult->nfrozen],
+ &frozen[presult->nfrozen],
&totally_frozen)))
{
/* Save prepared freeze plan for later */
- presult->frozen[presult->nfrozen++].offset = offnum;
+ frozen[presult->nfrozen++].offset = offnum;
}
/*
@@ -953,7 +1030,7 @@ heap_prune_record_redirect(PruneState *prstate,
/* Record line pointer to be marked dead */
static void
heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
- PruneResult *presult)
+ PruneFreezeResult *presult)
{
Assert(prstate->ndead < MaxHeapTuplesPerPage);
prstate->nowdead[prstate->ndead] = offnum;
@@ -976,7 +1053,7 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
*/
static void
heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
- PruneResult *presult)
+ PruneFreezeResult *presult)
{
/*
* If the caller set mark_unused_now to true, we can remove dead tuples
@@ -1003,9 +1080,9 @@ heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum)
/*
- * Perform the actual page changes needed by heap_page_prune.
- * It is expected that the caller has a full cleanup lock on the
- * buffer.
+ * Perform the actual page pruning modifications needed by
+ * heap_page_prune_and_freeze(). It is expected that the caller has a full
+ * cleanup lock on the buffer.
*/
void
heap_page_prune_execute(Buffer buffer,
@@ -1119,11 +1196,11 @@ heap_page_prune_execute(Buffer buffer,
#ifdef USE_ASSERT_CHECKING
/*
- * When heap_page_prune() was called, mark_unused_now may have been
- * passed as true, which allows would-be LP_DEAD items to be made
- * LP_UNUSED instead. This is only possible if the relation has no
- * indexes. If there are any dead items, then mark_unused_now was not
- * true and every item being marked LP_UNUSED must refer to a
+ * When heap_page_prune_and_freeze() was called, mark_unused_now may
+ * have been passed as true, which allows would-be LP_DEAD items to be
+ * made LP_UNUSED instead. This is only possible if the relation has
+ * no indexes. If there are any dead items, then mark_unused_now was
+ * not true and every item being marked LP_UNUSED must refer to a
* heap-only tuple.
*/
if (ndead > 0)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index abbb7ab3ada..6dd8d457c9c 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -269,9 +269,6 @@ static void update_vacuum_error_info(LVRelState *vacrel,
static void restore_vacuum_error_info(LVRelState *vacrel,
const LVSavedErrInfo *saved_vacrel);
-static TransactionId heap_frz_conflict_horizon(PruneResult *presult,
- HeapPageFreeze *pagefrz);
-
/*
* heap_vacuum_rel() -- perform VACUUM for one heap relation
*
@@ -432,12 +429,13 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
* as an upper bound on the XIDs stored in the pages we'll actually scan
* (NewRelfrozenXid tracking must never be allowed to miss unfrozen XIDs).
*
- * Next acquire vistest, a related cutoff that's used in heap_page_prune.
- * We expect vistest will always make heap_page_prune remove any deleted
- * tuple whose xmax is < OldestXmin. lazy_scan_prune must never become
- * confused about whether a tuple should be frozen or removed. (In the
- * future we might want to teach lazy_scan_prune to recompute vistest from
- * time to time, to increase the number of dead tuples it can prune away.)
+ * Next acquire vistest, a related cutoff that's used in
+ * heap_page_prune_and_freeze(). We expect vistest will always make
+ * heap_page_prune_and_freeze() remove any deleted tuple whose xmax is <
+ * OldestXmin. lazy_scan_prune must never become confused about whether a
+ * tuple should be frozen or removed. (In the future we might want to
+ * teach lazy_scan_prune to recompute vistest from time to time, to
+ * increase the number of dead tuples it can prune away.)
*/
vacrel->aggressive = vacuum_get_cutoffs(rel, params, &vacrel->cutoffs);
vacrel->rel_pages = orig_rel_pages = RelationGetNumberOfBlocks(rel);
@@ -1379,8 +1377,8 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
* Determine the snapshotConflictHorizon for freezing. Must only be called
* after pruning and determining if the page is freezable.
*/
-static TransactionId
-heap_frz_conflict_horizon(PruneResult *presult, HeapPageFreeze *pagefrz)
+TransactionId
+heap_frz_conflict_horizon(PruneFreezeResult *presult, HeapPageFreeze *pagefrz)
{
TransactionId result;
@@ -1407,21 +1405,21 @@ heap_frz_conflict_horizon(PruneResult *presult, HeapPageFreeze *pagefrz)
*
* Caller must hold pin and buffer cleanup lock on the buffer.
*
- * Prior to PostgreSQL 14 there were very rare cases where heap_page_prune()
- * was allowed to disagree with our HeapTupleSatisfiesVacuum() call about
- * whether or not a tuple should be considered DEAD. This happened when an
- * inserting transaction concurrently aborted (after our heap_page_prune()
- * call, before our HeapTupleSatisfiesVacuum() call). There was rather a lot
- * of complexity just so we could deal with tuples that were DEAD to VACUUM,
- * but nevertheless were left with storage after pruning.
+ * Prior to PostgreSQL 14 there were very rare cases where
+ * heap_page_prune_and_freeze() was allowed to disagree with our
+ * HeapTupleSatisfiesVacuum() call about whether or not a tuple should be
+ * considered DEAD. This happened when an inserting transaction concurrently
+ * aborted (after our heap_page_prune_and_freeze() call, before our
+ * HeapTupleSatisfiesVacuum() call). There was rather a lot of complexity just
+ * so we could deal with tuples that were DEAD to VACUUM, but nevertheless were
+ * left with storage after pruning.
*
* As of Postgres 17, we circumvent this problem altogether by reusing the
- * result of heap_page_prune()'s visibility check. Without the second call to
- * HeapTupleSatisfiesVacuum(), there is no new HTSV_Result and there can be no
- * disagreement. We'll just handle such tuples as if they had become fully dead
- * right after this operation completes instead of in the middle of it. Note that
- * any tuple that becomes dead after the call to heap_page_prune() can't need to
- * be frozen, because it was visible to another session when vacuum started.
+ * result of heap_page_prune_and_freeze()'s visibility check. Without the
+ * second call to HeapTupleSatisfiesVacuum(), there is no new HTSV_Result and
+ * there can be no disagreement. We'll just handle such tuples as if they had
+ * become fully dead right after this operation completes instead of in the
+ * middle of it.
*
* vmbuffer is the buffer containing the VM block with visibility information
* for the heap block, blkno. all_visible_according_to_vm is the saved
@@ -1444,26 +1442,24 @@ lazy_scan_prune(LVRelState *vacrel,
OffsetNumber offnum,
maxoff;
ItemId itemid;
- PruneResult presult;
+ PruneFreezeResult presult;
int lpdead_items,
live_tuples,
recently_dead_tuples;
HeapPageFreeze pagefrz;
bool hastup = false;
- bool do_freeze;
- int64 fpi_before = pgWalUsage.wal_fpi;
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
Assert(BufferGetBlockNumber(buf) == blkno);
/*
* maxoff might be reduced following line pointer array truncation in
- * heap_page_prune. That's safe for us to ignore, since the reclaimed
- * space will continue to look like LP_UNUSED items below.
+ * heap_page_prune_and_freeze(). That's safe for us to ignore, since the
+ * reclaimed space will continue to look like LP_UNUSED items below.
*/
maxoff = PageGetMaxOffsetNumber(page);
- /* Initialize (or reset) page-level state */
+ /* Initialize pagefrz */
pagefrz.freeze_required = false;
pagefrz.FreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
pagefrz.FreezePageRelminMxid = vacrel->NewRelminMxid;
@@ -1475,7 +1471,7 @@ lazy_scan_prune(LVRelState *vacrel,
recently_dead_tuples = 0;
/*
- * Prune all HOT-update chains in this page.
+ * Prune all HOT-update chains and potentially freeze tuples on this page.
*
* We count the number of tuples removed from the page by the pruning step
* in presult.ndeleted. It should not be confused with lpdead_items;
@@ -1486,8 +1482,8 @@ lazy_scan_prune(LVRelState *vacrel,
* items LP_UNUSED, so mark_unused_now should be true if no indexes and
* false otherwise.
*/
- heap_page_prune(rel, buf, vacrel->vistest, vacrel->nindexes == 0,
- &pagefrz, &presult, &vacrel->offnum);
+ heap_page_prune_and_freeze(rel, buf, vacrel->vistest, vacrel->nindexes == 0,
+ &pagefrz, &presult, &vacrel->offnum);
/*
* Now scan the page to collect LP_DEAD items and check for tuples
@@ -1604,72 +1600,23 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->offnum = InvalidOffsetNumber;
- /*
- * Freeze the page when heap_prepare_freeze_tuple indicates that at least
- * one XID/MXID from before FreezeLimit/MultiXactCutoff is present. Also
- * freeze when pruning generated an FPI, if doing so means that we set the
- * page all-frozen afterwards (might not happen until final heap pass).
- */
- do_freeze = pagefrz.freeze_required ||
- (presult.all_visible_except_removable && presult.all_frozen &&
- presult.nfrozen > 0 &&
- fpi_before != pgWalUsage.wal_fpi);
+ Assert(MultiXactIdIsValid(presult.new_relminmxid));
+ vacrel->NewRelfrozenXid = presult.new_relfrozenxid;
+ Assert(TransactionIdIsValid(presult.new_relfrozenxid));
+ vacrel->NewRelminMxid = presult.new_relminmxid;
- if (do_freeze)
+ if (presult.nfrozen > 0)
{
- TransactionId snapshotConflictHorizon;
-
/*
- * We're freezing the page. Our final NewRelfrozenXid doesn't need to
- * be affected by the XIDs that are just about to be frozen anyway.
+ * We never increment the frozen_pages instrumentation counter when
+ * nfrozen == 0, since it only counts pages with newly frozen tuples
+ * (don't confuse that with pages newly set all-frozen in VM).
*/
- vacrel->NewRelfrozenXid = pagefrz.FreezePageRelfrozenXid;
- vacrel->NewRelminMxid = pagefrz.FreezePageRelminMxid;
-
vacrel->frozen_pages++;
- snapshotConflictHorizon = heap_frz_conflict_horizon(&presult, &pagefrz);
-
/* Using same cutoff when setting VM is now unnecessary */
- if (presult.all_visible_except_removable && presult.all_frozen)
+ if (presult.all_frozen)
presult.frz_conflict_horizon = InvalidTransactionId;
-
- /* Execute all freeze plans for page as a single atomic action */
- heap_freeze_execute_prepared(vacrel->rel, buf,
- snapshotConflictHorizon,
- presult.frozen, presult.nfrozen);
- }
- else if (presult.all_frozen && presult.nfrozen == 0)
- {
- /* Page should be all visible except to-be-removed tuples */
- Assert(presult.all_visible_except_removable);
-
- /*
- * We have no freeze plans to execute, so there's no added cost from
- * following the freeze path. That's why it was chosen. This is
- * important in the case where the page only contains totally frozen
- * tuples at this point (perhaps only following pruning). Such pages
- * can be marked all-frozen in the VM by our caller, even though none
- * of its tuples were newly frozen here (note that the "no freeze"
- * path never sets pages all-frozen).
- *
- * We never increment the frozen_pages instrumentation counter here,
- * since it only counts pages with newly frozen tuples (don't confuse
- * that with pages newly set all-frozen in VM).
- */
- vacrel->NewRelfrozenXid = pagefrz.FreezePageRelfrozenXid;
- vacrel->NewRelminMxid = pagefrz.FreezePageRelminMxid;
- }
- else
- {
- /*
- * Page requires "no freeze" processing. It might be set all-visible
- * in the visibility map, but it can never be set all-frozen.
- */
- vacrel->NewRelfrozenXid = pagefrz.NoFreezePageRelfrozenXid;
- vacrel->NewRelminMxid = pagefrz.NoFreezePageRelminMxid;
- presult.all_frozen = false;
- presult.nfrozen = 0; /* avoid miscounts in instrumentation */
}
/*
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 9eea1ed315a..7bffe09fb5d 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -1715,9 +1715,9 @@ TransactionIdIsActive(TransactionId xid)
* Note: the approximate horizons (see definition of GlobalVisState) are
* updated by the computations done here. That's currently required for
* correctness and a small optimization. Without doing so it's possible that
- * heap vacuum's call to heap_page_prune() uses a more conservative horizon
- * than later when deciding which tuples can be removed - which the code
- * doesn't expect (breaking HOT).
+ * heap vacuum's call to heap_page_prune_and_freeze() uses a more conservative
+ * horizon than later when deciding which tuples can be removed - which the
+ * code doesn't expect (breaking HOT).
*/
static void
ComputeXidHorizons(ComputeXidHorizonsResult *h)
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index bea35afc4bd..69d97bb8ece 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -195,7 +195,7 @@ typedef struct HeapPageFreeze
/*
* Per-page state returned from pruning
*/
-typedef struct PruneResult
+typedef struct PruneFreezeResult
{
int ndeleted; /* Number of tuples deleted from the page */
int nnewlpdead; /* Number of newly LP_DEAD items */
@@ -204,9 +204,10 @@ typedef struct PruneResult
/*
* Tuple visibility is only computed once for each tuple, for correctness
- * and efficiency reasons; see comment in heap_page_prune() for details.
- * This is of type int8[], instead of HTSV_Result[], so we can use -1 to
- * indicate no visibility has been computed, e.g. for LP_DEAD items.
+ * and efficiency reasons; see comment in heap_page_prune_and_freeze() for
+ * details. This is of type int8[], instead of HTSV_Result[], so we can
+ * use -1 to indicate no visibility has been computed, e.g. for LP_DEAD
+ * items.
*
* This needs to be MaxHeapTuplesPerPage + 1 long as FirstOffsetNumber is
* 1. Otherwise every access would need to subtract 1.
@@ -221,17 +222,18 @@ typedef struct PruneResult
/* Number of newly frozen tuples */
int nfrozen;
- /*
- * One entry for every tuple that we may freeze.
- */
- HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
-} PruneResult;
+ /* New value of relfrozenxid found by heap_page_prune_and_freeze() */
+ TransactionId new_relfrozenxid;
+
+ /* New value of relminmxid found by heap_page_prune_and_freeze() */
+ MultiXactId new_relminmxid;
+} PruneFreezeResult;
/*
* Pruning calculates tuple visibility once and saves the results in an array
- * of int8. See PruneResult.htsv for details. This helper function is meant to
- * guard against examining visibility status array members which have not yet
- * been computed.
+ * of int8. See PruneFreezeResult.htsv for details. This helper function is
+ * meant to guard against examining visibility status array members which have
+ * not yet been computed.
*/
static inline HTSV_Result
htsv_get_valid_status(int status)
@@ -307,6 +309,9 @@ extern TM_Result heap_lock_tuple(Relation relation, HeapTuple tuple,
Buffer *buffer, struct TM_FailureData *tmfd);
extern void heap_inplace_update(Relation relation, HeapTuple tuple);
+
+extern TransactionId heap_frz_conflict_horizon(PruneFreezeResult *presult,
+ HeapPageFreeze *pagefrz);
extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
HeapPageFreeze *pagefrz,
HeapTupleFreeze *frz, bool *totally_frozen);
@@ -333,12 +338,12 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
/* in heap/pruneheap.c */
struct GlobalVisState;
extern void heap_page_prune_opt(Relation relation, Buffer buffer);
-extern void heap_page_prune(Relation relation, Buffer buffer,
- struct GlobalVisState *vistest,
- bool mark_unused_now,
- HeapPageFreeze *pagefrz,
- PruneResult *presult,
- OffsetNumber *off_loc);
+extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ struct GlobalVisState *vistest,
+ bool mark_unused_now,
+ HeapPageFreeze *pagefrz,
+ PruneFreezeResult *presult,
+ OffsetNumber *off_loc);
extern void heap_page_prune_execute(Buffer buffer,
OffsetNumber *redirected, int nredirected,
OffsetNumber *nowdead, int ndead,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index aa7a25b8f8c..1c1a4d305d6 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2175,7 +2175,7 @@ ProjectionPath
PromptInterruptContext
ProtocolVersion
PrsStorage
-PruneResult
+PruneFreezeResult
PruneState
PruneStepResult
PsqlScanCallbacks
--
2.40.1
v2-0008-Make-opp-freeze-heuristic-compatible-with-prune-f.patch
From 4845d94e77dd904ccc7276d0655b9139d1c9bb04 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sun, 7 Jan 2024 16:11:35 -0500
Subject: [PATCH v2 08/17] Make opp freeze heuristic compatible with
prune+freeze record
Once the prune and freeze records are combined, we will no longer be
able to use whether or not pruning emitted an FPI to decide whether to
opportunistically freeze a freezable page.
While this heuristic should be improved, for now approximate the
previous logic by keeping track of whether a hint-bit FPI was emitted
during visibility checks (when checksums are enabled) and combining
that with a check of XLogCheckBufferNeedsBackup(). If we have just
finished deciding whether to prune and the buffer looks like it will
need an FPI after modification, it is likely that pruning would have
emitted an FPI.
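In brief, the freeze decision becomes (a simplified sketch, using the
variable names from the diff below):

    /* did setting hint bits during visibility checks emit an FPI? */
    hint_bit_fpi = fpi_before != pgWalUsage.wal_fpi;

    /* would WAL-logging the prune changes emit an FPI now? */
    if (do_prune && pagefrz)
        prune_fpi = XLogCheckBufferNeedsBackup(buffer);

    do_freeze = pagefrz &&
        (pagefrz->freeze_required ||
         (whole_page_freezable && presult->nfrozen > 0 &&
          (prune_fpi || hint_bit_fpi)));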
---
src/backend/access/heap/pruneheap.c | 58 +++++++++++++++++++++--------
1 file changed, 43 insertions(+), 15 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 9c709315192..e715fc29a83 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -241,6 +241,10 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
PruneState prstate;
HeapTupleData tup;
bool do_freeze;
+ bool do_prune;
+ bool whole_page_freezable;
+ bool hint_bit_fpi;
+ bool prune_fpi = false;
int64 fpi_before = pgWalUsage.wal_fpi;
TransactionId frz_conflict_horizon = InvalidTransactionId;
@@ -410,6 +414,13 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
}
}
+ /*
+ * If checksums are enabled, heap_prune_satisfies_vacuum() may have caused
+ * an FPI to be emitted. Then reset fpi_before for the no-prune case.
+ */
+ hint_bit_fpi = fpi_before != pgWalUsage.wal_fpi;
+ fpi_before = pgWalUsage.wal_fpi;
+
/*
* For vacuum, if the whole page will become frozen, we consider
* opportunistically freezing tuples. Dead tuples which will be removed by
@@ -467,11 +478,42 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (off_loc)
*off_loc = InvalidOffsetNumber;
+ do_prune = prstate.nredirected > 0 ||
+ prstate.ndead > 0 ||
+ prstate.nunused > 0;
+
+ /*
+ * Only incur overhead of checking if we will do an FPI if we might use
+ * the information.
+ */
+ if (do_prune && pagefrz)
+ prune_fpi = XLogCheckBufferNeedsBackup(buffer);
+
+ /* Is the whole page freezable? And is there something to freeze? */
+ whole_page_freezable = presult->all_visible_except_removable &&
+ presult->all_frozen;
+
+ /*
+ * Freeze the page when heap_prepare_freeze_tuple indicates that at least
+ * one XID/MXID from before FreezeLimit/MultiXactCutoff is present. Also
+ * freeze when pruning generated an FPI, if doing so means that we set the
+ * page all-frozen afterwards (might not happen until final heap pass).
+ * XXX: Previously, we knew whether pruning emitted an FPI by checking
+ * pgWalUsage.wal_fpi before and after pruning. Once the freeze and prune
+ * records are combined, that heuristic can no longer be used. The
+ * opportunistic freeze heuristic must be improved; however, for now, try
+ * to approximate it.
+ */
+
+ do_freeze = pagefrz &&
+ (pagefrz->freeze_required ||
+ (whole_page_freezable && presult->nfrozen > 0 && (prune_fpi || hint_bit_fpi)));
+
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
/* Have we found any prunable items? */
- if (prstate.nredirected > 0 || prstate.ndead > 0 || prstate.nunused > 0)
+ if (do_prune)
{
/*
* Apply the planned item changes, then repair page fragmentation, and
@@ -563,20 +605,6 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/* Record number of newly-set-LP_DEAD items for caller */
presult->nnewlpdead = prstate.ndead;
- /*
- * Freeze the page when heap_prepare_freeze_tuple indicates that at least
- * one XID/MXID from before FreezeLimit/MultiXactCutoff is present. Also
- * freeze when pruning generated an FPI, if doing so means that we set the
- * page all-frozen afterwards (might not happen until final heap pass).
- */
- if (pagefrz)
- do_freeze = pagefrz->freeze_required ||
- (presult->all_visible_except_removable && presult->all_frozen &&
- presult->nfrozen > 0 &&
- fpi_before != pgWalUsage.wal_fpi);
- else
- do_freeze = false;
-
if (do_freeze)
{
frz_conflict_horizon = heap_frz_conflict_horizon(presult, pagefrz);
--
2.40.1
v2-0009-Separate-tuple-pre-freeze-checks-and-invoke-earli.patch
From 476e7d43bd991cfcb2aed540ce5fda860ccce6c9 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sun, 7 Jan 2024 16:53:45 -0500
Subject: [PATCH v2 09/17] Separate tuple pre freeze checks and invoke earlier
When the prune and freeze records are combined, their critical sections
will have to be combined as well. heap_freeze_execute_prepared() does a
set of pre-freeze validations before starting its critical section. Move
these validations into a helper function, heap_pre_freeze_checks(), and
invoke it in heap_page_prune() before the pruning critical section.
Also move up the calculation of the freeze snapshot conflict horizon.
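The resulting ordering in heap_page_prune() is roughly (per the diff
below):

    if (do_freeze)
    {
        /* pg_xact lookups, done before entering the critical section */
        heap_pre_freeze_checks(buffer, frozen, presult->nfrozen);
        frz_conflict_horizon = heap_frz_conflict_horizon(presult, pagefrz);
    }

    /* Any error while applying the changes is critical */
    START_CRIT_SECTION();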
---
src/backend/access/heap/heapam.c | 58 ++++++++++++++++-------------
src/backend/access/heap/pruneheap.c | 8 +++-
src/include/access/heapam.h | 3 ++
3 files changed, 42 insertions(+), 27 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 7261c4988d7..16e3f2520a4 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6659,35 +6659,19 @@ heap_execute_freeze_tuple(HeapTupleHeader tuple, HeapTupleFreeze *frz)
}
/*
- * heap_freeze_execute_prepared
- *
- * Executes freezing of one or more heap tuples on a page on behalf of caller.
- * Caller passes an array of tuple plans from heap_prepare_freeze_tuple.
- * Caller must set 'offset' in each plan for us. Note that we destructively
- * sort caller's tuples array in-place, so caller had better be done with it.
- *
- * WAL-logs the changes so that VACUUM can advance the rel's relfrozenxid
- * later on without any risk of unsafe pg_xact lookups, even following a hard
- * crash (or when querying from a standby). We represent freezing by setting
- * infomask bits in tuple headers, but this shouldn't be thought of as a hint.
- * See section on buffer access rules in src/backend/storage/buffer/README.
- */
+ * Perform xmin/xmax XID status sanity checks before calling
+ * heap_freeze_execute_prepared().
+ *
+ * heap_prepare_freeze_tuple doesn't perform these checks directly because
+ * pg_xact lookups are relatively expensive. They shouldn't be repeated
+ * by successive VACUUMs that each decide against freezing the same page.
+ */
void
-heap_freeze_execute_prepared(Relation rel, Buffer buffer,
- TransactionId snapshotConflictHorizon,
- HeapTupleFreeze *tuples, int ntuples)
+heap_pre_freeze_checks(Buffer buffer,
+ HeapTupleFreeze *tuples, int ntuples)
{
Page page = BufferGetPage(buffer);
- Assert(ntuples > 0);
-
- /*
- * Perform xmin/xmax XID status sanity checks before critical section.
- *
- * heap_prepare_freeze_tuple doesn't perform these checks directly because
- * pg_xact lookups are relatively expensive. They shouldn't be repeated
- * by successive VACUUMs that each decide against freezing the same page.
- */
for (int i = 0; i < ntuples; i++)
{
HeapTupleFreeze *frz = tuples + i;
@@ -6726,6 +6710,30 @@ heap_freeze_execute_prepared(Relation rel, Buffer buffer,
xmax)));
}
}
+}
+
+/*
+ * heap_freeze_execute_prepared
+ *
+ * Executes freezing of one or more heap tuples on a page on behalf of caller.
+ * Caller passes an array of tuple plans from heap_prepare_freeze_tuple.
+ * Caller must set 'offset' in each plan for us. Note that we destructively
+ * sort caller's tuples array in-place, so caller had better be done with it.
+ *
+ * WAL-logs the changes so that VACUUM can advance the rel's relfrozenxid
+ * later on without any risk of unsafe pg_xact lookups, even following a hard
+ * crash (or when querying from a standby). We represent freezing by setting
+ * infomask bits in tuple headers, but this shouldn't be thought of as a hint.
+ * See section on buffer access rules in src/backend/storage/buffer/README.
+ */
+void
+heap_freeze_execute_prepared(Relation rel, Buffer buffer,
+ TransactionId snapshotConflictHorizon,
+ HeapTupleFreeze *tuples, int ntuples)
+{
+ Page page = BufferGetPage(buffer);
+
+ Assert(ntuples > 0);
START_CRIT_SECTION();
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index e715fc29a83..bac461940de 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -509,6 +509,12 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
(pagefrz->freeze_required ||
(whole_page_freezable && presult->nfrozen > 0 && (prune_fpi || hint_bit_fpi)));
+ if (do_freeze)
+ {
+ heap_pre_freeze_checks(buffer, frozen, presult->nfrozen);
+ frz_conflict_horizon = heap_frz_conflict_horizon(presult, pagefrz);
+ }
+
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
@@ -607,8 +613,6 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (do_freeze)
{
- frz_conflict_horizon = heap_frz_conflict_horizon(presult, pagefrz);
-
/* Execute all freeze plans for page as a single atomic action */
heap_freeze_execute_prepared(relation, buffer,
frz_conflict_horizon,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 69d97bb8ece..d14f36d9ce7 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -315,6 +315,9 @@ extern TransactionId heap_frz_conflict_horizon(PruneFreezeResult *presult,
extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
HeapPageFreeze *pagefrz,
HeapTupleFreeze *frz, bool *totally_frozen);
+
+extern void heap_pre_freeze_checks(Buffer buffer,
+ HeapTupleFreeze *tuples, int ntuples);
extern void heap_freeze_execute_prepared(Relation rel, Buffer buffer,
TransactionId snapshotConflictHorizon,
HeapTupleFreeze *tuples, int ntuples);
--
2.40.1
v2-0010-Inline-heap_freeze_execute_prepared.patch
From 78f16c5cc57c1d8f60f69d583411ab403f6dba36 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sun, 7 Jan 2024 17:03:17 -0500
Subject: [PATCH v2 10/17] Inline heap_freeze_execute_prepared()
In order to merge the freeze and prune records, the execution of tuple
freezing and the WAL logging of the changes to the page must be
separated so that the WAL logging can be combined with prune WAL
logging. This commit makes a helper for the tuple freezing and then
inlines the contents of heap_freeze_execute_prepared() where it is
called in heap_page_prune(). The original function,
heap_freeze_execute_prepared(), is retained because the "no prune" case
in heap_page_prune() must still be able to emit a freeze record.
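With freezing execution split out, heap_freeze_execute_prepared() roughly
reduces to the following (per the diff below; it must now be called from
within a critical section):

    Assert(ntuples > 0);
    heap_freeze_prepared_tuples(buffer, tuples, ntuples);
    MarkBufferDirty(buffer);
    if (RelationNeedsWAL(rel))
    {
        /* build and insert the XLOG_HEAP2_FREEZE_PAGE record, as before */
    }

while heap_page_prune() calls heap_freeze_prepared_tuples() directly
inside its own critical section and does the equivalent WAL logging
itself.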
---
src/backend/access/heap/heapam.c | 61 +++++++++++++++++------------
src/backend/access/heap/pruneheap.c | 51 ++++++++++++++++++++++--
src/include/access/heapam.h | 8 ++++
3 files changed, 90 insertions(+), 30 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 16e3f2520a4..a3691584c55 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -91,9 +91,6 @@ static void compute_new_xmax_infomask(TransactionId xmax, uint16 old_infomask,
static TM_Result heap_lock_updated_tuple(Relation rel, HeapTuple tuple,
ItemPointer ctid, TransactionId xid,
LockTupleMode mode);
-static int heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
- xl_heap_freeze_plan *plans_out,
- OffsetNumber *offsets_out);
static void GetMultiXactIdHintBits(MultiXactId multi, uint16 *new_infomask,
uint16 *new_infomask2);
static TransactionId MultiXactIdGetUpdateXid(TransactionId xmax,
@@ -6713,30 +6710,17 @@ heap_pre_freeze_checks(Buffer buffer,
}
/*
- * heap_freeze_execute_prepared
- *
- * Executes freezing of one or more heap tuples on a page on behalf of caller.
- * Caller passes an array of tuple plans from heap_prepare_freeze_tuple.
- * Caller must set 'offset' in each plan for us. Note that we destructively
- * sort caller's tuples array in-place, so caller had better be done with it.
- *
- * WAL-logs the changes so that VACUUM can advance the rel's relfrozenxid
- * later on without any risk of unsafe pg_xact lookups, even following a hard
- * crash (or when querying from a standby). We represent freezing by setting
- * infomask bits in tuple headers, but this shouldn't be thought of as a hint.
- * See section on buffer access rules in src/backend/storage/buffer/README.
+ * Helper which executes freezing of one or more heap tuples on a page on
+ * behalf of caller. Caller passes an array of tuple plans from
+ * heap_prepare_freeze_tuple. Caller must set 'offset' in each plan for us.
+ * Must be called in a critical section that also marks the buffer dirty and,
+ * if needed, emits WAL.
*/
void
-heap_freeze_execute_prepared(Relation rel, Buffer buffer,
- TransactionId snapshotConflictHorizon,
- HeapTupleFreeze *tuples, int ntuples)
+heap_freeze_prepared_tuples(Buffer buffer, HeapTupleFreeze *tuples, int ntuples)
{
Page page = BufferGetPage(buffer);
- Assert(ntuples > 0);
-
- START_CRIT_SECTION();
-
for (int i = 0; i < ntuples; i++)
{
HeapTupleFreeze *frz = tuples + i;
@@ -6746,6 +6730,29 @@ heap_freeze_execute_prepared(Relation rel, Buffer buffer,
htup = (HeapTupleHeader) PageGetItem(page, itemid);
heap_execute_freeze_tuple(htup, frz);
}
+}
+
+/*
+ * heap_freeze_execute_prepared
+ *
+ * Execute freezing of prepared tuples and WAL-logs the changes so that VACUUM
+ * can advance the rel's relfrozenxid later on without any risk of unsafe
+ * pg_xact lookups, even following a hard crash (or when querying from a
+ * standby). We represent freezing by setting infomask bits in tuple headers,
+ * but this shouldn't be thought of as a hint. See section on buffer access
+ * rules in src/backend/storage/buffer/README. Must be called from within a
+ * critical section.
+ */
+void
+heap_freeze_execute_prepared(Relation rel, Buffer buffer,
+ TransactionId snapshotConflictHorizon,
+ HeapTupleFreeze *tuples, int ntuples)
+{
+ Page page = BufferGetPage(buffer);
+
+ Assert(ntuples > 0);
+
+ heap_freeze_prepared_tuples(buffer, tuples, ntuples);
MarkBufferDirty(buffer);
@@ -6758,7 +6765,11 @@ heap_freeze_execute_prepared(Relation rel, Buffer buffer,
xl_heap_freeze_page xlrec;
XLogRecPtr recptr;
- /* Prepare deduplicated representation for use in WAL record */
+ /*
+ * Prepare deduplicated representation for use in WAL record
+ * Destructively sorts tuples array in-place, so caller had better be
+ * done with it.
+ */
nplans = heap_log_freeze_plan(tuples, ntuples, plans, offsets);
xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
@@ -6783,8 +6794,6 @@ heap_freeze_execute_prepared(Relation rel, Buffer buffer,
PageSetLSN(page, recptr);
}
-
- END_CRIT_SECTION();
}
/*
@@ -6874,7 +6883,7 @@ heap_log_freeze_new_plan(xl_heap_freeze_plan *plan, HeapTupleFreeze *frz)
* (actually there is one array per freeze plan, but that's not of immediate
* concern to our caller).
*/
-static int
+int
heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
xl_heap_freeze_plan *plans_out,
OffsetNumber *offsets_out)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index bac461940de..d8b7eea5c21 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -613,10 +613,53 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (do_freeze)
{
- /* Execute all freeze plans for page as a single atomic action */
- heap_freeze_execute_prepared(relation, buffer,
- frz_conflict_horizon,
- frozen, presult->nfrozen);
+ START_CRIT_SECTION();
+
+ Assert(presult->nfrozen > 0);
+
+ heap_freeze_prepared_tuples(buffer, frozen, presult->nfrozen);
+
+ MarkBufferDirty(buffer);
+
+ /* Now WAL-log freezing if necessary */
+ if (RelationNeedsWAL(relation))
+ {
+ xl_heap_freeze_plan plans[MaxHeapTuplesPerPage];
+ OffsetNumber offsets[MaxHeapTuplesPerPage];
+ int nplans;
+ xl_heap_freeze_page xlrec;
+ XLogRecPtr recptr;
+
+ /*
+ * Prepare deduplicated representation for use in WAL record
+ * Destructively sorts tuples array in-place.
+ */
+ nplans = heap_log_freeze_plan(frozen, presult->nfrozen, plans, offsets);
+
+ xlrec.snapshotConflictHorizon = frz_conflict_horizon;
+ xlrec.isCatalogRel = RelationIsAccessibleInLogicalDecoding(relation);
+ xlrec.nplans = nplans;
+
+ XLogBeginInsert();
+ XLogRegisterData((char *) &xlrec, SizeOfHeapFreezePage);
+
+ /*
+ * The freeze plan array and offset array are not actually in the
+ * buffer, but pretend that they are. When XLogInsert stores the
+ * whole buffer, the arrays need not be stored too.
+ */
+ XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
+ XLogRegisterBufData(0, (char *) plans,
+ nplans * sizeof(xl_heap_freeze_plan));
+ XLogRegisterBufData(0, (char *) offsets,
+ presult->nfrozen * sizeof(OffsetNumber));
+
+ recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_FREEZE_PAGE);
+
+ PageSetLSN(page, recptr);
+ }
+
+ END_CRIT_SECTION();
}
else if (!pagefrz || !presult->all_frozen || presult->nfrozen > 0)
{
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index d14f36d9ce7..41ebbb9f931 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -14,6 +14,7 @@
#ifndef HEAPAM_H
#define HEAPAM_H
+#include "access/heapam_xlog.h"
#include "access/relation.h" /* for backward compatibility */
#include "access/relscan.h"
#include "access/sdir.h"
@@ -321,9 +322,16 @@ extern void heap_pre_freeze_checks(Buffer buffer,
extern void heap_freeze_execute_prepared(Relation rel, Buffer buffer,
TransactionId snapshotConflictHorizon,
HeapTupleFreeze *tuples, int ntuples);
+
+extern void heap_freeze_prepared_tuples(Buffer buffer,
+ HeapTupleFreeze *tuples, int ntuples);
extern bool heap_freeze_tuple(HeapTupleHeader tuple,
TransactionId relfrozenxid, TransactionId relminmxid,
TransactionId FreezeLimit, TransactionId MultiXactCutoff);
+
+extern int heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
+ xl_heap_freeze_plan *plans_out,
+ OffsetNumber *offsets_out);
extern bool heap_tuple_should_freeze(HeapTupleHeader tuple,
const struct VacuumCutoffs *cutoffs,
TransactionId *NoFreezePageRelfrozenXid,
--
2.40.1
v2-0011-Exit-heap_page_prune-early-if-no-prune.patch
From 675273a9c3bd0d2554b023a6c1e2da511931b6ed Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sun, 7 Jan 2024 17:42:05 -0500
Subject: [PATCH v2 11/17] Exit heap_page_prune() early if no prune
If there is nothing to prune on the page, heap_page_prune() will
consider whether to update the page's pd_prune_xid and whether to freeze
the page. In this case, if we decide to freeze the page, we will need to
emit a freeze record.
Future commits will emit a combined freeze+prune record for cases in
which we are both pruning and freezing. In the no-prune case, we are
done with heap_page_prune() after checking whether to set pd_prune_xid.
By reordering the two cases so that the no-prune case comes first, we
can exit early when there is nothing to prune. This reduces the
indentation level of the remaining code and avoids having to re-check
whether we are, in fact, pruning.
Since we now exit early in the no-prune case, we must set nfrozen and
all_frozen to their final values before executing pruning or freezing.
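The resulting structure is roughly (simplified from the diff below):

    /* Have we found any prunable items? */
    if (!do_prune)
    {
        START_CRIT_SECTION();
        /* maybe update pd_prune_xid; maybe emit a freeze record */
        END_CRIT_SECTION();
        goto update_frozenxids;
    }

    START_CRIT_SECTION();
    /* apply prune changes, WAL-log, and execute any freeze plans */
    END_CRIT_SECTION();

update_frozenxids:
    /* fill in presult->new_relfrozenxid and new_relminmxid for the caller */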
---
src/backend/access/heap/pruneheap.c | 195 ++++++++++++++++------------
1 file changed, 111 insertions(+), 84 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index d8b7eea5c21..ca64c45d8a3 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -514,80 +514,27 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
heap_pre_freeze_checks(buffer, frozen, presult->nfrozen);
frz_conflict_horizon = heap_frz_conflict_horizon(presult, pagefrz);
}
-
- /* Any error while applying the changes is critical */
- START_CRIT_SECTION();
-
- /* Have we found any prunable items? */
- if (do_prune)
+ else if (!pagefrz || !presult->all_frozen || presult->nfrozen > 0)
{
/*
- * Apply the planned item changes, then repair page fragmentation, and
- * update the page's hint bit about whether it has free line pointers.
- */
- heap_page_prune_execute(buffer,
- prstate.redirected, prstate.nredirected,
- prstate.nowdead, prstate.ndead,
- prstate.nowunused, prstate.nunused);
-
- /*
- * Update the page's pd_prune_xid field to either zero, or the lowest
- * XID of any soon-prunable tuple.
- */
- ((PageHeader) page)->pd_prune_xid = prstate.new_prune_xid;
-
- /*
- * Also clear the "page is full" flag, since there's no point in
- * repeating the prune/defrag process until something else happens to
- * the page.
- */
- PageClearFull(page);
-
- MarkBufferDirty(buffer);
-
- /*
- * Emit a WAL XLOG_HEAP2_PRUNE record showing what we did
+ * If we will neither freeze tuples on the page nor set the page all
+ * frozen in the visibility map, the page is not all frozen and there
+ * will be no newly frozen tuples.
*/
- if (RelationNeedsWAL(relation))
- {
- xl_heap_prune xlrec;
- XLogRecPtr recptr;
-
- xlrec.isCatalogRel = RelationIsAccessibleInLogicalDecoding(relation);
- xlrec.snapshotConflictHorizon = prstate.snapshotConflictHorizon;
- xlrec.nredirected = prstate.nredirected;
- xlrec.ndead = prstate.ndead;
-
- XLogBeginInsert();
- XLogRegisterData((char *) &xlrec, SizeOfHeapPrune);
-
- XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
-
- /*
- * The OffsetNumber arrays are not actually in the buffer, but we
- * pretend that they are. When XLogInsert stores the whole
- * buffer, the offset arrays need not be stored too.
- */
- if (prstate.nredirected > 0)
- XLogRegisterBufData(0, (char *) prstate.redirected,
- prstate.nredirected *
- sizeof(OffsetNumber) * 2);
-
- if (prstate.ndead > 0)
- XLogRegisterBufData(0, (char *) prstate.nowdead,
- prstate.ndead * sizeof(OffsetNumber));
+ presult->all_frozen = false;
+ presult->nfrozen = 0; /* avoid miscounts in instrumentation */
+ }
- if (prstate.nunused > 0)
- XLogRegisterBufData(0, (char *) prstate.nowunused,
- prstate.nunused * sizeof(OffsetNumber));
+ /* Record number of newly-set-LP_DEAD items for caller */
+ presult->nnewlpdead = prstate.ndead;
- recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_PRUNE);
- PageSetLSN(BufferGetPage(buffer), recptr);
- }
- }
- else
+ /* Have we found any prunable items? */
+ if (!do_prune)
{
+ /* Any error while applying the changes is critical */
+ START_CRIT_SECTION();
+
/*
* If we didn't prune anything, but have found a new value for the
* pd_prune_xid field, update it and mark the buffer dirty. This is
@@ -604,17 +551,105 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
PageClearFull(page);
MarkBufferDirtyHint(buffer, true);
}
+
+ hint_bit_fpi = fpi_before != pgWalUsage.wal_fpi;
+
+ /*
+ * We may have decided not to opportunistically freeze above because
+ * pruning would not emit an FPI. Now, however, if checksums are
+ * enabled, setting the hint bit may have emitted an FPI. Check again
+ * if we should freeze.
+ */
+ if (!do_freeze && hint_bit_fpi)
+ do_freeze = pagefrz &&
+ (pagefrz->freeze_required ||
+ (whole_page_freezable && presult->nfrozen > 0));
+
+ if (do_freeze)
+ {
+ heap_freeze_execute_prepared(relation, buffer,
+ frz_conflict_horizon,
+ frozen, presult->nfrozen);
+ }
+ else if (!pagefrz || !presult->all_frozen || presult->nfrozen > 0)
+ {
+ presult->all_frozen = false;
+ presult->nfrozen = 0;
+ }
+
+ END_CRIT_SECTION();
+
+ goto update_frozenxids;
}
- END_CRIT_SECTION();
+ START_CRIT_SECTION();
- /* Record number of newly-set-LP_DEAD items for caller */
- presult->nnewlpdead = prstate.ndead;
+ /*
+ * Apply the planned item changes, then repair page fragmentation, and
+ * update the page's hint bit about whether it has free line pointers.
+ */
+ heap_page_prune_execute(buffer,
+ prstate.redirected, prstate.nredirected,
+ prstate.nowdead, prstate.ndead,
+ prstate.nowunused, prstate.nunused);
- if (do_freeze)
+ /*
+ * Update the page's pd_prune_xid field to either zero, or the lowest XID
+ * of any soon-prunable tuple.
+ */
+ ((PageHeader) page)->pd_prune_xid = prstate.new_prune_xid;
+
+ /*
+ * Also clear the "page is full" flag, since there's no point in repeating
+ * the prune/defrag process until something else happens to the page.
+ */
+ PageClearFull(page);
+
+ MarkBufferDirty(buffer);
+
+ /*
+ * Emit a WAL XLOG_HEAP2_PRUNE record showing what we did
+ */
+ if (RelationNeedsWAL(relation))
{
- START_CRIT_SECTION();
+ xl_heap_prune xlrec;
+ XLogRecPtr recptr;
+
+ xlrec.isCatalogRel = RelationIsAccessibleInLogicalDecoding(relation);
+ xlrec.snapshotConflictHorizon = prstate.snapshotConflictHorizon;
+ xlrec.nredirected = prstate.nredirected;
+ xlrec.ndead = prstate.ndead;
+
+ XLogBeginInsert();
+ XLogRegisterData((char *) &xlrec, SizeOfHeapPrune);
+
+ XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
+ /*
+ * The OffsetNumber arrays are not actually in the buffer, but we
+ * pretend that they are. When XLogInsert stores the whole buffer,
+ * the offset arrays need not be stored too.
+ */
+ if (prstate.nredirected > 0)
+ XLogRegisterBufData(0, (char *) prstate.redirected,
+ prstate.nredirected *
+ sizeof(OffsetNumber) * 2);
+
+ if (prstate.ndead > 0)
+ XLogRegisterBufData(0, (char *) prstate.nowdead,
+ prstate.ndead * sizeof(OffsetNumber));
+
+ if (prstate.nunused > 0)
+ XLogRegisterBufData(0, (char *) prstate.nowunused,
+ prstate.nunused * sizeof(OffsetNumber));
+
+ recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_PRUNE);
+
+ PageSetLSN(BufferGetPage(buffer), recptr);
+ }
+
+ if (do_freeze)
+ {
Assert(presult->nfrozen > 0);
heap_freeze_prepared_tuples(buffer, frozen, presult->nfrozen);
@@ -658,20 +693,12 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
PageSetLSN(page, recptr);
}
-
- END_CRIT_SECTION();
- }
- else if (!pagefrz || !presult->all_frozen || presult->nfrozen > 0)
- {
- /*
- * If we will neither freeze tuples on the page nor set the page all
- * frozen in the visibility map, the page is not all frozen and there
- * will be no newly frozen tuples.
- */
- presult->all_frozen = false;
- presult->nfrozen = 0; /* avoid miscounts in instrumentation */
}
+ END_CRIT_SECTION();
+
+update_frozenxids:
+
/* Caller won't update new_relfrozenxid and new_relminmxid */
if (!pagefrz)
return;
--
2.40.1
v2-0012-Merge-prune-and-freeze-records.patch
From 2f8d219e4524e95bcc4744695182b930f37de90b Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sun, 7 Jan 2024 17:55:31 -0500
Subject: [PATCH v2 12/17] Merge prune and freeze records
When a page has both tuples to prune and tuples to freeze, emit a
single, combined prune record containing the offsets for pruning as well
as the freeze plans and offsets for freezing. This reduces the number of
WAL records emitted.
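The block data of the combined XLOG_HEAP2_PRUNE record is laid out as
follows (per the comment added to xl_heap_prune in the diff below):

    xl_heap_freeze_plan plans[nplans];
    OffsetNumber redirected[2 * nredirected];
    OffsetNumber nowdead[ndead];
    OffsetNumber nowunused[nunused];
    OffsetNumber frz_offsets[...];    /* one per frozen tuple */

When both pruning and freezing happen, the record's
snapshotConflictHorizon is the newer (more conservative) of the prune
and freeze horizons:

    xlrec.snapshotConflictHorizon = Max(prstate.snapshotConflictHorizon,
                                        frz_conflict_horizon);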
---
src/backend/access/heap/heapam.c | 42 ++++++++++++--
src/backend/access/heap/pruneheap.c | 85 +++++++++++++----------------
src/include/access/heapam_xlog.h | 20 +++++--
3 files changed, 90 insertions(+), 57 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index a3691584c55..a8f35eba3c9 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -8803,24 +8803,28 @@ heap_xlog_prune(XLogReaderState *record)
if (action == BLK_NEEDS_REDO)
{
Page page = (Page) BufferGetPage(buffer);
- OffsetNumber *end;
OffsetNumber *redirected;
OffsetNumber *nowdead;
OffsetNumber *nowunused;
int nredirected;
int ndead;
int nunused;
+ int nplans;
Size datalen;
+ xl_heap_freeze_plan *plans;
+ OffsetNumber *frz_offsets;
+ int curoff = 0;
- redirected = (OffsetNumber *) XLogRecGetBlockData(record, 0, &datalen);
-
+ nplans = xlrec->nplans;
nredirected = xlrec->nredirected;
ndead = xlrec->ndead;
- end = (OffsetNumber *) ((char *) redirected + datalen);
+ nunused = xlrec->nunused;
+
+ plans = (xl_heap_freeze_plan *) XLogRecGetBlockData(record, 0, &datalen);
+ redirected = (OffsetNumber *) &plans[nplans];
nowdead = redirected + (nredirected * 2);
nowunused = nowdead + ndead;
- nunused = (end - nowunused);
- Assert(nunused >= 0);
+ frz_offsets = nowunused + nunused;
/* Update all line pointers per the record, and repair fragmentation */
heap_page_prune_execute(buffer,
@@ -8828,6 +8832,32 @@ heap_xlog_prune(XLogReaderState *record)
nowdead, ndead,
nowunused, nunused);
+ for (int p = 0; p < nplans; p++)
+ {
+ HeapTupleFreeze frz;
+
+ /*
+ * Convert freeze plan representation from WAL record into
+ * per-tuple format used by heap_execute_freeze_tuple
+ */
+ frz.xmax = plans[p].xmax;
+ frz.t_infomask2 = plans[p].t_infomask2;
+ frz.t_infomask = plans[p].t_infomask;
+ frz.frzflags = plans[p].frzflags;
+ frz.offset = InvalidOffsetNumber; /* unused, but be tidy */
+
+ for (int i = 0; i < plans[p].ntuples; i++)
+ {
+ OffsetNumber offset = frz_offsets[curoff++];
+ ItemId lp;
+ HeapTupleHeader tuple;
+
+ lp = PageGetItemId(page, offset);
+ tuple = (HeapTupleHeader) PageGetItem(page, lp);
+ heap_execute_freeze_tuple(tuple, &frz);
+ }
+ }
+
/*
* Note: we don't worry about updating the page's prunability hints.
* At worst this will cause an extra prune cycle to occur soon.
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index ca64c45d8a3..70d35a21e98 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -605,6 +605,9 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*/
PageClearFull(page);
+ if (do_freeze)
+ heap_freeze_prepared_tuples(buffer, frozen, presult->nfrozen);
+
MarkBufferDirty(buffer);
/*
@@ -615,10 +618,37 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
xl_heap_prune xlrec;
XLogRecPtr recptr;
+ xl_heap_freeze_plan plans[MaxHeapTuplesPerPage];
+ OffsetNumber offsets[MaxHeapTuplesPerPage];
+
xlrec.isCatalogRel = RelationIsAccessibleInLogicalDecoding(relation);
- xlrec.snapshotConflictHorizon = prstate.snapshotConflictHorizon;
xlrec.nredirected = prstate.nredirected;
xlrec.ndead = prstate.ndead;
+ xlrec.nunused = prstate.nunused;
+ xlrec.nplans = 0;
+
+ /*
+ * The snapshotConflictHorizon for the whole record should be the most
+ * conservative of all the horizons calculated for any of the possible
+ * modifications. If this record will prune tuples, any transactions
+ * on the standby older than the youngest xmax of the most recently
+ * removed tuple this record will prune will conflict. If this record
+ * will freeze tuples, any transactions on the standby with xids older
+ * than the youngest tuple this record will freeze will conflict.
+ */
+ if (do_freeze)
+ xlrec.snapshotConflictHorizon = Max(prstate.snapshotConflictHorizon,
+ frz_conflict_horizon);
+ else
+ xlrec.snapshotConflictHorizon = prstate.snapshotConflictHorizon;
+
+ /*
+ * Prepare deduplicated representation for use in WAL record
+ * Destructively sorts tuples array in-place.
+ */
+ if (do_freeze)
+ xlrec.nplans = heap_log_freeze_plan(frozen,
+ presult->nfrozen, plans, offsets);
XLogBeginInsert();
XLogRegisterData((char *) &xlrec, SizeOfHeapPrune);
@@ -630,6 +660,10 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* pretend that they are. When XLogInsert stores the whole buffer,
* the offset arrays need not be stored too.
*/
+ if (xlrec.nplans > 0)
+ XLogRegisterBufData(0, (char *) plans,
+ xlrec.nplans * sizeof(xl_heap_freeze_plan));
+
if (prstate.nredirected > 0)
XLogRegisterBufData(0, (char *) prstate.redirected,
prstate.nredirected *
@@ -643,56 +677,13 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
XLogRegisterBufData(0, (char *) prstate.nowunused,
prstate.nunused * sizeof(OffsetNumber));
- recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_PRUNE);
-
- PageSetLSN(BufferGetPage(buffer), recptr);
- }
-
- if (do_freeze)
- {
- Assert(presult->nfrozen > 0);
-
- heap_freeze_prepared_tuples(buffer, frozen, presult->nfrozen);
-
- MarkBufferDirty(buffer);
-
- /* Now WAL-log freezing if necessary */
- if (RelationNeedsWAL(relation))
- {
- xl_heap_freeze_plan plans[MaxHeapTuplesPerPage];
- OffsetNumber offsets[MaxHeapTuplesPerPage];
- int nplans;
- xl_heap_freeze_page xlrec;
- XLogRecPtr recptr;
-
- /*
- * Prepare deduplicated representation for use in WAL record
- * Destructively sorts tuples array in-place.
- */
- nplans = heap_log_freeze_plan(frozen, presult->nfrozen, plans, offsets);
-
- xlrec.snapshotConflictHorizon = frz_conflict_horizon;
- xlrec.isCatalogRel = RelationIsAccessibleInLogicalDecoding(relation);
- xlrec.nplans = nplans;
-
- XLogBeginInsert();
- XLogRegisterData((char *) &xlrec, SizeOfHeapFreezePage);
-
- /*
- * The freeze plan array and offset array are not actually in the
- * buffer, but pretend that they are. When XLogInsert stores the
- * whole buffer, the arrays need not be stored too.
- */
- XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
- XLogRegisterBufData(0, (char *) plans,
- nplans * sizeof(xl_heap_freeze_plan));
+ if (xlrec.nplans > 0)
XLogRegisterBufData(0, (char *) offsets,
presult->nfrozen * sizeof(OffsetNumber));
- recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_FREEZE_PAGE);
+ recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_PRUNE);
- PageSetLSN(page, recptr);
- }
+ PageSetLSN(BufferGetPage(buffer), recptr);
}
END_CRIT_SECTION();
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 6488dad5e64..22f236bb52a 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -231,23 +231,35 @@ typedef struct xl_heap_update
* during opportunistic pruning)
*
* The array of OffsetNumbers following the fixed part of the record contains:
+ * * for each freeze plan: the freeze plan
* * for each redirected item: the item offset, then the offset redirected to
* * for each now-dead item: the item offset
* * for each now-unused item: the item offset
- * The total number of OffsetNumbers is therefore 2*nredirected+ndead+nunused.
- * Note that nunused is not explicitly stored, but may be found by reference
- * to the total record length.
+ * * for each tuple frozen by the freeze plans: the offset of the item corresponding to that tuple
+ * The total number of OffsetNumbers is therefore
+ * (2*nredirected) + ndead + nunused + (sum[plan.ntuples for plan in plans])
*
* Acquires a full cleanup lock.
*/
typedef struct xl_heap_prune
{
TransactionId snapshotConflictHorizon;
+ uint16 nplans;
uint16 nredirected;
uint16 ndead;
+ uint16 nunused;
bool isCatalogRel; /* to handle recovery conflict during logical
* decoding on standby */
- /* OFFSET NUMBERS are in the block reference 0 */
+ /*
+ * OFFSET NUMBERS and freeze plans are in the block reference 0 in the
+ * following order:
+ *
+ * * xl_heap_freeze_plan plans[nplans];
+ * * OffsetNumber redirected[2 * nredirected];
+ * * OffsetNumber nowdead[ndead];
+ * * OffsetNumber nowunused[nunused];
+ * * OffsetNumber frz_offsets[...];
+ */
} xl_heap_prune;
#define SizeOfHeapPrune (offsetof(xl_heap_prune, isCatalogRel) + sizeof(bool))
--
2.40.1
v2-0013-Set-hastup-in-heap_page_prune.patch
From 24bfadb8308496b429547339268a7de6ca524ca0 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 22 Jan 2024 14:53:36 -0500
Subject: [PATCH v2 13/17] Set hastup in heap_page_prune
lazy_scan_prune() loops through the line pointers and tuple visibility
information for each tuple on a page, setting hastup to true if there
are any LP_REDIRECT line pointers or tuples with storage which will not
be removed. We want to remove this extra loop from lazy_scan_prune(),
and we know about non-removable tuples during heap_page_prune() anyway.
Set hastup when recording LP_REDIRECT line pointers in
heap_prune_chain() and when LP_NORMAL line pointers refer to tuples
whose visibility status is not HEAPTUPLE_DEAD.
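Concretely, in heap_page_prune()'s per-offset loop (simplified from the
diff below):

    if (ItemIdIsNormal(itemid) &&
        presult->htsv[offnum] != HEAPTUPLE_DEAD)
        presult->hastup = true;    /* page makes rel truncation unsafe */

and heap_prune_record_redirect() now sets presult->hastup = true as
well.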
---
src/backend/access/heap/pruneheap.c | 33 ++++++++++++++++++++++++----
src/backend/access/heap/vacuumlazy.c | 25 ++-------------------
src/include/access/heapam.h | 1 +
3 files changed, 32 insertions(+), 27 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 70d35a21e98..faaab375651 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -70,7 +70,8 @@ static void prune_prepare_freeze_tuple(Page page, OffsetNumber offnum,
PruneFreezeResult *presult);
static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
static void heap_prune_record_redirect(PruneState *prstate,
- OffsetNumber offnum, OffsetNumber rdoffnum);
+ OffsetNumber offnum, OffsetNumber rdoffnum,
+ PruneFreezeResult *presult);
static void heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
PruneFreezeResult *presult);
static void heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
@@ -280,6 +281,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
presult->nnewlpdead = 0;
presult->nfrozen = 0;
+ presult->hastup = false;
+
/*
* Keep track of whether or not the page is all_visible in case the caller
* wants to use this information to update the VM.
@@ -460,18 +463,37 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prune_prepare_freeze_tuple(page, offnum,
pagefrz, frozen, presult);
+ itemid = PageGetItemId(page, offnum);
+
+ if (ItemIdIsNormal(itemid) &&
+ presult->htsv[offnum] != HEAPTUPLE_DEAD)
+ {
+ Assert(presult->htsv[offnum] != -1);
+
+ /*
+ * Deliberately don't set hastup for LP_DEAD items. We make the
+ * soft assumption that any LP_DEAD items encountered here will
+ * become LP_UNUSED later on, before count_nondeletable_pages is
+ * reached. If we don't make this assumption then rel truncation
+ * will only happen every other VACUUM, at most. Besides, VACUUM
+ * must treat hastup/nonempty_pages as provisional no matter how
+ * LP_DEAD items are handled (handled here, or handled later on).
+ */
+ presult->hastup = true;
+ }
+
/* Ignore items already processed as part of an earlier chain */
if (prstate.marked[offnum])
continue;
/* Nothing to do if slot is empty */
- itemid = PageGetItemId(page, offnum);
if (!ItemIdIsUsed(itemid))
continue;
/* Process this item or chain of items */
presult->ndeleted += heap_prune_chain(buffer, offnum,
&prstate, presult);
+
}
/* Clear the offset information once we have processed the given page. */
@@ -1026,7 +1048,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
if (i >= nchain)
heap_prune_record_dead_or_unused(prstate, rootoffnum, presult);
else
- heap_prune_record_redirect(prstate, rootoffnum, chainitems[i]);
+ heap_prune_record_redirect(prstate, rootoffnum, chainitems[i], presult);
}
else if (nchain < 2 && ItemIdIsRedirected(rootlp))
{
@@ -1108,7 +1130,8 @@ heap_prune_record_prunable(PruneState *prstate, TransactionId xid)
/* Record line pointer to be redirected */
static void
heap_prune_record_redirect(PruneState *prstate,
- OffsetNumber offnum, OffsetNumber rdoffnum)
+ OffsetNumber offnum, OffsetNumber rdoffnum,
+ PruneFreezeResult *presult)
{
Assert(prstate->nredirected < MaxHeapTuplesPerPage);
prstate->redirected[prstate->nredirected * 2] = offnum;
@@ -1118,6 +1141,8 @@ heap_prune_record_redirect(PruneState *prstate,
prstate->marked[offnum] = true;
Assert(!prstate->marked[rdoffnum]);
prstate->marked[rdoffnum] = true;
+
+ presult->hastup = true;
}
/* Record line pointer to be marked dead */
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 6dd8d457c9c..aac38f54c0a 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1447,7 +1447,6 @@ lazy_scan_prune(LVRelState *vacrel,
live_tuples,
recently_dead_tuples;
HeapPageFreeze pagefrz;
- bool hastup = false;
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
Assert(BufferGetBlockNumber(buf) == blkno);
@@ -1491,7 +1490,6 @@ lazy_scan_prune(LVRelState *vacrel,
* the VM after collecting LP_DEAD items and freezing tuples. Pruning will
* have determined whether or not the page is all_visible and able to
* become all_frozen.
- *
*/
for (offnum = FirstOffsetNumber;
offnum <= maxoff;
@@ -1504,28 +1502,12 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->offnum = offnum;
itemid = PageGetItemId(page, offnum);
- if (!ItemIdIsUsed(itemid))
- continue;
-
/* Redirect items mustn't be touched */
- if (ItemIdIsRedirected(itemid))
- {
- /* page makes rel truncation unsafe */
- hastup = true;
+ if (ItemIdIsRedirected(itemid) || !ItemIdIsUsed(itemid))
continue;
- }
if (ItemIdIsDead(itemid))
{
- /*
- * Deliberately don't set hastup for LP_DEAD items. We make the
- * soft assumption that any LP_DEAD items encountered here will
- * become LP_UNUSED later on, before count_nondeletable_pages is
- * reached. If we don't make this assumption then rel truncation
- * will only happen every other VACUUM, at most. Besides, VACUUM
- * must treat hastup/nonempty_pages as provisional no matter how
- * LP_DEAD items are handled (handled here, or handled later on).
- */
deadoffsets[lpdead_items++] = offnum;
continue;
}
@@ -1593,9 +1575,6 @@ lazy_scan_prune(LVRelState *vacrel,
elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
break;
}
-
- hastup = true; /* page makes rel truncation unsafe */
-
}
vacrel->offnum = InvalidOffsetNumber;
@@ -1677,7 +1656,7 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->recently_dead_tuples += recently_dead_tuples;
/* Can't truncate this page */
- if (hastup)
+ if (presult.hastup)
vacrel->nonempty_pages = blkno + 1;
/* Did we find LP_DEAD items? */
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 41ebbb9f931..87f8649f79d 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -201,6 +201,7 @@ typedef struct PruneFreezeResult
int ndeleted; /* Number of tuples deleted from the page */
int nnewlpdead; /* Number of newly LP_DEAD items */
bool all_visible; /* Whether or not the page is all visible */
+ bool hastup; /* Does page make rel truncation unsafe */
TransactionId frz_conflict_horizon; /* Newest xmin on the page */
/*
--
2.40.1
v2-0014-Count-tuples-for-vacuum-logging-in-heap_page_prun.patch
From 373a2eb58914d9426045da264885a8440df35356 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 22 Jan 2024 17:25:56 -0500
Subject: [PATCH v2 14/17] Count tuples for vacuum logging in heap_page_prune
lazy_scan_prune() loops through all of the tuple visibility information
that was recorded in heap_page_prune() and then counts live and recently
dead tuples. That information is already available in heap_page_prune(),
so just record the counts there. Add live and recently dead tuple
counters to the PruneFreezeResult. Doing this counting in
heap_page_prune() eliminates the need to save the tuple visibility
status information in the PruneFreezeResult. Instead, save it in the
PruneState, where it can be referenced by heap_prune_chain().
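The counting now happens alongside the visibility checks, roughly as
follows (simplified from the diff below; the all-visible bookkeeping is
omitted):

    switch (prstate.htsv[offnum])
    {
        case HEAPTUPLE_LIVE:
            presult->live_tuples++;
            break;
        case HEAPTUPLE_RECENTLY_DEAD:
            presult->recently_dead_tuples++;
            break;
        case HEAPTUPLE_DELETE_IN_PROGRESS:
            /* counted as live; the deleter is assumed to commit later */
            presult->live_tuples++;
            break;
        ...
    }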
---
src/backend/access/heap/pruneheap.c | 110 +++++++++++++++++++++++----
src/backend/access/heap/vacuumlazy.c | 77 +------------------
src/include/access/heapam.h | 28 +------
3 files changed, 99 insertions(+), 116 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index faaab375651..46b5173a401 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -55,6 +55,18 @@ typedef struct
* 1. Otherwise every access would need to subtract 1.
*/
bool marked[MaxHeapTuplesPerPage + 1];
+
+ /*
+ * Tuple visibility is only computed once for each tuple, for correctness
+ * and efficiency reasons; see comment in heap_page_prune_and_freeze() for
+ * details. This is of type int8[], instead of HTSV_Result[], so we can
+ * use -1 to indicate no visibility has been computed, e.g. for LP_DEAD
+ * items.
+ *
+ * This needs to be MaxHeapTuplesPerPage + 1 long as FirstOffsetNumber is
+ * 1. Otherwise every access would need to subtract 1.
+ */
+ int8 htsv[MaxHeapTuplesPerPage + 1];
} PruneState;
/* Local functions */
@@ -65,7 +77,8 @@ static int heap_prune_chain(Buffer buffer,
OffsetNumber rootoffnum,
PruneState *prstate, PruneFreezeResult *presult);
-static void prune_prepare_freeze_tuple(Page page, OffsetNumber offnum,
+static inline HTSV_Result htsv_get_valid_status(int status);
+static void prune_prepare_freeze_tuple(Page page, OffsetNumber offnum, PruneState *prstate,
HeapPageFreeze *pagefrz, HeapTupleFreeze *frozen,
PruneFreezeResult *presult);
static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
@@ -283,6 +296,9 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
presult->hastup = false;
+ presult->live_tuples = 0;
+ presult->recently_dead_tuples = 0;
+
/*
* Keep track of whether or not the page is all_visible in case the caller
* wants to use this information to update the VM.
@@ -328,7 +344,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/* Nothing to do if slot doesn't contain a tuple */
if (!ItemIdIsNormal(itemid))
{
- presult->htsv[offnum] = -1;
+ prstate.htsv[offnum] = -1;
continue;
}
@@ -344,9 +360,30 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (off_loc)
*off_loc = offnum;
- presult->htsv[offnum] = heap_prune_satisfies_vacuum(&prstate, &tup,
- buffer);
- switch (presult->htsv[offnum])
+ prstate.htsv[offnum] = heap_prune_satisfies_vacuum(&prstate, &tup,
+ buffer);
+ Assert(ItemIdIsNormal(itemid));
+
+ /*
+ * The criteria for counting a tuple as live in this block need to
+ * match what analyze.c's acquire_sample_rows() does, otherwise VACUUM
+ * and ANALYZE may produce wildly different reltuples values, e.g.
+ * when there are many recently-dead tuples.
+ *
+ * The logic here is a bit simpler than acquire_sample_rows(), as
+ * VACUUM can't run inside a transaction block, which makes some cases
+ * impossible (e.g. in-progress insert from the same transaction).
+ *
+ * We treat LP_DEAD items (which are the closest thing to DEAD tuples
+ * that might be seen here) differently, too: we assume that they'll
+ * become LP_UNUSED before VACUUM finishes. This difference is only
+ * superficial. VACUUM effectively agrees with ANALYZE about DEAD
+ * items, in the end. VACUUM won't remember LP_DEAD items, but only
+ * because they're not supposed to be left behind when it is done.
+ * (Cases where we bypass index vacuuming will violate this optimistic
+ * assumption, but the overall impact of that should be negligible.)
+ */
+ switch (prstate.htsv[offnum])
{
case HEAPTUPLE_DEAD:
@@ -366,6 +403,12 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
break;
case HEAPTUPLE_LIVE:
+ /*
+ * Count it as live. Not only is this natural, but it's also
+ * what acquire_sample_rows() does.
+ */
+ presult->live_tuples++;
+
/*
* Is the tuple definitely visible to all transactions?
*
@@ -402,13 +445,34 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
}
break;
case HEAPTUPLE_RECENTLY_DEAD:
+
+ /*
+ * If tuple is recently dead then we must not remove it from
+ * the relation. (We only remove items that are LP_DEAD from
+ * pruning.)
+ */
+ presult->recently_dead_tuples++;
presult->all_visible = false;
break;
case HEAPTUPLE_INSERT_IN_PROGRESS:
+
+ /*
+ * We do not count these rows as live, because we expect the
+ * inserting transaction to update the counters at commit, and
+ * we assume that will happen only after we report our
+ * results. This assumption is a bit shaky, but it is what
+ * acquire_sample_rows() does, so be consistent.
+ */
presult->all_visible = false;
break;
case HEAPTUPLE_DELETE_IN_PROGRESS:
- /* This is an expected case during concurrent vacuum */
+
+ /*
+ * This is an expected case during concurrent vacuum. Count such
+ * rows as live. As above, we assume the deleting transaction
+ * will commit and update the counters after we report.
+ */
+ presult->live_tuples++;
presult->all_visible = false;
break;
default:
@@ -460,15 +524,15 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*off_loc = offnum;
if (pagefrz)
- prune_prepare_freeze_tuple(page, offnum,
+ prune_prepare_freeze_tuple(page, offnum, &prstate,
pagefrz, frozen, presult);
itemid = PageGetItemId(page, offnum);
if (ItemIdIsNormal(itemid) &&
- presult->htsv[offnum] != HEAPTUPLE_DEAD)
+ prstate.htsv[offnum] != HEAPTUPLE_DEAD)
{
- Assert(presult->htsv[offnum] != -1);
+ Assert(prstate.htsv[offnum] != -1);
/*
* Deliberately don't set hastup for LP_DEAD items. We make the
@@ -756,10 +820,24 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
}
+/*
+ * Pruning calculates tuple visibility once and saves the results in an array
+ * of int8. See PruneState.htsv for details. This helper function is meant to
+ * guard against examining visibility status array members which have not yet
+ * been computed.
+ */
+static inline HTSV_Result
+htsv_get_valid_status(int status)
+{
+ Assert(status >= HEAPTUPLE_DEAD &&
+ status <= HEAPTUPLE_DELETE_IN_PROGRESS);
+ return (HTSV_Result) status;
+}
+
/*
* Prune specified line pointer or a HOT chain originating at line pointer.
*
- * Tuple visibility information is provided in presult->htsv.
+ * Tuple visibility information is provided in prstate->htsv.
*
* If the item is an index-referenced tuple (i.e. not a heap-only tuple),
* the HOT chain is pruned by removing all DEAD tuples at the start of the HOT
@@ -810,7 +888,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
*/
if (ItemIdIsNormal(rootlp))
{
- Assert(presult->htsv[rootoffnum] != -1);
+ Assert(prstate->htsv[rootoffnum] != -1);
htup = (HeapTupleHeader) PageGetItem(dp, rootlp);
if (HeapTupleHeaderIsHeapOnly(htup))
@@ -833,7 +911,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
* either here or while following a chain below. Whichever path
* gets there first will mark the tuple unused.
*/
- if (presult->htsv[rootoffnum] == HEAPTUPLE_DEAD &&
+ if (prstate->htsv[rootoffnum] == HEAPTUPLE_DEAD &&
!HeapTupleHeaderIsHotUpdated(htup))
{
heap_prune_record_unused(prstate, rootoffnum);
@@ -934,7 +1012,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
*/
tupdead = recent_dead = false;
- switch (htsv_get_valid_status(presult->htsv[offnum]))
+ switch (htsv_get_valid_status(prstate->htsv[offnum]))
{
case HEAPTUPLE_DEAD:
tupdead = true;
@@ -1072,7 +1150,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
* want to consider freezing normal tuples which will not be removed.
*/
static void
-prune_prepare_freeze_tuple(Page page, OffsetNumber offnum,
+prune_prepare_freeze_tuple(Page page, OffsetNumber offnum, PruneState *prstate,
HeapPageFreeze *pagefrz,
HeapTupleFreeze *frozen,
PruneFreezeResult *presult)
@@ -1089,8 +1167,8 @@ prune_prepare_freeze_tuple(Page page, OffsetNumber offnum,
return;
/* We do not consider freezing tuples which will be removed. */
- if (presult->htsv[offnum] == HEAPTUPLE_DEAD ||
- presult->htsv[offnum] == -1)
+ if (prstate->htsv[offnum] == HEAPTUPLE_DEAD ||
+ prstate->htsv[offnum] == -1)
return;
htup = (HeapTupleHeader) PageGetItem(page, itemid);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index aac38f54c0a..634f4da9a17 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1442,10 +1442,8 @@ lazy_scan_prune(LVRelState *vacrel,
OffsetNumber offnum,
maxoff;
ItemId itemid;
+ int lpdead_items = 0;
PruneFreezeResult presult;
- int lpdead_items,
- live_tuples,
- recently_dead_tuples;
HeapPageFreeze pagefrz;
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
@@ -1465,9 +1463,6 @@ lazy_scan_prune(LVRelState *vacrel,
pagefrz.NoFreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
pagefrz.NoFreezePageRelminMxid = vacrel->NewRelminMxid;
pagefrz.cutoffs = &vacrel->cutoffs;
- lpdead_items = 0;
- live_tuples = 0;
- recently_dead_tuples = 0;
/*
* Prune all HOT-update chains and potentially freeze tuples on this page.
@@ -1502,9 +1497,6 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->offnum = offnum;
itemid = PageGetItemId(page, offnum);
- /* Redirect items mustn't be touched */
- if (ItemIdIsRedirected(itemid) || !ItemIdIsUsed(itemid))
- continue;
if (ItemIdIsDead(itemid))
{
@@ -1512,69 +1504,6 @@ lazy_scan_prune(LVRelState *vacrel,
continue;
}
- Assert(ItemIdIsNormal(itemid));
-
- /*
- * The criteria for counting a tuple as live in this block need to
- * match what analyze.c's acquire_sample_rows() does, otherwise VACUUM
- * and ANALYZE may produce wildly different reltuples values, e.g.
- * when there are many recently-dead tuples.
- *
- * The logic here is a bit simpler than acquire_sample_rows(), as
- * VACUUM can't run inside a transaction block, which makes some cases
- * impossible (e.g. in-progress insert from the same transaction).
- *
- * We treat LP_DEAD items (which are the closest thing to DEAD tuples
- * that might be seen here) differently, too: we assume that they'll
- * become LP_UNUSED before VACUUM finishes. This difference is only
- * superficial. VACUUM effectively agrees with ANALYZE about DEAD
- * items, in the end. VACUUM won't remember LP_DEAD items, but only
- * because they're not supposed to be left behind when it is done.
- * (Cases where we bypass index vacuuming will violate this optimistic
- * assumption, but the overall impact of that should be negligible.)
- */
- switch (htsv_get_valid_status(presult.htsv[offnum]))
- {
- case HEAPTUPLE_LIVE:
-
- /*
- * Count it as live. Not only is this natural, but it's also
- * what acquire_sample_rows() does.
- */
- live_tuples++;
- break;
- case HEAPTUPLE_RECENTLY_DEAD:
-
- /*
- * If tuple is recently dead then we must not remove it from
- * the relation. (We only remove items that are LP_DEAD from
- * pruning.)
- */
- recently_dead_tuples++;
- break;
- case HEAPTUPLE_INSERT_IN_PROGRESS:
-
- /*
- * We do not count these rows as live, because we expect the
- * inserting transaction to update the counters at commit, and
- * we assume that will happen only after we report our
- * results. This assumption is a bit shaky, but it is what
- * acquire_sample_rows() does, so be consistent.
- */
- break;
- case HEAPTUPLE_DELETE_IN_PROGRESS:
-
- /*
- * This an expected case during concurrent vacuum. Count such
- * rows as live. As above, we assume the deleting transaction
- * will commit and update the counters after we report.
- */
- live_tuples++;
- break;
- default:
- elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
- break;
- }
}
vacrel->offnum = InvalidOffsetNumber;
@@ -1652,8 +1581,8 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->tuples_deleted += presult.ndeleted;
vacrel->tuples_frozen += presult.nfrozen;
vacrel->lpdead_items += lpdead_items;
- vacrel->live_tuples += live_tuples;
- vacrel->recently_dead_tuples += recently_dead_tuples;
+ vacrel->live_tuples += presult.live_tuples;
+ vacrel->recently_dead_tuples += presult.recently_dead_tuples;
/* Can't truncate this page */
if (presult.hastup)
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 87f8649f79d..f4bf60192f8 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -198,24 +198,14 @@ typedef struct HeapPageFreeze
*/
typedef struct PruneFreezeResult
{
+ int live_tuples;
+ int recently_dead_tuples;
int ndeleted; /* Number of tuples deleted from the page */
int nnewlpdead; /* Number of newly LP_DEAD items */
bool all_visible; /* Whether or not the page is all visible */
bool hastup; /* Does page make rel truncation unsafe */
TransactionId frz_conflict_horizon; /* Newest xmin on the page */
- /*
- * Tuple visibility is only computed once for each tuple, for correctness
- * and efficiency reasons; see comment in heap_page_prune_and_freeze() for
- * details. This is of type int8[], instead of HTSV_Result[], so we can
- * use -1 to indicate no visibility has been computed, e.g. for LP_DEAD
- * items.
- *
- * This needs to be MaxHeapTuplesPerPage + 1 long as FirstOffsetNumber is
- * 1. Otherwise every access would need to subtract 1.
- */
- int8 htsv[MaxHeapTuplesPerPage + 1];
-
bool all_visible_except_removable;
/* Whether or not the page can be set all frozen in the VM */
@@ -231,20 +221,6 @@ typedef struct PruneFreezeResult
MultiXactId new_relminmxid;
} PruneFreezeResult;
-/*
- * Pruning calculates tuple visibility once and saves the results in an array
- * of int8. See PruneFreezeResult.htsv for details. This helper function is
- * meant to guard against examining visibility status array members which have
- * not yet been computed.
- */
-static inline HTSV_Result
-htsv_get_valid_status(int status)
-{
- Assert(status >= HEAPTUPLE_DEAD &&
- status <= HEAPTUPLE_DELETE_IN_PROGRESS);
- return (HTSV_Result) status;
-}
-
/* ----------------
* function prototypes for heap access method
*
--
2.40.1
v2-0015-Save-dead-tuple-offsets-during-heap_page_prune.patch (text/x-diff; charset=us-ascii)
From 9a310d238e191445c514f036966dd5af5a7a15c2 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 22 Jan 2024 16:55:28 -0500
Subject: [PATCH v2 15/17] Save dead tuple offsets during heap_page_prune
After heap_page_prune() returned, lazy_scan_prune() looped through all
of the offsets of LP_DEAD items which it later added to
LVRelState->dead_items. Instead take care of this when marking a line
pointer or when an existing non-removable LP_DEAD item is encountered in
heap_prune_chain().
---
src/backend/access/heap/pruneheap.c | 7 ++++
src/backend/access/heap/vacuumlazy.c | 60 ++++++----------------------
src/include/access/heapam.h | 2 +
3 files changed, 22 insertions(+), 47 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 46b5173a401..c5046da6d1e 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -298,6 +298,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
presult->live_tuples = 0;
presult->recently_dead_tuples = 0;
+ presult->lpdead_items = 0;
/*
* Keep track of whether or not the page is all_visible in case the caller
@@ -987,7 +988,10 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
if (unlikely(prstate->mark_unused_now))
heap_prune_record_unused(prstate, offnum);
else
+ {
presult->all_visible = false;
+ presult->deadoffsets[presult->lpdead_items++] = offnum;
+ }
break;
}
@@ -1239,6 +1243,9 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
* all_visible.
*/
presult->all_visible = false;
+
+ /* Record the dead offset for vacuum */
+ presult->deadoffsets[presult->lpdead_items++] = offnum;
}
/*
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 634f4da9a17..4b45e8be1ad 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1439,23 +1439,11 @@ lazy_scan_prune(LVRelState *vacrel,
bool *has_lpdead_items)
{
Relation rel = vacrel->rel;
- OffsetNumber offnum,
- maxoff;
- ItemId itemid;
- int lpdead_items = 0;
PruneFreezeResult presult;
HeapPageFreeze pagefrz;
- OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
Assert(BufferGetBlockNumber(buf) == blkno);
- /*
- * maxoff might be reduced following line pointer array truncation in
- * heap_page_prune_and_freeze(). That's safe for us to ignore, since the
- * reclaimed space will continue to look like LP_UNUSED items below.
- */
- maxoff = PageGetMaxOffsetNumber(page);
-
/* Initialize pagefrz */
pagefrz.freeze_required = false;
pagefrz.FreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
@@ -1468,9 +1456,9 @@ lazy_scan_prune(LVRelState *vacrel,
* Prune all HOT-update chains and potentially freeze tuples on this page.
*
* We count the number of tuples removed from the page by the pruning step
- * in presult.ndeleted. It should not be confused with lpdead_items;
- * lpdead_items's final value can be thought of as the number of tuples
- * that were deleted from indexes.
+ * in presult.ndeleted. It should not be confused with
+ * presult.lpdead_items; presult.lpdead_items's final value can be thought
+ * of as the number of tuples that were deleted from indexes.
*
* If the relation has no indexes, we can immediately mark would-be dead
* items LP_UNUSED, so mark_unused_now should be true if no indexes and
@@ -1480,32 +1468,10 @@ lazy_scan_prune(LVRelState *vacrel,
&pagefrz, &presult, &vacrel->offnum);
/*
- * Now scan the page to collect LP_DEAD items and check for tuples
- * requiring freezing among remaining tuples with storage. We will update
- * the VM after collecting LP_DEAD items and freezing tuples. Pruning will
- * have determined whether or not the page is all_visible and able to
- * become all_frozen.
+ * We will update the VM after collecting LP_DEAD items and freezing
+ * tuples. Pruning will have determined whether or not the page is
+ * all_visible.
*/
- for (offnum = FirstOffsetNumber;
- offnum <= maxoff;
- offnum = OffsetNumberNext(offnum))
- {
- /*
- * Set the offset number so that we can display it along with any
- * error that occurred while processing this tuple.
- */
- vacrel->offnum = offnum;
- itemid = PageGetItemId(page, offnum);
-
-
- if (ItemIdIsDead(itemid))
- {
- deadoffsets[lpdead_items++] = offnum;
- continue;
- }
-
- }
-
vacrel->offnum = InvalidOffsetNumber;
Assert(MultiXactIdIsValid(presult.new_relminmxid));
@@ -1541,7 +1507,7 @@ lazy_scan_prune(LVRelState *vacrel,
TransactionId debug_cutoff;
bool debug_all_frozen;
- Assert(lpdead_items == 0);
+ Assert(presult.lpdead_items == 0);
if (!heap_page_is_all_visible(vacrel, buf,
&debug_cutoff, &debug_all_frozen))
@@ -1557,7 +1523,7 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* Now save details of the LP_DEAD items from the page in vacrel
*/
- if (lpdead_items > 0)
+ if (presult.lpdead_items > 0)
{
VacDeadItems *dead_items = vacrel->dead_items;
ItemPointerData tmp;
@@ -1566,9 +1532,9 @@ lazy_scan_prune(LVRelState *vacrel,
ItemPointerSetBlockNumber(&tmp, blkno);
- for (int i = 0; i < lpdead_items; i++)
+ for (int i = 0; i < presult.lpdead_items; i++)
{
- ItemPointerSetOffsetNumber(&tmp, deadoffsets[i]);
+ ItemPointerSetOffsetNumber(&tmp, presult.deadoffsets[i]);
dead_items->items[dead_items->num_items++] = tmp;
}
@@ -1580,7 +1546,7 @@ lazy_scan_prune(LVRelState *vacrel,
/* Finally, add page-local counts to whole-VACUUM counts */
vacrel->tuples_deleted += presult.ndeleted;
vacrel->tuples_frozen += presult.nfrozen;
- vacrel->lpdead_items += lpdead_items;
+ vacrel->lpdead_items += presult.lpdead_items;
vacrel->live_tuples += presult.live_tuples;
vacrel->recently_dead_tuples += presult.recently_dead_tuples;
@@ -1589,7 +1555,7 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->nonempty_pages = blkno + 1;
/* Did we find LP_DEAD items? */
- *has_lpdead_items = (lpdead_items > 0);
+ *has_lpdead_items = (presult.lpdead_items > 0);
Assert(!presult.all_visible || !(*has_lpdead_items));
@@ -1657,7 +1623,7 @@ lazy_scan_prune(LVRelState *vacrel,
* There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
* however.
*/
- else if (lpdead_items > 0 && PageIsAllVisible(page))
+ else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
{
elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
vacrel->relname, blkno);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index f4bf60192f8..86524ae0c3d 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -219,6 +219,8 @@ typedef struct PruneFreezeResult
/* New value of relminmxid found by heap_page_prune_and_freeze() */
MultiXactId new_relminmxid;
+ OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
+ int lpdead_items; /* includes existing LP_DEAD items */
} PruneFreezeResult;
/* ----------------
--
2.40.1
v2-0016-Obsolete-XLOG_HEAP2_FREEZE_PAGE.patch (text/x-diff; charset=us-ascii)
From db6a63c2a91e192283f340682e7112acc792b72c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 12 Mar 2024 19:07:38 -0400
Subject: [PATCH v2 16/17] Obsolete XLOG_HEAP2_FREEZE_PAGE
When vacuum freezes tuples, the information needed to replay these
changes is saved in the xl_heap_prune record. As such, we no longer need
to create new xl_heap_freeze records. We can get rid of
heap_freeze_execute_prepared() as well as the special case in
heap_page_prune_and_freeze() for when only freezing is done.
We must retain the xl_heap_freeze_page record and
heap_xlog_freeze_page() in order to replay old freeze records.
---
src/backend/access/heap/heapam.c | 78 ++------------
src/backend/access/heap/pruneheap.c | 155 +++++++++++++---------------
src/include/access/heapam.h | 7 +-
3 files changed, 79 insertions(+), 161 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index a8f35eba3c9..12a1a7805f4 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6340,7 +6340,7 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
* XIDs or MultiXactIds that will need to be processed by a future VACUUM.
*
* VACUUM caller must assemble HeapTupleFreeze freeze plan entries for every
- * tuple that we returned true for, and call heap_freeze_execute_prepared to
+ * tuple that we returned true for, and call heap_page_prune_and_freeze() to
* execute freezing. Caller must initialize pagefrz fields for page as a
* whole before first call here for each heap page.
*
@@ -6656,8 +6656,7 @@ heap_execute_freeze_tuple(HeapTupleHeader tuple, HeapTupleFreeze *frz)
}
/*
-* Perform xmin/xmax XID status sanity checks before calling
-* heap_freeze_execute_prepared().
+* Perform xmin/xmax XID status sanity checks before executing freezing.
*
* heap_prepare_freeze_tuple doesn't perform these checks directly because
* pg_xact lookups are relatively expensive. They shouldn't be repeated
@@ -6732,70 +6731,6 @@ heap_freeze_prepared_tuples(Buffer buffer, HeapTupleFreeze *tuples, int ntuples)
}
}
-/*
- * heap_freeze_execute_prepared
- *
- * Execute freezing of prepared tuples and WAL-logs the changes so that VACUUM
- * can advance the rel's relfrozenxid later on without any risk of unsafe
- * pg_xact lookups, even following a hard crash (or when querying from a
- * standby). We represent freezing by setting infomask bits in tuple headers,
- * but this shouldn't be thought of as a hint. See section on buffer access
- * rules in src/backend/storage/buffer/README. Must be called from within a
- * critical section.
- */
-void
-heap_freeze_execute_prepared(Relation rel, Buffer buffer,
- TransactionId snapshotConflictHorizon,
- HeapTupleFreeze *tuples, int ntuples)
-{
- Page page = BufferGetPage(buffer);
-
- Assert(ntuples > 0);
-
- heap_freeze_prepared_tuples(buffer, tuples, ntuples);
-
- MarkBufferDirty(buffer);
-
- /* Now WAL-log freezing if necessary */
- if (RelationNeedsWAL(rel))
- {
- xl_heap_freeze_plan plans[MaxHeapTuplesPerPage];
- OffsetNumber offsets[MaxHeapTuplesPerPage];
- int nplans;
- xl_heap_freeze_page xlrec;
- XLogRecPtr recptr;
-
- /*
- * Prepare deduplicated representation for use in WAL record
- * Destructively sorts tuples array in-place, so caller had better be
- * done with it.
- */
- nplans = heap_log_freeze_plan(tuples, ntuples, plans, offsets);
-
- xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
- xlrec.isCatalogRel = RelationIsAccessibleInLogicalDecoding(rel);
- xlrec.nplans = nplans;
-
- XLogBeginInsert();
- XLogRegisterData((char *) &xlrec, SizeOfHeapFreezePage);
-
- /*
- * The freeze plan array and offset array are not actually in the
- * buffer, but pretend that they are. When XLogInsert stores the
- * whole buffer, the arrays need not be stored too.
- */
- XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
- XLogRegisterBufData(0, (char *) plans,
- nplans * sizeof(xl_heap_freeze_plan));
- XLogRegisterBufData(0, (char *) offsets,
- ntuples * sizeof(OffsetNumber));
-
- recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_FREEZE_PAGE);
-
- PageSetLSN(page, recptr);
- }
-}
-
/*
* Comparator used to deduplicate XLOG_HEAP2_FREEZE_PAGE freeze plans
*/
@@ -8827,10 +8762,11 @@ heap_xlog_prune(XLogReaderState *record)
frz_offsets = nowunused + nunused;
/* Update all line pointers per the record, and repair fragmentation */
- heap_page_prune_execute(buffer,
- redirected, nredirected,
- nowdead, ndead,
- nowunused, nunused);
+ if (nredirected > 0 || ndead > 0 || nunused > 0)
+ heap_page_prune_execute(buffer,
+ redirected, nredirected,
+ nowdead, ndead,
+ nowunused, nunused);
for (int p = 0; p < nplans; p++)
{
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index c5046da6d1e..7a27c5a3957 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -256,6 +256,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
HeapTupleData tup;
bool do_freeze;
bool do_prune;
+ bool do_hint;
bool whole_page_freezable;
bool hint_bit_fpi;
bool prune_fpi = false;
@@ -569,6 +570,9 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.ndead > 0 ||
prstate.nunused > 0;
+ /* Record number of newly-set-LP_DEAD items for caller */
+ presult->nnewlpdead = prstate.ndead;
+
/*
* Only incur overhead of checking if we will do an FPI if we might use
* the information.
@@ -576,7 +580,15 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (do_prune && pagefrz)
prune_fpi = XLogCheckBufferNeedsBackup(buffer);
- /* Is the whole page freezable? And is there something to freeze */
+ /*
+ * Even if we don't prune anything, if we found a new value for the
+ * pd_prune_xid field or the page was marked full, we will update the hint
+ * bit.
+ */
+ do_hint = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
+ PageIsFull(page);
+
+ /* Is the whole page freezable? And is there something to freeze? */
whole_page_freezable = presult->all_visible_except_removable &&
presult->all_frozen;
@@ -591,55 +603,52 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* opportunistic freeze heuristic must be improved; however, for now, try
* to approximate it.
*/
-
do_freeze = pagefrz &&
(pagefrz->freeze_required ||
(whole_page_freezable && presult->nfrozen > 0 && (prune_fpi || hint_bit_fpi)));
- if (do_freeze)
- {
- heap_pre_freeze_checks(buffer, frozen, presult->nfrozen);
- frz_conflict_horizon = heap_frz_conflict_horizon(presult, pagefrz);
- }
- else if (!pagefrz || !presult->all_frozen || presult->nfrozen > 0)
- {
- /*
- * If we will neither freeze tuples on the page nor set the page all
- * frozen in the visibility map, the page is not all frozen and there
- * will be no newly frozen tuples.
- */
- presult->all_frozen = false;
- presult->nfrozen = 0; /* avoid miscounts in instrumentation */
- }
+ /*
+ * If we are going to modify the page contents anyway, we will have to
+ * update more than hint bits.
+ */
+ if (do_freeze || do_prune)
+ do_hint = false;
- /* Record number of newly-set-LP_DEAD items for caller */
- presult->nnewlpdead = prstate.ndead;
+ START_CRIT_SECTION();
+ /*
+ * Update the page's pd_prune_xid field to either zero, or the lowest XID
+ * of any soon-prunable tuple.
+ */
+ ((PageHeader) page)->pd_prune_xid = prstate.new_prune_xid;
- /* Have we found any prunable items? */
- if (!do_prune)
- {
- /* Any error while applying the changes is critical */
- START_CRIT_SECTION();
+ /*
+ * If pruning, freezing, or updating the hint bit, clear the "page is
+ * full" flag if it is set since there's no point in repeating the
+ * prune/defrag process until something else happens to the page.
+ */
+ if (do_prune || do_freeze || do_hint)
+ PageClearFull(page);
- /*
- * If we didn't prune anything, but have found a new value for the
- * pd_prune_xid field, update it and mark the buffer dirty. This is
- * treated as a non-WAL-logged hint.
- *
- * Also clear the "page is full" flag if it is set, since there's no
- * point in repeating the prune/defrag process until something else
- * happens to the page.
- */
- if (((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
- PageIsFull(page))
- {
- ((PageHeader) page)->pd_prune_xid = prstate.new_prune_xid;
- PageClearFull(page);
- MarkBufferDirtyHint(buffer, true);
- }
+ /*
+ * Apply the planned item changes, then repair page fragmentation, and
+ * update the page's hint bit about whether it has free line pointers.
+ */
+ if (do_prune)
+ {
+ heap_page_prune_execute(buffer,
+ prstate.redirected, prstate.nredirected,
+ prstate.nowdead, prstate.ndead,
+ prstate.nowunused, prstate.nunused);
+ }
- hint_bit_fpi = fpi_before != pgWalUsage.wal_fpi;
+ /*
+ * If we aren't pruning or freezing anything, but we updated pd_prune_xid,
+ * this is a non-WAL-logged hint.
+ */
+ if (do_hint)
+ {
+ MarkBufferDirtyHint(buffer, true);
/*
* We may have decided not to opportunistically freeze above because
@@ -647,60 +656,38 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* enabled, setting the hint bit may have emitted an FPI. Check again
* if we should freeze.
*/
- if (!do_freeze && hint_bit_fpi)
+ hint_bit_fpi = fpi_before != pgWalUsage.wal_fpi;
+
+ if (hint_bit_fpi)
do_freeze = pagefrz &&
(pagefrz->freeze_required ||
(whole_page_freezable && presult->nfrozen > 0));
-
- if (do_freeze)
- {
- heap_freeze_execute_prepared(relation, buffer,
- frz_conflict_horizon,
- frozen, presult->nfrozen);
- }
- else if (!pagefrz || !presult->all_frozen || presult->nfrozen > 0)
- {
- presult->all_frozen = false;
- presult->nfrozen = 0;
- }
-
- END_CRIT_SECTION();
-
- goto update_frozenxids;
}
- START_CRIT_SECTION();
-
- /*
- * Apply the planned item changes, then repair page fragmentation, and
- * update the page's hint bit about whether it has free line pointers.
- */
- heap_page_prune_execute(buffer,
- prstate.redirected, prstate.nredirected,
- prstate.nowdead, prstate.ndead,
- prstate.nowunused, prstate.nunused);
-
- /*
- * Update the page's pd_prune_xid field to either zero, or the lowest XID
- * of any soon-prunable tuple.
- */
- ((PageHeader) page)->pd_prune_xid = prstate.new_prune_xid;
-
- /*
- * Also clear the "page is full" flag, since there's no point in repeating
- * the prune/defrag process until something else happens to the page.
- */
- PageClearFull(page);
-
if (do_freeze)
+ {
+ heap_pre_freeze_checks(buffer, frozen, presult->nfrozen);
+ frz_conflict_horizon = heap_frz_conflict_horizon(presult, pagefrz);
heap_freeze_prepared_tuples(buffer, frozen, presult->nfrozen);
+ }
+ else if ((!pagefrz || !presult->all_frozen || presult->nfrozen > 0))
+ {
+ /*
+ * If we will neither freeze tuples on the page nor set the page all
+ * frozen in the visibility map, the page is not all-frozen and there
+ * will be no newly frozen tuples.
+ */
+ presult->all_frozen = false;
+ presult->nfrozen = 0; /* avoid miscounts in instrumentation */
+ }
- MarkBufferDirty(buffer);
+ if (do_prune || do_freeze)
+ MarkBufferDirty(buffer);
/*
* Emit a WAL XLOG_HEAP2_PRUNE record showing what we did
*/
- if (RelationNeedsWAL(relation))
+ if ((do_prune || do_freeze) && RelationNeedsWAL(relation))
{
xl_heap_prune xlrec;
XLogRecPtr recptr;
@@ -775,8 +762,6 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
END_CRIT_SECTION();
-update_frozenxids:
-
/* Caller won't update new_relfrozenxid and new_relminmxid */
if (!pagefrz)
return;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 86524ae0c3d..3e41b4bfd4b 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -102,7 +102,7 @@ typedef enum
} HTSV_Result;
/*
- * heap_prepare_freeze_tuple may request that heap_freeze_execute_prepared
+ * heap_prepare_freeze_tuple may request that the heap_page_prune_and_freeze()
* check any tuple's to-be-frozen xmin and/or xmax status using pg_xact
*/
#define HEAP_FREEZE_CHECK_XMIN_COMMITTED 0x01
@@ -155,7 +155,7 @@ typedef struct HeapPageFreeze
/*
* "Freeze" NewRelfrozenXid/NewRelminMxid trackers.
*
- * Trackers used when heap_freeze_execute_prepared freezes, or when there
+ * Trackers used when heap_page_prune_and_freeze() freezes, or when there
* are zero freeze plans for a page. It is always valid for vacuumlazy.c
* to freeze any page, by definition. This even includes pages that have
* no tuples with storage to consider in the first place. That way the
@@ -298,9 +298,6 @@ extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
extern void heap_pre_freeze_checks(Buffer buffer,
HeapTupleFreeze *tuples, int ntuples);
-extern void heap_freeze_execute_prepared(Relation rel, Buffer buffer,
- TransactionId snapshotConflictHorizon,
- HeapTupleFreeze *tuples, int ntuples);
extern void heap_freeze_prepared_tuples(Buffer buffer,
HeapTupleFreeze *tuples, int ntuples);
--
2.40.1
v2-0017-Streamline-XLOG_HEAP2_PRUNE-record.patch (text/x-diff; charset=us-ascii)
From d81331eb950675fab09ab2cbc6598861bcbf4c84 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 13 Mar 2024 00:28:57 -0400
Subject: [PATCH v2 17/17] Streamline XLOG_HEAP2_PRUNE record
The xl_heap_prune struct for the XLOG_HEAP2_PRUNE record type had members
for counting the number of freeze plans and the number of redirected, dead,
and newly unused line pointers. However, many XLOG_HEAP2_PRUNE records
use only some of those counts. As part of a refactor to use
XLOG_HEAP2_PRUNE record types instead of XLOG_HEAP2_FREEZE_PAGE records
when only freezing is being done, eliminate those members and instead
use flags to indicate which of those types of modifications will be
done. The resulting record will contain only data about modifications
that must be done.
ci-os-only:
---
src/backend/access/heap/heapam.c | 101 ++++++++++++++-----
src/backend/access/heap/pruneheap.c | 86 ++++++++++++----
src/backend/access/rmgrdesc/heapdesc.c | 130 +++++++++++++++++++------
src/include/access/heapam_xlog.h | 122 ++++++++++++++---------
src/tools/pgindent/typedefs.list | 2 +
5 files changed, 318 insertions(+), 123 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 12a1a7805f4..258f58b53e0 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -8705,8 +8705,6 @@ ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_required,
/*
* Handles XLOG_HEAP2_PRUNE record type.
- *
- * Acquires a full cleanup lock.
*/
static void
heap_xlog_prune(XLogReaderState *record)
@@ -8717,49 +8715,101 @@ heap_xlog_prune(XLogReaderState *record)
RelFileLocator rlocator;
BlockNumber blkno;
XLogRedoAction action;
+ bool get_cleanup_lock;
XLogRecGetBlockTag(record, 0, &rlocator, NULL, &blkno);
+ /*
+ * If there are dead, redirected, or unused items,
+ * heap_page_prune_execute() will call PageRepairFragmentation() which
+ * expects a full cleanup lock.
+ */
+ get_cleanup_lock = xlrec->flags & XLHP_HAS_REDIRECTIONS ||
+ xlrec->flags & XLHP_HAS_DEAD_ITEMS ||
+ xlrec->flags & XLHP_HAS_NOW_UNUSED_ITEMS;
+
/*
* We're about to remove tuples. In Hot Standby mode, ensure that there's
* no queries running for which the removed tuples are still visible.
*/
if (InHotStandby)
ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
- xlrec->isCatalogRel,
+ xlrec->flags & XLHP_IS_CATALOG_REL,
rlocator);
/*
- * If we have a full-page image, restore it (using a cleanup lock) and
- * we're done.
+ * If we have a full-page image, restore it and we're done.
*/
- action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL, true,
- &buffer);
+ action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
+ get_cleanup_lock, &buffer);
+
if (action == BLK_NEEDS_REDO)
{
Page page = (Page) BufferGetPage(buffer);
- OffsetNumber *redirected;
- OffsetNumber *nowdead;
- OffsetNumber *nowunused;
- int nredirected;
- int ndead;
- int nunused;
- int nplans;
Size datalen;
- xl_heap_freeze_plan *plans;
+ OffsetNumber *redirected = NULL;
+ OffsetNumber *nowdead = NULL;
+ OffsetNumber *nowunused = NULL;
+ int nredirected = 0;
+ int ndead = 0;
+ int nunused = 0;
+ int nplans = 0;
OffsetNumber *frz_offsets;
+ xl_heap_freeze_plan *plans;
int curoff = 0;
- nplans = xlrec->nplans;
- nredirected = xlrec->nredirected;
- ndead = xlrec->ndead;
- nunused = xlrec->nunused;
+ char *cursor = XLogRecGetBlockData(record, 0, &datalen);
+
+ if (xlrec->flags & XLHP_HAS_FREEZE_PLANS)
+ {
+ xlhp_freeze *freeze = (xlhp_freeze *) cursor;
+
+ nplans = freeze->nplans;
+ Assert(nplans > 0);
+ plans = freeze->plans;
+
+ cursor += offsetof(xlhp_freeze, plans);
+ cursor += sizeof(xl_heap_freeze_plan) * freeze->nplans;
+ }
- plans = (xl_heap_freeze_plan *) XLogRecGetBlockData(record, 0, &datalen);
- redirected = (OffsetNumber *) &plans[nplans];
- nowdead = redirected + (nredirected * 2);
- nowunused = nowdead + ndead;
- frz_offsets = nowunused + nunused;
+ if (xlrec->flags & XLHP_HAS_REDIRECTIONS)
+ {
+ xlhp_prune_items *subrecord = (xlhp_prune_items *) cursor;
+
+ nredirected = subrecord->ntargets;
+ Assert(nredirected > 0);
+ redirected = &subrecord->data[0];
+
+ cursor += offsetof(xlhp_prune_items, data);
+ cursor += sizeof(OffsetNumber[2]) * nredirected;
+ }
+
+ if (xlrec->flags & XLHP_HAS_DEAD_ITEMS)
+ {
+ xlhp_prune_items *subrecord = (xlhp_prune_items *) cursor;
+
+ ndead = subrecord->ntargets;
+ Assert(ndead > 0);
+ nowdead = subrecord->data;
+
+ cursor += offsetof(xlhp_prune_items, data);
+ cursor += sizeof(OffsetNumber) * ndead;
+ }
+
+ if (xlrec->flags & XLHP_HAS_NOW_UNUSED_ITEMS)
+ {
+ xlhp_prune_items *subrecord = (xlhp_prune_items *) cursor;
+
+ nunused = subrecord->ntargets;
+ Assert(nunused > 0);
+ nowunused = subrecord->data;
+
+ cursor += offsetof(xlhp_prune_items, data);
+ cursor += sizeof(OffsetNumber) * nunused;
+ }
+
+ if (nplans > 0)
+ frz_offsets = (OffsetNumber *) cursor;
/* Update all line pointers per the record, and repair fragmentation */
if (nredirected > 0 || ndead > 0 || nunused > 0)
@@ -8798,7 +8848,6 @@ heap_xlog_prune(XLogReaderState *record)
* Note: we don't worry about updating the page's prunability hints.
* At worst this will cause an extra prune cycle to occur soon.
*/
-
PageSetLSN(page, lsn);
MarkBufferDirty(buffer);
}
@@ -8810,7 +8859,7 @@ heap_xlog_prune(XLogReaderState *record)
UnlockReleaseBuffer(buffer);
/*
- * After pruning records from a page, it's useful to update the FSM
+ * After modifying records on a page, it's useful to update the FSM
* about it, as it may cause the page become target for insertions
* later even if vacuum decides not to visit it (which is possible if
* gets marked all-visible.)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 7a27c5a3957..06739f8ad49 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -691,15 +691,19 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
{
xl_heap_prune xlrec;
XLogRecPtr recptr;
+ xlhp_freeze freeze;
+ xlhp_prune_items redirect,
+ dead,
+ unused;
+ int nplans = 0;
xl_heap_freeze_plan plans[MaxHeapTuplesPerPage];
- OffsetNumber offsets[MaxHeapTuplesPerPage];
+ OffsetNumber frz_offsets[MaxHeapTuplesPerPage];
- xlrec.isCatalogRel = RelationIsAccessibleInLogicalDecoding(relation);
- xlrec.nredirected = prstate.nredirected;
- xlrec.ndead = prstate.ndead;
- xlrec.nunused = prstate.nunused;
- xlrec.nplans = 0;
+ xlrec.flags = 0;
+
+ if (RelationIsAccessibleInLogicalDecoding(relation))
+ xlrec.flags |= XLHP_IS_CATALOG_REL;
/*
* The snapshotConflictHorizon for the whole record should be the most
@@ -721,8 +725,11 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* Destructively sorts tuples array in-place.
*/
if (do_freeze)
- xlrec.nplans = heap_log_freeze_plan(frozen,
- presult->nfrozen, plans, offsets);
+ nplans = heap_log_freeze_plan(frozen,
+ presult->nfrozen, plans,
+ frz_offsets);
+ if (nplans > 0)
+ xlrec.flags |= XLHP_HAS_FREEZE_PLANS;
XLogBeginInsert();
XLogRegisterData((char *) &xlrec, SizeOfHeapPrune);
@@ -734,26 +741,71 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* pretend that they are. When XLogInsert stores the whole buffer,
* the offset arrays need not be stored too.
*/
- if (xlrec.nplans > 0)
+ if (nplans > 0)
+ {
+ freeze = (xlhp_freeze)
+ {
+ .nplans = nplans
+ };
+
+ XLogRegisterBufData(0, (char *) &freeze, offsetof(xlhp_freeze, plans));
+
XLogRegisterBufData(0, (char *) plans,
- xlrec.nplans * sizeof(xl_heap_freeze_plan));
+ sizeof(xl_heap_freeze_plan) * freeze.nplans);
+ }
+
if (prstate.nredirected > 0)
+ {
+ xlrec.flags |= XLHP_HAS_REDIRECTIONS;
+
+ redirect = (xlhp_prune_items)
+ {
+ .ntargets = prstate.nredirected
+ };
+
+ XLogRegisterBufData(0, (char *) &redirect,
+ offsetof(xlhp_prune_items, data));
+
XLogRegisterBufData(0, (char *) prstate.redirected,
- prstate.nredirected *
- sizeof(OffsetNumber) * 2);
+ sizeof(OffsetNumber[2]) * prstate.nredirected);
+ }
if (prstate.ndead > 0)
+ {
+ xlrec.flags |= XLHP_HAS_DEAD_ITEMS;
+
+ dead = (xlhp_prune_items)
+ {
+ .ntargets = prstate.ndead
+ };
+
+ XLogRegisterBufData(0, (char *) &dead,
+ offsetof(xlhp_prune_items, data));
+
XLogRegisterBufData(0, (char *) prstate.nowdead,
- prstate.ndead * sizeof(OffsetNumber));
+ sizeof(OffsetNumber) * dead.ntargets);
+ }
if (prstate.nunused > 0)
+ {
+ xlrec.flags |= XLHP_HAS_NOW_UNUSED_ITEMS;
+
+ unused = (xlhp_prune_items)
+ {
+ .ntargets = prstate.nunused
+ };
+
+ XLogRegisterBufData(0, (char *) &unused,
+ offsetof(xlhp_prune_items, data));
+
XLogRegisterBufData(0, (char *) prstate.nowunused,
- prstate.nunused * sizeof(OffsetNumber));
+ sizeof(OffsetNumber) * unused.ntargets);
+ }
- if (xlrec.nplans > 0)
- XLogRegisterBufData(0, (char *) offsets,
- presult->nfrozen * sizeof(OffsetNumber));
+ if (nplans > 0)
+ XLogRegisterBufData(0, (char *) frz_offsets,
+ sizeof(OffsetNumber) * presult->nfrozen);
recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_PRUNE);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 36a3d83c8c2..462b0d74f80 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -179,43 +179,109 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
{
xl_heap_prune *xlrec = (xl_heap_prune *) rec;
- appendStringInfo(buf, "snapshotConflictHorizon: %u, nredirected: %u, ndead: %u, isCatalogRel: %c",
+ appendStringInfo(buf, "snapshotConflictHorizon: %u, isCatalogRel: %c",
xlrec->snapshotConflictHorizon,
- xlrec->nredirected,
- xlrec->ndead,
- xlrec->isCatalogRel ? 'T' : 'F');
+ xlrec->flags & XLHP_IS_CATALOG_REL ? 'T' : 'F');
if (XLogRecHasBlockData(record, 0))
{
- OffsetNumber *end;
- OffsetNumber *redirected;
- OffsetNumber *nowdead;
- OffsetNumber *nowunused;
- int nredirected;
- int nunused;
Size datalen;
-
- redirected = (OffsetNumber *) XLogRecGetBlockData(record, 0,
- &datalen);
-
- nredirected = xlrec->nredirected;
- end = (OffsetNumber *) ((char *) redirected + datalen);
- nowdead = redirected + (nredirected * 2);
- nowunused = nowdead + xlrec->ndead;
- nunused = (end - nowunused);
- Assert(nunused >= 0);
-
- appendStringInfo(buf, ", nunused: %d", nunused);
-
- appendStringInfoString(buf, ", redirected:");
- array_desc(buf, redirected, sizeof(OffsetNumber) * 2,
- nredirected, &redirect_elem_desc, NULL);
- appendStringInfoString(buf, ", dead:");
- array_desc(buf, nowdead, sizeof(OffsetNumber), xlrec->ndead,
- &offset_elem_desc, NULL);
- appendStringInfoString(buf, ", unused:");
- array_desc(buf, nowunused, sizeof(OffsetNumber), nunused,
- &offset_elem_desc, NULL);
+ OffsetNumber *redirected = NULL;
+ OffsetNumber *nowdead = NULL;
+ OffsetNumber *nowunused = NULL;
+ int nredirected = 0;
+ int nunused = 0;
+ int ndead = 0;
+ int nplans = 0;
+ xl_heap_freeze_plan *plans = NULL;
+ OffsetNumber *frz_offsets;
+
+ char *cursor = XLogRecGetBlockData(record, 0, &datalen);
+
+ if (xlrec->flags & XLHP_HAS_FREEZE_PLANS)
+ {
+ xlhp_freeze *freeze = (xlhp_freeze *) cursor;
+
+ nplans = freeze->nplans;
+ Assert(nplans > 0);
+ plans = freeze->plans;
+
+ cursor += offsetof(xlhp_freeze, plans);
+ cursor += sizeof(xl_heap_freeze_plan) * freeze->nplans;
+ }
+
+ if (xlrec->flags & XLHP_HAS_REDIRECTIONS)
+ {
+ xlhp_prune_items *subrecord = (xlhp_prune_items *) cursor;
+
+ nredirected = subrecord->ntargets;
+ Assert(nredirected > 0);
+ redirected = &subrecord->data[0];
+
+ cursor += offsetof(xlhp_prune_items, data);
+ cursor += sizeof(OffsetNumber[2]) * nredirected;
+ }
+
+ if (xlrec->flags & XLHP_HAS_DEAD_ITEMS)
+ {
+ xlhp_prune_items *subrecord = (xlhp_prune_items *) cursor;
+
+ ndead = subrecord->ntargets;
+ Assert(ndead > 0);
+ nowdead = subrecord->data;
+
+ cursor += offsetof(xlhp_prune_items, data);
+ cursor += sizeof(OffsetNumber) * ndead;
+ }
+
+ if (xlrec->flags & XLHP_HAS_NOW_UNUSED_ITEMS)
+ {
+ xlhp_prune_items *subrecord = (xlhp_prune_items *) cursor;
+
+ nunused = subrecord->ntargets;
+ Assert(nunused > 0);
+ nowunused = subrecord->data;
+
+ cursor += offsetof(xlhp_prune_items, data);
+ cursor += sizeof(OffsetNumber) * nunused;
+ }
+
+ if (nplans > 0)
+ frz_offsets = (OffsetNumber *) cursor;
+
+ appendStringInfo(buf, ", nredirected: %u, ndead: %u, nunused: %u, nplans: %u,",
+ nredirected,
+ ndead,
+ nunused,
+ nplans);
+
+ if (nredirected > 0)
+ {
+ appendStringInfoString(buf, ", redirected:");
+ array_desc(buf, redirected, sizeof(OffsetNumber) * 2,
+ nredirected, &redirect_elem_desc, NULL);
+ }
+
+ if (ndead > 0)
+ {
+ appendStringInfoString(buf, ", dead:");
+ array_desc(buf, nowdead, sizeof(OffsetNumber), ndead,
+ &offset_elem_desc, NULL);
+ }
+
+ if (nunused > 0)
+ {
+ appendStringInfoString(buf, ", unused:");
+ array_desc(buf, nowunused, sizeof(OffsetNumber), nunused,
+ &offset_elem_desc, NULL);
+ }
+
+ if (nplans > 0)
+ {
+ appendStringInfoString(buf, ", plans:");
+ array_desc(buf, plans, sizeof(xl_heap_freeze_plan), nplans,
+ &plan_elem_desc, &frz_offsets);
+ }
}
}
else if (info == XLOG_HEAP2_VACUUM)
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 22f236bb52a..bebd93422d5 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -227,42 +227,84 @@ typedef struct xl_heap_update
#define SizeOfHeapUpdate (offsetof(xl_heap_update, new_offnum) + sizeof(OffsetNumber))
/*
- * This is what we need to know about page pruning (both during VACUUM and
- * during opportunistic pruning)
+ * XXX: As of Postgres 17, XLOG_HEAP2_PRUNE records replace
+ * XLOG_HEAP2_FREEZE_PAGE record types
+ */
+
+/*
+ * This is what we need to know about page pruning and freezing, both during
+ * VACUUM and during opportunistic pruning.
*
- * The array of OffsetNumbers following the fixed part of the record contains:
- * * for each freeze plan: the freeze plan
- * * for each redirected item: the item offset, then the offset redirected to
- * * for each now-dead item: the item offset
- * * for each now-unused item: the item offset
- * * for each tuple frozen by the freeze plans: the offset of the item corresponding to that tuple
- * The total number of OffsetNumbers is therefore
- * (2*nredirected) + ndead + nunused + (sum[plan.ntuples for plan in plans])
+ * If XLHP_HAS_REDIRECTIONS, XLHP_HAS_DEAD_ITEMS, or XLHP_HAS_NOW_UNUSED_ITEMS is set,
+ * acquires a full cleanup lock. Otherwise an ordinary exclusive lock is
+ * enough. This can happen if freezing was the only modification to the page.
*
- * Acquires a full cleanup lock.
+ * The data for block reference 0 contains "sub-records" depending on which
+ * of the XLHP_HAS_* flags are set. See xlhp_* struct definitions below.
+ *
+ * The layout is in the same order as the XLHP_* flags.
*/
typedef struct xl_heap_prune
{
TransactionId snapshotConflictHorizon;
- uint16 nplans;
- uint16 nredirected;
- uint16 ndead;
- uint16 nunused;
- bool isCatalogRel; /* to handle recovery conflict during logical
- * decoding on standby */
- /*
- * OFFSET NUMBERS and freeze plans are in the block reference 0 in the
- * following order:
- *
- * * xl_heap_freeze_plan plans[nplans];
- * * OffsetNumber redirected[2 * nredirected];
- * * OffsetNumber nowdead[ndead];
- * * OffsetNumber nowunused[nunused];
- * * OffsetNumber frz_offsets[...];
- */
+ uint8 flags;
} xl_heap_prune;
-#define SizeOfHeapPrune (offsetof(xl_heap_prune, isCatalogRel) + sizeof(bool))
+#define XLHP_IS_CATALOG_REL 0x01 /* to handle recovery conflict
+ * during logical decoding on
+ * standby */
+#define XLHP_HAS_FREEZE_PLANS 0x02
+#define XLHP_HAS_REDIRECTIONS 0x04
+#define XLHP_HAS_DEAD_ITEMS 0x08
+#define XLHP_HAS_NOW_UNUSED_ITEMS 0x10
+
+#define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint8))
+
+/*
+ * This struct represents a 'freeze plan', which describes how to freeze a
+ * group of one or more heap tuples (appears in xl_heap_freeze_page and
+ * xl_heap_prune's xlhp_freeze records)
+ */
+/* 0x01 was XLH_FREEZE_XMIN */
+#define XLH_FREEZE_XVAC 0x02
+#define XLH_INVALID_XVAC 0x04
+
+typedef struct xl_heap_freeze_plan
+{
+ TransactionId xmax;
+ uint16 t_infomask2;
+ uint16 t_infomask;
+ uint8 frzflags;
+
+ /* Length of individual page offset numbers array for this plan */
+ uint16 ntuples;
+} xl_heap_freeze_plan;
+
+/*
+ * This is what we need to know about a block being frozen during vacuum
+ *
+ * Backup block 0's data contains an array of xl_heap_freeze_plan structs
+ * (with nplans elements), followed by one or more page offset number arrays.
+ * Each such page offset number array corresponds to a single freeze plan
+ * (REDO routine freezes corresponding heap tuples using freeze plan).
+ */
+typedef struct xlhp_freeze
+{
+ uint16 nplans;
+ xl_heap_freeze_plan plans[FLEXIBLE_ARRAY_MEMBER];
+} xlhp_freeze;
+
+/*
+ * Sub-record type contained in block reference 0 of a prune record if
+ * XLHP_HAS_REDIRECTIONS/XLHP_HAS_DEAD_ITEMS/XLHP_HAS_NOW_UNUSED_ITEMS is set.
+ * Note that in the XLHP_HAS_REDIRECTIONS variant, there are actually 2 *
+ * length number of OffsetNumbers in the data.
+ */
+typedef struct xlhp_prune_items
+{
+ uint16 ntargets;
+ OffsetNumber data[FLEXIBLE_ARRAY_MEMBER];
+} xlhp_prune_items;
/*
* The vacuum page record is similar to the prune record, but can only mark
@@ -326,26 +368,6 @@ typedef struct xl_heap_inplace
} xl_heap_inplace;
#define SizeOfHeapInplace (offsetof(xl_heap_inplace, offnum) + sizeof(OffsetNumber))
-
-/*
- * This struct represents a 'freeze plan', which describes how to freeze a
- * group of one or more heap tuples (appears in xl_heap_freeze_page record)
- */
-/* 0x01 was XLH_FREEZE_XMIN */
-#define XLH_FREEZE_XVAC 0x02
-#define XLH_INVALID_XVAC 0x04
-
-typedef struct xl_heap_freeze_plan
-{
- TransactionId xmax;
- uint16 t_infomask2;
- uint16 t_infomask;
- uint8 frzflags;
-
- /* Length of individual page offset numbers array for this plan */
- uint16 ntuples;
-} xl_heap_freeze_plan;
-
/*
* This is what we need to know about a block being frozen during vacuum
*
@@ -353,6 +375,10 @@ typedef struct xl_heap_freeze_plan
* (with nplans elements), followed by one or more page offset number arrays.
* Each such page offset number array corresponds to a single freeze plan
* (REDO routine freezes corresponding heap tuples using freeze plan).
+ *
+ * This is for backwards compatibility for reading individual freeze records.
+ * As of Postgres 17, xl_heap_freeze_plan records occur in xl_heap_prune
+ * records.
*/
typedef struct xl_heap_freeze_page
{
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 1c1a4d305d6..2702f211d90 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4002,6 +4002,8 @@ xl_xact_stats_items
xl_xact_subxacts
xl_xact_twophase
xl_xact_xinfo
+xlhp_freeze
+xlhp_prune_items
xmlBuffer
xmlBufferPtr
xmlChar
--
2.40.1
On Wed, Mar 13, 2024 at 07:25:56PM -0400, Melanie Plageman wrote:
On Mon, Mar 11, 2024 at 6:38 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
On 09/03/2024 22:41, Melanie Plageman wrote:
On Wed, Mar 6, 2024 at 7:59 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
Does GlobalVisTestIsRemovableXid() handle FrozenTransactionId correctly?
Okay, so I thought a lot about this, and I don't understand how
GlobalVisTestIsRemovableXid() would not handle FrozenTransactionId
correctly.

vacrel->cutoffs.OldestXmin is computed initially from
GetOldestNonRemovableTransactionId() which uses ComputeXidHorizons().
GlobalVisState is updated by ComputeXidHorizons() (when it is
updated).

I do see that the comment above GlobalVisTestIsRemovableXid() says

    * It is crucial that this only gets called for xids from a source that
    * protects against xid wraparounds (e.g. from a table and thus protected by
    * relfrozenxid).

and then in

    * Convert 32 bit argument to FullTransactionId. We can do so safely
    * because we know the xid has to, at the very least, be between
    * [oldestXid, nextXid), i.e. within 2 billion of xid.

I'm not sure what oldestXid is here.

It's true that I don't see GlobalVisTestIsRemovableXid() being called
anywhere else with an xmin as an input. I think that hints that it is
not safe with FrozenTransactionId. But I don't see what could go
wrong. Maybe it has something to do with converting it to a
FullTransactionId?

    FullTransactionIdFromU64(U64FromFullTransactionId(rel) + (int32) (xid - rel_xid));

Sorry, I couldn't quite figure it out :(
I just tested it. No, GlobalVisTestIsRemovableXid does not work for
FrozenTransactionId. I just tested it with state->definitely_needed ==
{0, 4000000000} and xid == FrozenTransactionId, and it incorrectly
returned 'false'. It treats FrozenTransactionId as if it were a regular
xid '2'.

I see. Looking at the original code:

    if (!TransactionIdPrecedes(xmin,
                               vacrel->cutoffs.OldestXmin))

And the source code for TransactionIdPrecedes:

    if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
        return (id1 < id2);

    diff = (int32) (id1 - id2);
    return (diff < 0);

In your example, it seems like you mean GlobalVisState->maybe_needed is
0 and GlobalVisState->definitely_needed = 4000000000. In this example,
if vacrel->cutoffs.OldestXmin was 0, we would get a wrong answer also.

But, I do see that the comparison done by TransactionIdPrecedes() is
quite different than that done by FullTransactionIdPrecedes() because of
the modulo 2**32 arithmetic.

Solving the handling of FrozenTransactionId specifically by
GlobalVisTestIsRemovableXid() seems like it would be done easily in our
case by wrapping it in a function which checks if
TransactionIdIsNormal() and returns true if it is not normal. But, I'm
not sure if I am missing the larger problem.

I've made such a wrapper in attached v3.
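Roughly, the idea is something like this (a simplified sketch of the
approach described above, not necessarily the exact code in the attached
patch; the name and signature follow prune_freeze_xmin_is_removable() as
it appears in the v3-0005 diff below):

    static bool
    prune_freeze_xmin_is_removable(GlobalVisState *visstate, TransactionId xmin)
    {
        /*
         * A non-normal xmin (e.g. FrozenTransactionId) is treated as
         * removable without consulting the GlobalVisState, which only
         * handles normal xids.
         */
        if (!TransactionIdIsNormal(xmin))
            return true;

        return GlobalVisTestIsRemovableXid(visstate, xmin);
    }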
The XLOG_HEAP2_FREEZE_PAGE record is a little smaller than
XLOG_HEAP2_PRUNE. But we could optimize the XLOG_HEAP2_PRUNE format for
the case that there's no pruning, just freezing. The record format
(xl_heap_prune) looks pretty complex as it is, I think it could be made
both more compact and more clear with some refactoring.

I'm happy to change up xl_heap_prune format. In its current form,
according to pahole, it has no holes and just 3 bytes of padding at
the end.

One way we could make it smaller is by moving the isCatalogRel member
to directly after snapshotConflictHorizon and then adding a flags
field and defining flags to indicate whether or not other members
exist at all. We could set bits for HAS_FREEZE_PLANS, HAS_REDIRECTED,
HAS_UNUSED, HAS_DEAD. Then I would remove the non-optional uint16
nredirected, nunused, nplans, ndead and just put the number of
redirected/unused/etc at the beginning of the arrays containing the
offsets.

Sounds good.

Then I could write various macros for accessing them. That
would make it smaller, but it definitely wouldn't make it less complex
(IMO).

I don't know, it might turn out not that complex. If you define the
formats of each of those "sub-record types" as explicit structs, per
attached sketch, you won't need so many macros. Some care is still
needed with alignment though.

In the attached v2, I've done as you suggested and made all members
except flags and snapshotConflictHorizon optional in the xl_heap_prune
struct and obsoleted the xl_heap_freeze struct. I've kept the actual
xl_heap_freeze_page struct and heap_xlog_freeze_page() function so that
we can replay previously made XLOG_HEAP2_FREEZE_PAGE records.

Because we may set line pointers unused during vacuum's first pass, I
couldn't use the presence of nowunused without redirected or dead items
to indicate that this was a record emitted by vacuum's second pass. As
such, I haven't obsoleted the xl_heap_vacuum struct. I was thinking I
could add a flag that indicates the record was emitted by vacuum's
second pass? We would want to distinguish this so that we could set the
items unused without calling heap_page_prune_execute() -- because that
calls PageRepairFragmentation() which requires a full cleanup lock.
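To sketch what I mean (the flag name below is invented for illustration
and is not in the attached patches), redo could then take the same path
heap_xlog_vacuum() takes today instead of going through
heap_page_prune_execute():

    /* hypothetical flag value, for illustration only */
    #define XLHP_VACUUM_SECOND_PASS 0x20

    if (xlrec->flags & XLHP_VACUUM_SECOND_PASS)
    {
        /* second pass: only mark the items unused, no defragmentation */
        for (int i = 0; i < nunused; i++)
            ItemIdSetUnused(PageGetItemId(page, nowunused[i]));
        PageTruncateLinePointerArray(page);
    }
    else
        heap_page_prune_execute(buffer,
                                redirected, nredirected,
                                nowdead, ndead,
                                nowunused, nunused);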
Okay, so I was going to start using xl_heap_prune for vacuum here too,
but I realized it would be bigger because of the
snapshotConflictHorizon. Do you think there is a non-terrible way to
make the snapshotConflictHorizon optional? Like with a flag?
I introduced a few sub-record types similar to what you suggested --
they help a bit with alignment, so I think they are worth keeping. There
are comments around them, but perhaps a larger diagram of the layout of
the new XLOG_HEAP2_PRUNE record would be helpful.

I started doing this, but I can't find a way of laying out the diagram
that pgindent doesn't destroy. I thought I remembered other diagrams in
the source code showing the layout of something (something with pages
somewhere?) that don't get messed up by pgindent, but I couldn't find
them.
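In the meantime, here is a rough plain-text sketch of the layout as it
follows from the v2-0017 structs above (illustrative only):

    xl_heap_prune (main record data)
        TransactionId snapshotConflictHorizon
        uint8         flags
    block 0 data, in the same order as the flags:
        if XLHP_HAS_FREEZE_PLANS:     xlhp_freeze
                                          uint16 nplans
                                          xl_heap_freeze_plan plans[nplans]
        if XLHP_HAS_REDIRECTIONS:     xlhp_prune_items
                                          uint16 ntargets
                                          OffsetNumber data[2 * ntargets]
        if XLHP_HAS_DEAD_ITEMS:       xlhp_prune_items (ntargets, data[ntargets])
        if XLHP_HAS_NOW_UNUSED_ITEMS: xlhp_prune_items (ntargets, data[ntargets])
        if XLHP_HAS_FREEZE_PLANS:     OffsetNumber frz_offsets[sum(plans[i].ntuples)]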
There is a bit of duplicated code between heap_xlog_prune() and
heap2_desc() since they both need to deserialize the record. Before,
the code to do this was small and it didn't matter, but it might be
worth refactoring it that way now.
I have added a helper function to do the deserialization,
heap_xlog_deserialize_prune_and_freeze(). But I didn't start using it in
heap2_desc() because of the way the pg_waldump build file works. Do you
think the helper belongs in any of waldump's existing sources?
pg_waldump_sources = files(
'compat.c',
'pg_waldump.c',
'rmgrdesc.c',
)
pg_waldump_sources += rmgr_desc_sources
pg_waldump_sources += xlogreader_sources
pg_waldump_sources += files('../../backend/access/transam/xlogstats.c')
Otherwise, I assume I am supposed to avoid adding some big new include to
waldump.
On Wed, Mar 6, 2024 at 7:59 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
I don't think we need XLOG_HEAP2_FREEZE_PAGE as a separate record type
anymore. By removing that, you also get rid of the freeze-only codepath
near the end of heap_page_prune(), and the
heap_freeze_execute_prepared() function.

The XLOG_HEAP2_FREEZE_PAGE record is a little smaller than
XLOG_HEAP2_PRUNE. But we could optimize the XLOG_HEAP2_PRUNE format for
the case that there's no pruning, just freezing. The record format
(xl_heap_prune) looks pretty complex as it is, I think it could be made
both more compact and more clear with some refactoring.

On the point of removing the freeze-only code path from
heap_page_prune() (now heap_page_prune_and_freeze()): while doing this,
I realized that heap_pre_freeze_checks() was not being called in the
case that we decide to freeze because we emitted an FPI while setting
the hint bit. I've fixed that; however, I've done so by moving
heap_pre_freeze_checks() into the critical section. I think that is not
okay? I could move it earlier and not call it when the hint bit FPI
leads us to freeze tuples. But, I think that would lead to us doing a
lot less validation of tuples being frozen when checksums are enabled.
Or, I could make two critical sections?

I found another approach: just do the pre-freeze checks whenever we are
considering freezing for any reason other than the FPI criteria.
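In other words, something along these lines, before entering the
critical section (a sketch of the approach, not the exact v3 code):

    /* Sanity-check the freeze plans while it is still safe to ERROR out. */
    if (pagefrz &&
        (pagefrz->freeze_required ||
         (whole_page_freezable && presult->nfrozen > 0)))
        heap_pre_freeze_checks(buffer, frozen, presult->nfrozen);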
HeapPageFreeze has two "trackers", for the "freeze" and "no freeze"
cases. heap_page_prune() needs to track both, until it decides whether
to freeze or not. But it doesn't make much sense that the caller
(lazy_scan_prune()) has to initialize both, and has to choose which of
the values to use after the call depending on whether heap_page_prune()
froze or not. The two trackers should be just heap_page_prune()'s
internal business.

I've added new_relminmxid and new_relfrozenxid to PruneFreezeResult and
set them appropriately in heap_page_prune_and_freeze().

It's a bit sad because, if it wasn't for vacrel->skippedallvis,
vacrel->NewRelfrozenXid and vacrel->NewRelminMxid would be
vacrel->cutoffs.OldestXmin and vacrel->cutoffs.OldestMxact respectively,
and we could avoid having lazy_scan_prune() initialize the
HeapPageFreeze altogether and just pass VacuumCutoffs (and
heap_page_prune_opt() could pass NULL) to heap_page_prune_and_freeze().
I think it is probably worse to add both of them as additional optional
arguments, so I've just left lazy_scan_prune() with the job of
initializing them.

In heap_page_prune_and_freeze(), I initialize new_relminmxid and
new_relfrozenxid to InvalidMultiXactId and InvalidTransactionId
respectively because on-access pruning doesn't have a value to set them
to. But I wasn't sure if this was okay -- since I don't see a
TransactionIdIsValid() check in vac_update_relstats(). It will work now
-- I'm just worried about future issues. I could add an assert there?
I looked more closely at vac_update_relstats() and it won't update
relfrozenxid unless the proposed value is smaller than the existing one.
And for MultiXactIds, it checks if it is valid. So, this is not an
issue.
I've also optimized the member ordering of PruneFreezeResult. pahole
identified some avoidable holes. I went through each commit and
optimized the PruneResult data structure as members are being added and
removed.
- Melanie
Attachments:
v3-0005-Prepare-freeze-tuples-in-heap_page_prune.patch (text/x-diff; charset=us-ascii)
From c9f8a6a8fa06c62a738a8597f7fa0186719e3e0b Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sun, 7 Jan 2024 11:18:52 -0500
Subject: [PATCH v3 05/17] Prepare freeze tuples in heap_page_prune()
In order to combine the freeze and prune records, we must determine
which tuples are freezable before actually executing pruning. All of the
page modifications should be made in the same critical section along
with emitting the combined WAL record. During pruning, determine whether
or not tuples should or must be frozen and whether or not the page will
become all-frozen as a consequence.
---
src/backend/access/heap/pruneheap.c | 78 ++++++++++++++++++++++++++--
src/backend/access/heap/vacuumlazy.c | 68 ++++++------------------
src/include/access/heapam.h | 12 +++++
3 files changed, 101 insertions(+), 57 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 42fd4a74845..6bd8400b33b 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -62,6 +62,9 @@ static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
static int heap_prune_chain(Buffer buffer,
OffsetNumber rootoffnum,
PruneState *prstate, PruneResult *presult);
+
+static void prune_prepare_freeze_tuple(Page page, OffsetNumber offnum,
+ HeapPageFreeze *pagefrz, PruneResult *presult);
static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
static void heap_prune_record_redirect(PruneState *prstate,
OffsetNumber offnum, OffsetNumber rdoffnum);
@@ -155,7 +158,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* not the relation has indexes, since we cannot safely determine
* that during on-access pruning with the current implementation.
*/
- heap_page_prune(relation, buffer, vistest, false,
+ heap_page_prune(relation, buffer, vistest, false, NULL,
&presult, NULL);
/*
@@ -218,6 +221,9 @@ prune_freeze_xmin_is_removable(GlobalVisState *visstate, TransactionId xmin)
* mark_unused_now indicates whether or not dead items can be set LP_UNUSED during
* pruning.
*
+ * pagefrz contains both input and output parameters used if the caller is
+ * interested in potentially freezing tuples on the page.
+ *
* off_loc is the offset location required by the caller to use in error
* callback.
*
@@ -229,6 +235,7 @@ void
heap_page_prune(Relation relation, Buffer buffer,
GlobalVisState *vistest,
bool mark_unused_now,
+ HeapPageFreeze *pagefrz,
PruneResult *presult,
OffsetNumber *off_loc)
{
@@ -264,6 +271,7 @@ heap_page_prune(Relation relation, Buffer buffer,
*/
presult->ndeleted = 0;
presult->nnewlpdead = 0;
+ presult->nfrozen = 0;
/*
* Keep track of whether or not the page is all_visible in case the caller
@@ -410,6 +418,15 @@ heap_page_prune(Relation relation, Buffer buffer,
*/
presult->all_visible_except_removable = presult->all_visible;
+ /*
+ * We will update the VM after pruning, collecting LP_DEAD items, and
+ * freezing tuples. Keep track of whether or not the page is all_visible
+ * and all_frozen and use this information to update the VM. all_visible
+ * implies lpdead_items == 0, but don't trust all_frozen result unless
+ * all_visible is also set to true.
+ */
+ presult->all_frozen = true;
+
/* Scan the page */
for (offnum = FirstOffsetNumber;
offnum <= maxoff;
@@ -417,14 +434,18 @@ heap_page_prune(Relation relation, Buffer buffer,
{
ItemId itemid;
- /* Ignore items already processed as part of an earlier chain */
- if (prstate.marked[offnum])
- continue;
-
/* see preceding loop */
if (off_loc)
*off_loc = offnum;
+ if (pagefrz)
+ prune_prepare_freeze_tuple(page, offnum,
+ pagefrz, presult);
+
+ /* Ignore items already processed as part of an earlier chain */
+ if (prstate.marked[offnum])
+ continue;
+
/* Nothing to do if slot is empty */
itemid = PageGetItemId(page, offnum);
if (!ItemIdIsUsed(itemid))
@@ -867,6 +888,53 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
return ndeleted;
}
+/*
+ * While pruning, before actually executing pruning and updating the line
+ * pointers, we may consider freezing tuples referred to by LP_NORMAL line
+ * pointers whose visibility status is not HEAPTUPLE_DEAD. That is to say, we
+ * want to consider freezing normal tuples which will not be removed.
+*/
+static void
+prune_prepare_freeze_tuple(Page page, OffsetNumber offnum,
+ HeapPageFreeze *pagefrz,
+ PruneResult *presult)
+{
+ bool totally_frozen;
+ HeapTupleHeader htup;
+ ItemId itemid;
+
+ Assert(pagefrz);
+
+ itemid = PageGetItemId(page, offnum);
+
+ if (!ItemIdIsNormal(itemid))
+ return;
+
+ /* We do not consider freezing tuples which will be removed. */
+ if (presult->htsv[offnum] == HEAPTUPLE_DEAD ||
+ presult->htsv[offnum] == -1)
+ return;
+
+ htup = (HeapTupleHeader) PageGetItem(page, itemid);
+
+ /* Tuple with storage -- consider need to freeze */
+ if ((heap_prepare_freeze_tuple(htup, pagefrz,
+ &presult->frozen[presult->nfrozen],
+ &totally_frozen)))
+ {
+ /* Save prepared freeze plan for later */
+ presult->frozen[presult->nfrozen++].offset = offnum;
+ }
+
+ /*
+ * If any tuple isn't either totally frozen already or eligible to become
+ * totally frozen (according to its freeze plan), then the page definitely
+ * cannot be set all-frozen in the visibility map later on
+ */
+ if (!totally_frozen)
+ presult->all_frozen = false;
+}
+
/* Record lowest soon-prunable XID */
static void
heap_prune_record_prunable(PruneState *prstate, TransactionId xid)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 06e0e841582..4187c998d25 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1416,16 +1416,13 @@ lazy_scan_prune(LVRelState *vacrel,
maxoff;
ItemId itemid;
PruneResult presult;
- int tuples_frozen,
- lpdead_items,
+ int lpdead_items,
live_tuples,
recently_dead_tuples;
HeapPageFreeze pagefrz;
bool hastup = false;
- bool all_frozen;
int64 fpi_before = pgWalUsage.wal_fpi;
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
- HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
Assert(BufferGetBlockNumber(buf) == blkno);
@@ -1443,7 +1440,6 @@ lazy_scan_prune(LVRelState *vacrel,
pagefrz.NoFreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
pagefrz.NoFreezePageRelminMxid = vacrel->NewRelminMxid;
pagefrz.cutoffs = &vacrel->cutoffs;
- tuples_frozen = 0;
lpdead_items = 0;
live_tuples = 0;
recently_dead_tuples = 0;
@@ -1461,31 +1457,20 @@ lazy_scan_prune(LVRelState *vacrel,
* false otherwise.
*/
heap_page_prune(rel, buf, vacrel->vistest, vacrel->nindexes == 0,
- &presult, &vacrel->offnum);
+ &pagefrz, &presult, &vacrel->offnum);
/*
* Now scan the page to collect LP_DEAD items and check for tuples
* requiring freezing among remaining tuples with storage. We will update
* the VM after collecting LP_DEAD items and freezing tuples. Pruning will
- * have determined whether or not the page is all_visible. Keep track of
- * whether or not the page is all_frozen and use this information to
- * update the VM. all_visible implies lpdead_items == 0, but don't trust
- * all_frozen result unless all_visible is also set to true.
+ * have determined whether or not the page is all_visible and able to
+ * become all_frozen.
*
*/
- all_frozen = true;
-
- /*
- * Now scan the page to collect LP_DEAD items and update the variables set
- * just above.
- */
for (offnum = FirstOffsetNumber;
offnum <= maxoff;
offnum = OffsetNumberNext(offnum))
{
- HeapTupleHeader htup;
- bool totally_frozen;
-
/*
* Set the offset number so that we can display it along with any
* error that occurred while processing this tuple.
@@ -1521,8 +1506,6 @@ lazy_scan_prune(LVRelState *vacrel,
Assert(ItemIdIsNormal(itemid));
- htup = (HeapTupleHeader) PageGetItem(page, itemid);
-
/*
* The criteria for counting a tuple as live in this block need to
* match what analyze.c's acquire_sample_rows() does, otherwise VACUUM
@@ -1587,29 +1570,8 @@ lazy_scan_prune(LVRelState *vacrel,
hastup = true; /* page makes rel truncation unsafe */
- /* Tuple with storage -- consider need to freeze */
- if (heap_prepare_freeze_tuple(htup, &pagefrz,
- &frozen[tuples_frozen], &totally_frozen))
- {
- /* Save prepared freeze plan for later */
- frozen[tuples_frozen++].offset = offnum;
- }
-
- /*
- * If any tuple isn't either totally frozen already or eligible to
- * become totally frozen (according to its freeze plan), then the page
- * definitely cannot be set all-frozen in the visibility map later on
- */
- if (!totally_frozen)
- all_frozen = false;
}
- /*
- * We have now divided every item on the page into either an LP_DEAD item
- * that will need to be vacuumed in indexes later, or a LP_NORMAL tuple
- * that remains and needs to be considered for freezing now (LP_UNUSED and
- * LP_REDIRECT items also remain, but are of no further interest to us).
- */
vacrel->offnum = InvalidOffsetNumber;
/*
@@ -1618,8 +1580,8 @@ lazy_scan_prune(LVRelState *vacrel,
* freeze when pruning generated an FPI, if doing so means that we set the
* page all-frozen afterwards (might not happen until final heap pass).
*/
- if (pagefrz.freeze_required || tuples_frozen == 0 ||
- (presult.all_visible_except_removable && all_frozen &&
+ if (pagefrz.freeze_required || presult.nfrozen == 0 ||
+ (presult.all_visible_except_removable && presult.all_frozen &&
fpi_before != pgWalUsage.wal_fpi))
{
/*
@@ -1629,7 +1591,7 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->NewRelfrozenXid = pagefrz.FreezePageRelfrozenXid;
vacrel->NewRelminMxid = pagefrz.FreezePageRelminMxid;
- if (tuples_frozen == 0)
+ if (presult.nfrozen == 0)
{
/*
* We have no freeze plans to execute, so there's no added cost
@@ -1657,7 +1619,7 @@ lazy_scan_prune(LVRelState *vacrel,
* once we're done with it. Otherwise we generate a conservative
* cutoff by stepping back from OldestXmin.
*/
- if (presult.all_visible_except_removable && all_frozen)
+ if (presult.all_visible_except_removable && presult.all_frozen)
{
/* Using same cutoff when setting VM is now unnecessary */
snapshotConflictHorizon = presult.frz_conflict_horizon;
@@ -1673,7 +1635,7 @@ lazy_scan_prune(LVRelState *vacrel,
/* Execute all freeze plans for page as a single atomic action */
heap_freeze_execute_prepared(vacrel->rel, buf,
snapshotConflictHorizon,
- frozen, tuples_frozen);
+ presult.frozen, presult.nfrozen);
}
}
else
@@ -1684,8 +1646,8 @@ lazy_scan_prune(LVRelState *vacrel,
*/
vacrel->NewRelfrozenXid = pagefrz.NoFreezePageRelfrozenXid;
vacrel->NewRelminMxid = pagefrz.NoFreezePageRelminMxid;
- all_frozen = false;
- tuples_frozen = 0; /* avoid miscounts in instrumentation */
+ presult.all_frozen = false;
+ presult.nfrozen = 0; /* avoid miscounts in instrumentation */
}
/*
@@ -1708,6 +1670,8 @@ lazy_scan_prune(LVRelState *vacrel,
&debug_cutoff, &debug_all_frozen))
Assert(false);
+ Assert(presult.all_frozen == debug_all_frozen);
+
Assert(!TransactionIdIsValid(debug_cutoff) ||
debug_cutoff == presult.frz_conflict_horizon);
}
@@ -1738,7 +1702,7 @@ lazy_scan_prune(LVRelState *vacrel,
/* Finally, add page-local counts to whole-VACUUM counts */
vacrel->tuples_deleted += presult.ndeleted;
- vacrel->tuples_frozen += tuples_frozen;
+ vacrel->tuples_frozen += presult.nfrozen;
vacrel->lpdead_items += lpdead_items;
vacrel->live_tuples += live_tuples;
vacrel->recently_dead_tuples += recently_dead_tuples;
@@ -1761,7 +1725,7 @@ lazy_scan_prune(LVRelState *vacrel,
{
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
- if (all_frozen)
+ if (presult.all_frozen)
{
Assert(!TransactionIdIsValid(presult.frz_conflict_horizon));
flags |= VISIBILITYMAP_ALL_FROZEN;
@@ -1832,7 +1796,7 @@ lazy_scan_prune(LVRelState *vacrel,
* true, so we must check both all_visible and all_frozen.
*/
else if (all_visible_according_to_vm && presult.all_visible &&
- all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
+ presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
{
/*
* Avoid relying on all_visible_according_to_vm as a proxy for the
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 297ba03bf09..2339abfd28a 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -201,6 +201,11 @@ typedef struct PruneResult
int nnewlpdead; /* Number of newly LP_DEAD items */
bool all_visible; /* Whether or not the page is all visible */
bool all_visible_except_removable;
+ /* Whether or not the page can be set all frozen in the VM */
+ bool all_frozen;
+
+ /* Number of newly frozen tuples */
+ int nfrozen;
TransactionId frz_conflict_horizon; /* Newest xmin on the page */
/*
@@ -213,6 +218,12 @@ typedef struct PruneResult
* 1. Otherwise every access would need to subtract 1.
*/
int8 htsv[MaxHeapTuplesPerPage + 1];
+
+
+ /*
+ * One entry for every tuple that we may freeze.
+ */
+ HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
} PruneResult;
/*
@@ -324,6 +335,7 @@ extern void heap_page_prune_opt(Relation relation, Buffer buffer);
extern void heap_page_prune(Relation relation, Buffer buffer,
struct GlobalVisState *vistest,
bool mark_unused_now,
+ HeapPageFreeze *pagefrz,
PruneResult *presult,
OffsetNumber *off_loc);
extern void heap_page_prune_execute(Buffer buffer,
--
2.40.1
Attachment: v3-0004-Add-reference-to-VacuumCutoffs-in-HeapPageFreeze.patch (text/x-diff)
From 317b479b009c13836b28e289a1782ed6f865b732 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 6 Jan 2024 16:22:17 -0500
Subject: [PATCH v3 04/17] Add reference to VacuumCutoffs in HeapPageFreeze
Future commits will move opportunistic freezing into the main path of
pruning in heap_page_prune(). Because on-access pruning will not do
opportunistic freezing, it is cleaner to keep the visibility information
required for calling heap_prepare_freeze_tuple() inside the HeapPageFreeze
structure itself by saving a reference to VacuumCutoffs.
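Condensed, the shape of the change is as follows (illustration only, pieced
together from the hunks below, not an additional hunk):

    typedef struct HeapPageFreeze
    {
        bool            freeze_required;
        TransactionId   FreezePageRelfrozenXid;
        MultiXactId     FreezePageRelminMxid;
        TransactionId   NoFreezePageRelfrozenXid;
        MultiXactId     NoFreezePageRelminMxid;
        struct VacuumCutoffs *cutoffs;      /* new: saved reference */
    } HeapPageFreeze;

    /* vacuum initializes the reference once per page ... */
    pagefrz.cutoffs = &vacrel->cutoffs;

    /* ... and heap_prepare_freeze_tuple() drops its cutoffs argument */
    heap_prepare_freeze_tuple(htup, &pagefrz,
                              &frozen[tuples_frozen], &totally_frozen);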
---
src/backend/access/heap/heapam.c | 67 ++++++++++++++--------------
src/backend/access/heap/vacuumlazy.c | 3 +-
src/include/access/heapam.h | 2 +-
3 files changed, 36 insertions(+), 36 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 34bc60f625f..7261c4988d7 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6023,7 +6023,7 @@ heap_inplace_update(Relation relation, HeapTuple tuple)
*/
static TransactionId
FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
- const struct VacuumCutoffs *cutoffs, uint16 *flags,
+ uint16 *flags,
HeapPageFreeze *pagefrz)
{
TransactionId newxmax;
@@ -6049,12 +6049,12 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
pagefrz->freeze_required = true;
return InvalidTransactionId;
}
- else if (MultiXactIdPrecedes(multi, cutoffs->relminmxid))
+ else if (MultiXactIdPrecedes(multi, pagefrz->cutoffs->relminmxid))
ereport(ERROR,
(errcode(ERRCODE_DATA_CORRUPTED),
errmsg_internal("found multixact %u from before relminmxid %u",
- multi, cutoffs->relminmxid)));
- else if (MultiXactIdPrecedes(multi, cutoffs->OldestMxact))
+ multi, pagefrz->cutoffs->relminmxid)));
+ else if (MultiXactIdPrecedes(multi, pagefrz->cutoffs->OldestMxact))
{
TransactionId update_xact;
@@ -6069,7 +6069,7 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
ereport(ERROR,
(errcode(ERRCODE_DATA_CORRUPTED),
errmsg_internal("multixact %u from before multi freeze cutoff %u found to be still running",
- multi, cutoffs->OldestMxact)));
+ multi, pagefrz->cutoffs->OldestMxact)));
if (HEAP_XMAX_IS_LOCKED_ONLY(t_infomask))
{
@@ -6080,13 +6080,13 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
/* replace multi with single XID for its updater? */
update_xact = MultiXactIdGetUpdateXid(multi, t_infomask);
- if (TransactionIdPrecedes(update_xact, cutoffs->relfrozenxid))
+ if (TransactionIdPrecedes(update_xact, pagefrz->cutoffs->relfrozenxid))
ereport(ERROR,
(errcode(ERRCODE_DATA_CORRUPTED),
errmsg_internal("multixact %u contains update XID %u from before relfrozenxid %u",
multi, update_xact,
- cutoffs->relfrozenxid)));
- else if (TransactionIdPrecedes(update_xact, cutoffs->OldestXmin))
+ pagefrz->cutoffs->relfrozenxid)));
+ else if (TransactionIdPrecedes(update_xact, pagefrz->cutoffs->OldestXmin))
{
/*
* Updater XID has to have aborted (otherwise the tuple would have
@@ -6098,7 +6098,7 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
(errcode(ERRCODE_DATA_CORRUPTED),
errmsg_internal("multixact %u contains committed update XID %u from before removable cutoff %u",
multi, update_xact,
- cutoffs->OldestXmin)));
+ pagefrz->cutoffs->OldestXmin)));
*flags |= FRM_INVALIDATE_XMAX;
pagefrz->freeze_required = true;
return InvalidTransactionId;
@@ -6150,9 +6150,9 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
{
TransactionId xid = members[i].xid;
- Assert(!TransactionIdPrecedes(xid, cutoffs->relfrozenxid));
+ Assert(!TransactionIdPrecedes(xid, pagefrz->cutoffs->relfrozenxid));
- if (TransactionIdPrecedes(xid, cutoffs->FreezeLimit))
+ if (TransactionIdPrecedes(xid, pagefrz->cutoffs->FreezeLimit))
{
/* Can't violate the FreezeLimit postcondition */
need_replace = true;
@@ -6164,7 +6164,7 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
/* Can't violate the MultiXactCutoff postcondition, either */
if (!need_replace)
- need_replace = MultiXactIdPrecedes(multi, cutoffs->MultiXactCutoff);
+ need_replace = MultiXactIdPrecedes(multi, pagefrz->cutoffs->MultiXactCutoff);
if (!need_replace)
{
@@ -6203,7 +6203,7 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
TransactionId xid = members[i].xid;
MultiXactStatus mstatus = members[i].status;
- Assert(!TransactionIdPrecedes(xid, cutoffs->relfrozenxid));
+ Assert(!TransactionIdPrecedes(xid, pagefrz->cutoffs->relfrozenxid));
if (!ISUPDATE_from_mxstatus(mstatus))
{
@@ -6214,12 +6214,12 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
if (TransactionIdIsCurrentTransactionId(xid) ||
TransactionIdIsInProgress(xid))
{
- if (TransactionIdPrecedes(xid, cutoffs->OldestXmin))
+ if (TransactionIdPrecedes(xid, pagefrz->cutoffs->OldestXmin))
ereport(ERROR,
(errcode(ERRCODE_DATA_CORRUPTED),
errmsg_internal("multixact %u contains running locker XID %u from before removable cutoff %u",
multi, xid,
- cutoffs->OldestXmin)));
+ pagefrz->cutoffs->OldestXmin)));
newmembers[nnewmembers++] = members[i];
has_lockers = true;
}
@@ -6277,11 +6277,11 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
* We determined that updater must be kept -- add it to pending new
* members list
*/
- if (TransactionIdPrecedes(xid, cutoffs->OldestXmin))
+ if (TransactionIdPrecedes(xid, pagefrz->cutoffs->OldestXmin))
ereport(ERROR,
(errcode(ERRCODE_DATA_CORRUPTED),
errmsg_internal("multixact %u contains committed update XID %u from before removable cutoff %u",
- multi, xid, cutoffs->OldestXmin)));
+ multi, xid, pagefrz->cutoffs->OldestXmin)));
newmembers[nnewmembers++] = members[i];
}
@@ -6373,7 +6373,6 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
*/
bool
heap_prepare_freeze_tuple(HeapTupleHeader tuple,
- const struct VacuumCutoffs *cutoffs,
HeapPageFreeze *pagefrz,
HeapTupleFreeze *frz, bool *totally_frozen)
{
@@ -6401,14 +6400,14 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
xmin_already_frozen = true;
else
{
- if (TransactionIdPrecedes(xid, cutoffs->relfrozenxid))
+ if (TransactionIdPrecedes(xid, pagefrz->cutoffs->relfrozenxid))
ereport(ERROR,
(errcode(ERRCODE_DATA_CORRUPTED),
errmsg_internal("found xmin %u from before relfrozenxid %u",
- xid, cutoffs->relfrozenxid)));
+ xid, pagefrz->cutoffs->relfrozenxid)));
/* Will set freeze_xmin flags in freeze plan below */
- freeze_xmin = TransactionIdPrecedes(xid, cutoffs->OldestXmin);
+ freeze_xmin = TransactionIdPrecedes(xid, pagefrz->cutoffs->OldestXmin);
/* Verify that xmin committed if and when freeze plan is executed */
if (freeze_xmin)
@@ -6422,8 +6421,8 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
xid = HeapTupleHeaderGetXvac(tuple);
if (TransactionIdIsNormal(xid))
{
- Assert(TransactionIdPrecedesOrEquals(cutoffs->relfrozenxid, xid));
- Assert(TransactionIdPrecedes(xid, cutoffs->OldestXmin));
+ Assert(TransactionIdPrecedesOrEquals(pagefrz->cutoffs->relfrozenxid, xid));
+ Assert(TransactionIdPrecedes(xid, pagefrz->cutoffs->OldestXmin));
/*
* For Xvac, we always freeze proactively. This allows totally_frozen
@@ -6448,8 +6447,7 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
* perform no-op xmax processing. The only constraint is that the
* FreezeLimit/MultiXactCutoff postcondition must never be violated.
*/
- newxmax = FreezeMultiXactId(xid, tuple->t_infomask, cutoffs,
- &flags, pagefrz);
+ newxmax = FreezeMultiXactId(xid, tuple->t_infomask, &flags, pagefrz);
if (flags & FRM_NOOP)
{
@@ -6472,7 +6470,7 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
* (This repeats work from FreezeMultiXactId, but allows "no
* freeze" tracker maintenance to happen in only one place.)
*/
- Assert(!MultiXactIdPrecedes(newxmax, cutoffs->MultiXactCutoff));
+ Assert(!MultiXactIdPrecedes(newxmax, pagefrz->cutoffs->MultiXactCutoff));
Assert(MultiXactIdIsValid(newxmax) && xid == newxmax);
}
else if (flags & FRM_RETURN_IS_XID)
@@ -6481,7 +6479,7 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
* xmax will become an updater Xid (original MultiXact's updater
* member Xid will be carried forward as a simple Xid in Xmax).
*/
- Assert(!TransactionIdPrecedes(newxmax, cutoffs->OldestXmin));
+ Assert(!TransactionIdPrecedes(newxmax, pagefrz->cutoffs->OldestXmin));
/*
* NB -- some of these transformations are only valid because we
@@ -6505,7 +6503,7 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
* xmax is an old MultiXactId that we have to replace with a new
* MultiXactId, to carry forward two or more original member XIDs.
*/
- Assert(!MultiXactIdPrecedes(newxmax, cutoffs->OldestMxact));
+ Assert(!MultiXactIdPrecedes(newxmax, pagefrz->cutoffs->OldestMxact));
/*
* We can't use GetMultiXactIdHintBits directly on the new multi
@@ -6540,14 +6538,14 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
else if (TransactionIdIsNormal(xid))
{
/* Raw xmax is normal XID */
- if (TransactionIdPrecedes(xid, cutoffs->relfrozenxid))
+ if (TransactionIdPrecedes(xid, pagefrz->cutoffs->relfrozenxid))
ereport(ERROR,
(errcode(ERRCODE_DATA_CORRUPTED),
errmsg_internal("found xmax %u from before relfrozenxid %u",
- xid, cutoffs->relfrozenxid)));
+ xid, pagefrz->cutoffs->relfrozenxid)));
/* Will set freeze_xmax flags in freeze plan below */
- freeze_xmax = TransactionIdPrecedes(xid, cutoffs->OldestXmin);
+ freeze_xmax = TransactionIdPrecedes(xid, pagefrz->cutoffs->OldestXmin);
/*
* Verify that xmax aborted if and when freeze plan is executed,
@@ -6627,7 +6625,7 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
* Does this tuple force caller to freeze the entire page?
*/
pagefrz->freeze_required =
- heap_tuple_should_freeze(tuple, cutoffs,
+ heap_tuple_should_freeze(tuple, pagefrz->cutoffs,
&pagefrz->NoFreezePageRelfrozenXid,
&pagefrz->NoFreezePageRelminMxid);
}
@@ -6949,8 +6947,9 @@ heap_freeze_tuple(HeapTupleHeader tuple,
pagefrz.NoFreezePageRelfrozenXid = FreezeLimit;
pagefrz.NoFreezePageRelminMxid = MultiXactCutoff;
- do_freeze = heap_prepare_freeze_tuple(tuple, &cutoffs,
- &pagefrz, &frz, &totally_frozen);
+ pagefrz.cutoffs = &cutoffs;
+
+ do_freeze = heap_prepare_freeze_tuple(tuple, &pagefrz, &frz, &totally_frozen);
/*
* Note that because this is not a WAL-logged operation, we don't need to
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index f9892f4cd08..06e0e841582 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1442,6 +1442,7 @@ lazy_scan_prune(LVRelState *vacrel,
pagefrz.FreezePageRelminMxid = vacrel->NewRelminMxid;
pagefrz.NoFreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
pagefrz.NoFreezePageRelminMxid = vacrel->NewRelminMxid;
+ pagefrz.cutoffs = &vacrel->cutoffs;
tuples_frozen = 0;
lpdead_items = 0;
live_tuples = 0;
@@ -1587,7 +1588,7 @@ lazy_scan_prune(LVRelState *vacrel,
hastup = true; /* page makes rel truncation unsafe */
/* Tuple with storage -- consider need to freeze */
- if (heap_prepare_freeze_tuple(htup, &vacrel->cutoffs, &pagefrz,
+ if (heap_prepare_freeze_tuple(htup, &pagefrz,
&frozen[tuples_frozen], &totally_frozen))
{
/* Save prepared freeze plan for later */
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index d8e65ae7e35..297ba03bf09 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -189,6 +189,7 @@ typedef struct HeapPageFreeze
TransactionId NoFreezePageRelfrozenXid;
MultiXactId NoFreezePageRelminMxid;
+ struct VacuumCutoffs *cutoffs;
} HeapPageFreeze;
/*
@@ -295,7 +296,6 @@ extern TM_Result heap_lock_tuple(Relation relation, HeapTuple tuple,
extern void heap_inplace_update(Relation relation, HeapTuple tuple);
extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
- const struct VacuumCutoffs *cutoffs,
HeapPageFreeze *pagefrz,
HeapTupleFreeze *frz, bool *totally_frozen);
extern void heap_freeze_execute_prepared(Relation rel, Buffer buffer,
--
2.40.1
Attachment: v3-0001-lazy_scan_prune-tests-tuple-vis-with-GlobalVisTes.patch (text/x-diff)
From 377172183a63e133d62996768e0f927d54aa7adf Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 6 Jan 2024 13:14:47 -0500
Subject: [PATCH v3 01/17] lazy_scan_prune tests tuple vis with GlobalVisTest
One requirement for eventually combining the prune and freeze records
is that we must check during pruning whether live tuples on the page are
visible to everyone and thus whether or not the page is all visible. We
only consider opportunistically freezing tuples if the whole page is all
visible and could be set all frozen.
During pruning (in heap_page_prune()), we do not have access to
VacuumCutoffs, since on-access pruning also calls heap_page_prune(). We
do, however, have access to a GlobalVisState, which can be used to
determine whether or not a tuple is visible to everyone, and which has
the potential to be more up-to-date than VacuumCutoffs->OldestXmin.
This commit simply modifies lazy_scan_prune() to use GlobalVisState
instead of OldestXmin. Future commits will move the
all_visible/all_frozen calculation into heap_page_prune().
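Condensed, the caller-side change in lazy_scan_prune() is just the cutoff
test (illustration only; the real hunk follows):

    xmin = HeapTupleHeaderGetXmin(htup);

    /* before: compare against a cutoff fixed when vacuum started */
    if (!TransactionIdPrecedes(xmin, vacrel->cutoffs.OldestXmin))
        all_visible = false;

    /* after: consult the (possibly fresher) GlobalVisState; the wrapper
     * additionally treats FrozenTransactionId xmins as visible to everyone */
    if (!prune_freeze_xmin_is_removable(vacrel->vistest, xmin))
        all_visible = false;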
---
src/backend/access/heap/vacuumlazy.c | 17 +++++++++++++++--
1 file changed, 15 insertions(+), 2 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 18004907750..fe31c0125d6 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1373,6 +1373,20 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
return false;
}
+/*
+ * Wrap GlobalVisTestIsRemovableXid() to handle FrozenTransactionIds when we
+ * are examining tuple xmins to determine if the page is all-visible during
+ * pruning. Old tuples may have FrozenTransactionId xmins.
+ */
+static inline bool
+prune_freeze_xmin_is_removable(GlobalVisState *visstate, TransactionId xmin)
+{
+ if (xmin == FrozenTransactionId)
+ return true;
+
+ return GlobalVisTestIsRemovableXid(visstate, xmin);
+}
+
/*
* lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
*
@@ -1582,8 +1596,7 @@ lazy_scan_prune(LVRelState *vacrel,
* that everyone sees it as committed?
*/
xmin = HeapTupleHeaderGetXmin(htup);
- if (!TransactionIdPrecedes(xmin,
- vacrel->cutoffs.OldestXmin))
+ if (!prune_freeze_xmin_is_removable(vacrel->vistest, xmin))
{
all_visible = false;
break;
--
2.40.1
Attachment: v3-0002-Pass-heap_prune_chain-PruneResult-output-paramete.patch (text/x-diff)
From a0ae45fd2b1bec08d5040a08663df94c37ad7a9f Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 6 Jan 2024 13:39:59 -0500
Subject: [PATCH v3 02/17] Pass heap_prune_chain() PruneResult output parameter
Future commits will set other members of PruneResult in
heap_prune_chain(), so start passing it as an output parameter now. This
eliminates the output parameter htsv -- the array of HTSV_Results --
since that is a member of the PruneResult.
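In other words (condensed from the hunk below), the signature goes from

    static int heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
                                int8 *htsv, PruneState *prstate);

to

    static int heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
                                PruneState *prstate, PruneResult *presult);

and visibility statuses are read via presult->htsv[offnum] inside.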
---
src/backend/access/heap/pruneheap.c | 15 +++++++--------
1 file changed, 7 insertions(+), 8 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 4f12413b8b1..4a2bf3dd780 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -61,8 +61,7 @@ static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
Buffer buffer);
static int heap_prune_chain(Buffer buffer,
OffsetNumber rootoffnum,
- int8 *htsv,
- PruneState *prstate);
+ PruneState *prstate, PruneResult *presult);
static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
static void heap_prune_record_redirect(PruneState *prstate,
OffsetNumber offnum, OffsetNumber rdoffnum);
@@ -325,7 +324,7 @@ heap_page_prune(Relation relation, Buffer buffer,
/* Process this item or chain of items */
presult->ndeleted += heap_prune_chain(buffer, offnum,
- presult->htsv, &prstate);
+ &prstate, presult);
}
/* Clear the offset information once we have processed the given page. */
@@ -454,7 +453,7 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
/*
* Prune specified line pointer or a HOT chain originating at line pointer.
*
- * Tuple visibility information is provided in htsv.
+ * Tuple visibility information is provided in presult->htsv.
*
* If the item is an index-referenced tuple (i.e. not a heap-only tuple),
* the HOT chain is pruned by removing all DEAD tuples at the start of the HOT
@@ -484,7 +483,7 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
*/
static int
heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
- int8 *htsv, PruneState *prstate)
+ PruneState *prstate, PruneResult *presult)
{
int ndeleted = 0;
Page dp = (Page) BufferGetPage(buffer);
@@ -505,7 +504,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
*/
if (ItemIdIsNormal(rootlp))
{
- Assert(htsv[rootoffnum] != -1);
+ Assert(presult->htsv[rootoffnum] != -1);
htup = (HeapTupleHeader) PageGetItem(dp, rootlp);
if (HeapTupleHeaderIsHeapOnly(htup))
@@ -528,7 +527,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
* either here or while following a chain below. Whichever path
* gets there first will mark the tuple unused.
*/
- if (htsv[rootoffnum] == HEAPTUPLE_DEAD &&
+ if (presult->htsv[rootoffnum] == HEAPTUPLE_DEAD &&
!HeapTupleHeaderIsHotUpdated(htup))
{
heap_prune_record_unused(prstate, rootoffnum);
@@ -625,7 +624,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
*/
tupdead = recent_dead = false;
- switch (htsv_get_valid_status(htsv[offnum]))
+ switch (htsv_get_valid_status(presult->htsv[offnum]))
{
case HEAPTUPLE_DEAD:
tupdead = true;
--
2.40.1
Attachment: v3-0003-heap_page_prune-sets-all_visible-and-frz_conflict.patch (text/x-diff)
From 8759a7a28009cbafc75c1ef986454847ef90338f Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 6 Jan 2024 14:01:37 -0500
Subject: [PATCH v3 03/17] heap_page_prune sets all_visible and
frz_conflict_horizon
In order to combine the prune and freeze records, we must know if the
page is eligible to be opportunistically frozen before finishing
pruning. Save all_visible in the PruneResult and set it to false when we
see non-removable tuples which are not visible to everyone.
We will also need to ensure that the snapshotConflictHorizon for the combined
prune + freeze record is the more conservative of the horizons calculated for
pruning and for freezing. Calculate the visibility_cutoff_xid for the purposes
of freezing -- the newest xmin on the page -- in heap_page_prune() and save it
in PruneResult.frz_conflict_horizon.
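A condensed sketch of the per-tuple logic this adds to the first loop of
heap_page_prune() (simplified from the hunk below, not an extra hunk):

    if (presult->all_visible && presult->htsv[offnum] == HEAPTUPLE_LIVE)
    {
        TransactionId xmin = HeapTupleHeaderGetXmin(htup);

        if (!HeapTupleHeaderXminCommitted(htup) ||
            !prune_freeze_xmin_is_removable(vistest, xmin))
            presult->all_visible = false;
        else if (TransactionIdIsNormal(xmin) &&
                 TransactionIdFollows(xmin, presult->frz_conflict_horizon))
            presult->frz_conflict_horizon = xmin;   /* newest xmin on page */
    }

RECENTLY_DEAD, INSERT_IN_PROGRESS, and DELETE_IN_PROGRESS tuples clear
all_visible; removable DEAD tuples deliberately do not, so that they cannot
preclude freezing. all_visible_except_removable captures that distinction,
and all_visible is cleared later if any LP_DEAD items remain after pruning.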
---
src/backend/access/heap/pruneheap.c | 136 +++++++++++++++++++++++++--
src/backend/access/heap/vacuumlazy.c | 130 ++++++-------------------
src/include/access/heapam.h | 3 +
3 files changed, 160 insertions(+), 109 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 4a2bf3dd780..42fd4a74845 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -65,8 +65,10 @@ static int heap_prune_chain(Buffer buffer,
static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
static void heap_prune_record_redirect(PruneState *prstate,
OffsetNumber offnum, OffsetNumber rdoffnum);
-static void heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum);
-static void heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum);
+static void heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
+ PruneResult *presult);
+static void heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
+ PruneResult *presult);
static void heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum);
static void page_verify_redirects(Page page);
@@ -187,6 +189,20 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
}
+/*
+ * Wrap GlobalVisTestIsRemovableXid() to handle FrozenTransactionIds when we
+ * are examining tuple xmins to determine if the page is all-visible during
+ * pruning. Old tuples may have FrozenTransactionId xmins.
+ */
+static inline bool
+prune_freeze_xmin_is_removable(GlobalVisState *visstate, TransactionId xmin)
+{
+ if (xmin == FrozenTransactionId)
+ return true;
+
+ return GlobalVisTestIsRemovableXid(visstate, xmin);
+}
+
/*
* Prune and repair fragmentation in the specified page.
*
@@ -249,6 +265,14 @@ heap_page_prune(Relation relation, Buffer buffer,
presult->ndeleted = 0;
presult->nnewlpdead = 0;
+ /*
+ * Keep track of whether or not the page is all_visible in case the caller
+ * wants to use this information to update the VM.
+ */
+ presult->all_visible = true;
+ /* for recovery conflicts */
+ presult->frz_conflict_horizon = InvalidTransactionId;
+
maxoff = PageGetMaxOffsetNumber(page);
tup.t_tableOid = RelationGetRelid(prstate.rel);
@@ -300,8 +324,92 @@ heap_page_prune(Relation relation, Buffer buffer,
presult->htsv[offnum] = heap_prune_satisfies_vacuum(&prstate, &tup,
buffer);
+ switch (presult->htsv[offnum])
+ {
+ case HEAPTUPLE_DEAD:
+
+ /*
+ * Deliberately delay unsetting all_visible until later during
+ * pruning. Removable dead tuples shouldn't preclude freezing
+ * the page. After finishing this first pass of tuple
+ * visibility checks, initialize all_visible_except_removable
+ * with the current value of all_visible to indicate whether
+ * or not the page is all visible except for dead tuples. This
+ * will allow us to attempt to freeze the page after pruning.
+ * Later during pruning, if we encounter an LP_DEAD item or
+ * are setting an item LP_DEAD, we will unset all_visible. As
+ * long as we unset it before updating the visibility map,
+ * this will be correct.
+ */
+ break;
+ case HEAPTUPLE_LIVE:
+
+ /*
+ * Is the tuple definitely visible to all transactions?
+ *
+ * NB: Like with per-tuple hint bits, we can't set the
+ * PD_ALL_VISIBLE flag if the inserter committed
+ * asynchronously. See SetHintBits for more info. Check that
+ * the tuple is hinted xmin-committed because of that.
+ */
+ if (presult->all_visible)
+ {
+ TransactionId xmin;
+
+ if (!HeapTupleHeaderXminCommitted(htup))
+ {
+ presult->all_visible = false;
+ break;
+ }
+
+ /*
+ * The inserter definitely committed. But is it old enough
+ * that everyone sees it as committed?
+ */
+ xmin = HeapTupleHeaderGetXmin(htup);
+ if (!prune_freeze_xmin_is_removable(vistest, xmin))
+ {
+ presult->all_visible = false;
+ break;
+ }
+
+ /* Track newest xmin on page. */
+ if (TransactionIdFollows(xmin, presult->frz_conflict_horizon) &&
+ TransactionIdIsNormal(xmin))
+ presult->frz_conflict_horizon = xmin;
+ }
+ break;
+ case HEAPTUPLE_RECENTLY_DEAD:
+ presult->all_visible = false;
+ break;
+ case HEAPTUPLE_INSERT_IN_PROGRESS:
+ presult->all_visible = false;
+ break;
+ case HEAPTUPLE_DELETE_IN_PROGRESS:
+ /* This is an expected case during concurrent vacuum */
+ presult->all_visible = false;
+ break;
+ default:
+ elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
+ break;
+ }
}
+ /*
+ * For vacuum, if the whole page will become frozen, we consider
+ * opportunistically freezing tuples. Dead tuples which will be removed by
+ * the end of vacuuming should not preclude us from opportunistically
+ * freezing. We will not be able to freeze the whole page if there are
+ * tuples present which are not visible to everyone or if there are dead
+ * tuples which are not yet removable. We need all_visible to be false if
+ * LP_DEAD tuples remain after pruning so that we do not incorrectly
+ * update the visibility map or page hint bit. So, we will update
+ * presult->all_visible to reflect the presence of LP_DEAD items while
+ * pruning and keep all_visible_except_removable to permit freezing if the
+ * whole page will eventually become all visible after removing tuples.
+ */
+ presult->all_visible_except_removable = presult->all_visible;
+
/* Scan the page */
for (offnum = FirstOffsetNumber;
offnum <= maxoff;
@@ -596,10 +704,14 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
/*
* If the caller set mark_unused_now true, we can set dead line
* pointers LP_UNUSED now. We don't increment ndeleted here since
- * the LP was already marked dead.
+ * the LP was already marked dead. If it will not be marked
+ * LP_UNUSED, it will remain LP_DEAD, making the page not
+ * all_visible.
*/
if (unlikely(prstate->mark_unused_now))
heap_prune_record_unused(prstate, offnum);
+ else
+ presult->all_visible = false;
break;
}
@@ -736,7 +848,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
* redirect the root to the correct chain member.
*/
if (i >= nchain)
- heap_prune_record_dead_or_unused(prstate, rootoffnum);
+ heap_prune_record_dead_or_unused(prstate, rootoffnum, presult);
else
heap_prune_record_redirect(prstate, rootoffnum, chainitems[i]);
}
@@ -749,7 +861,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
* redirect item. We can clean up by setting the redirect item to
* DEAD state or LP_UNUSED if the caller indicated.
*/
- heap_prune_record_dead_or_unused(prstate, rootoffnum);
+ heap_prune_record_dead_or_unused(prstate, rootoffnum, presult);
}
return ndeleted;
@@ -786,13 +898,20 @@ heap_prune_record_redirect(PruneState *prstate,
/* Record line pointer to be marked dead */
static void
-heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum)
+heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
+ PruneResult *presult)
{
Assert(prstate->ndead < MaxHeapTuplesPerPage);
prstate->nowdead[prstate->ndead] = offnum;
prstate->ndead++;
Assert(!prstate->marked[offnum]);
prstate->marked[offnum] = true;
+
+ /*
+ * Setting the line pointer LP_DEAD means the page will definitely not be
+ * all_visible.
+ */
+ presult->all_visible = false;
}
/*
@@ -802,7 +921,8 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum)
* pointers LP_DEAD if mark_unused_now is true.
*/
static void
-heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum)
+heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
+ PruneResult *presult)
{
/*
* If the caller set mark_unused_now to true, we can remove dead tuples
@@ -813,7 +933,7 @@ heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum)
if (unlikely(prstate->mark_unused_now))
heap_prune_record_unused(prstate, offnum);
else
- heap_prune_record_dead(prstate, offnum);
+ heap_prune_record_dead(prstate, offnum, presult);
}
/* Record line pointer to be marked unused */
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index fe31c0125d6..f9892f4cd08 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1373,20 +1373,6 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
return false;
}
-/*
- * Wrap GlobalVisTestIsRemovableXid() to handle FrozenTransactionIds when we
- * are examining tuple xmins to determine if the page is all-visible during
- * pruning. Old tuples may have FrozenTransactionId xmins.
- */
-static inline bool
-prune_freeze_xmin_is_removable(GlobalVisState *visstate, TransactionId xmin)
-{
- if (xmin == FrozenTransactionId)
- return true;
-
- return GlobalVisTestIsRemovableXid(visstate, xmin);
-}
-
/*
* lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
*
@@ -1436,9 +1422,7 @@ lazy_scan_prune(LVRelState *vacrel,
recently_dead_tuples;
HeapPageFreeze pagefrz;
bool hastup = false;
- bool all_visible,
- all_frozen;
- TransactionId visibility_cutoff_xid;
+ bool all_frozen;
int64 fpi_before = pgWalUsage.wal_fpi;
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
@@ -1479,17 +1463,16 @@ lazy_scan_prune(LVRelState *vacrel,
&presult, &vacrel->offnum);
/*
- * We will update the VM after collecting LP_DEAD items and freezing
- * tuples. Keep track of whether or not the page is all_visible and
- * all_frozen and use this information to update the VM. all_visible
- * implies 0 lpdead_items, but don't trust all_frozen result unless
- * all_visible is also set to true.
+ * Now scan the page to collect LP_DEAD items and check for tuples
+ * requiring freezing among remaining tuples with storage. We will update
+ * the VM after collecting LP_DEAD items and freezing tuples. Pruning will
+ * have determined whether or not the page is all_visible. Keep track of
+ * whether or not the page is all_frozen and use this information to
+ * update the VM. all_visible implies lpdead_items == 0, but don't trust
+ * all_frozen result unless all_visible is also set to true.
*
- * Also keep track of the visibility cutoff xid for recovery conflicts.
*/
- all_visible = true;
all_frozen = true;
- visibility_cutoff_xid = InvalidTransactionId;
/*
* Now scan the page to collect LP_DEAD items and update the variables set
@@ -1530,11 +1513,6 @@ lazy_scan_prune(LVRelState *vacrel,
* will only happen every other VACUUM, at most. Besides, VACUUM
* must treat hastup/nonempty_pages as provisional no matter how
* LP_DEAD items are handled (handled here, or handled later on).
- *
- * Also deliberately delay unsetting all_visible until just before
- * we return to lazy_scan_heap caller, as explained in full below.
- * (This is another case where it's useful to anticipate that any
- * LP_DEAD items will become LP_UNUSED during the ongoing VACUUM.)
*/
deadoffsets[lpdead_items++] = offnum;
continue;
@@ -1572,41 +1550,6 @@ lazy_scan_prune(LVRelState *vacrel,
* what acquire_sample_rows() does.
*/
live_tuples++;
-
- /*
- * Is the tuple definitely visible to all transactions?
- *
- * NB: Like with per-tuple hint bits, we can't set the
- * PD_ALL_VISIBLE flag if the inserter committed
- * asynchronously. See SetHintBits for more info. Check that
- * the tuple is hinted xmin-committed because of that.
- */
- if (all_visible)
- {
- TransactionId xmin;
-
- if (!HeapTupleHeaderXminCommitted(htup))
- {
- all_visible = false;
- break;
- }
-
- /*
- * The inserter definitely committed. But is it old enough
- * that everyone sees it as committed?
- */
- xmin = HeapTupleHeaderGetXmin(htup);
- if (!prune_freeze_xmin_is_removable(vacrel->vistest, xmin))
- {
- all_visible = false;
- break;
- }
-
- /* Track newest xmin on page. */
- if (TransactionIdFollows(xmin, visibility_cutoff_xid) &&
- TransactionIdIsNormal(xmin))
- visibility_cutoff_xid = xmin;
- }
break;
case HEAPTUPLE_RECENTLY_DEAD:
@@ -1616,7 +1559,6 @@ lazy_scan_prune(LVRelState *vacrel,
* pruning.)
*/
recently_dead_tuples++;
- all_visible = false;
break;
case HEAPTUPLE_INSERT_IN_PROGRESS:
@@ -1627,16 +1569,13 @@ lazy_scan_prune(LVRelState *vacrel,
* results. This assumption is a bit shaky, but it is what
* acquire_sample_rows() does, so be consistent.
*/
- all_visible = false;
break;
case HEAPTUPLE_DELETE_IN_PROGRESS:
- /* This is an expected case during concurrent vacuum */
- all_visible = false;
/*
- * Count such rows as live. As above, we assume the deleting
- * transaction will commit and update the counters after we
- * report.
+ * This is an expected case during concurrent vacuum. Count such
+ * rows as live. As above, we assume the deleting transaction
+ * will commit and update the counters after we report.
*/
live_tuples++;
break;
@@ -1679,7 +1618,7 @@ lazy_scan_prune(LVRelState *vacrel,
* page all-frozen afterwards (might not happen until final heap pass).
*/
if (pagefrz.freeze_required || tuples_frozen == 0 ||
- (all_visible && all_frozen &&
+ (presult.all_visible_except_removable && all_frozen &&
fpi_before != pgWalUsage.wal_fpi))
{
/*
@@ -1712,16 +1651,16 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->frozen_pages++;
/*
- * We can use visibility_cutoff_xid as our cutoff for conflicts
+ * We can use frz_conflict_horizon as our cutoff for conflicts
* when the whole page is eligible to become all-frozen in the VM
* once we're done with it. Otherwise we generate a conservative
* cutoff by stepping back from OldestXmin.
*/
- if (all_visible && all_frozen)
+ if (presult.all_visible_except_removable && all_frozen)
{
/* Using same cutoff when setting VM is now unnecessary */
- snapshotConflictHorizon = visibility_cutoff_xid;
- visibility_cutoff_xid = InvalidTransactionId;
+ snapshotConflictHorizon = presult.frz_conflict_horizon;
+ presult.frz_conflict_horizon = InvalidTransactionId;
}
else
{
@@ -1757,17 +1696,19 @@ lazy_scan_prune(LVRelState *vacrel,
*/
#ifdef USE_ASSERT_CHECKING
/* Note that all_frozen value does not matter when !all_visible */
- if (all_visible && lpdead_items == 0)
+ if (presult.all_visible)
{
TransactionId debug_cutoff;
bool debug_all_frozen;
+ Assert(lpdead_items == 0);
+
if (!heap_page_is_all_visible(vacrel, buf,
&debug_cutoff, &debug_all_frozen))
Assert(false);
Assert(!TransactionIdIsValid(debug_cutoff) ||
- debug_cutoff == visibility_cutoff_xid);
+ debug_cutoff == presult.frz_conflict_horizon);
}
#endif
@@ -1792,19 +1733,6 @@ lazy_scan_prune(LVRelState *vacrel,
Assert(dead_items->num_items <= dead_items->max_items);
pgstat_progress_update_param(PROGRESS_VACUUM_NUM_DEAD_TUPLES,
dead_items->num_items);
-
- /*
- * It was convenient to ignore LP_DEAD items in all_visible earlier on
- * to make the choice of whether or not to freeze the page unaffected
- * by the short-term presence of LP_DEAD items. These LP_DEAD items
- * were effectively assumed to be LP_UNUSED items in the making. It
- * doesn't matter which heap pass (initial pass or final pass) ends up
- * setting the page all-frozen, as long as the ongoing VACUUM does it.
- *
- * Now that freezing has been finalized, unset all_visible. It needs
- * to reflect the present state of things, as expected by our caller.
- */
- all_visible = false;
}
/* Finally, add page-local counts to whole-VACUUM counts */
@@ -1821,20 +1749,20 @@ lazy_scan_prune(LVRelState *vacrel,
/* Did we find LP_DEAD items? */
*has_lpdead_items = (lpdead_items > 0);
- Assert(!all_visible || !(*has_lpdead_items));
+ Assert(!presult.all_visible || !(*has_lpdead_items));
/*
* Handle setting visibility map bit based on information from the VM (as
* of last heap_vac_scan_next_block() call), and from all_visible and
* all_frozen variables
*/
- if (!all_visible_according_to_vm && all_visible)
+ if (!all_visible_according_to_vm && presult.all_visible)
{
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
if (all_frozen)
{
- Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+ Assert(!TransactionIdIsValid(presult.frz_conflict_horizon));
flags |= VISIBILITYMAP_ALL_FROZEN;
}
@@ -1854,7 +1782,7 @@ lazy_scan_prune(LVRelState *vacrel,
PageSetAllVisible(page);
MarkBufferDirty(buf);
visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
- vmbuffer, visibility_cutoff_xid,
+ vmbuffer, presult.frz_conflict_horizon,
flags);
}
@@ -1902,7 +1830,7 @@ lazy_scan_prune(LVRelState *vacrel,
* it as all-frozen. Note that all_frozen is only valid if all_visible is
* true, so we must check both all_visible and all_frozen.
*/
- else if (all_visible_according_to_vm && all_visible &&
+ else if (all_visible_according_to_vm && presult.all_visible &&
all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
{
/*
@@ -1919,11 +1847,11 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* Set the page all-frozen (and all-visible) in the VM.
*
- * We can pass InvalidTransactionId as our visibility_cutoff_xid,
- * since a snapshotConflictHorizon sufficient to make everything safe
- * for REDO was logged when the page's tuples were frozen.
+ * We can pass InvalidTransactionId as our frz_conflict_horizon, since
+ * a snapshotConflictHorizon sufficient to make everything safe for
+ * REDO was logged when the page's tuples were frozen.
*/
- Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+ Assert(!TransactionIdIsValid(presult.frz_conflict_horizon));
visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
vmbuffer, InvalidTransactionId,
VISIBILITYMAP_ALL_VISIBLE |
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 4b133f68593..d8e65ae7e35 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -198,6 +198,9 @@ typedef struct PruneResult
{
int ndeleted; /* Number of tuples deleted from the page */
int nnewlpdead; /* Number of newly LP_DEAD items */
+ bool all_visible; /* Whether or not the page is all visible */
+ bool all_visible_except_removable;
+ TransactionId frz_conflict_horizon; /* Newest xmin on the page */
/*
* Tuple visibility is only computed once for each tuple, for correctness
--
2.40.1
Attachment: v3-0006-lazy_scan_prune-reorder-freeze-execution-logic.patch (text/x-diff)
From 1bb0bdd4e1337fe95b34bedaac255285144a3329 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sun, 7 Jan 2024 14:50:12 -0500
Subject: [PATCH v3 06/17] lazy_scan_prune reorder freeze execution logic
To combine the prune and freeze records, freezing must be done before a
pruning WAL record is emitted. We will move the freeze execution into
heap_page_prune() in future commits. lazy_scan_prune() currently
executes freezing, updates vacrel->NewRelfrozenXid and
vacrel->NewRelminMxid, and resets the snapshotConflictHorizon that the
visibility map update record may use, all in the same block of if
statements.
This commit starts reordering that logic so that freeze execution can be
separated from the other updates, which should not be done in pruning.
It also adds a helper that calculates the freeze snapshot conflict
horizon. This will be useful once freeze execution moves into pruning,
because not all callers of heap_page_prune() have access to
VacuumCutoffs.
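Condensed, the reordered control flow in lazy_scan_prune() becomes
(illustration only; the hunks below are authoritative):

    do_freeze = pagefrz.freeze_required ||
        (presult.all_visible_except_removable && presult.all_frozen &&
         presult.nfrozen > 0 &&
         fpi_before != pgWalUsage.wal_fpi);

    if (do_freeze)
    {
        TransactionId snapshotConflictHorizon;

        vacrel->NewRelfrozenXid = pagefrz.FreezePageRelfrozenXid;
        vacrel->NewRelminMxid = pagefrz.FreezePageRelminMxid;
        vacrel->frozen_pages++;

        snapshotConflictHorizon = heap_frz_conflict_horizon(&presult, &pagefrz);
        heap_freeze_execute_prepared(vacrel->rel, buf,
                                     snapshotConflictHorizon,
                                     presult.frozen, presult.nfrozen);
    }

with the "no freeze plans but already all-frozen" case split out into its own
branch, so that only the freeze execution itself has to move into pruning
later.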
---
src/backend/access/heap/vacuumlazy.c | 112 ++++++++++++++++-----------
1 file changed, 67 insertions(+), 45 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 4187c998d25..abbb7ab3ada 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -269,6 +269,8 @@ static void update_vacuum_error_info(LVRelState *vacrel,
static void restore_vacuum_error_info(LVRelState *vacrel,
const LVSavedErrInfo *saved_vacrel);
+static TransactionId heap_frz_conflict_horizon(PruneResult *presult,
+ HeapPageFreeze *pagefrz);
/*
* heap_vacuum_rel() -- perform VACUUM for one heap relation
@@ -1373,6 +1375,33 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
return false;
}
+/*
+ * Determine the snapshotConflictHorizon for freezing. Must only be called
+ * after pruning and determining if the page is freezable.
+ */
+static TransactionId
+heap_frz_conflict_horizon(PruneResult *presult, HeapPageFreeze *pagefrz)
+{
+ TransactionId result;
+
+ /*
+ * We can use frz_conflict_horizon as our cutoff for conflicts when the
+ * whole page is eligible to become all-frozen in the VM once we're done
+ * with it. Otherwise we generate a conservative cutoff by stepping back
+ * from OldestXmin.
+ */
+ if (presult->all_visible_except_removable && presult->all_frozen)
+ result = presult->frz_conflict_horizon;
+ else
+ {
+ /* Avoids false conflicts when hot_standby_feedback in use */
+ result = pagefrz->cutoffs->OldestXmin;
+ TransactionIdRetreat(result);
+ }
+
+ return result;
+}
+
/*
* lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
*
@@ -1421,6 +1450,7 @@ lazy_scan_prune(LVRelState *vacrel,
recently_dead_tuples;
HeapPageFreeze pagefrz;
bool hastup = false;
+ bool do_freeze;
int64 fpi_before = pgWalUsage.wal_fpi;
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
@@ -1580,10 +1610,15 @@ lazy_scan_prune(LVRelState *vacrel,
* freeze when pruning generated an FPI, if doing so means that we set the
* page all-frozen afterwards (might not happen until final heap pass).
*/
- if (pagefrz.freeze_required || presult.nfrozen == 0 ||
+ do_freeze = pagefrz.freeze_required ||
(presult.all_visible_except_removable && presult.all_frozen &&
- fpi_before != pgWalUsage.wal_fpi))
+ presult.nfrozen > 0 &&
+ fpi_before != pgWalUsage.wal_fpi);
+
+ if (do_freeze)
{
+ TransactionId snapshotConflictHorizon;
+
/*
* We're freezing the page. Our final NewRelfrozenXid doesn't need to
* be affected by the XIDs that are just about to be frozen anyway.
@@ -1591,52 +1626,39 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->NewRelfrozenXid = pagefrz.FreezePageRelfrozenXid;
vacrel->NewRelminMxid = pagefrz.FreezePageRelminMxid;
- if (presult.nfrozen == 0)
- {
- /*
- * We have no freeze plans to execute, so there's no added cost
- * from following the freeze path. That's why it was chosen. This
- * is important in the case where the page only contains totally
- * frozen tuples at this point (perhaps only following pruning).
- * Such pages can be marked all-frozen in the VM by our caller,
- * even though none of its tuples were newly frozen here (note
- * that the "no freeze" path never sets pages all-frozen).
- *
- * We never increment the frozen_pages instrumentation counter
- * here, since it only counts pages with newly frozen tuples
- * (don't confuse that with pages newly set all-frozen in VM).
- */
- }
- else
- {
- TransactionId snapshotConflictHorizon;
+ vacrel->frozen_pages++;
- vacrel->frozen_pages++;
+ snapshotConflictHorizon = heap_frz_conflict_horizon(&presult, &pagefrz);
- /*
- * We can use frz_conflict_horizon as our cutoff for conflicts
- * when the whole page is eligible to become all-frozen in the VM
- * once we're done with it. Otherwise we generate a conservative
- * cutoff by stepping back from OldestXmin.
- */
- if (presult.all_visible_except_removable && presult.all_frozen)
- {
- /* Using same cutoff when setting VM is now unnecessary */
- snapshotConflictHorizon = presult.frz_conflict_horizon;
- presult.frz_conflict_horizon = InvalidTransactionId;
- }
- else
- {
- /* Avoids false conflicts when hot_standby_feedback in use */
- snapshotConflictHorizon = vacrel->cutoffs.OldestXmin;
- TransactionIdRetreat(snapshotConflictHorizon);
- }
+ /* Using same cutoff when setting VM is now unnecessary */
+ if (presult.all_visible_except_removable && presult.all_frozen)
+ presult.frz_conflict_horizon = InvalidTransactionId;
- /* Execute all freeze plans for page as a single atomic action */
- heap_freeze_execute_prepared(vacrel->rel, buf,
- snapshotConflictHorizon,
- presult.frozen, presult.nfrozen);
- }
+ /* Execute all freeze plans for page as a single atomic action */
+ heap_freeze_execute_prepared(vacrel->rel, buf,
+ snapshotConflictHorizon,
+ presult.frozen, presult.nfrozen);
+ }
+ else if (presult.all_frozen && presult.nfrozen == 0)
+ {
+ /* Page should be all visible except to-be-removed tuples */
+ Assert(presult.all_visible_except_removable);
+
+ /*
+ * We have no freeze plans to execute, so there's no added cost from
+ * following the freeze path. That's why it was chosen. This is
+ * important in the case where the page only contains totally frozen
+ * tuples at this point (perhaps only following pruning). Such pages
+ * can be marked all-frozen in the VM by our caller, even though none
+ * of its tuples were newly frozen here (note that the "no freeze"
+ * path never sets pages all-frozen).
+ *
+ * We never increment the frozen_pages instrumentation counter here,
+ * since it only counts pages with newly frozen tuples (don't confuse
+ * that with pages newly set all-frozen in VM).
+ */
+ vacrel->NewRelfrozenXid = pagefrz.FreezePageRelfrozenXid;
+ vacrel->NewRelminMxid = pagefrz.FreezePageRelminMxid;
}
else
{
--
2.40.1
Attachment: v3-0007-Execute-freezing-in-heap_page_prune.patch (text/x-diff)
From ab80dd6a10d28e2483a074f1a9ea8b445e7d487c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Fri, 8 Mar 2024 16:45:57 -0500
Subject: [PATCH v3 07/17] Execute freezing in heap_page_prune()
As a step toward combining the prune and freeze WAL records, execute
freezing in heap_page_prune(). The logic to determine whether or not to
execute freeze plans was moved from lazy_scan_prune() over to
heap_page_prune() with little modification.
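The pruning function is renamed heap_page_prune_and_freeze(), and a NULL
pagefrz preserves the old prune-only behavior. Sketch of the two call shapes
(the on-access call is taken from the hunk below; the vacuum call is not
visible in this excerpt and is assumed to mirror the earlier
heap_page_prune() call with the new name):

    /* on-access pruning: no freezing */
    heap_page_prune_and_freeze(relation, buffer, vistest, false, NULL,
                               &presult, NULL);

    /* vacuum: pass the page-freeze state; freezing now happens inside */
    heap_page_prune_and_freeze(rel, buf, vacrel->vistest,
                               vacrel->nindexes == 0,
                               &pagefrz, &presult, &vacrel->offnum);

(PruneResult is also renamed PruneFreezeResult here, so presult is of that
type in both callers.)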
---
src/backend/access/heap/heapam_handler.c | 2 +-
src/backend/access/heap/pruneheap.c | 151 +++++++++++++++++------
src/backend/access/heap/vacuumlazy.c | 129 ++++++-------------
src/backend/storage/ipc/procarray.c | 6 +-
src/include/access/heapam.h | 41 +++---
src/tools/pgindent/typedefs.list | 2 +-
6 files changed, 180 insertions(+), 151 deletions(-)
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 680a50bf8b1..5e522f5b0ba 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -1046,7 +1046,7 @@ heapam_scan_analyze_next_tuple(TableScanDesc scan, TransactionId OldestXmin,
* We ignore unused and redirect line pointers. DEAD line pointers
* should be counted as dead, because we need vacuum to run to get rid
* of them. Note that this rule agrees with the way that
- * heap_page_prune() counts things.
+ * heap_page_prune_and_freeze() counts things.
*/
if (!ItemIdIsNormal(itemid))
{
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 6bd8400b33b..abf6bdb2d99 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -17,16 +17,18 @@
#include "access/heapam.h"
#include "access/heapam_xlog.h"
#include "access/htup_details.h"
+#include "access/multixact.h"
#include "access/transam.h"
#include "access/xlog.h"
#include "access/xloginsert.h"
+#include "executor/instrument.h"
#include "miscadmin.h"
#include "pgstat.h"
#include "storage/bufmgr.h"
#include "utils/rel.h"
#include "utils/snapmgr.h"
-/* Working data for heap_page_prune and subroutines */
+/* Working data for heap_page_prune_and_freeze() and subroutines */
typedef struct
{
Relation rel;
@@ -61,17 +63,18 @@ static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
Buffer buffer);
static int heap_prune_chain(Buffer buffer,
OffsetNumber rootoffnum,
- PruneState *prstate, PruneResult *presult);
+ PruneState *prstate, PruneFreezeResult *presult);
static void prune_prepare_freeze_tuple(Page page, OffsetNumber offnum,
- HeapPageFreeze *pagefrz, PruneResult *presult);
+ HeapPageFreeze *pagefrz, HeapTupleFreeze *frozen,
+ PruneFreezeResult *presult);
static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
static void heap_prune_record_redirect(PruneState *prstate,
OffsetNumber offnum, OffsetNumber rdoffnum);
static void heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
- PruneResult *presult);
+ PruneFreezeResult *presult);
static void heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
- PruneResult *presult);
+ PruneFreezeResult *presult);
static void heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum);
static void page_verify_redirects(Page page);
@@ -151,15 +154,15 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
*/
if (PageIsFull(page) || PageGetHeapFreeSpace(page) < minfree)
{
- PruneResult presult;
+ PruneFreezeResult presult;
/*
* For now, pass mark_unused_now as false regardless of whether or
* not the relation has indexes, since we cannot safely determine
* that during on-access pruning with the current implementation.
*/
- heap_page_prune(relation, buffer, vistest, false, NULL,
- &presult, NULL);
+ heap_page_prune_and_freeze(relation, buffer, vistest, false, NULL,
+ &presult, NULL);
/*
* Report the number of tuples reclaimed to pgstats. This is
@@ -207,7 +210,12 @@ prune_freeze_xmin_is_removable(GlobalVisState *visstate, TransactionId xmin)
}
/*
- * Prune and repair fragmentation in the specified page.
+ * Prune and repair fragmentation and potentially freeze tuples on the
+ * specified page.
+ *
+ * If the page can be marked all-frozen in the visibility map, we may
+ * opportunistically freeze tuples on the page if either its tuples are old
+ * enough or freezing will be cheap enough.
*
* Caller must have pin and buffer cleanup lock on the page. Note that we
* don't update the FSM information for page on caller's behalf. Caller might
@@ -221,23 +229,24 @@ prune_freeze_xmin_is_removable(GlobalVisState *visstate, TransactionId xmin)
* mark_unused_now indicates whether or not dead items can be set LP_UNUSED during
* pruning.
*
- * pagefrz contains both input and output parameters used if the caller is
- * interested in potentially freezing tuples on the page.
+ * pagefrz is an input parameter containing visibility cutoff information and
+ * the current relfrozenxid and relminmxids used if the caller is interested in
+ * freezing tuples on the page.
*
* off_loc is the offset location required by the caller to use in error
* callback.
*
* presult contains output parameters needed by callers such as the number of
* tuples removed and the number of line pointers newly marked LP_DEAD.
- * heap_page_prune() is responsible for initializing it.
+ * heap_page_prune_and_freeze() is responsible for initializing it.
*/
void
-heap_page_prune(Relation relation, Buffer buffer,
- GlobalVisState *vistest,
- bool mark_unused_now,
- HeapPageFreeze *pagefrz,
- PruneResult *presult,
- OffsetNumber *off_loc)
+heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ GlobalVisState *vistest,
+ bool mark_unused_now,
+ HeapPageFreeze *pagefrz,
+ PruneFreezeResult *presult,
+ OffsetNumber *off_loc)
{
Page page = BufferGetPage(buffer);
BlockNumber blockno = BufferGetBlockNumber(buffer);
@@ -245,6 +254,14 @@ heap_page_prune(Relation relation, Buffer buffer,
maxoff;
PruneState prstate;
HeapTupleData tup;
+ bool do_freeze;
+ int64 fpi_before = pgWalUsage.wal_fpi;
+ TransactionId frz_conflict_horizon = InvalidTransactionId;
+
+ /*
+ * One entry for every tuple that we may freeze.
+ */
+ HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
/*
* Our strategy is to scan the page and make lists of items to change,
@@ -281,6 +298,10 @@ heap_page_prune(Relation relation, Buffer buffer,
/* for recovery conflicts */
presult->frz_conflict_horizon = InvalidTransactionId;
+ /* For advancing relfrozenxid and relminmxid */
+ presult->new_relfrozenxid = InvalidTransactionId;
+ presult->new_relminmxid = InvalidMultiXactId;
+
maxoff = PageGetMaxOffsetNumber(page);
tup.t_tableOid = RelationGetRelid(prstate.rel);
@@ -440,7 +461,7 @@ heap_page_prune(Relation relation, Buffer buffer,
if (pagefrz)
prune_prepare_freeze_tuple(page, offnum,
- pagefrz, presult);
+ pagefrz, frozen, presult);
/* Ignore items already processed as part of an earlier chain */
if (prstate.marked[offnum])
@@ -555,6 +576,61 @@ heap_page_prune(Relation relation, Buffer buffer,
/* Record number of newly-set-LP_DEAD items for caller */
presult->nnewlpdead = prstate.ndead;
+
+ /*
+ * Freeze the page when heap_prepare_freeze_tuple indicates that at least
+ * one XID/MXID from before FreezeLimit/MultiXactCutoff is present. Also
+ * freeze when pruning generated an FPI, if doing so means that we set the
+ * page all-frozen afterwards (might not happen until final heap pass).
+ */
+ if (pagefrz)
+ do_freeze = pagefrz->freeze_required ||
+ (presult->all_visible_except_removable && presult->all_frozen &&
+ presult->nfrozen > 0 &&
+ fpi_before != pgWalUsage.wal_fpi);
+ else
+ do_freeze = false;
+
+ if (do_freeze)
+ {
+ frz_conflict_horizon = heap_frz_conflict_horizon(presult, pagefrz);
+
+ /* Execute all freeze plans for page as a single atomic action */
+ heap_freeze_execute_prepared(relation, buffer,
+ frz_conflict_horizon,
+ frozen, presult->nfrozen);
+ }
+ else if (!pagefrz || !presult->all_frozen || presult->nfrozen > 0)
+ {
+ /*
+ * If we will neither freeze tuples on the page nor set the page all
+ * frozen in the visibility map, the page is not all frozen and there
+ * will be no newly frozen tuples.
+ */
+ presult->all_frozen = false;
+ presult->nfrozen = 0; /* avoid miscounts in instrumentation */
+ }
+
+ /* Caller won't update new_relfrozenxid and new_relminmxid */
+ if (!pagefrz)
+ return;
+
+ /*
+ * If we will freeze tuples on the page, or if the page can be set
+ * all-frozen in the visibility map even without freezing any tuples, we
+ * can advance relfrozenxid and relminmxid to the values in
+ * pagefrz->FreezePageRelfrozenXid and pagefrz->FreezePageRelminMxid.
+ */
+ if (presult->all_frozen || presult->nfrozen > 0)
+ {
+ presult->new_relfrozenxid = pagefrz->FreezePageRelfrozenXid;
+ presult->new_relminmxid = pagefrz->FreezePageRelminMxid;
+ }
+ else
+ {
+ presult->new_relfrozenxid = pagefrz->NoFreezePageRelfrozenXid;
+ presult->new_relminmxid = pagefrz->NoFreezePageRelminMxid;
+ }
}
@@ -612,7 +688,7 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
*/
static int
heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
- PruneState *prstate, PruneResult *presult)
+ PruneState *prstate, PruneFreezeResult *presult)
{
int ndeleted = 0;
Page dp = (Page) BufferGetPage(buffer);
@@ -877,10 +953,10 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
{
/*
* We found a redirect item that doesn't point to a valid follow-on
- * item. This can happen if the loop in heap_page_prune caused us to
- * visit the dead successor of a redirect item before visiting the
- * redirect item. We can clean up by setting the redirect item to
- * DEAD state or LP_UNUSED if the caller indicated.
+ * item. This can happen if the loop in heap_page_prune_and_freeze()
+ * caused us to visit the dead successor of a redirect item before
+ * visiting the redirect item. We can clean up by setting the
+ * redirect item to DEAD state or LP_UNUSED if the caller indicated.
*/
heap_prune_record_dead_or_unused(prstate, rootoffnum, presult);
}
@@ -897,7 +973,8 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
static void
prune_prepare_freeze_tuple(Page page, OffsetNumber offnum,
HeapPageFreeze *pagefrz,
- PruneResult *presult)
+ HeapTupleFreeze *frozen,
+ PruneFreezeResult *presult)
{
bool totally_frozen;
HeapTupleHeader htup;
@@ -919,11 +996,11 @@ prune_prepare_freeze_tuple(Page page, OffsetNumber offnum,
/* Tuple with storage -- consider need to freeze */
if ((heap_prepare_freeze_tuple(htup, pagefrz,
- &presult->frozen[presult->nfrozen],
+ &frozen[presult->nfrozen],
&totally_frozen)))
{
/* Save prepared freeze plan for later */
- presult->frozen[presult->nfrozen++].offset = offnum;
+ frozen[presult->nfrozen++].offset = offnum;
}
/*
@@ -967,7 +1044,7 @@ heap_prune_record_redirect(PruneState *prstate,
/* Record line pointer to be marked dead */
static void
heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
- PruneResult *presult)
+ PruneFreezeResult *presult)
{
Assert(prstate->ndead < MaxHeapTuplesPerPage);
prstate->nowdead[prstate->ndead] = offnum;
@@ -990,7 +1067,7 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
*/
static void
heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
- PruneResult *presult)
+ PruneFreezeResult *presult)
{
/*
* If the caller set mark_unused_now to true, we can remove dead tuples
@@ -1017,9 +1094,9 @@ heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum)
/*
- * Perform the actual page changes needed by heap_page_prune.
- * It is expected that the caller has a full cleanup lock on the
- * buffer.
+ * Perform the actual page pruning modifications needed by
+ * heap_page_prune_and_freeze(). It is expected that the caller has a full
+ * cleanup lock on the buffer.
*/
void
heap_page_prune_execute(Buffer buffer,
@@ -1133,11 +1210,11 @@ heap_page_prune_execute(Buffer buffer,
#ifdef USE_ASSERT_CHECKING
/*
- * When heap_page_prune() was called, mark_unused_now may have been
- * passed as true, which allows would-be LP_DEAD items to be made
- * LP_UNUSED instead. This is only possible if the relation has no
- * indexes. If there are any dead items, then mark_unused_now was not
- * true and every item being marked LP_UNUSED must refer to a
+ * When heap_page_prune_and_freeze() was called, mark_unused_now may
+ * have been passed as true, which allows would-be LP_DEAD items to be
+ * made LP_UNUSED instead. This is only possible if the relation has
+ * no indexes. If there are any dead items, then mark_unused_now was
+ * not true and every item being marked LP_UNUSED must refer to a
* heap-only tuple.
*/
if (ndead > 0)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index abbb7ab3ada..6dd8d457c9c 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -269,9 +269,6 @@ static void update_vacuum_error_info(LVRelState *vacrel,
static void restore_vacuum_error_info(LVRelState *vacrel,
const LVSavedErrInfo *saved_vacrel);
-static TransactionId heap_frz_conflict_horizon(PruneResult *presult,
- HeapPageFreeze *pagefrz);
-
/*
* heap_vacuum_rel() -- perform VACUUM for one heap relation
*
@@ -432,12 +429,13 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
* as an upper bound on the XIDs stored in the pages we'll actually scan
* (NewRelfrozenXid tracking must never be allowed to miss unfrozen XIDs).
*
- * Next acquire vistest, a related cutoff that's used in heap_page_prune.
- * We expect vistest will always make heap_page_prune remove any deleted
- * tuple whose xmax is < OldestXmin. lazy_scan_prune must never become
- * confused about whether a tuple should be frozen or removed. (In the
- * future we might want to teach lazy_scan_prune to recompute vistest from
- * time to time, to increase the number of dead tuples it can prune away.)
+ * Next acquire vistest, a related cutoff that's used in
+ * heap_page_prune_and_freeze(). We expect vistest will always make
+ * heap_page_prune_and_freeze() remove any deleted tuple whose xmax is <
+ * OldestXmin. lazy_scan_prune must never become confused about whether a
+ * tuple should be frozen or removed. (In the future we might want to
+ * teach lazy_scan_prune to recompute vistest from time to time, to
+ * increase the number of dead tuples it can prune away.)
*/
vacrel->aggressive = vacuum_get_cutoffs(rel, params, &vacrel->cutoffs);
vacrel->rel_pages = orig_rel_pages = RelationGetNumberOfBlocks(rel);
@@ -1379,8 +1377,8 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
* Determine the snapshotConflictHorizon for freezing. Must only be called
* after pruning and determining if the page is freezable.
*/
-static TransactionId
-heap_frz_conflict_horizon(PruneResult *presult, HeapPageFreeze *pagefrz)
+TransactionId
+heap_frz_conflict_horizon(PruneFreezeResult *presult, HeapPageFreeze *pagefrz)
{
TransactionId result;
@@ -1407,21 +1405,21 @@ heap_frz_conflict_horizon(PruneResult *presult, HeapPageFreeze *pagefrz)
*
* Caller must hold pin and buffer cleanup lock on the buffer.
*
- * Prior to PostgreSQL 14 there were very rare cases where heap_page_prune()
- * was allowed to disagree with our HeapTupleSatisfiesVacuum() call about
- * whether or not a tuple should be considered DEAD. This happened when an
- * inserting transaction concurrently aborted (after our heap_page_prune()
- * call, before our HeapTupleSatisfiesVacuum() call). There was rather a lot
- * of complexity just so we could deal with tuples that were DEAD to VACUUM,
- * but nevertheless were left with storage after pruning.
+ * Prior to PostgreSQL 14 there were very rare cases where
+ * heap_page_prune_and_freeze() was allowed to disagree with our
+ * HeapTupleSatisfiesVacuum() call about whether or not a tuple should be
+ * considered DEAD. This happened when an inserting transaction concurrently
+ * aborted (after our heap_page_prune_and_freeze() call, before our
+ * HeapTupleSatisfiesVacuum() call). There was rather a lot of complexity just
+ * so we could deal with tuples that were DEAD to VACUUM, but nevertheless were
+ * left with storage after pruning.
*
* As of Postgres 17, we circumvent this problem altogether by reusing the
- * result of heap_page_prune()'s visibility check. Without the second call to
- * HeapTupleSatisfiesVacuum(), there is no new HTSV_Result and there can be no
- * disagreement. We'll just handle such tuples as if they had become fully dead
- * right after this operation completes instead of in the middle of it. Note that
- * any tuple that becomes dead after the call to heap_page_prune() can't need to
- * be frozen, because it was visible to another session when vacuum started.
+ * result of heap_page_prune_and_freeze()'s visibility check. Without the
+ * second call to HeapTupleSatisfiesVacuum(), there is no new HTSV_Result and
+ * there can be no disagreement. We'll just handle such tuples as if they had
+ * become fully dead right after this operation completes instead of in the
+ * middle of it.
*
* vmbuffer is the buffer containing the VM block with visibility information
* for the heap block, blkno. all_visible_according_to_vm is the saved
@@ -1444,26 +1442,24 @@ lazy_scan_prune(LVRelState *vacrel,
OffsetNumber offnum,
maxoff;
ItemId itemid;
- PruneResult presult;
+ PruneFreezeResult presult;
int lpdead_items,
live_tuples,
recently_dead_tuples;
HeapPageFreeze pagefrz;
bool hastup = false;
- bool do_freeze;
- int64 fpi_before = pgWalUsage.wal_fpi;
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
Assert(BufferGetBlockNumber(buf) == blkno);
/*
* maxoff might be reduced following line pointer array truncation in
- * heap_page_prune. That's safe for us to ignore, since the reclaimed
- * space will continue to look like LP_UNUSED items below.
+ * heap_page_prune_and_freeze(). That's safe for us to ignore, since the
+ * reclaimed space will continue to look like LP_UNUSED items below.
*/
maxoff = PageGetMaxOffsetNumber(page);
- /* Initialize (or reset) page-level state */
+ /* Initialize pagefrz */
pagefrz.freeze_required = false;
pagefrz.FreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
pagefrz.FreezePageRelminMxid = vacrel->NewRelminMxid;
@@ -1475,7 +1471,7 @@ lazy_scan_prune(LVRelState *vacrel,
recently_dead_tuples = 0;
/*
- * Prune all HOT-update chains in this page.
+ * Prune all HOT-update chains and potentially freeze tuples on this page.
*
* We count the number of tuples removed from the page by the pruning step
* in presult.ndeleted. It should not be confused with lpdead_items;
@@ -1486,8 +1482,8 @@ lazy_scan_prune(LVRelState *vacrel,
* items LP_UNUSED, so mark_unused_now should be true if no indexes and
* false otherwise.
*/
- heap_page_prune(rel, buf, vacrel->vistest, vacrel->nindexes == 0,
- &pagefrz, &presult, &vacrel->offnum);
+ heap_page_prune_and_freeze(rel, buf, vacrel->vistest, vacrel->nindexes == 0,
+ &pagefrz, &presult, &vacrel->offnum);
/*
* Now scan the page to collect LP_DEAD items and check for tuples
@@ -1604,72 +1600,23 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->offnum = InvalidOffsetNumber;
- /*
- * Freeze the page when heap_prepare_freeze_tuple indicates that at least
- * one XID/MXID from before FreezeLimit/MultiXactCutoff is present. Also
- * freeze when pruning generated an FPI, if doing so means that we set the
- * page all-frozen afterwards (might not happen until final heap pass).
- */
- do_freeze = pagefrz.freeze_required ||
- (presult.all_visible_except_removable && presult.all_frozen &&
- presult.nfrozen > 0 &&
- fpi_before != pgWalUsage.wal_fpi);
+ Assert(TransactionIdIsValid(presult.new_relfrozenxid));
+ Assert(MultiXactIdIsValid(presult.new_relminmxid));
+ vacrel->NewRelfrozenXid = presult.new_relfrozenxid;
+ vacrel->NewRelminMxid = presult.new_relminmxid;
- if (do_freeze)
+ if (presult.nfrozen > 0)
{
- TransactionId snapshotConflictHorizon;
-
/*
- * We're freezing the page. Our final NewRelfrozenXid doesn't need to
- * be affected by the XIDs that are just about to be frozen anyway.
+ * We never increment the frozen_pages instrumentation counter when
+ * nfrozen == 0, since it only counts pages with newly frozen tuples
+ * (don't confuse that with pages newly set all-frozen in VM).
*/
- vacrel->NewRelfrozenXid = pagefrz.FreezePageRelfrozenXid;
- vacrel->NewRelminMxid = pagefrz.FreezePageRelminMxid;
-
vacrel->frozen_pages++;
- snapshotConflictHorizon = heap_frz_conflict_horizon(&presult, &pagefrz);
-
/* Using same cutoff when setting VM is now unnecessary */
- if (presult.all_visible_except_removable && presult.all_frozen)
+ if (presult.all_frozen)
presult.frz_conflict_horizon = InvalidTransactionId;
-
- /* Execute all freeze plans for page as a single atomic action */
- heap_freeze_execute_prepared(vacrel->rel, buf,
- snapshotConflictHorizon,
- presult.frozen, presult.nfrozen);
- }
- else if (presult.all_frozen && presult.nfrozen == 0)
- {
- /* Page should be all visible except to-be-removed tuples */
- Assert(presult.all_visible_except_removable);
-
- /*
- * We have no freeze plans to execute, so there's no added cost from
- * following the freeze path. That's why it was chosen. This is
- * important in the case where the page only contains totally frozen
- * tuples at this point (perhaps only following pruning). Such pages
- * can be marked all-frozen in the VM by our caller, even though none
- * of its tuples were newly frozen here (note that the "no freeze"
- * path never sets pages all-frozen).
- *
- * We never increment the frozen_pages instrumentation counter here,
- * since it only counts pages with newly frozen tuples (don't confuse
- * that with pages newly set all-frozen in VM).
- */
- vacrel->NewRelfrozenXid = pagefrz.FreezePageRelfrozenXid;
- vacrel->NewRelminMxid = pagefrz.FreezePageRelminMxid;
- }
- else
- {
- /*
- * Page requires "no freeze" processing. It might be set all-visible
- * in the visibility map, but it can never be set all-frozen.
- */
- vacrel->NewRelfrozenXid = pagefrz.NoFreezePageRelfrozenXid;
- vacrel->NewRelminMxid = pagefrz.NoFreezePageRelminMxid;
- presult.all_frozen = false;
- presult.nfrozen = 0; /* avoid miscounts in instrumentation */
}
/*
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index b3cd248fb64..88a6d504dff 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -1715,9 +1715,9 @@ TransactionIdIsActive(TransactionId xid)
* Note: the approximate horizons (see definition of GlobalVisState) are
* updated by the computations done here. That's currently required for
* correctness and a small optimization. Without doing so it's possible that
- * heap vacuum's call to heap_page_prune() uses a more conservative horizon
- * than later when deciding which tuples can be removed - which the code
- * doesn't expect (breaking HOT).
+ * heap vacuum's call to heap_page_prune_and_freeze() uses a more conservative
+ * horizon than later when deciding which tuples can be removed - which the
+ * code doesn't expect (breaking HOT).
*/
static void
ComputeXidHorizons(ComputeXidHorizonsResult *h)
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 2339abfd28a..45c4ae22e6a 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -195,7 +195,7 @@ typedef struct HeapPageFreeze
/*
* Per-page state returned from pruning
*/
-typedef struct PruneResult
+typedef struct PruneFreezeResult
{
int ndeleted; /* Number of tuples deleted from the page */
int nnewlpdead; /* Number of newly LP_DEAD items */
@@ -210,9 +210,10 @@ typedef struct PruneResult
/*
* Tuple visibility is only computed once for each tuple, for correctness
- * and efficiency reasons; see comment in heap_page_prune() for details.
- * This is of type int8[], instead of HTSV_Result[], so we can use -1 to
- * indicate no visibility has been computed, e.g. for LP_DEAD items.
+ * and efficiency reasons; see comment in heap_page_prune_and_freeze() for
+ * details. This is of type int8[], instead of HTSV_Result[], so we can
+ * use -1 to indicate no visibility has been computed, e.g. for LP_DEAD
+ * items.
*
* This needs to be MaxHeapTuplesPerPage + 1 long as FirstOffsetNumber is
* 1. Otherwise every access would need to subtract 1.
@@ -220,17 +221,18 @@ typedef struct PruneResult
int8 htsv[MaxHeapTuplesPerPage + 1];
- /*
- * One entry for every tuple that we may freeze.
- */
- HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
-} PruneResult;
+ /* New value of relfrozenxid found by heap_page_prune_and_freeze() */
+ TransactionId new_relfrozenxid;
+
+ /* New value of relminmxid found by heap_page_prune_and_freeze() */
+ MultiXactId new_relminmxid;
+} PruneFreezeResult;
/*
* Pruning calculates tuple visibility once and saves the results in an array
- * of int8. See PruneResult.htsv for details. This helper function is meant to
- * guard against examining visibility status array members which have not yet
- * been computed.
+ * of int8. See PruneFreezeResult.htsv for details. This helper function is
+ * meant to guard against examining visibility status array members which have
+ * not yet been computed.
*/
static inline HTSV_Result
htsv_get_valid_status(int status)
@@ -306,6 +308,9 @@ extern TM_Result heap_lock_tuple(Relation relation, HeapTuple tuple,
Buffer *buffer, struct TM_FailureData *tmfd);
extern void heap_inplace_update(Relation relation, HeapTuple tuple);
+
+extern TransactionId heap_frz_conflict_horizon(PruneFreezeResult *presult,
+ HeapPageFreeze *pagefrz);
extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
HeapPageFreeze *pagefrz,
HeapTupleFreeze *frz, bool *totally_frozen);
@@ -332,12 +337,12 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
/* in heap/pruneheap.c */
struct GlobalVisState;
extern void heap_page_prune_opt(Relation relation, Buffer buffer);
-extern void heap_page_prune(Relation relation, Buffer buffer,
- struct GlobalVisState *vistest,
- bool mark_unused_now,
- HeapPageFreeze *pagefrz,
- PruneResult *presult,
- OffsetNumber *off_loc);
+extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ struct GlobalVisState *vistest,
+ bool mark_unused_now,
+ HeapPageFreeze *pagefrz,
+ PruneFreezeResult *presult,
+ OffsetNumber *off_loc);
extern void heap_page_prune_execute(Buffer buffer,
OffsetNumber *redirected, int nredirected,
OffsetNumber *nowdead, int ndead,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index aa7a25b8f8c..1c1a4d305d6 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2175,7 +2175,7 @@ ProjectionPath
PromptInterruptContext
ProtocolVersion
PrsStorage
-PruneResult
+PruneFreezeResult
PruneState
PruneStepResult
PsqlScanCallbacks
--
2.40.1
v3-0008-Make-opp-freeze-heuristic-compatible-with-prune-f.patch
From 4a72b89445a3952e06a0648a7f0c7e6eba2f1edc Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sun, 7 Jan 2024 16:11:35 -0500
Subject: [PATCH v3 08/17] Make opp freeze heuristic compatible with
prune+freeze record
Once the prune and freeze records are combined, we will no longer be
able to use a test of whether or not pruning emitted an FPI to decide
whether or not to opportunistically freeze a freezable page.
While this heuristic should be improved, for now, approximate the
previous logic by keeping track of whether a hint bit FPI was emitted
during the visibility checks (possible when checksums are enabled) and
combining that with XLogCheckBufferNeedsBackup(). If we have just decided
to prune and modifying the buffer would require an FPI, pruning would
almost certainly have emitted one under the old logic.
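In code form, the new heuristic boils down to roughly the following. This is
a condensed, standalone sketch rather than the patch itself: the buffer and
WAL-usage checks are reduced to booleans, and the function name is made up
for illustration.

#include <stdbool.h>

/*
 * Illustrative sketch of the opportunistic-freeze decision. In the patch,
 * prune_would_fpi comes from XLogCheckBufferNeedsBackup() (only checked once
 * we have decided to prune) and hint_bit_fpi from comparing
 * pgWalUsage.wal_fpi before and after the visibility checks.
 */
static bool
should_freeze_page(bool freeze_required,      /* XID/MXID older than the cutoffs */
                   bool whole_page_freezable, /* page would be all-visible and all-frozen */
                   int nfrozen,               /* number of prepared freeze plans */
                   bool prune_would_fpi,
                   bool hint_bit_fpi)
{
    if (freeze_required)
        return true;

    /* Otherwise freeze only if it makes the page all-frozen and is cheap. */
    return whole_page_freezable && nfrozen > 0 &&
        (prune_would_fpi || hint_bit_fpi);
}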
---
src/backend/access/heap/pruneheap.c | 58 +++++++++++++++++++++--------
1 file changed, 43 insertions(+), 15 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index abf6bdb2d99..f164b7957ed 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -255,6 +255,10 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
PruneState prstate;
HeapTupleData tup;
bool do_freeze;
+ bool do_prune;
+ bool whole_page_freezable;
+ bool hint_bit_fpi;
+ bool prune_fpi = false;
int64 fpi_before = pgWalUsage.wal_fpi;
TransactionId frz_conflict_horizon = InvalidTransactionId;
@@ -424,6 +428,13 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
}
}
+ /*
+ * If checksums are enabled, heap_prune_satisfies_vacuum() may have caused
+ * an FPI to be emitted. Remember that, then reset fpi_before so the
+ * no-prune case can later detect a hint bit FPI of its own.
+ */
+ hint_bit_fpi = fpi_before != pgWalUsage.wal_fpi;
+ fpi_before = pgWalUsage.wal_fpi;
+
/*
* For vacuum, if the whole page will become frozen, we consider
* opportunistically freezing tuples. Dead tuples which will be removed by
@@ -481,11 +492,42 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (off_loc)
*off_loc = InvalidOffsetNumber;
+ do_prune = prstate.nredirected > 0 ||
+ prstate.ndead > 0 ||
+ prstate.nunused > 0;
+
+ /*
+ * Only incur overhead of checking if we will do an FPI if we might use
+ * the information.
+ */
+ if (do_prune && pagefrz)
+ prune_fpi = XLogCheckBufferNeedsBackup(buffer);
+
+ /* Is the whole page freezable? And is there something to freeze? */
+ whole_page_freezable = presult->all_visible_except_removable &&
+ presult->all_frozen;
+
+ /*
+ * Freeze the page when heap_prepare_freeze_tuple indicates that at least
+ * one XID/MXID from before FreezeLimit/MultiXactCutoff is present. Also
+ * freeze when pruning would generate an FPI or setting hint bits already
+ * did, if doing so means that we set the page all-frozen afterwards (might
+ * not happen until final heap pass).
+ *
+ * XXX: Previously, we knew whether pruning emitted an FPI by checking
+ * pgWalUsage.wal_fpi before and after pruning. Once the freeze and prune
+ * records are combined, that heuristic can no longer be used. The
+ * opportunistic freeze heuristic must be improved; however, for now, try
+ * to approximate it.
+ */
+
+ do_freeze = pagefrz &&
+ (pagefrz->freeze_required ||
+ (whole_page_freezable && presult->nfrozen > 0 && (prune_fpi || hint_bit_fpi)));
+
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
/* Have we found any prunable items? */
- if (prstate.nredirected > 0 || prstate.ndead > 0 || prstate.nunused > 0)
+ if (do_prune)
{
/*
* Apply the planned item changes, then repair page fragmentation, and
@@ -577,20 +619,6 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/* Record number of newly-set-LP_DEAD items for caller */
presult->nnewlpdead = prstate.ndead;
- /*
- * Freeze the page when heap_prepare_freeze_tuple indicates that at least
- * one XID/MXID from before FreezeLimit/MultiXactCutoff is present. Also
- * freeze when pruning generated an FPI, if doing so means that we set the
- * page all-frozen afterwards (might not happen until final heap pass).
- */
- if (pagefrz)
- do_freeze = pagefrz->freeze_required ||
- (presult->all_visible_except_removable && presult->all_frozen &&
- presult->nfrozen > 0 &&
- fpi_before != pgWalUsage.wal_fpi);
- else
- do_freeze = false;
-
if (do_freeze)
{
frz_conflict_horizon = heap_frz_conflict_horizon(presult, pagefrz);
--
2.40.1
v3-0009-Separate-tuple-pre-freeze-checks-and-invoke-earli.patch
From 07e6620bc79878c1f6a2eed4dd6b338045cd9b40 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sun, 7 Jan 2024 16:53:45 -0500
Subject: [PATCH v3 09/17] Separate tuple pre freeze checks and invoke earlier
When combining the prune and freeze records, their critical sections will
have to be combined. heap_freeze_execute_prepared() does a set of
pre-freeze validations before starting its critical section. Move these
validations into a helper function, heap_pre_freeze_checks(), and invoke it
in heap_page_prune_and_freeze() before the pruning critical section.
Also move up the calculation of the freeze snapshot conflict horizon.
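Schematically, the resulting ordering inside heap_page_prune_and_freeze()
looks like the sketch below (standalone and simplified; the helper names are
stand-ins, not the real routines). The point is that the pg_xact lookups,
which can raise an ERROR, stay outside the critical section, where an ERROR
would be escalated to PANIC.

#include <stdbool.h>

/* Stand-ins for the PostgreSQL routines involved; illustrative only. */
static void pre_freeze_checks(void) {}        /* xmin/xmax sanity checks, pg_xact lookups */
static void compute_conflict_horizon(void) {} /* freeze snapshotConflictHorizon */
static void start_crit_section(void) {}
static void apply_changes_and_wal_log(void) {}
static void end_crit_section(void) {}

static void
prune_and_freeze_ordering(bool do_freeze)
{
    /* Anything that might fail runs before the critical section. */
    if (do_freeze)
    {
        pre_freeze_checks();
        compute_conflict_horizon();
    }

    start_crit_section();
    apply_changes_and_wal_log();
    end_crit_section();
}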
---
src/backend/access/heap/heapam.c | 58 ++++++++++++++++-------------
src/backend/access/heap/pruneheap.c | 8 +++-
src/include/access/heapam.h | 3 ++
3 files changed, 42 insertions(+), 27 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 7261c4988d7..16e3f2520a4 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6659,35 +6659,19 @@ heap_execute_freeze_tuple(HeapTupleHeader tuple, HeapTupleFreeze *frz)
}
/*
- * heap_freeze_execute_prepared
- *
- * Executes freezing of one or more heap tuples on a page on behalf of caller.
- * Caller passes an array of tuple plans from heap_prepare_freeze_tuple.
- * Caller must set 'offset' in each plan for us. Note that we destructively
- * sort caller's tuples array in-place, so caller had better be done with it.
- *
- * WAL-logs the changes so that VACUUM can advance the rel's relfrozenxid
- * later on without any risk of unsafe pg_xact lookups, even following a hard
- * crash (or when querying from a standby). We represent freezing by setting
- * infomask bits in tuple headers, but this shouldn't be thought of as a hint.
- * See section on buffer access rules in src/backend/storage/buffer/README.
- */
+ * Perform xmin/xmax XID status sanity checks before calling
+ * heap_freeze_execute_prepared().
+ *
+ * heap_prepare_freeze_tuple doesn't perform these checks directly because
+ * pg_xact lookups are relatively expensive. They shouldn't be repeated
+ * by successive VACUUMs that each decide against freezing the same page.
+ */
void
-heap_freeze_execute_prepared(Relation rel, Buffer buffer,
- TransactionId snapshotConflictHorizon,
- HeapTupleFreeze *tuples, int ntuples)
+heap_pre_freeze_checks(Buffer buffer,
+ HeapTupleFreeze *tuples, int ntuples)
{
Page page = BufferGetPage(buffer);
- Assert(ntuples > 0);
-
- /*
- * Perform xmin/xmax XID status sanity checks before critical section.
- *
- * heap_prepare_freeze_tuple doesn't perform these checks directly because
- * pg_xact lookups are relatively expensive. They shouldn't be repeated
- * by successive VACUUMs that each decide against freezing the same page.
- */
for (int i = 0; i < ntuples; i++)
{
HeapTupleFreeze *frz = tuples + i;
@@ -6726,6 +6710,30 @@ heap_freeze_execute_prepared(Relation rel, Buffer buffer,
xmax)));
}
}
+}
+
+/*
+ * heap_freeze_execute_prepared
+ *
+ * Executes freezing of one or more heap tuples on a page on behalf of caller.
+ * Caller passes an array of tuple plans from heap_prepare_freeze_tuple.
+ * Caller must set 'offset' in each plan for us. Note that we destructively
+ * sort caller's tuples array in-place, so caller had better be done with it.
+ *
+ * WAL-logs the changes so that VACUUM can advance the rel's relfrozenxid
+ * later on without any risk of unsafe pg_xact lookups, even following a hard
+ * crash (or when querying from a standby). We represent freezing by setting
+ * infomask bits in tuple headers, but this shouldn't be thought of as a hint.
+ * See section on buffer access rules in src/backend/storage/buffer/README.
+ */
+void
+heap_freeze_execute_prepared(Relation rel, Buffer buffer,
+ TransactionId snapshotConflictHorizon,
+ HeapTupleFreeze *tuples, int ntuples)
+{
+ Page page = BufferGetPage(buffer);
+
+ Assert(ntuples > 0);
START_CRIT_SECTION();
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index f164b7957ed..bc0a23da61b 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -523,6 +523,12 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
(pagefrz->freeze_required ||
(whole_page_freezable && presult->nfrozen > 0 && (prune_fpi || hint_bit_fpi)));
+ if (do_freeze)
+ {
+ heap_pre_freeze_checks(buffer, frozen, presult->nfrozen);
+ frz_conflict_horizon = heap_frz_conflict_horizon(presult, pagefrz);
+ }
+
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
@@ -621,8 +627,6 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (do_freeze)
{
- frz_conflict_horizon = heap_frz_conflict_horizon(presult, pagefrz);
-
/* Execute all freeze plans for page as a single atomic action */
heap_freeze_execute_prepared(relation, buffer,
frz_conflict_horizon,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 45c4ae22e6a..dffbbd3cd3e 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -314,6 +314,9 @@ extern TransactionId heap_frz_conflict_horizon(PruneFreezeResult *presult,
extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
HeapPageFreeze *pagefrz,
HeapTupleFreeze *frz, bool *totally_frozen);
+
+extern void heap_pre_freeze_checks(Buffer buffer,
+ HeapTupleFreeze *tuples, int ntuples);
extern void heap_freeze_execute_prepared(Relation rel, Buffer buffer,
TransactionId snapshotConflictHorizon,
HeapTupleFreeze *tuples, int ntuples);
--
2.40.1
v3-0010-Inline-heap_freeze_execute_prepared.patch
From 4554c87895dbc0566b345ad590af0fc033142f28 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sun, 7 Jan 2024 17:03:17 -0500
Subject: [PATCH v3 10/17] Inline heap_freeze_execute_prepared()
In order to merge freeze and prune records, the execution of tuple
freezing and the WAL logging of the changes to the page must be
separated so that the WAL logging can be combined with prune WAL
logging. This commit makes a helper for the tuple freezing and then
inlines the contents of heap_freeze_execute_prepared() where it is
called in heap_page_prune_and_freeze(). The original function,
heap_freeze_execute_prepared(), is retained because the "no prune" case
in heap_page_prune_and_freeze() must still be able to emit a freeze record.
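The shape of the split is roughly the following (a standalone sketch with
made-up placeholder types and names; the real code operates on
HeapTupleFreeze plans and the buffer):

/* Placeholder for the per-tuple plan built by heap_prepare_freeze_tuple(). */
typedef struct
{
    int offset;              /* item offset the plan applies to */
} freeze_plan;

/* New helper: apply the prepared plans to the page, and nothing else. */
static void
freeze_prepared_tuples(freeze_plan *plans, int ntuples)
{
    for (int i = 0; i < ntuples; i++)
    {
        (void) plans[i].offset; /* rewrite the tuple header at this offset per the plan */
    }
}

/*
 * Retained entry point: callers that are not pruning keep the old behavior
 * of applying the plans and then emitting a standalone freeze record.
 */
static void
freeze_execute_prepared(freeze_plan *plans, int ntuples)
{
    freeze_prepared_tuples(plans, ntuples);
    /* mark the buffer dirty and WAL-log a standalone freeze record here */
}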
---
src/backend/access/heap/heapam.c | 61 +++++++++++++++++------------
src/backend/access/heap/pruneheap.c | 51 ++++++++++++++++++++++--
src/include/access/heapam.h | 8 ++++
3 files changed, 90 insertions(+), 30 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 16e3f2520a4..a3691584c55 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -91,9 +91,6 @@ static void compute_new_xmax_infomask(TransactionId xmax, uint16 old_infomask,
static TM_Result heap_lock_updated_tuple(Relation rel, HeapTuple tuple,
ItemPointer ctid, TransactionId xid,
LockTupleMode mode);
-static int heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
- xl_heap_freeze_plan *plans_out,
- OffsetNumber *offsets_out);
static void GetMultiXactIdHintBits(MultiXactId multi, uint16 *new_infomask,
uint16 *new_infomask2);
static TransactionId MultiXactIdGetUpdateXid(TransactionId xmax,
@@ -6713,30 +6710,17 @@ heap_pre_freeze_checks(Buffer buffer,
}
/*
- * heap_freeze_execute_prepared
- *
- * Executes freezing of one or more heap tuples on a page on behalf of caller.
- * Caller passes an array of tuple plans from heap_prepare_freeze_tuple.
- * Caller must set 'offset' in each plan for us. Note that we destructively
- * sort caller's tuples array in-place, so caller had better be done with it.
- *
- * WAL-logs the changes so that VACUUM can advance the rel's relfrozenxid
- * later on without any risk of unsafe pg_xact lookups, even following a hard
- * crash (or when querying from a standby). We represent freezing by setting
- * infomask bits in tuple headers, but this shouldn't be thought of as a hint.
- * See section on buffer access rules in src/backend/storage/buffer/README.
+ * Helper which executes freezing of one or more heap tuples on a page on
+ * behalf of caller. Caller passes an array of tuple plans from
+ * heap_prepare_freeze_tuple. Caller must set 'offset' in each plan for us.
+ * Must be called in a critical section that also marks the buffer dirty and,
+ * if needed, emits WAL.
*/
void
-heap_freeze_execute_prepared(Relation rel, Buffer buffer,
- TransactionId snapshotConflictHorizon,
- HeapTupleFreeze *tuples, int ntuples)
+heap_freeze_prepared_tuples(Buffer buffer, HeapTupleFreeze *tuples, int ntuples)
{
Page page = BufferGetPage(buffer);
- Assert(ntuples > 0);
-
- START_CRIT_SECTION();
-
for (int i = 0; i < ntuples; i++)
{
HeapTupleFreeze *frz = tuples + i;
@@ -6746,6 +6730,29 @@ heap_freeze_execute_prepared(Relation rel, Buffer buffer,
htup = (HeapTupleHeader) PageGetItem(page, itemid);
heap_execute_freeze_tuple(htup, frz);
}
+}
+
+/*
+ * heap_freeze_execute_prepared
+ *
+ * Executes freezing of prepared tuples and WAL-logs the changes so that VACUUM
+ * can advance the rel's relfrozenxid later on without any risk of unsafe
+ * pg_xact lookups, even following a hard crash (or when querying from a
+ * standby). We represent freezing by setting infomask bits in tuple headers,
+ * but this shouldn't be thought of as a hint. See section on buffer access
+ * rules in src/backend/storage/buffer/README. Must be called from within a
+ * critical section.
+ */
+void
+heap_freeze_execute_prepared(Relation rel, Buffer buffer,
+ TransactionId snapshotConflictHorizon,
+ HeapTupleFreeze *tuples, int ntuples)
+{
+ Page page = BufferGetPage(buffer);
+
+ Assert(ntuples > 0);
+
+ heap_freeze_prepared_tuples(buffer, tuples, ntuples);
MarkBufferDirty(buffer);
@@ -6758,7 +6765,11 @@ heap_freeze_execute_prepared(Relation rel, Buffer buffer,
xl_heap_freeze_page xlrec;
XLogRecPtr recptr;
- /* Prepare deduplicated representation for use in WAL record */
+ /*
+ * Prepare deduplicated representation for use in WAL record.
+ * Destructively sorts tuples array in-place, so caller had better be
+ * done with it.
+ */
nplans = heap_log_freeze_plan(tuples, ntuples, plans, offsets);
xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
@@ -6783,8 +6794,6 @@ heap_freeze_execute_prepared(Relation rel, Buffer buffer,
PageSetLSN(page, recptr);
}
-
- END_CRIT_SECTION();
}
/*
@@ -6874,7 +6883,7 @@ heap_log_freeze_new_plan(xl_heap_freeze_plan *plan, HeapTupleFreeze *frz)
* (actually there is one array per freeze plan, but that's not of immediate
* concern to our caller).
*/
-static int
+int
heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
xl_heap_freeze_plan *plans_out,
OffsetNumber *offsets_out)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index bc0a23da61b..d4356e0bce9 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -627,10 +627,53 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (do_freeze)
{
- /* Execute all freeze plans for page as a single atomic action */
- heap_freeze_execute_prepared(relation, buffer,
- frz_conflict_horizon,
- frozen, presult->nfrozen);
+ START_CRIT_SECTION();
+
+ Assert(presult->nfrozen > 0);
+
+ heap_freeze_prepared_tuples(buffer, frozen, presult->nfrozen);
+
+ MarkBufferDirty(buffer);
+
+ /* Now WAL-log freezing if necessary */
+ if (RelationNeedsWAL(relation))
+ {
+ xl_heap_freeze_plan plans[MaxHeapTuplesPerPage];
+ OffsetNumber offsets[MaxHeapTuplesPerPage];
+ int nplans;
+ xl_heap_freeze_page xlrec;
+ XLogRecPtr recptr;
+
+ /*
+ * Prepare deduplicated representation for use in WAL record.
+ * Destructively sorts tuples array in-place.
+ */
+ nplans = heap_log_freeze_plan(frozen, presult->nfrozen, plans, offsets);
+
+ xlrec.snapshotConflictHorizon = frz_conflict_horizon;
+ xlrec.isCatalogRel = RelationIsAccessibleInLogicalDecoding(relation);
+ xlrec.nplans = nplans;
+
+ XLogBeginInsert();
+ XLogRegisterData((char *) &xlrec, SizeOfHeapFreezePage);
+
+ /*
+ * The freeze plan array and offset array are not actually in the
+ * buffer, but pretend that they are. When XLogInsert stores the
+ * whole buffer, the arrays need not be stored too.
+ */
+ XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
+ XLogRegisterBufData(0, (char *) plans,
+ nplans * sizeof(xl_heap_freeze_plan));
+ XLogRegisterBufData(0, (char *) offsets,
+ presult->nfrozen * sizeof(OffsetNumber));
+
+ recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_FREEZE_PAGE);
+
+ PageSetLSN(page, recptr);
+ }
+
+ END_CRIT_SECTION();
}
else if (!pagefrz || !presult->all_frozen || presult->nfrozen > 0)
{
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index dffbbd3cd3e..8a6bc071345 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -14,6 +14,7 @@
#ifndef HEAPAM_H
#define HEAPAM_H
+#include "access/heapam_xlog.h"
#include "access/relation.h" /* for backward compatibility */
#include "access/relscan.h"
#include "access/sdir.h"
@@ -320,9 +321,16 @@ extern void heap_pre_freeze_checks(Buffer buffer,
extern void heap_freeze_execute_prepared(Relation rel, Buffer buffer,
TransactionId snapshotConflictHorizon,
HeapTupleFreeze *tuples, int ntuples);
+
+extern void heap_freeze_prepared_tuples(Buffer buffer,
+ HeapTupleFreeze *tuples, int ntuples);
extern bool heap_freeze_tuple(HeapTupleHeader tuple,
TransactionId relfrozenxid, TransactionId relminmxid,
TransactionId FreezeLimit, TransactionId MultiXactCutoff);
+
+extern int heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
+ xl_heap_freeze_plan *plans_out,
+ OffsetNumber *offsets_out);
extern bool heap_tuple_should_freeze(HeapTupleHeader tuple,
const struct VacuumCutoffs *cutoffs,
TransactionId *NoFreezePageRelfrozenXid,
--
2.40.1
v3-0011-Exit-heap_page_prune-early-if-no-prune.patch
From b8a3a4b7bc76a67ba8c9d132d1efd844862bb3dc Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sun, 7 Jan 2024 17:42:05 -0500
Subject: [PATCH v3 11/17] Exit heap_page_prune() early if no prune
If there is nothing to be pruned on the page, heap_page_prune_and_freeze()
will consider whether or not to update the page's pd_prune_xid and whether
or not to freeze the page. In this case, if we decide to freeze the page,
we will need to emit a freeze record.
Future commits will emit a combined freeze+prune record for cases in which
we are both pruning and freezing. In the no-prune case, we are done with
heap_page_prune_and_freeze() after checking whether or not to set
pd_prune_xid. By reordering the prune and no-prune cases so that the
no-prune case comes first, we can exit early in the no-prune case. This
lets us reduce the indentation level of the remaining code and avoid
having to check whether or not we are, in fact, pruning.
Since we now exit early in the no-prune case, we must set nfrozen and
all_frozen to their final values before executing pruning or freezing.
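The resulting control flow is roughly the shape sketched below (standalone
and simplified; the helper names are stand-ins for the code paths described
above, not real functions):

#include <stdbool.h>

static void handle_no_prune_case(void) {}   /* pd_prune_xid hint, maybe a standalone freeze */
static void prune_freeze_and_wal_log(void) {}
static void report_new_frozenxids(void) {}

static void
prune_and_freeze_shape(bool do_prune, bool caller_wants_freeze_info)
{
    if (!do_prune)
    {
        handle_no_prune_case();
        goto update_frozenxids;
    }

    /* The prune (and freeze) path now sits at the top indentation level. */
    prune_freeze_and_wal_log();

update_frozenxids:
    if (!caller_wants_freeze_info)
        return;
    report_new_frozenxids();
}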
---
src/backend/access/heap/pruneheap.c | 195 ++++++++++++++++------------
1 file changed, 111 insertions(+), 84 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index d4356e0bce9..d77270ad0d6 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -528,80 +528,27 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
heap_pre_freeze_checks(buffer, frozen, presult->nfrozen);
frz_conflict_horizon = heap_frz_conflict_horizon(presult, pagefrz);
}
-
- /* Any error while applying the changes is critical */
- START_CRIT_SECTION();
-
- /* Have we found any prunable items? */
- if (do_prune)
+ else if (!pagefrz || !presult->all_frozen || presult->nfrozen > 0)
{
/*
- * Apply the planned item changes, then repair page fragmentation, and
- * update the page's hint bit about whether it has free line pointers.
- */
- heap_page_prune_execute(buffer,
- prstate.redirected, prstate.nredirected,
- prstate.nowdead, prstate.ndead,
- prstate.nowunused, prstate.nunused);
-
- /*
- * Update the page's pd_prune_xid field to either zero, or the lowest
- * XID of any soon-prunable tuple.
- */
- ((PageHeader) page)->pd_prune_xid = prstate.new_prune_xid;
-
- /*
- * Also clear the "page is full" flag, since there's no point in
- * repeating the prune/defrag process until something else happens to
- * the page.
- */
- PageClearFull(page);
-
- MarkBufferDirty(buffer);
-
- /*
- * Emit a WAL XLOG_HEAP2_PRUNE record showing what we did
+ * If we will neither freeze tuples on the page nor set the page all
+ * frozen in the visibility map, the page is not all frozen and there
+ * will be no newly frozen tuples.
*/
- if (RelationNeedsWAL(relation))
- {
- xl_heap_prune xlrec;
- XLogRecPtr recptr;
-
- xlrec.isCatalogRel = RelationIsAccessibleInLogicalDecoding(relation);
- xlrec.snapshotConflictHorizon = prstate.snapshotConflictHorizon;
- xlrec.nredirected = prstate.nredirected;
- xlrec.ndead = prstate.ndead;
-
- XLogBeginInsert();
- XLogRegisterData((char *) &xlrec, SizeOfHeapPrune);
-
- XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
-
- /*
- * The OffsetNumber arrays are not actually in the buffer, but we
- * pretend that they are. When XLogInsert stores the whole
- * buffer, the offset arrays need not be stored too.
- */
- if (prstate.nredirected > 0)
- XLogRegisterBufData(0, (char *) prstate.redirected,
- prstate.nredirected *
- sizeof(OffsetNumber) * 2);
-
- if (prstate.ndead > 0)
- XLogRegisterBufData(0, (char *) prstate.nowdead,
- prstate.ndead * sizeof(OffsetNumber));
+ presult->all_frozen = false;
+ presult->nfrozen = 0; /* avoid miscounts in instrumentation */
+ }
- if (prstate.nunused > 0)
- XLogRegisterBufData(0, (char *) prstate.nowunused,
- prstate.nunused * sizeof(OffsetNumber));
+ /* Record number of newly-set-LP_DEAD items for caller */
+ presult->nnewlpdead = prstate.ndead;
- recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_PRUNE);
- PageSetLSN(BufferGetPage(buffer), recptr);
- }
- }
- else
+ /* Have we found any prunable items? */
+ if (!do_prune)
{
+ /* Any error while applying the changes is critical */
+ START_CRIT_SECTION();
+
/*
* If we didn't prune anything, but have found a new value for the
* pd_prune_xid field, update it and mark the buffer dirty. This is
@@ -618,17 +565,105 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
PageClearFull(page);
MarkBufferDirtyHint(buffer, true);
}
+
+ hint_bit_fpi = fpi_before != pgWalUsage.wal_fpi;
+
+ /*
+ * We may have decided not to opportunistically freeze above because
+ * pruning would not emit an FPI. Now, however, if checksums are
+ * enabled, setting the hint bit may have emitted an FPI. Check again
+ * if we should freeze.
+ */
+ if (!do_freeze && hint_bit_fpi)
+ do_freeze = pagefrz &&
+ (pagefrz->freeze_required ||
+ (whole_page_freezable && presult->nfrozen > 0));
+
+ if (do_freeze)
+ {
+ heap_freeze_execute_prepared(relation, buffer,
+ frz_conflict_horizon,
+ frozen, presult->nfrozen);
+ }
+ else if (!pagefrz || !presult->all_frozen || presult->nfrozen > 0)
+ {
+ presult->all_frozen = false;
+ presult->nfrozen = 0;
+ }
+
+ END_CRIT_SECTION();
+
+ goto update_frozenxids;
}
- END_CRIT_SECTION();
+ START_CRIT_SECTION();
- /* Record number of newly-set-LP_DEAD items for caller */
- presult->nnewlpdead = prstate.ndead;
+ /*
+ * Apply the planned item changes, then repair page fragmentation, and
+ * update the page's hint bit about whether it has free line pointers.
+ */
+ heap_page_prune_execute(buffer,
+ prstate.redirected, prstate.nredirected,
+ prstate.nowdead, prstate.ndead,
+ prstate.nowunused, prstate.nunused);
- if (do_freeze)
+ /*
+ * Update the page's pd_prune_xid field to either zero, or the lowest XID
+ * of any soon-prunable tuple.
+ */
+ ((PageHeader) page)->pd_prune_xid = prstate.new_prune_xid;
+
+ /*
+ * Also clear the "page is full" flag, since there's no point in repeating
+ * the prune/defrag process until something else happens to the page.
+ */
+ PageClearFull(page);
+
+ MarkBufferDirty(buffer);
+
+ /*
+ * Emit a WAL XLOG_HEAP2_PRUNE record showing what we did
+ */
+ if (RelationNeedsWAL(relation))
{
- START_CRIT_SECTION();
+ xl_heap_prune xlrec;
+ XLogRecPtr recptr;
+
+ xlrec.isCatalogRel = RelationIsAccessibleInLogicalDecoding(relation);
+ xlrec.snapshotConflictHorizon = prstate.snapshotConflictHorizon;
+ xlrec.nredirected = prstate.nredirected;
+ xlrec.ndead = prstate.ndead;
+
+ XLogBeginInsert();
+ XLogRegisterData((char *) &xlrec, SizeOfHeapPrune);
+
+ XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
+ /*
+ * The OffsetNumber arrays are not actually in the buffer, but we
+ * pretend that they are. When XLogInsert stores the whole buffer,
+ * the offset arrays need not be stored too.
+ */
+ if (prstate.nredirected > 0)
+ XLogRegisterBufData(0, (char *) prstate.redirected,
+ prstate.nredirected *
+ sizeof(OffsetNumber) * 2);
+
+ if (prstate.ndead > 0)
+ XLogRegisterBufData(0, (char *) prstate.nowdead,
+ prstate.ndead * sizeof(OffsetNumber));
+
+ if (prstate.nunused > 0)
+ XLogRegisterBufData(0, (char *) prstate.nowunused,
+ prstate.nunused * sizeof(OffsetNumber));
+
+ recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_PRUNE);
+
+ PageSetLSN(BufferGetPage(buffer), recptr);
+ }
+
+ if (do_freeze)
+ {
Assert(presult->nfrozen > 0);
heap_freeze_prepared_tuples(buffer, frozen, presult->nfrozen);
@@ -672,20 +707,12 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
PageSetLSN(page, recptr);
}
-
- END_CRIT_SECTION();
- }
- else if (!pagefrz || !presult->all_frozen || presult->nfrozen > 0)
- {
- /*
- * If we will neither freeze tuples on the page nor set the page all
- * frozen in the visibility map, the page is not all frozen and there
- * will be no newly frozen tuples.
- */
- presult->all_frozen = false;
- presult->nfrozen = 0; /* avoid miscounts in instrumentation */
}
+ END_CRIT_SECTION();
+
+update_frozenxids:
+
/* Caller won't update new_relfrozenxid and new_relminmxid */
if (!pagefrz)
return;
--
2.40.1
v3-0012-Merge-prune-and-freeze-records.patch
From 4d8db3931ea86f1cb17e0687cc53e38824887cea Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sun, 7 Jan 2024 17:55:31 -0500
Subject: [PATCH v3 12/17] Merge prune and freeze records
When there are both tuples to prune and tuples to freeze on a page, emit a
single, combined prune record containing the offsets for pruning and the
freeze plans and offsets for freezing. This reduces the number of WAL
records emitted.
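For reference, the per-buffer payload of the combined record can be walked
like this (a standalone sketch mirroring the order the redo routine uses;
the types here are simplified stand-ins for xl_heap_freeze_plan and
OffsetNumber):

#include <stdint.h>

typedef uint16_t offset_number;     /* stand-in for OffsetNumber */

typedef struct                      /* stand-in for xl_heap_freeze_plan */
{
    uint32_t xmax;
    uint16_t t_infomask2;
    uint16_t t_infomask;
    uint8_t  frzflags;
    uint16_t ntuples;               /* number of frz_offsets entries for this plan */
} freeze_plan;

/*
 * Layout of block reference 0: freeze plans, then redirect pairs, then dead
 * and unused offsets, and finally one offset per frozen tuple, grouped by
 * plan.
 */
static void
walk_prune_freeze_payload(char *data, int nplans, int nredirected,
                          int ndead, int nunused)
{
    freeze_plan   *plans = (freeze_plan *) data;
    offset_number *redirected = (offset_number *) &plans[nplans];
    offset_number *nowdead = redirected + 2 * nredirected;
    offset_number *nowunused = nowdead + ndead;
    offset_number *frz_offsets = nowunused + nunused;
    int            curoff = 0;

    for (int p = 0; p < nplans; p++)
        for (int i = 0; i < plans[p].ntuples; i++)
            (void) frz_offsets[curoff++];   /* apply plans[p] at this offset */
}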
---
src/backend/access/heap/heapam.c | 42 ++++++++++++--
src/backend/access/heap/pruneheap.c | 85 +++++++++++++----------------
src/include/access/heapam_xlog.h | 20 +++++--
3 files changed, 90 insertions(+), 57 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index a3691584c55..a8f35eba3c9 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -8803,24 +8803,28 @@ heap_xlog_prune(XLogReaderState *record)
if (action == BLK_NEEDS_REDO)
{
Page page = (Page) BufferGetPage(buffer);
- OffsetNumber *end;
OffsetNumber *redirected;
OffsetNumber *nowdead;
OffsetNumber *nowunused;
int nredirected;
int ndead;
int nunused;
+ int nplans;
Size datalen;
+ xl_heap_freeze_plan *plans;
+ OffsetNumber *frz_offsets;
+ int curoff = 0;
- redirected = (OffsetNumber *) XLogRecGetBlockData(record, 0, &datalen);
-
+ nplans = xlrec->nplans;
nredirected = xlrec->nredirected;
ndead = xlrec->ndead;
- end = (OffsetNumber *) ((char *) redirected + datalen);
+ nunused = xlrec->nunused;
+
+ plans = (xl_heap_freeze_plan *) XLogRecGetBlockData(record, 0, &datalen);
+ redirected = (OffsetNumber *) &plans[nplans];
nowdead = redirected + (nredirected * 2);
nowunused = nowdead + ndead;
- nunused = (end - nowunused);
- Assert(nunused >= 0);
+ frz_offsets = nowunused + nunused;
/* Update all line pointers per the record, and repair fragmentation */
heap_page_prune_execute(buffer,
@@ -8828,6 +8832,32 @@ heap_xlog_prune(XLogReaderState *record)
nowdead, ndead,
nowunused, nunused);
+ for (int p = 0; p < nplans; p++)
+ {
+ HeapTupleFreeze frz;
+
+ /*
+ * Convert freeze plan representation from WAL record into
+ * per-tuple format used by heap_execute_freeze_tuple
+ */
+ frz.xmax = plans[p].xmax;
+ frz.t_infomask2 = plans[p].t_infomask2;
+ frz.t_infomask = plans[p].t_infomask;
+ frz.frzflags = plans[p].frzflags;
+ frz.offset = InvalidOffsetNumber; /* unused, but be tidy */
+
+ for (int i = 0; i < plans[p].ntuples; i++)
+ {
+ OffsetNumber offset = frz_offsets[curoff++];
+ ItemId lp;
+ HeapTupleHeader tuple;
+
+ lp = PageGetItemId(page, offset);
+ tuple = (HeapTupleHeader) PageGetItem(page, lp);
+ heap_execute_freeze_tuple(tuple, &frz);
+ }
+ }
+
/*
* Note: we don't worry about updating the page's prunability hints.
* At worst this will cause an extra prune cycle to occur soon.
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index d77270ad0d6..994cf75c54e 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -619,6 +619,9 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*/
PageClearFull(page);
+ if (do_freeze)
+ heap_freeze_prepared_tuples(buffer, frozen, presult->nfrozen);
+
MarkBufferDirty(buffer);
/*
@@ -629,10 +632,37 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
xl_heap_prune xlrec;
XLogRecPtr recptr;
+ xl_heap_freeze_plan plans[MaxHeapTuplesPerPage];
+ OffsetNumber offsets[MaxHeapTuplesPerPage];
+
xlrec.isCatalogRel = RelationIsAccessibleInLogicalDecoding(relation);
- xlrec.snapshotConflictHorizon = prstate.snapshotConflictHorizon;
xlrec.nredirected = prstate.nredirected;
xlrec.ndead = prstate.ndead;
+ xlrec.nunused = prstate.nunused;
+ xlrec.nplans = 0;
+
+ /*
+ * The snapshotConflictHorizon for the whole record should be the most
+ * conservative of all the horizons calculated for any of the possible
+ * modifications. If this record will prune tuples, any standby
+ * transactions older than the newest xmax among the tuples it removes
+ * will conflict. If this record will freeze tuples, any standby
+ * transactions with xids older than the newest xid among the tuples it
+ * freezes will conflict.
+ */
+ if (do_freeze)
+ xlrec.snapshotConflictHorizon = Max(prstate.snapshotConflictHorizon,
+ frz_conflict_horizon);
+ else
+ xlrec.snapshotConflictHorizon = prstate.snapshotConflictHorizon;
+
+ /*
+ * Prepare deduplicated representation for use in WAL record.
+ * Destructively sorts tuples array in-place.
+ */
+ if (do_freeze)
+ xlrec.nplans = heap_log_freeze_plan(frozen,
+ presult->nfrozen, plans, offsets);
XLogBeginInsert();
XLogRegisterData((char *) &xlrec, SizeOfHeapPrune);
@@ -644,6 +674,10 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* pretend that they are. When XLogInsert stores the whole buffer,
* the offset arrays need not be stored too.
*/
+ if (xlrec.nplans > 0)
+ XLogRegisterBufData(0, (char *) plans,
+ xlrec.nplans * sizeof(xl_heap_freeze_plan));
+
if (prstate.nredirected > 0)
XLogRegisterBufData(0, (char *) prstate.redirected,
prstate.nredirected *
@@ -657,56 +691,13 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
XLogRegisterBufData(0, (char *) prstate.nowunused,
prstate.nunused * sizeof(OffsetNumber));
- recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_PRUNE);
-
- PageSetLSN(BufferGetPage(buffer), recptr);
- }
-
- if (do_freeze)
- {
- Assert(presult->nfrozen > 0);
-
- heap_freeze_prepared_tuples(buffer, frozen, presult->nfrozen);
-
- MarkBufferDirty(buffer);
-
- /* Now WAL-log freezing if necessary */
- if (RelationNeedsWAL(relation))
- {
- xl_heap_freeze_plan plans[MaxHeapTuplesPerPage];
- OffsetNumber offsets[MaxHeapTuplesPerPage];
- int nplans;
- xl_heap_freeze_page xlrec;
- XLogRecPtr recptr;
-
- /*
- * Prepare deduplicated representation for use in WAL record
- * Destructively sorts tuples array in-place.
- */
- nplans = heap_log_freeze_plan(frozen, presult->nfrozen, plans, offsets);
-
- xlrec.snapshotConflictHorizon = frz_conflict_horizon;
- xlrec.isCatalogRel = RelationIsAccessibleInLogicalDecoding(relation);
- xlrec.nplans = nplans;
-
- XLogBeginInsert();
- XLogRegisterData((char *) &xlrec, SizeOfHeapFreezePage);
-
- /*
- * The freeze plan array and offset array are not actually in the
- * buffer, but pretend that they are. When XLogInsert stores the
- * whole buffer, the arrays need not be stored too.
- */
- XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
- XLogRegisterBufData(0, (char *) plans,
- nplans * sizeof(xl_heap_freeze_plan));
+ if (xlrec.nplans > 0)
XLogRegisterBufData(0, (char *) offsets,
presult->nfrozen * sizeof(OffsetNumber));
- recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_FREEZE_PAGE);
+ recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_PRUNE);
- PageSetLSN(page, recptr);
- }
+ PageSetLSN(BufferGetPage(buffer), recptr);
}
END_CRIT_SECTION();
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 6488dad5e64..22f236bb52a 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -231,23 +231,35 @@ typedef struct xl_heap_update
* during opportunistic pruning)
*
* The array of OffsetNumbers following the fixed part of the record contains:
+ * * for each freeze plan: the freeze plan
* * for each redirected item: the item offset, then the offset redirected to
* * for each now-dead item: the item offset
* * for each now-unused item: the item offset
- * The total number of OffsetNumbers is therefore 2*nredirected+ndead+nunused.
- * Note that nunused is not explicitly stored, but may be found by reference
- * to the total record length.
+ * * for each tuple frozen by the freeze plans: the offset of the item corresponding to that tuple
+ * The total number of OffsetNumbers is therefore
+ * (2*nredirected) + ndead + nunused + (sum[plan.ntuples for plan in plans])
*
* Acquires a full cleanup lock.
*/
typedef struct xl_heap_prune
{
TransactionId snapshotConflictHorizon;
+ uint16 nplans;
uint16 nredirected;
uint16 ndead;
+ uint16 nunused;
bool isCatalogRel; /* to handle recovery conflict during logical
* decoding on standby */
- /* OFFSET NUMBERS are in the block reference 0 */
+ /*
+ * OFFSET NUMBERS and freeze plans are in the block reference 0 in the
+ * following order:
+ *
+ * * xl_heap_freeze_plan plans[nplans];
+ * * OffsetNumber redirected[2 * nredirected];
+ * * OffsetNumber nowdead[ndead];
+ * * OffsetNumber nowunused[nunused];
+ * * OffsetNumber frz_offsets[...];
+ */
} xl_heap_prune;
#define SizeOfHeapPrune (offsetof(xl_heap_prune, isCatalogRel) + sizeof(bool))
--
2.40.1
Attachment: v3-0013-Set-hastup-in-heap_page_prune.patch (text/x-diff)
From 46dc299c2bc878799e2c56ed4c240d5c5284b986 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 22 Jan 2024 14:53:36 -0500
Subject: [PATCH v3 13/17] Set hastup in heap_page_prune
lazy_scan_prune() loops through the line pointers and tuple visibility
information for each tuple on a page, setting hastup to true if there
are any LP_REDIRECT line pointers or tuples with storage which will not
be removed. We want to remove this extra loop from lazy_scan_prune(),
and we know about non-removable tuples during heap_page_prune() anyway.
Set hastup when recording LP_REDIRECT line pointers in
heap_prune_chain() and when LP_NORMAL line pointers refer to tuples
whose visibility status is not HEAPTUPLE_DEAD.
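To illustrate the rule being moved, here is a small standalone C sketch (toy
enums and names invented for illustration; this is not the patch's code):
LP_REDIRECT items and LP_NORMAL items whose tuples are not HEAPTUPLE_DEAD make
the page unsafe to truncate, while LP_DEAD items deliberately do not.

#include <stdbool.h>
#include <stdio.h>

typedef enum { ITEM_UNUSED, ITEM_NORMAL, ITEM_REDIRECT, ITEM_DEAD } LineState;
typedef enum { VIS_NONE = -1, VIS_DEAD = 0, VIS_LIVE = 1 } Visibility;

int
main(void)
{
    /* one simulated heap page: line pointer states and visibility results */
    LineState  items[] = {ITEM_REDIRECT, ITEM_NORMAL, ITEM_DEAD, ITEM_NORMAL, ITEM_UNUSED};
    Visibility vis[]   = {VIS_NONE, VIS_LIVE, VIS_NONE, VIS_DEAD, VIS_NONE};
    bool       hastup = false;

    for (int i = 0; i < 5; i++)
    {
        if (items[i] == ITEM_REDIRECT)
            hastup = true;      /* redirect makes rel truncation unsafe */
        else if (items[i] == ITEM_NORMAL && vis[i] != VIS_DEAD)
            hastup = true;      /* tuple with storage that survives pruning */
        /* ITEM_DEAD: assume it becomes unused before the truncation check */
    }

    printf("hastup = %s\n", hastup ? "true" : "false");
    return 0;
}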
---
src/backend/access/heap/pruneheap.c | 33 ++++++++++++++++++++++++----
src/backend/access/heap/vacuumlazy.c | 25 ++-------------------
src/include/access/heapam.h | 3 +++
3 files changed, 34 insertions(+), 27 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 994cf75c54e..2fee9aa509c 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -70,7 +70,8 @@ static void prune_prepare_freeze_tuple(Page page, OffsetNumber offnum,
PruneFreezeResult *presult);
static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
static void heap_prune_record_redirect(PruneState *prstate,
- OffsetNumber offnum, OffsetNumber rdoffnum);
+ OffsetNumber offnum, OffsetNumber rdoffnum,
+ PruneFreezeResult *presult);
static void heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
PruneFreezeResult *presult);
static void heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
@@ -294,6 +295,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
presult->nnewlpdead = 0;
presult->nfrozen = 0;
+ presult->hastup = false;
+
/*
* Keep track of whether or not the page is all_visible in case the caller
* wants to use this information to update the VM.
@@ -474,18 +477,37 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prune_prepare_freeze_tuple(page, offnum,
pagefrz, frozen, presult);
+ itemid = PageGetItemId(page, offnum);
+
+ if (ItemIdIsNormal(itemid) &&
+ presult->htsv[offnum] != HEAPTUPLE_DEAD)
+ {
+ Assert(presult->htsv[offnum] != -1);
+
+ /*
+ * Deliberately don't set hastup for LP_DEAD items. We make the
+ * soft assumption that any LP_DEAD items encountered here will
+ * become LP_UNUSED later on, before count_nondeletable_pages is
+ * reached. If we don't make this assumption then rel truncation
+ * will only happen every other VACUUM, at most. Besides, VACUUM
+ * must treat hastup/nonempty_pages as provisional no matter how
+ * LP_DEAD items are handled (handled here, or handled later on).
+ */
+ presult->hastup = true;
+ }
+
/* Ignore items already processed as part of an earlier chain */
if (prstate.marked[offnum])
continue;
/* Nothing to do if slot is empty */
- itemid = PageGetItemId(page, offnum);
if (!ItemIdIsUsed(itemid))
continue;
/* Process this item or chain of items */
presult->ndeleted += heap_prune_chain(buffer, offnum,
&prstate, presult);
+
}
/* Clear the offset information once we have processed the given page. */
@@ -1040,7 +1062,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
if (i >= nchain)
heap_prune_record_dead_or_unused(prstate, rootoffnum, presult);
else
- heap_prune_record_redirect(prstate, rootoffnum, chainitems[i]);
+ heap_prune_record_redirect(prstate, rootoffnum, chainitems[i], presult);
}
else if (nchain < 2 && ItemIdIsRedirected(rootlp))
{
@@ -1122,7 +1144,8 @@ heap_prune_record_prunable(PruneState *prstate, TransactionId xid)
/* Record line pointer to be redirected */
static void
heap_prune_record_redirect(PruneState *prstate,
- OffsetNumber offnum, OffsetNumber rdoffnum)
+ OffsetNumber offnum, OffsetNumber rdoffnum,
+ PruneFreezeResult *presult)
{
Assert(prstate->nredirected < MaxHeapTuplesPerPage);
prstate->redirected[prstate->nredirected * 2] = offnum;
@@ -1132,6 +1155,8 @@ heap_prune_record_redirect(PruneState *prstate,
prstate->marked[offnum] = true;
Assert(!prstate->marked[rdoffnum]);
prstate->marked[rdoffnum] = true;
+
+ presult->hastup = true;
}
/* Record line pointer to be marked dead */
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 6dd8d457c9c..aac38f54c0a 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1447,7 +1447,6 @@ lazy_scan_prune(LVRelState *vacrel,
live_tuples,
recently_dead_tuples;
HeapPageFreeze pagefrz;
- bool hastup = false;
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
Assert(BufferGetBlockNumber(buf) == blkno);
@@ -1491,7 +1490,6 @@ lazy_scan_prune(LVRelState *vacrel,
* the VM after collecting LP_DEAD items and freezing tuples. Pruning will
* have determined whether or not the page is all_visible and able to
* become all_frozen.
- *
*/
for (offnum = FirstOffsetNumber;
offnum <= maxoff;
@@ -1504,28 +1502,12 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->offnum = offnum;
itemid = PageGetItemId(page, offnum);
- if (!ItemIdIsUsed(itemid))
- continue;
-
/* Redirect items mustn't be touched */
- if (ItemIdIsRedirected(itemid))
- {
- /* page makes rel truncation unsafe */
- hastup = true;
+ if (ItemIdIsRedirected(itemid) || !ItemIdIsUsed(itemid))
continue;
- }
if (ItemIdIsDead(itemid))
{
- /*
- * Deliberately don't set hastup for LP_DEAD items. We make the
- * soft assumption that any LP_DEAD items encountered here will
- * become LP_UNUSED later on, before count_nondeletable_pages is
- * reached. If we don't make this assumption then rel truncation
- * will only happen every other VACUUM, at most. Besides, VACUUM
- * must treat hastup/nonempty_pages as provisional no matter how
- * LP_DEAD items are handled (handled here, or handled later on).
- */
deadoffsets[lpdead_items++] = offnum;
continue;
}
@@ -1593,9 +1575,6 @@ lazy_scan_prune(LVRelState *vacrel,
elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
break;
}
-
- hastup = true; /* page makes rel truncation unsafe */
-
}
vacrel->offnum = InvalidOffsetNumber;
@@ -1677,7 +1656,7 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->recently_dead_tuples += recently_dead_tuples;
/* Can't truncate this page */
- if (hastup)
+ if (presult.hastup)
vacrel->nonempty_pages = blkno + 1;
/* Did we find LP_DEAD items? */
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 8a6bc071345..7ad46696d66 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -202,11 +202,14 @@ typedef struct PruneFreezeResult
int nnewlpdead; /* Number of newly LP_DEAD items */
bool all_visible; /* Whether or not the page is all visible */
bool all_visible_except_removable;
+ bool hastup; /* Does page make rel truncation unsafe */
+
/* Whether or not the page can be set all frozen in the VM */
bool all_frozen;
/* Number of newly frozen tuples */
int nfrozen;
+
TransactionId frz_conflict_horizon; /* Newest xmin on the page */
/*
--
2.40.1
Attachment: v3-0014-Count-tuples-for-vacuum-logging-in-heap_page_prun.patch (text/x-diff)
From 444ba496410ba547564f2fed1fb7bea4e8d1e020 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 22 Jan 2024 17:25:56 -0500
Subject: [PATCH v3 14/17] Count tuples for vacuum logging in heap_page_prune
lazy_scan_prune() loops through all of the tuple visibility information
that was recorded in heap_page_prune() and counts live and recently dead
tuples. That information is already available in heap_page_prune(), so do
the counting there instead. Add live and recently dead tuple counters to
the PruneFreezeResult. Doing the counting in heap_page_prune() also
eliminates the need to save the tuple visibility status information in
the PruneFreezeResult; instead, keep it in the PruneState, where it can be
referenced by heap_prune_chain().
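For illustration, a standalone C sketch of the counting convention
(simplified names, not the patch's code): LIVE and DELETE_IN_PROGRESS tuples
count as live, RECENTLY_DEAD counts separately, and INSERT_IN_PROGRESS is
counted as neither, matching acquire_sample_rows().

#include <stdio.h>

/* simplified stand-in for HTSV_Result */
typedef enum { T_DEAD, T_RECENTLY_DEAD, T_LIVE,
               T_INSERT_IN_PROGRESS, T_DELETE_IN_PROGRESS } Vis;

int
main(void)
{
    Vis  page[] = {T_LIVE, T_RECENTLY_DEAD, T_DELETE_IN_PROGRESS,
                   T_INSERT_IN_PROGRESS, T_LIVE};
    int  live_tuples = 0;
    int  recently_dead_tuples = 0;

    for (int i = 0; i < 5; i++)
    {
        switch (page[i])
        {
            case T_LIVE:
            case T_DELETE_IN_PROGRESS:
                /* counted as live, matching acquire_sample_rows() */
                live_tuples++;
                break;
            case T_RECENTLY_DEAD:
                recently_dead_tuples++;
                break;
            case T_INSERT_IN_PROGRESS:
                /* inserter will update the counters at commit */
                break;
            case T_DEAD:
                /* becomes an LP_DEAD item; not counted here */
                break;
        }
    }

    printf("live=%d recently_dead=%d\n", live_tuples, recently_dead_tuples);
    return 0;
}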
---
src/backend/access/heap/pruneheap.c | 110 +++++++++++++++++++++++----
src/backend/access/heap/vacuumlazy.c | 77 +------------------
src/include/access/heapam.h | 29 +------
3 files changed, 99 insertions(+), 117 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 2fee9aa509c..575cbcb13a3 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -55,6 +55,18 @@ typedef struct
* 1. Otherwise every access would need to subtract 1.
*/
bool marked[MaxHeapTuplesPerPage + 1];
+
+ /*
+ * Tuple visibility is only computed once for each tuple, for correctness
+ * and efficiency reasons; see comment in heap_page_prune_and_freeze() for
+ * details. This is of type int8[], instead of HTSV_Result[], so we can
+ * use -1 to indicate no visibility has been computed, e.g. for LP_DEAD
+ * items.
+ *
+ * This needs to be MaxHeapTuplesPerPage + 1 long as FirstOffsetNumber is
+ * 1. Otherwise every access would need to subtract 1.
+ */
+ int8 htsv[MaxHeapTuplesPerPage + 1];
} PruneState;
/* Local functions */
@@ -65,7 +77,8 @@ static int heap_prune_chain(Buffer buffer,
OffsetNumber rootoffnum,
PruneState *prstate, PruneFreezeResult *presult);
-static void prune_prepare_freeze_tuple(Page page, OffsetNumber offnum,
+static inline HTSV_Result htsv_get_valid_status(int status);
+static void prune_prepare_freeze_tuple(Page page, OffsetNumber offnum, PruneState *prstate,
HeapPageFreeze *pagefrz, HeapTupleFreeze *frozen,
PruneFreezeResult *presult);
static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
@@ -297,6 +310,9 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
presult->hastup = false;
+ presult->live_tuples = 0;
+ presult->recently_dead_tuples = 0;
+
/*
* Keep track of whether or not the page is all_visible in case the caller
* wants to use this information to update the VM.
@@ -342,7 +358,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/* Nothing to do if slot doesn't contain a tuple */
if (!ItemIdIsNormal(itemid))
{
- presult->htsv[offnum] = -1;
+ prstate.htsv[offnum] = -1;
continue;
}
@@ -358,9 +374,30 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (off_loc)
*off_loc = offnum;
- presult->htsv[offnum] = heap_prune_satisfies_vacuum(&prstate, &tup,
- buffer);
- switch (presult->htsv[offnum])
+ prstate.htsv[offnum] = heap_prune_satisfies_vacuum(&prstate, &tup,
+ buffer);
+ Assert(ItemIdIsNormal(itemid));
+
+ /*
+ * The criteria for counting a tuple as live in this block need to
+ * match what analyze.c's acquire_sample_rows() does, otherwise VACUUM
+ * and ANALYZE may produce wildly different reltuples values, e.g.
+ * when there are many recently-dead tuples.
+ *
+ * The logic here is a bit simpler than acquire_sample_rows(), as
+ * VACUUM can't run inside a transaction block, which makes some cases
+ * impossible (e.g. in-progress insert from the same transaction).
+ *
+ * We treat LP_DEAD items (which are the closest thing to DEAD tuples
+ * that might be seen here) differently, too: we assume that they'll
+ * become LP_UNUSED before VACUUM finishes. This difference is only
+ * superficial. VACUUM effectively agrees with ANALYZE about DEAD
+ * items, in the end. VACUUM won't remember LP_DEAD items, but only
+ * because they're not supposed to be left behind when it is done.
+ * (Cases where we bypass index vacuuming will violate this optimistic
+ * assumption, but the overall impact of that should be negligible.)
+ */
+ switch (prstate.htsv[offnum])
{
case HEAPTUPLE_DEAD:
@@ -380,6 +417,12 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
break;
case HEAPTUPLE_LIVE:
+ /*
+ * Count it as live. Not only is this natural, but it's also
+ * what acquire_sample_rows() does.
+ */
+ presult->live_tuples++;
+
/*
* Is the tuple definitely visible to all transactions?
*
@@ -416,13 +459,34 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
}
break;
case HEAPTUPLE_RECENTLY_DEAD:
+
+ /*
+ * If tuple is recently dead then we must not remove it from
+ * the relation. (We only remove items that are LP_DEAD from
+ * pruning.)
+ */
+ presult->recently_dead_tuples++;
presult->all_visible = false;
break;
case HEAPTUPLE_INSERT_IN_PROGRESS:
+
+ /*
+ * We do not count these rows as live, because we expect the
+ * inserting transaction to update the counters at commit, and
+ * we assume that will happen only after we report our
+ * results. This assumption is a bit shaky, but it is what
+ * acquire_sample_rows() does, so be consistent.
+ */
presult->all_visible = false;
break;
case HEAPTUPLE_DELETE_IN_PROGRESS:
- /* This is an expected case during concurrent vacuum */
+
+ /*
+ * This an expected case during concurrent vacuum. Count such
+ * rows as live. As above, we assume the deleting transaction
+ * will commit and update the counters after we report.
+ */
+ presult->live_tuples++;
presult->all_visible = false;
break;
default:
@@ -474,15 +538,15 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*off_loc = offnum;
if (pagefrz)
- prune_prepare_freeze_tuple(page, offnum,
+ prune_prepare_freeze_tuple(page, offnum, &prstate,
pagefrz, frozen, presult);
itemid = PageGetItemId(page, offnum);
if (ItemIdIsNormal(itemid) &&
- presult->htsv[offnum] != HEAPTUPLE_DEAD)
+ prstate.htsv[offnum] != HEAPTUPLE_DEAD)
{
- Assert(presult->htsv[offnum] != -1);
+ Assert(prstate.htsv[offnum] != -1);
/*
* Deliberately don't set hastup for LP_DEAD items. We make the
@@ -770,10 +834,24 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
}
+/*
+ * Pruning calculates tuple visibility once and saves the results in an array
+ * of int8. See PruneState.htsv for details. This helper function is meant to
+ * guard against examining visibility status array members which have not yet
+ * been computed.
+ */
+static inline HTSV_Result
+htsv_get_valid_status(int status)
+{
+ Assert(status >= HEAPTUPLE_DEAD &&
+ status <= HEAPTUPLE_DELETE_IN_PROGRESS);
+ return (HTSV_Result) status;
+}
+
/*
* Prune specified line pointer or a HOT chain originating at line pointer.
*
- * Tuple visibility information is provided in presult->htsv.
+ * Tuple visibility information is provided in prstate->htsv.
*
* If the item is an index-referenced tuple (i.e. not a heap-only tuple),
* the HOT chain is pruned by removing all DEAD tuples at the start of the HOT
@@ -824,7 +902,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
*/
if (ItemIdIsNormal(rootlp))
{
- Assert(presult->htsv[rootoffnum] != -1);
+ Assert(prstate->htsv[rootoffnum] != -1);
htup = (HeapTupleHeader) PageGetItem(dp, rootlp);
if (HeapTupleHeaderIsHeapOnly(htup))
@@ -847,7 +925,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
* either here or while following a chain below. Whichever path
* gets there first will mark the tuple unused.
*/
- if (presult->htsv[rootoffnum] == HEAPTUPLE_DEAD &&
+ if (prstate->htsv[rootoffnum] == HEAPTUPLE_DEAD &&
!HeapTupleHeaderIsHotUpdated(htup))
{
heap_prune_record_unused(prstate, rootoffnum);
@@ -948,7 +1026,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
*/
tupdead = recent_dead = false;
- switch (htsv_get_valid_status(presult->htsv[offnum]))
+ switch (htsv_get_valid_status(prstate->htsv[offnum]))
{
case HEAPTUPLE_DEAD:
tupdead = true;
@@ -1086,7 +1164,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
* want to consider freezing normal tuples which will not be removed.
*/
static void
-prune_prepare_freeze_tuple(Page page, OffsetNumber offnum,
+prune_prepare_freeze_tuple(Page page, OffsetNumber offnum, PruneState *prstate,
HeapPageFreeze *pagefrz,
HeapTupleFreeze *frozen,
PruneFreezeResult *presult)
@@ -1103,8 +1181,8 @@ prune_prepare_freeze_tuple(Page page, OffsetNumber offnum,
return;
/* We do not consider freezing tuples which will be removed. */
- if (presult->htsv[offnum] == HEAPTUPLE_DEAD ||
- presult->htsv[offnum] == -1)
+ if (prstate->htsv[offnum] == HEAPTUPLE_DEAD ||
+ prstate->htsv[offnum] == -1)
return;
htup = (HeapTupleHeader) PageGetItem(page, itemid);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index aac38f54c0a..634f4da9a17 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1442,10 +1442,8 @@ lazy_scan_prune(LVRelState *vacrel,
OffsetNumber offnum,
maxoff;
ItemId itemid;
+ int lpdead_items = 0;
PruneFreezeResult presult;
- int lpdead_items,
- live_tuples,
- recently_dead_tuples;
HeapPageFreeze pagefrz;
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
@@ -1465,9 +1463,6 @@ lazy_scan_prune(LVRelState *vacrel,
pagefrz.NoFreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
pagefrz.NoFreezePageRelminMxid = vacrel->NewRelminMxid;
pagefrz.cutoffs = &vacrel->cutoffs;
- lpdead_items = 0;
- live_tuples = 0;
- recently_dead_tuples = 0;
/*
* Prune all HOT-update chains and potentially freeze tuples on this page.
@@ -1502,9 +1497,6 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->offnum = offnum;
itemid = PageGetItemId(page, offnum);
- /* Redirect items mustn't be touched */
- if (ItemIdIsRedirected(itemid) || !ItemIdIsUsed(itemid))
- continue;
if (ItemIdIsDead(itemid))
{
@@ -1512,69 +1504,6 @@ lazy_scan_prune(LVRelState *vacrel,
continue;
}
- Assert(ItemIdIsNormal(itemid));
-
- /*
- * The criteria for counting a tuple as live in this block need to
- * match what analyze.c's acquire_sample_rows() does, otherwise VACUUM
- * and ANALYZE may produce wildly different reltuples values, e.g.
- * when there are many recently-dead tuples.
- *
- * The logic here is a bit simpler than acquire_sample_rows(), as
- * VACUUM can't run inside a transaction block, which makes some cases
- * impossible (e.g. in-progress insert from the same transaction).
- *
- * We treat LP_DEAD items (which are the closest thing to DEAD tuples
- * that might be seen here) differently, too: we assume that they'll
- * become LP_UNUSED before VACUUM finishes. This difference is only
- * superficial. VACUUM effectively agrees with ANALYZE about DEAD
- * items, in the end. VACUUM won't remember LP_DEAD items, but only
- * because they're not supposed to be left behind when it is done.
- * (Cases where we bypass index vacuuming will violate this optimistic
- * assumption, but the overall impact of that should be negligible.)
- */
- switch (htsv_get_valid_status(presult.htsv[offnum]))
- {
- case HEAPTUPLE_LIVE:
-
- /*
- * Count it as live. Not only is this natural, but it's also
- * what acquire_sample_rows() does.
- */
- live_tuples++;
- break;
- case HEAPTUPLE_RECENTLY_DEAD:
-
- /*
- * If tuple is recently dead then we must not remove it from
- * the relation. (We only remove items that are LP_DEAD from
- * pruning.)
- */
- recently_dead_tuples++;
- break;
- case HEAPTUPLE_INSERT_IN_PROGRESS:
-
- /*
- * We do not count these rows as live, because we expect the
- * inserting transaction to update the counters at commit, and
- * we assume that will happen only after we report our
- * results. This assumption is a bit shaky, but it is what
- * acquire_sample_rows() does, so be consistent.
- */
- break;
- case HEAPTUPLE_DELETE_IN_PROGRESS:
-
- /*
- * This an expected case during concurrent vacuum. Count such
- * rows as live. As above, we assume the deleting transaction
- * will commit and update the counters after we report.
- */
- live_tuples++;
- break;
- default:
- elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
- break;
- }
}
vacrel->offnum = InvalidOffsetNumber;
@@ -1652,8 +1581,8 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->tuples_deleted += presult.ndeleted;
vacrel->tuples_frozen += presult.nfrozen;
vacrel->lpdead_items += lpdead_items;
- vacrel->live_tuples += live_tuples;
- vacrel->recently_dead_tuples += recently_dead_tuples;
+ vacrel->live_tuples += presult.live_tuples;
+ vacrel->recently_dead_tuples += presult.recently_dead_tuples;
/* Can't truncate this page */
if (presult.hastup)
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 7ad46696d66..22a2494a3f8 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -198,6 +198,8 @@ typedef struct HeapPageFreeze
*/
typedef struct PruneFreezeResult
{
+ int live_tuples;
+ int recently_dead_tuples;
int ndeleted; /* Number of tuples deleted from the page */
int nnewlpdead; /* Number of newly LP_DEAD items */
bool all_visible; /* Whether or not the page is all visible */
@@ -212,19 +214,6 @@ typedef struct PruneFreezeResult
TransactionId frz_conflict_horizon; /* Newest xmin on the page */
- /*
- * Tuple visibility is only computed once for each tuple, for correctness
- * and efficiency reasons; see comment in heap_page_prune_and_freeze() for
- * details. This is of type int8[], instead of HTSV_Result[], so we can
- * use -1 to indicate no visibility has been computed, e.g. for LP_DEAD
- * items.
- *
- * This needs to be MaxHeapTuplesPerPage + 1 long as FirstOffsetNumber is
- * 1. Otherwise every access would need to subtract 1.
- */
- int8 htsv[MaxHeapTuplesPerPage + 1];
-
-
/* New value of relfrozenxid found by heap_page_prune_and_freeze() */
TransactionId new_relfrozenxid;
@@ -232,20 +221,6 @@ typedef struct PruneFreezeResult
MultiXactId new_relminmxid;
} PruneFreezeResult;
-/*
- * Pruning calculates tuple visibility once and saves the results in an array
- * of int8. See PruneFreezeResult.htsv for details. This helper function is
- * meant to guard against examining visibility status array members which have
- * not yet been computed.
- */
-static inline HTSV_Result
-htsv_get_valid_status(int status)
-{
- Assert(status >= HEAPTUPLE_DEAD &&
- status <= HEAPTUPLE_DELETE_IN_PROGRESS);
- return (HTSV_Result) status;
-}
-
/* ----------------
* function prototypes for heap access method
*
--
2.40.1
Attachment: v3-0015-Save-dead-tuple-offsets-during-heap_page_prune.patch (text/x-diff)
From 8a09af63ceedd3ea24ab975470c09b20383410bf Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 22 Jan 2024 16:55:28 -0500
Subject: [PATCH v3 15/17] Save dead tuple offsets during heap_page_prune
After heap_page_prune() returned, lazy_scan_prune() looped through the
page collecting the offsets of LP_DEAD items, which it later added to
LVRelState->dead_items. Instead, record those offsets in
heap_page_prune() itself: when a line pointer is newly marked LP_DEAD and
when an existing, non-removable LP_DEAD item is encountered in
heap_prune_chain().
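A minimal standalone sketch of the idea (invented names, not the patch
itself): offsets are appended to a caller-visible array at the moment an item
is recorded dead, and the caller later turns them into TIDs for dead_items.

#include <stdio.h>

#define MAX_ITEMS 16

typedef unsigned short OffsetNumber;

typedef struct
{
    int          lpdead_items;
    OffsetNumber deadoffsets[MAX_ITEMS];
} Result;

/* called whenever pruning marks an item LP_DEAD (or finds an existing one) */
static void
record_dead(Result *res, OffsetNumber offnum)
{
    res->deadoffsets[res->lpdead_items++] = offnum;
}

int
main(void)
{
    Result   res = {0};
    unsigned blkno = 42;

    record_dead(&res, 3);
    record_dead(&res, 7);

    /* what lazy_scan_prune() would do afterwards: build TIDs for dead_items */
    for (int i = 0; i < res.lpdead_items; i++)
        printf("dead tid: (%u,%u)\n", blkno, (unsigned) res.deadoffsets[i]);

    return 0;
}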
---
src/backend/access/heap/pruneheap.c | 7 ++++
src/backend/access/heap/vacuumlazy.c | 60 ++++++----------------------
src/include/access/heapam.h | 2 +
3 files changed, 22 insertions(+), 47 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 575cbcb13a3..fa628e410e6 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -312,6 +312,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
presult->live_tuples = 0;
presult->recently_dead_tuples = 0;
+ presult->lpdead_items = 0;
/*
* Keep track of whether or not the page is all_visible in case the caller
@@ -1001,7 +1002,10 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
if (unlikely(prstate->mark_unused_now))
heap_prune_record_unused(prstate, offnum);
else
+ {
presult->all_visible = false;
+ presult->deadoffsets[presult->lpdead_items++] = offnum;
+ }
break;
}
@@ -1253,6 +1257,9 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
* all_visible.
*/
presult->all_visible = false;
+
+ /* Record the dead offset for vacuum */
+ presult->deadoffsets[presult->lpdead_items++] = offnum;
}
/*
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 634f4da9a17..4b45e8be1ad 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1439,23 +1439,11 @@ lazy_scan_prune(LVRelState *vacrel,
bool *has_lpdead_items)
{
Relation rel = vacrel->rel;
- OffsetNumber offnum,
- maxoff;
- ItemId itemid;
- int lpdead_items = 0;
PruneFreezeResult presult;
HeapPageFreeze pagefrz;
- OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
Assert(BufferGetBlockNumber(buf) == blkno);
- /*
- * maxoff might be reduced following line pointer array truncation in
- * heap_page_prune_and_freeze(). That's safe for us to ignore, since the
- * reclaimed space will continue to look like LP_UNUSED items below.
- */
- maxoff = PageGetMaxOffsetNumber(page);
-
/* Initialize pagefrz */
pagefrz.freeze_required = false;
pagefrz.FreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
@@ -1468,9 +1456,9 @@ lazy_scan_prune(LVRelState *vacrel,
* Prune all HOT-update chains and potentially freeze tuples on this page.
*
* We count the number of tuples removed from the page by the pruning step
- * in presult.ndeleted. It should not be confused with lpdead_items;
- * lpdead_items's final value can be thought of as the number of tuples
- * that were deleted from indexes.
+ * in presult.ndeleted. It should not be confused with
+ * presult.lpdead_items; presult.lpdead_items's final value can be thought
+ * of as the number of tuples that were deleted from indexes.
*
* If the relation has no indexes, we can immediately mark would-be dead
* items LP_UNUSED, so mark_unused_now should be true if no indexes and
@@ -1480,32 +1468,10 @@ lazy_scan_prune(LVRelState *vacrel,
&pagefrz, &presult, &vacrel->offnum);
/*
- * Now scan the page to collect LP_DEAD items and check for tuples
- * requiring freezing among remaining tuples with storage. We will update
- * the VM after collecting LP_DEAD items and freezing tuples. Pruning will
- * have determined whether or not the page is all_visible and able to
- * become all_frozen.
+ * We will update the VM after collecting LP_DEAD items and freezing
+ * tuples. Pruning will have determined whether or not the page is
+ * all_visible.
*/
- for (offnum = FirstOffsetNumber;
- offnum <= maxoff;
- offnum = OffsetNumberNext(offnum))
- {
- /*
- * Set the offset number so that we can display it along with any
- * error that occurred while processing this tuple.
- */
- vacrel->offnum = offnum;
- itemid = PageGetItemId(page, offnum);
-
-
- if (ItemIdIsDead(itemid))
- {
- deadoffsets[lpdead_items++] = offnum;
- continue;
- }
-
- }
-
vacrel->offnum = InvalidOffsetNumber;
Assert(MultiXactIdIsValid(presult.new_relminmxid));
@@ -1541,7 +1507,7 @@ lazy_scan_prune(LVRelState *vacrel,
TransactionId debug_cutoff;
bool debug_all_frozen;
- Assert(lpdead_items == 0);
+ Assert(presult.lpdead_items == 0);
if (!heap_page_is_all_visible(vacrel, buf,
&debug_cutoff, &debug_all_frozen))
@@ -1557,7 +1523,7 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* Now save details of the LP_DEAD items from the page in vacrel
*/
- if (lpdead_items > 0)
+ if (presult.lpdead_items > 0)
{
VacDeadItems *dead_items = vacrel->dead_items;
ItemPointerData tmp;
@@ -1566,9 +1532,9 @@ lazy_scan_prune(LVRelState *vacrel,
ItemPointerSetBlockNumber(&tmp, blkno);
- for (int i = 0; i < lpdead_items; i++)
+ for (int i = 0; i < presult.lpdead_items; i++)
{
- ItemPointerSetOffsetNumber(&tmp, deadoffsets[i]);
+ ItemPointerSetOffsetNumber(&tmp, presult.deadoffsets[i]);
dead_items->items[dead_items->num_items++] = tmp;
}
@@ -1580,7 +1546,7 @@ lazy_scan_prune(LVRelState *vacrel,
/* Finally, add page-local counts to whole-VACUUM counts */
vacrel->tuples_deleted += presult.ndeleted;
vacrel->tuples_frozen += presult.nfrozen;
- vacrel->lpdead_items += lpdead_items;
+ vacrel->lpdead_items += presult.lpdead_items;
vacrel->live_tuples += presult.live_tuples;
vacrel->recently_dead_tuples += presult.recently_dead_tuples;
@@ -1589,7 +1555,7 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->nonempty_pages = blkno + 1;
/* Did we find LP_DEAD items? */
- *has_lpdead_items = (lpdead_items > 0);
+ *has_lpdead_items = (presult.lpdead_items > 0);
Assert(!presult.all_visible || !(*has_lpdead_items));
@@ -1657,7 +1623,7 @@ lazy_scan_prune(LVRelState *vacrel,
* There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
* however.
*/
- else if (lpdead_items > 0 && PageIsAllVisible(page))
+ else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
{
elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
vacrel->relname, blkno);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 22a2494a3f8..cc3071644c3 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -219,6 +219,8 @@ typedef struct PruneFreezeResult
/* New value of relminmxid found by heap_page_prune_and_freeze() */
MultiXactId new_relminmxid;
+ int lpdead_items; /* includes existing LP_DEAD items */
+ OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
} PruneFreezeResult;
/* ----------------
--
2.40.1
Attachment: v3-0016-Obsolete-XLOG_HEAP2_FREEZE_PAGE.patch (text/x-diff)
From 5871dbff01dc89c0a1dfc09db341fff8314451c0 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 12 Mar 2024 19:07:38 -0400
Subject: [PATCH v3 16/17] Obsolete XLOG_HEAP2_FREEZE_PAGE
When vacuum freezes tuples, the information needed to replay those
changes is now saved in the xl_heap_prune record, so we no longer need to
emit separate xl_heap_freeze_page records. We can get rid of
heap_freeze_execute_prepared() as well as the special case in
heap_page_prune_and_freeze() for when only freezing is done.
We must retain the xl_heap_freeze_page record type and
heap_xlog_freeze_page() in order to replay old freeze records.
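A compressed standalone sketch of the resulting control flow (booleans only,
invented names; a simplification of the patch, which also considers
PageIsFull): a hint-bit-only change is a non-WAL-logged dirty hint, while
pruning and/or freezing dirty the buffer and emit a single combined record.

#include <stdbool.h>
#include <stdio.h>

static void
prune_and_freeze(bool do_prune, bool do_freeze, bool prune_xid_changed,
                 bool needs_wal)
{
    /* hint-only update applies when nothing on the page is rewritten */
    bool do_hint = prune_xid_changed && !do_prune && !do_freeze;

    if (do_hint)
        printf("MarkBufferDirtyHint (no WAL record)\n");
    if (do_prune || do_freeze)
    {
        printf("MarkBufferDirty\n");
        if (needs_wal)
            printf("emit one XLOG_HEAP2_PRUNE record\n");
    }
}

int
main(void)
{
    prune_and_freeze(false, false, true, true); /* hint only */
    prune_and_freeze(true, true, true, true);   /* prune + freeze: one record */
    return 0;
}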
---
src/backend/access/heap/heapam.c | 78 ++-----------
src/backend/access/heap/pruneheap.c | 163 ++++++++++++++--------------
src/include/access/heapam.h | 7 +-
3 files changed, 88 insertions(+), 160 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index a8f35eba3c9..12a1a7805f4 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6340,7 +6340,7 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
* XIDs or MultiXactIds that will need to be processed by a future VACUUM.
*
* VACUUM caller must assemble HeapTupleFreeze freeze plan entries for every
- * tuple that we returned true for, and call heap_freeze_execute_prepared to
+ * tuple that we returned true for, and call heap_page_prune_and_freeze() to
* execute freezing. Caller must initialize pagefrz fields for page as a
* whole before first call here for each heap page.
*
@@ -6656,8 +6656,7 @@ heap_execute_freeze_tuple(HeapTupleHeader tuple, HeapTupleFreeze *frz)
}
/*
-* Perform xmin/xmax XID status sanity checks before calling
-* heap_freeze_execute_prepared().
+* Perform xmin/xmax XID status sanity checks before executing freezing.
*
* heap_prepare_freeze_tuple doesn't perform these checks directly because
* pg_xact lookups are relatively expensive. They shouldn't be repeated
@@ -6732,70 +6731,6 @@ heap_freeze_prepared_tuples(Buffer buffer, HeapTupleFreeze *tuples, int ntuples)
}
}
-/*
- * heap_freeze_execute_prepared
- *
- * Execute freezing of prepared tuples and WAL-logs the changes so that VACUUM
- * can advance the rel's relfrozenxid later on without any risk of unsafe
- * pg_xact lookups, even following a hard crash (or when querying from a
- * standby). We represent freezing by setting infomask bits in tuple headers,
- * but this shouldn't be thought of as a hint. See section on buffer access
- * rules in src/backend/storage/buffer/README. Must be called from within a
- * critical section.
- */
-void
-heap_freeze_execute_prepared(Relation rel, Buffer buffer,
- TransactionId snapshotConflictHorizon,
- HeapTupleFreeze *tuples, int ntuples)
-{
- Page page = BufferGetPage(buffer);
-
- Assert(ntuples > 0);
-
- heap_freeze_prepared_tuples(buffer, tuples, ntuples);
-
- MarkBufferDirty(buffer);
-
- /* Now WAL-log freezing if necessary */
- if (RelationNeedsWAL(rel))
- {
- xl_heap_freeze_plan plans[MaxHeapTuplesPerPage];
- OffsetNumber offsets[MaxHeapTuplesPerPage];
- int nplans;
- xl_heap_freeze_page xlrec;
- XLogRecPtr recptr;
-
- /*
- * Prepare deduplicated representation for use in WAL record
- * Destructively sorts tuples array in-place, so caller had better be
- * done with it.
- */
- nplans = heap_log_freeze_plan(tuples, ntuples, plans, offsets);
-
- xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
- xlrec.isCatalogRel = RelationIsAccessibleInLogicalDecoding(rel);
- xlrec.nplans = nplans;
-
- XLogBeginInsert();
- XLogRegisterData((char *) &xlrec, SizeOfHeapFreezePage);
-
- /*
- * The freeze plan array and offset array are not actually in the
- * buffer, but pretend that they are. When XLogInsert stores the
- * whole buffer, the arrays need not be stored too.
- */
- XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
- XLogRegisterBufData(0, (char *) plans,
- nplans * sizeof(xl_heap_freeze_plan));
- XLogRegisterBufData(0, (char *) offsets,
- ntuples * sizeof(OffsetNumber));
-
- recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_FREEZE_PAGE);
-
- PageSetLSN(page, recptr);
- }
-}
-
/*
* Comparator used to deduplicate XLOG_HEAP2_FREEZE_PAGE freeze plans
*/
@@ -8827,10 +8762,11 @@ heap_xlog_prune(XLogReaderState *record)
frz_offsets = nowunused + nunused;
/* Update all line pointers per the record, and repair fragmentation */
- heap_page_prune_execute(buffer,
- redirected, nredirected,
- nowdead, ndead,
- nowunused, nunused);
+ if (nredirected > 0 || ndead > 0 || nunused > 0)
+ heap_page_prune_execute(buffer,
+ redirected, nredirected,
+ nowdead, ndead,
+ nowunused, nunused);
for (int p = 0; p < nplans; p++)
{
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index fa628e410e6..6f45e5c37f0 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -270,6 +270,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
HeapTupleData tup;
bool do_freeze;
bool do_prune;
+ bool do_hint;
bool whole_page_freezable;
bool hint_bit_fpi;
bool prune_fpi = false;
@@ -583,6 +584,9 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.ndead > 0 ||
prstate.nunused > 0;
+ /* Record number of newly-set-LP_DEAD items for caller */
+ presult->nnewlpdead = prstate.ndead;
+
/*
* Only incur overhead of checking if we will do an FPI if we might use
* the information.
@@ -590,7 +594,15 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (do_prune && pagefrz)
prune_fpi = XLogCheckBufferNeedsBackup(buffer);
- /* Is the whole page freezable? And is there something to freeze */
+ /*
+ * Even if we don't prune anything, if we found a new value for the
+ * pd_prune_xid field or the page was marked full, we will update the hint
+ * bit.
+ */
+ do_hint = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
+ PageIsFull(page);
+
+ /* Is the whole page freezable? And is there something to freeze? */
whole_page_freezable = presult->all_visible_except_removable &&
presult->all_frozen;
@@ -605,55 +617,63 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* opportunistic freeze heuristic must be improved; however, for now, try
* to approximate it.
*/
-
do_freeze = pagefrz &&
(pagefrz->freeze_required ||
(whole_page_freezable && presult->nfrozen > 0 && (prune_fpi || hint_bit_fpi)));
- if (do_freeze)
- {
+ /*
+ * Validate the tuples we are considering freezing. We do this even if
+ * pruning and hint bit setting have not emitted an FPI so far because we
+ * still may emit an FPI while setting the page hint bit later. But we
+ * want to avoid doing the pre-freeze checks in a critical section.
+ */
+ if (pagefrz &&
+ (pagefrz->freeze_required ||
+ (whole_page_freezable && presult->nfrozen > 0)))
heap_pre_freeze_checks(buffer, frozen, presult->nfrozen);
- frz_conflict_horizon = heap_frz_conflict_horizon(presult, pagefrz);
- }
- else if (!pagefrz || !presult->all_frozen || presult->nfrozen > 0)
- {
- /*
- * If we will neither freeze tuples on the page nor set the page all
- * frozen in the visibility map, the page is not all frozen and there
- * will be no newly frozen tuples.
- */
- presult->all_frozen = false;
- presult->nfrozen = 0; /* avoid miscounts in instrumentation */
- }
- /* Record number of newly-set-LP_DEAD items for caller */
- presult->nnewlpdead = prstate.ndead;
+ /*
+ * If we are going to modify the page contents anyway, we will have to
+ * update more than hint bits.
+ */
+ if (do_freeze || do_prune)
+ do_hint = false;
+ START_CRIT_SECTION();
- /* Have we found any prunable items? */
- if (!do_prune)
- {
- /* Any error while applying the changes is critical */
- START_CRIT_SECTION();
+ /*
+ * Update the page's pd_prune_xid field to either zero, or the lowest XID
+ * of any soon-prunable tuple.
+ */
+ ((PageHeader) page)->pd_prune_xid = prstate.new_prune_xid;
- /*
- * If we didn't prune anything, but have found a new value for the
- * pd_prune_xid field, update it and mark the buffer dirty. This is
- * treated as a non-WAL-logged hint.
- *
- * Also clear the "page is full" flag if it is set, since there's no
- * point in repeating the prune/defrag process until something else
- * happens to the page.
- */
- if (((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
- PageIsFull(page))
- {
- ((PageHeader) page)->pd_prune_xid = prstate.new_prune_xid;
- PageClearFull(page);
- MarkBufferDirtyHint(buffer, true);
- }
+ /*
+ * If pruning, freezing, or updating the hint bit, clear the "page is
+ * full" flag if it is set since there's no point in repeating the
+ * prune/defrag process until something else happens to the page.
+ */
+ if (do_prune || do_freeze || do_hint)
+ PageClearFull(page);
- hint_bit_fpi = fpi_before != pgWalUsage.wal_fpi;
+ /*
+ * Apply the planned item changes, then repair page fragmentation, and
+ * update the page's hint bit about whether it has free line pointers.
+ */
+ if (do_prune)
+ {
+ heap_page_prune_execute(buffer,
+ prstate.redirected, prstate.nredirected,
+ prstate.nowdead, prstate.ndead,
+ prstate.nowunused, prstate.nunused);
+ }
+
+ /*
+ * If we aren't pruning or freezing anything, but we updated pd_prune_xid,
+ * this is a non-WAL-logged hint.
+ */
+ if (do_hint)
+ {
+ MarkBufferDirtyHint(buffer, true);
/*
* We may have decided not to opportunistically freeze above because
@@ -661,60 +681,37 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* enabled, setting the hint bit may have emitted an FPI. Check again
* if we should freeze.
*/
- if (!do_freeze && hint_bit_fpi)
+ hint_bit_fpi = fpi_before != pgWalUsage.wal_fpi;
+
+ if (hint_bit_fpi)
do_freeze = pagefrz &&
(pagefrz->freeze_required ||
(whole_page_freezable && presult->nfrozen > 0));
-
- if (do_freeze)
- {
- heap_freeze_execute_prepared(relation, buffer,
- frz_conflict_horizon,
- frozen, presult->nfrozen);
- }
- else if (!pagefrz || !presult->all_frozen || presult->nfrozen > 0)
- {
- presult->all_frozen = false;
- presult->nfrozen = 0;
- }
-
- END_CRIT_SECTION();
-
- goto update_frozenxids;
}
- START_CRIT_SECTION();
-
- /*
- * Apply the planned item changes, then repair page fragmentation, and
- * update the page's hint bit about whether it has free line pointers.
- */
- heap_page_prune_execute(buffer,
- prstate.redirected, prstate.nredirected,
- prstate.nowdead, prstate.ndead,
- prstate.nowunused, prstate.nunused);
-
- /*
- * Update the page's pd_prune_xid field to either zero, or the lowest XID
- * of any soon-prunable tuple.
- */
- ((PageHeader) page)->pd_prune_xid = prstate.new_prune_xid;
-
- /*
- * Also clear the "page is full" flag, since there's no point in repeating
- * the prune/defrag process until something else happens to the page.
- */
- PageClearFull(page);
-
if (do_freeze)
+ {
+ frz_conflict_horizon = heap_frz_conflict_horizon(presult, pagefrz);
heap_freeze_prepared_tuples(buffer, frozen, presult->nfrozen);
+ }
+ else if ((!pagefrz || !presult->all_frozen || presult->nfrozen > 0))
+ {
+ /*
+ * If we will neither freeze tuples on the page nor set the page all
+ * frozen in the visibility map, the page is not all-frozen and there
+ * will be no newly frozen tuples.
+ */
+ presult->all_frozen = false;
+ presult->nfrozen = 0; /* avoid miscounts in instrumentation */
+ }
- MarkBufferDirty(buffer);
+ if (do_prune || do_freeze)
+ MarkBufferDirty(buffer);
/*
* Emit a WAL XLOG_HEAP2_PRUNE record showing what we did
*/
- if (RelationNeedsWAL(relation))
+ if ((do_prune || do_freeze) && RelationNeedsWAL(relation))
{
xl_heap_prune xlrec;
XLogRecPtr recptr;
@@ -789,8 +786,6 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
END_CRIT_SECTION();
-update_frozenxids:
-
/* Caller won't update new_relfrozenxid and new_relminmxid */
if (!pagefrz)
return;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index cc3071644c3..c36623f53bd 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -102,7 +102,7 @@ typedef enum
} HTSV_Result;
/*
- * heap_prepare_freeze_tuple may request that heap_freeze_execute_prepared
+ * heap_prepare_freeze_tuple may request that the heap_page_prune_and_freeze()
* check any tuple's to-be-frozen xmin and/or xmax status using pg_xact
*/
#define HEAP_FREEZE_CHECK_XMIN_COMMITTED 0x01
@@ -155,7 +155,7 @@ typedef struct HeapPageFreeze
/*
* "Freeze" NewRelfrozenXid/NewRelminMxid trackers.
*
- * Trackers used when heap_freeze_execute_prepared freezes, or when there
+ * Trackers used when heap_page_prune_and_freeze() freezes, or when there
* are zero freeze plans for a page. It is always valid for vacuumlazy.c
* to freeze any page, by definition. This even includes pages that have
* no tuples with storage to consider in the first place. That way the
@@ -298,9 +298,6 @@ extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
extern void heap_pre_freeze_checks(Buffer buffer,
HeapTupleFreeze *tuples, int ntuples);
-extern void heap_freeze_execute_prepared(Relation rel, Buffer buffer,
- TransactionId snapshotConflictHorizon,
- HeapTupleFreeze *tuples, int ntuples);
extern void heap_freeze_prepared_tuples(Buffer buffer,
HeapTupleFreeze *tuples, int ntuples);
--
2.40.1
Attachment: v3-0017-Streamline-XLOG_HEAP2_PRUNE-record.patch (text/x-diff)
From 1dddbaea7de88911be06bae4ecfe47119c6812a1 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 13 Mar 2024 00:28:57 -0400
Subject: [PATCH v3 17/17] Streamline XLOG_HEAP2_PRUNE record
The xl_heap_prune struct for the XLOG_HEAP2_PRUNE record type had members
counting the number of freeze plans and the numbers of redirected, dead,
and newly unused line pointers. However, many XLOG_HEAP2_PRUNE records use
only some of those. Now that XLOG_HEAP2_PRUNE records are emitted instead
of XLOG_HEAP2_FREEZE_PAGE records even when only freezing is done,
eliminate those members and instead use flags to indicate which kinds of
modifications the record contains. The resulting record carries only the
data for modifications that must actually be replayed.
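A rough standalone model of the flag-driven layout (field sizes and names
invented for illustration; the real sub-records are the xlhp_* structs in the
patch): the fixed part of the record carries a flags byte, and each optional
length-prefixed sub-record is appended, and later decoded, in the same flag
order.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define HAS_REDIRECTIONS 0x04
#define HAS_DEAD_ITEMS   0x08

/* append a count-prefixed array of offsets, loosely like xlhp_prune_items */
static size_t
put_items(uint8_t *buf, size_t pos, const uint16_t *items, uint16_t n)
{
    memcpy(buf + pos, &n, sizeof(n));
    pos += sizeof(n);
    memcpy(buf + pos, items, n * sizeof(uint16_t));
    return pos + n * sizeof(uint16_t);
}

int
main(void)
{
    uint8_t  flags = HAS_REDIRECTIONS | HAS_DEAD_ITEMS;
    uint16_t redirected[] = {2, 5};  /* one redirect stored as an offset pair */
    uint16_t nowdead[] = {9};
    uint8_t  buf[64];
    size_t   len = 0;

    /* serialize: only the sub-records whose flag is set are present */
    len = put_items(buf, len, redirected, 2);
    len = put_items(buf, len, nowdead, 1);

    /* deserialize: walk the sub-records in the same flag order */
    size_t pos = 0;

    if (flags & HAS_REDIRECTIONS)
    {
        uint16_t n;

        memcpy(&n, buf + pos, sizeof(n));
        pos += sizeof(n) + n * sizeof(uint16_t);
        printf("%u redirected offsets\n", n);
    }
    if (flags & HAS_DEAD_ITEMS)
    {
        uint16_t n;

        memcpy(&n, buf + pos, sizeof(n));
        pos += sizeof(n) + n * sizeof(uint16_t);
        printf("%u dead offsets\n", n);
    }
    return 0;
}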
---
src/backend/access/heap/heapam.c | 121 ++++++++++++++++++-----
src/backend/access/heap/pruneheap.c | 86 ++++++++++++----
src/backend/access/rmgrdesc/heapdesc.c | 130 +++++++++++++++++++------
src/include/access/heapam_xlog.h | 122 ++++++++++++++---------
src/tools/pgindent/typedefs.list | 2 +
5 files changed, 337 insertions(+), 124 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 12a1a7805f4..11aa176b6c3 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -8703,10 +8703,73 @@ ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_required,
return key_tuple;
}
+/*
+ * Given a MAXALIGNed buffer returned by XLogRecGetBlockData() and pointed to
+ * by cursor and any xl_heap_prune flags, deserialize the arrays of
+ * OffsetNumbers contained in an xl_heap_prune record.
+ */
+static void
+heap_xlog_deserialize_prune_and_freeze(char *cursor, uint8 flags,
+ int *nredirected, OffsetNumber **redirected,
+ int *ndead, OffsetNumber **nowdead,
+ int *nunused, OffsetNumber **nowunused,
+ int *nplans, xl_heap_freeze_plan **plans,
+ OffsetNumber **frz_offsets)
+{
+ if (flags & XLHP_HAS_FREEZE_PLANS)
+ {
+ xlhp_freeze *freeze = (xlhp_freeze *) cursor;
+
+ *nplans = freeze->nplans;
+ Assert(*nplans > 0);
+ *plans = freeze->plans;
+
+ cursor += offsetof(xlhp_freeze, plans);
+ cursor += sizeof(xl_heap_freeze_plan) * freeze->nplans;
+ }
+
+ if (flags & XLHP_HAS_REDIRECTIONS)
+ {
+ xlhp_prune_items *subrecord = (xlhp_prune_items *) cursor;
+
+ *nredirected = subrecord->ntargets;
+ Assert(*nredirected > 0);
+ *redirected = &subrecord->data[0];
+
+ cursor += offsetof(xlhp_prune_items, data);
+ cursor += sizeof(OffsetNumber[2]) * *nredirected;
+ }
+
+ if (flags & XLHP_HAS_DEAD_ITEMS)
+ {
+ xlhp_prune_items *subrecord = (xlhp_prune_items *) cursor;
+
+ *ndead = subrecord->ntargets;
+ Assert(*ndead > 0);
+ *nowdead = subrecord->data;
+
+ cursor += offsetof(xlhp_prune_items, data);
+ cursor += sizeof(OffsetNumber) * *ndead;
+ }
+
+ if (flags & XLHP_HAS_NOW_UNUSED_ITEMS)
+ {
+ xlhp_prune_items *subrecord = (xlhp_prune_items *) cursor;
+
+ *nunused = subrecord->ntargets;
+ Assert(*nunused > 0);
+ *nowunused = subrecord->data;
+
+ cursor += offsetof(xlhp_prune_items, data);
+ cursor += sizeof(OffsetNumber) * *nunused;
+ }
+
+ if (*nplans > 0)
+ *frz_offsets = (OffsetNumber *) cursor;
+}
+
/*
* Handles XLOG_HEAP2_PRUNE record type.
- *
- * Acquires a full cleanup lock.
*/
static void
heap_xlog_prune(XLogReaderState *record)
@@ -8717,49 +8780,54 @@ heap_xlog_prune(XLogReaderState *record)
RelFileLocator rlocator;
BlockNumber blkno;
XLogRedoAction action;
+ bool get_cleanup_lock;
XLogRecGetBlockTag(record, 0, &rlocator, NULL, &blkno);
+ /*
+ * If there are dead, redirected, or unused items,
+ * heap_page_prune_execute() will call PageRepairFragmentation(), which
+ * expects a full cleanup lock.
+ */
+ get_cleanup_lock = xlrec->flags & XLHP_HAS_REDIRECTIONS ||
+ xlrec->flags & XLHP_HAS_DEAD_ITEMS ||
+ xlrec->flags & XLHP_HAS_NOW_UNUSED_ITEMS;
+
/*
* We're about to remove tuples. In Hot Standby mode, ensure that there's
* no queries running for which the removed tuples are still visible.
*/
if (InHotStandby)
ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
- xlrec->isCatalogRel,
+ xlrec->flags & XLHP_IS_CATALOG_REL,
rlocator);
/*
- * If we have a full-page image, restore it (using a cleanup lock) and
- * we're done.
+ * If we have a full-page image, restore it and we're done.
*/
- action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL, true,
- &buffer);
+ action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
+ get_cleanup_lock, &buffer);
+
if (action == BLK_NEEDS_REDO)
{
Page page = (Page) BufferGetPage(buffer);
- OffsetNumber *redirected;
- OffsetNumber *nowdead;
- OffsetNumber *nowunused;
- int nredirected;
- int ndead;
- int nunused;
- int nplans;
Size datalen;
- xl_heap_freeze_plan *plans;
- OffsetNumber *frz_offsets;
+ OffsetNumber *redirected = NULL;
+ OffsetNumber *nowdead = NULL;
+ OffsetNumber *nowunused = NULL;
+ int nredirected = 0;
+ int ndead = 0;
+ int nunused = 0;
+ int nplans = 0;
+ xl_heap_freeze_plan *plans = NULL;
+ OffsetNumber *frz_offsets = NULL;
int curoff = 0;
- nplans = xlrec->nplans;
- nredirected = xlrec->nredirected;
- ndead = xlrec->ndead;
- nunused = xlrec->nunused;
+ char *cursor = XLogRecGetBlockData(record, 0, &datalen);
- plans = (xl_heap_freeze_plan *) XLogRecGetBlockData(record, 0, &datalen);
- redirected = (OffsetNumber *) &plans[nplans];
- nowdead = redirected + (nredirected * 2);
- nowunused = nowdead + ndead;
- frz_offsets = nowunused + nunused;
+ heap_xlog_deserialize_prune_and_freeze(cursor, xlrec->flags,
+ &nredirected, &redirected, &ndead, &nowdead,
+ &nunused, &nowunused, &nplans, &plans, &frz_offsets);
/* Update all line pointers per the record, and repair fragmentation */
if (nredirected > 0 || ndead > 0 || nunused > 0)
@@ -8798,7 +8866,6 @@ heap_xlog_prune(XLogReaderState *record)
* Note: we don't worry about updating the page's prunability hints.
* At worst this will cause an extra prune cycle to occur soon.
*/
-
PageSetLSN(page, lsn);
MarkBufferDirty(buffer);
}
@@ -8810,7 +8877,7 @@ heap_xlog_prune(XLogReaderState *record)
UnlockReleaseBuffer(buffer);
/*
- * After pruning records from a page, it's useful to update the FSM
+ * After modifying records on a page, it's useful to update the FSM
* about it, as it may cause the page become target for insertions
* later even if vacuum decides not to visit it (which is possible if
* gets marked all-visible.)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 6f45e5c37f0..d3643b1ecc6 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -715,15 +715,19 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
{
xl_heap_prune xlrec;
XLogRecPtr recptr;
+ xlhp_freeze freeze;
+ xlhp_prune_items redirect,
+ dead,
+ unused;
+ int nplans = 0;
xl_heap_freeze_plan plans[MaxHeapTuplesPerPage];
- OffsetNumber offsets[MaxHeapTuplesPerPage];
+ OffsetNumber frz_offsets[MaxHeapTuplesPerPage];
- xlrec.isCatalogRel = RelationIsAccessibleInLogicalDecoding(relation);
- xlrec.nredirected = prstate.nredirected;
- xlrec.ndead = prstate.ndead;
- xlrec.nunused = prstate.nunused;
- xlrec.nplans = 0;
+ xlrec.flags = 0;
+
+ if (RelationIsAccessibleInLogicalDecoding(relation))
+ xlrec.flags |= XLHP_IS_CATALOG_REL;
/*
* The snapshotConflictHorizon for the whole record should be the most
@@ -745,8 +749,11 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* Destructively sorts tuples array in-place.
*/
if (do_freeze)
- xlrec.nplans = heap_log_freeze_plan(frozen,
- presult->nfrozen, plans, offsets);
+ nplans = heap_log_freeze_plan(frozen,
+ presult->nfrozen, plans,
+ frz_offsets);
+ if (nplans > 0)
+ xlrec.flags |= XLHP_HAS_FREEZE_PLANS;
XLogBeginInsert();
XLogRegisterData((char *) &xlrec, SizeOfHeapPrune);
@@ -758,26 +765,71 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* pretend that they are. When XLogInsert stores the whole buffer,
* the offset arrays need not be stored too.
*/
- if (xlrec.nplans > 0)
+ if (nplans > 0)
+ {
+ freeze = (xlhp_freeze)
+ {
+ .nplans = nplans
+ };
+
+ XLogRegisterBufData(0, (char *) &freeze, offsetof(xlhp_freeze, plans));
+
XLogRegisterBufData(0, (char *) plans,
- xlrec.nplans * sizeof(xl_heap_freeze_plan));
+ sizeof(xl_heap_freeze_plan) * freeze.nplans);
+ }
+
if (prstate.nredirected > 0)
+ {
+ xlrec.flags |= XLHP_HAS_REDIRECTIONS;
+
+ redirect = (xlhp_prune_items)
+ {
+ .ntargets = prstate.nredirected
+ };
+
+ XLogRegisterBufData(0, (char *) &redirect,
+ offsetof(xlhp_prune_items, data));
+
XLogRegisterBufData(0, (char *) prstate.redirected,
- prstate.nredirected *
- sizeof(OffsetNumber) * 2);
+ sizeof(OffsetNumber[2]) * prstate.nredirected);
+ }
if (prstate.ndead > 0)
+ {
+ xlrec.flags |= XLHP_HAS_DEAD_ITEMS;
+
+ dead = (xlhp_prune_items)
+ {
+ .ntargets = prstate.ndead
+ };
+
+ XLogRegisterBufData(0, (char *) &dead,
+ offsetof(xlhp_prune_items, data));
+
XLogRegisterBufData(0, (char *) prstate.nowdead,
- prstate.ndead * sizeof(OffsetNumber));
+ sizeof(OffsetNumber) * dead.ntargets);
+ }
if (prstate.nunused > 0)
+ {
+ xlrec.flags |= XLHP_HAS_NOW_UNUSED_ITEMS;
+
+ unused = (xlhp_prune_items)
+ {
+ .ntargets = prstate.nunused
+ };
+
+ XLogRegisterBufData(0, (char *) &unused,
+ offsetof(xlhp_prune_items, data));
+
XLogRegisterBufData(0, (char *) prstate.nowunused,
- prstate.nunused * sizeof(OffsetNumber));
+ sizeof(OffsetNumber) * unused.ntargets);
+ }
- if (xlrec.nplans > 0)
- XLogRegisterBufData(0, (char *) offsets,
- presult->nfrozen * sizeof(OffsetNumber));
+ if (nplans > 0)
+ XLogRegisterBufData(0, (char *) frz_offsets,
+ sizeof(OffsetNumber) * presult->nfrozen);
recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_PRUNE);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 36a3d83c8c2..462b0d74f80 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -179,43 +179,109 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
{
xl_heap_prune *xlrec = (xl_heap_prune *) rec;
- appendStringInfo(buf, "snapshotConflictHorizon: %u, nredirected: %u, ndead: %u, isCatalogRel: %c",
+ appendStringInfo(buf, "snapshotConflictHorizon: %u, isCatalogRel: %c",
xlrec->snapshotConflictHorizon,
- xlrec->nredirected,
- xlrec->ndead,
- xlrec->isCatalogRel ? 'T' : 'F');
+ xlrec->flags & XLHP_IS_CATALOG_REL ? 'T' : 'F');
if (XLogRecHasBlockData(record, 0))
{
- OffsetNumber *end;
- OffsetNumber *redirected;
- OffsetNumber *nowdead;
- OffsetNumber *nowunused;
- int nredirected;
- int nunused;
Size datalen;
-
- redirected = (OffsetNumber *) XLogRecGetBlockData(record, 0,
- &datalen);
-
- nredirected = xlrec->nredirected;
- end = (OffsetNumber *) ((char *) redirected + datalen);
- nowdead = redirected + (nredirected * 2);
- nowunused = nowdead + xlrec->ndead;
- nunused = (end - nowunused);
- Assert(nunused >= 0);
-
- appendStringInfo(buf, ", nunused: %d", nunused);
-
- appendStringInfoString(buf, ", redirected:");
- array_desc(buf, redirected, sizeof(OffsetNumber) * 2,
- nredirected, &redirect_elem_desc, NULL);
- appendStringInfoString(buf, ", dead:");
- array_desc(buf, nowdead, sizeof(OffsetNumber), xlrec->ndead,
- &offset_elem_desc, NULL);
- appendStringInfoString(buf, ", unused:");
- array_desc(buf, nowunused, sizeof(OffsetNumber), nunused,
- &offset_elem_desc, NULL);
+ OffsetNumber *redirected = NULL;
+ OffsetNumber *nowdead = NULL;
+ OffsetNumber *nowunused = NULL;
+ int nredirected = 0;
+ int nunused = 0;
+ int ndead = 0;
+ int nplans = 0;
+ xl_heap_freeze_plan *plans = NULL;
+ OffsetNumber *frz_offsets;
+
+ char *cursor = XLogRecGetBlockData(record, 0, &datalen);
+
+ if (xlrec->flags & XLHP_HAS_FREEZE_PLANS)
+ {
+ xlhp_freeze *freeze = (xlhp_freeze *) cursor;
+
+ nplans = freeze->nplans;
+ Assert(nplans > 0);
+ plans = freeze->plans;
+
+ cursor += offsetof(xlhp_freeze, plans);
+ cursor += sizeof(xl_heap_freeze_plan) * freeze->nplans;
+ }
+
+ if (xlrec->flags & XLHP_HAS_REDIRECTIONS)
+ {
+ xlhp_prune_items *subrecord = (xlhp_prune_items *) cursor;
+
+ nredirected = subrecord->ntargets;
+ Assert(nredirected > 0);
+ redirected = &subrecord->data[0];
+
+ cursor += offsetof(xlhp_prune_items, data);
+ cursor += sizeof(OffsetNumber[2]) * nredirected;
+ }
+
+ if (xlrec->flags & XLHP_HAS_DEAD_ITEMS)
+ {
+ xlhp_prune_items *subrecord = (xlhp_prune_items *) cursor;
+
+ ndead = subrecord->ntargets;
+ Assert(ndead > 0);
+ nowdead = subrecord->data;
+
+ cursor += offsetof(xlhp_prune_items, data);
+ cursor += sizeof(OffsetNumber) * ndead;
+ }
+
+ if (xlrec->flags & XLHP_HAS_NOW_UNUSED_ITEMS)
+ {
+ xlhp_prune_items *subrecord = (xlhp_prune_items *) cursor;
+
+ nunused = subrecord->ntargets;
+ Assert(nunused > 0);
+ nowunused = subrecord->data;
+
+ cursor += offsetof(xlhp_prune_items, data);
+ cursor += sizeof(OffsetNumber) * nunused;
+ }
+
+ if (nplans > 0)
+ frz_offsets = (OffsetNumber *) cursor;
+
+ appendStringInfo(buf, ", nredirected: %u, ndead: %u, nunused: %u, nplans: %u,",
+ nredirected,
+ ndead,
+ nunused,
+ nplans);
+
+ if (nredirected > 0)
+ {
+ appendStringInfoString(buf, ", redirected:");
+ array_desc(buf, redirected, sizeof(OffsetNumber) * 2,
+ nredirected, &redirect_elem_desc, NULL);
+ }
+
+ if (ndead > 0)
+ {
+ appendStringInfoString(buf, ", dead:");
+ array_desc(buf, nowdead, sizeof(OffsetNumber), ndead,
+ &offset_elem_desc, NULL);
+ }
+
+ if (nunused > 0)
+ {
+ appendStringInfoString(buf, ", unused:");
+ array_desc(buf, nowunused, sizeof(OffsetNumber), nunused,
+ &offset_elem_desc, NULL);
+ }
+
+ if (nplans > 0)
+ {
+ appendStringInfoString(buf, ", plans:");
+ array_desc(buf, plans, sizeof(xl_heap_freeze_plan), nplans,
+ &plan_elem_desc, &frz_offsets);
+ }
}
}
else if (info == XLOG_HEAP2_VACUUM)
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 22f236bb52a..bebd93422d5 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -227,42 +227,84 @@ typedef struct xl_heap_update
#define SizeOfHeapUpdate (offsetof(xl_heap_update, new_offnum) + sizeof(OffsetNumber))
/*
- * This is what we need to know about page pruning (both during VACUUM and
- * during opportunistic pruning)
+ * XXX: As of Postgres 17, XLOG_HEAP2_PRUNE records replace
+ * XLOG_HEAP2_FREEZE_PAGE record types
+ */
+
+/*
+ * This is what we need to know about page pruning and freezing, both during
+ * VACUUM and during opportunistic pruning.
*
- * The array of OffsetNumbers following the fixed part of the record contains:
- * * for each freeze plan: the freeze plan
- * * for each redirected item: the item offset, then the offset redirected to
- * * for each now-dead item: the item offset
- * * for each now-unused item: the item offset
- * * for each tuple frozen by the freeze plans: the offset of the item corresponding to that tuple
- * The total number of OffsetNumbers is therefore
- * (2*nredirected) + ndead + nunused + (sum[plan.ntuples for plan in plans])
+ * If XLHP_HAS_REDIRECTIONS, XLHP_HAS_DEAD_ITEMS, or XLHP_HAS_NOW_UNUSED_ITEMS
+ * is set, replaying the record requires a full cleanup lock. Otherwise an
+ * ordinary exclusive lock is enough; that can be the case if freezing was the
+ * only modification to the page.
*
- * Acquires a full cleanup lock.
+ * The data for block reference 0 contains "sub-records" depending on which
+ * of the XLHP_HAS_* flags are set. See xlhp_* struct definitions below.
+ *
+ * The layout is in the same order as the XLHP_* flags.
*/
typedef struct xl_heap_prune
{
TransactionId snapshotConflictHorizon;
- uint16 nplans;
- uint16 nredirected;
- uint16 ndead;
- uint16 nunused;
- bool isCatalogRel; /* to handle recovery conflict during logical
- * decoding on standby */
- /*
- * OFFSET NUMBERS and freeze plans are in the block reference 0 in the
- * following order:
- *
- * * xl_heap_freeze_plan plans[nplans];
- * * OffsetNumber redirected[2 * nredirected];
- * * OffsetNumber nowdead[ndead];
- * * OffsetNumber nowunused[nunused];
- * * OffsetNumber frz_offsets[...];
- */
+ uint8 flags;
} xl_heap_prune;
-#define SizeOfHeapPrune (offsetof(xl_heap_prune, isCatalogRel) + sizeof(bool))
+#define XLHP_IS_CATALOG_REL 0x01 /* to handle recovery conflict
+ * during logical decoding on
+ * standby */
+#define XLHP_HAS_FREEZE_PLANS 0x02
+#define XLHP_HAS_REDIRECTIONS 0x04
+#define XLHP_HAS_DEAD_ITEMS 0x08
+#define XLHP_HAS_NOW_UNUSED_ITEMS 0x10
+
+#define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint8))
+
+/*
+ * This struct represents a 'freeze plan', which describes how to freeze a
+ * group of one or more heap tuples (appears in xl_heap_freeze_page and
+ * xl_heap_prune's xlhp_freeze records)
+ */
+/* 0x01 was XLH_FREEZE_XMIN */
+#define XLH_FREEZE_XVAC 0x02
+#define XLH_INVALID_XVAC 0x04
+
+typedef struct xl_heap_freeze_plan
+{
+ TransactionId xmax;
+ uint16 t_infomask2;
+ uint16 t_infomask;
+ uint8 frzflags;
+
+ /* Length of individual page offset numbers array for this plan */
+ uint16 ntuples;
+} xl_heap_freeze_plan;
+
+/*
+ * This is what we need to know about a block being frozen during vacuum
+ *
+ * Backup block 0's data contains an array of xl_heap_freeze_plan structs
+ * (with nplans elements), followed by one or more page offset number arrays.
+ * Each such page offset number array corresponds to a single freeze plan
+ * (REDO routine freezes corresponding heap tuples using freeze plan).
+ */
+typedef struct xlhp_freeze
+{
+ uint16 nplans;
+ xl_heap_freeze_plan plans[FLEXIBLE_ARRAY_MEMBER];
+} xlhp_freeze;
+
+/*
+ * Sub-record type contained in block reference 0 of a prune record if
+ * XLHP_HAS_REDIRECTIONS/XLHP_HAS_DEAD_ITEMS/XLHP_HAS_NOW_UNUSED_ITEMS is set.
+ * Note that in the XLHP_HAS_REDIRECTIONS variant, there are actually 2 *
+ * length number of OffsetNumbers in the data.
+ */
+typedef struct xlhp_prune_items
+{
+ uint16 ntargets;
+ OffsetNumber data[FLEXIBLE_ARRAY_MEMBER];
+} xlhp_prune_items;
/*
* The vacuum page record is similar to the prune record, but can only mark
@@ -326,26 +368,6 @@ typedef struct xl_heap_inplace
} xl_heap_inplace;
#define SizeOfHeapInplace (offsetof(xl_heap_inplace, offnum) + sizeof(OffsetNumber))
-
-/*
- * This struct represents a 'freeze plan', which describes how to freeze a
- * group of one or more heap tuples (appears in xl_heap_freeze_page record)
- */
-/* 0x01 was XLH_FREEZE_XMIN */
-#define XLH_FREEZE_XVAC 0x02
-#define XLH_INVALID_XVAC 0x04
-
-typedef struct xl_heap_freeze_plan
-{
- TransactionId xmax;
- uint16 t_infomask2;
- uint16 t_infomask;
- uint8 frzflags;
-
- /* Length of individual page offset numbers array for this plan */
- uint16 ntuples;
-} xl_heap_freeze_plan;
-
/*
* This is what we need to know about a block being frozen during vacuum
*
@@ -353,6 +375,10 @@ typedef struct xl_heap_freeze_plan
* (with nplans elements), followed by one or more page offset number arrays.
* Each such page offset number array corresponds to a single freeze plan
* (REDO routine freezes corresponding heap tuples using freeze plan).
+ *
+ * This is for backwards compatibility for reading individual freeze records.
+ * As of Postgres 17, xl_heap_freeze_plan records occur in xl_heap_prune
+ * records.
*/
typedef struct xl_heap_freeze_page
{
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 1c1a4d305d6..2702f211d90 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4002,6 +4002,8 @@ xl_xact_stats_items
xl_xact_subxacts
xl_xact_twophase
xl_xact_xinfo
+xlhp_freeze
+xlhp_prune_items
xmlBuffer
xmlBufferPtr
xmlChar
--
2.40.1
On 15/03/2024 02:56, Melanie Plageman wrote:
Okay, so I was going to start using xl_heap_prune for vacuum here too,
but I realized it would be bigger because of the
snapshotConflictHorizon. Do you think there is a non-terrible way to
make the snapshotConflictHorizon optional? Like with a flag?
Yeah, another flag would do the trick.
I introduced a few sub-record types similar to what you suggested --
they help a bit with alignment, so I think they are worth keeping. There
are comments around them, but perhaps a larger diagram of the layout of
the new XLOG_HEAP2_PRUNE record would be helpful.
I started doing this, but I can't find a way of laying out the diagram
that pgindent doesn't destroy. I thought I remember other diagrams in
the source code showing the layout of something (something with pages
somewhere?) that don't get messed up by pgindent, but I couldn't find
them.
See src/tools/pgindent/README, section "Cleaning up in case of failure
or ugly output":
/*----------
* Text here will not be touched by pgindent.
*/
There is a bit of duplicated code between heap_xlog_prune() and
heap2_desc() since they both need to deserialize the record. Before, the
code to do this was small and it didn't matter, but it might be worth
refactoring it that way now.
I have added a helper function to do the deserialization,
heap_xlog_deserialize_prune_and_freeze(). But I didn't start using it in
heap2_desc() because of the way the pg_waldump build file works. Do you
think the helper belongs in any of waldump's existing sources?

pg_waldump_sources = files(
'compat.c',
'pg_waldump.c',
'rmgrdesc.c',
)
pg_waldump_sources += rmgr_desc_sources
pg_waldump_sources += xlogreader_sources
pg_waldump_sources += files('../../backend/access/transam/xlogstats.c')

Otherwise, I assume I am supposed to avoid adding some big new include to
waldump.
Didn't look closely at that, but there's some precedent with
commit/prepare/abort records. See ParseCommitRecord, xl_xact_commit,
xl_parsed_commit et al.
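For illustration only, a shared parser in that style could look roughly like
the sketch below. The struct name and the exact signature of the helper are
guesses, not necessarily what the attached patches use; the body just mirrors
the cursor-walking already shown in heap2_desc() above, storing pointers into
the record data rather than copying it.

typedef struct xlhp_parsed_prune
{
	int			nplans;
	xl_heap_freeze_plan *plans;
	OffsetNumber *frz_offsets;

	int			nredirected;
	OffsetNumber *redirected;

	int			ndead;
	OffsetNumber *nowdead;

	int			nunused;
	OffsetNumber *nowunused;
} xlhp_parsed_prune;

static void
heap_xlog_deserialize_prune_and_freeze(char *cursor, uint8 flags,
									   xlhp_parsed_prune *parsed)
{
	memset(parsed, 0, sizeof(*parsed));

	if (flags & XLHP_HAS_FREEZE_PLANS)
	{
		xlhp_freeze *freeze = (xlhp_freeze *) cursor;

		parsed->nplans = freeze->nplans;
		parsed->plans = freeze->plans;
		cursor += offsetof(xlhp_freeze, plans) +
			sizeof(xl_heap_freeze_plan) * parsed->nplans;
	}

	if (flags & XLHP_HAS_REDIRECTIONS)
	{
		xlhp_prune_items *items = (xlhp_prune_items *) cursor;

		parsed->nredirected = items->ntargets;
		parsed->redirected = items->data;
		cursor += offsetof(xlhp_prune_items, data) +
			sizeof(OffsetNumber[2]) * parsed->nredirected;
	}

	if (flags & XLHP_HAS_DEAD_ITEMS)
	{
		xlhp_prune_items *items = (xlhp_prune_items *) cursor;

		parsed->ndead = items->ntargets;
		parsed->nowdead = items->data;
		cursor += offsetof(xlhp_prune_items, data) +
			sizeof(OffsetNumber) * parsed->ndead;
	}

	if (flags & XLHP_HAS_NOW_UNUSED_ITEMS)
	{
		xlhp_prune_items *items = (xlhp_prune_items *) cursor;

		parsed->nunused = items->ntargets;
		parsed->nowunused = items->data;
		cursor += offsetof(xlhp_prune_items, data) +
			sizeof(OffsetNumber) * parsed->nunused;
	}

	/* the per-plan offset arrays trail everything else */
	if (parsed->nplans > 0)
		parsed->frz_offsets = (OffsetNumber *) cursor;
}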
Note that we don't provide WAL compatibility across major versions. You
can fully remove the old xl_heap_freeze_page format. (We should bump
XLOG_PAGE_MAGIC when this is committed though)
On the point of removing the freeze-only code path from
heap_page_prune() (now heap_page_prune_and_freeze()): while doing this,
I realized that heap_pre_freeze_checks() was not being called in the
case that we decide to freeze because we emitted an FPI while setting
the hint bit. I've fixed that, however, I've done so by moving
heap_pre_freeze_checks() into the critical section. I think that is not
okay? I could move it earlier and not call it when the hint bit FPI
leads us to freeze tuples. But, I think that would lead to us doing a
lot less validation of tuples being frozen when checksums are enabled.
Or, I could make two critical sections?
I found another approach and just do the pre-freeze checks if we are
considering freezing except for the FPI criteria.
Hmm, I think you can make this simpler if you use
XLogCheckBufferNeedsBackup() to check if the hint-bit update will
generate a full-page-image before entering the critical section. Like
you did to check if pruning will generate a full-page-image. I included
that change in the attached patches.
I don't fully understand this:
/*
* If we will freeze tuples on the page or they were all already frozen
* on the page, if we will set the page all-frozen in the visibility map,
* we can advance relfrozenxid and relminmxid to the values in
* pagefrz->FreezePageRelfrozenXid and pagefrz->FreezePageRelminMxid.
*/
if (presult->all_frozen || presult->nfrozen > 0)
{
presult->new_relfrozenxid = pagefrz->FreezePageRelfrozenXid;
presult->new_relminmxid = pagefrz->FreezePageRelminMxid;
}
else
{
presult->new_relfrozenxid = pagefrz->NoFreezePageRelfrozenXid;
presult->new_relminmxid = pagefrz->NoFreezePageRelminMxid;
}
Firstly, the comment is outdated, because we have already done the
freezing at this point. But more importantly, I don't understand the
logic here. Could we check just presult->nfrozen > 0 here and get the
same result?
I think it is probably worse to add both of them as additional optional
arguments, so I've just left lazy_scan_prune() with the job of
initializing them.
Ok.
Here are some patches on top of your patches for some further
refactorings. Some notable changes in heap_page_prune_and_freeze():
- I moved the heap_prepare_freeze_tuple() work from the 2nd loop to the
1st one. It seems more clear and efficient that way.
- I extracted the code to generate the WAL record to a separate function.
- Refactored the "will setting hint bit generate FPI" check as discussed
above
These patches are in a very rough WIP state, but I wanted to share
early. I haven't done much testing, and I'm not wedded to these changes,
but I think they make it more readable. Please include / squash in the
patch set if you agree with them.
Please also take a look at the comments I marked with HEIKKI or FIXME,
in the patches and commit messages.
I'll wait for a new version from you before reviewing more.
--
Heikki Linnakangas
Neon (https://neon.tech)
Attachments:
v3heikki-0001-Inline-heap_frz_conflict_horizon-to-the-cal.patch (text/x-patch)
From 622620a7875ae8c1626e9cd118156e0c734d44ed Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Sun, 17 Mar 2024 22:52:28 +0200
Subject: [PATCH v3heikki 1/4] Inline heap_frz_conflict_horizon() to the
caller.
FIXME: This frz_conflict_horizon business looks fishy to me. We have:
- local frz_conflict_horizon variable,
- presult->frz_conflict_horizon, and
- prstate.snapshotConflictHorizon
should we really have all three, and what are the differences?
---
src/backend/access/heap/pruneheap.c | 17 ++++++++++++++---
src/backend/access/heap/vacuumlazy.c | 27 ---------------------------
src/include/access/heapam.h | 2 --
3 files changed, 14 insertions(+), 32 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index d3643b1ecc6..f4f5468e144 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -21,6 +21,7 @@
#include "access/transam.h"
#include "access/xlog.h"
#include "access/xloginsert.h"
+#include "commands/vacuum.h"
#include "executor/instrument.h"
#include "miscadmin.h"
#include "pgstat.h"
@@ -275,7 +276,6 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
bool hint_bit_fpi;
bool prune_fpi = false;
int64 fpi_before = pgWalUsage.wal_fpi;
- TransactionId frz_conflict_horizon = InvalidTransactionId;
/*
* One entry for every tuple that we may freeze.
@@ -691,7 +691,18 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (do_freeze)
{
- frz_conflict_horizon = heap_frz_conflict_horizon(presult, pagefrz);
+ /*
+ * We can use frz_conflict_horizon as our cutoff for conflicts when
+ * the whole page is eligible to become all-frozen in the VM once
+ * we're done with it. Otherwise we generate a conservative cutoff by
+ * stepping back from OldestXmin.
+ */
+ if (!(presult->all_visible_except_removable && presult->all_frozen))
+ {
+ /* Avoids false conflicts when hot_standby_feedback in use */
+ presult->frz_conflict_horizon = pagefrz->cutoffs->OldestXmin;
+ TransactionIdRetreat(presult->frz_conflict_horizon);
+ }
heap_freeze_prepared_tuples(buffer, frozen, presult->nfrozen);
}
else if ((!pagefrz || !presult->all_frozen || presult->nfrozen > 0))
@@ -740,7 +751,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*/
if (do_freeze)
xlrec.snapshotConflictHorizon = Max(prstate.snapshotConflictHorizon,
- frz_conflict_horizon);
+ presult->frz_conflict_horizon);
else
xlrec.snapshotConflictHorizon = prstate.snapshotConflictHorizon;
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 4b45e8be1ad..8d3723faf3a 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1373,33 +1373,6 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
return false;
}
-/*
- * Determine the snapshotConflictHorizon for freezing. Must only be called
- * after pruning and determining if the page is freezable.
- */
-TransactionId
-heap_frz_conflict_horizon(PruneFreezeResult *presult, HeapPageFreeze *pagefrz)
-{
- TransactionId result;
-
- /*
- * We can use frz_conflict_horizon as our cutoff for conflicts when the
- * whole page is eligible to become all-frozen in the VM once we're done
- * with it. Otherwise we generate a conservative cutoff by stepping back
- * from OldestXmin.
- */
- if (presult->all_visible_except_removable && presult->all_frozen)
- result = presult->frz_conflict_horizon;
- else
- {
- /* Avoids false conflicts when hot_standby_feedback in use */
- result = pagefrz->cutoffs->OldestXmin;
- TransactionIdRetreat(result);
- }
-
- return result;
-}
-
/*
* lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
*
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index c36623f53bd..4e17347e625 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -290,8 +290,6 @@ extern TM_Result heap_lock_tuple(Relation relation, HeapTuple tuple,
extern void heap_inplace_update(Relation relation, HeapTuple tuple);
-extern TransactionId heap_frz_conflict_horizon(PruneFreezeResult *presult,
- HeapPageFreeze *pagefrz);
extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
HeapPageFreeze *pagefrz,
HeapTupleFreeze *frz, bool *totally_frozen);
--
2.39.2
v3heikki-0002-Misc-cleanup.patch (text/x-patch)
From 0219842487931f899abcf183c863c43270c098f0 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Sun, 17 Mar 2024 23:05:31 +0200
Subject: [PATCH v3heikki 2/4] Misc cleanup
FIXME and some comments I added with HEIKKI: prefix with questions
---
src/backend/access/heap/pruneheap.c | 16 +++++++---------
src/backend/access/heap/vacuumlazy.c | 7 ++++++-
2 files changed, 13 insertions(+), 10 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index f4f5468e144..b3573bb628b 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -32,10 +32,9 @@
/* Working data for heap_page_prune_and_freeze() and subroutines */
typedef struct
{
- Relation rel;
-
/* tuple visibility test, initialized for the relation */
GlobalVisState *vistest;
+
/* whether or not dead items can be set LP_UNUSED during pruning */
bool mark_unused_now;
@@ -248,12 +247,12 @@ prune_freeze_xmin_is_removable(GlobalVisState *visstate, TransactionId xmin)
* the current relfrozenxid and relminmxids used if the caller is interested in
* freezing tuples on the page.
*
- * off_loc is the offset location required by the caller to use in error
- * callback.
- *
* presult contains output parameters needed by callers such as the number of
* tuples removed and the number of line pointers newly marked LP_DEAD.
* heap_page_prune_and_freeze() is responsible for initializing it.
+ *
+ * off_loc is the offset location required by the caller to use in error
+ * callback.
*/
void
heap_page_prune_and_freeze(Relation relation, Buffer buffer,
@@ -294,7 +293,6 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* initialize the rest of our working state.
*/
prstate.new_prune_xid = InvalidTransactionId;
- prstate.rel = relation;
prstate.vistest = vistest;
prstate.mark_unused_now = mark_unused_now;
prstate.snapshotConflictHorizon = InvalidTransactionId;
@@ -302,9 +300,10 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
memset(prstate.marked, 0, sizeof(prstate.marked));
/*
- * presult->htsv is not initialized here because all ntuple spots in the
+ * prstate.htsv is not initialized here because all ntuple spots in the
* array will be set either to a valid HTSV_Result value or -1.
*/
+
presult->ndeleted = 0;
presult->nnewlpdead = 0;
presult->nfrozen = 0;
@@ -328,7 +327,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
presult->new_relminmxid = InvalidMultiXactId;
maxoff = PageGetMaxOffsetNumber(page);
- tup.t_tableOid = RelationGetRelid(prstate.rel);
+ tup.t_tableOid = RelationGetRelid(relation);
/*
* Determine HTSV for all tuples.
@@ -378,7 +377,6 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.htsv[offnum] = heap_prune_satisfies_vacuum(&prstate, &tup,
buffer);
- Assert(ItemIdIsNormal(itemid));
/*
* The criteria for counting a tuple as live in this block need to
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8d3723faf3a..d3df7029667 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -433,7 +433,9 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
* heap_page_prune_and_freeze(). We expect vistest will always make
* heap_page_prune_and_freeze() remove any deleted tuple whose xmax is <
* OldestXmin. lazy_scan_prune must never become confused about whether a
- * tuple should be frozen or removed. (In the future we might want to
+ * tuple should be frozen or removed.
+ * HEIKKI: Is such confusion possible anymore?
+ * (In the future we might want to
* teach lazy_scan_prune to recompute vistest from time to time, to
* increase the number of dead tuples it can prune away.)
*/
@@ -1387,6 +1389,9 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
* so we could deal with tuples that were DEAD to VACUUM, but nevertheless were
* left with storage after pruning.
*
+ * HEIKKI: does the above paragraph still make sense? We don't call
+ * HeapTupleSatisfiesVacuum() here anymore
+ *
* As of Postgres 17, we circumvent this problem altogether by reusing the
* result of heap_page_prune_and_freeze()'s visibility check. Without the
* second call to HeapTupleSatisfiesVacuum(), there is no new HTSV_Result and
--
2.39.2
v3heikki-0003-Move-work-to-the-first-loop.patch (text/x-patch)
From d72cebf13f9866112309883f72a382fc2cb57e17 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Sun, 17 Mar 2024 23:08:42 +0200
Subject: [PATCH v3heikki 3/4] Move work to the first loop
It seems more efficient and more straightforward to do freezing in the
first loop. When it was part of the 2nd loop, the 2nd loop needed to
do more work (PageGetItemId() and related checks) for tuples that were
already processed as part of an earlier chain, while in the 1st loop
that work is already done.
---
src/backend/access/heap/pruneheap.c | 141 ++++++++++------------------
1 file changed, 52 insertions(+), 89 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index b3573bb628b..3541628799a 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -78,9 +78,6 @@ static int heap_prune_chain(Buffer buffer,
PruneState *prstate, PruneFreezeResult *presult);
static inline HTSV_Result htsv_get_valid_status(int status);
-static void prune_prepare_freeze_tuple(Page page, OffsetNumber offnum, PruneState *prstate,
- HeapPageFreeze *pagefrz, HeapTupleFreeze *frozen,
- PruneFreezeResult *presult);
static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
static void heap_prune_record_redirect(PruneState *prstate,
OffsetNumber offnum, OffsetNumber rdoffnum,
@@ -322,6 +319,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/* for recovery conflicts */
presult->frz_conflict_horizon = InvalidTransactionId;
+ /*
+ * We will update the VM after pruning, collecting LP_DEAD items, and
+ * freezing tuples. Keep track of whether or not the page is all_visible
+ * and all_frozen and use this information to update the VM. all_visible
+ * implies lpdead_items == 0, but don't trust all_frozen result unless
+ * all_visible is also set to true.
+ */
+ /* HEIKKI: the caller updates the VM? not us */
+ presult->all_frozen = true;
+
/* For advancing relfrozenxid and relminmxid */
presult->new_relfrozenxid = InvalidTransactionId;
presult->new_relminmxid = InvalidMultiXactId;
@@ -493,6 +500,42 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
break;
}
+
+ if (prstate.htsv[offnum] != HEAPTUPLE_DEAD)
+ {
+ /*
+ * Deliberately don't set hastup for LP_DEAD items. We make the
+ * soft assumption that any LP_DEAD items encountered here will
+ * become LP_UNUSED later on, before count_nondeletable_pages is
+ * reached. If we don't make this assumption then rel truncation
+ * will only happen every other VACUUM, at most. Besides, VACUUM
+ * must treat hastup/nonempty_pages as provisional no matter how
+ * LP_DEAD items are handled (handled here, or handled later on).
+ */
+ presult->hastup = true;
+
+ /* Since we're not removing this tuple, consider freezing it */
+ if (pagefrz)
+ {
+ bool totally_frozen;
+
+ if ((heap_prepare_freeze_tuple(htup, pagefrz,
+ &frozen[presult->nfrozen],
+ &totally_frozen)))
+ {
+ /* Save prepared freeze plan for later */
+ frozen[presult->nfrozen++].offset = offnum;
+ }
+
+ /*
+ * If any tuple isn't either totally frozen already or eligible to become
+ * totally frozen (according to its freeze plan), then the page definitely
+ * cannot be set all-frozen in the visibility map later on.
+ */
+ if (!totally_frozen)
+ presult->all_frozen = false;
+ }
+ }
}
/*
@@ -517,61 +560,29 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*/
presult->all_visible_except_removable = presult->all_visible;
- /*
- * We will update the VM after pruning, collecting LP_DEAD items, and
- * freezing tuples. Keep track of whether or not the page is all_visible
- * and all_frozen and use this information to update the VM. all_visible
- * implies lpdead_items == 0, but don't trust all_frozen result unless
- * all_visible is also set to true.
- */
- presult->all_frozen = true;
-
- /* Scan the page */
+ /* Scan the page for hot chains */
for (offnum = FirstOffsetNumber;
offnum <= maxoff;
offnum = OffsetNumberNext(offnum))
{
ItemId itemid;
- /* see preceding loop */
- if (off_loc)
- *off_loc = offnum;
-
- if (pagefrz)
- prune_prepare_freeze_tuple(page, offnum, &prstate,
- pagefrz, frozen, presult);
-
- itemid = PageGetItemId(page, offnum);
-
- if (ItemIdIsNormal(itemid) &&
- prstate.htsv[offnum] != HEAPTUPLE_DEAD)
- {
- Assert(prstate.htsv[offnum] != -1);
-
- /*
- * Deliberately don't set hastup for LP_DEAD items. We make the
- * soft assumption that any LP_DEAD items encountered here will
- * become LP_UNUSED later on, before count_nondeletable_pages is
- * reached. If we don't make this assumption then rel truncation
- * will only happen every other VACUUM, at most. Besides, VACUUM
- * must treat hastup/nonempty_pages as provisional no matter how
- * LP_DEAD items are handled (handled here, or handled later on).
- */
- presult->hastup = true;
- }
-
/* Ignore items already processed as part of an earlier chain */
if (prstate.marked[offnum])
continue;
+ /* see preceding loop */
+ if (off_loc)
+ *off_loc = offnum;
+
/* Nothing to do if slot is empty */
+ itemid = PageGetItemId(page, offnum);
if (!ItemIdIsUsed(itemid))
continue;
/* Process this item or chain of items */
presult->ndeleted += heap_prune_chain(buffer, offnum,
&prstate, presult);
-
}
/* Clear the offset information once we have processed the given page. */
@@ -1217,54 +1228,6 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
return ndeleted;
}
-/*
- * While pruning, before actually executing pruning and updating the line
- * pointers, we may consider freezing tuples referred to by LP_NORMAL line
- * pointers whose visibility status is not HEAPTUPLE_DEAD. That is to say, we
- * want to consider freezing normal tuples which will not be removed.
-*/
-static void
-prune_prepare_freeze_tuple(Page page, OffsetNumber offnum, PruneState *prstate,
- HeapPageFreeze *pagefrz,
- HeapTupleFreeze *frozen,
- PruneFreezeResult *presult)
-{
- bool totally_frozen;
- HeapTupleHeader htup;
- ItemId itemid;
-
- Assert(pagefrz);
-
- itemid = PageGetItemId(page, offnum);
-
- if (!ItemIdIsNormal(itemid))
- return;
-
- /* We do not consider freezing tuples which will be removed. */
- if (prstate->htsv[offnum] == HEAPTUPLE_DEAD ||
- prstate->htsv[offnum] == -1)
- return;
-
- htup = (HeapTupleHeader) PageGetItem(page, itemid);
-
- /* Tuple with storage -- consider need to freeze */
- if ((heap_prepare_freeze_tuple(htup, pagefrz,
- &frozen[presult->nfrozen],
- &totally_frozen)))
- {
- /* Save prepared freeze plan for later */
- frozen[presult->nfrozen++].offset = offnum;
- }
-
- /*
- * If any tuple isn't either totally frozen already or eligible to become
- * totally frozen (according to its freeze plan), then the page definitely
- * cannot be set all-frozen in the visibility map later on
- */
- if (!totally_frozen)
- presult->all_frozen = false;
-}
-
/* Record lowest soon-prunable XID */
static void
heap_prune_record_prunable(PruneState *prstate, TransactionId xid)
--
2.39.2
v3heikki-0004-Refactor-heap_page_prune_and_freeze-some-mo.patch (text/x-patch)
From 978fc7c9c6a3c30f34c8d54a98aad8b5163fd0ab Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Mon, 18 Mar 2024 00:47:44 +0200
Subject: [PATCH v3heikki 4/4] Refactor heap_page_prune_and_freeze() some more
- Move WAL-logging to separate function
- Refactor the code that executes the prune, freeze and
hint-bit setting actions on the page. There's a little bit of
repetition, in how pd_prune_xid is set and the PageClearFull() call,
but I find this easier to follow.
- Instead of checking after-the-fact if MarkBufferDirtyHint()
generated an FPI, check before entering the critical section if it
would. There's a small chance that a checkpoint started in between
and this gives a different answer, but that's very rare and this is a
crude heuristic anyway.
---
src/backend/access/heap/pruneheap.c | 433 ++++++++++++++--------------
1 file changed, 214 insertions(+), 219 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 3541628799a..4d7f7f7ea94 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -48,6 +48,9 @@ typedef struct
OffsetNumber nowdead[MaxHeapTuplesPerPage];
OffsetNumber nowunused[MaxHeapTuplesPerPage];
+ /* one entry for every tuple that we may freeze */
+ HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
+
/*
* marked[i] is true if item i is entered in one of the above arrays.
*
@@ -89,6 +92,8 @@ static void heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber o
static void heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum);
static void page_verify_redirects(Page page);
+static void log_heap_prune_and_freeze(Relation relation, Buffer buffer, PruneState *prstate, PruneFreezeResult *presult);
+
/*
* Optionally prune and repair fragmentation in the specified page.
@@ -270,14 +275,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
bool do_hint;
bool whole_page_freezable;
bool hint_bit_fpi;
- bool prune_fpi = false;
int64 fpi_before = pgWalUsage.wal_fpi;
- /*
- * One entry for every tuple that we may freeze.
- */
- HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
-
/*
* Our strategy is to scan the page and make lists of items to change,
* then apply the changes within a critical section. This keeps as much
@@ -520,11 +519,11 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
bool totally_frozen;
if ((heap_prepare_freeze_tuple(htup, pagefrz,
- &frozen[presult->nfrozen],
+ &prstate.frozen[presult->nfrozen],
&totally_frozen)))
{
/* Save prepared freeze plan for later */
- frozen[presult->nfrozen++].offset = offnum;
+ prstate.frozen[presult->nfrozen++].offset = offnum;
}
/*
@@ -596,13 +595,6 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/* Record number of newly-set-LP_DEAD items for caller */
presult->nnewlpdead = prstate.ndead;
- /*
- * Only incur overhead of checking if we will do an FPI if we might use
- * the information.
- */
- if (do_prune && pagefrz)
- prune_fpi = XLogCheckBufferNeedsBackup(buffer);
-
/*
* Even if we don't prune anything, if we found a new value for the
* pd_prune_xid field or the page was marked full, we will update the hint
@@ -626,9 +618,24 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* opportunistic freeze heuristic must be improved; however, for now, try
* to approximate it.
*/
- do_freeze = pagefrz &&
- (pagefrz->freeze_required ||
- (whole_page_freezable && presult->nfrozen > 0 && (prune_fpi || hint_bit_fpi)));
+ do_freeze = false;
+ if (pagefrz)
+ {
+ if (pagefrz->freeze_required)
+ do_freeze = true;
+ else if (whole_page_freezable && presult->nfrozen > 0)
+ {
+ /*
+ * Freezing would make the page all-frozen. Have we already
+ * emitted an FPI, or will do so anyway?
+ */
+ if (hint_bit_fpi ||
+ ((do_prune || do_hint) && XLogCheckBufferNeedsBackup(buffer)))
+ {
+ do_freeze = true;
+ }
+ }
+ }
/*
* Validate the tuples we are considering freezing. We do this even if
@@ -636,85 +643,10 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* still may emit an FPI while setting the page hint bit later. But we
* want to avoid doing the pre-freeze checks in a critical section.
*/
- if (pagefrz &&
- (pagefrz->freeze_required ||
- (whole_page_freezable && presult->nfrozen > 0)))
- heap_pre_freeze_checks(buffer, frozen, presult->nfrozen);
-
- /*
- * If we are going to modify the page contents anyway, we will have to
- * update more than hint bits.
- */
- if (do_freeze || do_prune)
- do_hint = false;
-
- START_CRIT_SECTION();
-
- /*
- * Update the page's pd_prune_xid field to either zero, or the lowest XID
- * of any soon-prunable tuple.
- */
- ((PageHeader) page)->pd_prune_xid = prstate.new_prune_xid;
-
- /*
- * If pruning, freezing, or updating the hint bit, clear the "page is
- * full" flag if it is set since there's no point in repeating the
- * prune/defrag process until something else happens to the page.
- */
- if (do_prune || do_freeze || do_hint)
- PageClearFull(page);
-
- /*
- * Apply the planned item changes, then repair page fragmentation, and
- * update the page's hint bit about whether it has free line pointers.
- */
- if (do_prune)
- {
- heap_page_prune_execute(buffer,
- prstate.redirected, prstate.nredirected,
- prstate.nowdead, prstate.ndead,
- prstate.nowunused, prstate.nunused);
- }
-
- /*
- * If we aren't pruning or freezing anything, but we updated pd_prune_xid,
- * this is a non-WAL-logged hint.
- */
- if (do_hint)
- {
- MarkBufferDirtyHint(buffer, true);
-
- /*
- * We may have decided not to opportunistically freeze above because
- * pruning would not emit an FPI. Now, however, if checksums are
- * enabled, setting the hint bit may have emitted an FPI. Check again
- * if we should freeze.
- */
- hint_bit_fpi = fpi_before != pgWalUsage.wal_fpi;
-
- if (hint_bit_fpi)
- do_freeze = pagefrz &&
- (pagefrz->freeze_required ||
- (whole_page_freezable && presult->nfrozen > 0));
- }
-
if (do_freeze)
- {
- /*
- * We can use frz_conflict_horizon as our cutoff for conflicts when
- * the whole page is eligible to become all-frozen in the VM once
- * we're done with it. Otherwise we generate a conservative cutoff by
- * stepping back from OldestXmin.
- */
- if (!(presult->all_visible_except_removable && presult->all_frozen))
- {
- /* Avoids false conflicts when hot_standby_feedback in use */
- presult->frz_conflict_horizon = pagefrz->cutoffs->OldestXmin;
- TransactionIdRetreat(presult->frz_conflict_horizon);
- }
- heap_freeze_prepared_tuples(buffer, frozen, presult->nfrozen);
- }
- else if ((!pagefrz || !presult->all_frozen || presult->nfrozen > 0))
+ heap_pre_freeze_checks(buffer, prstate.frozen, presult->nfrozen);
+
+ if (!do_freeze && (!pagefrz || !presult->all_frozen || presult->nfrozen > 0))
{
/*
* If we will neither freeze tuples on the page nor set the page all
@@ -725,162 +657,97 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
presult->nfrozen = 0; /* avoid miscounts in instrumentation */
}
- if (do_prune || do_freeze)
- MarkBufferDirty(buffer);
+ START_CRIT_SECTION();
- /*
- * Emit a WAL XLOG_HEAP2_PRUNE record showing what we did
- */
- if ((do_prune || do_freeze) && RelationNeedsWAL(relation))
+ if (do_hint)
{
- xl_heap_prune xlrec;
- XLogRecPtr recptr;
- xlhp_freeze freeze;
- xlhp_prune_items redirect,
- dead,
- unused;
-
- int nplans = 0;
- xl_heap_freeze_plan plans[MaxHeapTuplesPerPage];
- OffsetNumber frz_offsets[MaxHeapTuplesPerPage];
-
- xlrec.flags = 0;
-
- if (RelationIsAccessibleInLogicalDecoding(relation))
- xlrec.flags |= XLHP_IS_CATALOG_REL;
-
/*
- * The snapshotConflictHorizon for the whole record should be the most
- * conservative of all the horizons calculated for any of the possible
- * modifications. If this record will prune tuples, any transactions
- * on the standby older than the youngest xmax of the most recently
- * removed tuple this record will prune will conflict. If this record
- * will freeze tuples, any transactions on the standby with xids older
- * than the youngest tuple this record will freeze will conflict.
+ * Update the page's pd_prune_xid field to either zero, or the lowest XID
+ * of any soon-prunable tuple.
*/
- if (do_freeze)
- xlrec.snapshotConflictHorizon = Max(prstate.snapshotConflictHorizon,
- presult->frz_conflict_horizon);
- else
- xlrec.snapshotConflictHorizon = prstate.snapshotConflictHorizon;
+ ((PageHeader) page)->pd_prune_xid = prstate.new_prune_xid;
/*
- * Prepare deduplicated representation for use in WAL record
- * Destructively sorts tuples array in-place.
+ * Clear the "page is full" flag if it is set since there's no point
+ * in repeating the prune/defrag process until something else happens
+ * to the page.
*/
- if (do_freeze)
- nplans = heap_log_freeze_plan(frozen,
- presult->nfrozen, plans,
- frz_offsets);
- if (nplans > 0)
- xlrec.flags |= XLHP_HAS_FREEZE_PLANS;
-
- XLogBeginInsert();
- XLogRegisterData((char *) &xlrec, SizeOfHeapPrune);
-
- XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
+ PageClearFull(page);
/*
- * The OffsetNumber arrays are not actually in the buffer, but we
- * pretend that they are. When XLogInsert stores the whole buffer,
- * the offset arrays need not be stored too.
+ * We only needed to update pd_prune_xid and clear the page-is-full
+ * hint bit, this is a non-WAL-logged hint. If we will also freeze or
+ * prune the page, we will mark the buffer dirty below.
*/
- if (nplans > 0)
- {
- freeze = (xlhp_freeze)
- {
- .nplans = nplans
- };
-
- XLogRegisterBufData(0, (char *) &freeze, offsetof(xlhp_freeze, plans));
-
- XLogRegisterBufData(0, (char *) plans,
- sizeof(xl_heap_freeze_plan) * freeze.nplans);
- }
-
-
- if (prstate.nredirected > 0)
- {
- xlrec.flags |= XLHP_HAS_REDIRECTIONS;
-
- redirect = (xlhp_prune_items)
- {
- .ntargets = prstate.nredirected
- };
-
- XLogRegisterBufData(0, (char *) &redirect,
- offsetof(xlhp_prune_items, data));
-
- XLogRegisterBufData(0, (char *) prstate.redirected,
- sizeof(OffsetNumber[2]) * prstate.nredirected);
- }
+ if (!do_freeze && !do_prune)
+ MarkBufferDirtyHint(buffer, true);
+ }
- if (prstate.ndead > 0)
+ if (do_prune || do_freeze)
+ {
+ /*
+ * Apply the planned item changes, then repair page fragmentation, and
+ * update the page's hint bit about whether it has free line pointers.
+ */
+ if (do_prune)
{
- xlrec.flags |= XLHP_HAS_DEAD_ITEMS;
-
- dead = (xlhp_prune_items)
- {
- .ntargets = prstate.ndead
- };
-
- XLogRegisterBufData(0, (char *) &dead,
- offsetof(xlhp_prune_items, data));
-
- XLogRegisterBufData(0, (char *) prstate.nowdead,
- sizeof(OffsetNumber) * dead.ntargets);
+ heap_page_prune_execute(buffer,
+ prstate.redirected, prstate.nredirected,
+ prstate.nowdead, prstate.ndead,
+ prstate.nowunused, prstate.nunused);
}
- if (prstate.nunused > 0)
+ if (do_freeze)
{
- xlrec.flags |= XLHP_HAS_NOW_UNUSED_ITEMS;
-
- unused = (xlhp_prune_items)
+ /*
+ * We can use frz_conflict_horizon as our cutoff for conflicts when
+ * the whole page is eligible to become all-frozen in the VM once
+ * we're done with it. Otherwise we generate a conservative cutoff by
+ * stepping back from OldestXmin.
+ */
+ if (!(presult->all_visible_except_removable && presult->all_frozen))
{
- .ntargets = prstate.nunused
- };
-
- XLogRegisterBufData(0, (char *) &unused,
- offsetof(xlhp_prune_items, data));
-
- XLogRegisterBufData(0, (char *) prstate.nowunused,
- sizeof(OffsetNumber) * unused.ntargets);
+ /* Avoids false conflicts when hot_standby_feedback in use */
+ presult->frz_conflict_horizon = pagefrz->cutoffs->OldestXmin;
+ TransactionIdRetreat(presult->frz_conflict_horizon);
+ }
+ heap_freeze_prepared_tuples(buffer, prstate.frozen, presult->nfrozen);
}
- if (nplans > 0)
- XLogRegisterBufData(0, (char *) frz_offsets,
- sizeof(OffsetNumber) * presult->nfrozen);
-
- recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_PRUNE);
+ MarkBufferDirty(buffer);
- PageSetLSN(BufferGetPage(buffer), recptr);
+ /*
+ * Emit a WAL XLOG_HEAP2_PRUNE record showing what we did
+ */
+ if (RelationNeedsWAL(relation))
+ log_heap_prune_and_freeze(relation, buffer, &prstate, presult);
}
END_CRIT_SECTION();
- /* Caller won't update new_relfrozenxid and new_relminmxid */
- if (!pagefrz)
- return;
-
/*
+ * Let caller know how it can update relfrozenxid and relminmxid
+ *
* If we will freeze tuples on the page or, even if we don't freeze tuples
* on the page, if we will set the page all-frozen in the visibility map,
* we can advance relfrozenxid and relminmxid to the values in
* pagefrz->FreezePageRelfrozenXid and pagefrz->FreezePageRelminMxid.
*/
- if (presult->all_frozen || presult->nfrozen > 0)
+ if (pagefrz)
{
- presult->new_relfrozenxid = pagefrz->FreezePageRelfrozenXid;
- presult->new_relminmxid = pagefrz->FreezePageRelminMxid;
- }
- else
- {
- presult->new_relfrozenxid = pagefrz->NoFreezePageRelfrozenXid;
- presult->new_relminmxid = pagefrz->NoFreezePageRelminMxid;
+ if (presult->all_frozen || presult->nfrozen > 0)
+ {
+ presult->new_relfrozenxid = pagefrz->FreezePageRelfrozenXid;
+ presult->new_relminmxid = pagefrz->FreezePageRelminMxid;
+ }
+ else
+ {
+ presult->new_relfrozenxid = pagefrz->NoFreezePageRelfrozenXid;
+ presult->new_relminmxid = pagefrz->NoFreezePageRelminMxid;
+ }
}
}
-
/*
* Perform visibility checks for heap pruning.
*/
@@ -1634,3 +1501,131 @@ heap_get_root_tuples(Page page, OffsetNumber *root_offsets)
}
}
}
+
+
+static void
+log_heap_prune_and_freeze(Relation relation, Buffer buffer, PruneState *prstate, PruneFreezeResult *presult)
+{
+ xl_heap_prune xlrec;
+ XLogRecPtr recptr;
+ xlhp_freeze freeze;
+ xlhp_prune_items redirect,
+ dead,
+ unused;
+
+ int nplans = 0;
+ xl_heap_freeze_plan plans[MaxHeapTuplesPerPage];
+ OffsetNumber frz_offsets[MaxHeapTuplesPerPage];
+ bool do_freeze = (presult->nfrozen > 0);
+
+ xlrec.flags = 0;
+
+ if (RelationIsAccessibleInLogicalDecoding(relation))
+ xlrec.flags |= XLHP_IS_CATALOG_REL;
+
+ /*
+ * The snapshotConflictHorizon for the whole record should be the most
+ * conservative of all the horizons calculated for any of the possible
+ * modifications. If this record will prune tuples, any transactions
+ * on the standby older than the youngest xmax of the most recently
+ * removed tuple this record will prune will conflict. If this record
+ * will freeze tuples, any transactions on the standby with xids older
+ * than the youngest tuple this record will freeze will conflict.
+ */
+ if (do_freeze)
+ xlrec.snapshotConflictHorizon = Max(prstate->snapshotConflictHorizon,
+ presult->frz_conflict_horizon);
+ else
+ xlrec.snapshotConflictHorizon = prstate->snapshotConflictHorizon;
+
+ /*
+ * Prepare deduplicated representation for use in WAL record
+ * Destructively sorts tuples array in-place.
+ */
+ if (do_freeze)
+ nplans = heap_log_freeze_plan(prstate->frozen,
+ presult->nfrozen, plans,
+ frz_offsets);
+ if (nplans > 0)
+ xlrec.flags |= XLHP_HAS_FREEZE_PLANS;
+
+ XLogBeginInsert();
+ XLogRegisterData((char *) &xlrec, SizeOfHeapPrune);
+
+ XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
+
+ /*
+ * The OffsetNumber arrays are not actually in the buffer, but we
+ * pretend that they are. When XLogInsert stores the whole buffer,
+ * the offset arrays need not be stored too.
+ */
+ if (nplans > 0)
+ {
+ freeze = (xlhp_freeze)
+ {
+ .nplans = nplans
+ };
+
+ XLogRegisterBufData(0, (char *) &freeze, offsetof(xlhp_freeze, plans));
+
+ XLogRegisterBufData(0, (char *) plans,
+ sizeof(xl_heap_freeze_plan) * freeze.nplans);
+ }
+
+
+ if (prstate->nredirected > 0)
+ {
+ xlrec.flags |= XLHP_HAS_REDIRECTIONS;
+
+ redirect = (xlhp_prune_items)
+ {
+ .ntargets = prstate->nredirected
+ };
+
+ XLogRegisterBufData(0, (char *) &redirect,
+ offsetof(xlhp_prune_items, data));
+
+ XLogRegisterBufData(0, (char *) prstate->redirected,
+ sizeof(OffsetNumber[2]) * prstate->nredirected);
+ }
+
+ if (prstate->ndead > 0)
+ {
+ xlrec.flags |= XLHP_HAS_DEAD_ITEMS;
+
+ dead = (xlhp_prune_items)
+ {
+ .ntargets = prstate->ndead
+ };
+
+ XLogRegisterBufData(0, (char *) &dead,
+ offsetof(xlhp_prune_items, data));
+
+ XLogRegisterBufData(0, (char *) prstate->nowdead,
+ sizeof(OffsetNumber) * dead.ntargets);
+ }
+
+ if (prstate->nunused > 0)
+ {
+ xlrec.flags |= XLHP_HAS_NOW_UNUSED_ITEMS;
+
+ unused = (xlhp_prune_items)
+ {
+ .ntargets = prstate->nunused
+ };
+
+ XLogRegisterBufData(0, (char *) &unused,
+ offsetof(xlhp_prune_items, data));
+
+ XLogRegisterBufData(0, (char *) prstate->nowunused,
+ sizeof(OffsetNumber) * unused.ntargets);
+ }
+
+ if (nplans > 0)
+ XLogRegisterBufData(0, (char *) frz_offsets,
+ sizeof(OffsetNumber) * presult->nfrozen);
+
+ recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_PRUNE);
+
+ PageSetLSN(BufferGetPage(buffer), recptr);
+}
--
2.39.2
On Mon, Mar 18, 2024 at 01:15:21AM +0200, Heikki Linnakangas wrote:
On 15/03/2024 02:56, Melanie Plageman wrote:
Okay, so I was going to start using xl_heap_prune for vacuum here too,
but I realized it would be bigger because of the
snapshotConflictHorizon. Do you think there is a non-terrible way to
make the snapshotConflictHorizon optional? Like with a flag?
Yeah, another flag would do the trick.
Okay, I've done this in attached v4 (including removing
XLOG_HEAP2_VACUUM). I had to put the snapshot conflict horizon in the
"main chunk" of data available at replay regardless of whether or not
the record ended up including an FPI.
I made it its own sub-record (xlhp_conflict_horizon) less to help with
alignment (though we can use all the help we can get there) and more to
keep it from getting lost. When you look at heapam_xlog.h, you can see
what a XLOG_HEAP2_PRUNE record will contain starting with the
xl_heap_prune struct and then all the sub-record types.
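As a rough picture of how that reads on disk (a sketch only; the
XLHP_HAS_CONFLICT_HORIZON flag name below is a guess, and the attached
heapam_xlog.h changes are authoritative):

/*----------
 * XLOG_HEAP2_PRUNE, roughly:
 *
 * main data:
 *     xl_heap_prune           flags
 *     xlhp_conflict_horizon   snapshot conflict horizon, only if the
 *                             (assumed) XLHP_HAS_CONFLICT_HORIZON flag is set
 *
 * block reference 0 data, in flag order:
 *     xlhp_freeze             nplans + freeze plans   (XLHP_HAS_FREEZE_PLANS)
 *     xlhp_prune_items        2 * nredirected offsets (XLHP_HAS_REDIRECTIONS)
 *     xlhp_prune_items        ndead offsets           (XLHP_HAS_DEAD_ITEMS)
 *     xlhp_prune_items        nunused offsets         (XLHP_HAS_NOW_UNUSED_ITEMS)
 *     OffsetNumber            frz_offsets[]           (if there are freeze plans)
 *----------
 */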
I introduced a few sub-record types similar to what you suggested --
they help a bit with alignment, so I think they are worth keeping. There
are comments around them, but perhaps a larger diagram of the layout of
the new XLOG_HEAP2_PRUNE record would be helpful.
I started doing this, but I can't find a way of laying out the diagram
that pgindent doesn't destroy. I thought I remember other diagrams in
the source code showing the layout of something (something with pages
somewhere?) that don't get messed up by pgindent, but I couldn't find
them.
See src/tools/pgindent/README, section "Cleaning up in case of failure or
ugly output":

/*----------
* Text here will not be touched by pgindent.
*/
Cool. This version doesn't include the spiffy drawing I promised yet.
Note that we don't provide WAL compatibility across major versions. You can
fully remove the old xl_heap_freeze_page format. (We should bump
XLOG_PAGE_MAGIC when this is committed though)
I've removed the xl_heap_freeze (and xl_heap_prune). I didn't bump
XLOG_PAGE_MAGIC.
On the point of removing the freeze-only code path from
heap_page_prune() (now heap_page_prune_and_freeze()): while doing this,
I realized that heap_pre_freeze_checks() was not being called in the
case that we decide to freeze because we emitted an FPI while setting
the hint bit. I've fixed that, however, I've done so by moving
heap_pre_freeze_checks() into the critical section. I think that is not
okay? I could move it earlier and not call it when the hint bit FPI
leads us to freeze tuples. But, I think that would lead to us doing a
lot less validation of tuples being frozen when checksums are enabled.
Or, I could make two critical sections?
I found another approach and just do the pre-freeze checks if we are
considering freezing except for the FPI criteria.
Hmm, I think you can make this simpler if you use
XLogCheckBufferNeedsBackup() to check if the hint-bit update will generate a
full-page-image before entering the critical section. Like you did to check
if pruning will generate a full-page-image. I included that change in the
attached patches.
I used your proposed structure. You had XLogCheckBufferNeedsBackup()
twice in your proposed version a few lines apart. I don't think there is
any point in checking it twice. If we are going to rely on
XLogCheckBufferNeedsBackup() to tell us whether or not setting the hint
bit is *likely* to emit an FPI, then we might as well just call
XLogCheckBufferNeedsBackup() once.
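In outline, the single-check shape is something like this (a simplified
sketch, not the exact code in the attached v4 patches):

	/* Will this prune/hint-bit update force a full-page image anyway? */
	will_emit_fpi = hint_bit_fpi ||
		((do_prune || do_hint) && XLogCheckBufferNeedsBackup(buffer));

	/* Opportunistically freeze only when an FPI is coming regardless */
	do_freeze = pagefrz != NULL &&
		(pagefrz->freeze_required ||
		 (whole_page_freezable && presult->nfrozen > 0 && will_emit_fpi));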
I don't fully understand this:
/*
* If we will freeze tuples on the page or they were all already frozen
* on the page, if we will set the page all-frozen in the visibility map,
* we can advance relfrozenxid and relminmxid to the values in
* pagefrz->FreezePageRelfrozenXid and pagefrz->FreezePageRelminMxid.
*/
if (presult->all_frozen || presult->nfrozen > 0)
{
presult->new_relfrozenxid = pagefrz->FreezePageRelfrozenXid;
presult->new_relminmxid = pagefrz->FreezePageRelminMxid;
}
else
{
presult->new_relfrozenxid = pagefrz->NoFreezePageRelfrozenXid;
presult->new_relminmxid = pagefrz->NoFreezePageRelminMxid;
}Firstly, the comment is outdated, because we have already done the freezing
at this point. But more importantly, I don't understand the logic here.
Could we check just presult->nfrozen > 0 here and get the same result?
I've updated the comment. The reason I had
if (presult->all_frozen || presult->nfrozen > 0) was because of this
comment in heapam.h in the HeapPageFreeze struct
* "Freeze" NewRelfrozenXid/NewRelminMxid trackers.
*
* Trackers used when heap_freeze_execute_prepared freezes, or when there
* are zero freeze plans for a page. It is always valid for vacuumlazy.c
* to freeze any page, by definition. This even includes pages that have
* no tuples with storage to consider in the first place. That way the
* 'totally_frozen' results from heap_prepare_freeze_tuple can always be
* used in the same way, even when no freeze plans need to be executed to
* "freeze the page". Only the "freeze" path needs to consider the need
* to set pages all-frozen in the visibility map under this scheme.
*
* When we freeze a page, we generally freeze all XIDs < OldestXmin, only
* leaving behind XIDs that are ineligible for freezing, if any. And so
* you might wonder why these trackers are necessary at all; why should
* _any_ page that VACUUM freezes _ever_ be left with XIDs/MXIDs that
* ratchet back the top-level NewRelfrozenXid/NewRelminMxid trackers?
*
* It is useful to use a definition of "freeze the page" that does not
* overspecify how MultiXacts are affected. heap_prepare_freeze_tuple
* generally prefers to remove Multis eagerly, but lazy processing is used
* in cases where laziness allows VACUUM to avoid allocating a new Multi.
* The "freeze the page" trackers enable this flexibility.
*/
So, I don't really know if it is right to just check presult->nfrozen >
0 when updating relminmxid. I have changed it to the way you suggested.
But we can change it back.
Here are some patches on top of your patches for some further refactorings.
Some notable changes in heap_page_prune_and_freeze():
- I moved the heap_prepare_freeze_tuple() work from the 2nd loop to the 1st
one. It seems more clear and efficient that way.
cool. I kept this.
- I extracted the code to generate the WAL record to a separate function.
cool. kept this.
These patches are in a very rough WIP state, but I wanted to share early. I
haven't done much testing, and I'm not wedded to these changes, but I think
they make it more readable. Please include / squash in the patch set if you
agree with them.
I've squashed the changes into and across my nineteen patches :)
I cleaned up your suggestions a bit and made a few stylistic choices.
In this version, I also broke up the last couple commits which
streamlined the WAL record and eliminated XLOG_HEAP2_FREEZE/VACUUM and
redistributed those changes in a way that I thought made sense.
Now, the progression is that in one commit we merge the prune and freeze
record, eliminating the XLOG_HEAP2_FREEZE record. Then, in another
commit, we eliminate the XLOG_HEAP2_VACUUM record. Then a later commit
streamlines the new mega xl_heap_prune struct into the variable size
structure based on which modifications it includes.
From 622620a7875ae8c1626e9cd118156e0c734d44ed Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Sun, 17 Mar 2024 22:52:28 +0200
Subject: [PATCH v3heikki 1/4] Inline heap_frz_conflict_horizon() to the
caller.

FIXME: This frz_conflict_horizon business looks fishy to me. We have:
- local frz_conflict_horizon variable,
- presult->frz_conflict_horizon, and
- prstate.snapshotConflictHorizon
Yea, this is a mistake I made when I was rebasing some changes in. The
local variable is gone now.
should we really have all three, and what are the differences?
We do need both the prstate.snapshotConflictHorizon and the
presult->frz_conflict_horizon because the youngest freezable xmin will
often be different than the oldest removable xmax, so we have to track
both and take the most conservative one if we are both freezing and
pruning.
The third (local variable) one was an oops.
From 0219842487931f899abcf183c863c43270c098f0 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Sun, 17 Mar 2024 23:05:31 +0200
Subject: [PATCH v3heikki 2/4] Misc cleanup
---
src/backend/access/heap/pruneheap.c | 16 +++++++---------
src/backend/access/heap/vacuumlazy.c | 7 ++++++-
2 files changed, 13 insertions(+), 10 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8d3723faf3a..d3df7029667 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -433,7 +433,9 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 * heap_page_prune_and_freeze(). We expect vistest will always make
 * heap_page_prune_and_freeze() remove any deleted tuple whose xmax is <
 * OldestXmin. lazy_scan_prune must never become confused about whether a
- * tuple should be frozen or removed. (In the future we might want to
+ * tuple should be frozen or removed.
+ * HEIKKI: Is such confusion possible anymore?
+ * (In the future we might want to
 * teach lazy_scan_prune to recompute vistest from time to time, to
 * increase the number of dead tuples it can prune away.)
TBH, I don't really know what this comment is even saying. But
lazy_scan_prune() doesn't do any freezing anymore, so I removed this
sentence.
*/
@@ -1387,6 +1389,9 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 * so we could deal with tuples that were DEAD to VACUUM, but nevertheless were
 * left with storage after pruning.
 *
+ * HEIKKI: does the above paragraph still make sense? We don't call
+ * HeapTupleSatisfiesVacuum() here anymore
+ *
Yea this whole comment definitely doesn't belong here anymore. Even
though we are calling HeapTupleSatisfiesVacuum() (from inside
heap_prune_satisfies_vacuum()) inside heap_page_prune_and_freeze(), the
comment really doesn't fit anywhere in there either. The comment is
describing a situation that is no longer possible. So describing a
situation that is no longer possible in a part of the code that it never
could have been possible does not make sense to me. I've removed the
comment.
* As of Postgres 17, we circumvent this problem altogether by reusing the
* result of heap_page_prune_and_freeze()'s visibility check. Without the
* second call to HeapTupleSatisfiesVacuum(), there is no new HTSV_Result and
--
2.39.2
From d72cebf13f9866112309883f72a382fc2cb57e17 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Sun, 17 Mar 2024 23:08:42 +0200
Subject: [PATCH v3heikki 3/4] Move work to the first loop
src/backend/access/heap/pruneheap.c | 141 ++++++++++------------------
1 file changed, 52 insertions(+), 89 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index b3573bb628b..3541628799a 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -78,9 +78,6 @@ static int heap_prune_chain(Buffer buffer,
 PruneState *prstate, PruneFreezeResult *presult);
static inline HTSV_Result htsv_get_valid_status(int status);
-static void prune_prepare_freeze_tuple(Page page, OffsetNumber offnum, PruneState *prstate,
- HeapPageFreeze *pagefrz, HeapTupleFreeze *frozen,
- PruneFreezeResult *presult);
static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
static void heap_prune_record_redirect(PruneState *prstate,
 OffsetNumber offnum, OffsetNumber rdoffnum,
@@ -322,6 +319,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 /* for recovery conflicts */
 presult->frz_conflict_horizon = InvalidTransactionId;
+ /*
+ * We will update the VM after pruning, collecting LP_DEAD items, and
+ * freezing tuples. Keep track of whether or not the page is all_visible
+ * and all_frozen and use this information to update the VM. all_visible
+ * implies lpdead_items == 0, but don't trust all_frozen result unless
+ * all_visible is also set to true.
+ */
+ /* HEIKKI: the caller updates the VM? not us */
I've updated this comment.
Other questions and notes on v4:
xl_heap_prune->flags is a uint8, but we are already using 7 of the bits.
Should we make it a uint16?
Eventually, I would like to avoid emitting a separate XLOG_HEAP2_VISIBLE
record for vacuum's first and second passes and just include the VM
update flags in the xl_heap_prune record. xl_heap_visible->flags is a
uint8. If we made xl_heap_prune->flags uint16, we could probably combine
them (though maybe we want other bits available). Also vacuum's second
pass doesn't set a snapshotConflictHorizon, so if we combined
xl_heap_visible and xl_heap_prune for vacuum we would end up saving even
more space (since vacuum sets xl_heap_visible->snapshotConflictHorizon
to InvalidTransactionId).
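To make the widened-flags idea concrete, the extra room could be used roughly
like this (purely hypothetical names and bit positions, not from any attached
patch):

/* Hypothetical: upper byte of a uint16 flags field folds in the VM update */
#define XLHP_VM_ALL_VISIBLE		0x0100	/* set the all-visible bit in the VM */
#define XLHP_VM_ALL_FROZEN		0x0200	/* set the all-frozen bit in the VM */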
A note on sub-record naming: I kept xl_heap_freeze_plan's name but
prefixed the other sub-records with xlhp. Do you think it is worth
renaming it (to xlhp_freeze_plan)? Also, should I change xlhp_freeze to
xlhp_freeze_page?
- Melanie
Attachments:
v4-0001-Reorganize-heap_page_prune-function-comment.patch (text/x-diff)
From d1b1ce71c88a04e3c5a2480827bd3af3ffbe1372 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 18 Mar 2024 19:01:17 -0400
Subject: [PATCH v4 01/19] Reorganize heap_page_prune() function comment
heap_page_prune()'s function header comment didn't explain the
parameters in the same order they appear in the function. Fix that.
While we are at it, move some parts of the initial function body
comments around so they are in more relevant locations.
---
src/backend/access/heap/pruneheap.c | 30 ++++++++++++++---------------
1 file changed, 15 insertions(+), 15 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 4f12413b8b1..b5895406ec2 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -193,22 +193,26 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
*
* Caller must have pin and buffer cleanup lock on the page. Note that we
* don't update the FSM information for page on caller's behalf. Caller might
- * also need to account for a reduction in the length of the line pointer
- * array following array truncation by us.
+ * also need to account for a reduction in the length of the line pointer array
+ * following array truncation by us.
*
- * vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
- * (see heap_prune_satisfies_vacuum and
- * HeapTupleSatisfiesVacuum).
+ * Our strategy is to scan the page and make lists of items to change, then
+ * apply the changes within a critical section. This keeps as much logic as
+ * possible out of the critical section, and also ensures that WAL replay will
+ * work the same as the normal case.
*
- * mark_unused_now indicates whether or not dead items can be set LP_UNUSED during
- * pruning.
+ * vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD (see
+ * heap_prune_satisfies_vacuum and HeapTupleSatisfiesVacuum).
*
- * off_loc is the offset location required by the caller to use in error
- * callback.
+ * mark_unused_now indicates whether or not dead items can be set LP_UNUSED
+ * during pruning.
*
* presult contains output parameters needed by callers such as the number of
* tuples removed and the number of line pointers newly marked LP_DEAD.
* heap_page_prune() is responsible for initializing it.
+ *
+ * off_loc is the offset location required by the caller to use in error
+ * callback.
*/
void
heap_page_prune(Relation relation, Buffer buffer,
@@ -225,11 +229,6 @@ heap_page_prune(Relation relation, Buffer buffer,
HeapTupleData tup;
/*
- * Our strategy is to scan the page and make lists of items to change,
- * then apply the changes within a critical section. This keeps as much
- * logic as possible out of the critical section, and also ensures that
- * WAL replay will work the same as the normal case.
- *
* First, initialize the new pd_prune_xid value to zero (indicating no
* prunable tuples). If we find any tuples which may soon become
* prunable, we will save the lowest relevant XID in new_prune_xid. Also
@@ -241,12 +240,13 @@ heap_page_prune(Relation relation, Buffer buffer,
prstate.mark_unused_now = mark_unused_now;
prstate.snapshotConflictHorizon = InvalidTransactionId;
prstate.nredirected = prstate.ndead = prstate.nunused = 0;
- memset(prstate.marked, 0, sizeof(prstate.marked));
/*
* presult->htsv is not initialized here because all ntuple spots in the
* array will be set either to a valid HTSV_Result value or -1.
*/
+ memset(prstate.marked, 0, sizeof(prstate.marked));
+
presult->ndeleted = 0;
presult->nnewlpdead = 0;
--
2.40.1
v4-0002-Remove-unused-PruneState-member-rel.patch (text/x-diff; charset=us-ascii)
From f7c3a7680e144e55f8c1c3eccaa5a7a1af47a47e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 18 Mar 2024 18:59:09 -0400
Subject: [PATCH v4 02/19] Remove unused PruneState member rel
PruneState->rel is no longer being used, so just remove it.
---
src/backend/access/heap/pruneheap.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index b5895406ec2..08cb2a6f533 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -29,8 +29,6 @@
/* Working data for heap_page_prune and subroutines */
typedef struct
{
- Relation rel;
-
/* tuple visibility test, initialized for the relation */
GlobalVisState *vistest;
/* whether or not dead items can be set LP_UNUSED during pruning */
@@ -235,7 +233,6 @@ heap_page_prune(Relation relation, Buffer buffer,
* initialize the rest of our working state.
*/
prstate.new_prune_xid = InvalidTransactionId;
- prstate.rel = relation;
prstate.vistest = vistest;
prstate.mark_unused_now = mark_unused_now;
prstate.snapshotConflictHorizon = InvalidTransactionId;
@@ -251,7 +248,7 @@ heap_page_prune(Relation relation, Buffer buffer,
presult->nnewlpdead = 0;
maxoff = PageGetMaxOffsetNumber(page);
- tup.t_tableOid = RelationGetRelid(prstate.rel);
+ tup.t_tableOid = RelationGetRelid(relation);
/*
* Determine HTSV for all tuples.
--
2.40.1
v4-0003-lazy_scan_prune-tests-tuple-vis-with-GlobalVisTes.patch (text/x-diff; charset=us-ascii)
From 416650477aa7a2da4db57070fcb2c3524093bf09 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 6 Jan 2024 13:14:47 -0500
Subject: [PATCH v4 03/19] lazy_scan_prune tests tuple vis with GlobalVisTest
One requirement for eventually combining the prune and freeze records
is that we must check during pruning whether live tuples on the page are
visible to everyone and thus whether or not the page is all visible. We
only consider opportunistically freezing tuples if the whole page is all
visible and could be set all frozen.
During pruning (in heap_page_prune()), we do not have access to
VacuumCutoffs -- as on-access pruning also calls heap_page_prune(). We
do, however, have access to a GlobalVisState. This can be used to
determine whether or not the tuple is visible to everyone. It also has
the potential of being more up-to-date than VacuumCutoffs->OldestXmin.
This commit simply modifies lazy_scan_prune() to use GlobalVisState
instead of OldestXmin. Future commits will move the
all_visible/all_frozen calculation into heap_page_prune().
---
src/backend/access/heap/vacuumlazy.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 18004907750..3a991f0ea71 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1579,11 +1579,15 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* The inserter definitely committed. But is it old enough
- * that everyone sees it as committed?
+ * that everyone sees it as committed? A
+ * FrozenTransactionId is seen as committed to everyone.
+ * Otherwise, we check if there is a snapshot that
+ * considers this xid to still be running, and if so, we
+ * don't consider the page all-visible.
*/
xmin = HeapTupleHeaderGetXmin(htup);
- if (!TransactionIdPrecedes(xmin,
- vacrel->cutoffs.OldestXmin))
+ if (xmin != FrozenTransactionId &&
+ !GlobalVisTestIsRemovableXid(vacrel->vistest, xmin))
{
all_visible = false;
break;
--
2.40.1
v4-0004-Pass-heap_prune_chain-PruneResult-output-paramete.patch (text/x-diff; charset=us-ascii)
From 94c0256a9c7e11001fcf4dec8998943e0b326f50 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 6 Jan 2024 13:39:59 -0500
Subject: [PATCH v4 04/19] Pass heap_prune_chain() PruneResult output parameter
Future commits will set other members of PruneResult in
heap_prune_chain(), so start passing it as an output parameter now. This
eliminates the output parameter htsv -- the array of HTSV_Results --
since that is a member of the PruneResult.
---
src/backend/access/heap/pruneheap.c | 15 +++++++--------
1 file changed, 7 insertions(+), 8 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 08cb2a6f533..7eb21b603ba 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -59,8 +59,7 @@ static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
Buffer buffer);
static int heap_prune_chain(Buffer buffer,
OffsetNumber rootoffnum,
- int8 *htsv,
- PruneState *prstate);
+ PruneState *prstate, PruneResult *presult);
static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
static void heap_prune_record_redirect(PruneState *prstate,
OffsetNumber offnum, OffsetNumber rdoffnum);
@@ -322,7 +321,7 @@ heap_page_prune(Relation relation, Buffer buffer,
/* Process this item or chain of items */
presult->ndeleted += heap_prune_chain(buffer, offnum,
- presult->htsv, &prstate);
+ &prstate, presult);
}
/* Clear the offset information once we have processed the given page. */
@@ -451,7 +450,7 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
/*
* Prune specified line pointer or a HOT chain originating at line pointer.
*
- * Tuple visibility information is provided in htsv.
+ * Tuple visibility information is provided in presult->htsv.
*
* If the item is an index-referenced tuple (i.e. not a heap-only tuple),
* the HOT chain is pruned by removing all DEAD tuples at the start of the HOT
@@ -481,7 +480,7 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
*/
static int
heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
- int8 *htsv, PruneState *prstate)
+ PruneState *prstate, PruneResult *presult)
{
int ndeleted = 0;
Page dp = (Page) BufferGetPage(buffer);
@@ -502,7 +501,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
*/
if (ItemIdIsNormal(rootlp))
{
- Assert(htsv[rootoffnum] != -1);
+ Assert(presult->htsv[rootoffnum] != -1);
htup = (HeapTupleHeader) PageGetItem(dp, rootlp);
if (HeapTupleHeaderIsHeapOnly(htup))
@@ -525,7 +524,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
* either here or while following a chain below. Whichever path
* gets there first will mark the tuple unused.
*/
- if (htsv[rootoffnum] == HEAPTUPLE_DEAD &&
+ if (presult->htsv[rootoffnum] == HEAPTUPLE_DEAD &&
!HeapTupleHeaderIsHotUpdated(htup))
{
heap_prune_record_unused(prstate, rootoffnum);
@@ -622,7 +621,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
*/
tupdead = recent_dead = false;
- switch (htsv_get_valid_status(htsv[offnum]))
+ switch (htsv_get_valid_status(presult->htsv[offnum]))
{
case HEAPTUPLE_DEAD:
tupdead = true;
--
2.40.1
v4-0005-heap_page_prune-sets-all_visible-and-frz_conflict.patch (text/x-diff; charset=us-ascii)
From caecf591376392a5518cb42923542f380e1c327c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 6 Jan 2024 14:01:37 -0500
Subject: [PATCH v4 05/19] heap_page_prune sets all_visible and
frz_conflict_horizon
In order to combine the prune and freeze records, we must know if the
page is eligible to be opportunistically frozen before finishing
pruning. Save all_visible in the PruneResult and set it to false when we
see non-removable tuples which are not visible to everyone.
We will also need to ensure that the snapshotConflictHorizon for the combined
prune + freeze record is the more conservative of the values calculated for
pruning and for freezing. Calculate the visibility_cutoff_xid for the purposes of
freezing -- the newest xmin on the page -- in heap_page_prune() and save it in
PruneResult.frz_conflict_horizon.
---
src/backend/access/heap/pruneheap.c | 127 +++++++++++++++++++++++++--
src/backend/access/heap/vacuumlazy.c | 121 ++++++-------------------
src/include/access/heapam.h | 3 +
3 files changed, 151 insertions(+), 100 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 7eb21b603ba..bd30296ef1a 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -63,8 +63,10 @@ static int heap_prune_chain(Buffer buffer,
static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
static void heap_prune_record_redirect(PruneState *prstate,
OffsetNumber offnum, OffsetNumber rdoffnum);
-static void heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum);
-static void heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum);
+static void heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
+ PruneResult *presult);
+static void heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
+ PruneResult *presult);
static void heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum);
static void page_verify_redirects(Page page);
@@ -246,6 +248,14 @@ heap_page_prune(Relation relation, Buffer buffer,
presult->ndeleted = 0;
presult->nnewlpdead = 0;
+ /*
+ * Keep track of whether or not the page is all_visible in case the caller
+ * wants to use this information to update the VM.
+ */
+ presult->all_visible = true;
+ /* for recovery conflicts */
+ presult->frz_conflict_horizon = InvalidTransactionId;
+
maxoff = PageGetMaxOffsetNumber(page);
tup.t_tableOid = RelationGetRelid(relation);
@@ -297,8 +307,97 @@ heap_page_prune(Relation relation, Buffer buffer,
presult->htsv[offnum] = heap_prune_satisfies_vacuum(&prstate, &tup,
buffer);
+ switch (presult->htsv[offnum])
+ {
+ case HEAPTUPLE_DEAD:
+
+ /*
+ * Deliberately delay unsetting all_visible until later during
+ * pruning. Removable dead tuples shouldn't preclude freezing
+ * the page. After finishing this first pass of tuple
+ * visibility checks, initialize all_visible_except_removable
+ * with the current value of all_visible to indicate whether
+ * or not the page is all visible except for dead tuples. This
+ * will allow us to attempt to freeze the page after pruning.
+ * Later during pruning, if we encounter an LP_DEAD item or
+ * are setting an item LP_DEAD, we will unset all_visible. As
+ * long as we unset it before updating the visibility map,
+ * this will be correct.
+ */
+ break;
+ case HEAPTUPLE_LIVE:
+
+ /*
+ * Is the tuple definitely visible to all transactions?
+ *
+ * NB: Like with per-tuple hint bits, we can't set the
+ * PD_ALL_VISIBLE flag if the inserter committed
+ * asynchronously. See SetHintBits for more info. Check that
+ * the tuple is hinted xmin-committed because of that.
+ */
+ if (presult->all_visible)
+ {
+ TransactionId xmin;
+
+ if (!HeapTupleHeaderXminCommitted(htup))
+ {
+ presult->all_visible = false;
+ break;
+ }
+
+ /*
+ * The inserter definitely committed. But is it old enough
+ * that everyone sees it as committed? A
+ * FrozenTransactionId is seen as committed to everyone.
+ * Otherwise, we check if there is a snapshot that
+ * considers this xid to still be running, and if so, we
+ * don't consider the page all-visible.
+ */
+ xmin = HeapTupleHeaderGetXmin(htup);
+ if (xmin != FrozenTransactionId &&
+ !GlobalVisTestIsRemovableXid(vistest, xmin))
+ {
+ presult->all_visible = false;
+ break;
+ }
+
+ /* Track newest xmin on page. */
+ if (TransactionIdFollows(xmin, presult->frz_conflict_horizon) &&
+ TransactionIdIsNormal(xmin))
+ presult->frz_conflict_horizon = xmin;
+ }
+ break;
+ case HEAPTUPLE_RECENTLY_DEAD:
+ presult->all_visible = false;
+ break;
+ case HEAPTUPLE_INSERT_IN_PROGRESS:
+ presult->all_visible = false;
+ break;
+ case HEAPTUPLE_DELETE_IN_PROGRESS:
+ /* This is an expected case during concurrent vacuum */
+ presult->all_visible = false;
+ break;
+ default:
+ elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
+ break;
+ }
}
+ /*
+ * For vacuum, if the whole page will become frozen, we consider
+ * opportunistically freezing tuples. Dead tuples which will be removed by
+ * the end of vacuuming should not preclude us from opportunistically
+ * freezing. We will not be able to freeze the whole page if there are
+ * tuples present which are not visible to everyone or if there are dead
+ * tuples which are not yet removable. We need all_visible to be false if
+ * LP_DEAD tuples remain after pruning so that we do not incorrectly
+ * update the visibility map or page hint bit. So, we will update
+ * presult->all_visible to reflect the presence of LP_DEAD items while
+ * pruning and keep all_visible_except_removable to permit freezing if the
+ * whole page will eventually become all visible after removing tuples.
+ */
+ presult->all_visible_except_removable = presult->all_visible;
+
/* Scan the page */
for (offnum = FirstOffsetNumber;
offnum <= maxoff;
@@ -593,10 +692,14 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
/*
* If the caller set mark_unused_now true, we can set dead line
* pointers LP_UNUSED now. We don't increment ndeleted here since
- * the LP was already marked dead.
+ * the LP was already marked dead. If it will not be marked
+ * LP_UNUSED, it will remain LP_DEAD, making the page not
+ * all_visible.
*/
if (unlikely(prstate->mark_unused_now))
heap_prune_record_unused(prstate, offnum);
+ else
+ presult->all_visible = false;
break;
}
@@ -733,7 +836,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
* redirect the root to the correct chain member.
*/
if (i >= nchain)
- heap_prune_record_dead_or_unused(prstate, rootoffnum);
+ heap_prune_record_dead_or_unused(prstate, rootoffnum, presult);
else
heap_prune_record_redirect(prstate, rootoffnum, chainitems[i]);
}
@@ -746,7 +849,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
* redirect item. We can clean up by setting the redirect item to
* DEAD state or LP_UNUSED if the caller indicated.
*/
- heap_prune_record_dead_or_unused(prstate, rootoffnum);
+ heap_prune_record_dead_or_unused(prstate, rootoffnum, presult);
}
return ndeleted;
@@ -783,13 +886,20 @@ heap_prune_record_redirect(PruneState *prstate,
/* Record line pointer to be marked dead */
static void
-heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum)
+heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
+ PruneResult *presult)
{
Assert(prstate->ndead < MaxHeapTuplesPerPage);
prstate->nowdead[prstate->ndead] = offnum;
prstate->ndead++;
Assert(!prstate->marked[offnum]);
prstate->marked[offnum] = true;
+
+ /*
+ * Setting the line pointer LP_DEAD means the page will definitely not be
+ * all_visible.
+ */
+ presult->all_visible = false;
}
/*
@@ -799,7 +909,8 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum)
* pointers LP_DEAD if mark_unused_now is true.
*/
static void
-heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum)
+heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
+ PruneResult *presult)
{
/*
* If the caller set mark_unused_now to true, we can remove dead tuples
@@ -810,7 +921,7 @@ heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum)
if (unlikely(prstate->mark_unused_now))
heap_prune_record_unused(prstate, offnum);
else
- heap_prune_record_dead(prstate, offnum);
+ heap_prune_record_dead(prstate, offnum, presult);
}
/* Record line pointer to be marked unused */
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 3a991f0ea71..f9892f4cd08 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1422,9 +1422,7 @@ lazy_scan_prune(LVRelState *vacrel,
recently_dead_tuples;
HeapPageFreeze pagefrz;
bool hastup = false;
- bool all_visible,
- all_frozen;
- TransactionId visibility_cutoff_xid;
+ bool all_frozen;
int64 fpi_before = pgWalUsage.wal_fpi;
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
@@ -1465,17 +1463,16 @@ lazy_scan_prune(LVRelState *vacrel,
&presult, &vacrel->offnum);
/*
- * We will update the VM after collecting LP_DEAD items and freezing
- * tuples. Keep track of whether or not the page is all_visible and
- * all_frozen and use this information to update the VM. all_visible
- * implies 0 lpdead_items, but don't trust all_frozen result unless
- * all_visible is also set to true.
+ * Now scan the page to collect LP_DEAD items and check for tuples
+ * requiring freezing among remaining tuples with storage. We will update
+ * the VM after collecting LP_DEAD items and freezing tuples. Pruning will
+ * have determined whether or not the page is all_visible. Keep track of
+ * whether or not the page is all_frozen and use this information to
+ * update the VM. all_visible implies lpdead_items == 0, but don't trust
+ * all_frozen result unless all_visible is also set to true.
*
- * Also keep track of the visibility cutoff xid for recovery conflicts.
*/
- all_visible = true;
all_frozen = true;
- visibility_cutoff_xid = InvalidTransactionId;
/*
* Now scan the page to collect LP_DEAD items and update the variables set
@@ -1516,11 +1513,6 @@ lazy_scan_prune(LVRelState *vacrel,
* will only happen every other VACUUM, at most. Besides, VACUUM
* must treat hastup/nonempty_pages as provisional no matter how
* LP_DEAD items are handled (handled here, or handled later on).
- *
- * Also deliberately delay unsetting all_visible until just before
- * we return to lazy_scan_heap caller, as explained in full below.
- * (This is another case where it's useful to anticipate that any
- * LP_DEAD items will become LP_UNUSED during the ongoing VACUUM.)
*/
deadoffsets[lpdead_items++] = offnum;
continue;
@@ -1558,46 +1550,6 @@ lazy_scan_prune(LVRelState *vacrel,
* what acquire_sample_rows() does.
*/
live_tuples++;
-
- /*
- * Is the tuple definitely visible to all transactions?
- *
- * NB: Like with per-tuple hint bits, we can't set the
- * PD_ALL_VISIBLE flag if the inserter committed
- * asynchronously. See SetHintBits for more info. Check that
- * the tuple is hinted xmin-committed because of that.
- */
- if (all_visible)
- {
- TransactionId xmin;
-
- if (!HeapTupleHeaderXminCommitted(htup))
- {
- all_visible = false;
- break;
- }
-
- /*
- * The inserter definitely committed. But is it old enough
- * that everyone sees it as committed? A
- * FrozenTransactionId is seen as committed to everyone.
- * Otherwise, we check if there is a snapshot that
- * considers this xid to still be running, and if so, we
- * don't consider the page all-visible.
- */
- xmin = HeapTupleHeaderGetXmin(htup);
- if (xmin != FrozenTransactionId &&
- !GlobalVisTestIsRemovableXid(vacrel->vistest, xmin))
- {
- all_visible = false;
- break;
- }
-
- /* Track newest xmin on page. */
- if (TransactionIdFollows(xmin, visibility_cutoff_xid) &&
- TransactionIdIsNormal(xmin))
- visibility_cutoff_xid = xmin;
- }
break;
case HEAPTUPLE_RECENTLY_DEAD:
@@ -1607,7 +1559,6 @@ lazy_scan_prune(LVRelState *vacrel,
* pruning.)
*/
recently_dead_tuples++;
- all_visible = false;
break;
case HEAPTUPLE_INSERT_IN_PROGRESS:
@@ -1618,16 +1569,13 @@ lazy_scan_prune(LVRelState *vacrel,
* results. This assumption is a bit shaky, but it is what
* acquire_sample_rows() does, so be consistent.
*/
- all_visible = false;
break;
case HEAPTUPLE_DELETE_IN_PROGRESS:
- /* This is an expected case during concurrent vacuum */
- all_visible = false;
/*
- * Count such rows as live. As above, we assume the deleting
- * transaction will commit and update the counters after we
- * report.
+ * This is an expected case during concurrent vacuum. Count such
+ * rows as live. As above, we assume the deleting transaction
+ * will commit and update the counters after we report.
*/
live_tuples++;
break;
@@ -1670,7 +1618,7 @@ lazy_scan_prune(LVRelState *vacrel,
* page all-frozen afterwards (might not happen until final heap pass).
*/
if (pagefrz.freeze_required || tuples_frozen == 0 ||
- (all_visible && all_frozen &&
+ (presult.all_visible_except_removable && all_frozen &&
fpi_before != pgWalUsage.wal_fpi))
{
/*
@@ -1703,16 +1651,16 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->frozen_pages++;
/*
- * We can use visibility_cutoff_xid as our cutoff for conflicts
+ * We can use frz_conflict_horizon as our cutoff for conflicts
* when the whole page is eligible to become all-frozen in the VM
* once we're done with it. Otherwise we generate a conservative
* cutoff by stepping back from OldestXmin.
*/
- if (all_visible && all_frozen)
+ if (presult.all_visible_except_removable && all_frozen)
{
/* Using same cutoff when setting VM is now unnecessary */
- snapshotConflictHorizon = visibility_cutoff_xid;
- visibility_cutoff_xid = InvalidTransactionId;
+ snapshotConflictHorizon = presult.frz_conflict_horizon;
+ presult.frz_conflict_horizon = InvalidTransactionId;
}
else
{
@@ -1748,17 +1696,19 @@ lazy_scan_prune(LVRelState *vacrel,
*/
#ifdef USE_ASSERT_CHECKING
/* Note that all_frozen value does not matter when !all_visible */
- if (all_visible && lpdead_items == 0)
+ if (presult.all_visible)
{
TransactionId debug_cutoff;
bool debug_all_frozen;
+ Assert(lpdead_items == 0);
+
if (!heap_page_is_all_visible(vacrel, buf,
&debug_cutoff, &debug_all_frozen))
Assert(false);
Assert(!TransactionIdIsValid(debug_cutoff) ||
- debug_cutoff == visibility_cutoff_xid);
+ debug_cutoff == presult.frz_conflict_horizon);
}
#endif
@@ -1783,19 +1733,6 @@ lazy_scan_prune(LVRelState *vacrel,
Assert(dead_items->num_items <= dead_items->max_items);
pgstat_progress_update_param(PROGRESS_VACUUM_NUM_DEAD_TUPLES,
dead_items->num_items);
-
- /*
- * It was convenient to ignore LP_DEAD items in all_visible earlier on
- * to make the choice of whether or not to freeze the page unaffected
- * by the short-term presence of LP_DEAD items. These LP_DEAD items
- * were effectively assumed to be LP_UNUSED items in the making. It
- * doesn't matter which heap pass (initial pass or final pass) ends up
- * setting the page all-frozen, as long as the ongoing VACUUM does it.
- *
- * Now that freezing has been finalized, unset all_visible. It needs
- * to reflect the present state of things, as expected by our caller.
- */
- all_visible = false;
}
/* Finally, add page-local counts to whole-VACUUM counts */
@@ -1812,20 +1749,20 @@ lazy_scan_prune(LVRelState *vacrel,
/* Did we find LP_DEAD items? */
*has_lpdead_items = (lpdead_items > 0);
- Assert(!all_visible || !(*has_lpdead_items));
+ Assert(!presult.all_visible || !(*has_lpdead_items));
/*
* Handle setting visibility map bit based on information from the VM (as
* of last heap_vac_scan_next_block() call), and from all_visible and
* all_frozen variables
*/
- if (!all_visible_according_to_vm && all_visible)
+ if (!all_visible_according_to_vm && presult.all_visible)
{
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
if (all_frozen)
{
- Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+ Assert(!TransactionIdIsValid(presult.frz_conflict_horizon));
flags |= VISIBILITYMAP_ALL_FROZEN;
}
@@ -1845,7 +1782,7 @@ lazy_scan_prune(LVRelState *vacrel,
PageSetAllVisible(page);
MarkBufferDirty(buf);
visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
- vmbuffer, visibility_cutoff_xid,
+ vmbuffer, presult.frz_conflict_horizon,
flags);
}
@@ -1893,7 +1830,7 @@ lazy_scan_prune(LVRelState *vacrel,
* it as all-frozen. Note that all_frozen is only valid if all_visible is
* true, so we must check both all_visible and all_frozen.
*/
- else if (all_visible_according_to_vm && all_visible &&
+ else if (all_visible_according_to_vm && presult.all_visible &&
all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
{
/*
@@ -1910,11 +1847,11 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* Set the page all-frozen (and all-visible) in the VM.
*
- * We can pass InvalidTransactionId as our visibility_cutoff_xid,
- * since a snapshotConflictHorizon sufficient to make everything safe
- * for REDO was logged when the page's tuples were frozen.
+ * We can pass InvalidTransactionId as our frz_conflict_horizon, since
+ * a snapshotConflictHorizon sufficient to make everything safe for
+ * REDO was logged when the page's tuples were frozen.
*/
- Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+ Assert(!TransactionIdIsValid(presult.frz_conflict_horizon));
visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
vmbuffer, InvalidTransactionId,
VISIBILITYMAP_ALL_VISIBLE |
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 4b133f68593..d8e65ae7e35 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -198,6 +198,9 @@ typedef struct PruneResult
{
int ndeleted; /* Number of tuples deleted from the page */
int nnewlpdead; /* Number of newly LP_DEAD items */
+ bool all_visible; /* Whether or not the page is all visible */
+ bool all_visible_except_removable;
+ TransactionId frz_conflict_horizon; /* Newest xmin on the page */
/*
* Tuple visibility is only computed once for each tuple, for correctness
--
2.40.1
v4-0006-Add-reference-to-VacuumCutoffs-in-HeapPageFreeze.patch (text/x-diff; charset=us-ascii)
From 23036ff6bad6a50574dac7f398fcdf9f171f7120 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 6 Jan 2024 16:22:17 -0500
Subject: [PATCH v4 06/19] Add reference to VacuumCutoffs in HeapPageFreeze
Future commits will move opportunistic freezing into the main path of
pruning in heap_page_prune(). Because on-access pruning will not do
opportunistic freezing, it is cleaner to keep the visibility information
required for calling heap_prepare_freeze_tuple() inside the
HeapPageFreeze structure itself by saving a reference to VacuumCutoffs.
---
src/backend/access/heap/heapam.c | 67 ++++++++++++++--------------
src/backend/access/heap/vacuumlazy.c | 3 +-
src/include/access/heapam.h | 2 +-
3 files changed, 36 insertions(+), 36 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 34bc60f625f..7261c4988d7 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6023,7 +6023,7 @@ heap_inplace_update(Relation relation, HeapTuple tuple)
*/
static TransactionId
FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
- const struct VacuumCutoffs *cutoffs, uint16 *flags,
+ uint16 *flags,
HeapPageFreeze *pagefrz)
{
TransactionId newxmax;
@@ -6049,12 +6049,12 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
pagefrz->freeze_required = true;
return InvalidTransactionId;
}
- else if (MultiXactIdPrecedes(multi, cutoffs->relminmxid))
+ else if (MultiXactIdPrecedes(multi, pagefrz->cutoffs->relminmxid))
ereport(ERROR,
(errcode(ERRCODE_DATA_CORRUPTED),
errmsg_internal("found multixact %u from before relminmxid %u",
- multi, cutoffs->relminmxid)));
- else if (MultiXactIdPrecedes(multi, cutoffs->OldestMxact))
+ multi, pagefrz->cutoffs->relminmxid)));
+ else if (MultiXactIdPrecedes(multi, pagefrz->cutoffs->OldestMxact))
{
TransactionId update_xact;
@@ -6069,7 +6069,7 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
ereport(ERROR,
(errcode(ERRCODE_DATA_CORRUPTED),
errmsg_internal("multixact %u from before multi freeze cutoff %u found to be still running",
- multi, cutoffs->OldestMxact)));
+ multi, pagefrz->cutoffs->OldestMxact)));
if (HEAP_XMAX_IS_LOCKED_ONLY(t_infomask))
{
@@ -6080,13 +6080,13 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
/* replace multi with single XID for its updater? */
update_xact = MultiXactIdGetUpdateXid(multi, t_infomask);
- if (TransactionIdPrecedes(update_xact, cutoffs->relfrozenxid))
+ if (TransactionIdPrecedes(update_xact, pagefrz->cutoffs->relfrozenxid))
ereport(ERROR,
(errcode(ERRCODE_DATA_CORRUPTED),
errmsg_internal("multixact %u contains update XID %u from before relfrozenxid %u",
multi, update_xact,
- cutoffs->relfrozenxid)));
- else if (TransactionIdPrecedes(update_xact, cutoffs->OldestXmin))
+ pagefrz->cutoffs->relfrozenxid)));
+ else if (TransactionIdPrecedes(update_xact, pagefrz->cutoffs->OldestXmin))
{
/*
* Updater XID has to have aborted (otherwise the tuple would have
@@ -6098,7 +6098,7 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
(errcode(ERRCODE_DATA_CORRUPTED),
errmsg_internal("multixact %u contains committed update XID %u from before removable cutoff %u",
multi, update_xact,
- cutoffs->OldestXmin)));
+ pagefrz->cutoffs->OldestXmin)));
*flags |= FRM_INVALIDATE_XMAX;
pagefrz->freeze_required = true;
return InvalidTransactionId;
@@ -6150,9 +6150,9 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
{
TransactionId xid = members[i].xid;
- Assert(!TransactionIdPrecedes(xid, cutoffs->relfrozenxid));
+ Assert(!TransactionIdPrecedes(xid, pagefrz->cutoffs->relfrozenxid));
- if (TransactionIdPrecedes(xid, cutoffs->FreezeLimit))
+ if (TransactionIdPrecedes(xid, pagefrz->cutoffs->FreezeLimit))
{
/* Can't violate the FreezeLimit postcondition */
need_replace = true;
@@ -6164,7 +6164,7 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
/* Can't violate the MultiXactCutoff postcondition, either */
if (!need_replace)
- need_replace = MultiXactIdPrecedes(multi, cutoffs->MultiXactCutoff);
+ need_replace = MultiXactIdPrecedes(multi, pagefrz->cutoffs->MultiXactCutoff);
if (!need_replace)
{
@@ -6203,7 +6203,7 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
TransactionId xid = members[i].xid;
MultiXactStatus mstatus = members[i].status;
- Assert(!TransactionIdPrecedes(xid, cutoffs->relfrozenxid));
+ Assert(!TransactionIdPrecedes(xid, pagefrz->cutoffs->relfrozenxid));
if (!ISUPDATE_from_mxstatus(mstatus))
{
@@ -6214,12 +6214,12 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
if (TransactionIdIsCurrentTransactionId(xid) ||
TransactionIdIsInProgress(xid))
{
- if (TransactionIdPrecedes(xid, cutoffs->OldestXmin))
+ if (TransactionIdPrecedes(xid, pagefrz->cutoffs->OldestXmin))
ereport(ERROR,
(errcode(ERRCODE_DATA_CORRUPTED),
errmsg_internal("multixact %u contains running locker XID %u from before removable cutoff %u",
multi, xid,
- cutoffs->OldestXmin)));
+ pagefrz->cutoffs->OldestXmin)));
newmembers[nnewmembers++] = members[i];
has_lockers = true;
}
@@ -6277,11 +6277,11 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
* We determined that updater must be kept -- add it to pending new
* members list
*/
- if (TransactionIdPrecedes(xid, cutoffs->OldestXmin))
+ if (TransactionIdPrecedes(xid, pagefrz->cutoffs->OldestXmin))
ereport(ERROR,
(errcode(ERRCODE_DATA_CORRUPTED),
errmsg_internal("multixact %u contains committed update XID %u from before removable cutoff %u",
- multi, xid, cutoffs->OldestXmin)));
+ multi, xid, pagefrz->cutoffs->OldestXmin)));
newmembers[nnewmembers++] = members[i];
}
@@ -6373,7 +6373,6 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
*/
bool
heap_prepare_freeze_tuple(HeapTupleHeader tuple,
- const struct VacuumCutoffs *cutoffs,
HeapPageFreeze *pagefrz,
HeapTupleFreeze *frz, bool *totally_frozen)
{
@@ -6401,14 +6400,14 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
xmin_already_frozen = true;
else
{
- if (TransactionIdPrecedes(xid, cutoffs->relfrozenxid))
+ if (TransactionIdPrecedes(xid, pagefrz->cutoffs->relfrozenxid))
ereport(ERROR,
(errcode(ERRCODE_DATA_CORRUPTED),
errmsg_internal("found xmin %u from before relfrozenxid %u",
- xid, cutoffs->relfrozenxid)));
+ xid, pagefrz->cutoffs->relfrozenxid)));
/* Will set freeze_xmin flags in freeze plan below */
- freeze_xmin = TransactionIdPrecedes(xid, cutoffs->OldestXmin);
+ freeze_xmin = TransactionIdPrecedes(xid, pagefrz->cutoffs->OldestXmin);
/* Verify that xmin committed if and when freeze plan is executed */
if (freeze_xmin)
@@ -6422,8 +6421,8 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
xid = HeapTupleHeaderGetXvac(tuple);
if (TransactionIdIsNormal(xid))
{
- Assert(TransactionIdPrecedesOrEquals(cutoffs->relfrozenxid, xid));
- Assert(TransactionIdPrecedes(xid, cutoffs->OldestXmin));
+ Assert(TransactionIdPrecedesOrEquals(pagefrz->cutoffs->relfrozenxid, xid));
+ Assert(TransactionIdPrecedes(xid, pagefrz->cutoffs->OldestXmin));
/*
* For Xvac, we always freeze proactively. This allows totally_frozen
@@ -6448,8 +6447,7 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
* perform no-op xmax processing. The only constraint is that the
* FreezeLimit/MultiXactCutoff postcondition must never be violated.
*/
- newxmax = FreezeMultiXactId(xid, tuple->t_infomask, cutoffs,
- &flags, pagefrz);
+ newxmax = FreezeMultiXactId(xid, tuple->t_infomask, &flags, pagefrz);
if (flags & FRM_NOOP)
{
@@ -6472,7 +6470,7 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
* (This repeats work from FreezeMultiXactId, but allows "no
* freeze" tracker maintenance to happen in only one place.)
*/
- Assert(!MultiXactIdPrecedes(newxmax, cutoffs->MultiXactCutoff));
+ Assert(!MultiXactIdPrecedes(newxmax, pagefrz->cutoffs->MultiXactCutoff));
Assert(MultiXactIdIsValid(newxmax) && xid == newxmax);
}
else if (flags & FRM_RETURN_IS_XID)
@@ -6481,7 +6479,7 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
* xmax will become an updater Xid (original MultiXact's updater
* member Xid will be carried forward as a simple Xid in Xmax).
*/
- Assert(!TransactionIdPrecedes(newxmax, cutoffs->OldestXmin));
+ Assert(!TransactionIdPrecedes(newxmax, pagefrz->cutoffs->OldestXmin));
/*
* NB -- some of these transformations are only valid because we
@@ -6505,7 +6503,7 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
* xmax is an old MultiXactId that we have to replace with a new
* MultiXactId, to carry forward two or more original member XIDs.
*/
- Assert(!MultiXactIdPrecedes(newxmax, cutoffs->OldestMxact));
+ Assert(!MultiXactIdPrecedes(newxmax, pagefrz->cutoffs->OldestMxact));
/*
* We can't use GetMultiXactIdHintBits directly on the new multi
@@ -6540,14 +6538,14 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
else if (TransactionIdIsNormal(xid))
{
/* Raw xmax is normal XID */
- if (TransactionIdPrecedes(xid, cutoffs->relfrozenxid))
+ if (TransactionIdPrecedes(xid, pagefrz->cutoffs->relfrozenxid))
ereport(ERROR,
(errcode(ERRCODE_DATA_CORRUPTED),
errmsg_internal("found xmax %u from before relfrozenxid %u",
- xid, cutoffs->relfrozenxid)));
+ xid, pagefrz->cutoffs->relfrozenxid)));
/* Will set freeze_xmax flags in freeze plan below */
- freeze_xmax = TransactionIdPrecedes(xid, cutoffs->OldestXmin);
+ freeze_xmax = TransactionIdPrecedes(xid, pagefrz->cutoffs->OldestXmin);
/*
* Verify that xmax aborted if and when freeze plan is executed,
@@ -6627,7 +6625,7 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
* Does this tuple force caller to freeze the entire page?
*/
pagefrz->freeze_required =
- heap_tuple_should_freeze(tuple, cutoffs,
+ heap_tuple_should_freeze(tuple, pagefrz->cutoffs,
&pagefrz->NoFreezePageRelfrozenXid,
&pagefrz->NoFreezePageRelminMxid);
}
@@ -6949,8 +6947,9 @@ heap_freeze_tuple(HeapTupleHeader tuple,
pagefrz.NoFreezePageRelfrozenXid = FreezeLimit;
pagefrz.NoFreezePageRelminMxid = MultiXactCutoff;
- do_freeze = heap_prepare_freeze_tuple(tuple, &cutoffs,
- &pagefrz, &frz, &totally_frozen);
+ pagefrz.cutoffs = &cutoffs;
+
+ do_freeze = heap_prepare_freeze_tuple(tuple, &pagefrz, &frz, &totally_frozen);
/*
* Note that because this is not a WAL-logged operation, we don't need to
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index f9892f4cd08..06e0e841582 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1442,6 +1442,7 @@ lazy_scan_prune(LVRelState *vacrel,
pagefrz.FreezePageRelminMxid = vacrel->NewRelminMxid;
pagefrz.NoFreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
pagefrz.NoFreezePageRelminMxid = vacrel->NewRelminMxid;
+ pagefrz.cutoffs = &vacrel->cutoffs;
tuples_frozen = 0;
lpdead_items = 0;
live_tuples = 0;
@@ -1587,7 +1588,7 @@ lazy_scan_prune(LVRelState *vacrel,
hastup = true; /* page makes rel truncation unsafe */
/* Tuple with storage -- consider need to freeze */
- if (heap_prepare_freeze_tuple(htup, &vacrel->cutoffs, &pagefrz,
+ if (heap_prepare_freeze_tuple(htup, &pagefrz,
&frozen[tuples_frozen], &totally_frozen))
{
/* Save prepared freeze plan for later */
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index d8e65ae7e35..297ba03bf09 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -189,6 +189,7 @@ typedef struct HeapPageFreeze
TransactionId NoFreezePageRelfrozenXid;
MultiXactId NoFreezePageRelminMxid;
+ struct VacuumCutoffs *cutoffs;
} HeapPageFreeze;
/*
@@ -295,7 +296,6 @@ extern TM_Result heap_lock_tuple(Relation relation, HeapTuple tuple,
extern void heap_inplace_update(Relation relation, HeapTuple tuple);
extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
- const struct VacuumCutoffs *cutoffs,
HeapPageFreeze *pagefrz,
HeapTupleFreeze *frz, bool *totally_frozen);
extern void heap_freeze_execute_prepared(Relation rel, Buffer buffer,
--
2.40.1
v4-0007-Prepare-freeze-tuples-in-heap_page_prune.patch (text/x-diff; charset=us-ascii)
From 470e695363d881d48d1fe839ea6770938f20d078 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 18 Mar 2024 20:01:38 -0400
Subject: [PATCH v4 07/19] Prepare freeze tuples in heap_page_prune()
In order to combine the freeze and prune records, we must determine
which tuples are freezable before actually executing pruning. All of the
page modifications should be made in the same critical section along
with emitting the combined WAL record. During pruning, determine whether
or not tuples should or must be frozen and whether or not the page will,
as a consequence, become all frozen.
---
src/backend/access/heap/pruneheap.c | 41 +++++++++++++++--
src/backend/access/heap/vacuumlazy.c | 68 +++++++---------------------
src/include/access/heapam.h | 12 +++++
3 files changed, 66 insertions(+), 55 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index bd30296ef1a..afc5ea5e0e7 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -153,7 +153,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* not the relation has indexes, since we cannot safely determine
* that during on-access pruning with the current implementation.
*/
- heap_page_prune(relation, buffer, vistest, false,
+ heap_page_prune(relation, buffer, vistest, false, NULL,
&presult, NULL);
/*
@@ -206,6 +206,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* mark_unused_now indicates whether or not dead items can be set LP_UNUSED
* during pruning.
*
+ * pagefrz contains both input and output parameters used if the caller is
+ * interested in potentially freezing tuples on the page.
+ *
* presult contains output parameters needed by callers such as the number of
* tuples removed and the number of line pointers newly marked LP_DEAD.
* heap_page_prune() is responsible for initializing it.
@@ -217,6 +220,7 @@ void
heap_page_prune(Relation relation, Buffer buffer,
GlobalVisState *vistest,
bool mark_unused_now,
+ HeapPageFreeze *pagefrz,
PruneResult *presult,
OffsetNumber *off_loc)
{
@@ -247,11 +251,16 @@ heap_page_prune(Relation relation, Buffer buffer,
presult->ndeleted = 0;
presult->nnewlpdead = 0;
+ presult->nfrozen = 0;
/*
- * Keep track of whether or not the page is all_visible in case the caller
- * wants to use this information to update the VM.
+ * Caller will update the VM after pruning, collecting LP_DEAD items, and
+ * freezing tuples. Keep track of whether or not the page is all_visible
+ * and all_frozen and use this information to update the VM. all_visible
+ * implies lpdead_items == 0, but don't trust all_frozen result unless
+ * all_visible is also set to true.
*/
+ presult->all_frozen = true;
presult->all_visible = true;
/* for recovery conflicts */
presult->frz_conflict_horizon = InvalidTransactionId;
@@ -381,6 +390,32 @@ heap_page_prune(Relation relation, Buffer buffer,
elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
break;
}
+
+ /*
+ * Consider freezing any normal tuples which will not be removed
+ */
+ if (presult->htsv[offnum] != HEAPTUPLE_DEAD && pagefrz)
+ {
+ bool totally_frozen;
+
+ /* Tuple with storage -- consider need to freeze */
+ if ((heap_prepare_freeze_tuple(htup, pagefrz,
+ &presult->frozen[presult->nfrozen],
+ &totally_frozen)))
+ {
+ /* Save prepared freeze plan for later */
+ presult->frozen[presult->nfrozen++].offset = offnum;
+ }
+
+ /*
+ * If any tuple isn't either totally frozen already or eligible to
+ * become totally frozen (according to its freeze plan), then the
+ * page definitely cannot be set all-frozen in the visibility map
+ * later on
+ */
+ if (!totally_frozen)
+ presult->all_frozen = false;
+ }
}
/*
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 06e0e841582..4187c998d25 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1416,16 +1416,13 @@ lazy_scan_prune(LVRelState *vacrel,
maxoff;
ItemId itemid;
PruneResult presult;
- int tuples_frozen,
- lpdead_items,
+ int lpdead_items,
live_tuples,
recently_dead_tuples;
HeapPageFreeze pagefrz;
bool hastup = false;
- bool all_frozen;
int64 fpi_before = pgWalUsage.wal_fpi;
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
- HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
Assert(BufferGetBlockNumber(buf) == blkno);
@@ -1443,7 +1440,6 @@ lazy_scan_prune(LVRelState *vacrel,
pagefrz.NoFreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
pagefrz.NoFreezePageRelminMxid = vacrel->NewRelminMxid;
pagefrz.cutoffs = &vacrel->cutoffs;
- tuples_frozen = 0;
lpdead_items = 0;
live_tuples = 0;
recently_dead_tuples = 0;
@@ -1461,31 +1457,20 @@ lazy_scan_prune(LVRelState *vacrel,
* false otherwise.
*/
heap_page_prune(rel, buf, vacrel->vistest, vacrel->nindexes == 0,
- &presult, &vacrel->offnum);
+ &pagefrz, &presult, &vacrel->offnum);
/*
* Now scan the page to collect LP_DEAD items and check for tuples
* requiring freezing among remaining tuples with storage. We will update
* the VM after collecting LP_DEAD items and freezing tuples. Pruning will
- * have determined whether or not the page is all_visible. Keep track of
- * whether or not the page is all_frozen and use this information to
- * update the VM. all_visible implies lpdead_items == 0, but don't trust
- * all_frozen result unless all_visible is also set to true.
+ * have determined whether or not the page is all_visible and able to
+ * become all_frozen.
*
*/
- all_frozen = true;
-
- /*
- * Now scan the page to collect LP_DEAD items and update the variables set
- * just above.
- */
for (offnum = FirstOffsetNumber;
offnum <= maxoff;
offnum = OffsetNumberNext(offnum))
{
- HeapTupleHeader htup;
- bool totally_frozen;
-
/*
* Set the offset number so that we can display it along with any
* error that occurred while processing this tuple.
@@ -1521,8 +1506,6 @@ lazy_scan_prune(LVRelState *vacrel,
Assert(ItemIdIsNormal(itemid));
- htup = (HeapTupleHeader) PageGetItem(page, itemid);
-
/*
* The criteria for counting a tuple as live in this block need to
* match what analyze.c's acquire_sample_rows() does, otherwise VACUUM
@@ -1587,29 +1570,8 @@ lazy_scan_prune(LVRelState *vacrel,
hastup = true; /* page makes rel truncation unsafe */
- /* Tuple with storage -- consider need to freeze */
- if (heap_prepare_freeze_tuple(htup, &pagefrz,
- &frozen[tuples_frozen], &totally_frozen))
- {
- /* Save prepared freeze plan for later */
- frozen[tuples_frozen++].offset = offnum;
- }
-
- /*
- * If any tuple isn't either totally frozen already or eligible to
- * become totally frozen (according to its freeze plan), then the page
- * definitely cannot be set all-frozen in the visibility map later on
- */
- if (!totally_frozen)
- all_frozen = false;
}
- /*
- * We have now divided every item on the page into either an LP_DEAD item
- * that will need to be vacuumed in indexes later, or a LP_NORMAL tuple
- * that remains and needs to be considered for freezing now (LP_UNUSED and
- * LP_REDIRECT items also remain, but are of no further interest to us).
- */
vacrel->offnum = InvalidOffsetNumber;
/*
@@ -1618,8 +1580,8 @@ lazy_scan_prune(LVRelState *vacrel,
* freeze when pruning generated an FPI, if doing so means that we set the
* page all-frozen afterwards (might not happen until final heap pass).
*/
- if (pagefrz.freeze_required || tuples_frozen == 0 ||
- (presult.all_visible_except_removable && all_frozen &&
+ if (pagefrz.freeze_required || presult.nfrozen == 0 ||
+ (presult.all_visible_except_removable && presult.all_frozen &&
fpi_before != pgWalUsage.wal_fpi))
{
/*
@@ -1629,7 +1591,7 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->NewRelfrozenXid = pagefrz.FreezePageRelfrozenXid;
vacrel->NewRelminMxid = pagefrz.FreezePageRelminMxid;
- if (tuples_frozen == 0)
+ if (presult.nfrozen == 0)
{
/*
* We have no freeze plans to execute, so there's no added cost
@@ -1657,7 +1619,7 @@ lazy_scan_prune(LVRelState *vacrel,
* once we're done with it. Otherwise we generate a conservative
* cutoff by stepping back from OldestXmin.
*/
- if (presult.all_visible_except_removable && all_frozen)
+ if (presult.all_visible_except_removable && presult.all_frozen)
{
/* Using same cutoff when setting VM is now unnecessary */
snapshotConflictHorizon = presult.frz_conflict_horizon;
@@ -1673,7 +1635,7 @@ lazy_scan_prune(LVRelState *vacrel,
/* Execute all freeze plans for page as a single atomic action */
heap_freeze_execute_prepared(vacrel->rel, buf,
snapshotConflictHorizon,
- frozen, tuples_frozen);
+ presult.frozen, presult.nfrozen);
}
}
else
@@ -1684,8 +1646,8 @@ lazy_scan_prune(LVRelState *vacrel,
*/
vacrel->NewRelfrozenXid = pagefrz.NoFreezePageRelfrozenXid;
vacrel->NewRelminMxid = pagefrz.NoFreezePageRelminMxid;
- all_frozen = false;
- tuples_frozen = 0; /* avoid miscounts in instrumentation */
+ presult.all_frozen = false;
+ presult.nfrozen = 0; /* avoid miscounts in instrumentation */
}
/*
@@ -1708,6 +1670,8 @@ lazy_scan_prune(LVRelState *vacrel,
&debug_cutoff, &debug_all_frozen))
Assert(false);
+ Assert(presult.all_frozen == debug_all_frozen);
+
Assert(!TransactionIdIsValid(debug_cutoff) ||
debug_cutoff == presult.frz_conflict_horizon);
}
@@ -1738,7 +1702,7 @@ lazy_scan_prune(LVRelState *vacrel,
/* Finally, add page-local counts to whole-VACUUM counts */
vacrel->tuples_deleted += presult.ndeleted;
- vacrel->tuples_frozen += tuples_frozen;
+ vacrel->tuples_frozen += presult.nfrozen;
vacrel->lpdead_items += lpdead_items;
vacrel->live_tuples += live_tuples;
vacrel->recently_dead_tuples += recently_dead_tuples;
@@ -1761,7 +1725,7 @@ lazy_scan_prune(LVRelState *vacrel,
{
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
- if (all_frozen)
+ if (presult.all_frozen)
{
Assert(!TransactionIdIsValid(presult.frz_conflict_horizon));
flags |= VISIBILITYMAP_ALL_FROZEN;
@@ -1832,7 +1796,7 @@ lazy_scan_prune(LVRelState *vacrel,
* true, so we must check both all_visible and all_frozen.
*/
else if (all_visible_according_to_vm && presult.all_visible &&
- all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
+ presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
{
/*
* Avoid relying on all_visible_according_to_vm as a proxy for the
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 297ba03bf09..2339abfd28a 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -201,6 +201,11 @@ typedef struct PruneResult
int nnewlpdead; /* Number of newly LP_DEAD items */
bool all_visible; /* Whether or not the page is all visible */
bool all_visible_except_removable;
+ /* Whether or not the page can be set all frozen in the VM */
+ bool all_frozen;
+
+ /* Number of newly frozen tuples */
+ int nfrozen;
TransactionId frz_conflict_horizon; /* Newest xmin on the page */
/*
@@ -213,6 +218,12 @@ typedef struct PruneResult
* 1. Otherwise every access would need to subtract 1.
*/
int8 htsv[MaxHeapTuplesPerPage + 1];
+
+
+ /*
+ * One entry for every tuple that we may freeze.
+ */
+ HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
} PruneResult;
/*
@@ -324,6 +335,7 @@ extern void heap_page_prune_opt(Relation relation, Buffer buffer);
extern void heap_page_prune(Relation relation, Buffer buffer,
struct GlobalVisState *vistest,
bool mark_unused_now,
+ HeapPageFreeze *pagefrz,
PruneResult *presult,
OffsetNumber *off_loc);
extern void heap_page_prune_execute(Buffer buffer,
--
2.40.1
v4-0008-lazy_scan_prune-reorder-freeze-execution-logic.patch (text/x-diff; charset=us-ascii)
From 3c208039193ce94111e8ddc1b03828cf820e11e3 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 19 Mar 2024 19:30:16 -0400
Subject: [PATCH v4 08/19] lazy_scan_prune reorder freeze execution logic
To combine the prune and freeze records, freezing must be done before a
pruning WAL record is emitted. We will move the freeze execution into
heap_page_prune() in future commits. lazy_scan_prune() currently
executes freezing, updates vacrel->NewRelfrozenXid and
vacrel->NewRelminMxid, and resets the snapshotConflictHorizon that the
visibility map update record may use, all in the same block of if
statements. This commit starts reordering that logic so that the freeze
execution can be separated from the other updates, which should not be
done during pruning.
---
src/backend/access/heap/vacuumlazy.c | 92 +++++++++++++++-------------
1 file changed, 49 insertions(+), 43 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 4187c998d25..74ebab25a95 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1421,6 +1421,7 @@ lazy_scan_prune(LVRelState *vacrel,
recently_dead_tuples;
HeapPageFreeze pagefrz;
bool hastup = false;
+ bool do_freeze;
int64 fpi_before = pgWalUsage.wal_fpi;
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
@@ -1580,10 +1581,15 @@ lazy_scan_prune(LVRelState *vacrel,
* freeze when pruning generated an FPI, if doing so means that we set the
* page all-frozen afterwards (might not happen until final heap pass).
*/
- if (pagefrz.freeze_required || presult.nfrozen == 0 ||
+ do_freeze = pagefrz.freeze_required ||
(presult.all_visible_except_removable && presult.all_frozen &&
- fpi_before != pgWalUsage.wal_fpi))
+ presult.nfrozen > 0 &&
+ fpi_before != pgWalUsage.wal_fpi);
+
+ if (do_freeze)
{
+ TransactionId snapshotConflictHorizon;
+
/*
* We're freezing the page. Our final NewRelfrozenXid doesn't need to
* be affected by the XIDs that are just about to be frozen anyway.
@@ -1591,52 +1597,52 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->NewRelfrozenXid = pagefrz.FreezePageRelfrozenXid;
vacrel->NewRelminMxid = pagefrz.FreezePageRelminMxid;
- if (presult.nfrozen == 0)
- {
- /*
- * We have no freeze plans to execute, so there's no added cost
- * from following the freeze path. That's why it was chosen. This
- * is important in the case where the page only contains totally
- * frozen tuples at this point (perhaps only following pruning).
- * Such pages can be marked all-frozen in the VM by our caller,
- * even though none of its tuples were newly frozen here (note
- * that the "no freeze" path never sets pages all-frozen).
- *
- * We never increment the frozen_pages instrumentation counter
- * here, since it only counts pages with newly frozen tuples
- * (don't confuse that with pages newly set all-frozen in VM).
- */
- }
+ vacrel->frozen_pages++;
+
+ /*
+ * We can use frz_conflict_horizon as our cutoff for conflicts when
+ * the whole page is eligible to become all-frozen in the VM once
+ * we're done with it. Otherwise we generate a conservative cutoff by
+ * stepping back from OldestXmin.
+ */
+ if (presult.all_visible_except_removable && presult.all_frozen)
+ snapshotConflictHorizon = presult.frz_conflict_horizon;
else
{
- TransactionId snapshotConflictHorizon;
+ /* Avoids false conflicts when hot_standby_feedback in use */
+ snapshotConflictHorizon = pagefrz.cutoffs->OldestXmin;
+ TransactionIdRetreat(snapshotConflictHorizon);
+ }
- vacrel->frozen_pages++;
+ /* Using same cutoff when setting VM is now unnecessary */
+ if (presult.all_visible_except_removable && presult.all_frozen)
+ presult.frz_conflict_horizon = InvalidTransactionId;
- /*
- * We can use frz_conflict_horizon as our cutoff for conflicts
- * when the whole page is eligible to become all-frozen in the VM
- * once we're done with it. Otherwise we generate a conservative
- * cutoff by stepping back from OldestXmin.
- */
- if (presult.all_visible_except_removable && presult.all_frozen)
- {
- /* Using same cutoff when setting VM is now unnecessary */
- snapshotConflictHorizon = presult.frz_conflict_horizon;
- presult.frz_conflict_horizon = InvalidTransactionId;
- }
- else
- {
- /* Avoids false conflicts when hot_standby_feedback in use */
- snapshotConflictHorizon = vacrel->cutoffs.OldestXmin;
- TransactionIdRetreat(snapshotConflictHorizon);
- }
+ /* Execute all freeze plans for page as a single atomic action */
+ heap_freeze_execute_prepared(vacrel->rel, buf,
+ snapshotConflictHorizon,
+ presult.frozen, presult.nfrozen);
+ }
+ else if (presult.all_frozen && presult.nfrozen == 0)
+ {
+ /* Page should be all visible except to-be-removed tuples */
+ Assert(presult.all_visible_except_removable);
- /* Execute all freeze plans for page as a single atomic action */
- heap_freeze_execute_prepared(vacrel->rel, buf,
- snapshotConflictHorizon,
- presult.frozen, presult.nfrozen);
- }
+ /*
+ * We have no freeze plans to execute, so there's no added cost from
+ * following the freeze path. That's why it was chosen. This is
+ * important in the case where the page only contains totally frozen
+ * tuples at this point (perhaps only following pruning). Such pages
+ * can be marked all-frozen in the VM by our caller, even though none
+ * of its tuples were newly frozen here (note that the "no freeze"
+ * path never sets pages all-frozen).
+ *
+ * We never increment the frozen_pages instrumentation counter here,
+ * since it only counts pages with newly frozen tuples (don't confuse
+ * that with pages newly set all-frozen in VM).
+ */
+ vacrel->NewRelfrozenXid = pagefrz.FreezePageRelfrozenXid;
+ vacrel->NewRelminMxid = pagefrz.FreezePageRelminMxid;
}
else
{
--
2.40.1
v4-0009-Execute-freezing-in-heap_page_prune.patch (text/x-diff; charset=us-ascii)
From aec400aa0289ecb8cab6dfaeb8fee050db6487c3 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Fri, 8 Mar 2024 16:45:57 -0500
Subject: [PATCH v4 09/19] Execute freezing in heap_page_prune()
As a step toward combining the prune and freeze WAL records, execute
freezing in heap_page_prune(). The logic to determine whether or not to
execute freeze plans was moved from lazy_scan_prune() over to
heap_page_prune() with little modification.
---
src/backend/access/heap/heapam_handler.c | 2 +-
src/backend/access/heap/pruneheap.c | 155 ++++++++++++++++++-----
src/backend/access/heap/vacuumlazy.c | 134 +++++---------------
src/backend/storage/ipc/procarray.c | 6 +-
src/include/access/heapam.h | 39 +++---
src/tools/pgindent/typedefs.list | 2 +-
6 files changed, 182 insertions(+), 156 deletions(-)
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 680a50bf8b1..5e522f5b0ba 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -1046,7 +1046,7 @@ heapam_scan_analyze_next_tuple(TableScanDesc scan, TransactionId OldestXmin,
* We ignore unused and redirect line pointers. DEAD line pointers
* should be counted as dead, because we need vacuum to run to get rid
* of them. Note that this rule agrees with the way that
- * heap_page_prune() counts things.
+ * heap_page_prune_and_freeze() counts things.
*/
if (!ItemIdIsNormal(itemid))
{
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index afc5ea5e0e7..20907ba5408 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -17,16 +17,19 @@
#include "access/heapam.h"
#include "access/heapam_xlog.h"
#include "access/htup_details.h"
+#include "access/multixact.h"
#include "access/transam.h"
#include "access/xlog.h"
+#include "commands/vacuum.h"
#include "access/xloginsert.h"
+#include "executor/instrument.h"
#include "miscadmin.h"
#include "pgstat.h"
#include "storage/bufmgr.h"
#include "utils/rel.h"
#include "utils/snapmgr.h"
-/* Working data for heap_page_prune and subroutines */
+/* Working data for heap_page_prune_and_freeze() and subroutines */
typedef struct
{
/* tuple visibility test, initialized for the relation */
@@ -51,6 +54,11 @@ typedef struct
* 1. Otherwise every access would need to subtract 1.
*/
bool marked[MaxHeapTuplesPerPage + 1];
+
+ /*
+ * One entry for every tuple that we may freeze.
+ */
+ HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
} PruneState;
/* Local functions */
@@ -59,14 +67,15 @@ static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
Buffer buffer);
static int heap_prune_chain(Buffer buffer,
OffsetNumber rootoffnum,
- PruneState *prstate, PruneResult *presult);
+ PruneState *prstate, PruneFreezeResult *presult);
+
static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
static void heap_prune_record_redirect(PruneState *prstate,
OffsetNumber offnum, OffsetNumber rdoffnum);
static void heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
- PruneResult *presult);
+ PruneFreezeResult *presult);
static void heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
- PruneResult *presult);
+ PruneFreezeResult *presult);
static void heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum);
static void page_verify_redirects(Page page);
@@ -146,15 +155,15 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
*/
if (PageIsFull(page) || PageGetHeapFreeSpace(page) < minfree)
{
- PruneResult presult;
+ PruneFreezeResult presult;
/*
* For now, pass mark_unused_now as false regardless of whether or
* not the relation has indexes, since we cannot safely determine
* that during on-access pruning with the current implementation.
*/
- heap_page_prune(relation, buffer, vistest, false, NULL,
- &presult, NULL);
+ heap_page_prune_and_freeze(relation, buffer, vistest, false, NULL,
+ &presult, NULL);
/*
* Report the number of tuples reclaimed to pgstats. This is
@@ -188,7 +197,12 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
/*
- * Prune and repair fragmentation in the specified page.
+ * Prune and repair fragmentation and potentially freeze tuples on the
+ * specified page.
+ *
+ * If the page can be marked all-frozen in the visibility map, we may
+ * opportunistically freeze tuples on the page if either its tuples are old
+ * enough or freezing will be cheap enough.
*
* Caller must have pin and buffer cleanup lock on the page. Note that we
* don't update the FSM information for page on caller's behalf. Caller might
@@ -206,23 +220,24 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* mark_unused_now indicates whether or not dead items can be set LP_UNUSED
* during pruning.
*
- * pagefrz contains both input and output parameters used if the caller is
- * interested in potentially freezing tuples on the page.
+ * pagefrz is an input parameter containing visibility cutoff information and
+ * the current relfrozenxid and relminmxids used if the caller is interested in
+ * freezing tuples on the page.
*
* presult contains output parameters needed by callers such as the number of
* tuples removed and the number of line pointers newly marked LP_DEAD.
- * heap_page_prune() is responsible for initializing it.
+ * heap_page_prune_and_freeze() is responsible for initializing it.
*
* off_loc is the offset location required by the caller to use in error
* callback.
*/
void
-heap_page_prune(Relation relation, Buffer buffer,
- GlobalVisState *vistest,
- bool mark_unused_now,
- HeapPageFreeze *pagefrz,
- PruneResult *presult,
- OffsetNumber *off_loc)
+heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ GlobalVisState *vistest,
+ bool mark_unused_now,
+ HeapPageFreeze *pagefrz,
+ PruneFreezeResult *presult,
+ OffsetNumber *off_loc)
{
Page page = BufferGetPage(buffer);
BlockNumber blockno = BufferGetBlockNumber(buffer);
@@ -230,6 +245,8 @@ heap_page_prune(Relation relation, Buffer buffer,
maxoff;
PruneState prstate;
HeapTupleData tup;
+ bool do_freeze;
+ int64 fpi_before = pgWalUsage.wal_fpi;
/*
* First, initialize the new pd_prune_xid value to zero (indicating no
@@ -265,6 +282,10 @@ heap_page_prune(Relation relation, Buffer buffer,
/* for recovery conflicts */
presult->frz_conflict_horizon = InvalidTransactionId;
+ /* For advancing relfrozenxid and relminmxid */
+ presult->new_relfrozenxid = InvalidTransactionId;
+ presult->new_relminmxid = InvalidMultiXactId;
+
maxoff = PageGetMaxOffsetNumber(page);
tup.t_tableOid = RelationGetRelid(relation);
@@ -400,11 +421,11 @@ heap_page_prune(Relation relation, Buffer buffer,
/* Tuple with storage -- consider need to freeze */
if ((heap_prepare_freeze_tuple(htup, pagefrz,
- &presult->frozen[presult->nfrozen],
+ &prstate.frozen[presult->nfrozen],
&totally_frozen)))
{
/* Save prepared freeze plan for later */
- presult->frozen[presult->nfrozen++].offset = offnum;
+ prstate.frozen[presult->nfrozen++].offset = offnum;
}
/*
@@ -557,6 +578,72 @@ heap_page_prune(Relation relation, Buffer buffer,
/* Record number of newly-set-LP_DEAD items for caller */
presult->nnewlpdead = prstate.ndead;
+
+ /*
+ * Freeze the page when heap_prepare_freeze_tuple indicates that at least
+ * one XID/MXID from before FreezeLimit/MultiXactCutoff is present. Also
+ * freeze when pruning generated an FPI, if doing so means that we set the
+ * page all-frozen afterwards (might not happen until final heap pass).
+ */
+ if (pagefrz)
+ do_freeze = pagefrz->freeze_required ||
+ (presult->all_visible_except_removable && presult->all_frozen &&
+ presult->nfrozen > 0 &&
+ fpi_before != pgWalUsage.wal_fpi);
+ else
+ do_freeze = false;
+
+ if (do_freeze)
+ {
+ /*
+ * We can use frz_conflict_horizon as our cutoff for conflicts when
+ * the whole page is eligible to become all-frozen in the VM once
+ * we're done with it. Otherwise we generate a conservative cutoff by
+ * stepping back from OldestXmin.
+ */
+ if (!(presult->all_visible_except_removable && presult->all_frozen))
+ {
+ /* Avoids false conflicts when hot_standby_feedback in use */
+ presult->frz_conflict_horizon = pagefrz->cutoffs->OldestXmin;
+ TransactionIdRetreat(presult->frz_conflict_horizon);
+ }
+
+ /* Execute all freeze plans for page as a single atomic action */
+ heap_freeze_execute_prepared(relation, buffer,
+ presult->frz_conflict_horizon,
+ prstate.frozen, presult->nfrozen);
+ }
+ else if (!pagefrz || !presult->all_frozen || presult->nfrozen > 0)
+ {
+ /*
+ * If we will neither freeze tuples on the page nor set the page all
+ * frozen in the visibility map, the page is not all frozen and there
+ * will be no newly frozen tuples.
+ */
+ presult->all_frozen = false;
+ presult->nfrozen = 0; /* avoid miscounts in instrumentation */
+ }
+
+ /* Caller won't update new_relfrozenxid and new_relminmxid */
+ if (!pagefrz)
+ return;
+
+ /*
+ * If we will freeze tuples on the page or, even if we don't freeze tuples
+ * on the page, if we will set the page all-frozen in the visibility map,
+ * we can advance relfrozenxid and relminmxid to the values in
+ * pagefrz->FreezePageRelfrozenXid and pagefrz->FreezePageRelminMxid.
+ */
+ if (presult->all_frozen || presult->nfrozen > 0)
+ {
+ presult->new_relfrozenxid = pagefrz->FreezePageRelfrozenXid;
+ presult->new_relminmxid = pagefrz->FreezePageRelminMxid;
+ }
+ else
+ {
+ presult->new_relfrozenxid = pagefrz->NoFreezePageRelfrozenXid;
+ presult->new_relminmxid = pagefrz->NoFreezePageRelminMxid;
+ }
}
@@ -614,7 +701,7 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
*/
static int
heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
- PruneState *prstate, PruneResult *presult)
+ PruneState *prstate, PruneFreezeResult *presult)
{
int ndeleted = 0;
Page dp = (Page) BufferGetPage(buffer);
@@ -879,10 +966,10 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
{
/*
* We found a redirect item that doesn't point to a valid follow-on
- * item. This can happen if the loop in heap_page_prune caused us to
- * visit the dead successor of a redirect item before visiting the
- * redirect item. We can clean up by setting the redirect item to
- * DEAD state or LP_UNUSED if the caller indicated.
+ * item. This can happen if the loop in heap_page_prune_and_freeze()
+ * caused us to visit the dead successor of a redirect item before
+ * visiting the redirect item. We can clean up by setting the
+ * redirect item to DEAD state or LP_UNUSED if the caller indicated.
*/
heap_prune_record_dead_or_unused(prstate, rootoffnum, presult);
}
@@ -922,7 +1009,7 @@ heap_prune_record_redirect(PruneState *prstate,
/* Record line pointer to be marked dead */
static void
heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
- PruneResult *presult)
+ PruneFreezeResult *presult)
{
Assert(prstate->ndead < MaxHeapTuplesPerPage);
prstate->nowdead[prstate->ndead] = offnum;
@@ -945,7 +1032,7 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
*/
static void
heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
- PruneResult *presult)
+ PruneFreezeResult *presult)
{
/*
* If the caller set mark_unused_now to true, we can remove dead tuples
@@ -972,9 +1059,9 @@ heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum)
/*
- * Perform the actual page changes needed by heap_page_prune.
- * It is expected that the caller has a full cleanup lock on the
- * buffer.
+ * Perform the actual page pruning modifications needed by
+ * heap_page_prune_and_freeze(). It is expected that the caller has a full
+ * cleanup lock on the buffer.
*/
void
heap_page_prune_execute(Buffer buffer,
@@ -1088,11 +1175,11 @@ heap_page_prune_execute(Buffer buffer,
#ifdef USE_ASSERT_CHECKING
/*
- * When heap_page_prune() was called, mark_unused_now may have been
- * passed as true, which allows would-be LP_DEAD items to be made
- * LP_UNUSED instead. This is only possible if the relation has no
- * indexes. If there are any dead items, then mark_unused_now was not
- * true and every item being marked LP_UNUSED must refer to a
+ * When heap_page_prune_and_freeze() was called, mark_unused_now may
+ * have been passed as true, which allows would-be LP_DEAD items to be
+ * made LP_UNUSED instead. This is only possible if the relation has
+ * no indexes. If there are any dead items, then mark_unused_now was
+ * not true and every item being marked LP_UNUSED must refer to a
* heap-only tuple.
*/
if (ndead > 0)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 74ebab25a95..c4553a4159c 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -430,12 +430,12 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
* as an upper bound on the XIDs stored in the pages we'll actually scan
* (NewRelfrozenXid tracking must never be allowed to miss unfrozen XIDs).
*
- * Next acquire vistest, a related cutoff that's used in heap_page_prune.
- * We expect vistest will always make heap_page_prune remove any deleted
- * tuple whose xmax is < OldestXmin. lazy_scan_prune must never become
- * confused about whether a tuple should be frozen or removed. (In the
- * future we might want to teach lazy_scan_prune to recompute vistest from
- * time to time, to increase the number of dead tuples it can prune away.)
+ * Next acquire vistest, a related cutoff that's used in
+ * heap_page_prune_and_freeze(). We expect vistest will always make
+ * heap_page_prune_and_freeze() remove any deleted tuple whose xmax is <
+ * OldestXmin. (In the future we might want to teach lazy_scan_prune to
+ * recompute vistest from time to time, to increase the number of dead
+ * tuples it can prune away.)
*/
vacrel->aggressive = vacuum_get_cutoffs(rel, params, &vacrel->cutoffs);
vacrel->rel_pages = orig_rel_pages = RelationGetNumberOfBlocks(rel);
@@ -1378,21 +1378,21 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
*
* Caller must hold pin and buffer cleanup lock on the buffer.
*
- * Prior to PostgreSQL 14 there were very rare cases where heap_page_prune()
- * was allowed to disagree with our HeapTupleSatisfiesVacuum() call about
- * whether or not a tuple should be considered DEAD. This happened when an
- * inserting transaction concurrently aborted (after our heap_page_prune()
- * call, before our HeapTupleSatisfiesVacuum() call). There was rather a lot
- * of complexity just so we could deal with tuples that were DEAD to VACUUM,
- * but nevertheless were left with storage after pruning.
+ * Prior to PostgreSQL 14 there were very rare cases where
+ * heap_page_prune_and_freeze() was allowed to disagree with our
+ * HeapTupleSatisfiesVacuum() call about whether or not a tuple should be
+ * considered DEAD. This happened when an inserting transaction concurrently
+ * aborted (after our heap_page_prune_and_freeze() call, before our
+ * HeapTupleSatisfiesVacuum() call). There was rather a lot of complexity just
+ * so we could deal with tuples that were DEAD to VACUUM, but nevertheless were
+ * left with storage after pruning.
*
* As of Postgres 17, we circumvent this problem altogether by reusing the
- * result of heap_page_prune()'s visibility check. Without the second call to
- * HeapTupleSatisfiesVacuum(), there is no new HTSV_Result and there can be no
- * disagreement. We'll just handle such tuples as if they had become fully dead
- * right after this operation completes instead of in the middle of it. Note that
- * any tuple that becomes dead after the call to heap_page_prune() can't need to
- * be frozen, because it was visible to another session when vacuum started.
+ * result of heap_page_prune_and_freeze()'s visibility check. Without the
+ * second call to HeapTupleSatisfiesVacuum(), there is no new HTSV_Result and
+ * there can be no disagreement. We'll just handle such tuples as if they had
+ * become fully dead right after this operation completes instead of in the
+ * middle of it.
*
* vmbuffer is the buffer containing the VM block with visibility information
* for the heap block, blkno. all_visible_according_to_vm is the saved
@@ -1415,26 +1415,24 @@ lazy_scan_prune(LVRelState *vacrel,
OffsetNumber offnum,
maxoff;
ItemId itemid;
- PruneResult presult;
+ PruneFreezeResult presult;
int lpdead_items,
live_tuples,
recently_dead_tuples;
HeapPageFreeze pagefrz;
bool hastup = false;
- bool do_freeze;
- int64 fpi_before = pgWalUsage.wal_fpi;
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
Assert(BufferGetBlockNumber(buf) == blkno);
/*
* maxoff might be reduced following line pointer array truncation in
- * heap_page_prune. That's safe for us to ignore, since the reclaimed
- * space will continue to look like LP_UNUSED items below.
+ * heap_page_prune_and_freeze(). That's safe for us to ignore, since the
+ * reclaimed space will continue to look like LP_UNUSED items below.
*/
maxoff = PageGetMaxOffsetNumber(page);
- /* Initialize (or reset) page-level state */
+ /* Initialize pagefrz */
pagefrz.freeze_required = false;
pagefrz.FreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
pagefrz.FreezePageRelminMxid = vacrel->NewRelminMxid;
@@ -1446,7 +1444,7 @@ lazy_scan_prune(LVRelState *vacrel,
recently_dead_tuples = 0;
/*
- * Prune all HOT-update chains in this page.
+ * Prune all HOT-update chains and potentially freeze tuples on this page.
*
* We count the number of tuples removed from the page by the pruning step
* in presult.ndeleted. It should not be confused with lpdead_items;
@@ -1457,8 +1455,8 @@ lazy_scan_prune(LVRelState *vacrel,
* items LP_UNUSED, so mark_unused_now should be true if no indexes and
* false otherwise.
*/
- heap_page_prune(rel, buf, vacrel->vistest, vacrel->nindexes == 0,
- &pagefrz, &presult, &vacrel->offnum);
+ heap_page_prune_and_freeze(rel, buf, vacrel->vistest, vacrel->nindexes == 0,
+ &pagefrz, &presult, &vacrel->offnum);
/*
* Now scan the page to collect LP_DEAD items and check for tuples
@@ -1575,85 +1573,23 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->offnum = InvalidOffsetNumber;
- /*
- * Freeze the page when heap_prepare_freeze_tuple indicates that at least
- * one XID/MXID from before FreezeLimit/MultiXactCutoff is present. Also
- * freeze when pruning generated an FPI, if doing so means that we set the
- * page all-frozen afterwards (might not happen until final heap pass).
- */
- do_freeze = pagefrz.freeze_required ||
- (presult.all_visible_except_removable && presult.all_frozen &&
- presult.nfrozen > 0 &&
- fpi_before != pgWalUsage.wal_fpi);
+ Assert(MultiXactIdIsValid(presult.new_relminmxid));
+ vacrel->NewRelfrozenXid = presult.new_relfrozenxid;
+ Assert(TransactionIdIsValid(presult.new_relfrozenxid));
+ vacrel->NewRelminMxid = presult.new_relminmxid;
- if (do_freeze)
+ if (presult.nfrozen > 0)
{
- TransactionId snapshotConflictHorizon;
-
/*
- * We're freezing the page. Our final NewRelfrozenXid doesn't need to
- * be affected by the XIDs that are just about to be frozen anyway.
+ * We never increment the frozen_pages instrumentation counter when
+ * nfrozen == 0, since it only counts pages with newly frozen tuples
+ * (don't confuse that with pages newly set all-frozen in VM).
*/
- vacrel->NewRelfrozenXid = pagefrz.FreezePageRelfrozenXid;
- vacrel->NewRelminMxid = pagefrz.FreezePageRelminMxid;
-
vacrel->frozen_pages++;
- /*
- * We can use frz_conflict_horizon as our cutoff for conflicts when
- * the whole page is eligible to become all-frozen in the VM once
- * we're done with it. Otherwise we generate a conservative cutoff by
- * stepping back from OldestXmin.
- */
- if (presult.all_visible_except_removable && presult.all_frozen)
- snapshotConflictHorizon = presult.frz_conflict_horizon;
- else
- {
- /* Avoids false conflicts when hot_standby_feedback in use */
- snapshotConflictHorizon = pagefrz.cutoffs->OldestXmin;
- TransactionIdRetreat(snapshotConflictHorizon);
- }
-
/* Using same cutoff when setting VM is now unnecessary */
- if (presult.all_visible_except_removable && presult.all_frozen)
+ if (presult.all_frozen)
presult.frz_conflict_horizon = InvalidTransactionId;
-
- /* Execute all freeze plans for page as a single atomic action */
- heap_freeze_execute_prepared(vacrel->rel, buf,
- snapshotConflictHorizon,
- presult.frozen, presult.nfrozen);
- }
- else if (presult.all_frozen && presult.nfrozen == 0)
- {
- /* Page should be all visible except to-be-removed tuples */
- Assert(presult.all_visible_except_removable);
-
- /*
- * We have no freeze plans to execute, so there's no added cost from
- * following the freeze path. That's why it was chosen. This is
- * important in the case where the page only contains totally frozen
- * tuples at this point (perhaps only following pruning). Such pages
- * can be marked all-frozen in the VM by our caller, even though none
- * of its tuples were newly frozen here (note that the "no freeze"
- * path never sets pages all-frozen).
- *
- * We never increment the frozen_pages instrumentation counter here,
- * since it only counts pages with newly frozen tuples (don't confuse
- * that with pages newly set all-frozen in VM).
- */
- vacrel->NewRelfrozenXid = pagefrz.FreezePageRelfrozenXid;
- vacrel->NewRelminMxid = pagefrz.FreezePageRelminMxid;
- }
- else
- {
- /*
- * Page requires "no freeze" processing. It might be set all-visible
- * in the visibility map, but it can never be set all-frozen.
- */
- vacrel->NewRelfrozenXid = pagefrz.NoFreezePageRelfrozenXid;
- vacrel->NewRelminMxid = pagefrz.NoFreezePageRelminMxid;
- presult.all_frozen = false;
- presult.nfrozen = 0; /* avoid miscounts in instrumentation */
}
/*
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index b3cd248fb64..88a6d504dff 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -1715,9 +1715,9 @@ TransactionIdIsActive(TransactionId xid)
* Note: the approximate horizons (see definition of GlobalVisState) are
* updated by the computations done here. That's currently required for
* correctness and a small optimization. Without doing so it's possible that
- * heap vacuum's call to heap_page_prune() uses a more conservative horizon
- * than later when deciding which tuples can be removed - which the code
- * doesn't expect (breaking HOT).
+ * heap vacuum's call to heap_page_prune_and_freeze() uses a more conservative
+ * horizon than later when deciding which tuples can be removed - which the
+ * code doesn't expect (breaking HOT).
*/
static void
ComputeXidHorizons(ComputeXidHorizonsResult *h)
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 2339abfd28a..b2a4caeb33a 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -195,7 +195,7 @@ typedef struct HeapPageFreeze
/*
* Per-page state returned from pruning
*/
-typedef struct PruneResult
+typedef struct PruneFreezeResult
{
int ndeleted; /* Number of tuples deleted from the page */
int nnewlpdead; /* Number of newly LP_DEAD items */
@@ -210,9 +210,10 @@ typedef struct PruneResult
/*
* Tuple visibility is only computed once for each tuple, for correctness
- * and efficiency reasons; see comment in heap_page_prune() for details.
- * This is of type int8[], instead of HTSV_Result[], so we can use -1 to
- * indicate no visibility has been computed, e.g. for LP_DEAD items.
+ * and efficiency reasons; see comment in heap_page_prune_and_freeze() for
+ * details. This is of type int8[], instead of HTSV_Result[], so we can
+ * use -1 to indicate no visibility has been computed, e.g. for LP_DEAD
+ * items.
*
* This needs to be MaxHeapTuplesPerPage + 1 long as FirstOffsetNumber is
* 1. Otherwise every access would need to subtract 1.
@@ -220,17 +221,18 @@ typedef struct PruneResult
int8 htsv[MaxHeapTuplesPerPage + 1];
- /*
- * One entry for every tuple that we may freeze.
- */
- HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
-} PruneResult;
+ /* New value of relfrozenxid found by heap_page_prune_and_freeze() */
+ TransactionId new_relfrozenxid;
+
+ /* New value of relminmxid found by heap_page_prune_and_freeze() */
+ MultiXactId new_relminmxid;
+} PruneFreezeResult;
/*
* Pruning calculates tuple visibility once and saves the results in an array
- * of int8. See PruneResult.htsv for details. This helper function is meant to
- * guard against examining visibility status array members which have not yet
- * been computed.
+ * of int8. See PruneFreezeResult.htsv for details. This helper function is
+ * meant to guard against examining visibility status array members which have
+ * not yet been computed.
*/
static inline HTSV_Result
htsv_get_valid_status(int status)
@@ -306,6 +308,7 @@ extern TM_Result heap_lock_tuple(Relation relation, HeapTuple tuple,
Buffer *buffer, struct TM_FailureData *tmfd);
extern void heap_inplace_update(Relation relation, HeapTuple tuple);
+
extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
HeapPageFreeze *pagefrz,
HeapTupleFreeze *frz, bool *totally_frozen);
@@ -332,12 +335,12 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
/* in heap/pruneheap.c */
struct GlobalVisState;
extern void heap_page_prune_opt(Relation relation, Buffer buffer);
-extern void heap_page_prune(Relation relation, Buffer buffer,
- struct GlobalVisState *vistest,
- bool mark_unused_now,
- HeapPageFreeze *pagefrz,
- PruneResult *presult,
- OffsetNumber *off_loc);
+extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ struct GlobalVisState *vistest,
+ bool mark_unused_now,
+ HeapPageFreeze *pagefrz,
+ PruneFreezeResult *presult,
+ OffsetNumber *off_loc);
extern void heap_page_prune_execute(Buffer buffer,
OffsetNumber *redirected, int nredirected,
OffsetNumber *nowdead, int ndead,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 042d04c8de2..b2ddc1e2549 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2179,7 +2179,7 @@ ProjectionPath
PromptInterruptContext
ProtocolVersion
PrsStorage
-PruneResult
+PruneFreezeResult
PruneState
PruneStepResult
PsqlScanCallbacks
--
2.40.1
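Condensing the new tail of heap_page_prune_and_freeze() from the 0009
hunks above (a hand-edited summary of the "+" lines, not a verbatim
excerpt; the conflict-horizon adjustment inside the freeze branch and
most comments are elided):

    /* Freeze if required, or opportunistically if pruning emitted an FPI
     * and freezing would leave the page all-frozen */
    if (pagefrz)
        do_freeze = pagefrz->freeze_required ||
            (presult->all_visible_except_removable && presult->all_frozen &&
             presult->nfrozen > 0 &&
             fpi_before != pgWalUsage.wal_fpi);
    else
        do_freeze = false;

    if (do_freeze)
        heap_freeze_execute_prepared(relation, buffer,
                                     presult->frz_conflict_horizon,
                                     prstate.frozen, presult->nfrozen);
    else if (!pagefrz || !presult->all_frozen || presult->nfrozen > 0)
    {
        /* Neither freezing nor setting the page all-frozen in the VM */
        presult->all_frozen = false;
        presult->nfrozen = 0;
    }

    /* Caller won't update new_relfrozenxid and new_relminmxid */
    if (!pagefrz)
        return;

    if (presult->all_frozen || presult->nfrozen > 0)
    {
        presult->new_relfrozenxid = pagefrz->FreezePageRelfrozenXid;
        presult->new_relminmxid = pagefrz->FreezePageRelminMxid;
    }
    else
    {
        presult->new_relfrozenxid = pagefrz->NoFreezePageRelfrozenXid;
        presult->new_relminmxid = pagefrz->NoFreezePageRelminMxid;
    }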
Attachment: v4-0010-Make-opp-freeze-heuristic-compatible-with-prune-f.patch (text/x-diff; charset=us-ascii)
From dd6bdad1253dc6c2e62cfe8144d62456ee7be2e8 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sun, 7 Jan 2024 16:11:35 -0500
Subject: [PATCH v4 10/19] Make opp freeze heuristic compatible with
prune+freeze record
Once the prune and freeze records are combined, we will no longer be
able to use a test of whether or not pruning emitted an FPI to decide
whether or not to opportunistically freeze a freezable page.
While this heuristic should be improved, for now, approximate the
previous logic by keeping track of whether or not a hint bit FPI was
emitted during visibility checks (when checksums are on) and combining
that with a check of XLogCheckBufferNeedsBackup(). If we just finished
deciding whether or not to prune and the current buffer seems to need an
FPI after modification, it is likely that pruning would have emitted an
FPI.
---
src/backend/access/heap/pruneheap.c | 57 +++++++++++++++++++++--------
1 file changed, 42 insertions(+), 15 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 20907ba5408..9edf6bf72d7 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -246,6 +246,10 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
PruneState prstate;
HeapTupleData tup;
bool do_freeze;
+ bool do_prune;
+ bool whole_page_freezable;
+ bool hint_bit_fpi;
+ bool prune_fpi = false;
int64 fpi_before = pgWalUsage.wal_fpi;
/*
@@ -439,6 +443,13 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
}
}
+ /*
+ * If checksums are enabled, heap_prune_satisfies_vacuum() may have caused
+ * an FPI to be emitted. Then reset fpi_before for no prune case.
+ */
+ hint_bit_fpi = fpi_before != pgWalUsage.wal_fpi;
+ fpi_before = pgWalUsage.wal_fpi;
+
/*
* For vacuum, if the whole page will become frozen, we consider
* opportunistically freezing tuples. Dead tuples which will be removed by
@@ -483,11 +494,41 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (off_loc)
*off_loc = InvalidOffsetNumber;
+ do_prune = prstate.nredirected > 0 ||
+ prstate.ndead > 0 ||
+ prstate.nunused > 0;
+
+ /*
+ * Only incur overhead of checking if we will do an FPI if we might use
+ * the information.
+ */
+ if (do_prune && pagefrz)
+ prune_fpi = XLogCheckBufferNeedsBackup(buffer);
+
+ /* Is the whole page freezable? And is there something to freeze */
+ whole_page_freezable = presult->all_visible_except_removable &&
+ presult->all_frozen;
+
+ /*
+ * Freeze the page when heap_prepare_freeze_tuple indicates that at least
+ * one XID/MXID from before FreezeLimit/MultiXactCutoff is present. Also
+ * freeze when pruning generated an FPI, if doing so means that we set the
+ * page all-frozen afterwards (might not happen until final heap pass).
+ * XXX: Previously, we knew if pruning emitted an FPI by checking
+ * pgWalUsage.wal_fpi before and after pruning. Once the freeze and prune
+ * records are combined, this heuristic couldn't be used anymore. The
+ * opportunistic freeze heuristic must be improved; however, for now, try
+ * to approximate it.
+ */
+ do_freeze = pagefrz &&
+ (pagefrz->freeze_required ||
+ (whole_page_freezable && presult->nfrozen > 0 && (prune_fpi || hint_bit_fpi)));
+
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
/* Have we found any prunable items? */
- if (prstate.nredirected > 0 || prstate.ndead > 0 || prstate.nunused > 0)
+ if (do_prune)
{
/*
* Apply the planned item changes, then repair page fragmentation, and
@@ -579,20 +620,6 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/* Record number of newly-set-LP_DEAD items for caller */
presult->nnewlpdead = prstate.ndead;
- /*
- * Freeze the page when heap_prepare_freeze_tuple indicates that at least
- * one XID/MXID from before FreezeLimit/MultiXactCutoff is present. Also
- * freeze when pruning generated an FPI, if doing so means that we set the
- * page all-frozen afterwards (might not happen until final heap pass).
- */
- if (pagefrz)
- do_freeze = pagefrz->freeze_required ||
- (presult->all_visible_except_removable && presult->all_frozen &&
- presult->nfrozen > 0 &&
- fpi_before != pgWalUsage.wal_fpi);
- else
- do_freeze = false;
-
if (do_freeze)
{
/*
--
2.40.1
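For clarity, the approximated heuristic introduced by 0010 boils down to
the following (condensed by hand from the hunks above; not a verbatim
excerpt):

    /* Did setting hint bits during the visibility checks emit an FPI? */
    hint_bit_fpi = fpi_before != pgWalUsage.wal_fpi;

    do_prune = prstate.nredirected > 0 || prstate.ndead > 0 ||
               prstate.nunused > 0;

    /* Only pay for the check if the answer could matter */
    if (do_prune && pagefrz)
        prune_fpi = XLogCheckBufferNeedsBackup(buffer);

    whole_page_freezable = presult->all_visible_except_removable &&
                           presult->all_frozen;

    /* Freeze when required, or when an FPI was (or likely will be) emitted
     * anyway and the page would end up all-frozen */
    do_freeze = pagefrz &&
        (pagefrz->freeze_required ||
         (whole_page_freezable && presult->nfrozen > 0 &&
          (prune_fpi || hint_bit_fpi)));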
Attachment: v4-0011-Separate-tuple-pre-freeze-checks-and-invoke-earli.patch (text/x-diff; charset=us-ascii)
From 2f9a262a6b85e3bfc25f2c066634ccb9958529e3 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sun, 7 Jan 2024 16:53:45 -0500
Subject: [PATCH v4 11/19] Separate tuple pre freeze checks and invoke earlier
When combining the prune and freeze records, their critical sections will
have to be combined. heap_freeze_execute_prepared() does a set of
pre-freeze validations before starting its critical section. Move these
validations into a helper function, heap_pre_freeze_checks(), and invoke
it in heap_page_prune_and_freeze() before the pruning critical section.
Also move up the calculation of the freeze snapshot conflict horizon.
---
src/backend/access/heap/heapam.c | 58 ++++++++++++++++-------------
src/backend/access/heap/pruneheap.c | 31 ++++++++-------
src/include/access/heapam.h | 3 ++
3 files changed, 54 insertions(+), 38 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 7261c4988d7..16e3f2520a4 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6659,35 +6659,19 @@ heap_execute_freeze_tuple(HeapTupleHeader tuple, HeapTupleFreeze *frz)
}
/*
- * heap_freeze_execute_prepared
- *
- * Executes freezing of one or more heap tuples on a page on behalf of caller.
- * Caller passes an array of tuple plans from heap_prepare_freeze_tuple.
- * Caller must set 'offset' in each plan for us. Note that we destructively
- * sort caller's tuples array in-place, so caller had better be done with it.
- *
- * WAL-logs the changes so that VACUUM can advance the rel's relfrozenxid
- * later on without any risk of unsafe pg_xact lookups, even following a hard
- * crash (or when querying from a standby). We represent freezing by setting
- * infomask bits in tuple headers, but this shouldn't be thought of as a hint.
- * See section on buffer access rules in src/backend/storage/buffer/README.
- */
+* Perform xmin/xmax XID status sanity checks before calling
+* heap_freeze_execute_prepared().
+*
+* heap_prepare_freeze_tuple doesn't perform these checks directly because
+* pg_xact lookups are relatively expensive. They shouldn't be repeated
+* by successive VACUUMs that each decide against freezing the same page.
+*/
void
-heap_freeze_execute_prepared(Relation rel, Buffer buffer,
- TransactionId snapshotConflictHorizon,
- HeapTupleFreeze *tuples, int ntuples)
+heap_pre_freeze_checks(Buffer buffer,
+ HeapTupleFreeze *tuples, int ntuples)
{
Page page = BufferGetPage(buffer);
- Assert(ntuples > 0);
-
- /*
- * Perform xmin/xmax XID status sanity checks before critical section.
- *
- * heap_prepare_freeze_tuple doesn't perform these checks directly because
- * pg_xact lookups are relatively expensive. They shouldn't be repeated
- * by successive VACUUMs that each decide against freezing the same page.
- */
for (int i = 0; i < ntuples; i++)
{
HeapTupleFreeze *frz = tuples + i;
@@ -6726,6 +6710,30 @@ heap_freeze_execute_prepared(Relation rel, Buffer buffer,
xmax)));
}
}
+}
+
+/*
+ * heap_freeze_execute_prepared
+ *
+ * Executes freezing of one or more heap tuples on a page on behalf of caller.
+ * Caller passes an array of tuple plans from heap_prepare_freeze_tuple.
+ * Caller must set 'offset' in each plan for us. Note that we destructively
+ * sort caller's tuples array in-place, so caller had better be done with it.
+ *
+ * WAL-logs the changes so that VACUUM can advance the rel's relfrozenxid
+ * later on without any risk of unsafe pg_xact lookups, even following a hard
+ * crash (or when querying from a standby). We represent freezing by setting
+ * infomask bits in tuple headers, but this shouldn't be thought of as a hint.
+ * See section on buffer access rules in src/backend/storage/buffer/README.
+ */
+void
+heap_freeze_execute_prepared(Relation rel, Buffer buffer,
+ TransactionId snapshotConflictHorizon,
+ HeapTupleFreeze *tuples, int ntuples)
+{
+ Page page = BufferGetPage(buffer);
+
+ Assert(ntuples > 0);
START_CRIT_SECTION();
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 9edf6bf72d7..87f99497865 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -524,6 +524,24 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
(pagefrz->freeze_required ||
(whole_page_freezable && presult->nfrozen > 0 && (prune_fpi || hint_bit_fpi)));
+ if (do_freeze)
+ {
+ heap_pre_freeze_checks(buffer, prstate.frozen, presult->nfrozen);
+
+ /*
+ * We can use frz_conflict_horizon as our cutoff for conflicts when
+ * the whole page is eligible to become all-frozen in the VM once
+ * we're done with it. Otherwise we generate a conservative cutoff by
+ * stepping back from OldestXmin.
+ */
+ if (!(presult->all_visible_except_removable && presult->all_frozen))
+ {
+ /* Avoids false conflicts when hot_standby_feedback in use */
+ presult->frz_conflict_horizon = pagefrz->cutoffs->OldestXmin;
+ TransactionIdRetreat(presult->frz_conflict_horizon);
+ }
+ }
+
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
@@ -622,19 +640,6 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (do_freeze)
{
- /*
- * We can use frz_conflict_horizon as our cutoff for conflicts when
- * the whole page is eligible to become all-frozen in the VM once
- * we're done with it. Otherwise we generate a conservative cutoff by
- * stepping back from OldestXmin.
- */
- if (!(presult->all_visible_except_removable && presult->all_frozen))
- {
- /* Avoids false conflicts when hot_standby_feedback in use */
- presult->frz_conflict_horizon = pagefrz->cutoffs->OldestXmin;
- TransactionIdRetreat(presult->frz_conflict_horizon);
- }
-
/* Execute all freeze plans for page as a single atomic action */
heap_freeze_execute_prepared(relation, buffer,
presult->frz_conflict_horizon,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index b2a4caeb33a..02e33f213e1 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -312,6 +312,9 @@ extern void heap_inplace_update(Relation relation, HeapTuple tuple);
extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
HeapPageFreeze *pagefrz,
HeapTupleFreeze *frz, bool *totally_frozen);
+
+extern void heap_pre_freeze_checks(Buffer buffer,
+ HeapTupleFreeze *tuples, int ntuples);
extern void heap_freeze_execute_prepared(Relation rel, Buffer buffer,
TransactionId snapshotConflictHorizon,
HeapTupleFreeze *tuples, int ntuples);
--
2.40.1
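The resulting ordering in heap_page_prune_and_freeze() after 0011,
condensed from the hunks above (not a verbatim excerpt; the pruning
critical section itself is unchanged by this patch):

    if (do_freeze)
    {
        /* pg_xact sanity checks now happen outside any critical section */
        heap_pre_freeze_checks(buffer, prstate.frozen, presult->nfrozen);

        if (!(presult->all_visible_except_removable && presult->all_frozen))
        {
            /* Avoids false conflicts when hot_standby_feedback in use */
            presult->frz_conflict_horizon = pagefrz->cutoffs->OldestXmin;
            TransactionIdRetreat(presult->frz_conflict_horizon);
        }
    }

    START_CRIT_SECTION();
    /* ... apply pruning changes, update pd_prune_xid ... */
    END_CRIT_SECTION();

    if (do_freeze)
    {
        /* still WAL-logs and uses its own critical section; 0012 changes this */
        heap_freeze_execute_prepared(relation, buffer,
                                     presult->frz_conflict_horizon,
                                     prstate.frozen, presult->nfrozen);
    }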
Attachment: v4-0012-Remove-heap_freeze_execute_prepared.patch (text/x-diff; charset=us-ascii)
From 2a2e5407f8e6f8200dbda1b95cd3ec8d379282dd Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sun, 7 Jan 2024 17:03:17 -0500
Subject: [PATCH v4 12/19] Remove heap_freeze_execute_prepared()
In order to merge freeze and prune records, the execution of tuple
freezing and the WAL logging of the changes to the page must be
separated so that the WAL logging can be combined with prune WAL
logging. This commit makes a helper for the tuple freezing and then
inlines the contents of heap_freeze_execute_prepared() where it is
called in heap_page_prune_and_freeze().
---
src/backend/access/heap/heapam.c | 79 +++++------------------------
src/backend/access/heap/pruneheap.c | 51 +++++++++++++++++--
src/include/access/heapam.h | 31 ++++++-----
3 files changed, 77 insertions(+), 84 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 16e3f2520a4..e47b56e7856 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -91,9 +91,6 @@ static void compute_new_xmax_infomask(TransactionId xmax, uint16 old_infomask,
static TM_Result heap_lock_updated_tuple(Relation rel, HeapTuple tuple,
ItemPointer ctid, TransactionId xid,
LockTupleMode mode);
-static int heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
- xl_heap_freeze_plan *plans_out,
- OffsetNumber *offsets_out);
static void GetMultiXactIdHintBits(MultiXactId multi, uint16 *new_infomask,
uint16 *new_infomask2);
static TransactionId MultiXactIdGetUpdateXid(TransactionId xmax,
@@ -6343,9 +6340,9 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
* XIDs or MultiXactIds that will need to be processed by a future VACUUM.
*
* VACUUM caller must assemble HeapTupleFreeze freeze plan entries for every
- * tuple that we returned true for, and call heap_freeze_execute_prepared to
- * execute freezing. Caller must initialize pagefrz fields for page as a
- * whole before first call here for each heap page.
+ * tuple that we returned true for, and then execute freezing. Caller must
+ * initialize pagefrz fields for page as a whole before first call here for
+ * each heap page.
*
* VACUUM caller decides on whether or not to freeze the page as a whole.
* We'll often prepare freeze plans for a page that caller just discards.
@@ -6659,8 +6656,8 @@ heap_execute_freeze_tuple(HeapTupleHeader tuple, HeapTupleFreeze *frz)
}
/*
-* Perform xmin/xmax XID status sanity checks before calling
-* heap_freeze_execute_prepared().
+* Perform xmin/xmax XID status sanity checks before actually executing freeze
+* plans.
*
* heap_prepare_freeze_tuple doesn't perform these checks directly because
* pg_xact lookups are relatively expensive. They shouldn't be repeated
@@ -6713,30 +6710,17 @@ heap_pre_freeze_checks(Buffer buffer,
}
/*
- * heap_freeze_execute_prepared
- *
- * Executes freezing of one or more heap tuples on a page on behalf of caller.
- * Caller passes an array of tuple plans from heap_prepare_freeze_tuple.
- * Caller must set 'offset' in each plan for us. Note that we destructively
- * sort caller's tuples array in-place, so caller had better be done with it.
- *
- * WAL-logs the changes so that VACUUM can advance the rel's relfrozenxid
- * later on without any risk of unsafe pg_xact lookups, even following a hard
- * crash (or when querying from a standby). We represent freezing by setting
- * infomask bits in tuple headers, but this shouldn't be thought of as a hint.
- * See section on buffer access rules in src/backend/storage/buffer/README.
+ * Helper which executes freezing of one or more heap tuples on a page on
+ * behalf of caller. Caller passes an array of tuple plans from
+ * heap_prepare_freeze_tuple. Caller must set 'offset' in each plan for us.
+ * Must be called in a critical section that also marks the buffer dirty and,
+ * if needed, emits WAL.
*/
void
-heap_freeze_execute_prepared(Relation rel, Buffer buffer,
- TransactionId snapshotConflictHorizon,
- HeapTupleFreeze *tuples, int ntuples)
+heap_freeze_prepared_tuples(Buffer buffer, HeapTupleFreeze *tuples, int ntuples)
{
Page page = BufferGetPage(buffer);
- Assert(ntuples > 0);
-
- START_CRIT_SECTION();
-
for (int i = 0; i < ntuples; i++)
{
HeapTupleFreeze *frz = tuples + i;
@@ -6746,45 +6730,6 @@ heap_freeze_execute_prepared(Relation rel, Buffer buffer,
htup = (HeapTupleHeader) PageGetItem(page, itemid);
heap_execute_freeze_tuple(htup, frz);
}
-
- MarkBufferDirty(buffer);
-
- /* Now WAL-log freezing if necessary */
- if (RelationNeedsWAL(rel))
- {
- xl_heap_freeze_plan plans[MaxHeapTuplesPerPage];
- OffsetNumber offsets[MaxHeapTuplesPerPage];
- int nplans;
- xl_heap_freeze_page xlrec;
- XLogRecPtr recptr;
-
- /* Prepare deduplicated representation for use in WAL record */
- nplans = heap_log_freeze_plan(tuples, ntuples, plans, offsets);
-
- xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
- xlrec.isCatalogRel = RelationIsAccessibleInLogicalDecoding(rel);
- xlrec.nplans = nplans;
-
- XLogBeginInsert();
- XLogRegisterData((char *) &xlrec, SizeOfHeapFreezePage);
-
- /*
- * The freeze plan array and offset array are not actually in the
- * buffer, but pretend that they are. When XLogInsert stores the
- * whole buffer, the arrays need not be stored too.
- */
- XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
- XLogRegisterBufData(0, (char *) plans,
- nplans * sizeof(xl_heap_freeze_plan));
- XLogRegisterBufData(0, (char *) offsets,
- ntuples * sizeof(OffsetNumber));
-
- recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_FREEZE_PAGE);
-
- PageSetLSN(page, recptr);
- }
-
- END_CRIT_SECTION();
}
/*
@@ -6874,7 +6819,7 @@ heap_log_freeze_new_plan(xl_heap_freeze_plan *plan, HeapTupleFreeze *frz)
* (actually there is one array per freeze plan, but that's not of immediate
* concern to our caller).
*/
-static int
+int
heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
xl_heap_freeze_plan *plans_out,
OffsetNumber *offsets_out)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 87f99497865..7bd479cfd4e 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -640,10 +640,53 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (do_freeze)
{
- /* Execute all freeze plans for page as a single atomic action */
- heap_freeze_execute_prepared(relation, buffer,
- presult->frz_conflict_horizon,
- prstate.frozen, presult->nfrozen);
+ START_CRIT_SECTION();
+
+ Assert(presult->nfrozen > 0);
+
+ heap_freeze_prepared_tuples(buffer, prstate.frozen, presult->nfrozen);
+
+ MarkBufferDirty(buffer);
+
+ /* Now WAL-log freezing if necessary */
+ if (RelationNeedsWAL(relation))
+ {
+ xl_heap_freeze_plan plans[MaxHeapTuplesPerPage];
+ OffsetNumber offsets[MaxHeapTuplesPerPage];
+ int nplans;
+ xl_heap_freeze_page xlrec;
+ XLogRecPtr recptr;
+
+ /*
+ * Prepare deduplicated representation for use in WAL record
+ * Destructively sorts tuples array in-place.
+ */
+ nplans = heap_log_freeze_plan(prstate.frozen, presult->nfrozen, plans, offsets);
+
+ xlrec.snapshotConflictHorizon = presult->frz_conflict_horizon;
+ xlrec.isCatalogRel = RelationIsAccessibleInLogicalDecoding(relation);
+ xlrec.nplans = nplans;
+
+ XLogBeginInsert();
+ XLogRegisterData((char *) &xlrec, SizeOfHeapFreezePage);
+
+ /*
+ * The freeze plan array and offset array are not actually in the
+ * buffer, but pretend that they are. When XLogInsert stores the
+ * whole buffer, the arrays need not be stored too.
+ */
+ XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
+ XLogRegisterBufData(0, (char *) plans,
+ nplans * sizeof(xl_heap_freeze_plan));
+ XLogRegisterBufData(0, (char *) offsets,
+ presult->nfrozen * sizeof(OffsetNumber));
+
+ recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_FREEZE_PAGE);
+
+ PageSetLSN(page, recptr);
+ }
+
+ END_CRIT_SECTION();
}
else if (!pagefrz || !presult->all_frozen || presult->nfrozen > 0)
{
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 02e33f213e1..321a46185e1 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -14,6 +14,7 @@
#ifndef HEAPAM_H
#define HEAPAM_H
+#include "access/heapam_xlog.h"
#include "access/relation.h" /* for backward compatibility */
#include "access/relscan.h"
#include "access/sdir.h"
@@ -101,8 +102,8 @@ typedef enum
} HTSV_Result;
/*
- * heap_prepare_freeze_tuple may request that heap_freeze_execute_prepared
- * check any tuple's to-be-frozen xmin and/or xmax status using pg_xact
+ * heap_prepare_freeze_tuple may request that any tuple's to-be-frozen xmin
+ * and/or xmax status is checked using pg_xact during freezing execution.
*/
#define HEAP_FREEZE_CHECK_XMIN_COMMITTED 0x01
#define HEAP_FREEZE_CHECK_XMAX_ABORTED 0x02
@@ -154,14 +155,14 @@ typedef struct HeapPageFreeze
/*
* "Freeze" NewRelfrozenXid/NewRelminMxid trackers.
*
- * Trackers used when heap_freeze_execute_prepared freezes, or when there
- * are zero freeze plans for a page. It is always valid for vacuumlazy.c
- * to freeze any page, by definition. This even includes pages that have
- * no tuples with storage to consider in the first place. That way the
- * 'totally_frozen' results from heap_prepare_freeze_tuple can always be
- * used in the same way, even when no freeze plans need to be executed to
- * "freeze the page". Only the "freeze" path needs to consider the need
- * to set pages all-frozen in the visibility map under this scheme.
+ * Trackers used when tuples will be frozen, or when there are zero freeze
+ * plans for a page. It is always valid for vacuumlazy.c to freeze any
+ * page, by definition. This even includes pages that have no tuples with
+ * storage to consider in the first place. That way the 'totally_frozen'
+ * results from heap_prepare_freeze_tuple can always be used in the same
+ * way, even when no freeze plans need to be executed to "freeze the
+ * page". Only the "freeze" path needs to consider the need to set pages
+ * all-frozen in the visibility map under this scheme.
*
* When we freeze a page, we generally freeze all XIDs < OldestXmin, only
* leaving behind XIDs that are ineligible for freezing, if any. And so
@@ -315,12 +316,16 @@ extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
extern void heap_pre_freeze_checks(Buffer buffer,
HeapTupleFreeze *tuples, int ntuples);
-extern void heap_freeze_execute_prepared(Relation rel, Buffer buffer,
- TransactionId snapshotConflictHorizon,
- HeapTupleFreeze *tuples, int ntuples);
+
+extern void heap_freeze_prepared_tuples(Buffer buffer,
+ HeapTupleFreeze *tuples, int ntuples);
extern bool heap_freeze_tuple(HeapTupleHeader tuple,
TransactionId relfrozenxid, TransactionId relminmxid,
TransactionId FreezeLimit, TransactionId MultiXactCutoff);
+
+extern int heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
+ xl_heap_freeze_plan *plans_out,
+ OffsetNumber *offsets_out);
extern bool heap_tuple_should_freeze(HeapTupleHeader tuple,
const struct VacuumCutoffs *cutoffs,
TransactionId *NoFreezePageRelfrozenXid,
--
2.40.1
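After 0012, freeze execution and its WAL logging sit in the caller's
critical section in pruneheap.c. Condensed from the hunk above (local
array and variable declarations elided; not a verbatim excerpt):

    START_CRIT_SECTION();

    Assert(presult->nfrozen > 0);

    /* New helper: applies the freeze plans but does no WAL logging itself */
    heap_freeze_prepared_tuples(buffer, prstate.frozen, presult->nfrozen);

    MarkBufferDirty(buffer);

    if (RelationNeedsWAL(relation))
    {
        /* Prepare deduplicated representation; destructively sorts the array */
        nplans = heap_log_freeze_plan(prstate.frozen, presult->nfrozen,
                                      plans, offsets);

        xlrec.snapshotConflictHorizon = presult->frz_conflict_horizon;
        xlrec.isCatalogRel = RelationIsAccessibleInLogicalDecoding(relation);
        xlrec.nplans = nplans;

        XLogBeginInsert();
        XLogRegisterData((char *) &xlrec, SizeOfHeapFreezePage);
        XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
        XLogRegisterBufData(0, (char *) plans,
                            nplans * sizeof(xl_heap_freeze_plan));
        XLogRegisterBufData(0, (char *) offsets,
                            presult->nfrozen * sizeof(OffsetNumber));

        recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_FREEZE_PAGE);
        PageSetLSN(page, recptr);
    }

    END_CRIT_SECTION();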
Attachment: v4-0013-Merge-prune-and-freeze-records.patch (text/x-diff; charset=us-ascii)
From b13f4d1d5fb4e8fcb3f97fe1f0043fdfaf319b4c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sun, 7 Jan 2024 17:55:31 -0500
Subject: [PATCH v4 13/19] Merge prune and freeze records
Eliminate the xl_heap_freeze_page struct and the XLOG_HEAP2_FREEZE_PAGE
record. When vacuum
freezes tuples, the information needed to replay those changes is
now recorded in the xl_heap_prune record.
When both pruning and freezing are done, a single, combined
WAL record is emitted for both operations. This will reduce the number
of WAL records emitted.
When the record contains only freeze plans and no pruning changes, we can
avoid taking a full cleanup lock when replaying it.
The XLOG_HEAP2_PRUNE record is now bigger than it was previously and
bigger than the old XLOG_HEAP2_FREEZE_PAGE record. A future commit will
streamline the record.
---
src/backend/access/heap/heapam.c | 146 ++++------
src/backend/access/heap/pruneheap.c | 326 ++++++++++++-----------
src/backend/access/rmgrdesc/heapdesc.c | 95 ++++---
src/backend/replication/logical/decode.c | 1 -
src/include/access/heapam_xlog.h | 97 ++++---
5 files changed, 318 insertions(+), 347 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index e47b56e7856..532868039d5 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -8706,8 +8706,6 @@ ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_required,
/*
* Handles XLOG_HEAP2_PRUNE record type.
- *
- * Acquires a full cleanup lock.
*/
static void
heap_xlog_prune(XLogReaderState *record)
@@ -8718,12 +8716,22 @@ heap_xlog_prune(XLogReaderState *record)
RelFileLocator rlocator;
BlockNumber blkno;
XLogRedoAction action;
+ bool get_cleanup_lock;
XLogRecGetBlockTag(record, 0, &rlocator, NULL, &blkno);
/*
- * We're about to remove tuples. In Hot Standby mode, ensure that there's
- * no queries running for which the removed tuples are still visible.
+ * If there are dead, redirected, or unused items set unused by
+ * heap_page_prune_and_freeze(), heap_page_prune_execute() will call
+ * PageRepairFragmentation() which expects a full cleanup lock.
+ */
+ get_cleanup_lock = xlrec->nredirected > 0 ||
+ xlrec->ndead > 0 || xlrec->nunused > 0;
+
+ /*
+ * We are either about to remove tuples or freeze them. In Hot Standby
+ * mode, ensure that there's no queries running for which any removed
+ * tuples are still visible or which consider the frozen xids as running.
*/
if (InHotStandby)
ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
@@ -8731,38 +8739,69 @@ heap_xlog_prune(XLogReaderState *record)
rlocator);
/*
- * If we have a full-page image, restore it (using a cleanup lock) and
- * we're done.
+ * If we have a full-page image, restore it and we're done.
*/
- action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL, true,
- &buffer);
+ action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
+ get_cleanup_lock, &buffer);
+
if (action == BLK_NEEDS_REDO)
{
Page page = (Page) BufferGetPage(buffer);
- OffsetNumber *end;
OffsetNumber *redirected;
OffsetNumber *nowdead;
OffsetNumber *nowunused;
int nredirected;
int ndead;
int nunused;
+ int nplans;
Size datalen;
+ xl_heap_freeze_plan *plans;
+ OffsetNumber *frz_offsets;
+ int curoff = 0;
- redirected = (OffsetNumber *) XLogRecGetBlockData(record, 0, &datalen);
-
+ nplans = xlrec->nplans;
nredirected = xlrec->nredirected;
ndead = xlrec->ndead;
- end = (OffsetNumber *) ((char *) redirected + datalen);
+ nunused = xlrec->nunused;
+
+ plans = (xl_heap_freeze_plan *) XLogRecGetBlockData(record, 0, &datalen);
+ redirected = (OffsetNumber *) &plans[nplans];
nowdead = redirected + (nredirected * 2);
nowunused = nowdead + ndead;
- nunused = (end - nowunused);
- Assert(nunused >= 0);
+ frz_offsets = nowunused + nunused;
/* Update all line pointers per the record, and repair fragmentation */
- heap_page_prune_execute(buffer,
- redirected, nredirected,
- nowdead, ndead,
- nowunused, nunused);
+ if (nredirected > 0 || ndead > 0 || nunused > 0)
+ heap_page_prune_execute(buffer,
+ redirected, nredirected,
+ nowdead, ndead,
+ nowunused, nunused);
+
+ for (int p = 0; p < nplans; p++)
+ {
+ HeapTupleFreeze frz;
+
+ /*
+ * Convert freeze plan representation from WAL record into
+ * per-tuple format used by heap_execute_freeze_tuple
+ */
+ frz.xmax = plans[p].xmax;
+ frz.t_infomask2 = plans[p].t_infomask2;
+ frz.t_infomask = plans[p].t_infomask;
+ frz.frzflags = plans[p].frzflags;
+ frz.offset = InvalidOffsetNumber; /* unused, but be tidy */
+
+ for (int i = 0; i < plans[p].ntuples; i++)
+ {
+ OffsetNumber offset = frz_offsets[curoff++];
+ ItemId lp;
+ HeapTupleHeader tuple;
+
+ lp = PageGetItemId(page, offset);
+ tuple = (HeapTupleHeader) PageGetItem(page, lp);
+ heap_execute_freeze_tuple(tuple, &frz);
+ }
+ }
/*
* Note: we don't worry about updating the page's prunability hints.
@@ -9001,74 +9040,6 @@ heap_xlog_visible(XLogReaderState *record)
UnlockReleaseBuffer(vmbuffer);
}
-/*
- * Replay XLOG_HEAP2_FREEZE_PAGE records
- */
-static void
-heap_xlog_freeze_page(XLogReaderState *record)
-{
- XLogRecPtr lsn = record->EndRecPtr;
- xl_heap_freeze_page *xlrec = (xl_heap_freeze_page *) XLogRecGetData(record);
- Buffer buffer;
-
- /*
- * In Hot Standby mode, ensure that there's no queries running which still
- * consider the frozen xids as running.
- */
- if (InHotStandby)
- {
- RelFileLocator rlocator;
-
- XLogRecGetBlockTag(record, 0, &rlocator, NULL, NULL);
- ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
- xlrec->isCatalogRel,
- rlocator);
- }
-
- if (XLogReadBufferForRedo(record, 0, &buffer) == BLK_NEEDS_REDO)
- {
- Page page = BufferGetPage(buffer);
- xl_heap_freeze_plan *plans;
- OffsetNumber *offsets;
- int curoff = 0;
-
- plans = (xl_heap_freeze_plan *) XLogRecGetBlockData(record, 0, NULL);
- offsets = (OffsetNumber *) ((char *) plans +
- (xlrec->nplans *
- sizeof(xl_heap_freeze_plan)));
- for (int p = 0; p < xlrec->nplans; p++)
- {
- HeapTupleFreeze frz;
-
- /*
- * Convert freeze plan representation from WAL record into
- * per-tuple format used by heap_execute_freeze_tuple
- */
- frz.xmax = plans[p].xmax;
- frz.t_infomask2 = plans[p].t_infomask2;
- frz.t_infomask = plans[p].t_infomask;
- frz.frzflags = plans[p].frzflags;
- frz.offset = InvalidOffsetNumber; /* unused, but be tidy */
-
- for (int i = 0; i < plans[p].ntuples; i++)
- {
- OffsetNumber offset = offsets[curoff++];
- ItemId lp;
- HeapTupleHeader tuple;
-
- lp = PageGetItemId(page, offset);
- tuple = (HeapTupleHeader) PageGetItem(page, lp);
- heap_execute_freeze_tuple(tuple, &frz);
- }
- }
-
- PageSetLSN(page, lsn);
- MarkBufferDirty(buffer);
- }
- if (BufferIsValid(buffer))
- UnlockReleaseBuffer(buffer);
-}
-
/*
* Given an "infobits" field from an XLog record, set the correct bits in the
* given infomask and infomask2 for the tuple touched by the record.
@@ -9975,9 +9946,6 @@ heap2_redo(XLogReaderState *record)
case XLOG_HEAP2_VACUUM:
heap_xlog_vacuum(record);
break;
- case XLOG_HEAP2_FREEZE_PAGE:
- heap_xlog_freeze_page(record);
- break;
case XLOG_HEAP2_VISIBLE:
heap_xlog_visible(record);
break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 7bd479cfd4e..19b50931b90 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -79,6 +79,9 @@ static void heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber o
static void heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum);
static void page_verify_redirects(Page page);
+static void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+ PruneState *prstate, PruneFreezeResult *presult);
+
/*
* Optionally prune and repair fragmentation in the specified page.
@@ -247,9 +250,9 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
HeapTupleData tup;
bool do_freeze;
bool do_prune;
+ bool do_hint;
bool whole_page_freezable;
bool hint_bit_fpi;
- bool prune_fpi = false;
int64 fpi_before = pgWalUsage.wal_fpi;
/*
@@ -445,10 +448,9 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/*
* If checksums are enabled, heap_prune_satisfies_vacuum() may have caused
- * an FPI to be emitted. Then reset fpi_before for no prune case.
+ * an FPI to be emitted.
*/
hint_bit_fpi = fpi_before != pgWalUsage.wal_fpi;
- fpi_before = pgWalUsage.wal_fpi;
/*
* For vacuum, if the whole page will become frozen, we consider
@@ -498,14 +500,18 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.ndead > 0 ||
prstate.nunused > 0;
+ /* Record number of newly-set-LP_DEAD items for caller */
+ presult->nnewlpdead = prstate.ndead;
+
/*
- * Only incur overhead of checking if we will do an FPI if we might use
- * the information.
+ * Even if we don't prune anything, if we found a new value for the
+ * pd_prune_xid field or the page was marked full, we will update the hint
+ * bit.
*/
- if (do_prune && pagefrz)
- prune_fpi = XLogCheckBufferNeedsBackup(buffer);
+ do_hint = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
+ PageIsFull(page);
- /* Is the whole page freezable? And is there something to freeze */
+ /* Is the whole page freezable? And is there something to freeze? */
whole_page_freezable = presult->all_visible_except_removable &&
presult->all_frozen;
@@ -520,43 +526,51 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* opportunistic freeze heuristic must be improved; however, for now, try
* to approximate it.
*/
- do_freeze = pagefrz &&
- (pagefrz->freeze_required ||
- (whole_page_freezable && presult->nfrozen > 0 && (prune_fpi || hint_bit_fpi)));
+ do_freeze = false;
+ if (pagefrz)
+ {
+ if (pagefrz->freeze_required)
+ do_freeze = true;
+ else if (whole_page_freezable && presult->nfrozen > 0)
+ {
+ /*
+ * Freezing would make the page all-frozen. In this case, we will
+ * freeze if we have already emitted an FPI or will do so anyway.
+ * Be sure only to incur the overhead of checking if we will do an
+ * FPI if we may use that information.
+ */
+ if (hint_bit_fpi ||
+ ((do_prune || do_hint) && XLogCheckBufferNeedsBackup(buffer)))
+ {
+ do_freeze = true;
+ }
+ }
+ }
+ /*
+ * Validate the tuples we are considering freezing. We do this even if
+ * pruning and hint bit setting have not emitted an FPI so far because we
+ * still may emit an FPI while setting the page hint bit later. But we
+ * want to avoid doing the pre-freeze checks in a critical section.
+ */
if (do_freeze)
- {
heap_pre_freeze_checks(buffer, prstate.frozen, presult->nfrozen);
+ if (!do_freeze && (!pagefrz || !presult->all_frozen || presult->nfrozen > 0))
+ {
/*
- * We can use frz_conflict_horizon as our cutoff for conflicts when
- * the whole page is eligible to become all-frozen in the VM once
- * we're done with it. Otherwise we generate a conservative cutoff by
- * stepping back from OldestXmin.
+ * If we will neither freeze tuples on the page nor set the page all
+ * frozen in the visibility map, the page is not all-frozen and there
+ * will be no newly frozen tuples.
*/
- if (!(presult->all_visible_except_removable && presult->all_frozen))
- {
- /* Avoids false conflicts when hot_standby_feedback in use */
- presult->frz_conflict_horizon = pagefrz->cutoffs->OldestXmin;
- TransactionIdRetreat(presult->frz_conflict_horizon);
- }
+ presult->all_frozen = false;
+ presult->nfrozen = 0; /* avoid miscounts in instrumentation */
}
- /* Any error while applying the changes is critical */
START_CRIT_SECTION();
- /* Have we found any prunable items? */
- if (do_prune)
+ if (do_hint)
{
- /*
- * Apply the planned item changes, then repair page fragmentation, and
- * update the page's hint bit about whether it has free line pointers.
- */
- heap_page_prune_execute(buffer,
- prstate.redirected, prstate.nredirected,
- prstate.nowdead, prstate.ndead,
- prstate.nowunused, prstate.nunused);
-
/*
* Update the page's pd_prune_xid field to either zero, or the lowest
* XID of any soon-prunable tuple.
@@ -564,163 +578,159 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
((PageHeader) page)->pd_prune_xid = prstate.new_prune_xid;
/*
- * Also clear the "page is full" flag, since there's no point in
- * repeating the prune/defrag process until something else happens to
- * the page.
+ * Clear the "page is full" flag if it is set since there's no point
+ * in repeating the prune/defrag process until something else happens
+ * to the page.
*/
PageClearFull(page);
- MarkBufferDirty(buffer);
+ /*
+ * We only needed to update pd_prune_xid and clear the page-is-full
+ * hint bit; this is a non-WAL-logged hint. If we will also freeze or
+ * prune the page, we will mark the buffer dirty below.
+ */
+ if (!do_freeze && !do_prune)
+ MarkBufferDirtyHint(buffer, true);
+ }
+ if (do_prune || do_freeze)
+ {
/*
- * Emit a WAL XLOG_HEAP2_PRUNE record showing what we did
+ * Apply the planned item changes, then repair page fragmentation, and
+ * update the page's hint bit about whether it has free line pointers.
*/
- if (RelationNeedsWAL(relation))
+ if (do_prune)
{
- xl_heap_prune xlrec;
- XLogRecPtr recptr;
-
- xlrec.isCatalogRel = RelationIsAccessibleInLogicalDecoding(relation);
- xlrec.snapshotConflictHorizon = prstate.snapshotConflictHorizon;
- xlrec.nredirected = prstate.nredirected;
- xlrec.ndead = prstate.ndead;
-
- XLogBeginInsert();
- XLogRegisterData((char *) &xlrec, SizeOfHeapPrune);
-
- XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
+ heap_page_prune_execute(buffer,
+ prstate.redirected, prstate.nredirected,
+ prstate.nowdead, prstate.ndead,
+ prstate.nowunused, prstate.nunused);
+ }
+ if (do_freeze)
+ {
/*
- * The OffsetNumber arrays are not actually in the buffer, but we
- * pretend that they are. When XLogInsert stores the whole
- * buffer, the offset arrays need not be stored too.
+ * We can use frz_conflict_horizon as our cutoff for conflicts
+ * when the whole page is eligible to become all-frozen in the VM
+ * once we're done with it. Otherwise we generate a conservative
+ * cutoff by stepping back from OldestXmin. This avoids false
+ * conflicts when hot_standby_feedback is in use.
*/
- if (prstate.nredirected > 0)
- XLogRegisterBufData(0, (char *) prstate.redirected,
- prstate.nredirected *
- sizeof(OffsetNumber) * 2);
-
- if (prstate.ndead > 0)
- XLogRegisterBufData(0, (char *) prstate.nowdead,
- prstate.ndead * sizeof(OffsetNumber));
-
- if (prstate.nunused > 0)
- XLogRegisterBufData(0, (char *) prstate.nowunused,
- prstate.nunused * sizeof(OffsetNumber));
+ if (!(presult->all_visible_except_removable && presult->all_frozen))
+ {
+ presult->frz_conflict_horizon = pagefrz->cutoffs->OldestXmin;
+ TransactionIdRetreat(presult->frz_conflict_horizon);
+ }
+ heap_freeze_prepared_tuples(buffer, prstate.frozen, presult->nfrozen);
+ }
- recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_PRUNE);
+ MarkBufferDirty(buffer);
- PageSetLSN(BufferGetPage(buffer), recptr);
- }
- }
- else
- {
/*
- * If we didn't prune anything, but have found a new value for the
- * pd_prune_xid field, update it and mark the buffer dirty. This is
- * treated as a non-WAL-logged hint.
- *
- * Also clear the "page is full" flag if it is set, since there's no
- * point in repeating the prune/defrag process until something else
- * happens to the page.
+ * Emit a WAL XLOG_HEAP2_PRUNE record showing what we did
*/
- if (((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
- PageIsFull(page))
- {
- ((PageHeader) page)->pd_prune_xid = prstate.new_prune_xid;
- PageClearFull(page);
- MarkBufferDirtyHint(buffer, true);
- }
+ if (RelationNeedsWAL(relation))
+ log_heap_prune_and_freeze(relation, buffer, &prstate, presult);
}
END_CRIT_SECTION();
- /* Record number of newly-set-LP_DEAD items for caller */
- presult->nnewlpdead = prstate.ndead;
-
- if (do_freeze)
+ /*
+ * If we froze tuples on the page, the caller can advance relfrozenxid and
+ * relminmxid to the values in pagefrz->FreezePageRelfrozenXid and
+ * pagefrz->FreezePageRelminMxid. Otherwise, it is only safe to advance to
+ * the values in pagefrz->NoFreezePage[RelfrozenXid|RelminMxid]
+ */
+ if (pagefrz)
{
- START_CRIT_SECTION();
+ if (presult->nfrozen > 0)
+ {
+ presult->new_relfrozenxid = pagefrz->FreezePageRelfrozenXid;
+ presult->new_relminmxid = pagefrz->FreezePageRelminMxid;
+ }
+ else
+ {
+ presult->new_relfrozenxid = pagefrz->NoFreezePageRelfrozenXid;
+ presult->new_relminmxid = pagefrz->NoFreezePageRelminMxid;
+ }
+ }
+}
- Assert(presult->nfrozen > 0);
- heap_freeze_prepared_tuples(buffer, prstate.frozen, presult->nfrozen);
+static void
+log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+ PruneState *prstate, PruneFreezeResult *presult)
+{
+ xl_heap_prune xlrec;
+ XLogRecPtr recptr;
- MarkBufferDirty(buffer);
+ xl_heap_freeze_plan plans[MaxHeapTuplesPerPage];
+ OffsetNumber offsets[MaxHeapTuplesPerPage];
+ bool do_freeze = presult->nfrozen > 0;
- /* Now WAL-log freezing if necessary */
- if (RelationNeedsWAL(relation))
- {
- xl_heap_freeze_plan plans[MaxHeapTuplesPerPage];
- OffsetNumber offsets[MaxHeapTuplesPerPage];
- int nplans;
- xl_heap_freeze_page xlrec;
- XLogRecPtr recptr;
+ xlrec.isCatalogRel = RelationIsAccessibleInLogicalDecoding(relation);
+ xlrec.nredirected = prstate->nredirected;
+ xlrec.ndead = prstate->ndead;
+ xlrec.nunused = prstate->nunused;
+ xlrec.nplans = 0;
- /*
- * Prepare deduplicated representation for use in WAL record
- * Destructively sorts tuples array in-place.
- */
- nplans = heap_log_freeze_plan(prstate.frozen, presult->nfrozen, plans, offsets);
+ /*
+ * The snapshotConflictHorizon for the whole record should be the most
+ * conservative of all the horizons calculated for any of the possible
+ * modifications. If this record will prune tuples, any transactions on
+ * the standby older than the newest xmax among the tuples this record
+ * removes will conflict. If this record will freeze tuples, any
+ * transactions on the standby older than the newest xmin among the
+ * tuples this record freezes will conflict.
+ */
+ if (do_freeze)
+ xlrec.snapshotConflictHorizon = Max(prstate->snapshotConflictHorizon,
+ presult->frz_conflict_horizon);
+ else
+ xlrec.snapshotConflictHorizon = prstate->snapshotConflictHorizon;
- xlrec.snapshotConflictHorizon = presult->frz_conflict_horizon;
- xlrec.isCatalogRel = RelationIsAccessibleInLogicalDecoding(relation);
- xlrec.nplans = nplans;
+ /*
+ * Prepare deduplicated representation for use in WAL record. Destructively
+ * sorts the tuples array in-place.
+ */
+ if (do_freeze)
+ xlrec.nplans = heap_log_freeze_plan(prstate->frozen,
+ presult->nfrozen, plans, offsets);
- XLogBeginInsert();
- XLogRegisterData((char *) &xlrec, SizeOfHeapFreezePage);
+ XLogBeginInsert();
+ XLogRegisterData((char *) &xlrec, SizeOfHeapPrune);
- /*
- * The freeze plan array and offset array are not actually in the
- * buffer, but pretend that they are. When XLogInsert stores the
- * whole buffer, the arrays need not be stored too.
- */
- XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
- XLogRegisterBufData(0, (char *) plans,
- nplans * sizeof(xl_heap_freeze_plan));
- XLogRegisterBufData(0, (char *) offsets,
- presult->nfrozen * sizeof(OffsetNumber));
+ XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
- recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_FREEZE_PAGE);
+ /*
+ * The OffsetNumber arrays are not actually in the buffer, but we pretend
+ * that they are. When XLogInsert stores the whole buffer, the offset
+ * arrays need not be stored too.
+ */
+ if (xlrec.nplans > 0)
+ XLogRegisterBufData(0, (char *) plans,
+ xlrec.nplans * sizeof(xl_heap_freeze_plan));
- PageSetLSN(page, recptr);
- }
+ if (prstate->nredirected > 0)
+ XLogRegisterBufData(0, (char *) prstate->redirected,
+ prstate->nredirected *
+ sizeof(OffsetNumber) * 2);
- END_CRIT_SECTION();
- }
- else if (!pagefrz || !presult->all_frozen || presult->nfrozen > 0)
- {
- /*
- * If we will neither freeze tuples on the page nor set the page all
- * frozen in the visibility map, the page is not all frozen and there
- * will be no newly frozen tuples.
- */
- presult->all_frozen = false;
- presult->nfrozen = 0; /* avoid miscounts in instrumentation */
- }
+ if (prstate->ndead > 0)
+ XLogRegisterBufData(0, (char *) prstate->nowdead,
+ prstate->ndead * sizeof(OffsetNumber));
- /* Caller won't update new_relfrozenxid and new_relminmxid */
- if (!pagefrz)
- return;
+ if (prstate->nunused > 0)
+ XLogRegisterBufData(0, (char *) prstate->nowunused,
+ prstate->nunused * sizeof(OffsetNumber));
+ if (xlrec.nplans > 0)
+ XLogRegisterBufData(0, (char *) offsets,
+ presult->nfrozen * sizeof(OffsetNumber));
- /*
- * If we will freeze tuples on the page or, even if we don't freeze tuples
- * on the page, if we will set the page all-frozen in the visibility map,
- * we can advance relfrozenxid and relminmxid to the values in
- * pagefrz->FreezePageRelfrozenXid and pagefrz->FreezePageRelminMxid.
- */
- if (presult->all_frozen || presult->nfrozen > 0)
- {
- presult->new_relfrozenxid = pagefrz->FreezePageRelfrozenXid;
- presult->new_relminmxid = pagefrz->FreezePageRelminMxid;
- }
- else
- {
- presult->new_relfrozenxid = pagefrz->NoFreezePageRelfrozenXid;
- presult->new_relminmxid = pagefrz->NoFreezePageRelminMxid;
- }
-}
+ recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_PRUNE);
+ PageSetLSN(BufferGetPage(buffer), recptr);
+}
/*
* Perform visibility checks for heap pruning.
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 36a3d83c8c2..9f0a0341d40 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -179,43 +179,67 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
{
xl_heap_prune *xlrec = (xl_heap_prune *) rec;
- appendStringInfo(buf, "snapshotConflictHorizon: %u, nredirected: %u, ndead: %u, isCatalogRel: %c",
+ appendStringInfo(buf, "snapshotConflictHorizon: %u, isCatalogRel: %c",
xlrec->snapshotConflictHorizon,
- xlrec->nredirected,
- xlrec->ndead,
xlrec->isCatalogRel ? 'T' : 'F');
if (XLogRecHasBlockData(record, 0))
{
- OffsetNumber *end;
OffsetNumber *redirected;
OffsetNumber *nowdead;
OffsetNumber *nowunused;
int nredirected;
+ int ndead;
int nunused;
+ int nplans;
Size datalen;
+ xl_heap_freeze_plan *plans;
+ OffsetNumber *frz_offsets;
- redirected = (OffsetNumber *) XLogRecGetBlockData(record, 0,
- &datalen);
-
+ nplans = xlrec->nplans;
nredirected = xlrec->nredirected;
- end = (OffsetNumber *) ((char *) redirected + datalen);
- nowdead = redirected + (nredirected * 2);
- nowunused = nowdead + xlrec->ndead;
- nunused = (end - nowunused);
- Assert(nunused >= 0);
+ ndead = xlrec->ndead;
+ nunused = xlrec->nunused;
- appendStringInfo(buf, ", nunused: %d", nunused);
-
- appendStringInfoString(buf, ", redirected:");
- array_desc(buf, redirected, sizeof(OffsetNumber) * 2,
- nredirected, &redirect_elem_desc, NULL);
- appendStringInfoString(buf, ", dead:");
- array_desc(buf, nowdead, sizeof(OffsetNumber), xlrec->ndead,
- &offset_elem_desc, NULL);
- appendStringInfoString(buf, ", unused:");
- array_desc(buf, nowunused, sizeof(OffsetNumber), nunused,
- &offset_elem_desc, NULL);
+ plans = (xl_heap_freeze_plan *) XLogRecGetBlockData(record, 0, &datalen);
+ redirected = (OffsetNumber *) &plans[nplans];
+ nowdead = redirected + (nredirected * 2);
+ nowunused = nowdead + ndead;
+ frz_offsets = nowunused + nunused;
+
+ appendStringInfo(buf, ", nredirected: %u, ndead: %u, nunused: %u, nplans: %u,",
+ nredirected,
+ ndead,
+ nunused,
+ nplans);
+
+ if (nredirected > 0)
+ {
+ appendStringInfoString(buf, ", redirected:");
+ array_desc(buf, redirected, sizeof(OffsetNumber) * 2,
+ nredirected, &redirect_elem_desc, NULL);
+ }
+
+ if (ndead > 0)
+ {
+ appendStringInfoString(buf, ", dead:");
+ array_desc(buf, nowdead, sizeof(OffsetNumber), ndead,
+ &offset_elem_desc, NULL);
+ }
+
+ if (nunused > 0)
+ {
+ appendStringInfoString(buf, ", unused:");
+ array_desc(buf, nowunused, sizeof(OffsetNumber), nunused,
+ &offset_elem_desc, NULL);
+ }
+
+ if (nplans > 0)
+ {
+ appendStringInfoString(buf, ", plans:");
+ array_desc(buf, plans, sizeof(xl_heap_freeze_plan), nplans,
+ &plan_elem_desc, &frz_offsets);
+ }
}
}
else if (info == XLOG_HEAP2_VACUUM)
@@ -235,28 +259,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
&offset_elem_desc, NULL);
}
}
- else if (info == XLOG_HEAP2_FREEZE_PAGE)
- {
- xl_heap_freeze_page *xlrec = (xl_heap_freeze_page *) rec;
-
- appendStringInfo(buf, "snapshotConflictHorizon: %u, nplans: %u, isCatalogRel: %c",
- xlrec->snapshotConflictHorizon, xlrec->nplans,
- xlrec->isCatalogRel ? 'T' : 'F');
-
- if (XLogRecHasBlockData(record, 0))
- {
- xl_heap_freeze_plan *plans;
- OffsetNumber *offsets;
-
- plans = (xl_heap_freeze_plan *) XLogRecGetBlockData(record, 0, NULL);
- offsets = (OffsetNumber *) ((char *) plans +
- (xlrec->nplans *
- sizeof(xl_heap_freeze_plan)));
- appendStringInfoString(buf, ", plans:");
- array_desc(buf, plans, sizeof(xl_heap_freeze_plan), xlrec->nplans,
- &plan_elem_desc, &offsets);
- }
- }
else if (info == XLOG_HEAP2_VISIBLE)
{
xl_heap_visible *xlrec = (xl_heap_visible *) rec;
@@ -361,9 +363,6 @@ heap2_identify(uint8 info)
case XLOG_HEAP2_VACUUM:
id = "VACUUM";
break;
- case XLOG_HEAP2_FREEZE_PAGE:
- id = "FREEZE_PAGE";
- break;
case XLOG_HEAP2_VISIBLE:
id = "VISIBLE";
break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index e5ab7b78b78..f77051572fd 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -445,7 +445,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
* Everything else here is just low level physical stuff we're not
* interested in.
*/
- case XLOG_HEAP2_FREEZE_PAGE:
case XLOG_HEAP2_PRUNE:
case XLOG_HEAP2_VACUUM:
case XLOG_HEAP2_VISIBLE:
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 6488dad5e64..fe4a8ff0620 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -53,11 +53,10 @@
#define XLOG_HEAP2_REWRITE 0x00
#define XLOG_HEAP2_PRUNE 0x10
#define XLOG_HEAP2_VACUUM 0x20
-#define XLOG_HEAP2_FREEZE_PAGE 0x30
-#define XLOG_HEAP2_VISIBLE 0x40
-#define XLOG_HEAP2_MULTI_INSERT 0x50
-#define XLOG_HEAP2_LOCK_UPDATED 0x60
-#define XLOG_HEAP2_NEW_CID 0x70
+#define XLOG_HEAP2_VISIBLE 0x30
+#define XLOG_HEAP2_MULTI_INSERT 0x40
+#define XLOG_HEAP2_LOCK_UPDATED 0x50
+#define XLOG_HEAP2_NEW_CID 0x60
/*
* xl_heap_insert/xl_heap_multi_insert flag values, 8 bits are available.
@@ -226,28 +225,65 @@ typedef struct xl_heap_update
#define SizeOfHeapUpdate (offsetof(xl_heap_update, new_offnum) + sizeof(OffsetNumber))
+/*
+ * This struct represents a 'freeze plan', which describes how to freeze a
+ * group of one or more heap tuples (appears in xl_heap_prune record)
+ */
+/* 0x01 was XLH_FREEZE_XMIN */
+#define XLH_FREEZE_XVAC 0x02
+#define XLH_INVALID_XVAC 0x04
+
+typedef struct xl_heap_freeze_plan
+{
+ TransactionId xmax;
+ uint16 t_infomask2;
+ uint16 t_infomask;
+ uint8 frzflags;
+
+ /* Length of individual page offset numbers array for this plan */
+ uint16 ntuples;
+} xl_heap_freeze_plan;
+
+/*
+ * As of Postgres 17, XLOG_HEAP2_PRUNE records replace
+ * XLOG_HEAP2_FREEZE_PAGE records.
+ */
+
/*
* This is what we need to know about page pruning (both during VACUUM and
* during opportunistic pruning)
*
* The array of OffsetNumbers following the fixed part of the record contains:
+ * * for each freeze plan: the freeze plan
* * for each redirected item: the item offset, then the offset redirected to
* * for each now-dead item: the item offset
* * for each now-unused item: the item offset
- * The total number of OffsetNumbers is therefore 2*nredirected+ndead+nunused.
- * Note that nunused is not explicitly stored, but may be found by reference
- * to the total record length.
+ * * for each tuple frozen by the freeze plans: the offset of the item corresponding to that tuple
+ * The total number of OffsetNumbers is therefore
+ * (2*nredirected) + ndead + nunused + (sum[plan.ntuples for plan in plans])
*
- * Acquires a full cleanup lock.
+ * Acquires a full cleanup lock if heap_page_prune_execute() must be called
*/
typedef struct xl_heap_prune
{
TransactionId snapshotConflictHorizon;
+ uint16 nplans;
uint16 nredirected;
uint16 ndead;
+ uint16 nunused;
bool isCatalogRel; /* to handle recovery conflict during logical
* decoding on standby */
- /* OFFSET NUMBERS are in the block reference 0 */
+ /*--------------------------------------------------------------------
+ * OFFSET NUMBERS and freeze plans are in the block reference 0 in the
+ * following order:
+ *
+ * * xl_heap_freeze_plan plans[nplans];
+ * * OffsetNumber redirected[2 * nredirected];
+ * * OffsetNumber nowdead[ndead];
+ * * OffsetNumber nowunused[nunused];
+ * * OffsetNumber frz_offsets[...];
+ *--------------------------------------------------------------------
+ */
} xl_heap_prune;
#define SizeOfHeapPrune (offsetof(xl_heap_prune, isCatalogRel) + sizeof(bool))
@@ -315,47 +351,6 @@ typedef struct xl_heap_inplace
#define SizeOfHeapInplace (offsetof(xl_heap_inplace, offnum) + sizeof(OffsetNumber))
-/*
- * This struct represents a 'freeze plan', which describes how to freeze a
- * group of one or more heap tuples (appears in xl_heap_freeze_page record)
- */
-/* 0x01 was XLH_FREEZE_XMIN */
-#define XLH_FREEZE_XVAC 0x02
-#define XLH_INVALID_XVAC 0x04
-
-typedef struct xl_heap_freeze_plan
-{
- TransactionId xmax;
- uint16 t_infomask2;
- uint16 t_infomask;
- uint8 frzflags;
-
- /* Length of individual page offset numbers array for this plan */
- uint16 ntuples;
-} xl_heap_freeze_plan;
-
-/*
- * This is what we need to know about a block being frozen during vacuum
- *
- * Backup block 0's data contains an array of xl_heap_freeze_plan structs
- * (with nplans elements), followed by one or more page offset number arrays.
- * Each such page offset number array corresponds to a single freeze plan
- * (REDO routine freezes corresponding heap tuples using freeze plan).
- */
-typedef struct xl_heap_freeze_page
-{
- TransactionId snapshotConflictHorizon;
- uint16 nplans;
- bool isCatalogRel; /* to handle recovery conflict during logical
- * decoding on standby */
-
- /*
- * In payload of blk 0 : FREEZE PLANS and OFFSET NUMBER ARRAY
- */
-} xl_heap_freeze_page;
-
-#define SizeOfHeapFreezePage (offsetof(xl_heap_freeze_page, isCatalogRel) + sizeof(bool))
-
/*
* This is what we need to know about setting a visibility map bit
*
--
2.40.1
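(Aside for reviewers, not part of the patch set.) With the combined record,
block 0's payload is laid out in the order documented in xl_heap_prune:
freeze plans, then the redirected/dead/unused offset arrays, then the
frozen-tuple offsets. Below is a minimal sketch of walking that payload; the
function name is made up and length checking is omitted. The real parsing is
in heap_xlog_prune() and heap2_desc() above.

static void
walk_prune_payload(XLogReaderState *record, xl_heap_prune *xlrec)
{
    xl_heap_freeze_plan *plans;
    OffsetNumber *redirected;
    OffsetNumber *nowdead;
    OffsetNumber *nowunused;
    OffsetNumber *frz_offsets;

    /* freeze plans come first, then the three offset arrays */
    plans = (xl_heap_freeze_plan *) XLogRecGetBlockData(record, 0, NULL);
    redirected = (OffsetNumber *) &plans[xlrec->nplans];
    nowdead = redirected + (xlrec->nredirected * 2);
    nowunused = nowdead + xlrec->ndead;
    frz_offsets = nowunused + xlrec->nunused;

    /* frz_offsets holds sum(plans[p].ntuples) offsets, grouped per plan */
    for (int p = 0; p < xlrec->nplans; p++)
    {
        for (int i = 0; i < plans[p].ntuples; i++)
        {
            OffsetNumber off = *frz_offsets++;

            /* off is the item to freeze according to plans[p] */
            (void) off;
        }
    }
}
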
v4-0014-Vacuum-second-pass-emits-XLOG_HEAP2_PRUNE-record.patch
From 94f7f2fe6b6b7fdcbe4d60a157d87e607ef70dc0 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 19 Mar 2024 18:50:33 -0400
Subject: [PATCH v4 14/19] Vacuum second pass emits XLOG_HEAP2_PRUNE record
Remove the XLOG_HEAP2_VACUUM record and update vacuum's second pass to
emit an XLOG_HEAP2_PRUNE record. This temporarily wastes some space but a
future commit will streamline xl_heap_prune and ensure that no unused
members are included in the WAL record.
---
src/backend/access/heap/heapam.c | 94 ++++--------------------
src/backend/access/heap/pruneheap.c | 67 ++++++++++-------
src/backend/access/heap/vacuumlazy.c | 12 ++-
src/backend/access/rmgrdesc/heapdesc.c | 20 -----
src/backend/replication/logical/decode.c | 1 -
src/include/access/heapam.h | 2 +-
src/include/access/heapam_xlog.h | 33 +++++----
7 files changed, 85 insertions(+), 144 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 532868039d5..16bab55ba02 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -8717,23 +8717,34 @@ heap_xlog_prune(XLogReaderState *record)
BlockNumber blkno;
XLogRedoAction action;
bool get_cleanup_lock;
+ bool lp_truncate_only;
XLogRecGetBlockTag(record, 0, &rlocator, NULL, &blkno);
+ lp_truncate_only = xlrec->flags & XLHP_LP_TRUNCATE_ONLY;
+
/*
* If there are dead, redirected, or unused items set unused by
* heap_page_prune_and_freeze(), heap_page_prune_execute() will call
* PageRepairFragementation() which expects a full cleanup lock.
*/
get_cleanup_lock = xlrec->nredirected > 0 ||
- xlrec->ndead > 0 || xlrec->nunused > 0;
+ xlrec->ndead > 0 ||
+ (xlrec->nunused > 0 && !lp_truncate_only);
+
+ if (lp_truncate_only)
+ {
+ Assert(xlrec->nredirected == 0);
+ Assert(xlrec->ndead == 0);
+ Assert(xlrec->nunused > 0);
+ }
/*
* We are either about to remove tuples or freeze them. In Hot Standby
* mode, ensure that there's no queries running for which any removed
* tuples are still visible or which consider the frozen xids as running.
*/
- if (InHotStandby)
+ if (xlrec->flags & XLHP_HAS_CONFLICT_HORIZON && InHotStandby)
ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
xlrec->isCatalogRel,
rlocator);
@@ -8772,7 +8783,7 @@ heap_xlog_prune(XLogReaderState *record)
/* Update all line pointers per the record, and repair fragmentation */
if (nredirected > 0 || ndead > 0 || nunused > 0)
- heap_page_prune_execute(buffer,
+ heap_page_prune_execute(buffer, lp_truncate_only,
redirected, nredirected,
nowdead, ndead,
nowunused, nunused);
@@ -8819,7 +8830,7 @@ heap_xlog_prune(XLogReaderState *record)
UnlockReleaseBuffer(buffer);
/*
- * After pruning records from a page, it's useful to update the FSM
+ * After modifying records on a page, it's useful to update the FSM
* about it, as it may cause the page become target for insertions
* later even if vacuum decides not to visit it (which is possible if
* gets marked all-visible.)
@@ -8831,78 +8842,6 @@ heap_xlog_prune(XLogReaderState *record)
}
}
-/*
- * Handles XLOG_HEAP2_VACUUM record type.
- *
- * Acquires an ordinary exclusive lock only.
- */
-static void
-heap_xlog_vacuum(XLogReaderState *record)
-{
- XLogRecPtr lsn = record->EndRecPtr;
- xl_heap_vacuum *xlrec = (xl_heap_vacuum *) XLogRecGetData(record);
- Buffer buffer;
- BlockNumber blkno;
- XLogRedoAction action;
-
- /*
- * If we have a full-page image, restore it (without using a cleanup lock)
- * and we're done.
- */
- action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL, false,
- &buffer);
- if (action == BLK_NEEDS_REDO)
- {
- Page page = (Page) BufferGetPage(buffer);
- OffsetNumber *nowunused;
- Size datalen;
- OffsetNumber *offnum;
-
- nowunused = (OffsetNumber *) XLogRecGetBlockData(record, 0, &datalen);
-
- /* Shouldn't be a record unless there's something to do */
- Assert(xlrec->nunused > 0);
-
- /* Update all now-unused line pointers */
- offnum = nowunused;
- for (int i = 0; i < xlrec->nunused; i++)
- {
- OffsetNumber off = *offnum++;
- ItemId lp = PageGetItemId(page, off);
-
- Assert(ItemIdIsDead(lp) && !ItemIdHasStorage(lp));
- ItemIdSetUnused(lp);
- }
-
- /* Attempt to truncate line pointer array now */
- PageTruncateLinePointerArray(page);
-
- PageSetLSN(page, lsn);
- MarkBufferDirty(buffer);
- }
-
- if (BufferIsValid(buffer))
- {
- Size freespace = PageGetHeapFreeSpace(BufferGetPage(buffer));
- RelFileLocator rlocator;
-
- XLogRecGetBlockTag(record, 0, &rlocator, NULL, &blkno);
-
- UnlockReleaseBuffer(buffer);
-
- /*
- * After vacuuming LP_DEAD items from a page, it's useful to update
- * the FSM about it, as it may cause the page become target for
- * insertions later even if vacuum decides not to visit it (which is
- * possible if gets marked all-visible.)
- *
- * Do this regardless of a full-page image being applied, since the
- * FSM data is not in the page anyway.
- */
- XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
- }
-}
-
/*
* Replay XLOG_HEAP2_VISIBLE record.
*
@@ -9943,9 +9882,6 @@ heap2_redo(XLogReaderState *record)
case XLOG_HEAP2_PRUNE:
heap_xlog_prune(record);
break;
- case XLOG_HEAP2_VACUUM:
- heap_xlog_vacuum(record);
- break;
case XLOG_HEAP2_VISIBLE:
heap_xlog_visible(record);
break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 19b50931b90..135fe2dba3e 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -601,7 +601,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*/
if (do_prune)
{
- heap_page_prune_execute(buffer,
+ heap_page_prune_execute(buffer, false,
prstate.redirected, prstate.nredirected,
prstate.nowdead, prstate.ndead,
prstate.nowunused, prstate.nunused);
@@ -668,12 +668,16 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
OffsetNumber offsets[MaxHeapTuplesPerPage];
bool do_freeze = presult->nfrozen > 0;
+ xlrec.flags = 0;
+
xlrec.isCatalogRel = RelationIsAccessibleInLogicalDecoding(relation);
xlrec.nredirected = prstate->nredirected;
xlrec.ndead = prstate->ndead;
xlrec.nunused = prstate->nunused;
xlrec.nplans = 0;
+ xlrec.flags |= XLHP_HAS_CONFLICT_HORIZON;
+
/*
* The snapshotConflictHorizon for the whole record should be the most
* conservative of all the horizons calculated for any of the possible
@@ -1149,7 +1153,7 @@ heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum)
* cleanup lock on the buffer.
*/
void
-heap_page_prune_execute(Buffer buffer,
+heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
OffsetNumber *redirected, int nredirected,
OffsetNumber *nowdead, int ndead,
OffsetNumber *nowunused, int nunused)
@@ -1171,6 +1175,7 @@ heap_page_prune_execute(Buffer buffer,
ItemId tolp PG_USED_FOR_ASSERTS_ONLY;
#ifdef USE_ASSERT_CHECKING
+ Assert(!lp_truncate_only);
/*
* Any existing item that we set as an LP_REDIRECT (any 'from' item)
@@ -1226,6 +1231,7 @@ heap_page_prune_execute(Buffer buffer,
ItemId lp = PageGetItemId(page, off);
#ifdef USE_ASSERT_CHECKING
+ Assert(!lp_truncate_only);
/*
* An LP_DEAD line pointer must be left behind when the original item
@@ -1259,23 +1265,29 @@ heap_page_prune_execute(Buffer buffer,
#ifdef USE_ASSERT_CHECKING
- /*
- * When heap_page_prune_and_freeze() was called, mark_unused_now may
- * have been passed as true, which allows would-be LP_DEAD items to be
- * made LP_UNUSED instead. This is only possible if the relation has
- * no indexes. If there are any dead items, then mark_unused_now was
- * not true and every item being marked LP_UNUSED must refer to a
- * heap-only tuple.
- */
- if (ndead > 0)
+ if (lp_truncate_only)
{
- Assert(ItemIdHasStorage(lp) && ItemIdIsNormal(lp));
- htup = (HeapTupleHeader) PageGetItem(page, lp);
- Assert(HeapTupleHeaderIsHeapOnly(htup));
+ /* Setting LP_DEAD to LP_UNUSED in vacuum's second pass */
+ Assert(ItemIdIsDead(lp) && !ItemIdHasStorage(lp));
}
else
{
- Assert(ItemIdIsUsed(lp));
+ /*
+ * When heap_page_prune_and_freeze() was called, mark_unused_now
+ * may have been passed as true, which allows would-be LP_DEAD
+ * items to be made LP_UNUSED instead. This is only possible if
+ * the relation has no indexes. If there are any dead items, then
+ * mark_unused_now was not true and every item being marked
+ * LP_UNUSED must refer to a heap-only tuple.
+ */
+ if (ndead > 0)
+ {
+ Assert(ItemIdHasStorage(lp) && ItemIdIsNormal(lp));
+ htup = (HeapTupleHeader) PageGetItem(page, lp);
+ Assert(HeapTupleHeaderIsHeapOnly(htup));
+ }
+ else
+ Assert(ItemIdIsUsed(lp));
}
#endif
@@ -1283,17 +1295,22 @@ heap_page_prune_execute(Buffer buffer,
ItemIdSetUnused(lp);
}
- /*
- * Finally, repair any fragmentation, and update the page's hint bit about
- * whether it has free pointers.
- */
- PageRepairFragmentation(page);
+ if (lp_truncate_only)
+ PageTruncateLinePointerArray(page);
+ else
+ {
+ /*
+ * Finally, repair any fragmentation, and update the page's hint bit
+ * about whether it has free pointers.
+ */
+ PageRepairFragmentation(page);
- /*
- * Now that the page has been modified, assert that redirect items still
- * point to valid targets.
- */
- page_verify_redirects(page);
+ /*
+ * Now that the page has been modified, assert that redirect items
+ * still point to valid targets.
+ */
+ page_verify_redirects(page);
+ }
}
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index c4553a4159c..9dfb56475cf 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2394,18 +2394,24 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
/* XLOG stuff */
if (RelationNeedsWAL(vacrel->rel))
{
- xl_heap_vacuum xlrec;
+ xl_heap_prune xlrec;
XLogRecPtr recptr;
+ xlrec.flags = XLHP_LP_TRUNCATE_ONLY;
+ xlrec.snapshotConflictHorizon = InvalidTransactionId;
+ xlrec.nplans = 0;
+ xlrec.nredirected = 0;
+ xlrec.ndead = 0;
xlrec.nunused = nunused;
+ xlrec.isCatalogRel = RelationIsAccessibleInLogicalDecoding(vacrel->rel);
XLogBeginInsert();
- XLogRegisterData((char *) &xlrec, SizeOfHeapVacuum);
+ XLogRegisterData((char *) &xlrec, SizeOfHeapPrune);
XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
XLogRegisterBufData(0, (char *) unused, nunused * sizeof(OffsetNumber));
- recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VACUUM);
+ recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_PRUNE);
PageSetLSN(page, recptr);
}
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 9f0a0341d40..ea03f902fc4 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -242,23 +242,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
}
}
}
- else if (info == XLOG_HEAP2_VACUUM)
- {
- xl_heap_vacuum *xlrec = (xl_heap_vacuum *) rec;
-
- appendStringInfo(buf, "nunused: %u", xlrec->nunused);
-
- if (XLogRecHasBlockData(record, 0))
- {
- OffsetNumber *nowunused;
-
- nowunused = (OffsetNumber *) XLogRecGetBlockData(record, 0, NULL);
-
- appendStringInfoString(buf, ", unused:");
- array_desc(buf, nowunused, sizeof(OffsetNumber), xlrec->nunused,
- &offset_elem_desc, NULL);
- }
- }
else if (info == XLOG_HEAP2_VISIBLE)
{
xl_heap_visible *xlrec = (xl_heap_visible *) rec;
@@ -360,9 +343,6 @@ heap2_identify(uint8 info)
case XLOG_HEAP2_PRUNE:
id = "PRUNE";
break;
- case XLOG_HEAP2_VACUUM:
- id = "VACUUM";
- break;
case XLOG_HEAP2_VISIBLE:
id = "VISIBLE";
break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index f77051572fd..38d1bdd825e 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -446,7 +446,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
* interested in.
*/
case XLOG_HEAP2_PRUNE:
- case XLOG_HEAP2_VACUUM:
case XLOG_HEAP2_VISIBLE:
case XLOG_HEAP2_LOCK_UPDATED:
break;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 321a46185e1..d5cb8f99cac 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -349,7 +349,7 @@ extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
HeapPageFreeze *pagefrz,
PruneFreezeResult *presult,
OffsetNumber *off_loc);
-extern void heap_page_prune_execute(Buffer buffer,
+extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
OffsetNumber *redirected, int nredirected,
OffsetNumber *nowdead, int ndead,
OffsetNumber *nowunused, int nunused);
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index fe4a8ff0620..2393540cf68 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -52,11 +52,10 @@
*/
#define XLOG_HEAP2_REWRITE 0x00
#define XLOG_HEAP2_PRUNE 0x10
-#define XLOG_HEAP2_VACUUM 0x20
-#define XLOG_HEAP2_VISIBLE 0x30
-#define XLOG_HEAP2_MULTI_INSERT 0x40
-#define XLOG_HEAP2_LOCK_UPDATED 0x50
-#define XLOG_HEAP2_NEW_CID 0x60
+#define XLOG_HEAP2_VISIBLE 0x20
+#define XLOG_HEAP2_MULTI_INSERT 0x30
+#define XLOG_HEAP2_LOCK_UPDATED 0x40
+#define XLOG_HEAP2_NEW_CID 0x50
/*
* xl_heap_insert/xl_heap_multi_insert flag values, 8 bits are available.
@@ -266,6 +265,7 @@ typedef struct xl_heap_freeze_plan
*/
typedef struct xl_heap_prune
{
+ uint8 flags;
TransactionId snapshotConflictHorizon;
uint16 nplans;
uint16 nredirected;
@@ -288,19 +288,22 @@ typedef struct xl_heap_prune
#define SizeOfHeapPrune (offsetof(xl_heap_prune, isCatalogRel) + sizeof(bool))
+/* Flags for xl_heap_prune */
+
/*
- * The vacuum page record is similar to the prune record, but can only mark
- * already LP_DEAD items LP_UNUSED (during VACUUM's second heap pass)
- *
- * Acquires an ordinary exclusive lock only.
+ * During vacuum's second pass, which sets LP_DEAD items LP_UNUSED, we will
+ * only truncate the line pointer array, not call PageRepairFragmentation. We
+ * need this flag to differentiate what kind of lock (exclusive or cleanup) to
+ * take on the buffer and whether to call PageTruncateLinePointerArray() or
+ * PageRepairFragmentation().
*/
-typedef struct xl_heap_vacuum
-{
- uint16 nunused;
- /* OFFSET NUMBERS are in the block reference 0 */
-} xl_heap_vacuum;
+#define XLHP_LP_TRUNCATE_ONLY (1 << 1)
-#define SizeOfHeapVacuum (offsetof(xl_heap_vacuum, nunused) + sizeof(uint16))
+/*
+ * Vacuum's first pass and on-access pruning may need to include a snapshot
+ * conflict horizon.
+ */
+#define XLHP_HAS_CONFLICT_HORIZON (1 << 2)
/* flags for infobits_set */
#define XLHL_XMAX_IS_MULTI 0x01
--
2.40.1
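(Another aside, not part of the patch set.) The two new flags are what let a
single record type cover both vacuum passes. The snippet below just restates
what heap_xlog_prune() does with them in the patch above, pulled out of the
diff for readability; xlrec and rlocator are as declared there.

    bool        lp_truncate_only = (xlrec->flags & XLHP_LP_TRUNCATE_ONLY) != 0;
    bool        get_cleanup_lock;

    /* Only redirects, dead items, or storage removal need a cleanup lock. */
    get_cleanup_lock = xlrec->nredirected > 0 ||
        xlrec->ndead > 0 ||
        (xlrec->nunused > 0 && !lp_truncate_only);

    /* Second-pass records carry no conflict horizon. */
    if ((xlrec->flags & XLHP_HAS_CONFLICT_HORIZON) && InHotStandby)
        ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
                                            xlrec->isCatalogRel,
                                            rlocator);
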
v4-0015-Set-hastup-in-heap_page_prune.patch
From 77a8d20e1c332444d1bde1a41682186713e51e5f Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 18 Mar 2024 20:12:18 -0400
Subject: [PATCH v4 15/19] Set hastup in heap_page_prune
lazy_scan_prune() loops through the line pointers and tuple visibility
information for each tuple on a page, setting hastup to true if there
are any LP_REDIRECT line pointers or tuples with storage which will not
be removed. We want to remove this extra loop from lazy_scan_prune(),
and we know about non-removable tuples during heap_page_prune() anyway.
Set hastup when recording LP_REDIRECT line pointers in
heap_prune_chain() and when LP_NORMAL line pointers refer to tuples
whose visibility status is not HEAPTUPLE_DEAD.
---
src/backend/access/heap/pruneheap.c | 64 ++++++++++++++++++----------
src/backend/access/heap/vacuumlazy.c | 24 +----------
src/include/access/heapam.h | 2 +
3 files changed, 45 insertions(+), 45 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 135fe2dba3e..d183912a402 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -71,7 +71,8 @@ static int heap_prune_chain(Buffer buffer,
static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
static void heap_prune_record_redirect(PruneState *prstate,
- OffsetNumber offnum, OffsetNumber rdoffnum);
+ OffsetNumber offnum, OffsetNumber rdoffnum,
+ PruneFreezeResult *presult);
static void heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
PruneFreezeResult *presult);
static void heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
@@ -277,6 +278,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
presult->nnewlpdead = 0;
presult->nfrozen = 0;
+ presult->hastup = false;
+
/*
* Caller will update the VM after pruning, collecting LP_DEAD items, and
* freezing tuples. Keep track of whether or not the page is all_visible
@@ -419,30 +422,42 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
break;
}
- /*
- * Consider freezing any normal tuples which will not be removed
- */
- if (presult->htsv[offnum] != HEAPTUPLE_DEAD && pagefrz)
+ if (presult->htsv[offnum] != HEAPTUPLE_DEAD)
{
- bool totally_frozen;
+ /*
+ * Deliberately don't set hastup for LP_DEAD items. We make the
+ * soft assumption that any LP_DEAD items encountered here will
+ * become LP_UNUSED later on, before count_nondeletable_pages is
+ * reached. If we don't make this assumption then rel truncation
+ * will only happen every other VACUUM, at most. Besides, VACUUM
+ * must treat hastup/nonempty_pages as provisional no matter how
+ * LP_DEAD items are handled (handled here, or handled later on).
+ */
+ presult->hastup = true;
- /* Tuple with storage -- consider need to freeze */
- if ((heap_prepare_freeze_tuple(htup, pagefrz,
- &prstate.frozen[presult->nfrozen],
- &totally_frozen)))
+ /* Consider freezing any normal tuples which will not be removed */
+ if (pagefrz)
{
- /* Save prepared freeze plan for later */
- prstate.frozen[presult->nfrozen++].offset = offnum;
- }
+ bool totally_frozen;
- /*
- * If any tuple isn't either totally frozen already or eligible to
- * become totally frozen (according to its freeze plan), then the
- * page definitely cannot be set all-frozen in the visibility map
- * later on
- */
- if (!totally_frozen)
- presult->all_frozen = false;
+ /* Tuple with storage -- consider need to freeze */
+ if ((heap_prepare_freeze_tuple(htup, pagefrz,
+ &prstate.frozen[presult->nfrozen],
+ &totally_frozen)))
+ {
+ /* Save prepared freeze plan for later */
+ prstate.frozen[presult->nfrozen++].offset = offnum;
+ }
+
+ /*
+ * If any tuple isn't either totally frozen already or
+ * eligible to become totally frozen (according to its freeze
+ * plan), then the page definitely cannot be set all-frozen in
+ * the visibility map later on
+ */
+ if (!totally_frozen)
+ presult->all_frozen = false;
+ }
}
}
@@ -1049,7 +1064,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
if (i >= nchain)
heap_prune_record_dead_or_unused(prstate, rootoffnum, presult);
else
- heap_prune_record_redirect(prstate, rootoffnum, chainitems[i]);
+ heap_prune_record_redirect(prstate, rootoffnum, chainitems[i], presult);
}
else if (nchain < 2 && ItemIdIsRedirected(rootlp))
{
@@ -1083,7 +1098,8 @@ heap_prune_record_prunable(PruneState *prstate, TransactionId xid)
/* Record line pointer to be redirected */
static void
heap_prune_record_redirect(PruneState *prstate,
- OffsetNumber offnum, OffsetNumber rdoffnum)
+ OffsetNumber offnum, OffsetNumber rdoffnum,
+ PruneFreezeResult *presult)
{
Assert(prstate->nredirected < MaxHeapTuplesPerPage);
prstate->redirected[prstate->nredirected * 2] = offnum;
@@ -1093,6 +1109,8 @@ heap_prune_record_redirect(PruneState *prstate,
prstate->marked[offnum] = true;
Assert(!prstate->marked[rdoffnum]);
prstate->marked[rdoffnum] = true;
+
+ presult->hastup = true;
}
/* Record line pointer to be marked dead */
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 9dfb56475cf..370721a619a 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1420,7 +1420,6 @@ lazy_scan_prune(LVRelState *vacrel,
live_tuples,
recently_dead_tuples;
HeapPageFreeze pagefrz;
- bool hastup = false;
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
Assert(BufferGetBlockNumber(buf) == blkno);
@@ -1477,28 +1476,12 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->offnum = offnum;
itemid = PageGetItemId(page, offnum);
- if (!ItemIdIsUsed(itemid))
- continue;
-
/* Redirect items mustn't be touched */
- if (ItemIdIsRedirected(itemid))
- {
- /* page makes rel truncation unsafe */
- hastup = true;
+ if (ItemIdIsRedirected(itemid) || !ItemIdIsUsed(itemid))
continue;
- }
if (ItemIdIsDead(itemid))
{
- /*
- * Deliberately don't set hastup for LP_DEAD items. We make the
- * soft assumption that any LP_DEAD items encountered here will
- * become LP_UNUSED later on, before count_nondeletable_pages is
- * reached. If we don't make this assumption then rel truncation
- * will only happen every other VACUUM, at most. Besides, VACUUM
- * must treat hastup/nonempty_pages as provisional no matter how
- * LP_DEAD items are handled (handled here, or handled later on).
- */
deadoffsets[lpdead_items++] = offnum;
continue;
}
@@ -1566,9 +1549,6 @@ lazy_scan_prune(LVRelState *vacrel,
elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
break;
}
-
- hastup = true; /* page makes rel truncation unsafe */
-
}
vacrel->offnum = InvalidOffsetNumber;
@@ -1650,7 +1630,7 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->recently_dead_tuples += recently_dead_tuples;
/* Can't truncate this page */
- if (hastup)
+ if (presult.hastup)
vacrel->nonempty_pages = blkno + 1;
/* Did we find LP_DEAD items? */
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index d5cb8f99cac..25dbae8139e 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -202,6 +202,8 @@ typedef struct PruneFreezeResult
int nnewlpdead; /* Number of newly LP_DEAD items */
bool all_visible; /* Whether or not the page is all visible */
bool all_visible_except_removable;
+ bool hastup; /* Does page make rel truncation unsafe */
+
/* Whether or not the page can be set all frozen in the VM */
bool all_frozen;
--
2.40.1
v4-0016-Count-tuples-for-vacuum-logging-in-heap_page_prun.patch
From f9356564f651fa9f91e99adbbac50914601e3551 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 22 Jan 2024 17:25:56 -0500
Subject: [PATCH v4 16/19] Count tuples for vacuum logging in heap_page_prune
lazy_scan_prune() loops through all of the tuple visibility information
that was recorded in heap_page_prune() and then counts live and recently
dead tuples. That information is available in heap_page_prune(), so just
record it there. Add live and recently dead tuple counters to the
PruneResult. Doing this counting in heap_page_prune() eliminates the
need for saving the tuple visibility status information in the
PruneResult. Instead, save it in the PruneState where it can be
referenced by heap_prune_chain().
---
src/backend/access/heap/pruneheap.c | 99 ++++++++++++++++++++++++----
src/backend/access/heap/vacuumlazy.c | 93 +-------------------------
src/include/access/heapam.h | 29 +-------
3 files changed, 93 insertions(+), 128 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index d183912a402..f59b03222b0 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -55,6 +55,18 @@ typedef struct
*/
bool marked[MaxHeapTuplesPerPage + 1];
+ /*
+ * Tuple visibility is only computed once for each tuple, for correctness
+ * and efficiency reasons; see comment in heap_page_prune_and_freeze() for
+ * details. This is of type int8[], instead of HTSV_Result[], so we can
+ * use -1 to indicate no visibility has been computed, e.g. for LP_DEAD
+ * items.
+ *
+ * This needs to be MaxHeapTuplesPerPage + 1 long as FirstOffsetNumber is
+ * 1. Otherwise every access would need to subtract 1.
+ */
+ int8 htsv[MaxHeapTuplesPerPage + 1];
+
/*
* One entry for every tuple that we may freeze.
*/
@@ -69,6 +81,7 @@ static int heap_prune_chain(Buffer buffer,
OffsetNumber rootoffnum,
PruneState *prstate, PruneFreezeResult *presult);
+static inline HTSV_Result htsv_get_valid_status(int status);
static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
static void heap_prune_record_redirect(PruneState *prstate,
OffsetNumber offnum, OffsetNumber rdoffnum,
@@ -269,7 +282,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.nredirected = prstate.ndead = prstate.nunused = 0;
/*
- * presult->htsv is not initialized here because all ntuple spots in the
+ * prstate.htsv is not initialized here because all ntuple spots in the
* array will be set either to a valid HTSV_Result value or -1.
*/
memset(prstate.marked, 0, sizeof(prstate.marked));
@@ -280,6 +293,9 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
presult->hastup = false;
+ presult->live_tuples = 0;
+ presult->recently_dead_tuples = 0;
+
/*
* Caller will update the VM after pruning, collecting LP_DEAD items, and
* freezing tuples. Keep track of whether or not the page is all_visible
@@ -329,7 +345,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/* Nothing to do if slot doesn't contain a tuple */
if (!ItemIdIsNormal(itemid))
{
- presult->htsv[offnum] = -1;
+ prstate.htsv[offnum] = -1;
continue;
}
@@ -345,9 +361,29 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (off_loc)
*off_loc = offnum;
- presult->htsv[offnum] = heap_prune_satisfies_vacuum(&prstate, &tup,
- buffer);
- switch (presult->htsv[offnum])
+ prstate.htsv[offnum] = heap_prune_satisfies_vacuum(&prstate, &tup,
+ buffer);
+
+ /*
+ * The criteria for counting a tuple as live in this block need to
+ * match what analyze.c's acquire_sample_rows() does, otherwise VACUUM
+ * and ANALYZE may produce wildly different reltuples values, e.g.
+ * when there are many recently-dead tuples.
+ *
+ * The logic here is a bit simpler than acquire_sample_rows(), as
+ * VACUUM can't run inside a transaction block, which makes some cases
+ * impossible (e.g. in-progress insert from the same transaction).
+ *
+ * We treat LP_DEAD items (which are the closest thing to DEAD tuples
+ * that might be seen here) differently, too: we assume that they'll
+ * become LP_UNUSED before VACUUM finishes. This difference is only
+ * superficial. VACUUM effectively agrees with ANALYZE about DEAD
+ * items, in the end. VACUUM won't remember LP_DEAD items, but only
+ * because they're not supposed to be left behind when it is done.
+ * (Cases where we bypass index vacuuming will violate this optimistic
+ * assumption, but the overall impact of that should be negligible.)
+ */
+ switch (prstate.htsv[offnum])
{
case HEAPTUPLE_DEAD:
@@ -367,6 +403,12 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
break;
case HEAPTUPLE_LIVE:
+ /*
+ * Count it as live. Not only is this natural, but it's also
+ * what acquire_sample_rows() does.
+ */
+ presult->live_tuples++;
+
/*
* Is the tuple definitely visible to all transactions?
*
@@ -408,13 +450,34 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
}
break;
case HEAPTUPLE_RECENTLY_DEAD:
+
+ /*
+ * If tuple is recently dead then we must not remove it from
+ * the relation. (We only remove items that are LP_DEAD from
+ * pruning.)
+ */
+ presult->recently_dead_tuples++;
presult->all_visible = false;
break;
case HEAPTUPLE_INSERT_IN_PROGRESS:
+
+ /*
+ * We do not count these rows as live, because we expect the
+ * inserting transaction to update the counters at commit, and
+ * we assume that will happen only after we report our
+ * results. This assumption is a bit shaky, but it is what
+ * acquire_sample_rows() does, so be consistent.
+ */
presult->all_visible = false;
break;
case HEAPTUPLE_DELETE_IN_PROGRESS:
- /* This is an expected case during concurrent vacuum */
+
+ /*
+ * This is an expected case during concurrent vacuum. Count such
+ * rows as live. As above, we assume the deleting transaction
+ * will commit and update the counters after we report.
+ */
+ presult->live_tuples++;
presult->all_visible = false;
break;
default:
@@ -422,7 +485,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
break;
}
- if (presult->htsv[offnum] != HEAPTUPLE_DEAD)
+ if (prstate.htsv[offnum] != HEAPTUPLE_DEAD)
{
/*
* Deliberately don't set hastup for LP_DEAD items. We make the
@@ -772,10 +835,24 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
}
+/*
+ * Pruning calculates tuple visibility once and saves the results in an array
+ * of int8. See PruneState.htsv for details. This helper function is meant to
+ * guard against examining visibility status array members which have not yet
+ * been computed.
+ */
+static inline HTSV_Result
+htsv_get_valid_status(int status)
+{
+ Assert(status >= HEAPTUPLE_DEAD &&
+ status <= HEAPTUPLE_DELETE_IN_PROGRESS);
+ return (HTSV_Result) status;
+}
+
/*
* Prune specified line pointer or a HOT chain originating at line pointer.
*
- * Tuple visibility information is provided in presult->htsv.
+ * Tuple visibility information is provided in prstate->htsv.
*
* If the item is an index-referenced tuple (i.e. not a heap-only tuple),
* the HOT chain is pruned by removing all DEAD tuples at the start of the HOT
@@ -826,7 +903,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
*/
if (ItemIdIsNormal(rootlp))
{
- Assert(presult->htsv[rootoffnum] != -1);
+ Assert(prstate->htsv[rootoffnum] != -1);
htup = (HeapTupleHeader) PageGetItem(dp, rootlp);
if (HeapTupleHeaderIsHeapOnly(htup))
@@ -849,7 +926,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
* either here or while following a chain below. Whichever path
* gets there first will mark the tuple unused.
*/
- if (presult->htsv[rootoffnum] == HEAPTUPLE_DEAD &&
+ if (prstate->htsv[rootoffnum] == HEAPTUPLE_DEAD &&
!HeapTupleHeaderIsHotUpdated(htup))
{
heap_prune_record_unused(prstate, rootoffnum);
@@ -950,7 +1027,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
*/
tupdead = recent_dead = false;
- switch (htsv_get_valid_status(presult->htsv[offnum]))
+ switch (htsv_get_valid_status(prstate->htsv[offnum]))
{
case HEAPTUPLE_DEAD:
tupdead = true;
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 370721a619a..a3c971cd26d 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1378,22 +1378,6 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
*
* Caller must hold pin and buffer cleanup lock on the buffer.
*
- * Prior to PostgreSQL 14 there were very rare cases where
- * heap_page_prune_and_freeze() was allowed to disagree with our
- * HeapTupleSatisfiesVacuum() call about whether or not a tuple should be
- * considered DEAD. This happened when an inserting transaction concurrently
- * aborted (after our heap_page_prune_and_freeze() call, before our
- * HeapTupleSatisfiesVacuum() call). There was rather a lot of complexity just
- * so we could deal with tuples that were DEAD to VACUUM, but nevertheless were
- * left with storage after pruning.
- *
- * As of Postgres 17, we circumvent this problem altogether by reusing the
- * result of heap_page_prune_and_freeze()'s visibility check. Without the
- * second call to HeapTupleSatisfiesVacuum(), there is no new HTSV_Result and
- * there can be no disagreement. We'll just handle such tuples as if they had
- * become fully dead right after this operation completes instead of in the
- * middle of it.
- *
* vmbuffer is the buffer containing the VM block with visibility information
* for the heap block, blkno. all_visible_according_to_vm is the saved
* visibility status of the heap block looked up earlier by the caller. We
@@ -1415,10 +1399,8 @@ lazy_scan_prune(LVRelState *vacrel,
OffsetNumber offnum,
maxoff;
ItemId itemid;
+ int lpdead_items = 0;
PruneFreezeResult presult;
- int lpdead_items,
- live_tuples,
- recently_dead_tuples;
HeapPageFreeze pagefrz;
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
@@ -1438,9 +1420,6 @@ lazy_scan_prune(LVRelState *vacrel,
pagefrz.NoFreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
pagefrz.NoFreezePageRelminMxid = vacrel->NewRelminMxid;
pagefrz.cutoffs = &vacrel->cutoffs;
- lpdead_items = 0;
- live_tuples = 0;
- recently_dead_tuples = 0;
/*
* Prune all HOT-update chains and potentially freeze tuples on this page.
@@ -1476,9 +1455,6 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->offnum = offnum;
itemid = PageGetItemId(page, offnum);
- /* Redirect items mustn't be touched */
- if (ItemIdIsRedirected(itemid) || !ItemIdIsUsed(itemid))
- continue;
if (ItemIdIsDead(itemid))
{
@@ -1486,69 +1462,6 @@ lazy_scan_prune(LVRelState *vacrel,
continue;
}
- Assert(ItemIdIsNormal(itemid));
-
- /*
- * The criteria for counting a tuple as live in this block need to
- * match what analyze.c's acquire_sample_rows() does, otherwise VACUUM
- * and ANALYZE may produce wildly different reltuples values, e.g.
- * when there are many recently-dead tuples.
- *
- * The logic here is a bit simpler than acquire_sample_rows(), as
- * VACUUM can't run inside a transaction block, which makes some cases
- * impossible (e.g. in-progress insert from the same transaction).
- *
- * We treat LP_DEAD items (which are the closest thing to DEAD tuples
- * that might be seen here) differently, too: we assume that they'll
- * become LP_UNUSED before VACUUM finishes. This difference is only
- * superficial. VACUUM effectively agrees with ANALYZE about DEAD
- * items, in the end. VACUUM won't remember LP_DEAD items, but only
- * because they're not supposed to be left behind when it is done.
- * (Cases where we bypass index vacuuming will violate this optimistic
- * assumption, but the overall impact of that should be negligible.)
- */
- switch (htsv_get_valid_status(presult.htsv[offnum]))
- {
- case HEAPTUPLE_LIVE:
-
- /*
- * Count it as live. Not only is this natural, but it's also
- * what acquire_sample_rows() does.
- */
- live_tuples++;
- break;
- case HEAPTUPLE_RECENTLY_DEAD:
-
- /*
- * If tuple is recently dead then we must not remove it from
- * the relation. (We only remove items that are LP_DEAD from
- * pruning.)
- */
- recently_dead_tuples++;
- break;
- case HEAPTUPLE_INSERT_IN_PROGRESS:
-
- /*
- * We do not count these rows as live, because we expect the
- * inserting transaction to update the counters at commit, and
- * we assume that will happen only after we report our
- * results. This assumption is a bit shaky, but it is what
- * acquire_sample_rows() does, so be consistent.
- */
- break;
- case HEAPTUPLE_DELETE_IN_PROGRESS:
-
- /*
- * This an expected case during concurrent vacuum. Count such
- * rows as live. As above, we assume the deleting transaction
- * will commit and update the counters after we report.
- */
- live_tuples++;
- break;
- default:
- elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
- break;
- }
}
vacrel->offnum = InvalidOffsetNumber;
@@ -1626,8 +1539,8 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->tuples_deleted += presult.ndeleted;
vacrel->tuples_frozen += presult.nfrozen;
vacrel->lpdead_items += lpdead_items;
- vacrel->live_tuples += live_tuples;
- vacrel->recently_dead_tuples += recently_dead_tuples;
+ vacrel->live_tuples += presult.live_tuples;
+ vacrel->recently_dead_tuples += presult.recently_dead_tuples;
/* Can't truncate this page */
if (presult.hastup)
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 25dbae8139e..23ba23b5b01 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -198,6 +198,8 @@ typedef struct HeapPageFreeze
*/
typedef struct PruneFreezeResult
{
+ int live_tuples;
+ int recently_dead_tuples;
int ndeleted; /* Number of tuples deleted from the page */
int nnewlpdead; /* Number of newly LP_DEAD items */
bool all_visible; /* Whether or not the page is all visible */
@@ -211,19 +213,6 @@ typedef struct PruneFreezeResult
int nfrozen;
TransactionId frz_conflict_horizon; /* Newest xmin on the page */
- /*
- * Tuple visibility is only computed once for each tuple, for correctness
- * and efficiency reasons; see comment in heap_page_prune_and_freeze() for
- * details. This is of type int8[], instead of HTSV_Result[], so we can
- * use -1 to indicate no visibility has been computed, e.g. for LP_DEAD
- * items.
- *
- * This needs to be MaxHeapTuplesPerPage + 1 long as FirstOffsetNumber is
- * 1. Otherwise every access would need to subtract 1.
- */
- int8 htsv[MaxHeapTuplesPerPage + 1];
-
-
/* New value of relfrozenxid found by heap_page_prune_and_freeze() */
TransactionId new_relfrozenxid;
@@ -231,20 +220,6 @@ typedef struct PruneFreezeResult
MultiXactId new_relminmxid;
} PruneFreezeResult;
-/*
- * Pruning calculates tuple visibility once and saves the results in an array
- * of int8. See PruneFreezeResult.htsv for details. This helper function is
- * meant to guard against examining visibility status array members which have
- * not yet been computed.
- */
-static inline HTSV_Result
-htsv_get_valid_status(int status)
-{
- Assert(status >= HEAPTUPLE_DEAD &&
- status <= HEAPTUPLE_DELETE_IN_PROGRESS);
- return (HTSV_Result) status;
-}
-
/* ----------------
* function prototypes for heap access method
*
--
2.40.1
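For reviewers who want the counting rules at a glance, here is a small standalone sketch in plain C (not PostgreSQL code; the Model* names are invented stand-ins). It reproduces only the live/recently-dead bookkeeping from the switch removed above, which this patch moves into heap_page_prune():

/*
 * Standalone model of the tuple-counting rules moved into pruning.  The enum
 * and counters mirror the real names; everything else is simplified.
 */
#include <stdio.h>

typedef enum
{
	MODEL_HEAPTUPLE_DEAD,
	MODEL_HEAPTUPLE_LIVE,
	MODEL_HEAPTUPLE_RECENTLY_DEAD,
	MODEL_HEAPTUPLE_INSERT_IN_PROGRESS,
	MODEL_HEAPTUPLE_DELETE_IN_PROGRESS
} ModelHTSVResult;

typedef struct
{
	int			live_tuples;
	int			recently_dead_tuples;
} ModelCounters;

/* Same counting rules as the switch removed from lazy_scan_prune() above. */
static void
model_count_tuple(ModelHTSVResult status, ModelCounters *c)
{
	switch (status)
	{
		case MODEL_HEAPTUPLE_LIVE:
		case MODEL_HEAPTUPLE_DELETE_IN_PROGRESS:
			/* counted as live, matching acquire_sample_rows() */
			c->live_tuples++;
			break;
		case MODEL_HEAPTUPLE_RECENTLY_DEAD:
			/* stays in the relation, but is not counted as live */
			c->recently_dead_tuples++;
			break;
		case MODEL_HEAPTUPLE_INSERT_IN_PROGRESS:
			/* the inserter is expected to update the counters at commit */
			break;
		default:
			/* DEAD tuples become LP_DEAD/LP_UNUSED during pruning */
			break;
	}
}

int
main(void)
{
	ModelHTSVResult page[] = {
		MODEL_HEAPTUPLE_LIVE,
		MODEL_HEAPTUPLE_RECENTLY_DEAD,
		MODEL_HEAPTUPLE_DELETE_IN_PROGRESS,
		MODEL_HEAPTUPLE_INSERT_IN_PROGRESS
	};
	ModelCounters c = {0, 0};

	for (int i = 0; i < (int) (sizeof(page) / sizeof(page[0])); i++)
		model_count_tuple(page[i], &c);

	printf("live: %d, recently dead: %d\n", c.live_tuples, c.recently_dead_tuples);
	return 0;
}

The point is only that the classification depends on the HTSV result, so it can be done wherever the visibility check already happens -- after this patch, inside pruning.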
v4-0017-Save-dead-tuple-offsets-during-heap_page_prune.patch
From a322901bb7fa86a0a7bc64b339d52c1446488915 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 22 Jan 2024 16:55:28 -0500
Subject: [PATCH v4 17/19] Save dead tuple offsets during heap_page_prune
After heap_page_prune() returned, lazy_scan_prune() looped through all
of the offsets of LP_DEAD items which it later added to
LVRelState->dead_items. Instead take care of this when marking a line
pointer or when an existing non-removable LP_DEAD item is encountered in
heap_prune_chain().
---
src/backend/access/heap/pruneheap.c | 7 ++++
src/backend/access/heap/vacuumlazy.c | 61 ++++++----------------------
src/include/access/heapam.h | 2 +
3 files changed, 22 insertions(+), 48 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index f59b03222b0..5decb1127d0 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -295,6 +295,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
presult->live_tuples = 0;
presult->recently_dead_tuples = 0;
+ presult->lpdead_items = 0;
/*
* Caller will update the VM after pruning, collecting LP_DEAD items, and
@@ -1002,7 +1003,10 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
if (unlikely(prstate->mark_unused_now))
heap_prune_record_unused(prstate, offnum);
else
+ {
presult->all_visible = false;
+ presult->deadoffsets[presult->lpdead_items++] = offnum;
+ }
break;
}
@@ -1206,6 +1210,9 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
* all_visible.
*/
presult->all_visible = false;
+
+ /* Record the dead offset for vacuum */
+ presult->deadoffsets[presult->lpdead_items++] = offnum;
}
/*
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index a3c971cd26d..295846b854f 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1396,23 +1396,11 @@ lazy_scan_prune(LVRelState *vacrel,
bool *has_lpdead_items)
{
Relation rel = vacrel->rel;
- OffsetNumber offnum,
- maxoff;
- ItemId itemid;
- int lpdead_items = 0;
PruneFreezeResult presult;
HeapPageFreeze pagefrz;
- OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
Assert(BufferGetBlockNumber(buf) == blkno);
- /*
- * maxoff might be reduced following line pointer array truncation in
- * heap_page_prune_and_freeze(). That's safe for us to ignore, since the
- * reclaimed space will continue to look like LP_UNUSED items below.
- */
- maxoff = PageGetMaxOffsetNumber(page);
-
/* Initialize pagefrz */
pagefrz.freeze_required = false;
pagefrz.FreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
@@ -1425,9 +1413,9 @@ lazy_scan_prune(LVRelState *vacrel,
* Prune all HOT-update chains and potentially freeze tuples on this page.
*
* We count the number of tuples removed from the page by the pruning step
- * in presult.ndeleted. It should not be confused with lpdead_items;
- * lpdead_items's final value can be thought of as the number of tuples
- * that were deleted from indexes.
+ * in presult.ndeleted. It should not be confused with
+ * presult.lpdead_items; presult.lpdead_items's final value can be thought
+ * of as the number of tuples that were deleted from indexes.
*
* If the relation has no indexes, we can immediately mark would-be dead
* items LP_UNUSED, so mark_unused_now should be true if no indexes and
@@ -1437,33 +1425,10 @@ lazy_scan_prune(LVRelState *vacrel,
&pagefrz, &presult, &vacrel->offnum);
/*
- * Now scan the page to collect LP_DEAD items and check for tuples
- * requiring freezing among remaining tuples with storage. We will update
- * the VM after collecting LP_DEAD items and freezing tuples. Pruning will
- * have determined whether or not the page is all_visible and able to
- * become all_frozen.
- *
+ * We will update the VM after collecting LP_DEAD items and freezing
+ * tuples. Pruning will have determined whether or not the page is
+ * all_visible.
*/
- for (offnum = FirstOffsetNumber;
- offnum <= maxoff;
- offnum = OffsetNumberNext(offnum))
- {
- /*
- * Set the offset number so that we can display it along with any
- * error that occurred while processing this tuple.
- */
- vacrel->offnum = offnum;
- itemid = PageGetItemId(page, offnum);
-
-
- if (ItemIdIsDead(itemid))
- {
- deadoffsets[lpdead_items++] = offnum;
- continue;
- }
-
- }
-
vacrel->offnum = InvalidOffsetNumber;
Assert(MultiXactIdIsValid(presult.new_relminmxid));
@@ -1499,7 +1464,7 @@ lazy_scan_prune(LVRelState *vacrel,
TransactionId debug_cutoff;
bool debug_all_frozen;
- Assert(lpdead_items == 0);
+ Assert(presult.lpdead_items == 0);
if (!heap_page_is_all_visible(vacrel, buf,
&debug_cutoff, &debug_all_frozen))
@@ -1515,7 +1480,7 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* Now save details of the LP_DEAD items from the page in vacrel
*/
- if (lpdead_items > 0)
+ if (presult.lpdead_items > 0)
{
VacDeadItems *dead_items = vacrel->dead_items;
ItemPointerData tmp;
@@ -1524,9 +1489,9 @@ lazy_scan_prune(LVRelState *vacrel,
ItemPointerSetBlockNumber(&tmp, blkno);
- for (int i = 0; i < lpdead_items; i++)
+ for (int i = 0; i < presult.lpdead_items; i++)
{
- ItemPointerSetOffsetNumber(&tmp, deadoffsets[i]);
+ ItemPointerSetOffsetNumber(&tmp, presult.deadoffsets[i]);
dead_items->items[dead_items->num_items++] = tmp;
}
@@ -1538,7 +1503,7 @@ lazy_scan_prune(LVRelState *vacrel,
/* Finally, add page-local counts to whole-VACUUM counts */
vacrel->tuples_deleted += presult.ndeleted;
vacrel->tuples_frozen += presult.nfrozen;
- vacrel->lpdead_items += lpdead_items;
+ vacrel->lpdead_items += presult.lpdead_items;
vacrel->live_tuples += presult.live_tuples;
vacrel->recently_dead_tuples += presult.recently_dead_tuples;
@@ -1547,7 +1512,7 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->nonempty_pages = blkno + 1;
/* Did we find LP_DEAD items? */
- *has_lpdead_items = (lpdead_items > 0);
+ *has_lpdead_items = (presult.lpdead_items > 0);
Assert(!presult.all_visible || !(*has_lpdead_items));
@@ -1615,7 +1580,7 @@ lazy_scan_prune(LVRelState *vacrel,
* There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
* however.
*/
- else if (lpdead_items > 0 && PageIsAllVisible(page))
+ else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
{
elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
vacrel->relname, blkno);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 23ba23b5b01..bdde17eb230 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -218,6 +218,8 @@ typedef struct PruneFreezeResult
/* New value of relminmxid found by heap_page_prune_and_freeze() */
MultiXactId new_relminmxid;
+ int lpdead_items; /* includes existing LP_DEAD items */
+ OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
} PruneFreezeResult;
/* ----------------
--
2.40.1
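As a quick illustration of the bookkeeping this patch introduces, here is a standalone sketch in plain C (not PostgreSQL code; MODEL_MAX_TUPLES_PER_PAGE and the Model* names are invented stand-ins). Pruning appends each LP_DEAD offset to a fixed-size per-page array, and the caller later turns those offsets into block-qualified item pointers:

/*
 * Standalone model of the dead-offset bookkeeping added by this patch:
 * pruning fills a fixed-size per-page array, and the caller copies it into
 * the shared dead-items list used for index vacuuming.
 */
#include <stdint.h>
#include <stdio.h>

#define MODEL_MAX_TUPLES_PER_PAGE 291	/* stand-in for MaxHeapTuplesPerPage */

typedef uint16_t ModelOffsetNumber;

typedef struct
{
	int			lpdead_items;	/* includes pre-existing LP_DEAD items */
	ModelOffsetNumber deadoffsets[MODEL_MAX_TUPLES_PER_PAGE];
} ModelPruneResult;

/* Called when pruning marks an item dead or finds an existing LP_DEAD item. */
static void
model_record_dead(ModelPruneResult *presult, ModelOffsetNumber offnum)
{
	presult->deadoffsets[presult->lpdead_items++] = offnum;
}

int
main(void)
{
	ModelPruneResult presult = {0};
	unsigned	blkno = 7;

	model_record_dead(&presult, 3);
	model_record_dead(&presult, 11);

	/* Caller turns per-page offsets into block-qualified item pointers. */
	for (int i = 0; i < presult.lpdead_items; i++)
		printf("dead item: block %u, offset %u\n",
			   blkno, (unsigned) presult.deadoffsets[i]);
	return 0;
}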
v4-0018-Initialize-xl_heap_prune-deserialization-variable.patch
From 89c1686279f59ac3e8536fcd6618291e3a22c199 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 19 Mar 2024 16:05:42 -0400
Subject: [PATCH v4 18/19] Initialize xl_heap_prune deserialization variables
Future commits will depend on these being initialized
---
src/backend/access/heap/heapam.c | 18 +++++++++---------
src/backend/access/rmgrdesc/heapdesc.c | 16 ++++++++--------
2 files changed, 17 insertions(+), 17 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 16bab55ba02..2ef7decdd05 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -8758,16 +8758,16 @@ heap_xlog_prune(XLogReaderState *record)
if (action == BLK_NEEDS_REDO)
{
Page page = (Page) BufferGetPage(buffer);
- OffsetNumber *redirected;
- OffsetNumber *nowdead;
- OffsetNumber *nowunused;
- int nredirected;
- int ndead;
- int nunused;
- int nplans;
+ OffsetNumber *redirected = NULL;
+ OffsetNumber *nowdead = NULL;
+ OffsetNumber *nowunused = NULL;
+ int nredirected = 0;
+ int ndead = 0;
+ int nunused = 0;
+ int nplans = 0;
Size datalen;
- xl_heap_freeze_plan *plans;
- OffsetNumber *frz_offsets;
+ xl_heap_freeze_plan *plans = NULL;
+ OffsetNumber *frz_offsets = NULL;
int curoff = 0;
nplans = xlrec->nplans;
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index ea03f902fc4..1fe5c78031f 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -185,15 +185,15 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
if (XLogRecHasBlockData(record, 0))
{
- OffsetNumber *redirected;
- OffsetNumber *nowdead;
- OffsetNumber *nowunused;
- int nredirected;
- int ndead;
- int nunused;
- int nplans;
Size datalen;
- xl_heap_freeze_plan *plans;
+ OffsetNumber *redirected = NULL;
+ OffsetNumber *nowdead = NULL;
+ OffsetNumber *nowunused = NULL;
+ int nredirected = 0;
+ int nunused = 0;
+ int ndead = 0;
+ int nplans = 0;
+ xl_heap_freeze_plan *plans = NULL;
OffsetNumber *frz_offsets;
nplans = xlrec->nplans;
--
2.40.1
v4-0019-Streamline-XLOG_HEAP2_PRUNE-record-and-use-for-fr.patch
From 74411532903ec17cc9c29e17786fdd35ba1b0eac Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 19 Mar 2024 20:48:28 -0400
Subject: [PATCH v4 19/19] Streamline XLOG_HEAP2_PRUNE record and use for
freeze and vacuum
xl_heap_prune struct for the XLOG_HEAP2_PRUNE record type had members
for counting the number of freeze plans and number of redirected, dead,
and newly unused line pointers. However, only some of those are used in
many XLOG_HEAP2_PRUNE records.
Put all of those members in the XLOG buffer data and use flags to
indicate which are present. This makes it feasible to make the
XLOG_HEAP2_PRUNE record smaller than it was, and smaller than the
previously used XLOG_HEAP2_VACUUM and XLOG_HEAP2_FREEZE_PAGE records.
The snapshot conflict horizon is not used for vacuum's second pass but is
used for the other record types. It must go in the main data (not the per-buffer data) so
that it can be used even if the record contains an FPI.
The new prune record is composed of sub-records for each type of
modification: freezing tuples and setting line pointers unused, redirected,
or dead.
---
src/backend/access/heap/heapam.c | 33 +++----
src/backend/access/heap/pruneheap.c | 98 ++++++++++++++-----
src/backend/access/heap/vacuumlazy.c | 24 +++--
src/backend/access/rmgrdesc/heapdesc.c | 27 ++---
src/backend/access/rmgrdesc/xactdesc.c | 68 +++++++++++++
src/include/access/heapam_xlog.h | 130 +++++++++++++++----------
src/tools/pgindent/typedefs.list | 3 +
7 files changed, 273 insertions(+), 110 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 2ef7decdd05..adc259fdca7 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -8728,15 +8728,15 @@ heap_xlog_prune(XLogReaderState *record)
* heap_page_prune_and_freeze(), heap_page_prune_execute() will call
* PageRepairFragementation() which expects a full cleanup lock.
*/
- get_cleanup_lock = xlrec->nredirected > 0 ||
- xlrec->ndead > 0 ||
- (xlrec->nunused > 0 && !lp_truncate_only);
+ get_cleanup_lock = xlrec->flags & XLHP_HAS_REDIRECTIONS ||
+ xlrec->flags & XLHP_HAS_DEAD_ITEMS ||
+ (xlrec->flags & XLHP_HAS_NOW_UNUSED_ITEMS && !lp_truncate_only);
if (lp_truncate_only)
{
- Assert(xlrec->nredirected == 0);
- Assert(xlrec->ndead == 0);
- Assert(xlrec->nunused > 0);
+ Assert(!(xlrec->flags & XLHP_HAS_REDIRECTIONS));
+ Assert(!(xlrec->flags & XLHP_HAS_DEAD_ITEMS));
+ Assert(xlrec->flags & XLHP_HAS_NOW_UNUSED_ITEMS);
}
/*
@@ -8745,9 +8745,13 @@ heap_xlog_prune(XLogReaderState *record)
* tuples are still visible or which consider the frozen xids as running.
*/
if (xlrec->flags & XLHP_HAS_CONFLICT_HORIZON && InHotStandby)
- ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
- xlrec->isCatalogRel,
+ {
+ xlhp_conflict_horizon *horizon = (xlhp_conflict_horizon *) (xlrec + SizeOfHeapPrune);
+
+ ResolveRecoveryConflictWithSnapshot(horizon->xid,
+ xlrec->flags & XLHP_IS_CATALOG_REL,
rlocator);
+ }
/*
* If we have a full-page image, restore it and we're done.
@@ -8770,16 +8774,11 @@ heap_xlog_prune(XLogReaderState *record)
OffsetNumber *frz_offsets = NULL;
int curoff = 0;
- nplans = xlrec->nplans;
- nredirected = xlrec->nredirected;
- ndead = xlrec->ndead;
- nunused = xlrec->nunused;
+ char *cursor = XLogRecGetBlockData(record, 0, &datalen);
- plans = (xl_heap_freeze_plan *) XLogRecGetBlockData(record, 0, &datalen);
- redirected = (OffsetNumber *) &plans[nplans];
- nowdead = redirected + (nredirected * 2);
- nowunused = nowdead + ndead;
- frz_offsets = nowunused + nunused;
+ heap_xlog_deserialize_prune_and_freeze(cursor, xlrec->flags,
+ &nredirected, &redirected, &ndead, &nowdead,
+ &nunused, &nowunused, &nplans, &plans, &frz_offsets);
/* Update all line pointers per the record, and repair fragmentation */
if (nredirected > 0 || ndead > 0 || nunused > 0)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 5decb1127d0..26147f63c4c 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -741,19 +741,22 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
PruneState *prstate, PruneFreezeResult *presult)
{
xl_heap_prune xlrec;
+ xlhp_conflict_horizon horizon;
XLogRecPtr recptr;
+ xlhp_freeze freeze;
+ xlhp_prune_items redirect,
+ dead,
+ unused;
+ int nplans = 0;
xl_heap_freeze_plan plans[MaxHeapTuplesPerPage];
- OffsetNumber offsets[MaxHeapTuplesPerPage];
- bool do_freeze = presult->nfrozen > 0;
+ OffsetNumber frz_offsets[MaxHeapTuplesPerPage];
+ bool do_freeze = (presult->nfrozen > 0);
xlrec.flags = 0;
- xlrec.isCatalogRel = RelationIsAccessibleInLogicalDecoding(relation);
- xlrec.nredirected = prstate->nredirected;
- xlrec.ndead = prstate->ndead;
- xlrec.nunused = prstate->nunused;
- xlrec.nplans = 0;
+ if (RelationIsAccessibleInLogicalDecoding(relation))
+ xlrec.flags |= XLHP_IS_CATALOG_REL;
xlrec.flags |= XLHP_HAS_CONFLICT_HORIZON;
@@ -767,22 +770,25 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
* youngest tuple this record will freeze will conflict.
*/
if (do_freeze)
- xlrec.snapshotConflictHorizon = Max(prstate->snapshotConflictHorizon,
- presult->frz_conflict_horizon);
+ horizon.xid = Max(prstate->snapshotConflictHorizon,
+ presult->frz_conflict_horizon);
else
- xlrec.snapshotConflictHorizon = prstate->snapshotConflictHorizon;
+ horizon.xid = prstate->snapshotConflictHorizon;
/*
* Prepare deduplicated representation for use in WAL record Destructively
* sorts tuples array in-place.
*/
if (do_freeze)
- xlrec.nplans = heap_log_freeze_plan(prstate->frozen,
- presult->nfrozen, plans, offsets);
-
+ nplans = heap_log_freeze_plan(prstate->frozen,
+ presult->nfrozen, plans,
+ frz_offsets);
XLogBeginInsert();
XLogRegisterData((char *) &xlrec, SizeOfHeapPrune);
+ xlrec.flags |= XLHP_HAS_CONFLICT_HORIZON;
+ XLogRegisterData((char *) &horizon, SizeOfSnapshotConflictHorizon);
+
XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
/*
@@ -790,25 +796,73 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
* that they are. When XLogInsert stores the whole buffer, the offset
* arrays need not be stored too.
*/
- if (xlrec.nplans > 0)
+ if (nplans > 0)
+ {
+ xlrec.flags |= XLHP_HAS_FREEZE_PLANS;
+
+ freeze = (xlhp_freeze)
+ {
+ .nplans = nplans
+ };
+
+ XLogRegisterBufData(0, (char *) &freeze, offsetof(xlhp_freeze, plans));
+
XLogRegisterBufData(0, (char *) plans,
- xlrec.nplans * sizeof(xl_heap_freeze_plan));
+ sizeof(xl_heap_freeze_plan) * freeze.nplans);
+ }
+
if (prstate->nredirected > 0)
+ {
+ xlrec.flags |= XLHP_HAS_REDIRECTIONS;
+
+ redirect = (xlhp_prune_items)
+ {
+ .ntargets = prstate->nredirected
+ };
+
+ XLogRegisterBufData(0, (char *) &redirect,
+ offsetof(xlhp_prune_items, data));
+
XLogRegisterBufData(0, (char *) prstate->redirected,
- prstate->nredirected *
- sizeof(OffsetNumber) * 2);
+ sizeof(OffsetNumber[2]) * prstate->nredirected);
+ }
if (prstate->ndead > 0)
+ {
+ xlrec.flags |= XLHP_HAS_DEAD_ITEMS;
+
+ dead = (xlhp_prune_items)
+ {
+ .ntargets = prstate->ndead
+ };
+
+ XLogRegisterBufData(0, (char *) &dead,
+ offsetof(xlhp_prune_items, data));
+
XLogRegisterBufData(0, (char *) prstate->nowdead,
- prstate->ndead * sizeof(OffsetNumber));
+ sizeof(OffsetNumber) * dead.ntargets);
+ }
if (prstate->nunused > 0)
+ {
+ xlrec.flags |= XLHP_HAS_NOW_UNUSED_ITEMS;
+
+ unused = (xlhp_prune_items)
+ {
+ .ntargets = prstate->nunused
+ };
+
+ XLogRegisterBufData(0, (char *) &unused,
+ offsetof(xlhp_prune_items, data));
+
XLogRegisterBufData(0, (char *) prstate->nowunused,
- prstate->nunused * sizeof(OffsetNumber));
- if (xlrec.nplans > 0)
- XLogRegisterBufData(0, (char *) offsets,
- presult->nfrozen * sizeof(OffsetNumber));
+ sizeof(OffsetNumber) * unused.ntargets);
+ }
+
+ if (nplans > 0)
+ XLogRegisterBufData(0, (char *) frz_offsets,
+ sizeof(OffsetNumber) * presult->nfrozen);
recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_PRUNE);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 295846b854f..1e79cbbb107 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2253,21 +2253,27 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
if (RelationNeedsWAL(vacrel->rel))
{
xl_heap_prune xlrec;
+ xlhp_prune_items unused_rec;
XLogRecPtr recptr;
- xlrec.flags = XLHP_LP_TRUNCATE_ONLY;
- xlrec.snapshotConflictHorizon = InvalidTransactionId;
- xlrec.nplans = 0;
- xlrec.nredirected = 0;
- xlrec.ndead = 0;
- xlrec.nunused = nunused;
- xlrec.isCatalogRel = RelationIsAccessibleInLogicalDecoding(vacrel->rel);
+ xlrec.flags = XLHP_HAS_NOW_UNUSED_ITEMS;
+
+ xlrec.flags |= XLHP_LP_TRUNCATE_ONLY;
+
+ unused_rec = (xlhp_prune_items)
+ {
+ .ntargets = nunused
+ };
XLogBeginInsert();
XLogRegisterData((char *) &xlrec, SizeOfHeapPrune);
-
XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
- XLogRegisterBufData(0, (char *) unused, nunused * sizeof(OffsetNumber));
+
+ XLogRegisterBufData(0, (char *) &unused_rec,
+ offsetof(xlhp_prune_items, data));
+
+ XLogRegisterBufData(0, (char *) unused,
+ sizeof(OffsetNumber) * unused_rec.ntargets);
recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_PRUNE);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 1fe5c78031f..f542bcb94b6 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -179,9 +179,16 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
{
xl_heap_prune *xlrec = (xl_heap_prune *) rec;
- appendStringInfo(buf, "snapshotConflictHorizon: %u, isCatalogRel: %c",
- xlrec->snapshotConflictHorizon,
- xlrec->isCatalogRel ? 'T' : 'F');
+ if (xlrec->flags & XLHP_HAS_CONFLICT_HORIZON)
+ {
+ xlhp_conflict_horizon *horizon = (xlhp_conflict_horizon *) (xlrec + SizeOfHeapPrune);
+
+ appendStringInfo(buf, "snapshotConflictHorizon: %u",
+ horizon->xid);
+ }
+
+ appendStringInfo(buf, ", isCatalogRel: %c",
+ xlrec->flags & XLHP_IS_CATALOG_REL ? 'T' : 'F');
if (XLogRecHasBlockData(record, 0))
{
@@ -196,16 +203,12 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
xl_heap_freeze_plan *plans = NULL;
OffsetNumber *frz_offsets;
- nplans = xlrec->nplans;
- nredirected = xlrec->nredirected;
- ndead = xlrec->ndead;
- nunused = xlrec->nunused;
+ char *cursor = XLogRecGetBlockData(record, 0, &datalen);
+
+ heap_xlog_deserialize_prune_and_freeze(cursor, xlrec->flags,
+ &nredirected, &redirected, &ndead, &nowdead,
+ &nunused, &nowunused, &nplans, &plans, &frz_offsets);
- plans = (xl_heap_freeze_plan *) XLogRecGetBlockData(record, 0, &datalen);
- redirected = (OffsetNumber *) &plans[nplans];
- nowdead = redirected + (nredirected * 2);
- nowunused = nowdead + ndead;
- frz_offsets = nowunused + nunused;
appendStringInfo(buf, ", nredirected: %u, ndead: %u, nunused: %u, nplans: %u,",
nredirected,
diff --git a/src/backend/access/rmgrdesc/xactdesc.c b/src/backend/access/rmgrdesc/xactdesc.c
index 41b842d80ec..e120805e5e0 100644
--- a/src/backend/access/rmgrdesc/xactdesc.c
+++ b/src/backend/access/rmgrdesc/xactdesc.c
@@ -14,6 +14,7 @@
*/
#include "postgres.h"
+#include "access/heapam_xlog.h"
#include "access/transam.h"
#include "access/xact.h"
#include "replication/origin.h"
@@ -21,6 +22,73 @@
#include "storage/standbydefs.h"
#include "utils/timestamp.h"
+/*
+ * Given a MAXALIGNed buffer returned by XLogRecGetBlockData() and pointed to
+ * by cursor and any xl_heap_prune flags, deserialize the arrays of
+ * OffsetNumbers contained in an xl_heap_prune record. This is in this file so
+ * it can be shared between heap2_redo and heap2_desc code, the latter of which
+ * is used in frontend code.
+ */
+void
+heap_xlog_deserialize_prune_and_freeze(char *cursor, uint8 flags,
+ int *nredirected, OffsetNumber **redirected,
+ int *ndead, OffsetNumber **nowdead,
+ int *nunused, OffsetNumber **nowunused,
+ int *nplans, xl_heap_freeze_plan **plans,
+ OffsetNumber **frz_offsets)
+{
+ if (flags & XLHP_HAS_FREEZE_PLANS)
+ {
+ xlhp_freeze *freeze = (xlhp_freeze *) cursor;
+
+ *nplans = freeze->nplans;
+ Assert(*nplans > 0);
+ *plans = freeze->plans;
+
+ cursor += offsetof(xlhp_freeze, plans);
+ cursor += sizeof(xl_heap_freeze_plan) * freeze->nplans;
+ }
+
+ if (flags & XLHP_HAS_REDIRECTIONS)
+ {
+ xlhp_prune_items *subrecord = (xlhp_prune_items *) cursor;
+
+ *nredirected = subrecord->ntargets;
+ Assert(*nredirected > 0);
+ *redirected = &subrecord->data[0];
+
+ cursor += offsetof(xlhp_prune_items, data);
+ cursor += sizeof(OffsetNumber[2]) * *nredirected;
+ }
+
+ if (flags & XLHP_HAS_DEAD_ITEMS)
+ {
+ xlhp_prune_items *subrecord = (xlhp_prune_items *) cursor;
+
+ *ndead = subrecord->ntargets;
+ Assert(*ndead > 0);
+ *nowdead = subrecord->data;
+
+ cursor += offsetof(xlhp_prune_items, data);
+ cursor += sizeof(OffsetNumber) * *ndead;
+ }
+
+ if (flags & XLHP_HAS_NOW_UNUSED_ITEMS)
+ {
+ xlhp_prune_items *subrecord = (xlhp_prune_items *) cursor;
+
+ *nunused = subrecord->ntargets;
+ Assert(*nunused > 0);
+ *nowunused = subrecord->data;
+
+ cursor += offsetof(xlhp_prune_items, data);
+ cursor += sizeof(OffsetNumber) * *nunused;
+ }
+
+ if (*nplans > 0)
+ *frz_offsets = (OffsetNumber *) cursor;
+}
+
/*
* Parse the WAL format of an xact commit and abort records into an easier to
* understand format.
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 2393540cf68..dfeb703d136 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -224,9 +224,64 @@ typedef struct xl_heap_update
#define SizeOfHeapUpdate (offsetof(xl_heap_update, new_offnum) + sizeof(OffsetNumber))
+/*
+ * This is what we need to know about page pruning and freezing, both during
+ * VACUUM and during opportunistic pruning.
+ *
+ * If XLHP_HAS_REDIRECTIONS, XLHP_HAS_DEAD_ITEMS, or XLHP_HAS_NOW_UNUSED_ITEMS
+ * is set, replay acquires a full cleanup lock. Otherwise an ordinary exclusive
+ * lock is enough. The latter can happen if freezing was the only modification
+ * to the page.
+ *
+ * The data for block reference 0 contains "sub-records" depending on which
+ * of the XLHP_HAS_* flags are set. See xlhp_* struct definitions below.
+ * The layout is in the same order as the XLHP_* flags.
+ *
+ * OFFSET NUMBERS are in the block reference 0
+ *
+ * If only unused item offsets are included because the record is constructed
+ * during vacuum's second pass (marking LP_DEAD items LP_UNUSED) then only an
+ * ordinary exclusive lock is required to replay.
+ */
+typedef struct xl_heap_prune
+{
+ uint8 flags;
+} xl_heap_prune;
+
+/* to handle recovery conflict during logical decoding on standby */
+#define XLHP_IS_CATALOG_REL (1 << 1)
+
+/*
+ * During vacuum's second pass which sets LP_DEAD items LP_UNUSED, we will only
+ * truncate the line pointer array, not call PageRepairFragmentation. We need
+ * this flag to differentiate what kind of lock (exclusive or cleanup) to take
+ * on the buffer and whether to call PageTruncateLinePointerArray() or
+ * PageRepairFragementation().
+ */
+#define XLHP_LP_TRUNCATE_ONLY (1 << 2)
+
+/*
+ * Vacuum's first pass and on-access pruning may need to include a snapshot
+ * conflict horizon.
+ */
+#define XLHP_HAS_CONFLICT_HORIZON (1 << 3)
+#define XLHP_HAS_FREEZE_PLANS (1 << 4)
+#define XLHP_HAS_REDIRECTIONS (1 << 5)
+#define XLHP_HAS_DEAD_ITEMS (1 << 6)
+#define XLHP_HAS_NOW_UNUSED_ITEMS (1 << 7)
+
+#define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint8))
+
+typedef struct xlhp_conflict_horizon
+{
+ TransactionId xid;
+} xlhp_conflict_horizon;
+
+#define SizeOfSnapshotConflictHorizon (offsetof(xlhp_conflict_horizon, xid) + sizeof(uint32))
+
/*
* This struct represents a 'freeze plan', which describes how to freeze a
- * group of one or more heap tuples (appears in xl_heap_prune record)
+ * group of one or more heap tuples (appears in xl_heap_prune's xlhp_freeze
+ * record)
*/
/* 0x01 was XLH_FREEZE_XMIN */
#define XLH_FREEZE_XVAC 0x02
@@ -246,64 +301,32 @@ typedef struct xl_heap_freeze_plan
/*
* As of Postgres 17, XLOG_HEAP2_PRUNE records replace
* XLOG_HEAP2_FREEZE_PAGE records.
- */
-
-/*
- * This is what we need to know about page pruning (both during VACUUM and
- * during opportunistic pruning)
*
- * The array of OffsetNumbers following the fixed part of the record contains:
- * * for each freeze plan: the freeze plan
- * * for each redirected item: the item offset, then the offset redirected to
- * * for each now-dead item: the item offset
- * * for each now-unused item: the item offset
- * * for each tuple frozen by the freeze plans: the offset of the item corresponding to that tuple
- * The total number of OffsetNumbers is therefore
- * (2*nredirected) + ndead + nunused + (sum[plan.ntuples for plan in plans])
+ * This is what we need to know about a block being frozen during vacuum
*
- * Acquires a full cleanup lock if heap_page_prune_execute() must be called
+ * Backup block 0's data contains an array of xl_heap_freeze_plan structs
+ * (with nplans elements), followed by one or more page offset number arrays.
+ * Each such page offset number array corresponds to a single freeze plan
+ * (REDO routine freezes corresponding heap tuples using freeze plan).
*/
-typedef struct xl_heap_prune
+typedef struct xlhp_freeze
{
- uint8 flags;
- TransactionId snapshotConflictHorizon;
uint16 nplans;
- uint16 nredirected;
- uint16 ndead;
- uint16 nunused;
- bool isCatalogRel; /* to handle recovery conflict during logical
- * decoding on standby */
- /*--------------------------------------------------------------------
- * OFFSET NUMBERS and freeze plans are in the block reference 0 in the
- * following order:
- *
- * * xl_heap_freeze_plan plans[nplans];
- * * OffsetNumber redirected[2 * nredirected];
- * * OffsetNumber nowdead[ndead];
- * * OffsetNumber nowunused[nunused];
- * * OffsetNumber frz_offsets[...];
- *--------------------------------------------------------------------
- */
-} xl_heap_prune;
-
-#define SizeOfHeapPrune (offsetof(xl_heap_prune, isCatalogRel) + sizeof(bool))
-
-/* Flags for xl_heap_prune */
+ xl_heap_freeze_plan plans[FLEXIBLE_ARRAY_MEMBER];
+} xlhp_freeze;
/*
- * During vacuum's second pass which sets LP_DEAD items LP_UNUSED, we will only
- * truncate the line pointer array, not call PageRepairFragmentation. We need
- * this flag to differentiate what kind of lock (exclusive or cleanup) to take
- * on the buffer and whether to call PageTruncateLinePointerArray() or
- * PageRepairFragementation().
+ * Sub-record type contained in block reference 0 of a prune record if
+ * XLHP_HAS_REDIRECTIONS/XLHP_HAS_DEAD_ITEMS/XLHP_HAS_NOW_UNUSED_ITEMS is set.
+ * Note that in the XLHP_HAS_REDIRECTIONS variant, there are actually 2 *
+ * length number of OffsetNumbers in the data.
*/
-#define XLHP_LP_TRUNCATE_ONLY (1 << 1)
+typedef struct xlhp_prune_items
+{
+ uint16 ntargets;
+ OffsetNumber data[FLEXIBLE_ARRAY_MEMBER];
+} xlhp_prune_items;
-/*
- * Vacuum's first pass and on-access pruning may need to include a snapshot
- * conflict horizon.
- */
-#define XLHP_HAS_CONFLICT_HORIZON (1 << 2)
/* flags for infobits_set */
#define XLHL_XMAX_IS_MULTI 0x01
@@ -416,4 +439,11 @@ extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
TransactionId snapshotConflictHorizon,
uint8 vmflags);
+extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint8 flags,
+ int *nredirected, OffsetNumber **redirected,
+ int *ndead, OffsetNumber **nowdead,
+ int *nunused, OffsetNumber **nowunused,
+ int *nplans, xl_heap_freeze_plan **plans,
+ OffsetNumber **frz_offsets);
+
#endif /* HEAPAM_XLOG_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index b2ddc1e2549..40fb694c836 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4007,6 +4007,9 @@ xl_xact_stats_items
xl_xact_subxacts
xl_xact_twophase
xl_xact_xinfo
+xlhp_conflict_horizon
+xlhp_freeze
+xlhp_prune_items
xmlBuffer
xmlBufferPtr
xmlChar
--
2.40.1
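To sketch how the flag-gated layout above hangs together, here is a standalone model in plain C (not PostgreSQL code; the MODEL_* flags and model_* functions are invented stand-ins, and freeze plans, alignment, and the WAL machinery are left out). A writer emits a count-then-offsets sub-record for each flag it sets, and a reader walks them in the same fixed order, as heap_xlog_deserialize_prune_and_freeze() does in the patch:

/*
 * Standalone model of the cursor-based layout used by the reworked prune
 * record: a flags byte says which variable-length "sub-records" follow in
 * the block data, and a reader walks them in a fixed order.
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MODEL_HAS_REDIRECTIONS		(1 << 5)
#define MODEL_HAS_DEAD_ITEMS		(1 << 6)
#define MODEL_HAS_NOW_UNUSED_ITEMS	(1 << 7)

typedef uint16_t ModelOffsetNumber;

/*
 * Read one xlhp_prune_items-style sub-record: a uint16 count followed by
 * count * offsets_per_target offset numbers.  Returns the advanced cursor.
 */
static const char *
model_read_items(const char *cursor, int offsets_per_target, const char *label)
{
	uint16_t	ntargets;

	memcpy(&ntargets, cursor, sizeof(ntargets));
	cursor += sizeof(ntargets);

	printf("%s:", label);
	for (int i = 0; i < ntargets * offsets_per_target; i++)
	{
		ModelOffsetNumber off;

		memcpy(&off, cursor, sizeof(off));
		cursor += sizeof(off);
		printf(" %u", (unsigned) off);
	}
	printf("\n");
	return cursor;
}

static void
model_deserialize(const char *cursor, uint8_t flags)
{
	/* Sub-records appear in the same order as the flag bits. */
	if (flags & MODEL_HAS_REDIRECTIONS)
		cursor = model_read_items(cursor, 2, "redirected (from, to)");
	if (flags & MODEL_HAS_DEAD_ITEMS)
		cursor = model_read_items(cursor, 1, "now dead");
	if (flags & MODEL_HAS_NOW_UNUSED_ITEMS)
		cursor = model_read_items(cursor, 1, "now unused");
}

int
main(void)
{
	/* Serialize a record with one redirection and two dead items. */
	char		buf[64];
	char	   *p = buf;
	uint8_t		flags = MODEL_HAS_REDIRECTIONS | MODEL_HAS_DEAD_ITEMS;
	uint16_t	nredirected = 1;
	ModelOffsetNumber redirected[] = {3, 7};
	uint16_t	ndead = 2;
	ModelOffsetNumber dead[] = {4, 9};

	memcpy(p, &nredirected, sizeof(nredirected));
	p += sizeof(nredirected);
	memcpy(p, redirected, sizeof(redirected));
	p += sizeof(redirected);
	memcpy(p, &ndead, sizeof(ndead));
	p += sizeof(ndead);
	memcpy(p, dead, sizeof(dead));

	model_deserialize(buf, flags);
	return 0;
}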
On 20/03/2024 03:36, Melanie Plageman wrote:
On Mon, Mar 18, 2024 at 01:15:21AM +0200, Heikki Linnakangas wrote:
On 15/03/2024 02:56, Melanie Plageman wrote:
Okay, so I was going to start using xl_heap_prune for vacuum here too,
but I realized it would be bigger because of the
snapshotConflictHorizon. Do you think there is a non-terrible way to
make the snapshotConflictHorizon optional? Like with a flag?
Yeah, another flag would do the trick.
Okay, I've done this in attached v4 (including removing
XLOG_HEAP2_VACUUM). I had to put the snapshot conflict horizon in the
"main chunk" of data available at replay regardless of whether or not
the record ended up including an FPI.
I made it its own sub-record (xlhp_conflict_horizon) less to help with
alignment (though we can use all the help we can get there) and more to
keep it from getting lost. When you look at heapam_xlog.h, you can see
what a XLOG_HEAP2_PRUNE record will contain starting with the
xl_heap_prune struct and then all the sub-record types.
Ok, now that I look at this, I wonder if we're being overly cautious
about the WAL size. We probably could just always include the snapshot
field, and set it to InvalidTransactionId and waste 4 bytes when it's
not needed. For the sake of simplicity. I don't feel strongly either way
though, the flag is pretty simple too.
xl_heap_prune->flags is a uint8, but we are already using 7 of the bits.
Should we make it a uint16?
It doesn't matter much either way. We can also make it larger when we
need more bits; there's no need to make room for them beforehand.
Eventually, I would like to avoid emitting a separate XLOG_HEAP2_VISIBLE
record for vacuum's first and second passes and just include the VM
update flags in the xl_heap_prune record. xl_heap_visible->flags is a
uint8. If we made xl_heap_prune->flags uint16, we could probably combine
them (though maybe we want other bits available). Also vacuum's second
pass doesn't set a snapshotConflictHorizon, so if we combined
xl_heap_visible and xl_heap_prune for vacuum we would end up saving even
more space (since vacuum sets xl_heap_visible->snapshotConflictHorizon
to InvalidTransactionId).
Makes sense.
A note on sub-record naming: I kept xl_heap_freeze_plan's name but
prefixed the other sub-records with xlhp. Do you think it is worth
renaming it (to xlhp_freeze_plan)?
Yeah, perhaps.
Also, should I change xlhp_freeze to xlhp_freeze_page?
I renamed it to xlhp_freeze_plans, for some consistency with
xlhp_prune_items.
I realized that the WAL record format changes are pretty independent
from the rest of the patches. They could be applied before the rest.
Without the rest of the changes, we'll still write two WAL records per
page in vacuum, one to prune and another one to freeze, but it's another
meaningful incremental step. So I reshuffled the patches, so that the
WAL format is changed first, before the rest of the changes.
0001-0008: These are the WAL format changes. There's some comment
cleanup needed, but as far as the code goes, I think these are pretty
much ready to be squashed & committed.
0009-: The rest of the v4 patches, rebased over the WAL format changes.
I also added a few small commits for little cleanups that caught my eye,
let me know if you disagree with those.
--
Heikki Linnakangas
Neon (https://neon.tech)
Attachments:
v5-0007-Add-comment-to-log_heap_prune_and_freeze.patch
From 8af186ee9dd8c7dc20f37a69b34cab7b95faa43b Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed, 20 Mar 2024 14:03:06 +0200
Subject: [PATCH v5 07/26] Add comment to log_heap_prune_and_freeze().
XXX: This should be rewritten, but I tried to at least list some
important points.
---
src/backend/access/heap/pruneheap.c | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 6482d9d05c1..cd86a6262f5 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1274,6 +1274,25 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
return nplans;
}
+/*
+ * Write a XLOG_HEAP2_PRUNE_FREEZE WAL record
+ *
+ * This is used for several different page maintenance operations:
+ *
+ * Page pruning: some items are redirected, some marked dead, some removed altogether
+ *
+ * Freezing: only 'frozen' is used
+ *
+ * Vacuum, 2nd pass: only 'unused' is used, and lp_truncate_only is set to true.
+ *
+ * They have enough commonalities that we use a single WAL record for them
+ * all.
+ *
+ * Note: This function scribbles on the 'frozen' array.
+ *
+ * The caller must hold an appropriate lock on 'buffer'.
+ * Note: This is called in a critical section, so careful what you do here.
+ */
void
log_heap_prune_and_freeze(Relation relation, Buffer buffer,
TransactionId conflict_xid,
--
2.39.2
v5-0008-minor-refactoring-in-log_heap_prune_and_freeze.patch
From b26e36ba8614d907a6e15810ed4f684f8f628dd2 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed, 20 Mar 2024 14:53:31 +0200
Subject: [PATCH v5 08/26] minor refactoring in log_heap_prune_and_freeze()
Mostly to make local variables more tightly-scoped.
---
src/backend/access/heap/pruneheap.c | 80 +++++++++++---------------
src/backend/access/rmgrdesc/heapdesc.c | 10 ++--
src/include/access/heapam_xlog.h | 4 +-
src/tools/pgindent/typedefs.list | 2 +
4 files changed, 44 insertions(+), 52 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index cd86a6262f5..d8be0f68bf9 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1304,42 +1304,22 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
{
xl_heap_prune xlrec;
XLogRecPtr recptr;
- xlhp_freeze freeze;
- xlhp_prune_items redirect_items,
- dead_items,
- unused_items;
-
- int nplans = 0;
+ int nplans;
xl_heap_freeze_plan plans[MaxHeapTuplesPerPage];
OffsetNumber frz_offsets[MaxHeapTuplesPerPage];
- bool do_freeze = (nfrozen > 0);
xlrec.flags = 0;
- if (lp_truncate_only)
- {
- xlrec.flags |= XLHP_LP_TRUNCATE_ONLY;
- Assert(nfrozen == 0 && nredirected == 0 && ndead == 0);
- }
-
- if (RelationIsAccessibleInLogicalDecoding(relation))
- xlrec.flags |= XLHP_IS_CATALOG_REL;
-
- if (TransactionIdIsValid(conflict_xid))
- xlrec.flags |= XLHP_HAS_CONFLICT_HORIZON;
-
/*
* Prepare deduplicated representation for use in WAL record Destructively
* sorts tuples array in-place.
*/
- if (do_freeze)
+ if (nfrozen > 0)
nplans = heap_log_freeze_plan(frozen, nfrozen, plans, frz_offsets);
- XLogBeginInsert();
- XLogRegisterData((char *) &xlrec, SizeOfHeapPrune);
-
- if (TransactionIdIsValid(conflict_xid))
- XLogRegisterData((char *) &conflict_xid, sizeof(TransactionId));
+ else
+ nplans = 0;
+ XLogBeginInsert();
XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
/*
@@ -1349,45 +1329,40 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
*/
if (nplans > 0)
{
- xlrec.flags |= XLHP_HAS_FREEZE_PLANS;
-
- freeze = (xlhp_freeze)
- {
+ xlhp_freeze_plans freeze_plans = {
.nplans = nplans
};
- XLogRegisterBufData(0, (char *) &freeze, offsetof(xlhp_freeze, plans));
+ xlrec.flags |= XLHP_HAS_FREEZE_PLANS;
+ XLogRegisterBufData(0, (char *) &freeze_plans,
+ offsetof(xlhp_freeze_plans, plans));
XLogRegisterBufData(0, (char *) plans,
- sizeof(xl_heap_freeze_plan) * freeze.nplans);
+ sizeof(xl_heap_freeze_plan) * nplans);
}
-
if (nredirected > 0)
{
- xlrec.flags |= XLHP_HAS_REDIRECTIONS;
-
- redirect_items = (xlhp_prune_items)
- {
+ xlhp_prune_items redirect_items = {
.ntargets = nredirected
};
+ xlrec.flags |= XLHP_HAS_REDIRECTIONS;
+
XLogRegisterBufData(0, (char *) &redirect_items,
offsetof(xlhp_prune_items, data));
-
XLogRegisterBufData(0, (char *) redirected,
sizeof(OffsetNumber[2]) * nredirected);
}
if (ndead > 0)
{
- xlrec.flags |= XLHP_HAS_DEAD_ITEMS;
-
- dead_items = (xlhp_prune_items)
- {
+ xlhp_prune_items dead_items = {
.ntargets = ndead
};
+ xlrec.flags |= XLHP_HAS_DEAD_ITEMS;
+
XLogRegisterBufData(0, (char *) &dead_items,
offsetof(xlhp_prune_items, data));
@@ -1397,13 +1372,12 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
if (nunused > 0)
{
- xlrec.flags |= XLHP_HAS_NOW_UNUSED_ITEMS;
-
- unused_items = (xlhp_prune_items)
- {
+ xlhp_prune_items unused_items = {
.ntargets = nunused
};
+ xlrec.flags |= XLHP_HAS_NOW_UNUSED_ITEMS;
+
XLogRegisterBufData(0, (char *) &unused_items,
offsetof(xlhp_prune_items, data));
@@ -1415,6 +1389,22 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
XLogRegisterBufData(0, (char *) frz_offsets,
sizeof(OffsetNumber) * nfrozen);
+ if (lp_truncate_only)
+ {
+ Assert(nfrozen == 0 && nredirected == 0 && ndead == 0);
+ xlrec.flags |= XLHP_LP_TRUNCATE_ONLY;
+ }
+
+ if (RelationIsAccessibleInLogicalDecoding(relation))
+ xlrec.flags |= XLHP_IS_CATALOG_REL;
+
+ if (TransactionIdIsValid(conflict_xid))
+ xlrec.flags |= XLHP_HAS_CONFLICT_HORIZON;
+
+ XLogRegisterData((char *) &xlrec, SizeOfHeapPrune);
+ if (TransactionIdIsValid(conflict_xid))
+ XLogRegisterData((char *) &conflict_xid, sizeof(TransactionId));
+
recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_PRUNE_FREEZE);
PageSetLSN(BufferGetPage(buffer), recptr);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index ff238d58279..7b5d13ffed3 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -109,14 +109,14 @@ heap_xlog_deserialize_prune_and_freeze(char *cursor, uint8 flags,
{
if (flags & XLHP_HAS_FREEZE_PLANS)
{
- xlhp_freeze *freeze = (xlhp_freeze *) cursor;
+ xlhp_freeze_plans *freeze_plans = (xlhp_freeze_plans *) cursor;
- *nplans = freeze->nplans;
+ *nplans = freeze_plans->nplans;
Assert(*nplans > 0);
- *plans = freeze->plans;
+ *plans = freeze_plans->plans;
- cursor += offsetof(xlhp_freeze, plans);
- cursor += sizeof(xl_heap_freeze_plan) * freeze->nplans;
+ cursor += offsetof(xlhp_freeze_plans, plans);
+ cursor += sizeof(xl_heap_freeze_plan) * freeze_plans->nplans;
}
if (flags & XLHP_HAS_REDIRECTIONS)
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index f0cbd31189e..9d64eaf3933 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -309,11 +309,11 @@ typedef struct xl_heap_freeze_plan
* Each such page offset number array corresponds to a single freeze plan
* (REDO routine freezes corresponding heap tuples using freeze plan).
*/
-typedef struct xlhp_freeze
+typedef struct xlhp_freeze_plans
{
uint16 nplans;
xl_heap_freeze_plan plans[FLEXIBLE_ARRAY_MEMBER];
-} xlhp_freeze;
+} xlhp_freeze_plans;
/*
* Sub-record type contained in block reference 0 of a prune record if
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index e294f8bc4e6..6a46b34c5ca 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4008,6 +4008,8 @@ xl_xact_stats_items
xl_xact_subxacts
xl_xact_twophase
xl_xact_xinfo
+xlhp_freeze_plans
+xlhp_prune_items
xmlBuffer
xmlBufferPtr
xmlChar
--
2.39.2
v5-0009-lazy_scan_prune-tests-tuple-vis-with-GlobalVisTes.patch
From 608dc122b7b41ea7617233f236a34f95e896115d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 6 Jan 2024 13:14:47 -0500
Subject: [PATCH v5 09/26] lazy_scan_prune tests tuple vis with GlobalVisTest
One requirement for eventually combining the prune and freeze records
is that we must check during pruning whether live tuples on the page are
visible to everyone and thus whether or not the page is all-visible. We
only consider opportunistically freezing tuples if the whole page is all
visible and could be set all frozen.
During pruning (in heap_page_prune()), we do not have access to
VacuumCutoffs -- since on-access pruning also calls heap_page_prune(). We
do, however, have access to a GlobalVisState. This can be used to
determine whether or not the tuple is visible to everyone. It also has
the potential of being more up-to-date than VacuumCutoffs->OldestXmin.
This commit simply modifies lazy_scan_prune() to use GlobalVisState
instead of OldestXmin. Future commits will move the
all_visible/all_frozen calculation into heap_page_prune().
---
src/backend/access/heap/vacuumlazy.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 25e8f0c30a7..69ec7150000 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1579,11 +1579,15 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* The inserter definitely committed. But is it old enough
- * that everyone sees it as committed?
+ * that everyone sees it as committed? A
+ * FrozenTransactionId is seen as committed to everyone.
+ * Otherwise, we check if there is a snapshot that
+ * considers this xid to still be running, and if so, we
+ * don't consider the page all-visible.
*/
xmin = HeapTupleHeaderGetXmin(htup);
- if (!TransactionIdPrecedes(xmin,
- vacrel->cutoffs.OldestXmin))
+ if (xmin != FrozenTransactionId &&
+ !GlobalVisTestIsRemovableXid(vacrel->vistest, xmin))
{
all_visible = false;
break;
--
2.39.2
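For what it's worth, here is a tiny standalone sketch in plain C of the check this patch changes (not PostgreSQL code; model_removable() is an invented stand-in for GlobalVisTestIsRemovableXid()). A frozen xmin (FrozenTransactionId, i.e. 2) is treated as visible to everyone; otherwise the page stays all-visible only if no snapshot can still consider the inserter running:

/*
 * Standalone model of the all-visible xmin check: a frozen xmin always
 * qualifies; otherwise consult a "removable to everyone" predicate, which
 * stands in for the GlobalVisState lookup.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint32_t ModelTransactionId;

#define MODEL_FROZEN_XID ((ModelTransactionId) 2)

typedef bool (*model_removable_fn) (ModelTransactionId xmin);

/* Returns true if this xmin does not prevent the page from being all-visible. */
static bool
model_xmin_allows_all_visible(ModelTransactionId xmin,
							  model_removable_fn xid_removable_to_everyone)
{
	if (xmin == MODEL_FROZEN_XID)
		return true;			/* frozen: committed and visible to everyone */
	return xid_removable_to_everyone(xmin);
}

/* Toy stand-in for GlobalVisTestIsRemovableXid(): everything before 1000. */
static bool
model_removable(ModelTransactionId xmin)
{
	return xmin < 1000;
}

int
main(void)
{
	printf("xmin 500: %s\n",
		   model_xmin_allows_all_visible(500, model_removable)
		   ? "ok" : "not all-visible");
	printf("xmin 2000: %s\n",
		   model_xmin_allows_all_visible(2000, model_removable)
		   ? "ok" : "not all-visible");
	return 0;
}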
v5-0010-Pass-heap_prune_chain-PruneResult-output-paramete.patch
From 32e38d4b2ef0e6e42002a15f26d1bf88bcebca80 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 6 Jan 2024 13:39:59 -0500
Subject: [PATCH v5 10/26] Pass heap_prune_chain() PruneResult output parameter
Future commits will set other members of PruneResult in
heap_prune_chain(), so start passing it as an output parameter now. This
eliminates the output parameter htsv -- the array of HTSV_Results --
since that is a member of the PruneResult.
---
src/backend/access/heap/pruneheap.c | 15 +++++++--------
1 file changed, 7 insertions(+), 8 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index d8be0f68bf9..94b18017aaa 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -59,8 +59,7 @@ static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
Buffer buffer);
static int heap_prune_chain(Buffer buffer,
OffsetNumber rootoffnum,
- int8 *htsv,
- PruneState *prstate);
+ PruneState *prstate, PruneResult *presult);
static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
static void heap_prune_record_redirect(PruneState *prstate,
OffsetNumber offnum, OffsetNumber rdoffnum);
@@ -321,7 +320,7 @@ heap_page_prune(Relation relation, Buffer buffer,
/* Process this item or chain of items */
presult->ndeleted += heap_prune_chain(buffer, offnum,
- presult->htsv, &prstate);
+ &prstate, presult);
}
/* Clear the offset information once we have processed the given page. */
@@ -423,7 +422,7 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
/*
* Prune specified line pointer or a HOT chain originating at line pointer.
*
- * Tuple visibility information is provided in htsv.
+ * Tuple visibility information is provided in presult->htsv.
*
* If the item is an index-referenced tuple (i.e. not a heap-only tuple),
* the HOT chain is pruned by removing all DEAD tuples at the start of the HOT
@@ -453,7 +452,7 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
*/
static int
heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
- int8 *htsv, PruneState *prstate)
+ PruneState *prstate, PruneResult *presult)
{
int ndeleted = 0;
Page dp = (Page) BufferGetPage(buffer);
@@ -474,7 +473,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
*/
if (ItemIdIsNormal(rootlp))
{
- Assert(htsv[rootoffnum] != -1);
+ Assert(presult->htsv[rootoffnum] != -1);
htup = (HeapTupleHeader) PageGetItem(dp, rootlp);
if (HeapTupleHeaderIsHeapOnly(htup))
@@ -497,7 +496,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
* either here or while following a chain below. Whichever path
* gets there first will mark the tuple unused.
*/
- if (htsv[rootoffnum] == HEAPTUPLE_DEAD &&
+ if (presult->htsv[rootoffnum] == HEAPTUPLE_DEAD &&
!HeapTupleHeaderIsHotUpdated(htup))
{
heap_prune_record_unused(prstate, rootoffnum);
@@ -594,7 +593,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
*/
tupdead = recent_dead = false;
- switch (htsv_get_valid_status(htsv[offnum]))
+ switch (htsv_get_valid_status(presult->htsv[offnum]))
{
case HEAPTUPLE_DEAD:
tupdead = true;
--
2.39.2
v5-0001-Merge-prune-freeze-and-vacuum-WAL-record-formats.patch
From 06d5ff5349a8aa95cbfd06a8043fe503b7b1bf7b Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed, 20 Mar 2024 14:50:14 +0200
Subject: [PATCH v5 01/26] Merge prune, freeze and vacuum WAL record formats
The new combined WAL record is now used for pruning, freezing and 2nd
pass of vacuum. This is in preparation for changing vacuum to write
a combined prune+freeze record per page, instead of two separate
records. The new WAL record format now supports that, but the code
still always writes separate records for pruning and freezing.
XXX I tried to lift-and-shift the code from the v4 patch set as unchanged
as possible, for easier review, but there are some noteworthy changes:
- Instead of passing PruneState and PageFreezeResult to
log_heap_prune_and_freeze(), pass the arrays of frozen, redirected
et al offsets directly. That way, it can be called from other places.
- moved heap_xlog_deserialize_prune_and_freeze() from xactdesc.c to
heapdesc.c. (Because that's clearly where it belongs)
Author: Melanie Plageman <melanieplageman@gmail.com>
---
src/backend/access/heap/heapam.c | 433 +++++------------------
src/backend/access/heap/pruneheap.c | 381 ++++++++++++++++----
src/backend/access/heap/vacuumlazy.c | 20 +-
src/backend/access/rmgrdesc/heapdesc.c | 198 +++++++----
src/backend/replication/logical/decode.c | 2 -
src/include/access/heapam.h | 9 +-
src/include/access/heapam_xlog.h | 172 +++++----
7 files changed, 642 insertions(+), 573 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 34bc60f625f..e6cfffd9f3e 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -91,9 +91,6 @@ static void compute_new_xmax_infomask(TransactionId xmax, uint16 old_infomask,
static TM_Result heap_lock_updated_tuple(Relation rel, HeapTuple tuple,
ItemPointer ctid, TransactionId xid,
LockTupleMode mode);
-static int heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
- xl_heap_freeze_plan *plans_out,
- OffsetNumber *offsets_out);
static void GetMultiXactIdHintBits(MultiXactId multi, uint16 *new_infomask,
uint16 *new_infomask2);
static TransactionId MultiXactIdGetUpdateXid(TransactionId xmax,
@@ -6746,179 +6743,16 @@ heap_freeze_execute_prepared(Relation rel, Buffer buffer,
/* Now WAL-log freezing if necessary */
if (RelationNeedsWAL(rel))
{
- xl_heap_freeze_plan plans[MaxHeapTuplesPerPage];
- OffsetNumber offsets[MaxHeapTuplesPerPage];
- int nplans;
- xl_heap_freeze_page xlrec;
- XLogRecPtr recptr;
-
- /* Prepare deduplicated representation for use in WAL record */
- nplans = heap_log_freeze_plan(tuples, ntuples, plans, offsets);
-
- xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
- xlrec.isCatalogRel = RelationIsAccessibleInLogicalDecoding(rel);
- xlrec.nplans = nplans;
-
- XLogBeginInsert();
- XLogRegisterData((char *) &xlrec, SizeOfHeapFreezePage);
-
- /*
- * The freeze plan array and offset array are not actually in the
- * buffer, but pretend that they are. When XLogInsert stores the
- * whole buffer, the arrays need not be stored too.
- */
- XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
- XLogRegisterBufData(0, (char *) plans,
- nplans * sizeof(xl_heap_freeze_plan));
- XLogRegisterBufData(0, (char *) offsets,
- ntuples * sizeof(OffsetNumber));
-
- recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_FREEZE_PAGE);
-
- PageSetLSN(page, recptr);
+ log_heap_prune_and_freeze(rel, buffer, snapshotConflictHorizon, false,
+ tuples, ntuples,
+ NULL, 0, /* redirected */
+ NULL, 0, /* dead */
+ NULL, 0); /* unused */
}
END_CRIT_SECTION();
}
-/*
- * Comparator used to deduplicate XLOG_HEAP2_FREEZE_PAGE freeze plans
- */
-static int
-heap_log_freeze_cmp(const void *arg1, const void *arg2)
-{
- HeapTupleFreeze *frz1 = (HeapTupleFreeze *) arg1;
- HeapTupleFreeze *frz2 = (HeapTupleFreeze *) arg2;
-
- if (frz1->xmax < frz2->xmax)
- return -1;
- else if (frz1->xmax > frz2->xmax)
- return 1;
-
- if (frz1->t_infomask2 < frz2->t_infomask2)
- return -1;
- else if (frz1->t_infomask2 > frz2->t_infomask2)
- return 1;
-
- if (frz1->t_infomask < frz2->t_infomask)
- return -1;
- else if (frz1->t_infomask > frz2->t_infomask)
- return 1;
-
- if (frz1->frzflags < frz2->frzflags)
- return -1;
- else if (frz1->frzflags > frz2->frzflags)
- return 1;
-
- /*
- * heap_log_freeze_eq would consider these tuple-wise plans to be equal.
- * (So the tuples will share a single canonical freeze plan.)
- *
- * We tiebreak on page offset number to keep each freeze plan's page
- * offset number array individually sorted. (Unnecessary, but be tidy.)
- */
- if (frz1->offset < frz2->offset)
- return -1;
- else if (frz1->offset > frz2->offset)
- return 1;
-
- Assert(false);
- return 0;
-}
-
-/*
- * Compare fields that describe actions required to freeze tuple with caller's
- * open plan. If everything matches then the frz tuple plan is equivalent to
- * caller's plan.
- */
-static inline bool
-heap_log_freeze_eq(xl_heap_freeze_plan *plan, HeapTupleFreeze *frz)
-{
- if (plan->xmax == frz->xmax &&
- plan->t_infomask2 == frz->t_infomask2 &&
- plan->t_infomask == frz->t_infomask &&
- plan->frzflags == frz->frzflags)
- return true;
-
- /* Caller must call heap_log_freeze_new_plan again for frz */
- return false;
-}
-
-/*
- * Start new plan initialized using tuple-level actions. At least one tuple
- * will have steps required to freeze described by caller's plan during REDO.
- */
-static inline void
-heap_log_freeze_new_plan(xl_heap_freeze_plan *plan, HeapTupleFreeze *frz)
-{
- plan->xmax = frz->xmax;
- plan->t_infomask2 = frz->t_infomask2;
- plan->t_infomask = frz->t_infomask;
- plan->frzflags = frz->frzflags;
- plan->ntuples = 1; /* for now */
-}
-
-/*
- * Deduplicate tuple-based freeze plans so that each distinct set of
- * processing steps is only stored once in XLOG_HEAP2_FREEZE_PAGE records.
- * Called during original execution of freezing (for logged relations).
- *
- * Return value is number of plans set in *plans_out for caller. Also writes
- * an array of offset numbers into *offsets_out output argument for caller
- * (actually there is one array per freeze plan, but that's not of immediate
- * concern to our caller).
- */
-static int
-heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
- xl_heap_freeze_plan *plans_out,
- OffsetNumber *offsets_out)
-{
- int nplans = 0;
-
- /* Sort tuple-based freeze plans in the order required to deduplicate */
- qsort(tuples, ntuples, sizeof(HeapTupleFreeze), heap_log_freeze_cmp);
-
- for (int i = 0; i < ntuples; i++)
- {
- HeapTupleFreeze *frz = tuples + i;
-
- if (i == 0)
- {
- /* New canonical freeze plan starting with first tup */
- heap_log_freeze_new_plan(plans_out, frz);
- nplans++;
- }
- else if (heap_log_freeze_eq(plans_out, frz))
- {
- /* tup matches open canonical plan -- include tup in it */
- Assert(offsets_out[i - 1] < frz->offset);
- plans_out->ntuples++;
- }
- else
- {
- /* Tup doesn't match current plan -- done with it now */
- plans_out++;
-
- /* New canonical freeze plan starting with this tup */
- heap_log_freeze_new_plan(plans_out, frz);
- nplans++;
- }
-
- /*
- * Save page offset number in dedicated buffer in passing.
- *
- * REDO routine relies on the record's offset numbers array grouping
- * offset numbers by freeze plan. The sort order within each grouping
- * is ascending offset number order, just to keep things tidy.
- */
- offsets_out[i] = frz->offset;
- }
-
- Assert(nplans > 0 && nplans <= ntuples);
-
- return nplans;
-}
-
/*
* heap_freeze_tuple
* Freeze tuple in place, without WAL logging.
@@ -8754,8 +8588,6 @@ ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_required,
/*
* Handles XLOG_HEAP2_PRUNE record type.
- *
- * Acquires a full cleanup lock.
*/
static void
heap_xlog_prune(XLogReaderState *record)
@@ -8766,125 +8598,109 @@ heap_xlog_prune(XLogReaderState *record)
RelFileLocator rlocator;
BlockNumber blkno;
XLogRedoAction action;
+ bool get_cleanup_lock;
+ bool lp_truncate_only;
XLogRecGetBlockTag(record, 0, &rlocator, NULL, &blkno);
- /*
- * We're about to remove tuples. In Hot Standby mode, ensure that there's
- * no queries running for which the removed tuples are still visible.
- */
- if (InHotStandby)
- ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
- xlrec->isCatalogRel,
- rlocator);
+ lp_truncate_only = xlrec->flags & XLHP_LP_TRUNCATE_ONLY;
/*
- * If we have a full-page image, restore it (using a cleanup lock) and
- * we're done.
+ * If there are dead, redirected, or unused items set unused by
+ * heap_page_prune_and_freeze(), heap_page_prune_execute() will call
+ * PageRepairFragementation() which expects a full cleanup lock.
*/
- action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL, true,
- &buffer);
- if (action == BLK_NEEDS_REDO)
- {
- Page page = (Page) BufferGetPage(buffer);
- OffsetNumber *end;
- OffsetNumber *redirected;
- OffsetNumber *nowdead;
- OffsetNumber *nowunused;
- int nredirected;
- int ndead;
- int nunused;
- Size datalen;
-
- redirected = (OffsetNumber *) XLogRecGetBlockData(record, 0, &datalen);
-
- nredirected = xlrec->nredirected;
- ndead = xlrec->ndead;
- end = (OffsetNumber *) ((char *) redirected + datalen);
- nowdead = redirected + (nredirected * 2);
- nowunused = nowdead + ndead;
- nunused = (end - nowunused);
- Assert(nunused >= 0);
+ get_cleanup_lock = xlrec->flags & XLHP_HAS_REDIRECTIONS ||
+ xlrec->flags & XLHP_HAS_DEAD_ITEMS ||
+ (xlrec->flags & XLHP_HAS_NOW_UNUSED_ITEMS && !lp_truncate_only);
- /* Update all line pointers per the record, and repair fragmentation */
- heap_page_prune_execute(buffer,
- redirected, nredirected,
- nowdead, ndead,
- nowunused, nunused);
-
- /*
- * Note: we don't worry about updating the page's prunability hints.
- * At worst this will cause an extra prune cycle to occur soon.
- */
-
- PageSetLSN(page, lsn);
- MarkBufferDirty(buffer);
+ if (lp_truncate_only)
+ {
+ Assert(!(xlrec->flags & XLHP_HAS_REDIRECTIONS));
+ Assert(!(xlrec->flags & XLHP_HAS_DEAD_ITEMS));
+ Assert(xlrec->flags & XLHP_HAS_NOW_UNUSED_ITEMS);
}
- if (BufferIsValid(buffer))
+ /*
+ * We are either about to remove tuples or freeze them. In Hot Standby
+ * mode, ensure that there are no queries running for which any removed
+ * tuples are still visible or which consider the frozen xids as running.
+ */
+ if (xlrec->flags & XLHP_HAS_CONFLICT_HORIZON && InHotStandby)
{
- Size freespace = PageGetHeapFreeSpace(BufferGetPage(buffer));
-
- UnlockReleaseBuffer(buffer);
+ xlhp_conflict_horizon *horizon = (xlhp_conflict_horizon *) (xlrec + SizeOfHeapPrune);
- /*
- * After pruning records from a page, it's useful to update the FSM
- * about it, as it may cause the page become target for insertions
- * later even if vacuum decides not to visit it (which is possible if
- * gets marked all-visible.)
- *
- * Do this regardless of a full-page image being applied, since the
- * FSM data is not in the page anyway.
- */
- XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
+ ResolveRecoveryConflictWithSnapshot(horizon->xid,
+ xlrec->flags & XLHP_IS_CATALOG_REL,
+ rlocator);
}
-}
-
-/*
- * Handles XLOG_HEAP2_VACUUM record type.
- *
- * Acquires an ordinary exclusive lock only.
- */
-static void
-heap_xlog_vacuum(XLogReaderState *record)
-{
- XLogRecPtr lsn = record->EndRecPtr;
- xl_heap_vacuum *xlrec = (xl_heap_vacuum *) XLogRecGetData(record);
- Buffer buffer;
- BlockNumber blkno;
- XLogRedoAction action;
/*
- * If we have a full-page image, restore it (without using a cleanup lock)
- * and we're done.
+ * If we have a full-page image, restore it and we're done.
*/
- action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL, false,
- &buffer);
+ action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
+ get_cleanup_lock, &buffer);
+
if (action == BLK_NEEDS_REDO)
{
Page page = (Page) BufferGetPage(buffer);
- OffsetNumber *nowunused;
+ OffsetNumber *redirected = NULL;
+ OffsetNumber *nowdead = NULL;
+ OffsetNumber *nowunused = NULL;
+ int nredirected = 0;
+ int ndead = 0;
+ int nunused = 0;
+ int nplans = 0;
Size datalen;
- OffsetNumber *offnum;
+ xl_heap_freeze_plan *plans = NULL;
+ OffsetNumber *frz_offsets = NULL;
+ int curoff = 0;
- nowunused = (OffsetNumber *) XLogRecGetBlockData(record, 0, &datalen);
+ char *cursor = XLogRecGetBlockData(record, 0, &datalen);
- /* Shouldn't be a record unless there's something to do */
- Assert(xlrec->nunused > 0);
+ heap_xlog_deserialize_prune_and_freeze(cursor, xlrec->flags,
+ &nredirected, &redirected,
+ &ndead, &nowdead,
+ &nunused, &nowunused,
+ &nplans, &plans, &frz_offsets);
+
+ /* Update all line pointers per the record, and repair fragmentation */
+ if (nredirected > 0 || ndead > 0 || nunused > 0)
+ heap_page_prune_execute(buffer, lp_truncate_only,
+ redirected, nredirected,
+ nowdead, ndead,
+ nowunused, nunused);
- /* Update all now-unused line pointers */
- offnum = nowunused;
- for (int i = 0; i < xlrec->nunused; i++)
+ for (int p = 0; p < nplans; p++)
{
- OffsetNumber off = *offnum++;
- ItemId lp = PageGetItemId(page, off);
+ HeapTupleFreeze frz;
- Assert(ItemIdIsDead(lp) && !ItemIdHasStorage(lp));
- ItemIdSetUnused(lp);
+ /*
+ * Convert freeze plan representation from WAL record into
+ * per-tuple format used by heap_execute_freeze_tuple
+ */
+ frz.xmax = plans[p].xmax;
+ frz.t_infomask2 = plans[p].t_infomask2;
+ frz.t_infomask = plans[p].t_infomask;
+ frz.frzflags = plans[p].frzflags;
+ frz.offset = InvalidOffsetNumber; /* unused, but be tidy */
+
+ for (int i = 0; i < plans[p].ntuples; i++)
+ {
+ OffsetNumber offset = frz_offsets[curoff++];
+ ItemId lp;
+ HeapTupleHeader tuple;
+
+ lp = PageGetItemId(page, offset);
+ tuple = (HeapTupleHeader) PageGetItem(page, lp);
+ heap_execute_freeze_tuple(tuple, &frz);
+ }
}
- /* Attempt to truncate line pointer array now */
- PageTruncateLinePointerArray(page);
+ /*
+ * Note: we don't worry about updating the page's prunability hints.
+ * At worst this will cause an extra prune cycle to occur soon.
+ */
PageSetLSN(page, lsn);
MarkBufferDirty(buffer);
@@ -8893,17 +8709,14 @@ heap_xlog_vacuum(XLogReaderState *record)
if (BufferIsValid(buffer))
{
Size freespace = PageGetHeapFreeSpace(BufferGetPage(buffer));
- RelFileLocator rlocator;
-
- XLogRecGetBlockTag(record, 0, &rlocator, NULL, &blkno);
UnlockReleaseBuffer(buffer);
/*
- * After vacuuming LP_DEAD items from a page, it's useful to update
- * the FSM about it, as it may cause the page become target for
- * insertions later even if vacuum decides not to visit it (which is
- * possible if gets marked all-visible.)
+ * After modifying records on a page, it's useful to update the FSM
+ * about it, as it may cause the page to become a target for insertions
+ * later even if vacuum decides not to visit it (which is possible if it
+ * gets marked all-visible).
*
* Do this regardless of a full-page image being applied, since the
* FSM data is not in the page anyway.
@@ -9049,74 +8862,6 @@ heap_xlog_visible(XLogReaderState *record)
UnlockReleaseBuffer(vmbuffer);
}
-/*
- * Replay XLOG_HEAP2_FREEZE_PAGE records
- */
-static void
-heap_xlog_freeze_page(XLogReaderState *record)
-{
- XLogRecPtr lsn = record->EndRecPtr;
- xl_heap_freeze_page *xlrec = (xl_heap_freeze_page *) XLogRecGetData(record);
- Buffer buffer;
-
- /*
- * In Hot Standby mode, ensure that there's no queries running which still
- * consider the frozen xids as running.
- */
- if (InHotStandby)
- {
- RelFileLocator rlocator;
-
- XLogRecGetBlockTag(record, 0, &rlocator, NULL, NULL);
- ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
- xlrec->isCatalogRel,
- rlocator);
- }
-
- if (XLogReadBufferForRedo(record, 0, &buffer) == BLK_NEEDS_REDO)
- {
- Page page = BufferGetPage(buffer);
- xl_heap_freeze_plan *plans;
- OffsetNumber *offsets;
- int curoff = 0;
-
- plans = (xl_heap_freeze_plan *) XLogRecGetBlockData(record, 0, NULL);
- offsets = (OffsetNumber *) ((char *) plans +
- (xlrec->nplans *
- sizeof(xl_heap_freeze_plan)));
- for (int p = 0; p < xlrec->nplans; p++)
- {
- HeapTupleFreeze frz;
-
- /*
- * Convert freeze plan representation from WAL record into
- * per-tuple format used by heap_execute_freeze_tuple
- */
- frz.xmax = plans[p].xmax;
- frz.t_infomask2 = plans[p].t_infomask2;
- frz.t_infomask = plans[p].t_infomask;
- frz.frzflags = plans[p].frzflags;
- frz.offset = InvalidOffsetNumber; /* unused, but be tidy */
-
- for (int i = 0; i < plans[p].ntuples; i++)
- {
- OffsetNumber offset = offsets[curoff++];
- ItemId lp;
- HeapTupleHeader tuple;
-
- lp = PageGetItemId(page, offset);
- tuple = (HeapTupleHeader) PageGetItem(page, lp);
- heap_execute_freeze_tuple(tuple, &frz);
- }
- }
-
- PageSetLSN(page, lsn);
- MarkBufferDirty(buffer);
- }
- if (BufferIsValid(buffer))
- UnlockReleaseBuffer(buffer);
-}
-
/*
* Given an "infobits" field from an XLog record, set the correct bits in the
* given infomask and infomask2 for the tuple touched by the record.
@@ -10020,12 +9765,6 @@ heap2_redo(XLogReaderState *record)
case XLOG_HEAP2_PRUNE:
heap_xlog_prune(record);
break;
- case XLOG_HEAP2_VACUUM:
- heap_xlog_vacuum(record);
- break;
- case XLOG_HEAP2_FREEZE_PAGE:
- heap_xlog_freeze_page(record);
- break;
case XLOG_HEAP2_VISIBLE:
heap_xlog_visible(record);
break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 69332b0d25c..9773681868c 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -338,7 +338,7 @@ heap_page_prune(Relation relation, Buffer buffer,
* Apply the planned item changes, then repair page fragmentation, and
* update the page's hint bit about whether it has free line pointers.
*/
- heap_page_prune_execute(buffer,
+ heap_page_prune_execute(buffer, false,
prstate.redirected, prstate.nredirected,
prstate.nowdead, prstate.ndead,
prstate.nowunused, prstate.nunused);
@@ -363,40 +363,13 @@ heap_page_prune(Relation relation, Buffer buffer,
*/
if (RelationNeedsWAL(relation))
{
- xl_heap_prune xlrec;
- XLogRecPtr recptr;
-
- xlrec.isCatalogRel = RelationIsAccessibleInLogicalDecoding(relation);
- xlrec.snapshotConflictHorizon = prstate.snapshotConflictHorizon;
- xlrec.nredirected = prstate.nredirected;
- xlrec.ndead = prstate.ndead;
-
- XLogBeginInsert();
- XLogRegisterData((char *) &xlrec, SizeOfHeapPrune);
-
- XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
-
- /*
- * The OffsetNumber arrays are not actually in the buffer, but we
- * pretend that they are. When XLogInsert stores the whole
- * buffer, the offset arrays need not be stored too.
- */
- if (prstate.nredirected > 0)
- XLogRegisterBufData(0, (char *) prstate.redirected,
- prstate.nredirected *
- sizeof(OffsetNumber) * 2);
-
- if (prstate.ndead > 0)
- XLogRegisterBufData(0, (char *) prstate.nowdead,
- prstate.ndead * sizeof(OffsetNumber));
-
- if (prstate.nunused > 0)
- XLogRegisterBufData(0, (char *) prstate.nowunused,
- prstate.nunused * sizeof(OffsetNumber));
-
- recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_PRUNE);
-
- PageSetLSN(BufferGetPage(buffer), recptr);
+ log_heap_prune_and_freeze(relation, buffer,
+ prstate.snapshotConflictHorizon,
+ false,
+ NULL, 0,
+ prstate.redirected, prstate.nredirected,
+ prstate.nowdead, prstate.ndead,
+ prstate.nowunused, prstate.nunused);
}
}
else
@@ -826,12 +799,14 @@ heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum)
/*
- * Perform the actual page changes needed by heap_page_prune.
- * It is expected that the caller has a full cleanup lock on the
- * buffer.
+ * Perform the actual page pruning modifications needed by
+ * heap_page_prune_and_freeze().
+ *
+ * Unless 'lp_truncate_only' is set, it is expected that the caller has a full
+ * cleanup lock on the buffer.
*/
void
-heap_page_prune_execute(Buffer buffer,
+heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
OffsetNumber *redirected, int nredirected,
OffsetNumber *nowdead, int ndead,
OffsetNumber *nowunused, int nunused)
@@ -843,6 +818,9 @@ heap_page_prune_execute(Buffer buffer,
/* Shouldn't be called unless there's something to do */
Assert(nredirected > 0 || ndead > 0 || nunused > 0);
+ /* We can only remove already-dead line pointers with 'lp_truncate_only' */
+ Assert(!lp_truncate_only || (nredirected == 0 && ndead == 0));
+
/* Update all redirected line pointers */
offnum = redirected;
for (int i = 0; i < nredirected; i++)
@@ -941,23 +919,29 @@ heap_page_prune_execute(Buffer buffer,
#ifdef USE_ASSERT_CHECKING
- /*
- * When heap_page_prune() was called, mark_unused_now may have been
- * passed as true, which allows would-be LP_DEAD items to be made
- * LP_UNUSED instead. This is only possible if the relation has no
- * indexes. If there are any dead items, then mark_unused_now was not
- * true and every item being marked LP_UNUSED must refer to a
- * heap-only tuple.
- */
- if (ndead > 0)
+ if (lp_truncate_only)
{
- Assert(ItemIdHasStorage(lp) && ItemIdIsNormal(lp));
- htup = (HeapTupleHeader) PageGetItem(page, lp);
- Assert(HeapTupleHeaderIsHeapOnly(htup));
+ /* Setting LP_DEAD to LP_UNUSED in vacuum's second pass */
+ Assert(ItemIdIsDead(lp) && !ItemIdHasStorage(lp));
}
else
{
- Assert(ItemIdIsUsed(lp));
+ /*
+ * When heap_page_prune_and_freeze() was called, mark_unused_now
+ * may have been passed as true, which allows would-be LP_DEAD
+ * items to be made LP_UNUSED instead. This is only possible if
+ * the relation has no indexes. If there are any dead items, then
+ * mark_unused_now was not true and every item being marked
+ * LP_UNUSED must refer to a heap-only tuple.
+ */
+ if (ndead > 0)
+ {
+ Assert(ItemIdHasStorage(lp) && ItemIdIsNormal(lp));
+ htup = (HeapTupleHeader) PageGetItem(page, lp);
+ Assert(HeapTupleHeaderIsHeapOnly(htup));
+ }
+ else
+ Assert(ItemIdIsUsed(lp));
}
#endif
@@ -965,17 +949,22 @@ heap_page_prune_execute(Buffer buffer,
ItemIdSetUnused(lp);
}
- /*
- * Finally, repair any fragmentation, and update the page's hint bit about
- * whether it has free pointers.
- */
- PageRepairFragmentation(page);
+ if (lp_truncate_only)
+ PageTruncateLinePointerArray(page);
+ else
+ {
+ /*
+ * Finally, repair any fragmentation, and update the page's hint bit
+ * about whether it has free pointers.
+ */
+ PageRepairFragmentation(page);
- /*
- * Now that the page has been modified, assert that redirect items still
- * point to valid targets.
- */
- page_verify_redirects(page);
+ /*
+ * Now that the page has been modified, assert that redirect items
+ * still point to valid targets.
+ */
+ page_verify_redirects(page);
+ }
}
@@ -1144,3 +1133,271 @@ heap_get_root_tuples(Page page, OffsetNumber *root_offsets)
}
}
}
+
+
+/*
+ * Compare fields that describe actions required to freeze tuple with caller's
+ * open plan. If everything matches then the frz tuple plan is equivalent to
+ * caller's plan.
+ */
+static inline bool
+heap_log_freeze_eq(xl_heap_freeze_plan *plan, HeapTupleFreeze *frz)
+{
+ if (plan->xmax == frz->xmax &&
+ plan->t_infomask2 == frz->t_infomask2 &&
+ plan->t_infomask == frz->t_infomask &&
+ plan->frzflags == frz->frzflags)
+ return true;
+
+ /* Caller must call heap_log_freeze_new_plan again for frz */
+ return false;
+}
+
+
+/*
+ * Comparator used to deduplicate XLOG_HEAP2_FREEZE_PAGE freeze plans
+ */
+static int
+heap_log_freeze_cmp(const void *arg1, const void *arg2)
+{
+ HeapTupleFreeze *frz1 = (HeapTupleFreeze *) arg1;
+ HeapTupleFreeze *frz2 = (HeapTupleFreeze *) arg2;
+
+ if (frz1->xmax < frz2->xmax)
+ return -1;
+ else if (frz1->xmax > frz2->xmax)
+ return 1;
+
+ if (frz1->t_infomask2 < frz2->t_infomask2)
+ return -1;
+ else if (frz1->t_infomask2 > frz2->t_infomask2)
+ return 1;
+
+ if (frz1->t_infomask < frz2->t_infomask)
+ return -1;
+ else if (frz1->t_infomask > frz2->t_infomask)
+ return 1;
+
+ if (frz1->frzflags < frz2->frzflags)
+ return -1;
+ else if (frz1->frzflags > frz2->frzflags)
+ return 1;
+
+ /*
+ * heap_log_freeze_eq would consider these tuple-wise plans to be equal.
+ * (So the tuples will share a single canonical freeze plan.)
+ *
+ * We tiebreak on page offset number to keep each freeze plan's page
+ * offset number array individually sorted. (Unnecessary, but be tidy.)
+ */
+ if (frz1->offset < frz2->offset)
+ return -1;
+ else if (frz1->offset > frz2->offset)
+ return 1;
+
+ Assert(false);
+ return 0;
+}
+
+/*
+ * Start new plan initialized using tuple-level actions. At least one tuple
+ * will have steps required to freeze described by caller's plan during REDO.
+ */
+static inline void
+heap_log_freeze_new_plan(xl_heap_freeze_plan *plan, HeapTupleFreeze *frz)
+{
+ plan->xmax = frz->xmax;
+ plan->t_infomask2 = frz->t_infomask2;
+ plan->t_infomask = frz->t_infomask;
+ plan->frzflags = frz->frzflags;
+ plan->ntuples = 1; /* for now */
+}
+
+/*
+ * Deduplicate tuple-based freeze plans so that each distinct set of
+ * processing steps is only stored once in XLOG_HEAP2_FREEZE_PAGE records.
+ * Called during original execution of freezing (for logged relations).
+ *
+ * Return value is number of plans set in *plans_out for caller. Also writes
+ * an array of offset numbers into *offsets_out output argument for caller
+ * (actually there is one array per freeze plan, but that's not of immediate
+ * concern to our caller).
+ */
+static int
+heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
+ xl_heap_freeze_plan *plans_out,
+ OffsetNumber *offsets_out)
+{
+ int nplans = 0;
+
+ /* Sort tuple-based freeze plans in the order required to deduplicate */
+ qsort(tuples, ntuples, sizeof(HeapTupleFreeze), heap_log_freeze_cmp);
+
+ for (int i = 0; i < ntuples; i++)
+ {
+ HeapTupleFreeze *frz = tuples + i;
+
+ if (i == 0)
+ {
+ /* New canonical freeze plan starting with first tup */
+ heap_log_freeze_new_plan(plans_out, frz);
+ nplans++;
+ }
+ else if (heap_log_freeze_eq(plans_out, frz))
+ {
+ /* tup matches open canonical plan -- include tup in it */
+ Assert(offsets_out[i - 1] < frz->offset);
+ plans_out->ntuples++;
+ }
+ else
+ {
+ /* Tup doesn't match current plan -- done with it now */
+ plans_out++;
+
+ /* New canonical freeze plan starting with this tup */
+ heap_log_freeze_new_plan(plans_out, frz);
+ nplans++;
+ }
+
+ /*
+ * Save page offset number in dedicated buffer in passing.
+ *
+ * REDO routine relies on the record's offset numbers array grouping
+ * offset numbers by freeze plan. The sort order within each grouping
+ * is ascending offset number order, just to keep things tidy.
+ */
+ offsets_out[i] = frz->offset;
+ }
+
+ Assert(nplans > 0 && nplans <= ntuples);
+
+ return nplans;
+}
+
+void
+log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+ TransactionId conflict_xid,
+ bool lp_truncate_only,
+ HeapTupleFreeze *frozen, int nfrozen,
+ OffsetNumber *redirected, int nredirected,
+ OffsetNumber *dead, int ndead,
+ OffsetNumber *unused, int nunused)
+{
+ xl_heap_prune xlrec;
+ xlhp_conflict_horizon horizon;
+ XLogRecPtr recptr;
+ xlhp_freeze freeze;
+ xlhp_prune_items redirect_items,
+ dead_items,
+ unused_items;
+
+ int nplans = 0;
+ xl_heap_freeze_plan plans[MaxHeapTuplesPerPage];
+ OffsetNumber frz_offsets[MaxHeapTuplesPerPage];
+ bool do_freeze = (nfrozen > 0);
+
+ xlrec.flags = 0;
+
+ if (lp_truncate_only)
+ {
+ xlrec.flags |= XLHP_LP_TRUNCATE_ONLY;
+ Assert(nfrozen == 0 && nredirected == 0 && ndead == 0);
+ }
+
+ if (RelationIsAccessibleInLogicalDecoding(relation))
+ xlrec.flags |= XLHP_IS_CATALOG_REL;
+
+ if (TransactionIdIsValid(conflict_xid))
+ xlrec.flags |= XLHP_HAS_CONFLICT_HORIZON;
+
+ /*
+ * Prepare deduplicated representation for use in the WAL record. This
+ * destructively sorts the tuples array in place.
+ */
+ if (do_freeze)
+ nplans = heap_log_freeze_plan(frozen, nfrozen, plans, frz_offsets);
+ XLogBeginInsert();
+ XLogRegisterData((char *) &xlrec, SizeOfHeapPrune);
+
+ if (TransactionIdIsValid(conflict_xid))
+ XLogRegisterData((char *) &horizon, SizeOfSnapshotConflictHorizon);
+
+ XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
+
+ /*
+ * The OffsetNumber arrays are not actually in the buffer, but we pretend
+ * that they are. When XLogInsert stores the whole buffer, the offset
+ * arrays need not be stored too.
+ */
+ if (nplans > 0)
+ {
+ xlrec.flags |= XLHP_HAS_FREEZE_PLANS;
+
+ freeze = (xlhp_freeze)
+ {
+ .nplans = nplans
+ };
+
+ XLogRegisterBufData(0, (char *) &freeze, offsetof(xlhp_freeze, plans));
+
+ XLogRegisterBufData(0, (char *) plans,
+ sizeof(xl_heap_freeze_plan) * freeze.nplans);
+ }
+
+
+ if (nredirected > 0)
+ {
+ xlrec.flags |= XLHP_HAS_REDIRECTIONS;
+
+ redirect_items = (xlhp_prune_items)
+ {
+ .ntargets = nredirected
+ };
+
+ XLogRegisterBufData(0, (char *) &redirect_items,
+ offsetof(xlhp_prune_items, data));
+
+ XLogRegisterBufData(0, (char *) redirected,
+ sizeof(OffsetNumber[2]) * nredirected);
+ }
+
+ if (ndead > 0)
+ {
+ xlrec.flags |= XLHP_HAS_DEAD_ITEMS;
+
+ dead_items = (xlhp_prune_items)
+ {
+ .ntargets = ndead
+ };
+
+ XLogRegisterBufData(0, (char *) &dead_items,
+ offsetof(xlhp_prune_items, data));
+
+ XLogRegisterBufData(0, (char *) dead,
+ sizeof(OffsetNumber) * dead_items.ntargets);
+ }
+
+ if (nunused > 0)
+ {
+ xlrec.flags |= XLHP_HAS_NOW_UNUSED_ITEMS;
+
+ unused_items = (xlhp_prune_items)
+ {
+ .ntargets = nunused
+ };
+
+ XLogRegisterBufData(0, (char *) &unused_items,
+ offsetof(xlhp_prune_items, data));
+
+ XLogRegisterBufData(0, (char *) unused,
+ sizeof(OffsetNumber) * unused_items.ntargets);
+ }
+
+ if (nplans > 0)
+ XLogRegisterBufData(0, (char *) frz_offsets,
+ sizeof(OffsetNumber) * nfrozen);
+
+ recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_PRUNE);
+
+ PageSetLSN(BufferGetPage(buffer), recptr);
+}
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 18004907750..25e8f0c30a7 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2546,20 +2546,12 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
/* XLOG stuff */
if (RelationNeedsWAL(vacrel->rel))
{
- xl_heap_vacuum xlrec;
- XLogRecPtr recptr;
-
- xlrec.nunused = nunused;
-
- XLogBeginInsert();
- XLogRegisterData((char *) &xlrec, SizeOfHeapVacuum);
-
- XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
- XLogRegisterBufData(0, (char *) unused, nunused * sizeof(OffsetNumber));
-
- recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VACUUM);
-
- PageSetLSN(page, recptr);
+ log_heap_prune_and_freeze(vacrel->rel, buffer,
+ InvalidTransactionId, true,
+ NULL, 0, /* frozen */
+ NULL, 0, /* redirected */
+ NULL, 0, /* dead */
+ unused, nunused);
}
/*
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 36a3d83c8c2..0d7edffff20 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -91,6 +91,74 @@ plan_elem_desc(StringInfo buf, void *plan, void *data)
appendStringInfoString(buf, " }");
}
+
+/*
+ * Given 'cursor', which points to the MAXALIGNed buffer returned by
+ * XLogRecGetBlockData(), and the xl_heap_prune flags, deserialize the arrays
+ * of OffsetNumbers contained in an xl_heap_prune record. This is in this file
+ * so it can be shared between heap2_redo and heap2_desc code, the latter of
+ * which is used in frontend code.
+ */
+void
+heap_xlog_deserialize_prune_and_freeze(char *cursor, uint8 flags,
+ int *nredirected, OffsetNumber **redirected,
+ int *ndead, OffsetNumber **nowdead,
+ int *nunused, OffsetNumber **nowunused,
+ int *nplans, xl_heap_freeze_plan **plans,
+ OffsetNumber **frz_offsets)
+{
+ if (flags & XLHP_HAS_FREEZE_PLANS)
+ {
+ xlhp_freeze *freeze = (xlhp_freeze *) cursor;
+
+ *nplans = freeze->nplans;
+ Assert(*nplans > 0);
+ *plans = freeze->plans;
+
+ cursor += offsetof(xlhp_freeze, plans);
+ cursor += sizeof(xl_heap_freeze_plan) * freeze->nplans;
+ }
+
+ if (flags & XLHP_HAS_REDIRECTIONS)
+ {
+ xlhp_prune_items *subrecord = (xlhp_prune_items *) cursor;
+
+ *nredirected = subrecord->ntargets;
+ Assert(*nredirected > 0);
+ *redirected = &subrecord->data[0];
+
+ cursor += offsetof(xlhp_prune_items, data);
+ cursor += sizeof(OffsetNumber[2]) * *nredirected;
+ }
+
+ if (flags & XLHP_HAS_DEAD_ITEMS)
+ {
+ xlhp_prune_items *subrecord = (xlhp_prune_items *) cursor;
+
+ *ndead = subrecord->ntargets;
+ Assert(*ndead > 0);
+ *nowdead = subrecord->data;
+
+ cursor += offsetof(xlhp_prune_items, data);
+ cursor += sizeof(OffsetNumber) * *ndead;
+ }
+
+ if (flags & XLHP_HAS_NOW_UNUSED_ITEMS)
+ {
+ xlhp_prune_items *subrecord = (xlhp_prune_items *) cursor;
+
+ *nunused = subrecord->ntargets;
+ Assert(*nunused > 0);
+ *nowunused = subrecord->data;
+
+ cursor += offsetof(xlhp_prune_items, data);
+ cursor += sizeof(OffsetNumber) * *nunused;
+ }
+
+ if (nplans > 0)
+ *frz_offsets = (OffsetNumber *) cursor;
+}
+
void
heap_desc(StringInfo buf, XLogReaderState *record)
{
@@ -179,82 +247,68 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
{
xl_heap_prune *xlrec = (xl_heap_prune *) rec;
- appendStringInfo(buf, "snapshotConflictHorizon: %u, nredirected: %u, ndead: %u, isCatalogRel: %c",
- xlrec->snapshotConflictHorizon,
- xlrec->nredirected,
- xlrec->ndead,
- xlrec->isCatalogRel ? 'T' : 'F');
-
- if (XLogRecHasBlockData(record, 0))
- {
- OffsetNumber *end;
- OffsetNumber *redirected;
- OffsetNumber *nowdead;
- OffsetNumber *nowunused;
- int nredirected;
- int nunused;
- Size datalen;
-
- redirected = (OffsetNumber *) XLogRecGetBlockData(record, 0,
- &datalen);
-
- nredirected = xlrec->nredirected;
- end = (OffsetNumber *) ((char *) redirected + datalen);
- nowdead = redirected + (nredirected * 2);
- nowunused = nowdead + xlrec->ndead;
- nunused = (end - nowunused);
- Assert(nunused >= 0);
-
- appendStringInfo(buf, ", nunused: %d", nunused);
-
- appendStringInfoString(buf, ", redirected:");
- array_desc(buf, redirected, sizeof(OffsetNumber) * 2,
- nredirected, &redirect_elem_desc, NULL);
- appendStringInfoString(buf, ", dead:");
- array_desc(buf, nowdead, sizeof(OffsetNumber), xlrec->ndead,
- &offset_elem_desc, NULL);
- appendStringInfoString(buf, ", unused:");
- array_desc(buf, nowunused, sizeof(OffsetNumber), nunused,
- &offset_elem_desc, NULL);
- }
- }
- else if (info == XLOG_HEAP2_VACUUM)
- {
- xl_heap_vacuum *xlrec = (xl_heap_vacuum *) rec;
-
- appendStringInfo(buf, "nunused: %u", xlrec->nunused);
-
- if (XLogRecHasBlockData(record, 0))
+ if (xlrec->flags & XLHP_HAS_CONFLICT_HORIZON)
{
- OffsetNumber *nowunused;
-
- nowunused = (OffsetNumber *) XLogRecGetBlockData(record, 0, NULL);
+ xlhp_conflict_horizon *horizon = (xlhp_conflict_horizon *) (xlrec + SizeOfHeapPrune);
- appendStringInfoString(buf, ", unused:");
- array_desc(buf, nowunused, sizeof(OffsetNumber), xlrec->nunused,
- &offset_elem_desc, NULL);
+ appendStringInfo(buf, "snapshotConflictHorizon: %u",
+ horizon->xid);
}
- }
- else if (info == XLOG_HEAP2_FREEZE_PAGE)
- {
- xl_heap_freeze_page *xlrec = (xl_heap_freeze_page *) rec;
- appendStringInfo(buf, "snapshotConflictHorizon: %u, nplans: %u, isCatalogRel: %c",
- xlrec->snapshotConflictHorizon, xlrec->nplans,
- xlrec->isCatalogRel ? 'T' : 'F');
+ appendStringInfo(buf, ", isCatalogRel: %c",
+ xlrec->flags & XLHP_IS_CATALOG_REL ? 'T' : 'F');
if (XLogRecHasBlockData(record, 0))
{
- xl_heap_freeze_plan *plans;
- OffsetNumber *offsets;
-
- plans = (xl_heap_freeze_plan *) XLogRecGetBlockData(record, 0, NULL);
- offsets = (OffsetNumber *) ((char *) plans +
- (xlrec->nplans *
- sizeof(xl_heap_freeze_plan)));
- appendStringInfoString(buf, ", plans:");
- array_desc(buf, plans, sizeof(xl_heap_freeze_plan), xlrec->nplans,
- &plan_elem_desc, &offsets);
+ Size datalen;
+ OffsetNumber *redirected = NULL;
+ OffsetNumber *nowdead = NULL;
+ OffsetNumber *nowunused = NULL;
+ int nredirected = 0;
+ int nunused = 0;
+ int ndead = 0;
+ int nplans = 0;
+ xl_heap_freeze_plan *plans = NULL;
+ OffsetNumber *frz_offsets;
+
+ char *cursor = XLogRecGetBlockData(record, 0, &datalen);
+
+ heap_xlog_deserialize_prune_and_freeze(cursor, xlrec->flags,
+ &nredirected, &redirected,
+ &ndead, &nowdead,
+ &nunused, &nowunused,
+ &nplans, &plans, &frz_offsets);
+
+ appendStringInfo(buf, ", nredirected: %u, ndead: %u, nunused: %u, nplans: %u,",
+ nredirected, ndead, nunused, nplans);
+
+ if (nredirected > 0)
+ {
+ appendStringInfoString(buf, ", redirected:");
+ array_desc(buf, redirected, sizeof(OffsetNumber) * 2,
+ nredirected, &redirect_elem_desc, NULL);
+ }
+
+ if (ndead > 0)
+ {
+ appendStringInfoString(buf, ", dead:");
+ array_desc(buf, nowdead, sizeof(OffsetNumber), ndead,
+ &offset_elem_desc, NULL);
+ }
+
+ if (nunused > 0)
+ {
+ appendStringInfoString(buf, ", unused:");
+ array_desc(buf, nowunused, sizeof(OffsetNumber), nunused,
+ &offset_elem_desc, NULL);
+ }
+
+ if (nplans > 0)
+ {
+ appendStringInfoString(buf, ", plans:");
+ array_desc(buf, plans, sizeof(xl_heap_freeze_plan), nplans,
+ &plan_elem_desc, &frz_offsets);
+ }
}
}
else if (info == XLOG_HEAP2_VISIBLE)
@@ -358,12 +412,6 @@ heap2_identify(uint8 info)
case XLOG_HEAP2_PRUNE:
id = "PRUNE";
break;
- case XLOG_HEAP2_VACUUM:
- id = "VACUUM";
- break;
- case XLOG_HEAP2_FREEZE_PAGE:
- id = "FREEZE_PAGE";
- break;
case XLOG_HEAP2_VISIBLE:
id = "VISIBLE";
break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index e5ab7b78b78..38d1bdd825e 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -445,9 +445,7 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
* Everything else here is just low level physical stuff we're not
* interested in.
*/
- case XLOG_HEAP2_FREEZE_PAGE:
case XLOG_HEAP2_PRUNE:
- case XLOG_HEAP2_VACUUM:
case XLOG_HEAP2_VISIBLE:
case XLOG_HEAP2_LOCK_UPDATED:
break;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 4b133f68593..ca6ddab91ea 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -323,11 +323,18 @@ extern void heap_page_prune(Relation relation, Buffer buffer,
bool mark_unused_now,
PruneResult *presult,
OffsetNumber *off_loc);
-extern void heap_page_prune_execute(Buffer buffer,
+extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
OffsetNumber *redirected, int nredirected,
OffsetNumber *nowdead, int ndead,
OffsetNumber *nowunused, int nunused);
extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
+extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+ TransactionId conflict_xid,
+ bool lp_truncate_only,
+ HeapTupleFreeze *frozen, int nfrozen,
+ OffsetNumber *redirected, int nredirected,
+ OffsetNumber *dead, int ndead,
+ OffsetNumber *unused, int nunused);
/* in heap/vacuumlazy.c */
struct VacuumParams;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 6488dad5e64..dfeb703d136 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -52,12 +52,10 @@
*/
#define XLOG_HEAP2_REWRITE 0x00
#define XLOG_HEAP2_PRUNE 0x10
-#define XLOG_HEAP2_VACUUM 0x20
-#define XLOG_HEAP2_FREEZE_PAGE 0x30
-#define XLOG_HEAP2_VISIBLE 0x40
-#define XLOG_HEAP2_MULTI_INSERT 0x50
-#define XLOG_HEAP2_LOCK_UPDATED 0x60
-#define XLOG_HEAP2_NEW_CID 0x70
+#define XLOG_HEAP2_VISIBLE 0x20
+#define XLOG_HEAP2_MULTI_INSERT 0x30
+#define XLOG_HEAP2_LOCK_UPDATED 0x40
+#define XLOG_HEAP2_NEW_CID 0x50
/*
* xl_heap_insert/xl_heap_multi_insert flag values, 8 bits are available.
@@ -227,44 +225,108 @@ typedef struct xl_heap_update
#define SizeOfHeapUpdate (offsetof(xl_heap_update, new_offnum) + sizeof(OffsetNumber))
/*
- * This is what we need to know about page pruning (both during VACUUM and
- * during opportunistic pruning)
+ * This is what we need to know about page pruning and freezing, both during
+ * VACUUM and during opportunistic pruning.
*
- * The array of OffsetNumbers following the fixed part of the record contains:
- * * for each redirected item: the item offset, then the offset redirected to
- * * for each now-dead item: the item offset
- * * for each now-unused item: the item offset
- * The total number of OffsetNumbers is therefore 2*nredirected+ndead+nunused.
- * Note that nunused is not explicitly stored, but may be found by reference
- * to the total record length.
+ * If XLHP_HAS_REDIRECTIONS, XLHP_HAS_DEAD_ITEMS, or XLHP_HAS_NOW_UNUSED_ITEMS
+ * is set, replay acquires a full cleanup lock. Otherwise an ordinary exclusive
+ * lock is enough, e.g. if freezing was the only modification to the page.
*
- * Acquires a full cleanup lock.
+ * The data for block reference 0 contains "sub-records" depending on which
+ * of the XLHP_HAS_* flags are set. See xlhp_* struct definitions below.
+ * The layout is in the same order as the XLHP_* flags.
+ *
+ * OFFSET NUMBERS are in the block reference 0
+ *
+ * If only unused item offsets are included because the record is constructed
+ * during vacuum's second pass (marking LP_DEAD items LP_UNUSED) then only an
+ * ordinary exclusive lock is required to replay.
*/
typedef struct xl_heap_prune
{
- TransactionId snapshotConflictHorizon;
- uint16 nredirected;
- uint16 ndead;
- bool isCatalogRel; /* to handle recovery conflict during logical
- * decoding on standby */
- /* OFFSET NUMBERS are in the block reference 0 */
+ uint8 flags;
} xl_heap_prune;
-#define SizeOfHeapPrune (offsetof(xl_heap_prune, isCatalogRel) + sizeof(bool))
+/* to handle recovery conflict during logical decoding on standby */
+#define XLHP_IS_CATALOG_REL (1 << 1)
+
+/*
+ * During vacuum's second pass which sets LP_DEAD items LP_UNUSED, we will only
+ * truncate the line pointer array, not call PageRepairFragmentation. We need
+ * this flag to differentiate what kind of lock (exclusive or cleanup) to take
+ * on the buffer and whether to call PageTruncateLinePointerArray() or
+ * PageRepairFragmentation().
+ */
+#define XLHP_LP_TRUNCATE_ONLY (1 << 2)
+
+/*
+ * Vacuum's first pass and on-access pruning may need to include a snapshot
+ * conflict horizon.
+ */
+#define XLHP_HAS_CONFLICT_HORIZON (1 << 3)
+#define XLHP_HAS_FREEZE_PLANS (1 << 4)
+#define XLHP_HAS_REDIRECTIONS (1 << 5)
+#define XLHP_HAS_DEAD_ITEMS (1 << 6)
+#define XLHP_HAS_NOW_UNUSED_ITEMS (1 << 7)
+
+#define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint8))
+
+typedef struct xlhp_conflict_horizon
+{
+ TransactionId xid;
+} xlhp_conflict_horizon;
+
+#define SizeOfSnapshotConflictHorizon (offsetof(xlhp_conflict_horizon, xid) + sizeof(uint32))
/*
- * The vacuum page record is similar to the prune record, but can only mark
- * already LP_DEAD items LP_UNUSED (during VACUUM's second heap pass)
+ * This struct represents a 'freeze plan', which describes how to freeze a
+ * group of one or more heap tuples (appears in xl_heap_prune's xlhp_freeze
+ * record)
+ */
+/* 0x01 was XLH_FREEZE_XMIN */
+#define XLH_FREEZE_XVAC 0x02
+#define XLH_INVALID_XVAC 0x04
+
+typedef struct xl_heap_freeze_plan
+{
+ TransactionId xmax;
+ uint16 t_infomask2;
+ uint16 t_infomask;
+ uint8 frzflags;
+
+ /* Length of individual page offset numbers array for this plan */
+ uint16 ntuples;
+} xl_heap_freeze_plan;
+
+/*
+ * As of Postgres 17, XLOG_HEAP2_PRUNE records replace
+ * XLOG_HEAP2_FREEZE_PAGE records.
*
- * Acquires an ordinary exclusive lock only.
+ * This is what we need to know about a block being frozen during vacuum
+ *
+ * Backup block 0's data contains an array of xl_heap_freeze_plan structs
+ * (with nplans elements), followed by one or more page offset number arrays.
+ * Each such page offset number array corresponds to a single freeze plan
+ * (REDO routine freezes corresponding heap tuples using freeze plan).
+ */
+typedef struct xlhp_freeze
+{
+ uint16 nplans;
+ xl_heap_freeze_plan plans[FLEXIBLE_ARRAY_MEMBER];
+} xlhp_freeze;
+
+/*
+ * Sub-record type contained in block reference 0 of a prune record if
+ * XLHP_HAS_REDIRECTIONS/XLHP_HAS_DEAD_ITEMS/XLHP_HAS_NOW_UNUSED_ITEMS is set.
+ * Note that in the XLHP_HAS_REDIRECTIONS variant, the data array holds
+ * 2 * ntargets OffsetNumbers (the item offset and its redirect target).
*/
-typedef struct xl_heap_vacuum
+typedef struct xlhp_prune_items
{
- uint16 nunused;
- /* OFFSET NUMBERS are in the block reference 0 */
-} xl_heap_vacuum;
+ uint16 ntargets;
+ OffsetNumber data[FLEXIBLE_ARRAY_MEMBER];
+} xlhp_prune_items;
-#define SizeOfHeapVacuum (offsetof(xl_heap_vacuum, nunused) + sizeof(uint16))
/* flags for infobits_set */
#define XLHL_XMAX_IS_MULTI 0x01
@@ -315,47 +377,6 @@ typedef struct xl_heap_inplace
#define SizeOfHeapInplace (offsetof(xl_heap_inplace, offnum) + sizeof(OffsetNumber))
-/*
- * This struct represents a 'freeze plan', which describes how to freeze a
- * group of one or more heap tuples (appears in xl_heap_freeze_page record)
- */
-/* 0x01 was XLH_FREEZE_XMIN */
-#define XLH_FREEZE_XVAC 0x02
-#define XLH_INVALID_XVAC 0x04
-
-typedef struct xl_heap_freeze_plan
-{
- TransactionId xmax;
- uint16 t_infomask2;
- uint16 t_infomask;
- uint8 frzflags;
-
- /* Length of individual page offset numbers array for this plan */
- uint16 ntuples;
-} xl_heap_freeze_plan;
-
-/*
- * This is what we need to know about a block being frozen during vacuum
- *
- * Backup block 0's data contains an array of xl_heap_freeze_plan structs
- * (with nplans elements), followed by one or more page offset number arrays.
- * Each such page offset number array corresponds to a single freeze plan
- * (REDO routine freezes corresponding heap tuples using freeze plan).
- */
-typedef struct xl_heap_freeze_page
-{
- TransactionId snapshotConflictHorizon;
- uint16 nplans;
- bool isCatalogRel; /* to handle recovery conflict during logical
- * decoding on standby */
-
- /*
- * In payload of blk 0 : FREEZE PLANS and OFFSET NUMBER ARRAY
- */
-} xl_heap_freeze_page;
-
-#define SizeOfHeapFreezePage (offsetof(xl_heap_freeze_page, isCatalogRel) + sizeof(bool))
-
/*
* This is what we need to know about setting a visibility map bit
*
@@ -418,4 +439,11 @@ extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
TransactionId snapshotConflictHorizon,
uint8 vmflags);
+extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint8 flags,
+ int *nredirected, OffsetNumber **redirected,
+ int *ndead, OffsetNumber **nowdead,
+ int *nunused, OffsetNumber **nowunused,
+ int *nplans, xl_heap_freeze_plan **plans,
+ OffsetNumber **frz_offsets);
+
#endif /* HEAPAM_XLOG_H */
--
2.39.2
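
(Aside for reviewers, not part of the patch set: the freeze-plan deduplication
that heap_log_freeze_plan() performs before the plans go into the combined
record can be illustrated with a small standalone sketch. The struct fields
and names below are simplified, made-up stand-ins rather than PostgreSQL code;
the point is only the sort-then-group idea: identical per-tuple freeze
descriptions collapse into one plan, and the offsets end up grouped by plan.)

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

typedef struct { uint32_t xmax; uint16_t infomask; uint16_t offset; } Frz;
typedef struct { uint32_t xmax; uint16_t infomask; uint16_t ntuples; } Plan;

static int
frz_cmp(const void *a, const void *b)
{
    const Frz *x = a, *y = b;

    if (x->xmax != y->xmax)
        return x->xmax < y->xmax ? -1 : 1;
    if (x->infomask != y->infomask)
        return x->infomask < y->infomask ? -1 : 1;
    /* tiebreak on offset so each plan's offset group stays sorted */
    return (x->offset > y->offset) - (x->offset < y->offset);
}

int
main(void)
{
    Frz         tuples[] = {{100, 1, 3}, {200, 1, 5}, {100, 1, 1}, {200, 1, 8}};
    int         ntuples = 4;
    Plan        plans[4];
    uint16_t    offsets[4];
    int         nplans = 0;

    /* sort so that tuples sharing a freeze description are adjacent */
    qsort(tuples, ntuples, sizeof(Frz), frz_cmp);

    for (int i = 0; i < ntuples; i++)
    {
        if (i == 0 ||
            tuples[i].xmax != plans[nplans - 1].xmax ||
            tuples[i].infomask != plans[nplans - 1].infomask)
        {
            /* start a new canonical plan */
            plans[nplans].xmax = tuples[i].xmax;
            plans[nplans].infomask = tuples[i].infomask;
            plans[nplans].ntuples = 0;
            nplans++;
        }
        plans[nplans - 1].ntuples++;
        offsets[i] = tuples[i].offset;  /* grouped by plan, ascending */
    }

    /* prints two plans: xmax=100 covers offsets 1 3, xmax=200 covers 5 8 */
    for (int p = 0, o = 0; p < nplans; p++)
    {
        printf("plan %d: xmax=%u ntuples=%u offsets:", p,
               (unsigned) plans[p].xmax, (unsigned) plans[p].ntuples);
        for (int i = 0; i < plans[p].ntuples; i++)
            printf(" %u", (unsigned) offsets[o++]);
        printf("\n");
    }
    return 0;
}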
Attachment: v5-0002-Keep-the-original-numbers-for-existing-WAL-record.patch
From cd6cdaebb362b014733e99ecd868896caf0fb3aa Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed, 20 Mar 2024 13:45:01 +0200
Subject: [PATCH v5 02/26] Keep the original numbers for existing WAL records
Doesn't matter much because the WAL format is not compatible across
major versions anyway. But still seems nice to keep the identifiers
unchanged when we can. (There's some precedent for this if you search
the git history for "is free, was").
---
src/include/access/heapam_xlog.h | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index dfeb703d136..6a934be7ecc 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -52,10 +52,12 @@
*/
#define XLOG_HEAP2_REWRITE 0x00
#define XLOG_HEAP2_PRUNE 0x10
-#define XLOG_HEAP2_VISIBLE 0x20
-#define XLOG_HEAP2_MULTI_INSERT 0x30
-#define XLOG_HEAP2_LOCK_UPDATED 0x40
-#define XLOG_HEAP2_NEW_CID 0x50
+/* 0x20 is free, was XLOG_HEAP2_VACUUM */
+/* 0x30 is free, was XLOG_HEAP2_FREEZE_PAGE */
+#define XLOG_HEAP2_VISIBLE 0x40
+#define XLOG_HEAP2_MULTI_INSERT 0x50
+#define XLOG_HEAP2_LOCK_UPDATED 0x60
+#define XLOG_HEAP2_NEW_CID 0x70
/*
* xl_heap_insert/xl_heap_multi_insert flag values, 8 bits are available.
--
2.39.2
Attachment: v5-0003-Rename-record-to-XLOG_HEAP2_PRUNE_FREEZE.patch
From d3207bb557aa1d2868a50d357a06318a6c0cb5cd Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed, 20 Mar 2024 13:48:29 +0200
Subject: [PATCH v5 03/26] Rename record to XLOG_HEAP2_PRUNE_FREEZE
To clarify that it also freezes now, and to make it clear that it's
significantly different from the old XLOG_HEAP2_PRUNE format.
---
src/backend/access/gist/gistxlog.c | 8 ++++----
src/backend/access/hash/hash_xlog.c | 8 ++++----
src/backend/access/heap/heapam.c | 16 ++++++++--------
src/backend/access/heap/pruneheap.c | 4 ++--
src/backend/access/rmgrdesc/heapdesc.c | 6 +++---
src/backend/replication/logical/decode.c | 2 +-
src/include/access/heapam_xlog.h | 4 ++--
7 files changed, 24 insertions(+), 24 deletions(-)
diff --git a/src/backend/access/gist/gistxlog.c b/src/backend/access/gist/gistxlog.c
index fafd9f1c94f..588cade585b 100644
--- a/src/backend/access/gist/gistxlog.c
+++ b/src/backend/access/gist/gistxlog.c
@@ -183,10 +183,10 @@ gistRedoDeleteRecord(XLogReaderState *record)
*
* GiST delete records can conflict with standby queries. You might think
* that vacuum records would conflict as well, but we've handled that
- * already. XLOG_HEAP2_PRUNE records provide the highest xid cleaned by
- * the vacuum of the heap and so we can resolve any conflicts just once
- * when that arrives. After that we know that no conflicts exist from
- * individual gist vacuum records on that index.
+ * already. XLOG_HEAP2_PRUNE_FREEZE records provide the highest xid
+ * cleaned by the vacuum of the heap and so we can resolve any conflicts
+ * just once when that arrives. After that we know that no conflicts
+ * exist from individual gist vacuum records on that index.
*/
if (InHotStandby)
{
diff --git a/src/backend/access/hash/hash_xlog.c b/src/backend/access/hash/hash_xlog.c
index 4e05a1b4632..883915fd1da 100644
--- a/src/backend/access/hash/hash_xlog.c
+++ b/src/backend/access/hash/hash_xlog.c
@@ -992,10 +992,10 @@ hash_xlog_vacuum_one_page(XLogReaderState *record)
* Hash index records that are marked as LP_DEAD and being removed during
* hash index tuple insertion can conflict with standby queries. You might
* think that vacuum records would conflict as well, but we've handled
- * that already. XLOG_HEAP2_PRUNE records provide the highest xid cleaned
- * by the vacuum of the heap and so we can resolve any conflicts just once
- * when that arrives. After that we know that no conflicts exist from
- * individual hash index vacuum records on that index.
+ * that already. XLOG_HEAP2_PRUNE_FREEZE records provide the highest xid
+ * cleaned by the vacuum of the heap and so we can resolve any conflicts
+ * just once when that arrives. After that we know that no conflicts
+ * exist from individual hash index vacuum records on that index.
*/
if (InHotStandby)
{
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index e6cfffd9f3e..69a9aaa501d 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -7726,10 +7726,10 @@ heap_index_delete_tuples(Relation rel, TM_IndexDeleteOp *delstate)
* must have considered the original tuple header as part of
* generating its own snapshotConflictHorizon value.
*
- * Relying on XLOG_HEAP2_PRUNE records like this is the same
- * strategy that index vacuuming uses in all cases. Index VACUUM
- * WAL records don't even have a snapshotConflictHorizon field of
- * their own for this reason.
+ * Relying on XLOG_HEAP2_PRUNE_FREEZE records like this is the
+ * same strategy that index vacuuming uses in all cases. Index
+ * VACUUM WAL records don't even have a snapshotConflictHorizon
+ * field of their own for this reason.
*/
if (!ItemIdIsNormal(lp))
break;
@@ -8587,10 +8587,10 @@ ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_required,
}
/*
- * Handles XLOG_HEAP2_PRUNE record type.
+ * Handles XLOG_HEAP2_PRUNE_FREEZE record type.
*/
static void
-heap_xlog_prune(XLogReaderState *record)
+heap_xlog_prune_freeze(XLogReaderState *record)
{
XLogRecPtr lsn = record->EndRecPtr;
xl_heap_prune *xlrec = (xl_heap_prune *) XLogRecGetData(record);
@@ -9762,8 +9762,8 @@ heap2_redo(XLogReaderState *record)
switch (info & XLOG_HEAP_OPMASK)
{
- case XLOG_HEAP2_PRUNE:
- heap_xlog_prune(record);
+ case XLOG_HEAP2_PRUNE_FREEZE:
+ heap_xlog_prune_freeze(record);
break;
case XLOG_HEAP2_VISIBLE:
heap_xlog_visible(record);
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 9773681868c..704604d206a 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -359,7 +359,7 @@ heap_page_prune(Relation relation, Buffer buffer,
MarkBufferDirty(buffer);
/*
- * Emit a WAL XLOG_HEAP2_PRUNE record showing what we did
+ * Emit a WAL XLOG_HEAP2_PRUNE_FREEZE record showing what we did
*/
if (RelationNeedsWAL(relation))
{
@@ -1397,7 +1397,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
XLogRegisterBufData(0, (char *) frz_offsets,
sizeof(OffsetNumber) * nfrozen);
- recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_PRUNE);
+ recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_PRUNE_FREEZE);
PageSetLSN(BufferGetPage(buffer), recptr);
}
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 0d7edffff20..8b94c869faf 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -243,7 +243,7 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
uint8 info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
info &= XLOG_HEAP_OPMASK;
- if (info == XLOG_HEAP2_PRUNE)
+ if (info == XLOG_HEAP2_PRUNE_FREEZE)
{
xl_heap_prune *xlrec = (xl_heap_prune *) rec;
@@ -409,8 +409,8 @@ heap2_identify(uint8 info)
switch (info & ~XLR_INFO_MASK)
{
- case XLOG_HEAP2_PRUNE:
- id = "PRUNE";
+ case XLOG_HEAP2_PRUNE_FREEZE:
+ id = "PRUNE_FREEZE";
break;
case XLOG_HEAP2_VISIBLE:
id = "VISIBLE";
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 38d1bdd825e..8c909514381 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -445,7 +445,7 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
* Everything else here is just low level physical stuff we're not
* interested in.
*/
- case XLOG_HEAP2_PRUNE:
+ case XLOG_HEAP2_PRUNE_FREEZE:
case XLOG_HEAP2_VISIBLE:
case XLOG_HEAP2_LOCK_UPDATED:
break;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 6a934be7ecc..3d41aeb6d47 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -51,7 +51,7 @@
* these, too.
*/
#define XLOG_HEAP2_REWRITE 0x00
-#define XLOG_HEAP2_PRUNE 0x10
+#define XLOG_HEAP2_PRUNE_FREEZE 0x10
/* 0x20 is free, was XLOG_HEAP2_VACUUM */
/* 0x30 is free, was XLOG_HEAP2_FREEZE_PAGE */
#define XLOG_HEAP2_VISIBLE 0x40
@@ -301,7 +301,7 @@ typedef struct xl_heap_freeze_plan
} xl_heap_freeze_plan;
/*
- * As of Postgres 17, XLOG_HEAP2_PRUNE records replace
+ * As of Postgres 17, XLOG_HEAP2_PRUNE_FREEZE records replace
* XLOG_HEAP2_FREEZE_PAGE records.
*
* This is what we need to know about a block being frozen during vacuum
--
2.39.2
Attachment: v5-0004-nplans-is-a-pointer.patch
From 5d6fc2ffbdd839e0b69242af16446a46cf6a2dc7 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed, 20 Mar 2024 13:49:59 +0200
Subject: [PATCH v5 04/26] 'nplans' is a pointer
I'm surprised the compiler didn't warn about this
---
src/backend/access/rmgrdesc/heapdesc.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 8b94c869faf..9ef8a745982 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -155,8 +155,7 @@ heap_xlog_deserialize_prune_and_freeze(char *cursor, uint8 flags,
cursor += sizeof(OffsetNumber) * *nunused;
}
- if (nplans > 0)
- *frz_offsets = (OffsetNumber *) cursor;
+ *frz_offsets = (OffsetNumber *) cursor;
}
void
--
2.39.2
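
(Aside, not part of the patches: the reason the compiler stayed quiet about
the check that 0004 removes is, if I recall correctly, that the literal 0 also
serves as a null pointer constant, so "nplans > 0" is read as an ordered
comparison of the pointer against NULL; gcc only flags that under extra
warning levels such as -Wextra. A minimal sketch with made-up names, not
PostgreSQL code:)

#include <stdio.h>

static int
buggy_check(int *nplans)
{
    /* compares the pointer itself against NULL: true for any valid pointer */
    return nplans > 0;
}

static int
intended_check(int *nplans)
{
    /* what the code meant: test the pointed-to count */
    return *nplans > 0;
}

int
main(void)
{
    int nplans = 0;

    /* prints "buggy: 1, intended: 0" */
    printf("buggy: %d, intended: %d\n",
           buggy_check(&nplans), intended_check(&nplans));
    return 0;
}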
Attachment: v5-0005-Remind-myself-to-bump-XLOG_PAGE_MAGIC-when-this-i.patch
From 6a37a2e9b5d8464382eae5efb983e261cc5905c6 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed, 20 Mar 2024 14:52:06 +0200
Subject: [PATCH v5 05/26] Remind myself to bump XLOG_PAGE_MAGIC when this is
committed
---
src/include/access/xlog_internal.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index b88b24f0c1e..fd720d87dbb 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -31,7 +31,8 @@
/*
* Each page of XLOG file has a header like this:
*/
-#define XLOG_PAGE_MAGIC 0xD114 /* can be used as WAL version indicator */
+/* FIXME: make sure this is still larger than on 'master' before committing! */
+#define XLOG_PAGE_MAGIC 0xD115 /* can be used as WAL version indicator */
typedef struct XLogPageHeaderData
{
--
2.39.2
Attachment: v5-0006-Fix-logging-snapshot-conflict-horizon.patch
From 59f3f80f82ed7a63d86c991d0cb025e4cde2caec Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed, 20 Mar 2024 13:36:41 +0200
Subject: [PATCH v5 06/26] Fix logging snapshot conflict horizon.
- it was accessed without proper alignment, which won't work on
architectures that are strict about alignment. Use memcpy.
- in heap_xlog_prune_freeze, the code tried to access the xid with
"(xlhp_conflict_horizon *) (xlrec + SizeOfHeapPrune);" But 'xlrec'
was "xl_heap_prune *" rather than "char *". That happened to work,
because sizeof(xl_heap_prune) == 1, but make it more robust by
adding a cast to char *.
- remove xlhp_conflict_horizon and store a TransactionId directly. A
separate struct would make sense if we needed to store anything else
there, but for now it just seems like more code.
---
src/backend/access/heap/heapam.c | 6 ++++--
src/backend/access/heap/pruneheap.c | 3 +--
src/backend/access/rmgrdesc/heapdesc.c | 6 ++++--
src/include/access/heapam_xlog.h | 12 +++++-------
4 files changed, 14 insertions(+), 13 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 69a9aaa501d..b8d21ddd4dd 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -8628,9 +8628,11 @@ heap_xlog_prune_freeze(XLogReaderState *record)
*/
if (xlrec->flags & XLHP_HAS_CONFLICT_HORIZON && InHotStandby)
{
- xlhp_conflict_horizon *horizon = (xlhp_conflict_horizon *) (xlrec + SizeOfHeapPrune);
+ TransactionId conflict_xid;
- ResolveRecoveryConflictWithSnapshot(horizon->xid,
+ memcpy(&conflict_xid, ((char *) xlrec) + SizeOfHeapPrune, sizeof(TransactionId));
+
+ ResolveRecoveryConflictWithSnapshot(conflict_xid,
xlrec->flags & XLHP_IS_CATALOG_REL,
rlocator);
}
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 704604d206a..6482d9d05c1 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1284,7 +1284,6 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
OffsetNumber *unused, int nunused)
{
xl_heap_prune xlrec;
- xlhp_conflict_horizon horizon;
XLogRecPtr recptr;
xlhp_freeze freeze;
xlhp_prune_items redirect_items,
@@ -1320,7 +1319,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
XLogRegisterData((char *) &xlrec, SizeOfHeapPrune);
if (TransactionIdIsValid(conflict_xid))
- XLogRegisterData((char *) &horizon, SizeOfSnapshotConflictHorizon);
+ XLogRegisterData((char *) &conflict_xid, sizeof(TransactionId));
XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 9ef8a745982..ff238d58279 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -248,10 +248,12 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
if (xlrec->flags & XLHP_HAS_CONFLICT_HORIZON)
{
- xlhp_conflict_horizon *horizon = (xlhp_conflict_horizon *) (xlrec + SizeOfHeapPrune);
+ TransactionId conflict_xid;
+
+ memcpy(&conflict_xid, rec + SizeOfHeapPrune, sizeof(TransactionId));
appendStringInfo(buf, "snapshotConflictHorizon: %u",
- horizon->xid);
+ conflict_xid);
}
appendStringInfo(buf, ", isCatalogRel: %c",
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 3d41aeb6d47..f0cbd31189e 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -247,6 +247,11 @@ typedef struct xl_heap_update
typedef struct xl_heap_prune
{
uint8 flags;
+
+ /*
+ * If XLHP_HAS_CONFLICT_HORIZON is set, the conflict XID follows,
+ * unaligned
+ */
} xl_heap_prune;
/* to handle recovery conflict during logical decoding on standby */
@@ -273,13 +278,6 @@ typedef struct xl_heap_prune
#define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint8))
-typedef struct xlhp_conflict_horizon
-{
- TransactionId xid;
-} xlhp_conflict_horizon;
-
-#define SizeOfSnapshotConflictHorizon (offsetof(xlhp_conflict_horizon, xid) + sizeof(uint32))
-
/*
* This struct represents a 'freeze plan', which describes how to freeze a
* group of one or more heap tuples (appears in xl_heap_prune's xlhp_freeze
--
2.39.2
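
(Aside, not part of the patches: a minimal standalone sketch of the alignment
hazard that 0006 fixes. Type and function names below are stand-ins, not
PostgreSQL code. Because the xl_heap_prune header is a single flags byte, a
TransactionId stored right behind it lands on an odd address; copying it out
with memcpy is safe everywhere, whereas dereferencing through a cast can fault
on alignment-strict hardware.)

#include <stdio.h>
#include <string.h>
#include <stdint.h>

typedef uint32_t TransactionId;     /* stand-in for the real typedef */

/* rec points at the 1-byte header; the XID follows it, unaligned */
static TransactionId
read_conflict_xid(const char *rec, size_t header_size)
{
    TransactionId xid;

    /*
     * memcpy works regardless of the alignment of rec + header_size.
     * Doing "return *(const TransactionId *) (rec + header_size);" instead
     * is undefined behaviour and can trap on strict-alignment machines.
     */
    memcpy(&xid, rec + header_size, sizeof(TransactionId));
    return xid;
}

int
main(void)
{
    char            buf[1 + sizeof(TransactionId)];
    TransactionId   xid = 1234;

    buf[0] = 0;                             /* pretend flags byte */
    memcpy(buf + 1, &xid, sizeof(xid));     /* serialize unaligned */
    printf("conflict xid: %u\n", (unsigned) read_conflict_xid(buf, 1));
    return 0;
}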
Attachment: v5-0011-heap_page_prune-sets-all_visible-and-frz_conflict.patch
From f42ae5a503b5b62eb296af07d53002b68cd12d9b Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 6 Jan 2024 14:01:37 -0500
Subject: [PATCH v5 11/26] heap_page_prune sets all_visible and
frz_conflict_horizon
In order to combine the prune and freeze records, we must know if the
page is eligible to be opportunistically frozen before finishing
pruning. Save all_visible in the PruneResult and set it to false when we
see non-removable tuples which are not visible to everyone.
We will also need to ensure that the snapshotConflictHorizon for the combined
prune + freeze record is the more conservative of the horizons calculated for
pruning and freezing. Calculate the visibility_cutoff_xid for the purposes of
freezing -- the newest xmin on the page -- in heap_page_prune() and save it in
PruneResult.frz_conflict_horizon.
---
src/backend/access/heap/pruneheap.c | 127 +++++++++++++++++++++++++--
src/backend/access/heap/vacuumlazy.c | 121 ++++++-------------------
src/include/access/heapam.h | 3 +
3 files changed, 151 insertions(+), 100 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 94b18017aaa..624984457d3 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -63,8 +63,10 @@ static int heap_prune_chain(Buffer buffer,
static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
static void heap_prune_record_redirect(PruneState *prstate,
OffsetNumber offnum, OffsetNumber rdoffnum);
-static void heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum);
-static void heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum);
+static void heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
+ PruneResult *presult);
+static void heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
+ PruneResult *presult);
static void heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum);
static void page_verify_redirects(Page page);
@@ -245,6 +247,14 @@ heap_page_prune(Relation relation, Buffer buffer,
presult->ndeleted = 0;
presult->nnewlpdead = 0;
+ /*
+ * Keep track of whether or not the page is all_visible in case the caller
+ * wants to use this information to update the VM.
+ */
+ presult->all_visible = true;
+ /* for recovery conflicts */
+ presult->frz_conflict_horizon = InvalidTransactionId;
+
maxoff = PageGetMaxOffsetNumber(page);
tup.t_tableOid = RelationGetRelid(relation);
@@ -296,8 +306,97 @@ heap_page_prune(Relation relation, Buffer buffer,
presult->htsv[offnum] = heap_prune_satisfies_vacuum(&prstate, &tup,
buffer);
+ switch (presult->htsv[offnum])
+ {
+ case HEAPTUPLE_DEAD:
+
+ /*
+ * Deliberately delay unsetting all_visible until later during
+ * pruning. Removable dead tuples shouldn't preclude freezing
+ * the page. After finishing this first pass of tuple
+ * visibility checks, initialize all_visible_except_removable
+ * with the current value of all_visible to indicate whether
+ * or not the page is all visible except for dead tuples. This
+ * will allow us to attempt to freeze the page after pruning.
+ * Later during pruning, if we encounter an LP_DEAD item or
+ * are setting an item LP_DEAD, we will unset all_visible. As
+ * long as we unset it before updating the visibility map,
+ * this will be correct.
+ */
+ break;
+ case HEAPTUPLE_LIVE:
+
+ /*
+ * Is the tuple definitely visible to all transactions?
+ *
+ * NB: Like with per-tuple hint bits, we can't set the
+ * PD_ALL_VISIBLE flag if the inserter committed
+ * asynchronously. See SetHintBits for more info. Check that
+ * the tuple is hinted xmin-committed because of that.
+ */
+ if (presult->all_visible)
+ {
+ TransactionId xmin;
+
+ if (!HeapTupleHeaderXminCommitted(htup))
+ {
+ presult->all_visible = false;
+ break;
+ }
+
+ /*
+ * The inserter definitely committed. But is it old enough
+ * that everyone sees it as committed? A
+ * FrozenTransactionId is seen as committed to everyone.
+ * Otherwise, we check if there is a snapshot that
+ * considers this xid to still be running, and if so, we
+ * don't consider the page all-visible.
+ */
+ xmin = HeapTupleHeaderGetXmin(htup);
+ if (xmin != FrozenTransactionId &&
+ !GlobalVisTestIsRemovableXid(vistest, xmin))
+ {
+ presult->all_visible = false;
+ break;
+ }
+
+ /* Track newest xmin on page. */
+ if (TransactionIdFollows(xmin, presult->frz_conflict_horizon) &&
+ TransactionIdIsNormal(xmin))
+ presult->frz_conflict_horizon = xmin;
+ }
+ break;
+ case HEAPTUPLE_RECENTLY_DEAD:
+ presult->all_visible = false;
+ break;
+ case HEAPTUPLE_INSERT_IN_PROGRESS:
+ presult->all_visible = false;
+ break;
+ case HEAPTUPLE_DELETE_IN_PROGRESS:
+ /* This is an expected case during concurrent vacuum */
+ presult->all_visible = false;
+ break;
+ default:
+ elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
+ break;
+ }
}
+ /*
+ * For vacuum, if the whole page will become frozen, we consider
+ * opportunistically freezing tuples. Dead tuples which will be removed by
+ * the end of vacuuming should not preclude us from opportunistically
+ * freezing. We will not be able to freeze the whole page if there are
+ * tuples present which are not visible to everyone or if there are dead
+ * tuples which are not yet removable. We need all_visible to be false if
+ * LP_DEAD tuples remain after pruning so that we do not incorrectly
+ * update the visibility map or page hint bit. So, we will update
+ * presult->all_visible to reflect the presence of LP_DEAD items while
+ * pruning and keep all_visible_except_removable to permit freezing if the
+ * whole page will eventually become all visible after removing tuples.
+ */
+ presult->all_visible_except_removable = presult->all_visible;
+
/* Scan the page */
for (offnum = FirstOffsetNumber;
offnum <= maxoff;
@@ -565,10 +664,14 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
/*
* If the caller set mark_unused_now true, we can set dead line
* pointers LP_UNUSED now. We don't increment ndeleted here since
- * the LP was already marked dead.
+ * the LP was already marked dead. If it will not be marked
+ * LP_UNUSED, it will remain LP_DEAD, making the page not
+ * all_visible.
*/
if (unlikely(prstate->mark_unused_now))
heap_prune_record_unused(prstate, offnum);
+ else
+ presult->all_visible = false;
break;
}
@@ -705,7 +808,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
* redirect the root to the correct chain member.
*/
if (i >= nchain)
- heap_prune_record_dead_or_unused(prstate, rootoffnum);
+ heap_prune_record_dead_or_unused(prstate, rootoffnum, presult);
else
heap_prune_record_redirect(prstate, rootoffnum, chainitems[i]);
}
@@ -718,7 +821,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
* redirect item. We can clean up by setting the redirect item to
* DEAD state or LP_UNUSED if the caller indicated.
*/
- heap_prune_record_dead_or_unused(prstate, rootoffnum);
+ heap_prune_record_dead_or_unused(prstate, rootoffnum, presult);
}
return ndeleted;
@@ -755,13 +858,20 @@ heap_prune_record_redirect(PruneState *prstate,
/* Record line pointer to be marked dead */
static void
-heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum)
+heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
+ PruneResult *presult)
{
Assert(prstate->ndead < MaxHeapTuplesPerPage);
prstate->nowdead[prstate->ndead] = offnum;
prstate->ndead++;
Assert(!prstate->marked[offnum]);
prstate->marked[offnum] = true;
+
+ /*
+ * Setting the line pointer LP_DEAD means the page will definitely not be
+ * all_visible.
+ */
+ presult->all_visible = false;
}
/*
@@ -771,7 +881,8 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum)
* pointers LP_DEAD if mark_unused_now is true.
*/
static void
-heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum)
+heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
+ PruneResult *presult)
{
/*
* If the caller set mark_unused_now to true, we can remove dead tuples
@@ -782,7 +893,7 @@ heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum)
if (unlikely(prstate->mark_unused_now))
heap_prune_record_unused(prstate, offnum);
else
- heap_prune_record_dead(prstate, offnum);
+ heap_prune_record_dead(prstate, offnum, presult);
}
/* Record line pointer to be marked unused */
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 69ec7150000..591c7db08fe 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1422,9 +1422,7 @@ lazy_scan_prune(LVRelState *vacrel,
recently_dead_tuples;
HeapPageFreeze pagefrz;
bool hastup = false;
- bool all_visible,
- all_frozen;
- TransactionId visibility_cutoff_xid;
+ bool all_frozen;
int64 fpi_before = pgWalUsage.wal_fpi;
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
@@ -1465,17 +1463,16 @@ lazy_scan_prune(LVRelState *vacrel,
&presult, &vacrel->offnum);
/*
- * We will update the VM after collecting LP_DEAD items and freezing
- * tuples. Keep track of whether or not the page is all_visible and
- * all_frozen and use this information to update the VM. all_visible
- * implies 0 lpdead_items, but don't trust all_frozen result unless
- * all_visible is also set to true.
+ * Now scan the page to collect LP_DEAD items and check for tuples
+ * requiring freezing among remaining tuples with storage. We will update
+ * the VM after collecting LP_DEAD items and freezing tuples. Pruning will
+ * have determined whether or not the page is all_visible. Keep track of
+ * whether or not the page is all_frozen and use this information to
+ * update the VM. all_visible implies lpdead_items == 0, but don't trust
+ * all_frozen result unless all_visible is also set to true.
*
- * Also keep track of the visibility cutoff xid for recovery conflicts.
*/
- all_visible = true;
all_frozen = true;
- visibility_cutoff_xid = InvalidTransactionId;
/*
* Now scan the page to collect LP_DEAD items and update the variables set
@@ -1516,11 +1513,6 @@ lazy_scan_prune(LVRelState *vacrel,
* will only happen every other VACUUM, at most. Besides, VACUUM
* must treat hastup/nonempty_pages as provisional no matter how
* LP_DEAD items are handled (handled here, or handled later on).
- *
- * Also deliberately delay unsetting all_visible until just before
- * we return to lazy_scan_heap caller, as explained in full below.
- * (This is another case where it's useful to anticipate that any
- * LP_DEAD items will become LP_UNUSED during the ongoing VACUUM.)
*/
deadoffsets[lpdead_items++] = offnum;
continue;
@@ -1558,46 +1550,6 @@ lazy_scan_prune(LVRelState *vacrel,
* what acquire_sample_rows() does.
*/
live_tuples++;
-
- /*
- * Is the tuple definitely visible to all transactions?
- *
- * NB: Like with per-tuple hint bits, we can't set the
- * PD_ALL_VISIBLE flag if the inserter committed
- * asynchronously. See SetHintBits for more info. Check that
- * the tuple is hinted xmin-committed because of that.
- */
- if (all_visible)
- {
- TransactionId xmin;
-
- if (!HeapTupleHeaderXminCommitted(htup))
- {
- all_visible = false;
- break;
- }
-
- /*
- * The inserter definitely committed. But is it old enough
- * that everyone sees it as committed? A
- * FrozenTransactionId is seen as committed to everyone.
- * Otherwise, we check if there is a snapshot that
- * considers this xid to still be running, and if so, we
- * don't consider the page all-visible.
- */
- xmin = HeapTupleHeaderGetXmin(htup);
- if (xmin != FrozenTransactionId &&
- !GlobalVisTestIsRemovableXid(vacrel->vistest, xmin))
- {
- all_visible = false;
- break;
- }
-
- /* Track newest xmin on page. */
- if (TransactionIdFollows(xmin, visibility_cutoff_xid) &&
- TransactionIdIsNormal(xmin))
- visibility_cutoff_xid = xmin;
- }
break;
case HEAPTUPLE_RECENTLY_DEAD:
@@ -1607,7 +1559,6 @@ lazy_scan_prune(LVRelState *vacrel,
* pruning.)
*/
recently_dead_tuples++;
- all_visible = false;
break;
case HEAPTUPLE_INSERT_IN_PROGRESS:
@@ -1618,16 +1569,13 @@ lazy_scan_prune(LVRelState *vacrel,
* results. This assumption is a bit shaky, but it is what
* acquire_sample_rows() does, so be consistent.
*/
- all_visible = false;
break;
case HEAPTUPLE_DELETE_IN_PROGRESS:
- /* This is an expected case during concurrent vacuum */
- all_visible = false;
/*
- * Count such rows as live. As above, we assume the deleting
- * transaction will commit and update the counters after we
- * report.
+ * This is an expected case during concurrent vacuum. Count such
+ * rows as live. As above, we assume the deleting transaction
+ * will commit and update the counters after we report.
*/
live_tuples++;
break;
@@ -1670,7 +1618,7 @@ lazy_scan_prune(LVRelState *vacrel,
* page all-frozen afterwards (might not happen until final heap pass).
*/
if (pagefrz.freeze_required || tuples_frozen == 0 ||
- (all_visible && all_frozen &&
+ (presult.all_visible_except_removable && all_frozen &&
fpi_before != pgWalUsage.wal_fpi))
{
/*
@@ -1703,16 +1651,16 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->frozen_pages++;
/*
- * We can use visibility_cutoff_xid as our cutoff for conflicts
+ * We can use frz_conflict_horizon as our cutoff for conflicts
* when the whole page is eligible to become all-frozen in the VM
* once we're done with it. Otherwise we generate a conservative
* cutoff by stepping back from OldestXmin.
*/
- if (all_visible && all_frozen)
+ if (presult.all_visible_except_removable && all_frozen)
{
/* Using same cutoff when setting VM is now unnecessary */
- snapshotConflictHorizon = visibility_cutoff_xid;
- visibility_cutoff_xid = InvalidTransactionId;
+ snapshotConflictHorizon = presult.frz_conflict_horizon;
+ presult.frz_conflict_horizon = InvalidTransactionId;
}
else
{
@@ -1748,17 +1696,19 @@ lazy_scan_prune(LVRelState *vacrel,
*/
#ifdef USE_ASSERT_CHECKING
/* Note that all_frozen value does not matter when !all_visible */
- if (all_visible && lpdead_items == 0)
+ if (presult.all_visible)
{
TransactionId debug_cutoff;
bool debug_all_frozen;
+ Assert(lpdead_items == 0);
+
if (!heap_page_is_all_visible(vacrel, buf,
&debug_cutoff, &debug_all_frozen))
Assert(false);
Assert(!TransactionIdIsValid(debug_cutoff) ||
- debug_cutoff == visibility_cutoff_xid);
+ debug_cutoff == presult.frz_conflict_horizon);
}
#endif
@@ -1783,19 +1733,6 @@ lazy_scan_prune(LVRelState *vacrel,
Assert(dead_items->num_items <= dead_items->max_items);
pgstat_progress_update_param(PROGRESS_VACUUM_NUM_DEAD_TUPLES,
dead_items->num_items);
-
- /*
- * It was convenient to ignore LP_DEAD items in all_visible earlier on
- * to make the choice of whether or not to freeze the page unaffected
- * by the short-term presence of LP_DEAD items. These LP_DEAD items
- * were effectively assumed to be LP_UNUSED items in the making. It
- * doesn't matter which heap pass (initial pass or final pass) ends up
- * setting the page all-frozen, as long as the ongoing VACUUM does it.
- *
- * Now that freezing has been finalized, unset all_visible. It needs
- * to reflect the present state of things, as expected by our caller.
- */
- all_visible = false;
}
/* Finally, add page-local counts to whole-VACUUM counts */
@@ -1812,20 +1749,20 @@ lazy_scan_prune(LVRelState *vacrel,
/* Did we find LP_DEAD items? */
*has_lpdead_items = (lpdead_items > 0);
- Assert(!all_visible || !(*has_lpdead_items));
+ Assert(!presult.all_visible || !(*has_lpdead_items));
/*
* Handle setting visibility map bit based on information from the VM (as
* of last heap_vac_scan_next_block() call), and from all_visible and
* all_frozen variables
*/
- if (!all_visible_according_to_vm && all_visible)
+ if (!all_visible_according_to_vm && presult.all_visible)
{
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
if (all_frozen)
{
- Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+ Assert(!TransactionIdIsValid(presult.frz_conflict_horizon));
flags |= VISIBILITYMAP_ALL_FROZEN;
}
@@ -1845,7 +1782,7 @@ lazy_scan_prune(LVRelState *vacrel,
PageSetAllVisible(page);
MarkBufferDirty(buf);
visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
- vmbuffer, visibility_cutoff_xid,
+ vmbuffer, presult.frz_conflict_horizon,
flags);
}
@@ -1893,7 +1830,7 @@ lazy_scan_prune(LVRelState *vacrel,
* it as all-frozen. Note that all_frozen is only valid if all_visible is
* true, so we must check both all_visible and all_frozen.
*/
- else if (all_visible_according_to_vm && all_visible &&
+ else if (all_visible_according_to_vm && presult.all_visible &&
all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
{
/*
@@ -1910,11 +1847,11 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* Set the page all-frozen (and all-visible) in the VM.
*
- * We can pass InvalidTransactionId as our visibility_cutoff_xid,
- * since a snapshotConflictHorizon sufficient to make everything safe
- * for REDO was logged when the page's tuples were frozen.
+ * We can pass InvalidTransactionId as our frz_conflict_horizon, since
+ * a snapshotConflictHorizon sufficient to make everything safe for
+ * REDO was logged when the page's tuples were frozen.
*/
- Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+ Assert(!TransactionIdIsValid(presult.frz_conflict_horizon));
visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
vmbuffer, InvalidTransactionId,
VISIBILITYMAP_ALL_VISIBLE |
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index ca6ddab91ea..dca572384ff 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -198,6 +198,9 @@ typedef struct PruneResult
{
int ndeleted; /* Number of tuples deleted from the page */
int nnewlpdead; /* Number of newly LP_DEAD items */
+ bool all_visible; /* Whether or not the page is all visible */
+ bool all_visible_except_removable;
+ TransactionId frz_conflict_horizon; /* Newest xmin on the page */
/*
* Tuple visibility is only computed once for each tuple, for correctness
--
2.39.2
Attachment: v5-0012-Add-reference-to-VacuumCutoffs-in-HeapPageFreeze.patch (text/x-patch)
From 5dfa4c2dfc5ad1be368781886d4b7c49b9f1467d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 6 Jan 2024 16:22:17 -0500
Subject: [PATCH v5 12/26] Add reference to VacuumCutoffs in HeapPageFreeze
Future commits will move opportunistic freezing into the main path of
pruning in heap_page_prune(). Because on-access pruning will not do
opportunistic freezing, it is cleaner to keep the visibility cutoff
information required for calling heap_prepare_freeze_tuple() inside the
HeapPageFreeze structure itself by saving a reference to VacuumCutoffs
there.
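(For illustration only, not part of the patch: a rough sketch of the call-site pattern this allows, using the names from the diff below. The caller stashes the cutoffs in the page-level freeze state once, and the per-tuple call no longer needs a separate cutoffs argument.)

    HeapPageFreeze pagefrz;

    pagefrz.freeze_required = false;
    pagefrz.FreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
    pagefrz.FreezePageRelminMxid = vacrel->NewRelminMxid;
    pagefrz.NoFreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
    pagefrz.NoFreezePageRelminMxid = vacrel->NewRelminMxid;
    pagefrz.cutoffs = &vacrel->cutoffs;    /* cutoffs now travel with the freeze state */

    /* ... later, once per tuple with storage, no separate cutoffs argument ... */
    heap_prepare_freeze_tuple(htup, &pagefrz, &frozen[tuples_frozen], &totally_frozen);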
---
src/backend/access/heap/heapam.c | 67 ++++++++++++++--------------
src/backend/access/heap/vacuumlazy.c | 3 +-
src/include/access/heapam.h | 2 +-
3 files changed, 36 insertions(+), 36 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index b8d21ddd4dd..a663cce9f86 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6020,7 +6020,7 @@ heap_inplace_update(Relation relation, HeapTuple tuple)
*/
static TransactionId
FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
- const struct VacuumCutoffs *cutoffs, uint16 *flags,
+ uint16 *flags,
HeapPageFreeze *pagefrz)
{
TransactionId newxmax;
@@ -6046,12 +6046,12 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
pagefrz->freeze_required = true;
return InvalidTransactionId;
}
- else if (MultiXactIdPrecedes(multi, cutoffs->relminmxid))
+ else if (MultiXactIdPrecedes(multi, pagefrz->cutoffs->relminmxid))
ereport(ERROR,
(errcode(ERRCODE_DATA_CORRUPTED),
errmsg_internal("found multixact %u from before relminmxid %u",
- multi, cutoffs->relminmxid)));
- else if (MultiXactIdPrecedes(multi, cutoffs->OldestMxact))
+ multi, pagefrz->cutoffs->relminmxid)));
+ else if (MultiXactIdPrecedes(multi, pagefrz->cutoffs->OldestMxact))
{
TransactionId update_xact;
@@ -6066,7 +6066,7 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
ereport(ERROR,
(errcode(ERRCODE_DATA_CORRUPTED),
errmsg_internal("multixact %u from before multi freeze cutoff %u found to be still running",
- multi, cutoffs->OldestMxact)));
+ multi, pagefrz->cutoffs->OldestMxact)));
if (HEAP_XMAX_IS_LOCKED_ONLY(t_infomask))
{
@@ -6077,13 +6077,13 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
/* replace multi with single XID for its updater? */
update_xact = MultiXactIdGetUpdateXid(multi, t_infomask);
- if (TransactionIdPrecedes(update_xact, cutoffs->relfrozenxid))
+ if (TransactionIdPrecedes(update_xact, pagefrz->cutoffs->relfrozenxid))
ereport(ERROR,
(errcode(ERRCODE_DATA_CORRUPTED),
errmsg_internal("multixact %u contains update XID %u from before relfrozenxid %u",
multi, update_xact,
- cutoffs->relfrozenxid)));
- else if (TransactionIdPrecedes(update_xact, cutoffs->OldestXmin))
+ pagefrz->cutoffs->relfrozenxid)));
+ else if (TransactionIdPrecedes(update_xact, pagefrz->cutoffs->OldestXmin))
{
/*
* Updater XID has to have aborted (otherwise the tuple would have
@@ -6095,7 +6095,7 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
(errcode(ERRCODE_DATA_CORRUPTED),
errmsg_internal("multixact %u contains committed update XID %u from before removable cutoff %u",
multi, update_xact,
- cutoffs->OldestXmin)));
+ pagefrz->cutoffs->OldestXmin)));
*flags |= FRM_INVALIDATE_XMAX;
pagefrz->freeze_required = true;
return InvalidTransactionId;
@@ -6147,9 +6147,9 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
{
TransactionId xid = members[i].xid;
- Assert(!TransactionIdPrecedes(xid, cutoffs->relfrozenxid));
+ Assert(!TransactionIdPrecedes(xid, pagefrz->cutoffs->relfrozenxid));
- if (TransactionIdPrecedes(xid, cutoffs->FreezeLimit))
+ if (TransactionIdPrecedes(xid, pagefrz->cutoffs->FreezeLimit))
{
/* Can't violate the FreezeLimit postcondition */
need_replace = true;
@@ -6161,7 +6161,7 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
/* Can't violate the MultiXactCutoff postcondition, either */
if (!need_replace)
- need_replace = MultiXactIdPrecedes(multi, cutoffs->MultiXactCutoff);
+ need_replace = MultiXactIdPrecedes(multi, pagefrz->cutoffs->MultiXactCutoff);
if (!need_replace)
{
@@ -6200,7 +6200,7 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
TransactionId xid = members[i].xid;
MultiXactStatus mstatus = members[i].status;
- Assert(!TransactionIdPrecedes(xid, cutoffs->relfrozenxid));
+ Assert(!TransactionIdPrecedes(xid, pagefrz->cutoffs->relfrozenxid));
if (!ISUPDATE_from_mxstatus(mstatus))
{
@@ -6211,12 +6211,12 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
if (TransactionIdIsCurrentTransactionId(xid) ||
TransactionIdIsInProgress(xid))
{
- if (TransactionIdPrecedes(xid, cutoffs->OldestXmin))
+ if (TransactionIdPrecedes(xid, pagefrz->cutoffs->OldestXmin))
ereport(ERROR,
(errcode(ERRCODE_DATA_CORRUPTED),
errmsg_internal("multixact %u contains running locker XID %u from before removable cutoff %u",
multi, xid,
- cutoffs->OldestXmin)));
+ pagefrz->cutoffs->OldestXmin)));
newmembers[nnewmembers++] = members[i];
has_lockers = true;
}
@@ -6274,11 +6274,11 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
* We determined that updater must be kept -- add it to pending new
* members list
*/
- if (TransactionIdPrecedes(xid, cutoffs->OldestXmin))
+ if (TransactionIdPrecedes(xid, pagefrz->cutoffs->OldestXmin))
ereport(ERROR,
(errcode(ERRCODE_DATA_CORRUPTED),
errmsg_internal("multixact %u contains committed update XID %u from before removable cutoff %u",
- multi, xid, cutoffs->OldestXmin)));
+ multi, xid, pagefrz->cutoffs->OldestXmin)));
newmembers[nnewmembers++] = members[i];
}
@@ -6370,7 +6370,6 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
*/
bool
heap_prepare_freeze_tuple(HeapTupleHeader tuple,
- const struct VacuumCutoffs *cutoffs,
HeapPageFreeze *pagefrz,
HeapTupleFreeze *frz, bool *totally_frozen)
{
@@ -6398,14 +6397,14 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
xmin_already_frozen = true;
else
{
- if (TransactionIdPrecedes(xid, cutoffs->relfrozenxid))
+ if (TransactionIdPrecedes(xid, pagefrz->cutoffs->relfrozenxid))
ereport(ERROR,
(errcode(ERRCODE_DATA_CORRUPTED),
errmsg_internal("found xmin %u from before relfrozenxid %u",
- xid, cutoffs->relfrozenxid)));
+ xid, pagefrz->cutoffs->relfrozenxid)));
/* Will set freeze_xmin flags in freeze plan below */
- freeze_xmin = TransactionIdPrecedes(xid, cutoffs->OldestXmin);
+ freeze_xmin = TransactionIdPrecedes(xid, pagefrz->cutoffs->OldestXmin);
/* Verify that xmin committed if and when freeze plan is executed */
if (freeze_xmin)
@@ -6419,8 +6418,8 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
xid = HeapTupleHeaderGetXvac(tuple);
if (TransactionIdIsNormal(xid))
{
- Assert(TransactionIdPrecedesOrEquals(cutoffs->relfrozenxid, xid));
- Assert(TransactionIdPrecedes(xid, cutoffs->OldestXmin));
+ Assert(TransactionIdPrecedesOrEquals(pagefrz->cutoffs->relfrozenxid, xid));
+ Assert(TransactionIdPrecedes(xid, pagefrz->cutoffs->OldestXmin));
/*
* For Xvac, we always freeze proactively. This allows totally_frozen
@@ -6445,8 +6444,7 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
* perform no-op xmax processing. The only constraint is that the
* FreezeLimit/MultiXactCutoff postcondition must never be violated.
*/
- newxmax = FreezeMultiXactId(xid, tuple->t_infomask, cutoffs,
- &flags, pagefrz);
+ newxmax = FreezeMultiXactId(xid, tuple->t_infomask, &flags, pagefrz);
if (flags & FRM_NOOP)
{
@@ -6469,7 +6467,7 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
* (This repeats work from FreezeMultiXactId, but allows "no
* freeze" tracker maintenance to happen in only one place.)
*/
- Assert(!MultiXactIdPrecedes(newxmax, cutoffs->MultiXactCutoff));
+ Assert(!MultiXactIdPrecedes(newxmax, pagefrz->cutoffs->MultiXactCutoff));
Assert(MultiXactIdIsValid(newxmax) && xid == newxmax);
}
else if (flags & FRM_RETURN_IS_XID)
@@ -6478,7 +6476,7 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
* xmax will become an updater Xid (original MultiXact's updater
* member Xid will be carried forward as a simple Xid in Xmax).
*/
- Assert(!TransactionIdPrecedes(newxmax, cutoffs->OldestXmin));
+ Assert(!TransactionIdPrecedes(newxmax, pagefrz->cutoffs->OldestXmin));
/*
* NB -- some of these transformations are only valid because we
@@ -6502,7 +6500,7 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
* xmax is an old MultiXactId that we have to replace with a new
* MultiXactId, to carry forward two or more original member XIDs.
*/
- Assert(!MultiXactIdPrecedes(newxmax, cutoffs->OldestMxact));
+ Assert(!MultiXactIdPrecedes(newxmax, pagefrz->cutoffs->OldestMxact));
/*
* We can't use GetMultiXactIdHintBits directly on the new multi
@@ -6537,14 +6535,14 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
else if (TransactionIdIsNormal(xid))
{
/* Raw xmax is normal XID */
- if (TransactionIdPrecedes(xid, cutoffs->relfrozenxid))
+ if (TransactionIdPrecedes(xid, pagefrz->cutoffs->relfrozenxid))
ereport(ERROR,
(errcode(ERRCODE_DATA_CORRUPTED),
errmsg_internal("found xmax %u from before relfrozenxid %u",
- xid, cutoffs->relfrozenxid)));
+ xid, pagefrz->cutoffs->relfrozenxid)));
/* Will set freeze_xmax flags in freeze plan below */
- freeze_xmax = TransactionIdPrecedes(xid, cutoffs->OldestXmin);
+ freeze_xmax = TransactionIdPrecedes(xid, pagefrz->cutoffs->OldestXmin);
/*
* Verify that xmax aborted if and when freeze plan is executed,
@@ -6624,7 +6622,7 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
* Does this tuple force caller to freeze the entire page?
*/
pagefrz->freeze_required =
- heap_tuple_should_freeze(tuple, cutoffs,
+ heap_tuple_should_freeze(tuple, pagefrz->cutoffs,
&pagefrz->NoFreezePageRelfrozenXid,
&pagefrz->NoFreezePageRelminMxid);
}
@@ -6783,8 +6781,9 @@ heap_freeze_tuple(HeapTupleHeader tuple,
pagefrz.NoFreezePageRelfrozenXid = FreezeLimit;
pagefrz.NoFreezePageRelminMxid = MultiXactCutoff;
- do_freeze = heap_prepare_freeze_tuple(tuple, &cutoffs,
- &pagefrz, &frz, &totally_frozen);
+ pagefrz.cutoffs = &cutoffs;
+
+ do_freeze = heap_prepare_freeze_tuple(tuple, &pagefrz, &frz, &totally_frozen);
/*
* Note that because this is not a WAL-logged operation, we don't need to
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 591c7db08fe..7214e5c3b55 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1442,6 +1442,7 @@ lazy_scan_prune(LVRelState *vacrel,
pagefrz.FreezePageRelminMxid = vacrel->NewRelminMxid;
pagefrz.NoFreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
pagefrz.NoFreezePageRelminMxid = vacrel->NewRelminMxid;
+ pagefrz.cutoffs = &vacrel->cutoffs;
tuples_frozen = 0;
lpdead_items = 0;
live_tuples = 0;
@@ -1587,7 +1588,7 @@ lazy_scan_prune(LVRelState *vacrel,
hastup = true; /* page makes rel truncation unsafe */
/* Tuple with storage -- consider need to freeze */
- if (heap_prepare_freeze_tuple(htup, &vacrel->cutoffs, &pagefrz,
+ if (heap_prepare_freeze_tuple(htup, &pagefrz,
&frozen[tuples_frozen], &totally_frozen))
{
/* Save prepared freeze plan for later */
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index dca572384ff..fd28d6bca8a 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -189,6 +189,7 @@ typedef struct HeapPageFreeze
TransactionId NoFreezePageRelfrozenXid;
MultiXactId NoFreezePageRelminMxid;
+ struct VacuumCutoffs *cutoffs;
} HeapPageFreeze;
/*
@@ -295,7 +296,6 @@ extern TM_Result heap_lock_tuple(Relation relation, HeapTuple tuple,
extern void heap_inplace_update(Relation relation, HeapTuple tuple);
extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
- const struct VacuumCutoffs *cutoffs,
HeapPageFreeze *pagefrz,
HeapTupleFreeze *frz, bool *totally_frozen);
extern void heap_freeze_execute_prepared(Relation rel, Buffer buffer,
--
2.39.2
Attachment: v5-0013-still-use-a-local-cutoffs-variable.patch (text/x-patch)
From d36138b5bf0a93557273b5e47f8cd5ea089057c7 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed, 20 Mar 2024 11:47:42 +0200
Subject: [PATCH v5 13/26] still use a local 'cutoffs' variable
Given how often 'cutoffs' is used in the function, I think it still
makes sense to have a local variable for it, just to keep the source
lines shorter.
---
src/backend/access/heap/heapam.c | 59 ++++++++++++++++----------------
1 file changed, 30 insertions(+), 29 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index a663cce9f86..8779fd04305 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6020,9 +6020,9 @@ heap_inplace_update(Relation relation, HeapTuple tuple)
*/
static TransactionId
FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
- uint16 *flags,
- HeapPageFreeze *pagefrz)
+ uint16 *flags, HeapPageFreeze *pagefrz)
{
+ const struct VacuumCutoffs *cutoffs = pagefrz->cutoffs;
TransactionId newxmax;
MultiXactMember *members;
int nmembers;
@@ -6046,12 +6046,12 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
pagefrz->freeze_required = true;
return InvalidTransactionId;
}
- else if (MultiXactIdPrecedes(multi, pagefrz->cutoffs->relminmxid))
+ else if (MultiXactIdPrecedes(multi, cutoffs->relminmxid))
ereport(ERROR,
(errcode(ERRCODE_DATA_CORRUPTED),
errmsg_internal("found multixact %u from before relminmxid %u",
- multi, pagefrz->cutoffs->relminmxid)));
- else if (MultiXactIdPrecedes(multi, pagefrz->cutoffs->OldestMxact))
+ multi, cutoffs->relminmxid)));
+ else if (MultiXactIdPrecedes(multi, cutoffs->OldestMxact))
{
TransactionId update_xact;
@@ -6066,7 +6066,7 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
ereport(ERROR,
(errcode(ERRCODE_DATA_CORRUPTED),
errmsg_internal("multixact %u from before multi freeze cutoff %u found to be still running",
- multi, pagefrz->cutoffs->OldestMxact)));
+ multi, cutoffs->OldestMxact)));
if (HEAP_XMAX_IS_LOCKED_ONLY(t_infomask))
{
@@ -6077,13 +6077,13 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
/* replace multi with single XID for its updater? */
update_xact = MultiXactIdGetUpdateXid(multi, t_infomask);
- if (TransactionIdPrecedes(update_xact, pagefrz->cutoffs->relfrozenxid))
+ if (TransactionIdPrecedes(update_xact, cutoffs->relfrozenxid))
ereport(ERROR,
(errcode(ERRCODE_DATA_CORRUPTED),
errmsg_internal("multixact %u contains update XID %u from before relfrozenxid %u",
multi, update_xact,
- pagefrz->cutoffs->relfrozenxid)));
- else if (TransactionIdPrecedes(update_xact, pagefrz->cutoffs->OldestXmin))
+ cutoffs->relfrozenxid)));
+ else if (TransactionIdPrecedes(update_xact, cutoffs->OldestXmin))
{
/*
* Updater XID has to have aborted (otherwise the tuple would have
@@ -6095,7 +6095,7 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
(errcode(ERRCODE_DATA_CORRUPTED),
errmsg_internal("multixact %u contains committed update XID %u from before removable cutoff %u",
multi, update_xact,
- pagefrz->cutoffs->OldestXmin)));
+ cutoffs->OldestXmin)));
*flags |= FRM_INVALIDATE_XMAX;
pagefrz->freeze_required = true;
return InvalidTransactionId;
@@ -6147,9 +6147,9 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
{
TransactionId xid = members[i].xid;
- Assert(!TransactionIdPrecedes(xid, pagefrz->cutoffs->relfrozenxid));
+ Assert(!TransactionIdPrecedes(xid, cutoffs->relfrozenxid));
- if (TransactionIdPrecedes(xid, pagefrz->cutoffs->FreezeLimit))
+ if (TransactionIdPrecedes(xid, cutoffs->FreezeLimit))
{
/* Can't violate the FreezeLimit postcondition */
need_replace = true;
@@ -6161,7 +6161,7 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
/* Can't violate the MultiXactCutoff postcondition, either */
if (!need_replace)
- need_replace = MultiXactIdPrecedes(multi, pagefrz->cutoffs->MultiXactCutoff);
+ need_replace = MultiXactIdPrecedes(multi, cutoffs->MultiXactCutoff);
if (!need_replace)
{
@@ -6200,7 +6200,7 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
TransactionId xid = members[i].xid;
MultiXactStatus mstatus = members[i].status;
- Assert(!TransactionIdPrecedes(xid, pagefrz->cutoffs->relfrozenxid));
+ Assert(!TransactionIdPrecedes(xid, cutoffs->relfrozenxid));
if (!ISUPDATE_from_mxstatus(mstatus))
{
@@ -6211,12 +6211,12 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
if (TransactionIdIsCurrentTransactionId(xid) ||
TransactionIdIsInProgress(xid))
{
- if (TransactionIdPrecedes(xid, pagefrz->cutoffs->OldestXmin))
+ if (TransactionIdPrecedes(xid, cutoffs->OldestXmin))
ereport(ERROR,
(errcode(ERRCODE_DATA_CORRUPTED),
errmsg_internal("multixact %u contains running locker XID %u from before removable cutoff %u",
multi, xid,
- pagefrz->cutoffs->OldestXmin)));
+ cutoffs->OldestXmin)));
newmembers[nnewmembers++] = members[i];
has_lockers = true;
}
@@ -6274,11 +6274,11 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
* We determined that updater must be kept -- add it to pending new
* members list
*/
- if (TransactionIdPrecedes(xid, pagefrz->cutoffs->OldestXmin))
+ if (TransactionIdPrecedes(xid, cutoffs->OldestXmin))
ereport(ERROR,
(errcode(ERRCODE_DATA_CORRUPTED),
errmsg_internal("multixact %u contains committed update XID %u from before removable cutoff %u",
- multi, xid, pagefrz->cutoffs->OldestXmin)));
+ multi, xid, cutoffs->OldestXmin)));
newmembers[nnewmembers++] = members[i];
}
@@ -6373,6 +6373,7 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
HeapPageFreeze *pagefrz,
HeapTupleFreeze *frz, bool *totally_frozen)
{
+ const struct VacuumCutoffs *cutoffs = pagefrz->cutoffs;
bool xmin_already_frozen = false,
xmax_already_frozen = false;
bool freeze_xmin = false,
@@ -6397,14 +6398,14 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
xmin_already_frozen = true;
else
{
- if (TransactionIdPrecedes(xid, pagefrz->cutoffs->relfrozenxid))
+ if (TransactionIdPrecedes(xid, cutoffs->relfrozenxid))
ereport(ERROR,
(errcode(ERRCODE_DATA_CORRUPTED),
errmsg_internal("found xmin %u from before relfrozenxid %u",
- xid, pagefrz->cutoffs->relfrozenxid)));
+ xid, cutoffs->relfrozenxid)));
/* Will set freeze_xmin flags in freeze plan below */
- freeze_xmin = TransactionIdPrecedes(xid, pagefrz->cutoffs->OldestXmin);
+ freeze_xmin = TransactionIdPrecedes(xid, cutoffs->OldestXmin);
/* Verify that xmin committed if and when freeze plan is executed */
if (freeze_xmin)
@@ -6418,8 +6419,8 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
xid = HeapTupleHeaderGetXvac(tuple);
if (TransactionIdIsNormal(xid))
{
- Assert(TransactionIdPrecedesOrEquals(pagefrz->cutoffs->relfrozenxid, xid));
- Assert(TransactionIdPrecedes(xid, pagefrz->cutoffs->OldestXmin));
+ Assert(TransactionIdPrecedesOrEquals(cutoffs->relfrozenxid, xid));
+ Assert(TransactionIdPrecedes(xid, cutoffs->OldestXmin));
/*
* For Xvac, we always freeze proactively. This allows totally_frozen
@@ -6467,7 +6468,7 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
* (This repeats work from FreezeMultiXactId, but allows "no
* freeze" tracker maintenance to happen in only one place.)
*/
- Assert(!MultiXactIdPrecedes(newxmax, pagefrz->cutoffs->MultiXactCutoff));
+ Assert(!MultiXactIdPrecedes(newxmax, cutoffs->MultiXactCutoff));
Assert(MultiXactIdIsValid(newxmax) && xid == newxmax);
}
else if (flags & FRM_RETURN_IS_XID)
@@ -6476,7 +6477,7 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
* xmax will become an updater Xid (original MultiXact's updater
* member Xid will be carried forward as a simple Xid in Xmax).
*/
- Assert(!TransactionIdPrecedes(newxmax, pagefrz->cutoffs->OldestXmin));
+ Assert(!TransactionIdPrecedes(newxmax, cutoffs->OldestXmin));
/*
* NB -- some of these transformations are only valid because we
@@ -6500,7 +6501,7 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
* xmax is an old MultiXactId that we have to replace with a new
* MultiXactId, to carry forward two or more original member XIDs.
*/
- Assert(!MultiXactIdPrecedes(newxmax, pagefrz->cutoffs->OldestMxact));
+ Assert(!MultiXactIdPrecedes(newxmax, cutoffs->OldestMxact));
/*
* We can't use GetMultiXactIdHintBits directly on the new multi
@@ -6535,14 +6536,14 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
else if (TransactionIdIsNormal(xid))
{
/* Raw xmax is normal XID */
- if (TransactionIdPrecedes(xid, pagefrz->cutoffs->relfrozenxid))
+ if (TransactionIdPrecedes(xid, cutoffs->relfrozenxid))
ereport(ERROR,
(errcode(ERRCODE_DATA_CORRUPTED),
errmsg_internal("found xmax %u from before relfrozenxid %u",
- xid, pagefrz->cutoffs->relfrozenxid)));
+ xid, cutoffs->relfrozenxid)));
/* Will set freeze_xmax flags in freeze plan below */
- freeze_xmax = TransactionIdPrecedes(xid, pagefrz->cutoffs->OldestXmin);
+ freeze_xmax = TransactionIdPrecedes(xid, cutoffs->OldestXmin);
/*
* Verify that xmax aborted if and when freeze plan is executed,
--
2.39.2
Attachment: v5-0014-Prepare-freeze-tuples-in-heap_page_prune.patch (text/x-patch)
From e5b4b3649cbe573215374ca91cf86ce474349c1a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 18 Mar 2024 20:01:38 -0400
Subject: [PATCH v5 14/26] Prepare freeze tuples in heap_page_prune()
In order to combine the freeze and prune records, we must determine
which tuples are freezable before actually executing pruning, since all
of the page modifications should be made in the same critical section
that emits the combined WAL record. So, during pruning, determine
whether tuples should or must be frozen and whether the page will, as a
consequence, become all-frozen.
---
src/backend/access/heap/pruneheap.c | 41 +++++++++++++++--
src/backend/access/heap/vacuumlazy.c | 68 +++++++---------------------
src/include/access/heapam.h | 12 +++++
3 files changed, 66 insertions(+), 55 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 624984457d3..94c5c7b80f0 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -153,7 +153,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* not the relation has indexes, since we cannot safely determine
* that during on-access pruning with the current implementation.
*/
- heap_page_prune(relation, buffer, vistest, false,
+ heap_page_prune(relation, buffer, vistest, false, NULL,
&presult, NULL);
/*
@@ -201,6 +201,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* mark_unused_now indicates whether or not dead items can be set LP_UNUSED
* during pruning.
*
+ * pagefrz contains both input and output parameters used if the caller is
+ * interested in potentially freezing tuples on the page.
+ *
* presult contains output parameters needed by callers such as the number of
* tuples removed and the number of line pointers newly marked LP_DEAD.
* heap_page_prune() is responsible for initializing it.
@@ -212,6 +215,7 @@ void
heap_page_prune(Relation relation, Buffer buffer,
GlobalVisState *vistest,
bool mark_unused_now,
+ HeapPageFreeze *pagefrz,
PruneResult *presult,
OffsetNumber *off_loc)
{
@@ -246,11 +250,16 @@ heap_page_prune(Relation relation, Buffer buffer,
*/
presult->ndeleted = 0;
presult->nnewlpdead = 0;
+ presult->nfrozen = 0;
/*
- * Keep track of whether or not the page is all_visible in case the caller
- * wants to use this information to update the VM.
+ * Caller will update the VM after pruning, collecting LP_DEAD items, and
+ * freezing tuples. Keep track of whether or not the page is all_visible
+ * and all_frozen and use this information to update the VM. all_visible
+ * implies lpdead_items == 0, but don't trust all_frozen result unless
+ * all_visible is also set to true.
*/
+ presult->all_frozen = true;
presult->all_visible = true;
/* for recovery conflicts */
presult->frz_conflict_horizon = InvalidTransactionId;
@@ -380,6 +389,32 @@ heap_page_prune(Relation relation, Buffer buffer,
elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
break;
}
+
+ /*
+ * Consider freezing any normal tuples which will not be removed
+ */
+ if (presult->htsv[offnum] != HEAPTUPLE_DEAD && pagefrz)
+ {
+ bool totally_frozen;
+
+ /* Tuple with storage -- consider need to freeze */
+ if ((heap_prepare_freeze_tuple(htup, pagefrz,
+ &presult->frozen[presult->nfrozen],
+ &totally_frozen)))
+ {
+ /* Save prepared freeze plan for later */
+ presult->frozen[presult->nfrozen++].offset = offnum;
+ }
+
+ /*
+ * If any tuple isn't either totally frozen already or eligible to
+ * become totally frozen (according to its freeze plan), then the
+ * page definitely cannot be set all-frozen in the visibility map
+ * later on
+ */
+ if (!totally_frozen)
+ presult->all_frozen = false;
+ }
}
/*
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 7214e5c3b55..27b57d68dae 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1416,16 +1416,13 @@ lazy_scan_prune(LVRelState *vacrel,
maxoff;
ItemId itemid;
PruneResult presult;
- int tuples_frozen,
- lpdead_items,
+ int lpdead_items,
live_tuples,
recently_dead_tuples;
HeapPageFreeze pagefrz;
bool hastup = false;
- bool all_frozen;
int64 fpi_before = pgWalUsage.wal_fpi;
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
- HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
Assert(BufferGetBlockNumber(buf) == blkno);
@@ -1443,7 +1440,6 @@ lazy_scan_prune(LVRelState *vacrel,
pagefrz.NoFreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
pagefrz.NoFreezePageRelminMxid = vacrel->NewRelminMxid;
pagefrz.cutoffs = &vacrel->cutoffs;
- tuples_frozen = 0;
lpdead_items = 0;
live_tuples = 0;
recently_dead_tuples = 0;
@@ -1461,31 +1457,20 @@ lazy_scan_prune(LVRelState *vacrel,
* false otherwise.
*/
heap_page_prune(rel, buf, vacrel->vistest, vacrel->nindexes == 0,
- &presult, &vacrel->offnum);
+ &pagefrz, &presult, &vacrel->offnum);
/*
* Now scan the page to collect LP_DEAD items and check for tuples
* requiring freezing among remaining tuples with storage. We will update
* the VM after collecting LP_DEAD items and freezing tuples. Pruning will
- * have determined whether or not the page is all_visible. Keep track of
- * whether or not the page is all_frozen and use this information to
- * update the VM. all_visible implies lpdead_items == 0, but don't trust
- * all_frozen result unless all_visible is also set to true.
+ * have determined whether or not the page is all_visible and able to
+ * become all_frozen.
*
*/
- all_frozen = true;
-
- /*
- * Now scan the page to collect LP_DEAD items and update the variables set
- * just above.
- */
for (offnum = FirstOffsetNumber;
offnum <= maxoff;
offnum = OffsetNumberNext(offnum))
{
- HeapTupleHeader htup;
- bool totally_frozen;
-
/*
* Set the offset number so that we can display it along with any
* error that occurred while processing this tuple.
@@ -1521,8 +1506,6 @@ lazy_scan_prune(LVRelState *vacrel,
Assert(ItemIdIsNormal(itemid));
- htup = (HeapTupleHeader) PageGetItem(page, itemid);
-
/*
* The criteria for counting a tuple as live in this block need to
* match what analyze.c's acquire_sample_rows() does, otherwise VACUUM
@@ -1587,29 +1570,8 @@ lazy_scan_prune(LVRelState *vacrel,
hastup = true; /* page makes rel truncation unsafe */
- /* Tuple with storage -- consider need to freeze */
- if (heap_prepare_freeze_tuple(htup, &pagefrz,
- &frozen[tuples_frozen], &totally_frozen))
- {
- /* Save prepared freeze plan for later */
- frozen[tuples_frozen++].offset = offnum;
- }
-
- /*
- * If any tuple isn't either totally frozen already or eligible to
- * become totally frozen (according to its freeze plan), then the page
- * definitely cannot be set all-frozen in the visibility map later on
- */
- if (!totally_frozen)
- all_frozen = false;
}
- /*
- * We have now divided every item on the page into either an LP_DEAD item
- * that will need to be vacuumed in indexes later, or a LP_NORMAL tuple
- * that remains and needs to be considered for freezing now (LP_UNUSED and
- * LP_REDIRECT items also remain, but are of no further interest to us).
- */
vacrel->offnum = InvalidOffsetNumber;
/*
@@ -1618,8 +1580,8 @@ lazy_scan_prune(LVRelState *vacrel,
* freeze when pruning generated an FPI, if doing so means that we set the
* page all-frozen afterwards (might not happen until final heap pass).
*/
- if (pagefrz.freeze_required || tuples_frozen == 0 ||
- (presult.all_visible_except_removable && all_frozen &&
+ if (pagefrz.freeze_required || presult.nfrozen == 0 ||
+ (presult.all_visible_except_removable && presult.all_frozen &&
fpi_before != pgWalUsage.wal_fpi))
{
/*
@@ -1629,7 +1591,7 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->NewRelfrozenXid = pagefrz.FreezePageRelfrozenXid;
vacrel->NewRelminMxid = pagefrz.FreezePageRelminMxid;
- if (tuples_frozen == 0)
+ if (presult.nfrozen == 0)
{
/*
* We have no freeze plans to execute, so there's no added cost
@@ -1657,7 +1619,7 @@ lazy_scan_prune(LVRelState *vacrel,
* once we're done with it. Otherwise we generate a conservative
* cutoff by stepping back from OldestXmin.
*/
- if (presult.all_visible_except_removable && all_frozen)
+ if (presult.all_visible_except_removable && presult.all_frozen)
{
/* Using same cutoff when setting VM is now unnecessary */
snapshotConflictHorizon = presult.frz_conflict_horizon;
@@ -1673,7 +1635,7 @@ lazy_scan_prune(LVRelState *vacrel,
/* Execute all freeze plans for page as a single atomic action */
heap_freeze_execute_prepared(vacrel->rel, buf,
snapshotConflictHorizon,
- frozen, tuples_frozen);
+ presult.frozen, presult.nfrozen);
}
}
else
@@ -1684,8 +1646,8 @@ lazy_scan_prune(LVRelState *vacrel,
*/
vacrel->NewRelfrozenXid = pagefrz.NoFreezePageRelfrozenXid;
vacrel->NewRelminMxid = pagefrz.NoFreezePageRelminMxid;
- all_frozen = false;
- tuples_frozen = 0; /* avoid miscounts in instrumentation */
+ presult.all_frozen = false;
+ presult.nfrozen = 0; /* avoid miscounts in instrumentation */
}
/*
@@ -1708,6 +1670,8 @@ lazy_scan_prune(LVRelState *vacrel,
&debug_cutoff, &debug_all_frozen))
Assert(false);
+ Assert(presult.all_frozen == debug_all_frozen);
+
Assert(!TransactionIdIsValid(debug_cutoff) ||
debug_cutoff == presult.frz_conflict_horizon);
}
@@ -1738,7 +1702,7 @@ lazy_scan_prune(LVRelState *vacrel,
/* Finally, add page-local counts to whole-VACUUM counts */
vacrel->tuples_deleted += presult.ndeleted;
- vacrel->tuples_frozen += tuples_frozen;
+ vacrel->tuples_frozen += presult.nfrozen;
vacrel->lpdead_items += lpdead_items;
vacrel->live_tuples += live_tuples;
vacrel->recently_dead_tuples += recently_dead_tuples;
@@ -1761,7 +1725,7 @@ lazy_scan_prune(LVRelState *vacrel,
{
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
- if (all_frozen)
+ if (presult.all_frozen)
{
Assert(!TransactionIdIsValid(presult.frz_conflict_horizon));
flags |= VISIBILITYMAP_ALL_FROZEN;
@@ -1832,7 +1796,7 @@ lazy_scan_prune(LVRelState *vacrel,
* true, so we must check both all_visible and all_frozen.
*/
else if (all_visible_according_to_vm && presult.all_visible &&
- all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
+ presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
{
/*
* Avoid relying on all_visible_according_to_vm as a proxy for the
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index fd28d6bca8a..08debc09cb5 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -201,6 +201,11 @@ typedef struct PruneResult
int nnewlpdead; /* Number of newly LP_DEAD items */
bool all_visible; /* Whether or not the page is all visible */
bool all_visible_except_removable;
+ /* Whether or not the page can be set all frozen in the VM */
+ bool all_frozen;
+
+ /* Number of newly frozen tuples */
+ int nfrozen;
TransactionId frz_conflict_horizon; /* Newest xmin on the page */
/*
@@ -213,6 +218,12 @@ typedef struct PruneResult
* 1. Otherwise every access would need to subtract 1.
*/
int8 htsv[MaxHeapTuplesPerPage + 1];
+
+
+ /*
+ * One entry for every tuple that we may freeze.
+ */
+ HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
} PruneResult;
/*
@@ -324,6 +335,7 @@ extern void heap_page_prune_opt(Relation relation, Buffer buffer);
extern void heap_page_prune(Relation relation, Buffer buffer,
struct GlobalVisState *vistest,
bool mark_unused_now,
+ HeapPageFreeze *pagefrz,
PruneResult *presult,
OffsetNumber *off_loc);
extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
--
2.39.2
Attachment: v5-0015-lazy_scan_prune-reorder-freeze-execution-logic.patch (text/x-patch)
From 8236bbd369c39b2ef4db623a1f366a89ebe834de Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 19 Mar 2024 19:30:16 -0400
Subject: [PATCH v5 15/26] lazy_scan_prune: reorder freeze execution logic
To combine the prune and freeze records, freezing must be done before a
pruning WAL record is emitted. We will move the freeze execution into
heap_page_prune() in future commits. lazy_scan_prune() currently
executes freezing, updates vacrel->NewRelfrozenXid and
vacrel->NewRelminMxid, and resets the snapshotConflictHorizon that the
visibility map update record may use, all in the same block of if
statements. This commit starts reordering that logic so that the freeze
execution can be separated from the other updates, which should not be
done during pruning.
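(For illustration only: the shape of the reordered logic, condensed from the diff below.)

    do_freeze = pagefrz.freeze_required ||
        (presult.all_visible_except_removable && presult.all_frozen &&
         presult.nfrozen > 0 && fpi_before != pgWalUsage.wal_fpi);

    if (do_freeze)
    {
        /* choose snapshotConflictHorizon, then execute the freeze plans */
    }
    else if (presult.all_frozen && presult.nfrozen == 0)
    {
        /* nothing to freeze, but the page may still be set all-frozen in the VM */
    }
    else
    {
        /* "no freeze" path: fall back to the NoFreezePage* trackers */
    }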
---
src/backend/access/heap/vacuumlazy.c | 92 +++++++++++++++-------------
1 file changed, 49 insertions(+), 43 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 27b57d68dae..d9be88aceea 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1421,6 +1421,7 @@ lazy_scan_prune(LVRelState *vacrel,
recently_dead_tuples;
HeapPageFreeze pagefrz;
bool hastup = false;
+ bool do_freeze;
int64 fpi_before = pgWalUsage.wal_fpi;
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
@@ -1580,10 +1581,15 @@ lazy_scan_prune(LVRelState *vacrel,
* freeze when pruning generated an FPI, if doing so means that we set the
* page all-frozen afterwards (might not happen until final heap pass).
*/
- if (pagefrz.freeze_required || presult.nfrozen == 0 ||
+ do_freeze = pagefrz.freeze_required ||
(presult.all_visible_except_removable && presult.all_frozen &&
- fpi_before != pgWalUsage.wal_fpi))
+ presult.nfrozen > 0 &&
+ fpi_before != pgWalUsage.wal_fpi);
+
+ if (do_freeze)
{
+ TransactionId snapshotConflictHorizon;
+
/*
* We're freezing the page. Our final NewRelfrozenXid doesn't need to
* be affected by the XIDs that are just about to be frozen anyway.
@@ -1591,52 +1597,52 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->NewRelfrozenXid = pagefrz.FreezePageRelfrozenXid;
vacrel->NewRelminMxid = pagefrz.FreezePageRelminMxid;
- if (presult.nfrozen == 0)
- {
- /*
- * We have no freeze plans to execute, so there's no added cost
- * from following the freeze path. That's why it was chosen. This
- * is important in the case where the page only contains totally
- * frozen tuples at this point (perhaps only following pruning).
- * Such pages can be marked all-frozen in the VM by our caller,
- * even though none of its tuples were newly frozen here (note
- * that the "no freeze" path never sets pages all-frozen).
- *
- * We never increment the frozen_pages instrumentation counter
- * here, since it only counts pages with newly frozen tuples
- * (don't confuse that with pages newly set all-frozen in VM).
- */
- }
+ vacrel->frozen_pages++;
+
+ /*
+ * We can use frz_conflict_horizon as our cutoff for conflicts when
+ * the whole page is eligible to become all-frozen in the VM once
+ * we're done with it. Otherwise we generate a conservative cutoff by
+ * stepping back from OldestXmin.
+ */
+ if (presult.all_visible_except_removable && presult.all_frozen)
+ snapshotConflictHorizon = presult.frz_conflict_horizon;
else
{
- TransactionId snapshotConflictHorizon;
+ /* Avoids false conflicts when hot_standby_feedback in use */
+ snapshotConflictHorizon = pagefrz.cutoffs->OldestXmin;
+ TransactionIdRetreat(snapshotConflictHorizon);
+ }
- vacrel->frozen_pages++;
+ /* Using same cutoff when setting VM is now unnecessary */
+ if (presult.all_visible_except_removable && presult.all_frozen)
+ presult.frz_conflict_horizon = InvalidTransactionId;
- /*
- * We can use frz_conflict_horizon as our cutoff for conflicts
- * when the whole page is eligible to become all-frozen in the VM
- * once we're done with it. Otherwise we generate a conservative
- * cutoff by stepping back from OldestXmin.
- */
- if (presult.all_visible_except_removable && presult.all_frozen)
- {
- /* Using same cutoff when setting VM is now unnecessary */
- snapshotConflictHorizon = presult.frz_conflict_horizon;
- presult.frz_conflict_horizon = InvalidTransactionId;
- }
- else
- {
- /* Avoids false conflicts when hot_standby_feedback in use */
- snapshotConflictHorizon = vacrel->cutoffs.OldestXmin;
- TransactionIdRetreat(snapshotConflictHorizon);
- }
+ /* Execute all freeze plans for page as a single atomic action */
+ heap_freeze_execute_prepared(vacrel->rel, buf,
+ snapshotConflictHorizon,
+ presult.frozen, presult.nfrozen);
+ }
+ else if (presult.all_frozen && presult.nfrozen == 0)
+ {
+ /* Page should be all visible except to-be-removed tuples */
+ Assert(presult.all_visible_except_removable);
- /* Execute all freeze plans for page as a single atomic action */
- heap_freeze_execute_prepared(vacrel->rel, buf,
- snapshotConflictHorizon,
- presult.frozen, presult.nfrozen);
- }
+ /*
+ * We have no freeze plans to execute, so there's no added cost from
+ * following the freeze path. That's why it was chosen. This is
+ * important in the case where the page only contains totally frozen
+ * tuples at this point (perhaps only following pruning). Such pages
+ * can be marked all-frozen in the VM by our caller, even though none
+ * of its tuples were newly frozen here (note that the "no freeze"
+ * path never sets pages all-frozen).
+ *
+ * We never increment the frozen_pages instrumentation counter here,
+ * since it only counts pages with newly frozen tuples (don't confuse
+ * that with pages newly set all-frozen in VM).
+ */
+ vacrel->NewRelfrozenXid = pagefrz.FreezePageRelfrozenXid;
+ vacrel->NewRelminMxid = pagefrz.FreezePageRelminMxid;
}
else
{
--
2.39.2
Attachment: v5-0016-Execute-freezing-in-heap_page_prune.patch (text/x-patch)
From c54416b87e983467049ec0b77740b3dd375f75b2 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Fri, 8 Mar 2024 16:45:57 -0500
Subject: [PATCH v5 16/26] Execute freezing in heap_page_prune()
As a step toward combining the prune and freeze WAL records, execute
freezing in heap_page_prune(), which is renamed to
heap_page_prune_and_freeze(). The logic to determine whether or not to
execute freeze plans was moved from lazy_scan_prune() over to the
renamed function with little modification.
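(For orientation, a sketch of the resulting interface; see the diff below for the actual declaration. Passing pagefrz = NULL, as on-access pruning does, skips all freezing work.)

    void
    heap_page_prune_and_freeze(Relation relation, Buffer buffer,
                               GlobalVisState *vistest,
                               bool mark_unused_now,
                               HeapPageFreeze *pagefrz,    /* NULL for on-access pruning */
                               PruneFreezeResult *presult,
                               OffsetNumber *off_loc);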
---
src/backend/access/heap/heapam_handler.c | 2 +-
src/backend/access/heap/pruneheap.c | 139 ++++++++++++++++++-----
src/backend/access/heap/vacuumlazy.c | 134 ++++++----------------
src/backend/storage/ipc/procarray.c | 6 +-
src/include/access/heapam.h | 39 ++++---
src/tools/pgindent/typedefs.list | 2 +-
6 files changed, 174 insertions(+), 148 deletions(-)
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 680a50bf8b1..5e522f5b0ba 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -1046,7 +1046,7 @@ heapam_scan_analyze_next_tuple(TableScanDesc scan, TransactionId OldestXmin,
* We ignore unused and redirect line pointers. DEAD line pointers
* should be counted as dead, because we need vacuum to run to get rid
* of them. Note that this rule agrees with the way that
- * heap_page_prune() counts things.
+ * heap_page_prune_and_freeze() counts things.
*/
if (!ItemIdIsNormal(itemid))
{
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 94c5c7b80f0..e56ae0f7296 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -17,16 +17,19 @@
#include "access/heapam.h"
#include "access/heapam_xlog.h"
#include "access/htup_details.h"
+#include "access/multixact.h"
#include "access/transam.h"
#include "access/xlog.h"
+#include "commands/vacuum.h"
#include "access/xloginsert.h"
+#include "executor/instrument.h"
#include "miscadmin.h"
#include "pgstat.h"
#include "storage/bufmgr.h"
#include "utils/rel.h"
#include "utils/snapmgr.h"
-/* Working data for heap_page_prune and subroutines */
+/* Working data for heap_page_prune_and_freeze() and subroutines */
typedef struct
{
/* tuple visibility test, initialized for the relation */
@@ -51,6 +54,11 @@ typedef struct
* 1. Otherwise every access would need to subtract 1.
*/
bool marked[MaxHeapTuplesPerPage + 1];
+
+ /*
+ * One entry for every tuple that we may freeze.
+ */
+ HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
} PruneState;
/* Local functions */
@@ -59,14 +67,15 @@ static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
Buffer buffer);
static int heap_prune_chain(Buffer buffer,
OffsetNumber rootoffnum,
- PruneState *prstate, PruneResult *presult);
+ PruneState *prstate, PruneFreezeResult *presult);
+
static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
static void heap_prune_record_redirect(PruneState *prstate,
OffsetNumber offnum, OffsetNumber rdoffnum);
static void heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
- PruneResult *presult);
+ PruneFreezeResult *presult);
static void heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
- PruneResult *presult);
+ PruneFreezeResult *presult);
static void heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum);
static void page_verify_redirects(Page page);
@@ -146,15 +155,15 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
*/
if (PageIsFull(page) || PageGetHeapFreeSpace(page) < minfree)
{
- PruneResult presult;
+ PruneFreezeResult presult;
/*
* For now, pass mark_unused_now as false regardless of whether or
* not the relation has indexes, since we cannot safely determine
* that during on-access pruning with the current implementation.
*/
- heap_page_prune(relation, buffer, vistest, false, NULL,
- &presult, NULL);
+ heap_page_prune_and_freeze(relation, buffer, vistest, false, NULL,
+ &presult, NULL);
/*
* Report the number of tuples reclaimed to pgstats. This is
@@ -188,7 +197,12 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
/*
- * Prune and repair fragmentation in the specified page.
+ * Prune and repair fragmentation and potentially freeze tuples on the
+ * specified page.
+ *
+ * If the page can be marked all-frozen in the visibility map, we may
+ * opportunistically freeze tuples on the page if either its tuples are old
+ * enough or freezing will be cheap enough.
*
* Caller must have pin and buffer cleanup lock on the page. Note that we
* don't update the FSM information for page on caller's behalf. Caller might
@@ -201,23 +215,24 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* mark_unused_now indicates whether or not dead items can be set LP_UNUSED
* during pruning.
*
- * pagefrz contains both input and output parameters used if the caller is
- * interested in potentially freezing tuples on the page.
+ * pagefrz is an input parameter containing visibility cutoff information and
+ * the current relfrozenxid and relminmxids used if the caller is interested in
+ * freezing tuples on the page.
*
* presult contains output parameters needed by callers such as the number of
* tuples removed and the number of line pointers newly marked LP_DEAD.
- * heap_page_prune() is responsible for initializing it.
+ * heap_page_prune_and_freeze() is responsible for initializing it.
*
* off_loc is the offset location required by the caller to use in error
* callback.
*/
void
-heap_page_prune(Relation relation, Buffer buffer,
- GlobalVisState *vistest,
- bool mark_unused_now,
- HeapPageFreeze *pagefrz,
- PruneResult *presult,
- OffsetNumber *off_loc)
+heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ GlobalVisState *vistest,
+ bool mark_unused_now,
+ HeapPageFreeze *pagefrz,
+ PruneFreezeResult *presult,
+ OffsetNumber *off_loc)
{
Page page = BufferGetPage(buffer);
BlockNumber blockno = BufferGetBlockNumber(buffer);
@@ -225,6 +240,8 @@ heap_page_prune(Relation relation, Buffer buffer,
maxoff;
PruneState prstate;
HeapTupleData tup;
+ bool do_freeze;
+ int64 fpi_before = pgWalUsage.wal_fpi;
/*
* Our strategy is to scan the page and make lists of items to change,
@@ -264,6 +281,10 @@ heap_page_prune(Relation relation, Buffer buffer,
/* for recovery conflicts */
presult->frz_conflict_horizon = InvalidTransactionId;
+ /* For advancing relfrozenxid and relminmxid */
+ presult->new_relfrozenxid = InvalidTransactionId;
+ presult->new_relminmxid = InvalidMultiXactId;
+
maxoff = PageGetMaxOffsetNumber(page);
tup.t_tableOid = RelationGetRelid(relation);
@@ -399,11 +420,11 @@ heap_page_prune(Relation relation, Buffer buffer,
/* Tuple with storage -- consider need to freeze */
if ((heap_prepare_freeze_tuple(htup, pagefrz,
- &presult->frozen[presult->nfrozen],
+ &prstate.frozen[presult->nfrozen],
&totally_frozen)))
{
/* Save prepared freeze plan for later */
- presult->frozen[presult->nfrozen++].offset = offnum;
+ prstate.frozen[presult->nfrozen++].offset = offnum;
}
/*
@@ -529,6 +550,72 @@ heap_page_prune(Relation relation, Buffer buffer,
/* Record number of newly-set-LP_DEAD items for caller */
presult->nnewlpdead = prstate.ndead;
+
+ /*
+ * Freeze the page when heap_prepare_freeze_tuple indicates that at least
+ * one XID/MXID from before FreezeLimit/MultiXactCutoff is present. Also
+ * freeze when pruning generated an FPI, if doing so means that we set the
+ * page all-frozen afterwards (might not happen until final heap pass).
+ */
+ if (pagefrz)
+ do_freeze = pagefrz->freeze_required ||
+ (presult->all_visible_except_removable && presult->all_frozen &&
+ presult->nfrozen > 0 &&
+ fpi_before != pgWalUsage.wal_fpi);
+ else
+ do_freeze = false;
+
+ if (do_freeze)
+ {
+ /*
+ * We can use frz_conflict_horizon as our cutoff for conflicts when
+ * the whole page is eligible to become all-frozen in the VM once
+ * we're done with it. Otherwise we generate a conservative cutoff by
+ * stepping back from OldestXmin.
+ */
+ if (!(presult->all_visible_except_removable && presult->all_frozen))
+ {
+ /* Avoids false conflicts when hot_standby_feedback in use */
+ presult->frz_conflict_horizon = pagefrz->cutoffs->OldestXmin;
+ TransactionIdRetreat(presult->frz_conflict_horizon);
+ }
+
+ /* Execute all freeze plans for page as a single atomic action */
+ heap_freeze_execute_prepared(relation, buffer,
+ presult->frz_conflict_horizon,
+ prstate.frozen, presult->nfrozen);
+ }
+ else if (!pagefrz || !presult->all_frozen || presult->nfrozen > 0)
+ {
+ /*
+ * If we will neither freeze tuples on the page nor set the page all
+ * frozen in the visibility map, the page is not all frozen and there
+ * will be no newly frozen tuples.
+ */
+ presult->all_frozen = false;
+ presult->nfrozen = 0; /* avoid miscounts in instrumentation */
+ }
+
+ /* Caller won't update new_relfrozenxid and new_relminmxid */
+ if (!pagefrz)
+ return;
+
+ /*
+ * If we will freeze tuples on the page or, even if we don't freeze tuples
+ * on the page, if we will set the page all-frozen in the visibility map,
+ * we can advance relfrozenxid and relminmxid to the values in
+ * pagefrz->FreezePageRelfrozenXid and pagefrz->FreezePageRelminMxid.
+ */
+ if (presult->all_frozen || presult->nfrozen > 0)
+ {
+ presult->new_relfrozenxid = pagefrz->FreezePageRelfrozenXid;
+ presult->new_relminmxid = pagefrz->FreezePageRelminMxid;
+ }
+ else
+ {
+ presult->new_relfrozenxid = pagefrz->NoFreezePageRelfrozenXid;
+ presult->new_relminmxid = pagefrz->NoFreezePageRelminMxid;
+ }
}
@@ -586,7 +673,7 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
*/
static int
heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
- PruneState *prstate, PruneResult *presult)
+ PruneState *prstate, PruneFreezeResult *presult)
{
int ndeleted = 0;
Page dp = (Page) BufferGetPage(buffer);
@@ -851,10 +938,10 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
{
/*
* We found a redirect item that doesn't point to a valid follow-on
- * item. This can happen if the loop in heap_page_prune caused us to
- * visit the dead successor of a redirect item before visiting the
- * redirect item. We can clean up by setting the redirect item to
- * DEAD state or LP_UNUSED if the caller indicated.
+ * item. This can happen if the loop in heap_page_prune_and_freeze()
+ * caused us to visit the dead successor of a redirect item before
+ * visiting the redirect item. We can clean up by setting the
+ * redirect item to DEAD state or LP_UNUSED if the caller indicated.
*/
heap_prune_record_dead_or_unused(prstate, rootoffnum, presult);
}
@@ -894,7 +981,7 @@ heap_prune_record_redirect(PruneState *prstate,
/* Record line pointer to be marked dead */
static void
heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
- PruneResult *presult)
+ PruneFreezeResult *presult)
{
Assert(prstate->ndead < MaxHeapTuplesPerPage);
prstate->nowdead[prstate->ndead] = offnum;
@@ -917,7 +1004,7 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
*/
static void
heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
- PruneResult *presult)
+ PruneFreezeResult *presult)
{
/*
* If the caller set mark_unused_now to true, we can remove dead tuples
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index d9be88aceea..fe7751493e2 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -430,12 +430,12 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
* as an upper bound on the XIDs stored in the pages we'll actually scan
* (NewRelfrozenXid tracking must never be allowed to miss unfrozen XIDs).
*
- * Next acquire vistest, a related cutoff that's used in heap_page_prune.
- * We expect vistest will always make heap_page_prune remove any deleted
- * tuple whose xmax is < OldestXmin. lazy_scan_prune must never become
- * confused about whether a tuple should be frozen or removed. (In the
- * future we might want to teach lazy_scan_prune to recompute vistest from
- * time to time, to increase the number of dead tuples it can prune away.)
+ * Next acquire vistest, a related cutoff that's used in
+ * heap_page_prune_and_freeze(). We expect vistest will always make
+ * heap_page_prune_and_freeze() remove any deleted tuple whose xmax is <
+ * OldestXmin. (In the future we might want to teach lazy_scan_prune to
+ * recompute vistest from time to time, to increase the number of dead
+ * tuples it can prune away.)
*/
vacrel->aggressive = vacuum_get_cutoffs(rel, params, &vacrel->cutoffs);
vacrel->rel_pages = orig_rel_pages = RelationGetNumberOfBlocks(rel);
@@ -1378,21 +1378,21 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
*
* Caller must hold pin and buffer cleanup lock on the buffer.
*
- * Prior to PostgreSQL 14 there were very rare cases where heap_page_prune()
- * was allowed to disagree with our HeapTupleSatisfiesVacuum() call about
- * whether or not a tuple should be considered DEAD. This happened when an
- * inserting transaction concurrently aborted (after our heap_page_prune()
- * call, before our HeapTupleSatisfiesVacuum() call). There was rather a lot
- * of complexity just so we could deal with tuples that were DEAD to VACUUM,
- * but nevertheless were left with storage after pruning.
+ * Prior to PostgreSQL 14 there were very rare cases where
+ * heap_page_prune_and_freeze() was allowed to disagree with our
+ * HeapTupleSatisfiesVacuum() call about whether or not a tuple should be
+ * considered DEAD. This happened when an inserting transaction concurrently
+ * aborted (after our heap_page_prune_and_freeze() call, before our
+ * HeapTupleSatisfiesVacuum() call). There was rather a lot of complexity just
+ * so we could deal with tuples that were DEAD to VACUUM, but nevertheless were
+ * left with storage after pruning.
*
* As of Postgres 17, we circumvent this problem altogether by reusing the
- * result of heap_page_prune()'s visibility check. Without the second call to
- * HeapTupleSatisfiesVacuum(), there is no new HTSV_Result and there can be no
- * disagreement. We'll just handle such tuples as if they had become fully dead
- * right after this operation completes instead of in the middle of it. Note that
- * any tuple that becomes dead after the call to heap_page_prune() can't need to
- * be frozen, because it was visible to another session when vacuum started.
+ * result of heap_page_prune_and_freeze()'s visibility check. Without the
+ * second call to HeapTupleSatisfiesVacuum(), there is no new HTSV_Result and
+ * there can be no disagreement. We'll just handle such tuples as if they had
+ * become fully dead right after this operation completes instead of in the
+ * middle of it.
*
* vmbuffer is the buffer containing the VM block with visibility information
* for the heap block, blkno. all_visible_according_to_vm is the saved
@@ -1415,26 +1415,24 @@ lazy_scan_prune(LVRelState *vacrel,
OffsetNumber offnum,
maxoff;
ItemId itemid;
- PruneResult presult;
+ PruneFreezeResult presult;
int lpdead_items,
live_tuples,
recently_dead_tuples;
HeapPageFreeze pagefrz;
bool hastup = false;
- bool do_freeze;
- int64 fpi_before = pgWalUsage.wal_fpi;
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
Assert(BufferGetBlockNumber(buf) == blkno);
/*
* maxoff might be reduced following line pointer array truncation in
- * heap_page_prune. That's safe for us to ignore, since the reclaimed
- * space will continue to look like LP_UNUSED items below.
+ * heap_page_prune_and_freeze(). That's safe for us to ignore, since the
+ * reclaimed space will continue to look like LP_UNUSED items below.
*/
maxoff = PageGetMaxOffsetNumber(page);
- /* Initialize (or reset) page-level state */
+ /* Initialize pagefrz */
pagefrz.freeze_required = false;
pagefrz.FreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
pagefrz.FreezePageRelminMxid = vacrel->NewRelminMxid;
@@ -1446,7 +1444,7 @@ lazy_scan_prune(LVRelState *vacrel,
recently_dead_tuples = 0;
/*
- * Prune all HOT-update chains in this page.
+ * Prune all HOT-update chains and potentially freeze tuples on this page.
*
* We count the number of tuples removed from the page by the pruning step
* in presult.ndeleted. It should not be confused with lpdead_items;
@@ -1457,8 +1455,8 @@ lazy_scan_prune(LVRelState *vacrel,
* items LP_UNUSED, so mark_unused_now should be true if no indexes and
* false otherwise.
*/
- heap_page_prune(rel, buf, vacrel->vistest, vacrel->nindexes == 0,
- &pagefrz, &presult, &vacrel->offnum);
+ heap_page_prune_and_freeze(rel, buf, vacrel->vistest, vacrel->nindexes == 0,
+ &pagefrz, &presult, &vacrel->offnum);
/*
* Now scan the page to collect LP_DEAD items and check for tuples
@@ -1575,85 +1573,23 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->offnum = InvalidOffsetNumber;
- /*
- * Freeze the page when heap_prepare_freeze_tuple indicates that at least
- * one XID/MXID from before FreezeLimit/MultiXactCutoff is present. Also
- * freeze when pruning generated an FPI, if doing so means that we set the
- * page all-frozen afterwards (might not happen until final heap pass).
- */
- do_freeze = pagefrz.freeze_required ||
- (presult.all_visible_except_removable && presult.all_frozen &&
- presult.nfrozen > 0 &&
- fpi_before != pgWalUsage.wal_fpi);
+ Assert(MultiXactIdIsValid(presult.new_relminmxid));
+ vacrel->NewRelfrozenXid = presult.new_relfrozenxid;
+ Assert(TransactionIdIsValid(presult.new_relfrozenxid));
+ vacrel->NewRelminMxid = presult.new_relminmxid;
- if (do_freeze)
+ if (presult.nfrozen > 0)
{
- TransactionId snapshotConflictHorizon;
-
/*
- * We're freezing the page. Our final NewRelfrozenXid doesn't need to
- * be affected by the XIDs that are just about to be frozen anyway.
+ * We never increment the frozen_pages instrumentation counter when
+ * nfrozen == 0, since it only counts pages with newly frozen tuples
+ * (don't confuse that with pages newly set all-frozen in VM).
*/
- vacrel->NewRelfrozenXid = pagefrz.FreezePageRelfrozenXid;
- vacrel->NewRelminMxid = pagefrz.FreezePageRelminMxid;
-
vacrel->frozen_pages++;
- /*
- * We can use frz_conflict_horizon as our cutoff for conflicts when
- * the whole page is eligible to become all-frozen in the VM once
- * we're done with it. Otherwise we generate a conservative cutoff by
- * stepping back from OldestXmin.
- */
- if (presult.all_visible_except_removable && presult.all_frozen)
- snapshotConflictHorizon = presult.frz_conflict_horizon;
- else
- {
- /* Avoids false conflicts when hot_standby_feedback in use */
- snapshotConflictHorizon = pagefrz.cutoffs->OldestXmin;
- TransactionIdRetreat(snapshotConflictHorizon);
- }
-
/* Using same cutoff when setting VM is now unnecessary */
- if (presult.all_visible_except_removable && presult.all_frozen)
+ if (presult.all_frozen)
presult.frz_conflict_horizon = InvalidTransactionId;
-
- /* Execute all freeze plans for page as a single atomic action */
- heap_freeze_execute_prepared(vacrel->rel, buf,
- snapshotConflictHorizon,
- presult.frozen, presult.nfrozen);
- }
- else if (presult.all_frozen && presult.nfrozen == 0)
- {
- /* Page should be all visible except to-be-removed tuples */
- Assert(presult.all_visible_except_removable);
-
- /*
- * We have no freeze plans to execute, so there's no added cost from
- * following the freeze path. That's why it was chosen. This is
- * important in the case where the page only contains totally frozen
- * tuples at this point (perhaps only following pruning). Such pages
- * can be marked all-frozen in the VM by our caller, even though none
- * of its tuples were newly frozen here (note that the "no freeze"
- * path never sets pages all-frozen).
- *
- * We never increment the frozen_pages instrumentation counter here,
- * since it only counts pages with newly frozen tuples (don't confuse
- * that with pages newly set all-frozen in VM).
- */
- vacrel->NewRelfrozenXid = pagefrz.FreezePageRelfrozenXid;
- vacrel->NewRelminMxid = pagefrz.FreezePageRelminMxid;
- }
- else
- {
- /*
- * Page requires "no freeze" processing. It might be set all-visible
- * in the visibility map, but it can never be set all-frozen.
- */
- vacrel->NewRelfrozenXid = pagefrz.NoFreezePageRelfrozenXid;
- vacrel->NewRelminMxid = pagefrz.NoFreezePageRelminMxid;
- presult.all_frozen = false;
- presult.nfrozen = 0; /* avoid miscounts in instrumentation */
}
/*
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index b3cd248fb64..88a6d504dff 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -1715,9 +1715,9 @@ TransactionIdIsActive(TransactionId xid)
* Note: the approximate horizons (see definition of GlobalVisState) are
* updated by the computations done here. That's currently required for
* correctness and a small optimization. Without doing so it's possible that
- * heap vacuum's call to heap_page_prune() uses a more conservative horizon
- * than later when deciding which tuples can be removed - which the code
- * doesn't expect (breaking HOT).
+ * heap vacuum's call to heap_page_prune_and_freeze() uses a more conservative
+ * horizon than later when deciding which tuples can be removed - which the
+ * code doesn't expect (breaking HOT).
*/
static void
ComputeXidHorizons(ComputeXidHorizonsResult *h)
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 08debc09cb5..9b24dae6a9e 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -195,7 +195,7 @@ typedef struct HeapPageFreeze
/*
* Per-page state returned from pruning
*/
-typedef struct PruneResult
+typedef struct PruneFreezeResult
{
int ndeleted; /* Number of tuples deleted from the page */
int nnewlpdead; /* Number of newly LP_DEAD items */
@@ -210,9 +210,10 @@ typedef struct PruneResult
/*
* Tuple visibility is only computed once for each tuple, for correctness
- * and efficiency reasons; see comment in heap_page_prune() for details.
- * This is of type int8[], instead of HTSV_Result[], so we can use -1 to
- * indicate no visibility has been computed, e.g. for LP_DEAD items.
+ * and efficiency reasons; see comment in heap_page_prune_and_freeze() for
+ * details. This is of type int8[], instead of HTSV_Result[], so we can
+ * use -1 to indicate no visibility has been computed, e.g. for LP_DEAD
+ * items.
*
* This needs to be MaxHeapTuplesPerPage + 1 long as FirstOffsetNumber is
* 1. Otherwise every access would need to subtract 1.
@@ -220,17 +221,18 @@ typedef struct PruneResult
int8 htsv[MaxHeapTuplesPerPage + 1];
- /*
- * One entry for every tuple that we may freeze.
- */
- HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
-} PruneResult;
+ /* New value of relfrozenxid found by heap_page_prune_and_freeze() */
+ TransactionId new_relfrozenxid;
+
+ /* New value of relminmxid found by heap_page_prune_and_freeze() */
+ MultiXactId new_relminmxid;
+} PruneFreezeResult;
/*
* Pruning calculates tuple visibility once and saves the results in an array
- * of int8. See PruneResult.htsv for details. This helper function is meant to
- * guard against examining visibility status array members which have not yet
- * been computed.
+ * of int8. See PruneFreezeResult.htsv for details. This helper function is
+ * meant to guard against examining visibility status array members which have
+ * not yet been computed.
*/
static inline HTSV_Result
htsv_get_valid_status(int status)
@@ -306,6 +308,7 @@ extern TM_Result heap_lock_tuple(Relation relation, HeapTuple tuple,
Buffer *buffer, struct TM_FailureData *tmfd);
extern void heap_inplace_update(Relation relation, HeapTuple tuple);
+
extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
HeapPageFreeze *pagefrz,
HeapTupleFreeze *frz, bool *totally_frozen);
@@ -332,12 +335,12 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
/* in heap/pruneheap.c */
struct GlobalVisState;
extern void heap_page_prune_opt(Relation relation, Buffer buffer);
-extern void heap_page_prune(Relation relation, Buffer buffer,
- struct GlobalVisState *vistest,
- bool mark_unused_now,
- HeapPageFreeze *pagefrz,
- PruneResult *presult,
- OffsetNumber *off_loc);
+extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ struct GlobalVisState *vistest,
+ bool mark_unused_now,
+ HeapPageFreeze *pagefrz,
+ PruneFreezeResult *presult,
+ OffsetNumber *off_loc);
extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
OffsetNumber *redirected, int nredirected,
OffsetNumber *nowdead, int ndead,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 6a46b34c5ca..c28467919df 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2180,7 +2180,7 @@ ProjectionPath
PromptInterruptContext
ProtocolVersion
PrsStorage
-PruneResult
+PruneFreezeResult
PruneState
PruneStepResult
PsqlScanCallbacks
--
2.39.2
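
To make the new division of labor concrete, here is a minimal, standalone C
sketch (not part of the patch; TransactionId/MultiXactId and the struct and
function names below are simplified stand-ins) of how
heap_page_prune_and_freeze() now picks the relfrozenxid/relminmxid values
that lazy_scan_prune() simply copies into its trackers, following the logic
in the hunk above.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint32_t TransactionId;
typedef uint32_t MultiXactId;

typedef struct
{
    /* trackers maintained by heap_prepare_freeze_tuple() in the real code */
    TransactionId freeze_relfrozenxid;      /* safe if the page is/will be frozen */
    MultiXactId   freeze_relminmxid;
    TransactionId nofreeze_relfrozenxid;    /* conservative fallback */
    MultiXactId   nofreeze_relminmxid;
} PageFreezeTrackers;

static void
report_new_horizons(const PageFreezeTrackers *frz,
                    bool page_all_frozen, int nfrozen,
                    TransactionId *new_relfrozenxid,
                    MultiXactId *new_relminmxid)
{
    /*
     * If tuples were frozen, or the page can be set all-frozen in the VM
     * even without freezing anything, the caller may advance to the
     * "freeze" trackers; otherwise only the "no freeze" values are safe.
     */
    if (page_all_frozen || nfrozen > 0)
    {
        *new_relfrozenxid = frz->freeze_relfrozenxid;
        *new_relminmxid = frz->freeze_relminmxid;
    }
    else
    {
        *new_relfrozenxid = frz->nofreeze_relfrozenxid;
        *new_relminmxid = frz->nofreeze_relminmxid;
    }
}

int
main(void)
{
    PageFreezeTrackers frz = {1000, 10, 900, 9};
    TransactionId xid;
    MultiXactId mxid;

    report_new_horizons(&frz, true, 3, &xid, &mxid);
    printf("new relfrozenxid=%u relminmxid=%u\n", (unsigned) xid, (unsigned) mxid);
    return 0;
}
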
Attachment: v5-0017-Make-opp-freeze-heuristic-compatible-with-prune-f.patch (text/x-patch)
From ddf5e63917dec08908a76abc466f32f44cebfea2 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sun, 7 Jan 2024 16:11:35 -0500
Subject: [PATCH v5 17/26] Make opp freeze heuristic compatible with
prune+freeze record
Once the prune and freeze records are combined, we will no longer be
able to use a test of whether or not pruning emitted an FPI to decide
whether or not to opportunistically freeze a freezable page.
While this heuristic should be improved, for now, approximate the
previous logic by keeping track of whether or not a hint bit FPI was
emitted during visibility checks (when checksums are on) and combine
that with checking XLogCheckBufferNeedsBackup(). If we have just decided
whether or not to prune, and the buffer looks like it will need an FPI
once it is modified, it is likely that pruning would have emitted an FPI.
---
src/backend/access/heap/pruneheap.c | 57 +++++++++++++++++++++--------
1 file changed, 42 insertions(+), 15 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index e56ae0f7296..d919f243c03 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -241,6 +241,10 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
PruneState prstate;
HeapTupleData tup;
bool do_freeze;
+ bool do_prune;
+ bool whole_page_freezable;
+ bool hint_bit_fpi;
+ bool prune_fpi = false;
int64 fpi_before = pgWalUsage.wal_fpi;
/*
@@ -438,6 +442,13 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
}
}
+ /*
+ * If checksums are enabled, heap_prune_satisfies_vacuum() may have caused
+ * an FPI to be emitted. Then reset fpi_before for no prune case.
+ */
+ hint_bit_fpi = fpi_before != pgWalUsage.wal_fpi;
+ fpi_before = pgWalUsage.wal_fpi;
+
/*
* For vacuum, if the whole page will become frozen, we consider
* opportunistically freezing tuples. Dead tuples which will be removed by
@@ -482,11 +493,41 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (off_loc)
*off_loc = InvalidOffsetNumber;
+ do_prune = prstate.nredirected > 0 ||
+ prstate.ndead > 0 ||
+ prstate.nunused > 0;
+
+ /*
+ * Only incur overhead of checking if we will do an FPI if we might use
+ * the information.
+ */
+ if (do_prune && pagefrz)
+ prune_fpi = XLogCheckBufferNeedsBackup(buffer);
+
+ /* Is the whole page freezable? And is there something to freeze */
+ whole_page_freezable = presult->all_visible_except_removable &&
+ presult->all_frozen;
+
+ /*
+ * Freeze the page when heap_prepare_freeze_tuple indicates that at least
+ * one XID/MXID from before FreezeLimit/MultiXactCutoff is present. Also
+ * freeze when pruning generated an FPI, if doing so means that we set the
+ * page all-frozen afterwards (might not happen until final heap pass).
+ * XXX: Previously, we knew if pruning emitted an FPI by checking
+ * pgWalUsage.wal_fpi before and after pruning. Once the freeze and prune
+ * records are combined, this heuristic couldn't be used anymore. The
+ * opportunistic freeze heuristic must be improved; however, for now, try
+ * to approximate it.
+ */
+ do_freeze = pagefrz &&
+ (pagefrz->freeze_required ||
+ (whole_page_freezable && presult->nfrozen > 0 && (prune_fpi || hint_bit_fpi)));
+
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
/* Have we found any prunable items? */
- if (prstate.nredirected > 0 || prstate.ndead > 0 || prstate.nunused > 0)
+ if (do_prune)
{
/*
* Apply the planned item changes, then repair page fragmentation, and
@@ -551,20 +592,6 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/* Record number of newly-set-LP_DEAD items for caller */
presult->nnewlpdead = prstate.ndead;
- /*
- * Freeze the page when heap_prepare_freeze_tuple indicates that at least
- * one XID/MXID from before FreezeLimit/MultiXactCutoff is present. Also
- * freeze when pruning generated an FPI, if doing so means that we set the
- * page all-frozen afterwards (might not happen until final heap pass).
- */
- if (pagefrz)
- do_freeze = pagefrz->freeze_required ||
- (presult->all_visible_except_removable && presult->all_frozen &&
- presult->nfrozen > 0 &&
- fpi_before != pgWalUsage.wal_fpi);
- else
- do_freeze = false;
-
if (do_freeze)
{
/*
--
2.39.2
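
For readers skimming the diff, here is a minimal, standalone sketch of the
approximated heuristic introduced above. Every parameter is a plain value
standing in for state that heap_page_prune_and_freeze() tracks, and the
function assumes the caller supplied freeze cutoffs (pagefrz != NULL in the
real code); only the decision logic mirrors the patch.

#include <stdbool.h>
#include <stdio.h>

static bool
decide_opportunistic_freeze(bool freeze_required,       /* pagefrz->freeze_required */
                            bool whole_page_freezable,  /* all-visible-except-removable and all-frozen */
                            int nfrozen,                /* prepared freeze plans */
                            bool hint_bit_fpi,          /* hint-bit setting already emitted an FPI */
                            bool will_prune,            /* pruning will modify the page */
                            bool buffer_needs_backup)   /* XLogCheckBufferNeedsBackup() */
{
    /* Freezing is mandatory when the cutoffs require it. */
    if (freeze_required)
        return true;

    /*
     * Opportunistic case: freezing would leave the page all-frozen, and we
     * have either already paid for an FPI while setting hint bits or expect
     * pruning to force one anyway.
     */
    return whole_page_freezable && nfrozen > 0 &&
        (hint_bit_fpi || (will_prune && buffer_needs_backup));
}

int
main(void)
{
    printf("freeze? %d\n",
           decide_opportunistic_freeze(false, true, 2, false, true, true));
    return 0;
}
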
Attachment: v5-0018-Separate-tuple-pre-freeze-checks-and-invoke-earli.patch (text/x-patch)
From a5fe00defc0551f453d44ac50cf807dd667cbf7e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sun, 7 Jan 2024 16:53:45 -0500
Subject: [PATCH v5 18/26] Separate tuple pre freeze checks and invoke earlier
When combining the prune and freeze records, their critical sections will
have to be combined. heap_freeze_execute_prepared() does a set of
pre-freeze validations before starting its critical section. Move these
validations into a helper function, heap_pre_freeze_checks(), and invoke
it in heap_page_prune_and_freeze() before the pruning critical section.
Also move up the calculation of the freeze snapshot conflict horizon.
---
src/backend/access/heap/heapam.c | 58 ++++++++++++++++-------------
src/backend/access/heap/pruneheap.c | 31 ++++++++-------
src/include/access/heapam.h | 3 ++
3 files changed, 54 insertions(+), 38 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 8779fd04305..3c1be5e78c1 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6657,35 +6657,19 @@ heap_execute_freeze_tuple(HeapTupleHeader tuple, HeapTupleFreeze *frz)
}
/*
- * heap_freeze_execute_prepared
- *
- * Executes freezing of one or more heap tuples on a page on behalf of caller.
- * Caller passes an array of tuple plans from heap_prepare_freeze_tuple.
- * Caller must set 'offset' in each plan for us. Note that we destructively
- * sort caller's tuples array in-place, so caller had better be done with it.
- *
- * WAL-logs the changes so that VACUUM can advance the rel's relfrozenxid
- * later on without any risk of unsafe pg_xact lookups, even following a hard
- * crash (or when querying from a standby). We represent freezing by setting
- * infomask bits in tuple headers, but this shouldn't be thought of as a hint.
- * See section on buffer access rules in src/backend/storage/buffer/README.
- */
+* Perform xmin/xmax XID status sanity checks before calling
+* heap_freeze_execute_prepared().
+*
+* heap_prepare_freeze_tuple doesn't perform these checks directly because
+* pg_xact lookups are relatively expensive. They shouldn't be repeated
+* by successive VACUUMs that each decide against freezing the same page.
+*/
void
-heap_freeze_execute_prepared(Relation rel, Buffer buffer,
- TransactionId snapshotConflictHorizon,
- HeapTupleFreeze *tuples, int ntuples)
+heap_pre_freeze_checks(Buffer buffer,
+ HeapTupleFreeze *tuples, int ntuples)
{
Page page = BufferGetPage(buffer);
- Assert(ntuples > 0);
-
- /*
- * Perform xmin/xmax XID status sanity checks before critical section.
- *
- * heap_prepare_freeze_tuple doesn't perform these checks directly because
- * pg_xact lookups are relatively expensive. They shouldn't be repeated
- * by successive VACUUMs that each decide against freezing the same page.
- */
for (int i = 0; i < ntuples; i++)
{
HeapTupleFreeze *frz = tuples + i;
@@ -6724,6 +6708,30 @@ heap_freeze_execute_prepared(Relation rel, Buffer buffer,
xmax)));
}
}
+}
+
+/*
+ * heap_freeze_execute_prepared
+ *
+ * Executes freezing of one or more heap tuples on a page on behalf of caller.
+ * Caller passes an array of tuple plans from heap_prepare_freeze_tuple.
+ * Caller must set 'offset' in each plan for us. Note that we destructively
+ * sort caller's tuples array in-place, so caller had better be done with it.
+ *
+ * WAL-logs the changes so that VACUUM can advance the rel's relfrozenxid
+ * later on without any risk of unsafe pg_xact lookups, even following a hard
+ * crash (or when querying from a standby). We represent freezing by setting
+ * infomask bits in tuple headers, but this shouldn't be thought of as a hint.
+ * See section on buffer access rules in src/backend/storage/buffer/README.
+ */
+void
+heap_freeze_execute_prepared(Relation rel, Buffer buffer,
+ TransactionId snapshotConflictHorizon,
+ HeapTupleFreeze *tuples, int ntuples)
+{
+ Page page = BufferGetPage(buffer);
+
+ Assert(ntuples > 0);
START_CRIT_SECTION();
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index d919f243c03..81c926cc6e1 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -523,6 +523,24 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
(pagefrz->freeze_required ||
(whole_page_freezable && presult->nfrozen > 0 && (prune_fpi || hint_bit_fpi)));
+ if (do_freeze)
+ {
+ heap_pre_freeze_checks(buffer, prstate.frozen, presult->nfrozen);
+
+ /*
+ * We can use frz_conflict_horizon as our cutoff for conflicts when
+ * the whole page is eligible to become all-frozen in the VM once
+ * we're done with it. Otherwise we generate a conservative cutoff by
+ * stepping back from OldestXmin.
+ */
+ if (!(presult->all_visible_except_removable && presult->all_frozen))
+ {
+ /* Avoids false conflicts when hot_standby_feedback in use */
+ presult->frz_conflict_horizon = pagefrz->cutoffs->OldestXmin;
+ TransactionIdRetreat(presult->frz_conflict_horizon);
+ }
+ }
+
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
@@ -594,19 +612,6 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (do_freeze)
{
- /*
- * We can use frz_conflict_horizon as our cutoff for conflicts when
- * the whole page is eligible to become all-frozen in the VM once
- * we're done with it. Otherwise we generate a conservative cutoff by
- * stepping back from OldestXmin.
- */
- if (!(presult->all_visible_except_removable && presult->all_frozen))
- {
- /* Avoids false conflicts when hot_standby_feedback in use */
- presult->frz_conflict_horizon = pagefrz->cutoffs->OldestXmin;
- TransactionIdRetreat(presult->frz_conflict_horizon);
- }
-
/* Execute all freeze plans for page as a single atomic action */
heap_freeze_execute_prepared(relation, buffer,
presult->frz_conflict_horizon,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 9b24dae6a9e..340ac813a0f 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -312,6 +312,9 @@ extern void heap_inplace_update(Relation relation, HeapTuple tuple);
extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
HeapPageFreeze *pagefrz,
HeapTupleFreeze *frz, bool *totally_frozen);
+
+extern void heap_pre_freeze_checks(Buffer buffer,
+ HeapTupleFreeze *tuples, int ntuples);
extern void heap_freeze_execute_prepared(Relation rel, Buffer buffer,
TransactionId snapshotConflictHorizon,
HeapTupleFreeze *tuples, int ntuples);
--
2.39.2
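
As an illustration of what the split-out validation does, here is a minimal,
standalone sketch. The flag names, the error handling, and the
xid_did_commit()/xid_did_abort() lookups are simplified stand-ins for
HEAP_FREEZE_CHECK_XMIN_COMMITTED, HEAP_FREEZE_CHECK_XMAX_ABORTED, ereport(),
and the pg_xact lookups; the point is that these potentially-erroring checks
can now run before any critical section is entered.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef uint32_t TransactionId;

#define FREEZE_CHECK_XMIN_COMMITTED 0x01
#define FREEZE_CHECK_XMAX_ABORTED   0x02

typedef struct
{
    uint8_t       checkflags;   /* which status checks the plan requested */
    TransactionId xmin;
    TransactionId xmax;
} FreezePlan;

/* stand-ins for pg_xact lookups */
static bool xid_did_commit(TransactionId xid) { return xid % 2 == 0; }
static bool xid_did_abort(TransactionId xid)  { return xid % 2 != 0; }

/*
 * Validate every prepared freeze plan before entering a critical section:
 * these lookups may error out, which must not happen once the page is
 * already being modified.
 */
static void
pre_freeze_checks(const FreezePlan *plans, int nplans)
{
    for (int i = 0; i < nplans; i++)
    {
        const FreezePlan *frz = &plans[i];

        if ((frz->checkflags & FREEZE_CHECK_XMIN_COMMITTED) &&
            !xid_did_commit(frz->xmin))
        {
            fprintf(stderr, "uncommitted xmin %u needs to be frozen\n",
                    (unsigned) frz->xmin);
            exit(1);
        }
        if ((frz->checkflags & FREEZE_CHECK_XMAX_ABORTED) &&
            !xid_did_abort(frz->xmax))
        {
            fprintf(stderr, "cannot freeze non-aborted xmax %u\n",
                    (unsigned) frz->xmax);
            exit(1);
        }
    }
}

int
main(void)
{
    FreezePlan plans[] = {{FREEZE_CHECK_XMIN_COMMITTED, 100, 0}};

    pre_freeze_checks(plans, 1);
    printf("freeze plans validated; safe to enter critical section\n");
    return 0;
}
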
Attachment: v5-0019-Remove-heap_freeze_execute_prepared.patch (text/x-patch)
From 79596f7971533cb17dfde0751cdb1f479a448469 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sun, 7 Jan 2024 17:03:17 -0500
Subject: [PATCH v5 19/26] Remove heap_freeze_execute_prepared()
In order to merge freeze and prune records, the execution of tuple
freezing and the WAL logging of the changes to the page must be
separated so that the WAL logging can be combined with prune WAL
logging. This commit adds a helper for tuple freezing and then inlines
the contents of heap_freeze_execute_prepared() where it is called in
heap_page_prune_and_freeze().
---
src/backend/access/heap/heapam.c | 47 +++++++----------------------
src/backend/access/heap/pruneheap.c | 22 +++++++++++---
src/include/access/heapam.h | 28 +++++++++--------
3 files changed, 44 insertions(+), 53 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 3c1be5e78c1..c80655e2a53 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6340,9 +6340,9 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
* XIDs or MultiXactIds that will need to be processed by a future VACUUM.
*
* VACUUM caller must assemble HeapTupleFreeze freeze plan entries for every
- * tuple that we returned true for, and call heap_freeze_execute_prepared to
- * execute freezing. Caller must initialize pagefrz fields for page as a
- * whole before first call here for each heap page.
+ * tuple that we returned true for, and then execute freezing. Caller must
+ * initialize pagefrz fields for page as a whole before first call here for
+ * each heap page.
*
* VACUUM caller decides on whether or not to freeze the page as a whole.
* We'll often prepare freeze plans for a page that caller just discards.
@@ -6657,8 +6657,8 @@ heap_execute_freeze_tuple(HeapTupleHeader tuple, HeapTupleFreeze *frz)
}
/*
-* Perform xmin/xmax XID status sanity checks before calling
-* heap_freeze_execute_prepared().
+* Perform xmin/xmax XID status sanity checks before actually executing freeze
+* plans.
*
* heap_prepare_freeze_tuple doesn't perform these checks directly because
* pg_xact lookups are relatively expensive. They shouldn't be repeated
@@ -6711,30 +6711,17 @@ heap_pre_freeze_checks(Buffer buffer,
}
/*
- * heap_freeze_execute_prepared
- *
- * Executes freezing of one or more heap tuples on a page on behalf of caller.
- * Caller passes an array of tuple plans from heap_prepare_freeze_tuple.
- * Caller must set 'offset' in each plan for us. Note that we destructively
- * sort caller's tuples array in-place, so caller had better be done with it.
- *
- * WAL-logs the changes so that VACUUM can advance the rel's relfrozenxid
- * later on without any risk of unsafe pg_xact lookups, even following a hard
- * crash (or when querying from a standby). We represent freezing by setting
- * infomask bits in tuple headers, but this shouldn't be thought of as a hint.
- * See section on buffer access rules in src/backend/storage/buffer/README.
+ * Helper which executes freezing of one or more heap tuples on a page on
+ * behalf of caller. Caller passes an array of tuple plans from
+ * heap_prepare_freeze_tuple. Caller must set 'offset' in each plan for us.
+ * Must be called in a critical section that also marks the buffer dirty and,
+ * if needed, emits WAL.
*/
void
-heap_freeze_execute_prepared(Relation rel, Buffer buffer,
- TransactionId snapshotConflictHorizon,
- HeapTupleFreeze *tuples, int ntuples)
+heap_freeze_prepared_tuples(Buffer buffer, HeapTupleFreeze *tuples, int ntuples)
{
Page page = BufferGetPage(buffer);
- Assert(ntuples > 0);
-
- START_CRIT_SECTION();
-
for (int i = 0; i < ntuples; i++)
{
HeapTupleFreeze *frz = tuples + i;
@@ -6746,18 +6733,6 @@ heap_freeze_execute_prepared(Relation rel, Buffer buffer,
}
MarkBufferDirty(buffer);
-
- /* Now WAL-log freezing if necessary */
- if (RelationNeedsWAL(rel))
- {
- log_heap_prune_and_freeze(rel, buffer, snapshotConflictHorizon, false,
- tuples, ntuples,
- NULL, 0, /* redirected */
- NULL, 0, /* dead */
- NULL, 0); /* unused */
- }
-
- END_CRIT_SECTION();
}
/*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 81c926cc6e1..029d792ed49 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -612,10 +612,24 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (do_freeze)
{
- /* Execute all freeze plans for page as a single atomic action */
- heap_freeze_execute_prepared(relation, buffer,
- presult->frz_conflict_horizon,
- prstate.frozen, presult->nfrozen);
+ START_CRIT_SECTION();
+
+ Assert(presult->nfrozen > 0);
+
+ heap_freeze_prepared_tuples(buffer, prstate.frozen, presult->nfrozen);
+
+ MarkBufferDirty(buffer);
+
+ /* Now WAL-log freezing if necessary */
+ if (RelationNeedsWAL(relation))
+ log_heap_prune_and_freeze(relation, buffer,
+ presult->frz_conflict_horizon, false,
+ prstate.frozen, presult->nfrozen,
+ NULL, 0, /* redirected */
+ NULL, 0, /* dead */
+ NULL, 0); /* unused */
+
+ END_CRIT_SECTION();
}
else if (!pagefrz || !presult->all_frozen || presult->nfrozen > 0)
{
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 340ac813a0f..cfa4b07433b 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -14,6 +14,7 @@
#ifndef HEAPAM_H
#define HEAPAM_H
+#include "access/heapam_xlog.h"
#include "access/relation.h" /* for backward compatibility */
#include "access/relscan.h"
#include "access/sdir.h"
@@ -101,8 +102,8 @@ typedef enum
} HTSV_Result;
/*
- * heap_prepare_freeze_tuple may request that heap_freeze_execute_prepared
- * check any tuple's to-be-frozen xmin and/or xmax status using pg_xact
+ * heap_prepare_freeze_tuple may request that any tuple's to-be-frozen xmin
+ * and/or xmax status is checked using pg_xact during freezing execution.
*/
#define HEAP_FREEZE_CHECK_XMIN_COMMITTED 0x01
#define HEAP_FREEZE_CHECK_XMAX_ABORTED 0x02
@@ -154,14 +155,14 @@ typedef struct HeapPageFreeze
/*
* "Freeze" NewRelfrozenXid/NewRelminMxid trackers.
*
- * Trackers used when heap_freeze_execute_prepared freezes, or when there
- * are zero freeze plans for a page. It is always valid for vacuumlazy.c
- * to freeze any page, by definition. This even includes pages that have
- * no tuples with storage to consider in the first place. That way the
- * 'totally_frozen' results from heap_prepare_freeze_tuple can always be
- * used in the same way, even when no freeze plans need to be executed to
- * "freeze the page". Only the "freeze" path needs to consider the need
- * to set pages all-frozen in the visibility map under this scheme.
+ * Trackers used when tuples will be frozen, or when there are zero freeze
+ * plans for a page. It is always valid for vacuumlazy.c to freeze any
+ * page, by definition. This even includes pages that have no tuples with
+ * storage to consider in the first place. That way the 'totally_frozen'
+ * results from heap_prepare_freeze_tuple can always be used in the same
+ * way, even when no freeze plans need to be executed to "freeze the
+ * page". Only the "freeze" path needs to consider the need to set pages
+ * all-frozen in the visibility map under this scheme.
*
* When we freeze a page, we generally freeze all XIDs < OldestXmin, only
* leaving behind XIDs that are ineligible for freezing, if any. And so
@@ -315,12 +316,13 @@ extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
extern void heap_pre_freeze_checks(Buffer buffer,
HeapTupleFreeze *tuples, int ntuples);
-extern void heap_freeze_execute_prepared(Relation rel, Buffer buffer,
- TransactionId snapshotConflictHorizon,
- HeapTupleFreeze *tuples, int ntuples);
+
+extern void heap_freeze_prepared_tuples(Buffer buffer,
+ HeapTupleFreeze *tuples, int ntuples);
extern bool heap_freeze_tuple(HeapTupleHeader tuple,
TransactionId relfrozenxid, TransactionId relminmxid,
TransactionId FreezeLimit, TransactionId MultiXactCutoff);
+
extern bool heap_tuple_should_freeze(HeapTupleHeader tuple,
const struct VacuumCutoffs *cutoffs,
TransactionId *NoFreezePageRelfrozenXid,
--
2.39.2
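
Here is a minimal, standalone sketch (the stub functions stand in for the
real PostgreSQL routines) of the caller-side sequence the patch above
enables in heap_page_prune_and_freeze(): apply the prepared freeze plans,
mark the buffer dirty, and WAL-log the change, all inside one critical
section owned by the caller.

#include <stdbool.h>
#include <stdio.h>

static void start_crit_section(void)      { printf("START_CRIT_SECTION\n"); }
static void freeze_prepared_tuples(int n) { printf("apply %d freeze plans\n", n); }
static void mark_buffer_dirty(void)       { printf("MarkBufferDirty\n"); }
static void log_prune_and_freeze(void)    { printf("emit combined prune/freeze WAL record\n"); }
static void end_crit_section(void)        { printf("END_CRIT_SECTION\n"); }

static void
execute_freeze(bool needs_wal, int nfrozen)
{
    /* The caller, not the freeze helper, owns the critical section now. */
    start_crit_section();
    freeze_prepared_tuples(nfrozen);
    mark_buffer_dirty();
    if (needs_wal)
        log_prune_and_freeze();
    end_crit_section();
}

int
main(void)
{
    execute_freeze(true, 3);
    return 0;
}
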
Attachment: v5-0020-Merge-prune-and-freeze-records.patch (text/x-patch)
From 9a81ca1da2792a797bed71c1b1336b792fef7d0d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sun, 7 Jan 2024 17:55:31 -0500
Subject: [PATCH v5 20/26] Merge prune and freeze records
When both pruning and freezing are done, a single, combined WAL record
is emitted for both operations, reducing the number of WAL records
emitted.
When the record contains only tuples to freeze, we can avoid taking a
full cleanup lock when replaying it.
---
src/backend/access/heap/pruneheap.c | 230 +++++++++++++++-------------
1 file changed, 121 insertions(+), 109 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 029d792ed49..04e4a2fdeb4 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -242,9 +242,9 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
HeapTupleData tup;
bool do_freeze;
bool do_prune;
+ bool do_hint;
bool whole_page_freezable;
bool hint_bit_fpi;
- bool prune_fpi = false;
int64 fpi_before = pgWalUsage.wal_fpi;
/*
@@ -444,10 +444,9 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/*
* If checksums are enabled, heap_prune_satisfies_vacuum() may have caused
- * an FPI to be emitted. Then reset fpi_before for no prune case.
+ * an FPI to be emitted.
*/
hint_bit_fpi = fpi_before != pgWalUsage.wal_fpi;
- fpi_before = pgWalUsage.wal_fpi;
/*
* For vacuum, if the whole page will become frozen, we consider
@@ -497,14 +496,18 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.ndead > 0 ||
prstate.nunused > 0;
+ /* Record number of newly-set-LP_DEAD items for caller */
+ presult->nnewlpdead = prstate.ndead;
+
/*
- * Only incur overhead of checking if we will do an FPI if we might use
- * the information.
+ * Even if we don't prune anything, if we found a new value for the
+ * pd_prune_xid field or the page was marked full, we will update the hint
+ * bit.
*/
- if (do_prune && pagefrz)
- prune_fpi = XLogCheckBufferNeedsBackup(buffer);
+ do_hint = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
+ PageIsFull(page);
- /* Is the whole page freezable? And is there something to freeze */
+ /* Is the whole page freezable? And is there something to freeze? */
whole_page_freezable = presult->all_visible_except_removable &&
presult->all_frozen;
@@ -519,43 +522,51 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* opportunistic freeze heuristic must be improved; however, for now, try
* to approximate it.
*/
- do_freeze = pagefrz &&
- (pagefrz->freeze_required ||
- (whole_page_freezable && presult->nfrozen > 0 && (prune_fpi || hint_bit_fpi)));
+ do_freeze = false;
+ if (pagefrz)
+ {
+ if (pagefrz->freeze_required)
+ do_freeze = true;
+ else if (whole_page_freezable && presult->nfrozen > 0)
+ {
+ /*
+ * Freezing would make the page all-frozen. In this case, we will
+ * freeze if we have already emitted an FPI or will do so anyway.
+ * Be sure only to incur the overhead of checking if we will do an
+ * FPI if we may use that information.
+ */
+ if (hint_bit_fpi ||
+ ((do_prune || do_hint) && XLogCheckBufferNeedsBackup(buffer)))
+ {
+ do_freeze = true;
+ }
+ }
+ }
+ /*
+ * Validate the tuples we are considering freezing. We do this even if
+ * pruning and hint bit setting have not emitted an FPI so far because we
+ * still may emit an FPI while setting the page hint bit later. But we
+ * want to avoid doing the pre-freeze checks in a critical section.
+ */
if (do_freeze)
- {
heap_pre_freeze_checks(buffer, prstate.frozen, presult->nfrozen);
+ if (!do_freeze && (!pagefrz || !presult->all_frozen || presult->nfrozen > 0))
+ {
/*
- * We can use frz_conflict_horizon as our cutoff for conflicts when
- * the whole page is eligible to become all-frozen in the VM once
- * we're done with it. Otherwise we generate a conservative cutoff by
- * stepping back from OldestXmin.
+ * If we will neither freeze tuples on the page nor set the page all
+ * frozen in the visibility map, the page is not all-frozen and there
+ * will be no newly frozen tuples.
*/
- if (!(presult->all_visible_except_removable && presult->all_frozen))
- {
- /* Avoids false conflicts when hot_standby_feedback in use */
- presult->frz_conflict_horizon = pagefrz->cutoffs->OldestXmin;
- TransactionIdRetreat(presult->frz_conflict_horizon);
- }
+ presult->all_frozen = false;
+ presult->nfrozen = 0; /* avoid miscounts in instrumentation */
}
- /* Any error while applying the changes is critical */
START_CRIT_SECTION();
- /* Have we found any prunable items? */
- if (do_prune)
+ if (do_hint)
{
- /*
- * Apply the planned item changes, then repair page fragmentation, and
- * update the page's hint bit about whether it has free line pointers.
- */
- heap_page_prune_execute(buffer, false,
- prstate.redirected, prstate.nredirected,
- prstate.nowdead, prstate.ndead,
- prstate.nowunused, prstate.nunused);
-
/*
* Update the page's pd_prune_xid field to either zero, or the lowest
* XID of any soon-prunable tuple.
@@ -563,108 +574,109 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
((PageHeader) page)->pd_prune_xid = prstate.new_prune_xid;
/*
- * Also clear the "page is full" flag, since there's no point in
- * repeating the prune/defrag process until something else happens to
- * the page.
+ * Clear the "page is full" flag if it is set since there's no point
+ * in repeating the prune/defrag process until something else happens
+ * to the page.
*/
PageClearFull(page);
- MarkBufferDirty(buffer);
-
/*
- * Emit a WAL XLOG_HEAP2_PRUNE_FREEZE record showing what we did
+ * We only needed to update pd_prune_xid and clear the page-is-full
+ * hint bit, this is a non-WAL-logged hint. If we will also freeze or
+ * prune the page, we will mark the buffer dirty below.
*/
- if (RelationNeedsWAL(relation))
- {
- log_heap_prune_and_freeze(relation, buffer,
- prstate.snapshotConflictHorizon,
- false,
- NULL, 0,
- prstate.redirected, prstate.nredirected,
- prstate.nowdead, prstate.ndead,
- prstate.nowunused, prstate.nunused);
- }
+ if (!do_freeze && !do_prune)
+ MarkBufferDirtyHint(buffer, true);
}
- else
+
+ if (do_prune || do_freeze)
{
/*
- * If we didn't prune anything, but have found a new value for the
- * pd_prune_xid field, update it and mark the buffer dirty. This is
- * treated as a non-WAL-logged hint.
- *
- * Also clear the "page is full" flag if it is set, since there's no
- * point in repeating the prune/defrag process until something else
- * happens to the page.
+ * Apply the planned item changes, then repair page fragmentation, and
+ * update the page's hint bit about whether it has free line pointers.
*/
- if (((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
- PageIsFull(page))
+ if (do_prune)
{
- ((PageHeader) page)->pd_prune_xid = prstate.new_prune_xid;
- PageClearFull(page);
- MarkBufferDirtyHint(buffer, true);
+ heap_page_prune_execute(buffer, false,
+ prstate.redirected, prstate.nredirected,
+ prstate.nowdead, prstate.ndead,
+ prstate.nowunused, prstate.nunused);
}
- }
-
- END_CRIT_SECTION();
- /* Record number of newly-set-LP_DEAD items for caller */
- presult->nnewlpdead = prstate.ndead;
-
- if (do_freeze)
- {
- START_CRIT_SECTION();
-
- Assert(presult->nfrozen > 0);
-
- heap_freeze_prepared_tuples(buffer, prstate.frozen, presult->nfrozen);
+ if (do_freeze)
+ {
+ /*
+ * We can use frz_conflict_horizon as our cutoff for conflicts
+ * when the whole page is eligible to become all-frozen in the VM
+ * once we're done with it. Otherwise we generate a conservative
+ * cutoff by stepping back from OldestXmin. This avoids false
+ * conflicts when hot_standby_feedback is in use.
+ */
+ if (!(presult->all_visible_except_removable && presult->all_frozen))
+ {
+ presult->frz_conflict_horizon = pagefrz->cutoffs->OldestXmin;
+ TransactionIdRetreat(presult->frz_conflict_horizon);
+ }
+ heap_freeze_prepared_tuples(buffer, prstate.frozen, presult->nfrozen);
+ }
MarkBufferDirty(buffer);
- /* Now WAL-log freezing if necessary */
+ /*
+ * Emit a WAL XLOG_HEAP2_PRUNE_FREEZE record showing what we did
+ */
if (RelationNeedsWAL(relation))
+ {
+ /*
+ * The snapshotConflictHorizon for the whole record should be the most
+ * conservative of all the horizons calculated for any of the possible
+ * modifications. If this record will prune tuples, any transactions on
+ * the standby older than the youngest xmax of the most recently removed
+ * tuple this record will prune will conflict. If this record will freeze
+ * tuples, any transactions on the standby with xids older than the
+ * youngest tuple this record will freeze will conflict.
+ */
+ TransactionId conflict_xid;
+
+ if (do_freeze)
+ conflict_xid = Max(prstate.snapshotConflictHorizon,
+ presult->frz_conflict_horizon);
+ else
+ conflict_xid = prstate.snapshotConflictHorizon;
+
log_heap_prune_and_freeze(relation, buffer,
- presult->frz_conflict_horizon, false,
+ conflict_xid,
+ false,
prstate.frozen, presult->nfrozen,
- NULL, 0, /* redirected */
- NULL, 0, /* dead */
- NULL, 0); /* unused */
-
- END_CRIT_SECTION();
- }
- else if (!pagefrz || !presult->all_frozen || presult->nfrozen > 0)
- {
- /*
- * If we will neither freeze tuples on the page nor set the page all
- * frozen in the visibility map, the page is not all frozen and there
- * will be no newly frozen tuples.
- */
- presult->all_frozen = false;
- presult->nfrozen = 0; /* avoid miscounts in instrumentation */
+ prstate.redirected, prstate.nredirected,
+ prstate.nowdead, prstate.ndead,
+ prstate.nowunused, prstate.nunused);
+ }
}
- /* Caller won't update new_relfrozenxid and new_relminmxid */
- if (!pagefrz)
- return;
+ END_CRIT_SECTION();
/*
- * If we will freeze tuples on the page or, even if we don't freeze tuples
- * on the page, if we will set the page all-frozen in the visibility map,
- * we can advance relfrozenxid and relminmxid to the values in
- * pagefrz->FreezePageRelfrozenXid and pagefrz->FreezePageRelminMxid.
+ * If we froze tuples on the page, the caller can advance relfrozenxid and
+ * relminmxid to the values in pagefrz->FreezePageRelfrozenXid and
+ * pagefrz->FreezePageRelminMxid. Otherwise, it is only safe to advance to
+ * the values in pagefrz->NoFreezePage[RelfrozenXid|RelminMxid]
*/
- if (presult->all_frozen || presult->nfrozen > 0)
- {
- presult->new_relfrozenxid = pagefrz->FreezePageRelfrozenXid;
- presult->new_relminmxid = pagefrz->FreezePageRelminMxid;
- }
- else
+ if (pagefrz)
{
- presult->new_relfrozenxid = pagefrz->NoFreezePageRelfrozenXid;
- presult->new_relminmxid = pagefrz->NoFreezePageRelminMxid;
+ if (presult->nfrozen > 0)
+ {
+ presult->new_relfrozenxid = pagefrz->FreezePageRelfrozenXid;
+ presult->new_relminmxid = pagefrz->FreezePageRelminMxid;
+ }
+ else
+ {
+ presult->new_relfrozenxid = pagefrz->NoFreezePageRelfrozenXid;
+ presult->new_relminmxid = pagefrz->NoFreezePageRelminMxid;
+ }
}
}
-
/*
* Perform visibility checks for heap pruning.
*/
--
2.39.2
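
To spell out the conflict-horizon handling in the merged record, here is a
minimal, standalone sketch. TransactionId and the Max() macro are simplified
stand-ins (the sketch ignores wraparound-aware XID comparison); the selection
logic follows the comment and code in the hunk above.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint32_t TransactionId;

#define Max(a, b) ((a) > (b) ? (a) : (b))

static TransactionId
combined_conflict_horizon(bool do_freeze,
                          TransactionId prune_conflict_horizon,
                          TransactionId freeze_conflict_horizon)
{
    /*
     * The record's horizon must be the most conservative of the horizons
     * required by the modifications it contains: pruning conflicts with
     * standby transactions older than the youngest removed xmax, and
     * freezing conflicts with those older than the youngest frozen xid.
     */
    if (do_freeze)
        return Max(prune_conflict_horizon, freeze_conflict_horizon);
    return prune_conflict_horizon;
}

int
main(void)
{
    printf("conflict horizon: %u\n",
           (unsigned) combined_conflict_horizon(true, 950, 1000));
    return 0;
}
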
Attachment: v5-0021-move-whole_page_freezable-to-tighter-scope.patch (text/x-patch)
From 913617ed98cfddd678a6f620db7dee68d1d61c89 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed, 20 Mar 2024 10:51:13 +0200
Subject: [PATCH v5 21/26] move whole_page_freezable to tighter scope
---
src/backend/access/heap/pruneheap.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 04e4a2fdeb4..3821f489aad 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -243,7 +243,6 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
bool do_freeze;
bool do_prune;
bool do_hint;
- bool whole_page_freezable;
bool hint_bit_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
@@ -507,10 +506,6 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
do_hint = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
PageIsFull(page);
- /* Is the whole page freezable? And is there something to freeze? */
- whole_page_freezable = presult->all_visible_except_removable &&
- presult->all_frozen;
-
/*
* Freeze the page when heap_prepare_freeze_tuple indicates that at least
* one XID/MXID from before FreezeLimit/MultiXactCutoff is present. Also
@@ -525,6 +520,12 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
do_freeze = false;
if (pagefrz)
{
+ bool whole_page_freezable;
+
+ /* Is the whole page freezable? And is there something to freeze? */
+ whole_page_freezable = presult->all_visible_except_removable &&
+ presult->all_frozen;
+
if (pagefrz->freeze_required)
do_freeze = true;
else if (whole_page_freezable && presult->nfrozen > 0)
--
2.39.2
Attachment: v5-0022-make-all_visible_except_removable-local.patch (text/x-patch)
From e2b50f9b64f7e4255f4f764e2a348e1b446573dc Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed, 20 Mar 2024 11:43:31 +0200
Subject: [PATCH v5 22/26] make 'all_visible_except_removable' local
The caller doesn't need it, so it doesn't belong in PruneFreezeResult
---
src/backend/access/heap/pruneheap.c | 22 ++++++++++++----------
src/include/access/heapam.h | 1 -
2 files changed, 12 insertions(+), 11 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 3821f489aad..adf6406b880 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -245,6 +245,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
bool do_hint;
bool hint_bit_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
+ bool all_visible_except_removable;
/*
* Our strategy is to scan the page and make lists of items to change,
@@ -268,6 +269,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* presult->htsv is not initialized here because all ntuple spots in the
* array will be set either to a valid HTSV_Result value or -1.
*/
+
presult->ndeleted = 0;
presult->nnewlpdead = 0;
presult->nfrozen = 0;
@@ -280,7 +282,6 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* all_visible is also set to true.
*/
presult->all_frozen = true;
- presult->all_visible = true;
/* for recovery conflicts */
presult->frz_conflict_horizon = InvalidTransactionId;
@@ -311,6 +312,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* prefetching efficiency significantly / decreases the number of cache
* misses.
*/
+ all_visible_except_removable = true;
for (offnum = maxoff;
offnum >= FirstOffsetNumber;
offnum = OffsetNumberPrev(offnum))
@@ -367,13 +369,13 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* asynchronously. See SetHintBits for more info. Check that
* the tuple is hinted xmin-committed because of that.
*/
- if (presult->all_visible)
+ if (all_visible_except_removable)
{
TransactionId xmin;
if (!HeapTupleHeaderXminCommitted(htup))
{
- presult->all_visible = false;
+ all_visible_except_removable = false;
break;
}
@@ -389,7 +391,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (xmin != FrozenTransactionId &&
!GlobalVisTestIsRemovableXid(vistest, xmin))
{
- presult->all_visible = false;
+ all_visible_except_removable = false;
break;
}
@@ -400,14 +402,14 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
}
break;
case HEAPTUPLE_RECENTLY_DEAD:
- presult->all_visible = false;
+ all_visible_except_removable = false;
break;
case HEAPTUPLE_INSERT_IN_PROGRESS:
- presult->all_visible = false;
+ all_visible_except_removable = false;
break;
case HEAPTUPLE_DELETE_IN_PROGRESS:
/* This is an expected case during concurrent vacuum */
- presult->all_visible = false;
+ all_visible_except_removable = false;
break;
default:
elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
@@ -460,7 +462,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* pruning and keep all_visible_except_removable to permit freezing if the
* whole page will eventually become all visible after removing tuples.
*/
- presult->all_visible_except_removable = presult->all_visible;
+ presult->all_visible = all_visible_except_removable;
/* Scan the page */
for (offnum = FirstOffsetNumber;
@@ -523,7 +525,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
bool whole_page_freezable;
/* Is the whole page freezable? And is there something to freeze? */
- whole_page_freezable = presult->all_visible_except_removable &&
+ whole_page_freezable = all_visible_except_removable &&
presult->all_frozen;
if (pagefrz->freeze_required)
@@ -613,7 +615,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* cutoff by stepping back from OldestXmin. This avoids false
* conflicts when hot_standby_feedback is in use.
*/
- if (!(presult->all_visible_except_removable && presult->all_frozen))
+ if (!(all_visible_except_removable && presult->all_frozen))
{
presult->frz_conflict_horizon = pagefrz->cutoffs->OldestXmin;
TransactionIdRetreat(presult->frz_conflict_horizon);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index cfa4b07433b..7a5bc018088 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -201,7 +201,6 @@ typedef struct PruneFreezeResult
int ndeleted; /* Number of tuples deleted from the page */
int nnewlpdead; /* Number of newly LP_DEAD items */
bool all_visible; /* Whether or not the page is all visible */
- bool all_visible_except_removable;
/* Whether or not the page can be set all frozen in the VM */
bool all_frozen;
--
2.39.2
Attachment: v5-0023-Set-hastup-in-heap_page_prune.patch (text/x-patch)
From d2f59fcb90b206ba9b631bab4dcc1414f17a80af Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 18 Mar 2024 20:12:18 -0400
Subject: [PATCH v5 23/26] Set hastup in heap_page_prune
lazy_scan_prune() loops through the line pointers and tuple visibility
information for each tuple on a page, setting hastup to true if there
are any LP_REDIRECT line pointers or tuples with storage which will not
be removed. We want to remove this extra loop from lazy_scan_prune(),
and we know about non-removable tuples during heap_page_prune() anyway.
Set hastup when recording LP_REDIRECT line pointers in
heap_prune_chain() and when LP_NORMAL line pointers refer to tuples
whose visibility status is not HEAPTUPLE_DEAD.
---
src/backend/access/heap/pruneheap.c | 64 ++++++++++++++++++----------
src/backend/access/heap/vacuumlazy.c | 24 +----------
src/include/access/heapam.h | 2 +
3 files changed, 45 insertions(+), 45 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index adf6406b880..b44dc149376 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -71,7 +71,8 @@ static int heap_prune_chain(Buffer buffer,
static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
static void heap_prune_record_redirect(PruneState *prstate,
- OffsetNumber offnum, OffsetNumber rdoffnum);
+ OffsetNumber offnum, OffsetNumber rdoffnum,
+ PruneFreezeResult *presult);
static void heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
PruneFreezeResult *presult);
static void heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
@@ -274,6 +275,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
presult->nnewlpdead = 0;
presult->nfrozen = 0;
+ presult->hastup = false;
+
/*
* Caller will update the VM after pruning, collecting LP_DEAD items, and
* freezing tuples. Keep track of whether or not the page is all_visible
@@ -416,30 +419,42 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
break;
}
- /*
- * Consider freezing any normal tuples which will not be removed
- */
- if (presult->htsv[offnum] != HEAPTUPLE_DEAD && pagefrz)
+ if (presult->htsv[offnum] != HEAPTUPLE_DEAD)
{
- bool totally_frozen;
+ /*
+ * Deliberately don't set hastup for LP_DEAD items. We make the
+ * soft assumption that any LP_DEAD items encountered here will
+ * become LP_UNUSED later on, before count_nondeletable_pages is
+ * reached. If we don't make this assumption then rel truncation
+ * will only happen every other VACUUM, at most. Besides, VACUUM
+ * must treat hastup/nonempty_pages as provisional no matter how
+ * LP_DEAD items are handled (handled here, or handled later on).
+ */
+ presult->hastup = true;
- /* Tuple with storage -- consider need to freeze */
- if ((heap_prepare_freeze_tuple(htup, pagefrz,
- &prstate.frozen[presult->nfrozen],
- &totally_frozen)))
+ /* Consider freezing any normal tuples which will not be removed */
+ if (pagefrz)
{
- /* Save prepared freeze plan for later */
- prstate.frozen[presult->nfrozen++].offset = offnum;
- }
+ bool totally_frozen;
- /*
- * If any tuple isn't either totally frozen already or eligible to
- * become totally frozen (according to its freeze plan), then the
- * page definitely cannot be set all-frozen in the visibility map
- * later on
- */
- if (!totally_frozen)
- presult->all_frozen = false;
+ /* Tuple with storage -- consider need to freeze */
+ if ((heap_prepare_freeze_tuple(htup, pagefrz,
+ &prstate.frozen[presult->nfrozen],
+ &totally_frozen)))
+ {
+ /* Save prepared freeze plan for later */
+ prstate.frozen[presult->nfrozen++].offset = offnum;
+ }
+
+ /*
+ * If any tuple isn't either totally frozen already or
+ * eligible to become totally frozen (according to its freeze
+ * plan), then the page definitely cannot be set all-frozen in
+ * the visibility map later on
+ */
+ if (!totally_frozen)
+ presult->all_frozen = false;
+ }
}
}
@@ -993,7 +1008,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
if (i >= nchain)
heap_prune_record_dead_or_unused(prstate, rootoffnum, presult);
else
- heap_prune_record_redirect(prstate, rootoffnum, chainitems[i]);
+ heap_prune_record_redirect(prstate, rootoffnum, chainitems[i], presult);
}
else if (nchain < 2 && ItemIdIsRedirected(rootlp))
{
@@ -1027,7 +1042,8 @@ heap_prune_record_prunable(PruneState *prstate, TransactionId xid)
/* Record line pointer to be redirected */
static void
heap_prune_record_redirect(PruneState *prstate,
- OffsetNumber offnum, OffsetNumber rdoffnum)
+ OffsetNumber offnum, OffsetNumber rdoffnum,
+ PruneFreezeResult *presult)
{
Assert(prstate->nredirected < MaxHeapTuplesPerPage);
prstate->redirected[prstate->nredirected * 2] = offnum;
@@ -1037,6 +1053,8 @@ heap_prune_record_redirect(PruneState *prstate,
prstate->marked[offnum] = true;
Assert(!prstate->marked[rdoffnum]);
prstate->marked[rdoffnum] = true;
+
+ presult->hastup = true;
}
/* Record line pointer to be marked dead */
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index fe7751493e2..c87ab76c78e 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1420,7 +1420,6 @@ lazy_scan_prune(LVRelState *vacrel,
live_tuples,
recently_dead_tuples;
HeapPageFreeze pagefrz;
- bool hastup = false;
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
Assert(BufferGetBlockNumber(buf) == blkno);
@@ -1477,28 +1476,12 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->offnum = offnum;
itemid = PageGetItemId(page, offnum);
- if (!ItemIdIsUsed(itemid))
- continue;
-
/* Redirect items mustn't be touched */
- if (ItemIdIsRedirected(itemid))
- {
- /* page makes rel truncation unsafe */
- hastup = true;
+ if (ItemIdIsRedirected(itemid) || !ItemIdIsUsed(itemid))
continue;
- }
if (ItemIdIsDead(itemid))
{
- /*
- * Deliberately don't set hastup for LP_DEAD items. We make the
- * soft assumption that any LP_DEAD items encountered here will
- * become LP_UNUSED later on, before count_nondeletable_pages is
- * reached. If we don't make this assumption then rel truncation
- * will only happen every other VACUUM, at most. Besides, VACUUM
- * must treat hastup/nonempty_pages as provisional no matter how
- * LP_DEAD items are handled (handled here, or handled later on).
- */
deadoffsets[lpdead_items++] = offnum;
continue;
}
@@ -1566,9 +1549,6 @@ lazy_scan_prune(LVRelState *vacrel,
elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
break;
}
-
- hastup = true; /* page makes rel truncation unsafe */
-
}
vacrel->offnum = InvalidOffsetNumber;
@@ -1650,7 +1630,7 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->recently_dead_tuples += recently_dead_tuples;
/* Can't truncate this page */
- if (hastup)
+ if (presult.hastup)
vacrel->nonempty_pages = blkno + 1;
/* Did we find LP_DEAD items? */
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 7a5bc018088..94573a59dd3 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -201,6 +201,8 @@ typedef struct PruneFreezeResult
int ndeleted; /* Number of tuples deleted from the page */
int nnewlpdead; /* Number of newly LP_DEAD items */
bool all_visible; /* Whether or not the page is all visible */
+ bool hastup; /* Does page make rel truncation unsafe */
+
/* Whether or not the page can be set all frozen in the VM */
bool all_frozen;
--
2.39.2
Attachment: v5-0024-Count-tuples-for-vacuum-logging-in-heap_page_prun.patch (text/x-patch)
From 44a0347315c6cb43769193b24a0c70c5e2661a80 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 22 Jan 2024 17:25:56 -0500
Subject: [PATCH v5 24/26] Count tuples for vacuum logging in heap_page_prune
lazy_scan_prune() loops through all of the tuple visibility information
that was recorded in heap_page_prune() and then counts live and recently
dead tuples. That information is available in heap_page_prune(), so just
record it there. Add live and recently dead tuple counters to the
PruneResult. Doing this counting in heap_page_prune() eliminates the
need for saving the tuple visibility status information in the
PruneResult. Instead, save it in the PruneState where it can be
referenced by heap_prune_chain().
---
src/backend/access/heap/pruneheap.c | 99 ++++++++++++++++++++++++----
src/backend/access/heap/vacuumlazy.c | 93 +-------------------------
src/include/access/heapam.h | 29 +-------
3 files changed, 93 insertions(+), 128 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index b44dc149376..1bfa522ecfd 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -55,6 +55,18 @@ typedef struct
*/
bool marked[MaxHeapTuplesPerPage + 1];
+ /*
+ * Tuple visibility is only computed once for each tuple, for correctness
+ * and efficiency reasons; see comment in heap_page_prune_and_freeze() for
+ * details. This is of type int8[], instead of HTSV_Result[], so we can
+ * use -1 to indicate no visibility has been computed, e.g. for LP_DEAD
+ * items.
+ *
+ * This needs to be MaxHeapTuplesPerPage + 1 long as FirstOffsetNumber is
+ * 1. Otherwise every access would need to subtract 1.
+ */
+ int8 htsv[MaxHeapTuplesPerPage + 1];
+
/*
* One entry for every tuple that we may freeze.
*/
@@ -69,6 +81,7 @@ static int heap_prune_chain(Buffer buffer,
OffsetNumber rootoffnum,
PruneState *prstate, PruneFreezeResult *presult);
+static inline HTSV_Result htsv_get_valid_status(int status);
static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
static void heap_prune_record_redirect(PruneState *prstate,
OffsetNumber offnum, OffsetNumber rdoffnum,
@@ -267,7 +280,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
memset(prstate.marked, 0, sizeof(prstate.marked));
/*
- * presult->htsv is not initialized here because all ntuple spots in the
+ * prstate.htsv is not initialized here because all ntuple spots in the
* array will be set either to a valid HTSV_Result value or -1.
*/
@@ -277,6 +290,9 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
presult->hastup = false;
+ presult->live_tuples = 0;
+ presult->recently_dead_tuples = 0;
+
/*
* Caller will update the VM after pruning, collecting LP_DEAD items, and
* freezing tuples. Keep track of whether or not the page is all_visible
@@ -326,7 +342,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/* Nothing to do if slot doesn't contain a tuple */
if (!ItemIdIsNormal(itemid))
{
- presult->htsv[offnum] = -1;
+ prstate.htsv[offnum] = -1;
continue;
}
@@ -342,9 +358,29 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (off_loc)
*off_loc = offnum;
- presult->htsv[offnum] = heap_prune_satisfies_vacuum(&prstate, &tup,
- buffer);
- switch (presult->htsv[offnum])
+ prstate.htsv[offnum] = heap_prune_satisfies_vacuum(&prstate, &tup,
+ buffer);
+
+ /*
+ * The criteria for counting a tuple as live in this block need to
+ * match what analyze.c's acquire_sample_rows() does, otherwise VACUUM
+ * and ANALYZE may produce wildly different reltuples values, e.g.
+ * when there are many recently-dead tuples.
+ *
+ * The logic here is a bit simpler than acquire_sample_rows(), as
+ * VACUUM can't run inside a transaction block, which makes some cases
+ * impossible (e.g. in-progress insert from the same transaction).
+ *
+ * We treat LP_DEAD items (which are the closest thing to DEAD tuples
+ * that might be seen here) differently, too: we assume that they'll
+ * become LP_UNUSED before VACUUM finishes. This difference is only
+ * superficial. VACUUM effectively agrees with ANALYZE about DEAD
+ * items, in the end. VACUUM won't remember LP_DEAD items, but only
+ * because they're not supposed to be left behind when it is done.
+ * (Cases where we bypass index vacuuming will violate this optimistic
+ * assumption, but the overall impact of that should be negligible.)
+ */
+ switch (prstate.htsv[offnum])
{
case HEAPTUPLE_DEAD:
@@ -364,6 +400,12 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
break;
case HEAPTUPLE_LIVE:
+ /*
+ * Count it as live. Not only is this natural, but it's also
+ * what acquire_sample_rows() does.
+ */
+ presult->live_tuples++;
+
/*
* Is the tuple definitely visible to all transactions?
*
@@ -405,13 +447,34 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
}
break;
case HEAPTUPLE_RECENTLY_DEAD:
+
+ /*
+ * If tuple is recently dead then we must not remove it from
+ * the relation. (We only remove items that are LP_DEAD from
+ * pruning.)
+ */
+ presult->recently_dead_tuples++;
all_visible_except_removable = false;
break;
case HEAPTUPLE_INSERT_IN_PROGRESS:
+
+ /*
+ * We do not count these rows as live, because we expect the
+ * inserting transaction to update the counters at commit, and
+ * we assume that will happen only after we report our
+ * results. This assumption is a bit shaky, but it is what
+ * acquire_sample_rows() does, so be consistent.
+ */
all_visible_except_removable = false;
break;
case HEAPTUPLE_DELETE_IN_PROGRESS:
- /* This is an expected case during concurrent vacuum */
+
+ /*
+ * This an expected case during concurrent vacuum. Count such
+ * rows as live. As above, we assume the deleting transaction
+ * will commit and update the counters after we report.
+ */
+ presult->live_tuples++;
all_visible_except_removable = false;
break;
default:
@@ -419,7 +482,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
break;
}
- if (presult->htsv[offnum] != HEAPTUPLE_DEAD)
+ if (prstate.htsv[offnum] != HEAPTUPLE_DEAD)
{
/*
* Deliberately don't set hastup for LP_DEAD items. We make the
@@ -716,10 +779,24 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
}
+/*
+ * Pruning calculates tuple visibility once and saves the results in an array
+ * of int8. See PruneState.htsv for details. This helper function is meant to
+ * guard against examining visibility status array members which have not yet
+ * been computed.
+ */
+static inline HTSV_Result
+htsv_get_valid_status(int status)
+{
+ Assert(status >= HEAPTUPLE_DEAD &&
+ status <= HEAPTUPLE_DELETE_IN_PROGRESS);
+ return (HTSV_Result) status;
+}
+
/*
* Prune specified line pointer or a HOT chain originating at line pointer.
*
- * Tuple visibility information is provided in presult->htsv.
+ * Tuple visibility information is provided in prstate->htsv.
*
* If the item is an index-referenced tuple (i.e. not a heap-only tuple),
* the HOT chain is pruned by removing all DEAD tuples at the start of the HOT
@@ -770,7 +847,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
*/
if (ItemIdIsNormal(rootlp))
{
- Assert(presult->htsv[rootoffnum] != -1);
+ Assert(prstate->htsv[rootoffnum] != -1);
htup = (HeapTupleHeader) PageGetItem(dp, rootlp);
if (HeapTupleHeaderIsHeapOnly(htup))
@@ -793,7 +870,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
* either here or while following a chain below. Whichever path
* gets there first will mark the tuple unused.
*/
- if (presult->htsv[rootoffnum] == HEAPTUPLE_DEAD &&
+ if (prstate->htsv[rootoffnum] == HEAPTUPLE_DEAD &&
!HeapTupleHeaderIsHotUpdated(htup))
{
heap_prune_record_unused(prstate, rootoffnum);
@@ -894,7 +971,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
*/
tupdead = recent_dead = false;
- switch (htsv_get_valid_status(presult->htsv[offnum]))
+ switch (htsv_get_valid_status(prstate->htsv[offnum]))
{
case HEAPTUPLE_DEAD:
tupdead = true;
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index c87ab76c78e..2e4072fe2c1 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1378,22 +1378,6 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
*
* Caller must hold pin and buffer cleanup lock on the buffer.
*
- * Prior to PostgreSQL 14 there were very rare cases where
- * heap_page_prune_and_freeze() was allowed to disagree with our
- * HeapTupleSatisfiesVacuum() call about whether or not a tuple should be
- * considered DEAD. This happened when an inserting transaction concurrently
- * aborted (after our heap_page_prune_and_freeze() call, before our
- * HeapTupleSatisfiesVacuum() call). There was rather a lot of complexity just
- * so we could deal with tuples that were DEAD to VACUUM, but nevertheless were
- * left with storage after pruning.
- *
- * As of Postgres 17, we circumvent this problem altogether by reusing the
- * result of heap_page_prune_and_freeze()'s visibility check. Without the
- * second call to HeapTupleSatisfiesVacuum(), there is no new HTSV_Result and
- * there can be no disagreement. We'll just handle such tuples as if they had
- * become fully dead right after this operation completes instead of in the
- * middle of it.
- *
* vmbuffer is the buffer containing the VM block with visibility information
* for the heap block, blkno. all_visible_according_to_vm is the saved
* visibility status of the heap block looked up earlier by the caller. We
@@ -1415,10 +1399,8 @@ lazy_scan_prune(LVRelState *vacrel,
OffsetNumber offnum,
maxoff;
ItemId itemid;
+ int lpdead_items = 0;
PruneFreezeResult presult;
- int lpdead_items,
- live_tuples,
- recently_dead_tuples;
HeapPageFreeze pagefrz;
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
@@ -1438,9 +1420,6 @@ lazy_scan_prune(LVRelState *vacrel,
pagefrz.NoFreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
pagefrz.NoFreezePageRelminMxid = vacrel->NewRelminMxid;
pagefrz.cutoffs = &vacrel->cutoffs;
- lpdead_items = 0;
- live_tuples = 0;
- recently_dead_tuples = 0;
/*
* Prune all HOT-update chains and potentially freeze tuples on this page.
@@ -1476,9 +1455,6 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->offnum = offnum;
itemid = PageGetItemId(page, offnum);
- /* Redirect items mustn't be touched */
- if (ItemIdIsRedirected(itemid) || !ItemIdIsUsed(itemid))
- continue;
if (ItemIdIsDead(itemid))
{
@@ -1486,69 +1462,6 @@ lazy_scan_prune(LVRelState *vacrel,
continue;
}
- Assert(ItemIdIsNormal(itemid));
-
- /*
- * The criteria for counting a tuple as live in this block need to
- * match what analyze.c's acquire_sample_rows() does, otherwise VACUUM
- * and ANALYZE may produce wildly different reltuples values, e.g.
- * when there are many recently-dead tuples.
- *
- * The logic here is a bit simpler than acquire_sample_rows(), as
- * VACUUM can't run inside a transaction block, which makes some cases
- * impossible (e.g. in-progress insert from the same transaction).
- *
- * We treat LP_DEAD items (which are the closest thing to DEAD tuples
- * that might be seen here) differently, too: we assume that they'll
- * become LP_UNUSED before VACUUM finishes. This difference is only
- * superficial. VACUUM effectively agrees with ANALYZE about DEAD
- * items, in the end. VACUUM won't remember LP_DEAD items, but only
- * because they're not supposed to be left behind when it is done.
- * (Cases where we bypass index vacuuming will violate this optimistic
- * assumption, but the overall impact of that should be negligible.)
- */
- switch (htsv_get_valid_status(presult.htsv[offnum]))
- {
- case HEAPTUPLE_LIVE:
-
- /*
- * Count it as live. Not only is this natural, but it's also
- * what acquire_sample_rows() does.
- */
- live_tuples++;
- break;
- case HEAPTUPLE_RECENTLY_DEAD:
-
- /*
- * If tuple is recently dead then we must not remove it from
- * the relation. (We only remove items that are LP_DEAD from
- * pruning.)
- */
- recently_dead_tuples++;
- break;
- case HEAPTUPLE_INSERT_IN_PROGRESS:
-
- /*
- * We do not count these rows as live, because we expect the
- * inserting transaction to update the counters at commit, and
- * we assume that will happen only after we report our
- * results. This assumption is a bit shaky, but it is what
- * acquire_sample_rows() does, so be consistent.
- */
- break;
- case HEAPTUPLE_DELETE_IN_PROGRESS:
-
- /*
- * This an expected case during concurrent vacuum. Count such
- * rows as live. As above, we assume the deleting transaction
- * will commit and update the counters after we report.
- */
- live_tuples++;
- break;
- default:
- elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
- break;
- }
}
vacrel->offnum = InvalidOffsetNumber;
@@ -1626,8 +1539,8 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->tuples_deleted += presult.ndeleted;
vacrel->tuples_frozen += presult.nfrozen;
vacrel->lpdead_items += lpdead_items;
- vacrel->live_tuples += live_tuples;
- vacrel->recently_dead_tuples += recently_dead_tuples;
+ vacrel->live_tuples += presult.live_tuples;
+ vacrel->recently_dead_tuples += presult.recently_dead_tuples;
/* Can't truncate this page */
if (presult.hastup)
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 94573a59dd3..c2919012020 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -198,6 +198,8 @@ typedef struct HeapPageFreeze
*/
typedef struct PruneFreezeResult
{
+ int live_tuples;
+ int recently_dead_tuples;
int ndeleted; /* Number of tuples deleted from the page */
int nnewlpdead; /* Number of newly LP_DEAD items */
bool all_visible; /* Whether or not the page is all visible */
@@ -210,19 +212,6 @@ typedef struct PruneFreezeResult
int nfrozen;
TransactionId frz_conflict_horizon; /* Newest xmin on the page */
- /*
- * Tuple visibility is only computed once for each tuple, for correctness
- * and efficiency reasons; see comment in heap_page_prune_and_freeze() for
- * details. This is of type int8[], instead of HTSV_Result[], so we can
- * use -1 to indicate no visibility has been computed, e.g. for LP_DEAD
- * items.
- *
- * This needs to be MaxHeapTuplesPerPage + 1 long as FirstOffsetNumber is
- * 1. Otherwise every access would need to subtract 1.
- */
- int8 htsv[MaxHeapTuplesPerPage + 1];
-
-
/* New value of relfrozenxid found by heap_page_prune_and_freeze() */
TransactionId new_relfrozenxid;
@@ -230,20 +219,6 @@ typedef struct PruneFreezeResult
MultiXactId new_relminmxid;
} PruneFreezeResult;
-/*
- * Pruning calculates tuple visibility once and saves the results in an array
- * of int8. See PruneFreezeResult.htsv for details. This helper function is
- * meant to guard against examining visibility status array members which have
- * not yet been computed.
- */
-static inline HTSV_Result
-htsv_get_valid_status(int status)
-{
- Assert(status >= HEAPTUPLE_DEAD &&
- status <= HEAPTUPLE_DELETE_IN_PROGRESS);
- return (HTSV_Result) status;
-}
-
/* ----------------
* function prototypes for heap access method
*
--
2.39.2
Attachment: v5-0025-Save-dead-tuple-offsets-during-heap_page_prune.patch (text/x-patch)
From ce5b791dde683742173cafc3d151366f9bc015ca Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 22 Jan 2024 16:55:28 -0500
Subject: [PATCH v5 25/26] Save dead tuple offsets during heap_page_prune
After heap_page_prune() returned, lazy_scan_prune() looped through all
of the offsets of LP_DEAD items which it later added to
LVRelState->dead_items. Instead take care of this when marking a line
pointer or when an existing non-removable LP_DEAD item is encountered in
heap_prune_chain().
---
src/backend/access/heap/pruneheap.c | 7 ++++
src/backend/access/heap/vacuumlazy.c | 61 ++++++----------------------
src/include/access/heapam.h | 2 +
3 files changed, 22 insertions(+), 48 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 1bfa522ecfd..2b5f8ef1e80 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -292,6 +292,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
presult->live_tuples = 0;
presult->recently_dead_tuples = 0;
+ presult->lpdead_items = 0;
/*
* Caller will update the VM after pruning, collecting LP_DEAD items, and
@@ -946,7 +947,10 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
if (unlikely(prstate->mark_unused_now))
heap_prune_record_unused(prstate, offnum);
else
+ {
presult->all_visible = false;
+ presult->deadoffsets[presult->lpdead_items++] = offnum;
+ }
break;
}
@@ -1150,6 +1154,9 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
* all_visible.
*/
presult->all_visible = false;
+
+ /* Record the dead offset for vacuum */
+ presult->deadoffsets[presult->lpdead_items++] = offnum;
}
/*
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 2e4072fe2c1..c3da64102cf 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1396,23 +1396,11 @@ lazy_scan_prune(LVRelState *vacrel,
bool *has_lpdead_items)
{
Relation rel = vacrel->rel;
- OffsetNumber offnum,
- maxoff;
- ItemId itemid;
- int lpdead_items = 0;
PruneFreezeResult presult;
HeapPageFreeze pagefrz;
- OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
Assert(BufferGetBlockNumber(buf) == blkno);
- /*
- * maxoff might be reduced following line pointer array truncation in
- * heap_page_prune_and_freeze(). That's safe for us to ignore, since the
- * reclaimed space will continue to look like LP_UNUSED items below.
- */
- maxoff = PageGetMaxOffsetNumber(page);
-
/* Initialize pagefrz */
pagefrz.freeze_required = false;
pagefrz.FreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
@@ -1425,9 +1413,9 @@ lazy_scan_prune(LVRelState *vacrel,
* Prune all HOT-update chains and potentially freeze tuples on this page.
*
* We count the number of tuples removed from the page by the pruning step
- * in presult.ndeleted. It should not be confused with lpdead_items;
- * lpdead_items's final value can be thought of as the number of tuples
- * that were deleted from indexes.
+ * in presult.ndeleted. It should not be confused with
+ * presult.lpdead_items; presult.lpdead_items's final value can be thought
+ * of as the number of tuples that were deleted from indexes.
*
* If the relation has no indexes, we can immediately mark would-be dead
* items LP_UNUSED, so mark_unused_now should be true if no indexes and
@@ -1437,33 +1425,10 @@ lazy_scan_prune(LVRelState *vacrel,
&pagefrz, &presult, &vacrel->offnum);
/*
- * Now scan the page to collect LP_DEAD items and check for tuples
- * requiring freezing among remaining tuples with storage. We will update
- * the VM after collecting LP_DEAD items and freezing tuples. Pruning will
- * have determined whether or not the page is all_visible and able to
- * become all_frozen.
- *
+ * We will update the VM after collecting LP_DEAD items and freezing
+ * tuples. Pruning will have determined whether or not the page is
+ * all_visible.
*/
- for (offnum = FirstOffsetNumber;
- offnum <= maxoff;
- offnum = OffsetNumberNext(offnum))
- {
- /*
- * Set the offset number so that we can display it along with any
- * error that occurred while processing this tuple.
- */
- vacrel->offnum = offnum;
- itemid = PageGetItemId(page, offnum);
-
-
- if (ItemIdIsDead(itemid))
- {
- deadoffsets[lpdead_items++] = offnum;
- continue;
- }
-
- }
-
vacrel->offnum = InvalidOffsetNumber;
Assert(MultiXactIdIsValid(presult.new_relminmxid));
@@ -1499,7 +1464,7 @@ lazy_scan_prune(LVRelState *vacrel,
TransactionId debug_cutoff;
bool debug_all_frozen;
- Assert(lpdead_items == 0);
+ Assert(presult.lpdead_items == 0);
if (!heap_page_is_all_visible(vacrel, buf,
&debug_cutoff, &debug_all_frozen))
@@ -1515,7 +1480,7 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* Now save details of the LP_DEAD items from the page in vacrel
*/
- if (lpdead_items > 0)
+ if (presult.lpdead_items > 0)
{
VacDeadItems *dead_items = vacrel->dead_items;
ItemPointerData tmp;
@@ -1524,9 +1489,9 @@ lazy_scan_prune(LVRelState *vacrel,
ItemPointerSetBlockNumber(&tmp, blkno);
- for (int i = 0; i < lpdead_items; i++)
+ for (int i = 0; i < presult.lpdead_items; i++)
{
- ItemPointerSetOffsetNumber(&tmp, deadoffsets[i]);
+ ItemPointerSetOffsetNumber(&tmp, presult.deadoffsets[i]);
dead_items->items[dead_items->num_items++] = tmp;
}
@@ -1538,7 +1503,7 @@ lazy_scan_prune(LVRelState *vacrel,
/* Finally, add page-local counts to whole-VACUUM counts */
vacrel->tuples_deleted += presult.ndeleted;
vacrel->tuples_frozen += presult.nfrozen;
- vacrel->lpdead_items += lpdead_items;
+ vacrel->lpdead_items += presult.lpdead_items;
vacrel->live_tuples += presult.live_tuples;
vacrel->recently_dead_tuples += presult.recently_dead_tuples;
@@ -1547,7 +1512,7 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->nonempty_pages = blkno + 1;
/* Did we find LP_DEAD items? */
- *has_lpdead_items = (lpdead_items > 0);
+ *has_lpdead_items = (presult.lpdead_items > 0);
Assert(!presult.all_visible || !(*has_lpdead_items));
@@ -1615,7 +1580,7 @@ lazy_scan_prune(LVRelState *vacrel,
* There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
* however.
*/
- else if (lpdead_items > 0 && PageIsAllVisible(page))
+ else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
{
elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
vacrel->relname, blkno);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index c2919012020..ee0eca8ae02 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -217,6 +217,8 @@ typedef struct PruneFreezeResult
/* New value of relminmxid found by heap_page_prune_and_freeze() */
MultiXactId new_relminmxid;
+ int lpdead_items; /* includes existing LP_DEAD items */
+ OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
} PruneFreezeResult;
/* ----------------
--
2.39.2
Attachment: v5-0026-reorder-PruneFreezeResult-fields.patch (text/x-patch)
From e993e0d98cd0ef1ecbd229f6ddbe23d59a427e3a Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed, 20 Mar 2024 11:40:34 +0200
Subject: [PATCH v5 26/26] reorder PruneFreezeResult fields
---
src/include/access/heapam.h | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index ee0eca8ae02..b2015f5a1ac 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -202,14 +202,17 @@ typedef struct PruneFreezeResult
int recently_dead_tuples;
int ndeleted; /* Number of tuples deleted from the page */
int nnewlpdead; /* Number of newly LP_DEAD items */
+ int nfrozen;
+
bool all_visible; /* Whether or not the page is all visible */
bool hastup; /* Does page make rel truncation unsafe */
+ /* The following fields are only set if freezing */
+
/* Whether or not the page can be set all frozen in the VM */
bool all_frozen;
/* Number of newly frozen tuples */
- int nfrozen;
TransactionId frz_conflict_horizon; /* Newest xmin on the page */
/* New value of relfrozenxid found by heap_page_prune_and_freeze() */
--
2.39.2
On Wed, Mar 20, 2024 at 03:15:49PM +0200, Heikki Linnakangas wrote:
On 20/03/2024 03:36, Melanie Plageman wrote:
On Mon, Mar 18, 2024 at 01:15:21AM +0200, Heikki Linnakangas wrote:
On 15/03/2024 02:56, Melanie Plageman wrote:
Okay, so I was going to start using xl_heap_prune for vacuum here too,
but I realized it would be bigger because of the
snapshotConflictHorizon. Do you think there is a non-terrible way to
make the snapshotConflictHorizon optional? Like with a flag?
Yeah, another flag would do the trick.
Okay, I've done this in attached v4 (including removing
XLOG_HEAP2_VACUUM). I had to put the snapshot conflict horizon in the
"main chunk" of data available at replay regardless of whether or not
the record ended up including an FPI.
I made it its own sub-record (xlhp_conflict_horizon) less to help with
alignment (though we can use all the help we can get there) and more to
keep it from getting lost. When you look at heapam_xlog.h, you can see
what a XLOG_HEAP2_PRUNE record will contain starting with the
xl_heap_prune struct and then all the sub-record types.
Ok, now that I look at this, I wonder if we're being overly cautious about
the WAL size. We probably could just always include the snapshot field, and
set it to InvalidTransactionId and waste 4 bytes when it's not needed. For
the sake of simplicity. I don't feel strongly either way though, the flag is
pretty simple too.
That will mean that all vacuum records are at least 3 bytes bigger than
before -- which makes it somewhat less defensible to get rid of
xl_heap_vacuum.
That being said, I ended up doing an unaligned access when I
packed it and made it optional, so maybe it is less user-friendly.
But I also think that making it optional is more clear for vacuum which
will never use it.
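For what it's worth, the replay side of the optional field stays tiny. A
minimal sketch (the flag and SizeOfHeapPrune are from this patch set; the
rest is assumed, not the actual patch text):

    TransactionId snapshot_conflict_horizon = InvalidTransactionId;

    if (xlrec->flags & XLHP_HAS_CONFLICT_HORIZON)
    {
        /* the field follows the flags byte, so it may be unaligned */
        memcpy(&snapshot_conflict_horizon,
               ((char *) xlrec) + SizeOfHeapPrune,
               sizeof(TransactionId));
    }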
I realized that the WAL record format changes are pretty independent from
the rest of the patches. They could be applied before the rest. Without the
rest of the changes, we'll still write two WAL records per page in vacuum,
one to prune and another one to freeze, but it's another meaningful
incremental step. So I reshuffled the patches, so that the WAL format is
changed first, before the rest of the changes.
Ah, great idea! That eliminates the issue of preliminary commits having
larger WAL records that then get streamlined.
0001-0008: These are the WAL format changes. There's some comment cleanup
needed, but as far as the code goes, I think these are pretty much ready to
be squashed & committed.
My review in this email is *only* for 0001-0008. I have not looked at
the rest yet.
From 06d5ff5349a8aa95cbfd06a8043fe503b7b1bf7b Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed, 20 Mar 2024 14:50:14 +0200
Subject: [PATCH v5 01/26] Merge prune, freeze and vacuum WAL record formats
The new combined WAL record is now used for pruning, freezing and 2nd
pass of vacuum. This is in preparation of changing vacuuming to write
a combined prune+freeze record per page, instead of separate two
records. The new WAL record format now supports that, but the code
still always writes separate records for pruning and freezing.
Attached patch changes-for-0001.patch has a bunch of updated comments --
especially for heapam_xlog.h (including my promised diagram). And a few
suggestions (mostly changes that I should have made before).
XXX I tried to lift-and-shift the code from v4 patch set as unchanged
as possible, for easier review, but some noteworthy changes:
In the final commit message, I think it is worth calling out that all of
those freeze logging functions heap_log_freeze_eq/cmp/etc are lifted and
shifted from one file to another. When I am reviewing a big diff, it is
always helpful to know where I need to review line-by-line.
- Instead of passing PruneState and PageFreezeResult to
log_heap_prune_and_freeze(), pass the arrays of frozen, redirected
et al offsets directly. That way, it can be called from other places.
good idea.
- moved heap_xlog_deserialize_prune_and_freeze() from xactdesc.c to
heapdesc.c. (Because that's clearly where it belongs)
:)
From cd6cdaebb362b014733e99ecd868896caf0fb3aa Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed, 20 Mar 2024 13:45:01 +0200
Subject: [PATCH v5 02/26] Keep the original numbers for existing WAL records
Doesn't matter much because the WAL format is not compatible across
major versions anyway. But still seems nice to keep the identifiers
unchanged when we can. (There's some precedence for this if you search
the git history for "is free, was").
sounds good.
From d3207bb557aa1d2868a50d357a06318a6c0cb5cd Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed, 20 Mar 2024 13:48:29 +0200
Subject: [PATCH v5 03/26] Rename record to XLOG_HEAP2_PRUNE_FREEZE
To clarify that it also freezes now, and to make it clear that it's
significantly different from the old XLOG_HEAP2_PRUNE format.
+1
From 5d6fc2ffbdd839e0b69242af16446a46cf6a2dc7 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed, 20 Mar 2024 13:49:59 +0200
Subject: [PATCH v5 04/26] 'nplans' is a pointer
I'm surprised the compiler didn't warn about this
oops. while looking at this, I noticed that the asserts I added that
nredirected > 0, ndead > 0, and nunused > 0 have the same problem.
---
src/backend/access/rmgrdesc/heapdesc.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 8b94c869faf..9ef8a745982 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -155,8 +155,7 @@ heap_xlog_deserialize_prune_and_freeze(char *cursor, uint8 flags,
 cursor += sizeof(OffsetNumber) * *nunused;
 }
- if (nplans > 0)
From 59f3f80f82ed7a63d86c991d0cb025e4cde2caec Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed, 20 Mar 2024 13:36:41 +0200
Subject: [PATCH v5 06/26] Fix logging snapshot conflict horizon.
- it was accessed without proper alignment, which won't work on
architectures that are strict about alignment. Use memcpy.
wow, oops. thanks for fixing this!
- in heap_xlog_prune_freeze, the code tried to access the xid with
"(xlhp_conflict_horizon *) (xlrec + SizeOfHeapPrune);" But 'xlrec'
was "xl_heap_prune *" rather than "char *". That happened to work,
because sizeof(xl_heap_prune) == 1, but make it more robust by
adding a cast to char *.
good catch.
- remove xlhp_conflict_horizon and store a TransactionId directly. A
separate struct would make sense if we needed to store anything else
there, but for now it just seems like more code.
makes sense. I just want to make sure heapam_xlog.h makes it extra clear
that this is happening. I see your comment addition. If you combine it
with my comment additions in the attached patch for 0001, hopefully that
makes it clear enough.
From 8af186ee9dd8c7dc20f37a69b34cab7b95faa43b Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed, 20 Mar 2024 14:03:06 +0200
Subject: [PATCH v5 07/26] Add comment to log_heap_prune_and_freeze().
XXX: This should be rewritten, but I tried to at least list some
important points.
Are you thinking that it needs to mention more things or that the things
it mentions need more detail?
From b26e36ba8614d907a6e15810ed4f684f8f628dd2 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed, 20 Mar 2024 14:53:31 +0200
Subject: [PATCH v5 08/26] minor refactoring in log_heap_prune_and_freeze()
Mostly to make local variables more tightly-scoped.
So, I don't think you can move those sub-records into the tighter scope.
If you run tests with this applied, you'll see it crashes and a lot of
the data in the record is wrong. If you move the sub-record declarations
out to a wider scope, the tests pass.
The WAL record data isn't actually copied into the buffer until
recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_PRUNE_FREEZE);
after registering everything.
So all of those sub-records you made are out of scope by the time it tries
to copy from them.
I spent the better part of a day last week trying to figure out what was
happening after I did the exact same thing. I must say that I found the
xloginsert API incredibly unhelpful on this point.
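To spell the pitfall out: the registration calls only remember pointers, and
the bytes are not copied until XLogInsert() runs. A minimal sketch of the
broken shape (struct and variable names assumed from this patch set, not the
patch text):

    XLogBeginInsert();
    XLogRegisterData((char *) &xlrec, SizeOfHeapPrune);
    XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);

    if (nredirected > 0)
    {
        xlhp_prune_items redirect_items;    /* lives only in this block */

        redirect_items.ntargets = nredirected;
        XLogRegisterBufData(0, (char *) &redirect_items,
                            offsetof(xlhp_prune_items, data));
        XLogRegisterBufData(0, (char *) redirected,
                            sizeof(OffsetNumber) * 2 * nredirected);
    }       /* redirect_items goes out of scope here... */

    /* ...but its bytes are only copied into the record now */
    recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_PRUNE_FREEZE);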
I would like to review the rest of the suggested changes in this patch
after we fix the issue I just mentioned.
- Melanie
Attachments:
changes-for-0001.patch (text/x-diff)
From 93d2790fac9c66c67165555d541410777ec9ad3b Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 20 Mar 2024 14:13:33 -0400
Subject: [PATCH 2/9] comments on 0001
---
src/backend/access/heap/heapam.c | 4 +-
src/backend/access/heap/pruneheap.c | 7 ++-
src/include/access/heapam_xlog.h | 88 ++++++++++++++++++++---------
3 files changed, 68 insertions(+), 31 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index e6cfffd9f3e..17b733fd706 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -8706,6 +8706,8 @@ heap_xlog_prune(XLogReaderState *record)
MarkBufferDirty(buffer);
}
+ // TODO: should we avoid this if we only froze? heap_xlog_freeze() doesn't
+ // do it
if (BufferIsValid(buffer))
{
Size freespace = PageGetHeapFreeSpace(BufferGetPage(buffer));
@@ -8713,7 +8715,7 @@ heap_xlog_prune(XLogReaderState *record)
UnlockReleaseBuffer(buffer);
/*
- * After modifying records on a page, it's useful to update the FSM
+ * After modifying tuples on a page, it's useful to update the FSM
* about it, as it may cause the page become target for insertions
* later even if vacuum decides not to visit it (which is possible if
* gets marked all-visible.)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 9773681868c..6fc5c22a22d 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1294,7 +1294,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
int nplans = 0;
xl_heap_freeze_plan plans[MaxHeapTuplesPerPage];
OffsetNumber frz_offsets[MaxHeapTuplesPerPage];
- bool do_freeze = (nfrozen > 0);
+ bool do_freeze = (nfrozen > 0); // don't need these parantheses
+ // actually probably just lose this variable
xlrec.flags = 0;
@@ -1311,8 +1312,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
xlrec.flags |= XLHP_HAS_CONFLICT_HORIZON;
/*
- * Prepare deduplicated representation for use in WAL record Destructively
- * sorts tuples array in-place.
+ * Prepare deduplicated representation for use in WAL record. This
+ * destructively sorts frozen tuples array in-place.
*/
if (do_freeze)
nplans = heap_log_freeze_plan(frozen, nfrozen, plans, frz_offsets);
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index dfeb703d136..14e0e49e539 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -225,22 +225,32 @@ typedef struct xl_heap_update
#define SizeOfHeapUpdate (offsetof(xl_heap_update, new_offnum) + sizeof(OffsetNumber))
/*
- * This is what we need to know about page pruning and freezing, both during
- * VACUUM and during opportunistic pruning.
+ * These structures and flags encode VACUUM pruning and freezing and on-access
+ * pruning page modifications.
*
- * If XLPH_HAS_REDIRECTIONS, XLHP_HAS_DEAD_ITEMS, or XLHP_HAS_NOW_UNUSED is set,
- * acquires a full cleanup lock. Otherwise an ordinary exclusive lock is
- * enough. This can happen if freezing was the only modification to the page.
+ * xl_heap_prune is the main record. The XLHP_HAS_* flags indicate which
+ * "sub-records" are included and the other XLHP_* flags provide additional
+ * information about the conditions for replay.
*
- * The data for block reference 0 contains "sub-records" depending on which
- * of the XLHP_HAS_* flags are set. See xlhp_* struct definitions below.
- * The layout is in the same order as the XLHP_* flags.
+ * The data for block reference 0 contains "sub-records" depending on which of
+ * the XLHP_HAS_* flags are set. Offset numbers are in the block reference data
+ * following each sub-record. See xlhp_* struct definitions below. The layout
+ * is in the same order as the XLHP_* flags.
*
- * OFFSET NUMBERS are in the block reference 0
- *
- * If only unused item offsets are included because the record is constructed
- * during vacuum's second pass (marking LP_DEAD items LP_UNUSED) then only an
- * ordinary exclusive lock is required to replay.
+ * An example record with every sub-record included.
+ *-----------------------------------------------------------------------------
+ * uint8 flags (begin xl_heap_prune)
+ * TransactionId snapshot_conflict_horizon
+ * uint16 nplans (begin xlhp_freeze)
+ * xl_heap_freeze_plan plans[nplans]
+ * uint16 nredirected (begin xlhp_prune_items)
+ * OffsetNumber redirected[2 * nredirected]
+ * uint16 ndead (begin xlhp_prune_items)
+ * OffsetNumber nowdead[ndead]
+ * uint16 nunused (begin xlhp_prune_items)
+ * OffsetNumber nowunused[nunused]
+ * OffsetNumber frz_offsets[sum([plan.ntuples for plan in plans])]
+ *-----------------------------------------------------------------------------
*/
typedef struct xl_heap_prune
{
@@ -251,20 +261,42 @@ typedef struct xl_heap_prune
#define XLHP_IS_CATALOG_REL (1 << 1)
/*
- * During vacuum's second pass which sets LP_DEAD items LP_UNUSED, we will only
- * truncate the line pointer array, not call PageRepairFragmentation. We need
- * this flag to differentiate what kind of lock (exclusive or cleanup) to take
- * on the buffer and whether to call PageTruncateLinePointerArray() or
- * PageRepairFragementation().
+ * Vacuum's second pass sets LP_DEAD items LP_UNUSED and truncates the line
+ * pointer array with PageTruncateLinePointerArray(). It will emit a WAL
+ * record with XLHP_LP_TRUNCATE_ONLY set to indicate that only an ordinary
+ * exclusive lock is needed to replay the record. When XLHP_LP_TRUNCATE_ONLY is
+ * unset, we take a cleanup lock and call PageRepairFragementation().
*/
#define XLHP_LP_TRUNCATE_ONLY (1 << 2)
/*
* Vacuum's first pass and on-access pruning may need to include a snapshot
- * conflict horizon.
+ * conflict horizon. The snapshot conflict horizon is needed regardless of
+ * whether or not a full-page image was emitted, so the
+ * snapshot_conflict_horizon is located in the "main chunk" of the WAL record,
+ * available at replay with XLogRecGetData(), while all of the sub-records are
+ * located in the block reference data, available at replay with
+ * XLogRecGetBlockData().
*/
#define XLHP_HAS_CONFLICT_HORIZON (1 << 3)
+
+/*
+ * Indicates that an xlhp_freeze sub-record and one or more xl_heap_freeze_plan
+ * sub-records are present. If XLHP_HAS_FREEZE_PLANS is set and no other page
+ * modifications will be made, an ordinary exclusive lock on the buffer is
+ * sufficient to replay the record.
+ */
#define XLHP_HAS_FREEZE_PLANS (1 << 4)
+
+/*
+ * XLPH_HAS_REDIRECTIONS, XLHP_HAS_DEAD_ITEMS, and XLHP_HAS_NOW_UNUSED indicate
+ * that xlhp_prune_items sub-records with redirected, dead, and unused item
+ * offsets are present in the record. If XLHP_HAS_REDIRECTIONS or
+ * XLHP_HAS_DEAD_ITEMS is set or if XLHP_HAS_NOW_UNUSED items is set and
+ * XLHP_LP_TRUNCATE_ONLY is not set, a full cleanup lock on the buffer is
+ * needed to replay the record. Otherwise, an ordinary exclusive lock is
+ * sufficient.
+ */
#define XLHP_HAS_REDIRECTIONS (1 << 5)
#define XLHP_HAS_DEAD_ITEMS (1 << 6)
#define XLHP_HAS_NOW_UNUSED_ITEMS (1 << 7)
@@ -299,15 +331,16 @@ typedef struct xl_heap_freeze_plan
} xl_heap_freeze_plan;
/*
- * As of Postgres 17, XLOG_HEAP2_PRUNE records replace
- * XLOG_HEAP2_FREEZE_PAGE records.
+ * As of Postgres 17, XLOG_HEAP2_PRUNE records replace XLOG_HEAP2_FREEZE_PAGE
+ * records.
*
* This is what we need to know about a block being frozen during vacuum
*
- * Backup block 0's data contains an array of xl_heap_freeze_plan structs
- * (with nplans elements), followed by one or more page offset number arrays.
- * Each such page offset number array corresponds to a single freeze plan
- * (REDO routine freezes corresponding heap tuples using freeze plan).
+ * The backup block's data contains an array of xl_heap_freeze_plan structs
+ * (with nplans elements). The individual item offsets are located in an array
+ * at the end of the entire record with with nplans * (each plan's ntuples)
+ * members. Those offsets are in the same order as the plans. The REDO routine
+ * uses the offsets to freeze the corresponding heap tuples.
*/
typedef struct xlhp_freeze
{
@@ -316,8 +349,9 @@ typedef struct xlhp_freeze
} xlhp_freeze;
/*
- * Sub-record type contained in block reference 0 of a prune record if
- * XLHP_HAS_REDIRECTIONS/XLHP_HAS_DEAD_ITEMS/XLHP_HAS_NOW_UNUSED_ITEMS is set.
+ * Generic sub-record type contained in block reference 0 of an xl_heap_prune
+ * record and used for redirect, dead, and unused items if any of
+ * XLHP_HAS_REDIRECTIONS/XLHP_HAS_DEAD_ITEMS/XLHP_HAS_NOW_UNUSED_ITEMS are set.
* Note that in the XLHP_HAS_REDIRECTIONS variant, there are actually 2 *
* length number of OffsetNumbers in the data.
*/
--
2.40.1
On Wed, Mar 20, 2024 at 9:15 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
I made it its own sub-record (xlhp_conflict_horizon) less to help with
alignment (though we can use all the help we can get there) and more to
keep it from getting lost. When you look at heapam_xlog.h, you can see
what a XLOG_HEAP2_PRUNE record will contain starting with the
xl_heap_prune struct and then all the sub-record types.
Ok, now that I look at this, I wonder if we're being overly cautious
about the WAL size. We probably could just always include the snapshot
field, and set it to InvalidTransactionId and waste 4 bytes when it's
not needed. For the sake of simplicity. I don't feel strongly either way
though, the flag is pretty simple too.
What about the issue of cleanup locks, which aren't needed and aren't
taken with the current heapam VACUUM record type? Will you preserve
that aspect of the existing design?
--
Peter Geoghegan
On Wed, Mar 20, 2024 at 4:04 PM Peter Geoghegan <pg@bowt.ie> wrote:
On Wed, Mar 20, 2024 at 9:15 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
I made it its own sub-record (xlhp_conflict_horizon) less to help with
alignment (though we can use all the help we can get there) and more to
keep it from getting lost. When you look at heapam_xlog.h, you can see
what a XLOG_HEAP2_PRUNE record will contain starting with the
xl_heap_prune struct and then all the sub-record types.
Ok, now that I look at this, I wonder if we're being overly cautious
about the WAL size. We probably could just always include the snapshot
field, and set it to InvalidTransactionId and waste 4 bytes when it's
not needed. For the sake of simplicity. I don't feel strongly either way
though, the flag is pretty simple too.
What about the issue of cleanup locks, which aren't needed and aren't
taken with the current heapam VACUUM record type? Will you preserve
that aspect of the existing design?
Yep, we have a flag to indicate whether or not a cleanup lock is needed.
- Melanie
On Wed, Mar 20, 2024 at 4:06 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
What about the issue of cleanup locks, which aren't needed and aren't
taken with the current heapam VACUUM record type? Will you preserve
that aspect of the existing design?
Yep, we have a flag to indicate whether or not a cleanup lock is needed.
Thanks for confirming.
I realize that this is fairly obvious, but thought it better to ask.
--
Peter Geoghegan
On Wed, Mar 20, 2024 at 03:15:49PM +0200, Heikki Linnakangas wrote:
0009-: The rest of the v4 patches, rebased over the WAL format changes. I
also added a few small commits for little cleanups that caught my eye, let
me know if you disagree with those.
This review is just of the patches containing changes you made in
0009-0026.
From d36138b5bf0a93557273b5e47f8cd5ea089057c7 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed, 20 Mar 2024 11:47:42 +0200
Subject: [PATCH v5 13/26] still use a local 'cutoffs' variable
Given how often 'cutoffs' is used in the function, I think it still
makes sense to have a local variable for it, just to keep the source
lines shorter.
Works for me.
From 913617ed98cfddd678a6f620db7dee68d1d61c89 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed, 20 Mar 2024 10:51:13 +0200
Subject: [PATCH v5 21/26] move whole_page_freezable to tighter scope
Works for me.
From e2b50f9b64f7e4255f4f764e2a348e1b446573dc Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed, 20 Mar 2024 11:43:31 +0200
Subject: [PATCH v5 22/26] make 'all_visible_except_removable' local
The caller doesn't need it, so it doesn't belong in PruneFreezeResult
Makes sense to me.
From e993e0d98cd0ef1ecbd229f6ddbe23d59a427e3a Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed, 20 Mar 2024 11:40:34 +0200
Subject: [PATCH v5 26/26] reorder PruneFreezeResult fields
---
src/include/access/heapam.h | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h index ee0eca8ae02..b2015f5a1ac 100644 --- a/src/include/access/heapam.h +++ b/src/include/access/heapam.h @@ -202,14 +202,17 @@ typedef struct PruneFreezeResult int recently_dead_tuples; int ndeleted; /* Number of tuples deleted from the page */ int nnewlpdead; /* Number of newly LP_DEAD items */ + int nfrozen;
Let's add a comment after int nfrozen like
/* Number of newly frozen tuples */
+
bool all_visible; /* Whether or not the page is all visible */
bool hastup; /* Does page make rel truncation unsafe */
+ /* The following fields are only set if freezing */
So, all_frozen will be set correctly if we are even considering freezing
(if pagefrz is passed). all_frozen will be true even if we didn't freeze
anything if the page is all-frozen and can be set as such in the VM. And
it will be false if the page is not all-frozen. So, maybe we say
"considering freezing".
But, I'm glad you thought to call out which of these fields will make
sense to the caller.
Also, maybe we should just name the members to which this applies. It is
a bit confusing that I can't tell if the comment also refers to the
other members following it (lpdead_items and deadoffsets) -- which it
doesn't.
+
/* Whether or not the page can be set all frozen in the VM */
bool all_frozen;
/* Number of newly frozen tuples */
- int nfrozen;
TransactionId frz_conflict_horizon; /* Newest xmin on the page */
/* New value of relfrozenxid found by heap_page_prune_and_freeze() */
--
2.39.2
- Melanie
On 20/03/2024 23:03, Melanie Plageman wrote:
On Wed, Mar 20, 2024 at 03:15:49PM +0200, Heikki Linnakangas wrote:
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index ee0eca8ae02..b2015f5a1ac 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -202,14 +202,17 @@ typedef struct PruneFreezeResult
 int recently_dead_tuples;
 int ndeleted; /* Number of tuples deleted from the page */
 int nnewlpdead; /* Number of newly LP_DEAD items */
+ int nfrozen;
Let's add a comment after int nfrozen like
/* Number of newly frozen tuples */
+
bool all_visible; /* Whether or not the page is all visible */
bool hastup; /* Does page make rel truncation unsafe */
+ /* The following fields are only set if freezing */
So, all_frozen will be set correctly if we are even considering freezing
(if pagefrz is passed). all_frozen will be true even if we didn't freeze
anything if the page is all-frozen and can be set as such in the VM. And
it will be false if the page is not all-frozen. So, maybe we say
"considering freezing".But, I'm glad you thought to call out which of these fields will make
sense to the caller.Also, maybe we should just name the members to which this applies. It is
a bit confusing that I can't tell if the comment also refers to the
other members following it (lpdead_items and deadoffsets) -- which it
doesn't.
Right, sorry, I spotted the general issue that it's not clear which
fields are valid when. I added that comment to remind about that, but I
then forgot about it.
In heap_page_prune_and_freeze(), we now do some extra work on each live
tuple, to set the all_visible_except_removable correctly. And also to
update live_tuples, recently_dead_tuples and hastup. When we're not
freezing, that's a waste of cycles the caller doesn't care about. I hope it's
cheap enough that it doesn't matter, but is it?
The first commit (after the WAL format changes) changes the all-visible
check to use GlobalVisTestIsRemovableXid. The commit message says that
it's because we don't have 'cutoffs' available, but we only care about
that when we're freezing, and when we're freezing, we actually do have
'cutoffs' in HeapPageFreeze. Using GlobalVisTestIsRemovableXid seems
sensible anyway, because that's what we use in
heap_prune_satisfies_vacuum() too, but just wanted to point that out.
The 'frz_conflict_horizon' stuff is still fuzzy to me. (Not necessarily
these patches' fault). This at least is wrong, because Max(a, b)
doesn't handle XID wraparound correctly:
if (do_freeze)
conflict_xid = Max(prstate.snapshotConflictHorizon,
presult->frz_conflict_horizon);
else
conflict_xid = prstate.snapshotConflictHorizon;
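A wraparound-safe version would compare with TransactionIdFollows() rather
than Max(). Roughly like this sketch, reusing the variable names from the
snippet above:

    if (do_freeze)
    {
        conflict_xid = prstate.snapshotConflictHorizon;

        /* take the newer of the two horizons in xid-wraparound order */
        if (TransactionIdIsValid(presult->frz_conflict_horizon) &&
            (!TransactionIdIsValid(conflict_xid) ||
             TransactionIdFollows(presult->frz_conflict_horizon, conflict_xid)))
            conflict_xid = presult->frz_conflict_horizon;
    }
    else
        conflict_xid = prstate.snapshotConflictHorizon;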
Then there's this in lazy_scan_prune():
/* Using same cutoff when setting VM is now unnecessary */
if (presult.all_frozen)
presult.frz_conflict_horizon = InvalidTransactionId;
This does the right thing in the end, but if all the tuples are frozen
shouldn't frz_conflict_horizon already be InvalidTransactionId? The
comment says it's "newest xmin on the page", and if everything was
frozen, all xmins are FrozenTransactionId. In other words, that should
be moved to heap_page_prune_and_freeze() so that it doesn't lie to its
caller. Also, frz_conflict_horizon is only set correctly if
'all_frozen==true', would be good to mention that in the comments too.
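Concretely, something along these lines (a sketch) at the end of
heap_page_prune_and_freeze() would keep the field honest for every caller:

	/*
	 * If every tuple left on the page is frozen, there is no "newest
	 * xmin" to report, so don't hand the caller a stale horizon.
	 */
	if (presult->all_frozen)
		presult->frz_conflict_horizon = InvalidTransactionId;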
--
Heikki Linnakangas
Neon (https://neon.tech)
On 20/03/2024 21:17, Melanie Plageman wrote:
Attached patch changes-for-0001.patch has a bunch of updated comments --
especially for heapam_xlog.h (including my promised diagram). And a few
suggestions (mostly changes that I should have made before).
New version of these WAL format changes attached. Squashed to one patch now.
+ // TODO: should we avoid this if we only froze? heap_xlog_freeze() doesn't
+ // do it
Ah yes, that makes sense, did that.
In the final commit message, I think it is worth calling out that all of
those freeze logging functions heap_log_freeze_eq/cmp/etc are lifted and
shifted from one file to another. When I am reviewing a big diff, it is
always helpful to know where I need to review line-by-line.
Done.
From 5d6fc2ffbdd839e0b69242af16446a46cf6a2dc7 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed, 20 Mar 2024 13:49:59 +0200
Subject: [PATCH v5 04/26] 'nplans' is a pointer
I'm surprised the compiler didn't warn about this
Oops. While looking at this, I noticed that the asserts I added (that
nredirected > 0, ndead > 0, and nunused > 0) have the same problem.
Good catch, fixed.
- remove xlhp_conflict_horizon and store a TransactionId directly. A
separate struct would make sense if we needed to store anything else
there, but for now it just seems like more code.
Makes sense. I just want to make sure heapam_xlog.h makes it extra clear
that this is happening. I see your comment addition. If you combine it
with my comment additions in the attached patch for 0001, hopefully that
makes it clear enough.
Thanks. I spent more time on the comments throughout the patch. And one
notable code change: I replaced the XLHP_LP_TRUNCATE_ONLY flag with
XLHP_CLEANUP_LOCK. XLHP_CLEANUP_LOCK directly indicates if you need a
cleanup lock to replay the record. It must always be set when
XLHP_HAS_REDIRECTIONS or XLHP_HAS_DEAD_ITEMS is set, because replaying
those always needs a cleanup lock. That felt easier to document and
understand than XLHP_LP_TRUNCATE_ONLY.
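That invariant shows up directly in the redo code as an assertion, roughly
as it appears later in the attached patch:

	/* redirects and dead items can only be replayed under a cleanup lock */
	Assert((xlrec->flags & XLHP_CLEANUP_LOCK) != 0 ||
		   (xlrec->flags & (XLHP_HAS_REDIRECTIONS | XLHP_HAS_DEAD_ITEMS)) == 0);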
From 8af186ee9dd8c7dc20f37a69b34cab7b95faa43b Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed, 20 Mar 2024 14:03:06 +0200
Subject: [PATCH v5 07/26] Add comment to log_heap_prune_and_freeze().
XXX: This should be rewritten, but I tried to at least list some
important points.
Are you thinking that it needs to mention more things or that the things
it mentions need more detail?
I previously just quickly jotted down things that seemed worth
mentioning in the comment. It was not so bad actually, but I reworded it
a little.
From b26e36ba8614d907a6e15810ed4f684f8f628dd2 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed, 20 Mar 2024 14:53:31 +0200
Subject: [PATCH v5 08/26] minor refactoring in log_heap_prune_and_freeze()
Mostly to make local variables more tightly-scoped.
So, I don't think you can move those sub-records into the tighter scope.
If you run tests with this applied, you'll see it crashes and a lot of
the data in the record is wrong. If you move the sub-record declarations
out to a wider scope, the tests pass.
The WAL record data isn't actually copied into the buffer until
recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_PRUNE_FREEZE);
after registering everything.
So all of those sub-records you made are out of scope by the time it tries
to copy from them.
I spent the better part of a day last week trying to figure out what was
happening after I did the exact same thing. I must say that I found the
xloginsert API incredibly unhelpful on this point.
Oops. I had that in mind and that was actually why I moved the
XLogRegisterData() call to the end of the function, because I found it
confusing to register the struct before filling it in completely, even
though it works perfectly fine. But then I missed it anyway when I moved
the local variables. I added a brief comment on that.
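For anyone else who trips over this: the pointers passed to
XLogRegisterBufData() are only dereferenced when XLogInsert() runs, so the
backing storage must stay in scope until then. A stripped-down sketch of
the safe pattern (simplified from log_heap_prune_and_freeze()):

	xlhp_prune_items dead_items;	/* must outlive XLogInsert() */

	XLogBeginInsert();
	XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);

	dead_items.ntargets = ndead;
	XLogRegisterBufData(0, (char *) &dead_items,
						offsetof(xlhp_prune_items, data));
	XLogRegisterBufData(0, (char *) dead, sizeof(OffsetNumber) * ndead);

	/* ... register any other sub-records the same way ... */

	/* the registered data is copied into the record only here */
	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_PRUNE_FREEZE);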
I would like to review the rest of the suggested changes in this patch
after we fix the issue I just mentioned.
Thanks, review is appreciated. I feel this is ready now, so barring any
big new issues, I plan to commit this early next week.
There is another patch in the commitfest that touches this area:
"Recording whether Heap2/PRUNE records are from VACUUM or from
opportunistic pruning" [1]/messages/by-id/CAH2-Wzmsevhox+HJpFmQgCxWWDgNzP0R9F+VBnpOK6TgxMPrRw@mail.gmail.com. That actually goes in the opposite direction
than this patch. That patch wants to add more information, to show
whether a record was emitted by VACUUM or on-access pruning, while this
patch makes the freezing and VACUUM's 2nd phase records also look the
same. We could easily add more flags to xl_heap_prune to distinguish
them. Or assign different xl_info code for them, like that other patch
proposed. But I don't think that needs to block this patch, that can be
added as a separate patch.
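If we did want to distinguish them later, it could be as small as one more
flag bit (names here are purely hypothetical, not part of this patch):

	#define XLHP_IS_ON_ACCESS	(1 << 0)	/* bit 0 is currently unused */

	...
	if (!by_vacuum)
		xlrec.flags |= XLHP_IS_ON_ACCESS;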
[1]: /messages/by-id/CAH2-Wzmsevhox+HJpFmQgCxWWDgNzP0R9F+VBnpOK6TgxMPrRw@mail.gmail.com
--
Heikki Linnakangas
Neon (https://neon.tech)
Attachments:
v6-0001-Merge-prune-freeze-and-vacuum-WAL-record-formats.patch (text/x-patch)
From 042185d3de14dcb7088bbe50e9c64e365ac42c2a Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Fri, 22 Mar 2024 23:10:22 +0200
Subject: [PATCH v6] Merge prune, freeze and vacuum WAL record formats
The new combined WAL record is now used for pruning, freezing and 2nd
pass of vacuum. This is in preparation of changing vacuuming to write
a combined prune+freeze record per page, instead of separate two
records. The new WAL record format now supports that, but the code
still always writes separate records for pruning and freezing.
The function to emit the new WAL record, log_heap_prune_and_freeze(),
is in pruneheap.c. The existing heap_log_freeze_plan() and its
subroutines are moved to pruneheap.c without changes, to keep them
together with log_heap_prune_and_freeze().
Author: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAAKRu_azf-zH%3DDgVbquZ3tFWjMY1w5pO8m-TXJaMdri8z3933g@mail.gmail.com
---
src/backend/access/gist/gistxlog.c | 8 +-
src/backend/access/hash/hash_xlog.c | 8 +-
src/backend/access/heap/heapam.c | 464 +++++------------------
src/backend/access/heap/pruneheap.c | 384 ++++++++++++++++---
src/backend/access/heap/vacuumlazy.c | 21 +-
src/backend/access/rmgrdesc/heapdesc.c | 226 +++++++----
src/backend/replication/logical/decode.c | 4 +-
src/include/access/heapam.h | 9 +-
src/include/access/heapam_xlog.h | 214 +++++++----
src/include/access/xlog_internal.h | 3 +-
src/tools/pgindent/typedefs.list | 6 +-
11 files changed, 748 insertions(+), 599 deletions(-)
diff --git a/src/backend/access/gist/gistxlog.c b/src/backend/access/gist/gistxlog.c
index fafd9f1c94f..588cade585b 100644
--- a/src/backend/access/gist/gistxlog.c
+++ b/src/backend/access/gist/gistxlog.c
@@ -183,10 +183,10 @@ gistRedoDeleteRecord(XLogReaderState *record)
*
* GiST delete records can conflict with standby queries. You might think
* that vacuum records would conflict as well, but we've handled that
- * already. XLOG_HEAP2_PRUNE records provide the highest xid cleaned by
- * the vacuum of the heap and so we can resolve any conflicts just once
- * when that arrives. After that we know that no conflicts exist from
- * individual gist vacuum records on that index.
+ * already. XLOG_HEAP2_PRUNE_FREEZE records provide the highest xid
+ * cleaned by the vacuum of the heap and so we can resolve any conflicts
+ * just once when that arrives. After that we know that no conflicts
+ * exist from individual gist vacuum records on that index.
*/
if (InHotStandby)
{
diff --git a/src/backend/access/hash/hash_xlog.c b/src/backend/access/hash/hash_xlog.c
index 4e05a1b4632..883915fd1da 100644
--- a/src/backend/access/hash/hash_xlog.c
+++ b/src/backend/access/hash/hash_xlog.c
@@ -992,10 +992,10 @@ hash_xlog_vacuum_one_page(XLogReaderState *record)
* Hash index records that are marked as LP_DEAD and being removed during
* hash index tuple insertion can conflict with standby queries. You might
* think that vacuum records would conflict as well, but we've handled
- * that already. XLOG_HEAP2_PRUNE records provide the highest xid cleaned
- * by the vacuum of the heap and so we can resolve any conflicts just once
- * when that arrives. After that we know that no conflicts exist from
- * individual hash index vacuum records on that index.
+ * that already. XLOG_HEAP2_PRUNE_FREEZE records provide the highest xid
+ * cleaned by the vacuum of the heap and so we can resolve any conflicts
+ * just once when that arrives. After that we know that no conflicts
+ * exist from individual hash index vacuum records on that index.
*/
if (InHotStandby)
{
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 34bc60f625f..a09ef75ac37 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -91,9 +91,6 @@ static void compute_new_xmax_infomask(TransactionId xmax, uint16 old_infomask,
static TM_Result heap_lock_updated_tuple(Relation rel, HeapTuple tuple,
ItemPointer ctid, TransactionId xid,
LockTupleMode mode);
-static int heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
- xl_heap_freeze_plan *plans_out,
- OffsetNumber *offsets_out);
static void GetMultiXactIdHintBits(MultiXactId multi, uint16 *new_infomask,
uint16 *new_infomask2);
static TransactionId MultiXactIdGetUpdateXid(TransactionId xmax,
@@ -6746,179 +6743,16 @@ heap_freeze_execute_prepared(Relation rel, Buffer buffer,
/* Now WAL-log freezing if necessary */
if (RelationNeedsWAL(rel))
{
- xl_heap_freeze_plan plans[MaxHeapTuplesPerPage];
- OffsetNumber offsets[MaxHeapTuplesPerPage];
- int nplans;
- xl_heap_freeze_page xlrec;
- XLogRecPtr recptr;
-
- /* Prepare deduplicated representation for use in WAL record */
- nplans = heap_log_freeze_plan(tuples, ntuples, plans, offsets);
-
- xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
- xlrec.isCatalogRel = RelationIsAccessibleInLogicalDecoding(rel);
- xlrec.nplans = nplans;
-
- XLogBeginInsert();
- XLogRegisterData((char *) &xlrec, SizeOfHeapFreezePage);
-
- /*
- * The freeze plan array and offset array are not actually in the
- * buffer, but pretend that they are. When XLogInsert stores the
- * whole buffer, the arrays need not be stored too.
- */
- XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
- XLogRegisterBufData(0, (char *) plans,
- nplans * sizeof(xl_heap_freeze_plan));
- XLogRegisterBufData(0, (char *) offsets,
- ntuples * sizeof(OffsetNumber));
-
- recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_FREEZE_PAGE);
-
- PageSetLSN(page, recptr);
+ log_heap_prune_and_freeze(rel, buffer, snapshotConflictHorizon, false,
+ tuples, ntuples,
+ NULL, 0, /* redirected */
+ NULL, 0, /* dead */
+ NULL, 0); /* unused */
}
END_CRIT_SECTION();
}
-/*
- * Comparator used to deduplicate XLOG_HEAP2_FREEZE_PAGE freeze plans
- */
-static int
-heap_log_freeze_cmp(const void *arg1, const void *arg2)
-{
- HeapTupleFreeze *frz1 = (HeapTupleFreeze *) arg1;
- HeapTupleFreeze *frz2 = (HeapTupleFreeze *) arg2;
-
- if (frz1->xmax < frz2->xmax)
- return -1;
- else if (frz1->xmax > frz2->xmax)
- return 1;
-
- if (frz1->t_infomask2 < frz2->t_infomask2)
- return -1;
- else if (frz1->t_infomask2 > frz2->t_infomask2)
- return 1;
-
- if (frz1->t_infomask < frz2->t_infomask)
- return -1;
- else if (frz1->t_infomask > frz2->t_infomask)
- return 1;
-
- if (frz1->frzflags < frz2->frzflags)
- return -1;
- else if (frz1->frzflags > frz2->frzflags)
- return 1;
-
- /*
- * heap_log_freeze_eq would consider these tuple-wise plans to be equal.
- * (So the tuples will share a single canonical freeze plan.)
- *
- * We tiebreak on page offset number to keep each freeze plan's page
- * offset number array individually sorted. (Unnecessary, but be tidy.)
- */
- if (frz1->offset < frz2->offset)
- return -1;
- else if (frz1->offset > frz2->offset)
- return 1;
-
- Assert(false);
- return 0;
-}
-
-/*
- * Compare fields that describe actions required to freeze tuple with caller's
- * open plan. If everything matches then the frz tuple plan is equivalent to
- * caller's plan.
- */
-static inline bool
-heap_log_freeze_eq(xl_heap_freeze_plan *plan, HeapTupleFreeze *frz)
-{
- if (plan->xmax == frz->xmax &&
- plan->t_infomask2 == frz->t_infomask2 &&
- plan->t_infomask == frz->t_infomask &&
- plan->frzflags == frz->frzflags)
- return true;
-
- /* Caller must call heap_log_freeze_new_plan again for frz */
- return false;
-}
-
-/*
- * Start new plan initialized using tuple-level actions. At least one tuple
- * will have steps required to freeze described by caller's plan during REDO.
- */
-static inline void
-heap_log_freeze_new_plan(xl_heap_freeze_plan *plan, HeapTupleFreeze *frz)
-{
- plan->xmax = frz->xmax;
- plan->t_infomask2 = frz->t_infomask2;
- plan->t_infomask = frz->t_infomask;
- plan->frzflags = frz->frzflags;
- plan->ntuples = 1; /* for now */
-}
-
-/*
- * Deduplicate tuple-based freeze plans so that each distinct set of
- * processing steps is only stored once in XLOG_HEAP2_FREEZE_PAGE records.
- * Called during original execution of freezing (for logged relations).
- *
- * Return value is number of plans set in *plans_out for caller. Also writes
- * an array of offset numbers into *offsets_out output argument for caller
- * (actually there is one array per freeze plan, but that's not of immediate
- * concern to our caller).
- */
-static int
-heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
- xl_heap_freeze_plan *plans_out,
- OffsetNumber *offsets_out)
-{
- int nplans = 0;
-
- /* Sort tuple-based freeze plans in the order required to deduplicate */
- qsort(tuples, ntuples, sizeof(HeapTupleFreeze), heap_log_freeze_cmp);
-
- for (int i = 0; i < ntuples; i++)
- {
- HeapTupleFreeze *frz = tuples + i;
-
- if (i == 0)
- {
- /* New canonical freeze plan starting with first tup */
- heap_log_freeze_new_plan(plans_out, frz);
- nplans++;
- }
- else if (heap_log_freeze_eq(plans_out, frz))
- {
- /* tup matches open canonical plan -- include tup in it */
- Assert(offsets_out[i - 1] < frz->offset);
- plans_out->ntuples++;
- }
- else
- {
- /* Tup doesn't match current plan -- done with it now */
- plans_out++;
-
- /* New canonical freeze plan starting with this tup */
- heap_log_freeze_new_plan(plans_out, frz);
- nplans++;
- }
-
- /*
- * Save page offset number in dedicated buffer in passing.
- *
- * REDO routine relies on the record's offset numbers array grouping
- * offset numbers by freeze plan. The sort order within each grouping
- * is ascending offset number order, just to keep things tidy.
- */
- offsets_out[i] = frz->offset;
- }
-
- Assert(nplans > 0 && nplans <= ntuples);
-
- return nplans;
-}
-
/*
* heap_freeze_tuple
* Freeze tuple in place, without WAL logging.
@@ -7892,10 +7726,10 @@ heap_index_delete_tuples(Relation rel, TM_IndexDeleteOp *delstate)
* must have considered the original tuple header as part of
* generating its own snapshotConflictHorizon value.
*
- * Relying on XLOG_HEAP2_PRUNE records like this is the same
- * strategy that index vacuuming uses in all cases. Index VACUUM
- * WAL records don't even have a snapshotConflictHorizon field of
- * their own for this reason.
+ * Relying on XLOG_HEAP2_PRUNE_FREEZE records like this is the
+ * same strategy that index vacuuming uses in all cases. Index
+ * VACUUM WAL records don't even have a snapshotConflictHorizon
+ * field of their own for this reason.
*/
if (!ItemIdIsNormal(lp))
break;
@@ -8753,162 +8587,146 @@ ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_required,
}
/*
- * Handles XLOG_HEAP2_PRUNE record type.
- *
- * Acquires a full cleanup lock.
+ * Replay XLOG_HEAP2_PRUNE_FREEZE record.
*/
static void
-heap_xlog_prune(XLogReaderState *record)
+heap_xlog_prune_freeze(XLogReaderState *record)
{
XLogRecPtr lsn = record->EndRecPtr;
- xl_heap_prune *xlrec = (xl_heap_prune *) XLogRecGetData(record);
+ char *ptr;
+ xl_heap_prune *xlrec;
Buffer buffer;
RelFileLocator rlocator;
BlockNumber blkno;
XLogRedoAction action;
XLogRecGetBlockTag(record, 0, &rlocator, NULL, &blkno);
+ ptr = XLogRecGetData(record);
+ xlrec = (xl_heap_prune *) ptr;
+ ptr += SizeOfHeapPrune;
/*
- * We're about to remove tuples. In Hot Standby mode, ensure that there's
- * no queries running for which the removed tuples are still visible.
+ * We will take an ordinary exclusive lock or a cleanup lock depending on
+ * whether the XLHP_CLEANUP_LOCK flag is set. With an ordinary exclusive
+ * lock, we better not be doing anything that requires moving existing
+ * tuple data.
*/
- if (InHotStandby)
- ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
- xlrec->isCatalogRel,
+ Assert((xlrec->flags & XLHP_CLEANUP_LOCK) != 0 ||
+ (xlrec->flags & (XLHP_HAS_REDIRECTIONS | XLHP_HAS_DEAD_ITEMS)) == 0);
+
+ /*
+ * We are about to remove and/or freeze tuples. In Hot Standby mode,
+ * ensure that there are no queries running for which the removed tuples
+ * are still visible or which still consider the frozen xids as running.
+ * The conflict horizon XID comes after xl_heap_prune.
+ */
+ if (InHotStandby && (xlrec->flags & XLHP_HAS_CONFLICT_HORIZON) != 0)
+ {
+ TransactionId snapshot_conflict_horizon;
+
+ memcpy(&snapshot_conflict_horizon, ptr, sizeof(TransactionId));
+ ResolveRecoveryConflictWithSnapshot(snapshot_conflict_horizon,
+ (xlrec->flags & XLHP_IS_CATALOG_REL) != 0,
rlocator);
+ }
/*
- * If we have a full-page image, restore it (using a cleanup lock) and
- * we're done.
+ * If we have a full-page image, restore it and we're done.
*/
- action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL, true,
+ action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
+ (xlrec->flags & XLHP_CLEANUP_LOCK) != 0,
&buffer);
if (action == BLK_NEEDS_REDO)
{
Page page = (Page) BufferGetPage(buffer);
- OffsetNumber *end;
OffsetNumber *redirected;
OffsetNumber *nowdead;
OffsetNumber *nowunused;
int nredirected;
int ndead;
int nunused;
+ int nplans;
Size datalen;
+ xlhp_freeze_plan *plans;
+ OffsetNumber *frz_offsets;
+ char *dataptr = XLogRecGetBlockData(record, 0, &datalen);
- redirected = (OffsetNumber *) XLogRecGetBlockData(record, 0, &datalen);
-
- nredirected = xlrec->nredirected;
- ndead = xlrec->ndead;
- end = (OffsetNumber *) ((char *) redirected + datalen);
- nowdead = redirected + (nredirected * 2);
- nowunused = nowdead + ndead;
- nunused = (end - nowunused);
- Assert(nunused >= 0);
-
- /* Update all line pointers per the record, and repair fragmentation */
- heap_page_prune_execute(buffer,
- redirected, nredirected,
- nowdead, ndead,
- nowunused, nunused);
+ heap_xlog_deserialize_prune_and_freeze(dataptr, xlrec->flags,
+ &nredirected, &redirected,
+ &ndead, &nowdead,
+ &nunused, &nowunused,
+ &nplans, &plans, &frz_offsets);
/*
- * Note: we don't worry about updating the page's prunability hints.
- * At worst this will cause an extra prune cycle to occur soon.
+ * Update all line pointers per the record, and repair fragmentation
+ * if needed.
*/
+ if (nredirected > 0 || ndead > 0 || nunused > 0)
+ heap_page_prune_execute(buffer,
+ (xlrec->flags & XLHP_CLEANUP_LOCK) == 0,
+ redirected, nredirected,
+ nowdead, ndead,
+ nowunused, nunused);
+
+ /* Freeze tuples */
+ for (int p = 0; p < nplans; p++)
+ {
+ HeapTupleFreeze frz;
- PageSetLSN(page, lsn);
- MarkBufferDirty(buffer);
- }
+ /*
+ * Convert freeze plan representation from WAL record into
+ * per-tuple format used by heap_execute_freeze_tuple
+ */
+ frz.xmax = plans[p].xmax;
+ frz.t_infomask2 = plans[p].t_infomask2;
+ frz.t_infomask = plans[p].t_infomask;
+ frz.frzflags = plans[p].frzflags;
+ frz.offset = InvalidOffsetNumber; /* unused, but be tidy */
- if (BufferIsValid(buffer))
- {
- Size freespace = PageGetHeapFreeSpace(BufferGetPage(buffer));
+ for (int i = 0; i < plans[p].ntuples; i++)
+ {
+ OffsetNumber offset = *(frz_offsets++);
+ ItemId lp;
+ HeapTupleHeader tuple;
- UnlockReleaseBuffer(buffer);
+ lp = PageGetItemId(page, offset);
+ tuple = (HeapTupleHeader) PageGetItem(page, lp);
+ heap_execute_freeze_tuple(tuple, &frz);
+ }
+ }
+
+ /* There should be no more data */
+ Assert((char *) frz_offsets == dataptr + datalen);
/*
- * After pruning records from a page, it's useful to update the FSM
- * about it, as it may cause the page become target for insertions
- * later even if vacuum decides not to visit it (which is possible if
- * gets marked all-visible.)
- *
- * Do this regardless of a full-page image being applied, since the
- * FSM data is not in the page anyway.
+ * Note: we don't worry about updating the page's prunability hints.
+ * At worst this will cause an extra prune cycle to occur soon.
*/
- XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
- }
-}
-
-/*
- * Handles XLOG_HEAP2_VACUUM record type.
- *
- * Acquires an ordinary exclusive lock only.
- */
-static void
-heap_xlog_vacuum(XLogReaderState *record)
-{
- XLogRecPtr lsn = record->EndRecPtr;
- xl_heap_vacuum *xlrec = (xl_heap_vacuum *) XLogRecGetData(record);
- Buffer buffer;
- BlockNumber blkno;
- XLogRedoAction action;
-
- /*
- * If we have a full-page image, restore it (without using a cleanup lock)
- * and we're done.
- */
- action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL, false,
- &buffer);
- if (action == BLK_NEEDS_REDO)
- {
- Page page = (Page) BufferGetPage(buffer);
- OffsetNumber *nowunused;
- Size datalen;
- OffsetNumber *offnum;
-
- nowunused = (OffsetNumber *) XLogRecGetBlockData(record, 0, &datalen);
-
- /* Shouldn't be a record unless there's something to do */
- Assert(xlrec->nunused > 0);
-
- /* Update all now-unused line pointers */
- offnum = nowunused;
- for (int i = 0; i < xlrec->nunused; i++)
- {
- OffsetNumber off = *offnum++;
- ItemId lp = PageGetItemId(page, off);
-
- Assert(ItemIdIsDead(lp) && !ItemIdHasStorage(lp));
- ItemIdSetUnused(lp);
- }
-
- /* Attempt to truncate line pointer array now */
- PageTruncateLinePointerArray(page);
PageSetLSN(page, lsn);
MarkBufferDirty(buffer);
}
+ /*
+ * If we released any space or line pointers, update the free space map.
+ *
+ * Do this regardless of a full-page image being applied, since the FSM
+ * data is not in the page anyway.
+ */
if (BufferIsValid(buffer))
{
- Size freespace = PageGetHeapFreeSpace(BufferGetPage(buffer));
- RelFileLocator rlocator;
-
- XLogRecGetBlockTag(record, 0, &rlocator, NULL, &blkno);
+ if (xlrec->flags & (XLHP_HAS_REDIRECTIONS |
+ XLHP_HAS_DEAD_ITEMS |
+ XLHP_HAS_NOW_UNUSED_ITEMS))
+ {
+ Size freespace = PageGetHeapFreeSpace(BufferGetPage(buffer));
- UnlockReleaseBuffer(buffer);
+ UnlockReleaseBuffer(buffer);
- /*
- * After vacuuming LP_DEAD items from a page, it's useful to update
- * the FSM about it, as it may cause the page become target for
- * insertions later even if vacuum decides not to visit it (which is
- * possible if gets marked all-visible.)
- *
- * Do this regardless of a full-page image being applied, since the
- * FSM data is not in the page anyway.
- */
- XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
+ XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
+ }
+ else
+ UnlockReleaseBuffer(buffer);
}
}
@@ -9049,74 +8867,6 @@ heap_xlog_visible(XLogReaderState *record)
UnlockReleaseBuffer(vmbuffer);
}
-/*
- * Replay XLOG_HEAP2_FREEZE_PAGE records
- */
-static void
-heap_xlog_freeze_page(XLogReaderState *record)
-{
- XLogRecPtr lsn = record->EndRecPtr;
- xl_heap_freeze_page *xlrec = (xl_heap_freeze_page *) XLogRecGetData(record);
- Buffer buffer;
-
- /*
- * In Hot Standby mode, ensure that there's no queries running which still
- * consider the frozen xids as running.
- */
- if (InHotStandby)
- {
- RelFileLocator rlocator;
-
- XLogRecGetBlockTag(record, 0, &rlocator, NULL, NULL);
- ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
- xlrec->isCatalogRel,
- rlocator);
- }
-
- if (XLogReadBufferForRedo(record, 0, &buffer) == BLK_NEEDS_REDO)
- {
- Page page = BufferGetPage(buffer);
- xl_heap_freeze_plan *plans;
- OffsetNumber *offsets;
- int curoff = 0;
-
- plans = (xl_heap_freeze_plan *) XLogRecGetBlockData(record, 0, NULL);
- offsets = (OffsetNumber *) ((char *) plans +
- (xlrec->nplans *
- sizeof(xl_heap_freeze_plan)));
- for (int p = 0; p < xlrec->nplans; p++)
- {
- HeapTupleFreeze frz;
-
- /*
- * Convert freeze plan representation from WAL record into
- * per-tuple format used by heap_execute_freeze_tuple
- */
- frz.xmax = plans[p].xmax;
- frz.t_infomask2 = plans[p].t_infomask2;
- frz.t_infomask = plans[p].t_infomask;
- frz.frzflags = plans[p].frzflags;
- frz.offset = InvalidOffsetNumber; /* unused, but be tidy */
-
- for (int i = 0; i < plans[p].ntuples; i++)
- {
- OffsetNumber offset = offsets[curoff++];
- ItemId lp;
- HeapTupleHeader tuple;
-
- lp = PageGetItemId(page, offset);
- tuple = (HeapTupleHeader) PageGetItem(page, lp);
- heap_execute_freeze_tuple(tuple, &frz);
- }
- }
-
- PageSetLSN(page, lsn);
- MarkBufferDirty(buffer);
- }
- if (BufferIsValid(buffer))
- UnlockReleaseBuffer(buffer);
-}
-
/*
* Given an "infobits" field from an XLog record, set the correct bits in the
* given infomask and infomask2 for the tuple touched by the record.
@@ -10017,14 +9767,8 @@ heap2_redo(XLogReaderState *record)
switch (info & XLOG_HEAP_OPMASK)
{
- case XLOG_HEAP2_PRUNE:
- heap_xlog_prune(record);
- break;
- case XLOG_HEAP2_VACUUM:
- heap_xlog_vacuum(record);
- break;
- case XLOG_HEAP2_FREEZE_PAGE:
- heap_xlog_freeze_page(record);
+ case XLOG_HEAP2_PRUNE_FREEZE:
+ heap_xlog_prune_freeze(record);
break;
case XLOG_HEAP2_VISIBLE:
heap_xlog_visible(record);
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 69332b0d25c..4c656a21f91 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -338,7 +338,7 @@ heap_page_prune(Relation relation, Buffer buffer,
* Apply the planned item changes, then repair page fragmentation, and
* update the page's hint bit about whether it has free line pointers.
*/
- heap_page_prune_execute(buffer,
+ heap_page_prune_execute(buffer, false,
prstate.redirected, prstate.nredirected,
prstate.nowdead, prstate.ndead,
prstate.nowunused, prstate.nunused);
@@ -359,44 +359,17 @@ heap_page_prune(Relation relation, Buffer buffer,
MarkBufferDirty(buffer);
/*
- * Emit a WAL XLOG_HEAP2_PRUNE record showing what we did
+ * Emit a WAL XLOG_HEAP2_PRUNE_FREEZE record showing what we did
*/
if (RelationNeedsWAL(relation))
{
- xl_heap_prune xlrec;
- XLogRecPtr recptr;
-
- xlrec.isCatalogRel = RelationIsAccessibleInLogicalDecoding(relation);
- xlrec.snapshotConflictHorizon = prstate.snapshotConflictHorizon;
- xlrec.nredirected = prstate.nredirected;
- xlrec.ndead = prstate.ndead;
-
- XLogBeginInsert();
- XLogRegisterData((char *) &xlrec, SizeOfHeapPrune);
-
- XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
-
- /*
- * The OffsetNumber arrays are not actually in the buffer, but we
- * pretend that they are. When XLogInsert stores the whole
- * buffer, the offset arrays need not be stored too.
- */
- if (prstate.nredirected > 0)
- XLogRegisterBufData(0, (char *) prstate.redirected,
- prstate.nredirected *
- sizeof(OffsetNumber) * 2);
-
- if (prstate.ndead > 0)
- XLogRegisterBufData(0, (char *) prstate.nowdead,
- prstate.ndead * sizeof(OffsetNumber));
-
- if (prstate.nunused > 0)
- XLogRegisterBufData(0, (char *) prstate.nowunused,
- prstate.nunused * sizeof(OffsetNumber));
-
- recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_PRUNE);
-
- PageSetLSN(BufferGetPage(buffer), recptr);
+ log_heap_prune_and_freeze(relation, buffer,
+ prstate.snapshotConflictHorizon,
+ true,
+ NULL, 0,
+ prstate.redirected, prstate.nredirected,
+ prstate.nowdead, prstate.ndead,
+ prstate.nowunused, prstate.nunused);
}
}
else
@@ -827,11 +800,16 @@ heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum)
/*
* Perform the actual page changes needed by heap_page_prune.
- * It is expected that the caller has a full cleanup lock on the
- * buffer.
+ *
+ * If 'lp_truncate_only' is set, we are merely marking LP_DEAD line pointers
+ * as unused, not redirecting or removing anything else. The
+ * PageRepairFragmentation() call is skipped in that case.
+ *
+ * If 'lp_truncate_only' is not set, the caller must hold a cleanup lock on
+ * the buffer. If it is set, an ordinary exclusive lock suffices.
*/
void
-heap_page_prune_execute(Buffer buffer,
+heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
OffsetNumber *redirected, int nredirected,
OffsetNumber *nowdead, int ndead,
OffsetNumber *nowunused, int nunused)
@@ -843,6 +821,9 @@ heap_page_prune_execute(Buffer buffer,
/* Shouldn't be called unless there's something to do */
Assert(nredirected > 0 || ndead > 0 || nunused > 0);
+ /* If 'lp_truncate_only', we can only remove already-dead line pointers */
+ Assert(!lp_truncate_only || (nredirected == 0 && ndead == 0));
+
/* Update all redirected line pointers */
offnum = redirected;
for (int i = 0; i < nredirected; i++)
@@ -941,23 +922,29 @@ heap_page_prune_execute(Buffer buffer,
#ifdef USE_ASSERT_CHECKING
- /*
- * When heap_page_prune() was called, mark_unused_now may have been
- * passed as true, which allows would-be LP_DEAD items to be made
- * LP_UNUSED instead. This is only possible if the relation has no
- * indexes. If there are any dead items, then mark_unused_now was not
- * true and every item being marked LP_UNUSED must refer to a
- * heap-only tuple.
- */
- if (ndead > 0)
+ if (lp_truncate_only)
{
- Assert(ItemIdHasStorage(lp) && ItemIdIsNormal(lp));
- htup = (HeapTupleHeader) PageGetItem(page, lp);
- Assert(HeapTupleHeaderIsHeapOnly(htup));
+ /* Setting LP_DEAD to LP_UNUSED in vacuum's second pass */
+ Assert(ItemIdIsDead(lp) && !ItemIdHasStorage(lp));
}
else
{
- Assert(ItemIdIsUsed(lp));
+ /*
+ * When heap_page_prune() was called, mark_unused_now may have
+ * been passed as true, which allows would-be LP_DEAD items to be
+ * made LP_UNUSED instead. This is only possible if the relation
+ * has no indexes. If there are any dead items, then
+ * mark_unused_now was not true and every item being marked
+ * LP_UNUSED must refer to a heap-only tuple.
+ */
+ if (ndead > 0)
+ {
+ Assert(ItemIdHasStorage(lp) && ItemIdIsNormal(lp));
+ htup = (HeapTupleHeader) PageGetItem(page, lp);
+ Assert(HeapTupleHeaderIsHeapOnly(htup));
+ }
+ else
+ Assert(ItemIdIsUsed(lp));
}
#endif
@@ -965,17 +952,22 @@ heap_page_prune_execute(Buffer buffer,
ItemIdSetUnused(lp);
}
- /*
- * Finally, repair any fragmentation, and update the page's hint bit about
- * whether it has free pointers.
- */
- PageRepairFragmentation(page);
+ if (lp_truncate_only)
+ PageTruncateLinePointerArray(page);
+ else
+ {
+ /*
+ * Finally, repair any fragmentation, and update the page's hint bit
+ * about whether it has free pointers.
+ */
+ PageRepairFragmentation(page);
- /*
- * Now that the page has been modified, assert that redirect items still
- * point to valid targets.
- */
- page_verify_redirects(page);
+ /*
+ * Now that the page has been modified, assert that redirect items
+ * still point to valid targets.
+ */
+ page_verify_redirects(page);
+ }
}
@@ -1144,3 +1136,271 @@ heap_get_root_tuples(Page page, OffsetNumber *root_offsets)
}
}
}
+
+
+/*
+ * Compare fields that describe actions required to freeze tuple with caller's
+ * open plan. If everything matches then the frz tuple plan is equivalent to
+ * caller's plan.
+ */
+static inline bool
+heap_log_freeze_eq(xlhp_freeze_plan *plan, HeapTupleFreeze *frz)
+{
+ if (plan->xmax == frz->xmax &&
+ plan->t_infomask2 == frz->t_infomask2 &&
+ plan->t_infomask == frz->t_infomask &&
+ plan->frzflags == frz->frzflags)
+ return true;
+
+ /* Caller must call heap_log_freeze_new_plan again for frz */
+ return false;
+}
+
+/*
+ * Comparator used to deduplicate XLOG_HEAP2_FREEZE_PAGE freeze plans
+ */
+static int
+heap_log_freeze_cmp(const void *arg1, const void *arg2)
+{
+ HeapTupleFreeze *frz1 = (HeapTupleFreeze *) arg1;
+ HeapTupleFreeze *frz2 = (HeapTupleFreeze *) arg2;
+
+ if (frz1->xmax < frz2->xmax)
+ return -1;
+ else if (frz1->xmax > frz2->xmax)
+ return 1;
+
+ if (frz1->t_infomask2 < frz2->t_infomask2)
+ return -1;
+ else if (frz1->t_infomask2 > frz2->t_infomask2)
+ return 1;
+
+ if (frz1->t_infomask < frz2->t_infomask)
+ return -1;
+ else if (frz1->t_infomask > frz2->t_infomask)
+ return 1;
+
+ if (frz1->frzflags < frz2->frzflags)
+ return -1;
+ else if (frz1->frzflags > frz2->frzflags)
+ return 1;
+
+ /*
+ * heap_log_freeze_eq would consider these tuple-wise plans to be equal.
+ * (So the tuples will share a single canonical freeze plan.)
+ *
+ * We tiebreak on page offset number to keep each freeze plan's page
+ * offset number array individually sorted. (Unnecessary, but be tidy.)
+ */
+ if (frz1->offset < frz2->offset)
+ return -1;
+ else if (frz1->offset > frz2->offset)
+ return 1;
+
+ Assert(false);
+ return 0;
+}
+
+/*
+ * Start new plan initialized using tuple-level actions. At least one tuple
+ * will have steps required to freeze described by caller's plan during REDO.
+ */
+static inline void
+heap_log_freeze_new_plan(xlhp_freeze_plan *plan, HeapTupleFreeze *frz)
+{
+ plan->xmax = frz->xmax;
+ plan->t_infomask2 = frz->t_infomask2;
+ plan->t_infomask = frz->t_infomask;
+ plan->frzflags = frz->frzflags;
+ plan->ntuples = 1; /* for now */
+}
+
+/*
+ * Deduplicate tuple-based freeze plans so that each distinct set of
+ * processing steps is only stored once in XLOG_HEAP2_FREEZE_PAGE records.
+ * Called during original execution of freezing (for logged relations).
+ *
+ * Return value is number of plans set in *plans_out for caller. Also writes
+ * an array of offset numbers into *offsets_out output argument for caller
+ * (actually there is one array per freeze plan, but that's not of immediate
+ * concern to our caller).
+ */
+static int
+heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
+ xlhp_freeze_plan *plans_out,
+ OffsetNumber *offsets_out)
+{
+ int nplans = 0;
+
+ /* Sort tuple-based freeze plans in the order required to deduplicate */
+ qsort(tuples, ntuples, sizeof(HeapTupleFreeze), heap_log_freeze_cmp);
+
+ for (int i = 0; i < ntuples; i++)
+ {
+ HeapTupleFreeze *frz = tuples + i;
+
+ if (i == 0)
+ {
+ /* New canonical freeze plan starting with first tup */
+ heap_log_freeze_new_plan(plans_out, frz);
+ nplans++;
+ }
+ else if (heap_log_freeze_eq(plans_out, frz))
+ {
+ /* tup matches open canonical plan -- include tup in it */
+ Assert(offsets_out[i - 1] < frz->offset);
+ plans_out->ntuples++;
+ }
+ else
+ {
+ /* Tup doesn't match current plan -- done with it now */
+ plans_out++;
+
+ /* New canonical freeze plan starting with this tup */
+ heap_log_freeze_new_plan(plans_out, frz);
+ nplans++;
+ }
+
+ /*
+ * Save page offset number in dedicated buffer in passing.
+ *
+ * REDO routine relies on the record's offset numbers array grouping
+ * offset numbers by freeze plan. The sort order within each grouping
+ * is ascending offset number order, just to keep things tidy.
+ */
+ offsets_out[i] = frz->offset;
+ }
+
+ Assert(nplans > 0 && nplans <= ntuples);
+
+ return nplans;
+}
+
+/*
+ * Write an XLOG_HEAP2_PRUNE_FREEZE WAL record
+ *
+ * This is used for several different page maintenance operations:
+ *
+ * - Page pruning, in VACUUM's 1st pass or on access: Some items are
+ * redirected, some marked dead, and some removed altogether.
+ *
+ * - Freezing: Items are marked as 'frozen'.
+ *
+ * - Vacuum, 2nd pass: Items that are already LP_DEAD are marked as unused.
+ *
+ * They have enough commonalities that we use a single WAL record for them
+ * all.
+ *
+ * If replaying the record requires a cleanup lock, pass cleanup_lock = true.
+ * Replaying 'redirected' or 'dead' items always requires a cleanup lock, but
+ * replaying 'unused' items depends on whether they were all previously marked
+ * as dead.
+ *
+ * Note: This function scribbles on the 'frozen' array.
+ *
+ * Note: This is called in a critical section, so careful what you do here.
+ */
+void
+log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+ TransactionId conflict_xid,
+ bool cleanup_lock,
+ HeapTupleFreeze *frozen, int nfrozen,
+ OffsetNumber *redirected, int nredirected,
+ OffsetNumber *dead, int ndead,
+ OffsetNumber *unused, int nunused)
+{
+ xl_heap_prune xlrec;
+ XLogRecPtr recptr;
+
+ /* The following local variables hold data registered in the WAL record: */
+ xlhp_freeze_plan plans[MaxHeapTuplesPerPage];
+ xlhp_freeze_plans freeze_plans;
+ xlhp_prune_items redirect_items;
+ xlhp_prune_items dead_items;
+ xlhp_prune_items unused_items;
+ OffsetNumber frz_offsets[MaxHeapTuplesPerPage];
+
+ xlrec.flags = 0;
+
+ /*
+ * Prepare data for the buffer. The arrays are not actually in the
+ * buffer, but we pretend that they are. When XLogInsert stores a full
+ * page image, the arrays can be omitted.
+ */
+ XLogBeginInsert();
+ XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
+ if (nfrozen > 0)
+ {
+ int nplans;
+
+ xlrec.flags |= XLHP_HAS_FREEZE_PLANS;
+
+ /*
+ * Prepare deduplicated representation for use in the WAL record. This
+ * destructively sorts frozen tuples array in-place.
+ */
+ nplans = heap_log_freeze_plan(frozen, nfrozen, plans, frz_offsets);
+
+ freeze_plans.nplans = nplans;
+ XLogRegisterBufData(0, (char *) &freeze_plans,
+ offsetof(xlhp_freeze_plans, plans));
+ XLogRegisterBufData(0, (char *) plans,
+ sizeof(xlhp_freeze_plan) * nplans);
+ }
+ if (nredirected > 0)
+ {
+ xlrec.flags |= XLHP_HAS_REDIRECTIONS;
+
+ redirect_items.ntargets = nredirected;
+ XLogRegisterBufData(0, (char *) &redirect_items,
+ offsetof(xlhp_prune_items, data));
+ XLogRegisterBufData(0, (char *) redirected,
+ sizeof(OffsetNumber[2]) * nredirected);
+ }
+ if (ndead > 0)
+ {
+ xlrec.flags |= XLHP_HAS_DEAD_ITEMS;
+
+ dead_items.ntargets = ndead;
+ XLogRegisterBufData(0, (char *) &dead_items,
+ offsetof(xlhp_prune_items, data));
+ XLogRegisterBufData(0, (char *) dead,
+ sizeof(OffsetNumber) * ndead);
+ }
+ if (nunused > 0)
+ {
+ xlrec.flags |= XLHP_HAS_NOW_UNUSED_ITEMS;
+
+ unused_items.ntargets = nunused;
+ XLogRegisterBufData(0, (char *) &unused_items,
+ offsetof(xlhp_prune_items, data));
+ XLogRegisterBufData(0, (char *) unused,
+ sizeof(OffsetNumber) * nunused);
+ }
+ if (nfrozen > 0)
+ XLogRegisterBufData(0, (char *) frz_offsets,
+ sizeof(OffsetNumber) * nfrozen);
+
+ /*
+ * Prepare the main xl_heap_prune record. We already set the XLHP_HAS_*
+ * flags above.
+ */
+ if (RelationIsAccessibleInLogicalDecoding(relation))
+ xlrec.flags |= XLHP_IS_CATALOG_REL;
+ if (TransactionIdIsValid(conflict_xid))
+ xlrec.flags |= XLHP_HAS_CONFLICT_HORIZON;
+ if (cleanup_lock)
+ xlrec.flags |= XLHP_CLEANUP_LOCK;
+ else
+ {
+ Assert(nredirected == 0 && ndead == 0);
+ /* also, any items in 'unused' must've been LP_DEAD previously */
+ }
+ XLogRegisterData((char *) &xlrec, SizeOfHeapPrune);
+ if (TransactionIdIsValid(conflict_xid))
+ XLogRegisterData((char *) &conflict_xid, sizeof(TransactionId));
+
+ recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_PRUNE_FREEZE);
+
+ PageSetLSN(BufferGetPage(buffer), recptr);
+}
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 18004907750..5e656776c96 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2546,20 +2546,13 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
/* XLOG stuff */
if (RelationNeedsWAL(vacrel->rel))
{
- xl_heap_vacuum xlrec;
- XLogRecPtr recptr;
-
- xlrec.nunused = nunused;
-
- XLogBeginInsert();
- XLogRegisterData((char *) &xlrec, SizeOfHeapVacuum);
-
- XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
- XLogRegisterBufData(0, (char *) unused, nunused * sizeof(OffsetNumber));
-
- recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VACUUM);
-
- PageSetLSN(page, recptr);
+ log_heap_prune_and_freeze(vacrel->rel, buffer,
+ InvalidTransactionId,
+ false, /* no cleanup lock required */
+ NULL, 0, /* frozen */
+ NULL, 0, /* redirected */
+ NULL, 0, /* dead */
+ unused, nunused);
}
/*
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 36a3d83c8c2..d5ddcb14686 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -74,7 +74,7 @@ truncate_flags_desc(StringInfo buf, uint8 flags)
static void
plan_elem_desc(StringInfo buf, void *plan, void *data)
{
- xl_heap_freeze_plan *new_plan = (xl_heap_freeze_plan *) plan;
+ xlhp_freeze_plan *new_plan = (xlhp_freeze_plan *) plan;
OffsetNumber **offsets = data;
appendStringInfo(buf, "{ xmax: %u, infomask: %u, infomask2: %u, ntuples: %u",
@@ -91,6 +91,94 @@ plan_elem_desc(StringInfo buf, void *plan, void *data)
appendStringInfoString(buf, " }");
}
+
+/*
+ * Given a MAXALIGNed buffer returned by XLogRecGetBlockData() and pointed to
+ * by cursor and any xl_heap_prune flags, deserialize the arrays of
+ * OffsetNumbers contained in an XLOG_HEAP2_PRUNE_FREEZE record.
+ *
+ * This is in heapdesc.c so it can be shared between heap2_redo and heap2_desc
+ * code, the latter of which is used in frontend (pg_waldump) code.
+ */
+void
+heap_xlog_deserialize_prune_and_freeze(char *cursor, uint8 flags,
+ int *nredirected, OffsetNumber **redirected,
+ int *ndead, OffsetNumber **nowdead,
+ int *nunused, OffsetNumber **nowunused,
+ int *nplans, xlhp_freeze_plan **plans,
+ OffsetNumber **frz_offsets)
+{
+ if (flags & XLHP_HAS_FREEZE_PLANS)
+ {
+ xlhp_freeze_plans *freeze_plans = (xlhp_freeze_plans *) cursor;
+
+ *nplans = freeze_plans->nplans;
+ Assert(*nplans > 0);
+ *plans = freeze_plans->plans;
+
+ cursor += offsetof(xlhp_freeze_plans, plans);
+ cursor += sizeof(xlhp_freeze_plan) * *nplans;
+ }
+ else
+ {
+ *nplans = 0;
+ *plans = NULL;
+ }
+
+ if (flags & XLHP_HAS_REDIRECTIONS)
+ {
+ xlhp_prune_items *subrecord = (xlhp_prune_items *) cursor;
+
+ *nredirected = subrecord->ntargets;
+ Assert(*nredirected > 0);
+ *redirected = &subrecord->data[0];
+
+ cursor += offsetof(xlhp_prune_items, data);
+ cursor += sizeof(OffsetNumber[2]) * *nredirected;
+ }
+ else
+ {
+ *nredirected = 0;
+ *redirected = NULL;
+ }
+
+ if (flags & XLHP_HAS_DEAD_ITEMS)
+ {
+ xlhp_prune_items *subrecord = (xlhp_prune_items *) cursor;
+
+ *ndead = subrecord->ntargets;
+ Assert(*ndead > 0);
+ *nowdead = subrecord->data;
+
+ cursor += offsetof(xlhp_prune_items, data);
+ cursor += sizeof(OffsetNumber) * *ndead;
+ }
+ else
+ {
+ *ndead = 0;
+ *nowdead = NULL;
+ }
+
+ if (flags & XLHP_HAS_NOW_UNUSED_ITEMS)
+ {
+ xlhp_prune_items *subrecord = (xlhp_prune_items *) cursor;
+
+ *nunused = subrecord->ntargets;
+ Assert(*nunused > 0);
+ *nowunused = subrecord->data;
+
+ cursor += offsetof(xlhp_prune_items, data);
+ cursor += sizeof(OffsetNumber) * *nunused;
+ }
+ else
+ {
+ *nunused = 0;
+ *nowunused = NULL;
+ }
+
+ *frz_offsets = (OffsetNumber *) cursor;
+}
+
void
heap_desc(StringInfo buf, XLogReaderState *record)
{
@@ -175,86 +263,74 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
uint8 info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
info &= XLOG_HEAP_OPMASK;
- if (info == XLOG_HEAP2_PRUNE)
+ if (info == XLOG_HEAP2_PRUNE_FREEZE)
{
xl_heap_prune *xlrec = (xl_heap_prune *) rec;
- appendStringInfo(buf, "snapshotConflictHorizon: %u, nredirected: %u, ndead: %u, isCatalogRel: %c",
- xlrec->snapshotConflictHorizon,
- xlrec->nredirected,
- xlrec->ndead,
- xlrec->isCatalogRel ? 'T' : 'F');
-
- if (XLogRecHasBlockData(record, 0))
+ if (xlrec->flags & XLHP_HAS_CONFLICT_HORIZON)
{
- OffsetNumber *end;
- OffsetNumber *redirected;
- OffsetNumber *nowdead;
- OffsetNumber *nowunused;
- int nredirected;
- int nunused;
- Size datalen;
-
- redirected = (OffsetNumber *) XLogRecGetBlockData(record, 0,
- &datalen);
-
- nredirected = xlrec->nredirected;
- end = (OffsetNumber *) ((char *) redirected + datalen);
- nowdead = redirected + (nredirected * 2);
- nowunused = nowdead + xlrec->ndead;
- nunused = (end - nowunused);
- Assert(nunused >= 0);
-
- appendStringInfo(buf, ", nunused: %d", nunused);
-
- appendStringInfoString(buf, ", redirected:");
- array_desc(buf, redirected, sizeof(OffsetNumber) * 2,
- nredirected, &redirect_elem_desc, NULL);
- appendStringInfoString(buf, ", dead:");
- array_desc(buf, nowdead, sizeof(OffsetNumber), xlrec->ndead,
- &offset_elem_desc, NULL);
- appendStringInfoString(buf, ", unused:");
- array_desc(buf, nowunused, sizeof(OffsetNumber), nunused,
- &offset_elem_desc, NULL);
- }
- }
- else if (info == XLOG_HEAP2_VACUUM)
- {
- xl_heap_vacuum *xlrec = (xl_heap_vacuum *) rec;
+ TransactionId conflict_xid;
- appendStringInfo(buf, "nunused: %u", xlrec->nunused);
+ memcpy(&conflict_xid, rec + SizeOfHeapPrune, sizeof(TransactionId));
- if (XLogRecHasBlockData(record, 0))
- {
- OffsetNumber *nowunused;
-
- nowunused = (OffsetNumber *) XLogRecGetBlockData(record, 0, NULL);
-
- appendStringInfoString(buf, ", unused:");
- array_desc(buf, nowunused, sizeof(OffsetNumber), xlrec->nunused,
- &offset_elem_desc, NULL);
+ appendStringInfo(buf, "snapshotConflictHorizon: %u",
+ conflict_xid);
}
- }
- else if (info == XLOG_HEAP2_FREEZE_PAGE)
- {
- xl_heap_freeze_page *xlrec = (xl_heap_freeze_page *) rec;
- appendStringInfo(buf, "snapshotConflictHorizon: %u, nplans: %u, isCatalogRel: %c",
- xlrec->snapshotConflictHorizon, xlrec->nplans,
- xlrec->isCatalogRel ? 'T' : 'F');
+ appendStringInfo(buf, ", isCatalogRel: %c",
+ xlrec->flags & XLHP_IS_CATALOG_REL ? 'T' : 'F');
if (XLogRecHasBlockData(record, 0))
{
- xl_heap_freeze_plan *plans;
- OffsetNumber *offsets;
-
- plans = (xl_heap_freeze_plan *) XLogRecGetBlockData(record, 0, NULL);
- offsets = (OffsetNumber *) ((char *) plans +
- (xlrec->nplans *
- sizeof(xl_heap_freeze_plan)));
- appendStringInfoString(buf, ", plans:");
- array_desc(buf, plans, sizeof(xl_heap_freeze_plan), xlrec->nplans,
- &plan_elem_desc, &offsets);
+ Size datalen;
+ OffsetNumber *redirected;
+ OffsetNumber *nowdead;
+ OffsetNumber *nowunused;
+ int nredirected;
+ int nunused;
+ int ndead;
+ int nplans;
+ xlhp_freeze_plan *plans;
+ OffsetNumber *frz_offsets;
+
+ char *cursor = XLogRecGetBlockData(record, 0, &datalen);
+
+ heap_xlog_deserialize_prune_and_freeze(cursor, xlrec->flags,
+ &nredirected, &redirected,
+ &ndead, &nowdead,
+ &nunused, &nowunused,
+ &nplans, &plans, &frz_offsets);
+
+ appendStringInfo(buf, ", nredirected: %u, ndead: %u, nunused: %u, nplans: %u,",
+ nredirected, ndead, nunused, nplans);
+
+ if (nredirected > 0)
+ {
+ appendStringInfoString(buf, ", redirected:");
+ array_desc(buf, redirected, sizeof(OffsetNumber) * 2,
+ nredirected, &redirect_elem_desc, NULL);
+ }
+
+ if (ndead > 0)
+ {
+ appendStringInfoString(buf, ", dead:");
+ array_desc(buf, nowdead, sizeof(OffsetNumber), ndead,
+ &offset_elem_desc, NULL);
+ }
+
+ if (nunused > 0)
+ {
+ appendStringInfoString(buf, ", unused:");
+ array_desc(buf, nowunused, sizeof(OffsetNumber), nunused,
+ &offset_elem_desc, NULL);
+ }
+
+ if (nplans > 0)
+ {
+ appendStringInfoString(buf, ", plans:");
+ array_desc(buf, plans, sizeof(xlhp_freeze_plan), nplans,
+ &plan_elem_desc, &frz_offsets);
+ }
}
}
else if (info == XLOG_HEAP2_VISIBLE)
@@ -355,14 +431,8 @@ heap2_identify(uint8 info)
switch (info & ~XLR_INFO_MASK)
{
- case XLOG_HEAP2_PRUNE:
- id = "PRUNE";
- break;
- case XLOG_HEAP2_VACUUM:
- id = "VACUUM";
- break;
- case XLOG_HEAP2_FREEZE_PAGE:
- id = "FREEZE_PAGE";
+ case XLOG_HEAP2_PRUNE_FREEZE:
+ id = "PRUNE_FREEZE";
break;
case XLOG_HEAP2_VISIBLE:
id = "VISIBLE";
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index e5ab7b78b78..8c909514381 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -445,9 +445,7 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
* Everything else here is just low level physical stuff we're not
* interested in.
*/
- case XLOG_HEAP2_FREEZE_PAGE:
- case XLOG_HEAP2_PRUNE:
- case XLOG_HEAP2_VACUUM:
+ case XLOG_HEAP2_PRUNE_FREEZE:
case XLOG_HEAP2_VISIBLE:
case XLOG_HEAP2_LOCK_UPDATED:
break;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 4b133f68593..ca6ddab91ea 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -323,11 +323,18 @@ extern void heap_page_prune(Relation relation, Buffer buffer,
bool mark_unused_now,
PruneResult *presult,
OffsetNumber *off_loc);
-extern void heap_page_prune_execute(Buffer buffer,
+extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
OffsetNumber *redirected, int nredirected,
OffsetNumber *nowdead, int ndead,
OffsetNumber *nowunused, int nunused);
extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
+extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+ TransactionId conflict_xid,
+ bool cleanup_lock,
+ HeapTupleFreeze *frozen, int nfrozen,
+ OffsetNumber *redirected, int nredirected,
+ OffsetNumber *dead, int ndead,
+ OffsetNumber *unused, int nunused);
/* in heap/vacuumlazy.c */
struct VacuumParams;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 6488dad5e64..0dd6af57e07 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -51,9 +51,9 @@
* these, too.
*/
#define XLOG_HEAP2_REWRITE 0x00
-#define XLOG_HEAP2_PRUNE 0x10
-#define XLOG_HEAP2_VACUUM 0x20
-#define XLOG_HEAP2_FREEZE_PAGE 0x30
+#define XLOG_HEAP2_PRUNE_FREEZE 0x10
+/* 0x20 is free, was XLOG_HEAP2_VACUUM */
+/* 0x30 is free, was XLOG_HEAP2_FREEZE_PAGE */
#define XLOG_HEAP2_VISIBLE 0x40
#define XLOG_HEAP2_MULTI_INSERT 0x50
#define XLOG_HEAP2_LOCK_UPDATED 0x60
@@ -227,44 +227,153 @@ typedef struct xl_heap_update
#define SizeOfHeapUpdate (offsetof(xl_heap_update, new_offnum) + sizeof(OffsetNumber))
/*
- * This is what we need to know about page pruning (both during VACUUM and
- * during opportunistic pruning)
+ * These structures and flags encode VACUUM pruning and freezing and on-access
+ * pruning page modifications.
*
- * The array of OffsetNumbers following the fixed part of the record contains:
- * * for each redirected item: the item offset, then the offset redirected to
- * * for each now-dead item: the item offset
- * * for each now-unused item: the item offset
- * The total number of OffsetNumbers is therefore 2*nredirected+ndead+nunused.
- * Note that nunused is not explicitly stored, but may be found by reference
- * to the total record length.
+ * xl_heap_prune is the main record. The XLHP_HAS_* flags indicate which
+ * "sub-records" are included and the other XLHP_* flags provide additional
+ * information about the conditions for replay.
*
- * Acquires a full cleanup lock.
+ * The data for block reference 0 contains "sub-records" depending on which of
+ * the XLHP_HAS_* flags are set. See xlhp_* struct definitions below. The
+ * sub-records appear in the same order as the XLHP_* flags. An example
+ * record with every sub-record included:
+ *
+ *-----------------------------------------------------------------------------
+ * Main data section:
+ *
+ * xl_heap_prune
+ * uint8 flags
+ * TransactionId snapshot_conflict_horizon
+ *
+ * Block 0 data section:
+ *
+ * xlhp_freeze_plans
+ * uint16 nplans
+ * [2 bytes of padding]
+ * xlhp_freeze_plan plans[nplans]
+ *
+ * xlhp_prune_items
+ * uint16 nredirected
+ * OffsetNumber redirected[2 * nredirected]
+ *
+ * xlhp_prune_items
+ * uint16 ndead
+ * OffsetNumber nowdead[ndead]
+ *
+ * xlhp_prune_items
+ * uint16 nunused
+ * OffsetNumber nowunused[nunused]
+ *
+ * OffsetNumber frz_offsets[sum([plan.ntuples for plan in plans])]
+ *-----------------------------------------------------------------------------
+ *
+ * NOTE: because the record data is assembled from many optional parts, we
+ * have to pay close attention to alignment. In the main data section,
+ * 'snapshot_conflict_horizon' is stored unaligned after 'flags', to save
+ * space. In the block 0 data section, the freeze plans appear first, because
+ * they contain TransactionId fields that require 4-byte alignment. All the
+ * other fields require only 2-byte alignment. This is also the reason that
+ * 'frz_offsets' is stored separately from the xlhp_freeze_plan structs.
*/
typedef struct xl_heap_prune
{
- TransactionId snapshotConflictHorizon;
- uint16 nredirected;
- uint16 ndead;
- bool isCatalogRel; /* to handle recovery conflict during logical
- * decoding on standby */
- /* OFFSET NUMBERS are in the block reference 0 */
+ uint8 flags;
+
+ /*
+ * If XLHP_HAS_CONFLICT_HORIZON is set, the conflict horizon XID follows,
+ * unaligned
+ */
} xl_heap_prune;
-#define SizeOfHeapPrune (offsetof(xl_heap_prune, isCatalogRel) + sizeof(bool))
+#define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint8))
+
+/* to handle recovery conflict during logical decoding on standby */
+#define XLHP_IS_CATALOG_REL (1 << 1)
+
+/*
+ * Does replaying the record require a cleanup-lock?
+ *
+ * Pruning, in VACUUM's first pass or when otherwise accessing a page,
+ * requires a cleanup lock. For freezing, and VACUUM's second pass which
+ * marks LP_DEAD line pointers as unused without moving any tuple data, an
+ * ordinary exclusive lock is sufficient.
+ */
+#define XLHP_CLEANUP_LOCK (1 << 2)
+
+/*
+ * If we remove or freeze any entries that contain xids, we need to include a
+ * snapshot conflict horizon. It's used in Hot Standby mode to ensure that
+ * there are no queries running for which the removed tuples are still
+ * visible, or which still consider the frozen XIDs as running.
+ */
+#define XLHP_HAS_CONFLICT_HORIZON (1 << 3)
+
+/*
+ * Indicates that an xlhp_freeze_plans sub-record and one or more
+ * xlhp_freeze_plan sub-records are present.
+ */
+#define XLHP_HAS_FREEZE_PLANS (1 << 4)
+
+/*
+ * XLHP_HAS_REDIRECTIONS, XLHP_HAS_DEAD_ITEMS, and XLHP_HAS_NOW_UNUSED_ITEMS
+ * indicate that xlhp_prune_items sub-records with redirected, dead, and
+ * unused item offsets are present.
+ */
+#define XLHP_HAS_REDIRECTIONS (1 << 5)
+#define XLHP_HAS_DEAD_ITEMS (1 << 6)
+#define XLHP_HAS_NOW_UNUSED_ITEMS (1 << 7)
+
+/*
+ * xlhp_freeze_plan describes how to freeze a group of one or more heap tuples
+ * (appears in xl_heap_prune's xlhp_freeze_plans sub-record)
+ */
+/* 0x01 was XLH_FREEZE_XMIN */
+#define XLH_FREEZE_XVAC 0x02
+#define XLH_INVALID_XVAC 0x04
+
+typedef struct xlhp_freeze_plan
+{
+ TransactionId xmax;
+ uint16 t_infomask2;
+ uint16 t_infomask;
+ uint8 frzflags;
+
+ /* Length of individual page offset numbers array for this plan */
+ uint16 ntuples;
+} xlhp_freeze_plan;
/*
- * The vacuum page record is similar to the prune record, but can only mark
- * already LP_DEAD items LP_UNUSED (during VACUUM's second heap pass)
+ * This is what we need to know about a block being frozen during vacuum
*
- * Acquires an ordinary exclusive lock only.
+ * The backup block's data contains an array of xlhp_freeze_plan structs (with
+ * nplans elements). The individual item offsets are located in an array at
+ * the end of the entire record with nplans * (each plan's ntuples)
+ * members. Those offsets are in the same order as the plans. The REDO
+ * routine uses the offsets to freeze the corresponding heap tuples.
+ *
+ * (As of PostgreSQL 17, XLOG_HEAP2_PRUNE_FREEZE records replace the separate
+ * XLOG_HEAP2_FREEZE_PAGE records.)
+ */
+typedef struct xlhp_freeze_plans
+{
+ uint16 nplans;
+ xlhp_freeze_plan plans[FLEXIBLE_ARRAY_MEMBER];
+} xlhp_freeze_plans;
+
+/*
+ * Generic sub-record type contained in block reference 0 of an xl_heap_prune
+ * record and used for redirect, dead, and unused items if any of
+ * XLHP_HAS_REDIRECTIONS/XLHP_HAS_DEAD_ITEMS/XLHP_HAS_NOW_UNUSED_ITEMS are
+ * set. Note that in the XLHP_HAS_REDIRECTIONS variant, there are actually
+ * 2 * ntargets OffsetNumbers in the data.
*/
-typedef struct xl_heap_vacuum
+typedef struct xlhp_prune_items
{
- uint16 nunused;
- /* OFFSET NUMBERS are in the block reference 0 */
-} xl_heap_vacuum;
+ uint16 ntargets;
+ OffsetNumber data[FLEXIBLE_ARRAY_MEMBER];
+} xlhp_prune_items;
-#define SizeOfHeapVacuum (offsetof(xl_heap_vacuum, nunused) + sizeof(uint16))
/* flags for infobits_set */
#define XLHL_XMAX_IS_MULTI 0x01
@@ -315,47 +424,6 @@ typedef struct xl_heap_inplace
#define SizeOfHeapInplace (offsetof(xl_heap_inplace, offnum) + sizeof(OffsetNumber))
-/*
- * This struct represents a 'freeze plan', which describes how to freeze a
- * group of one or more heap tuples (appears in xl_heap_freeze_page record)
- */
-/* 0x01 was XLH_FREEZE_XMIN */
-#define XLH_FREEZE_XVAC 0x02
-#define XLH_INVALID_XVAC 0x04
-
-typedef struct xl_heap_freeze_plan
-{
- TransactionId xmax;
- uint16 t_infomask2;
- uint16 t_infomask;
- uint8 frzflags;
-
- /* Length of individual page offset numbers array for this plan */
- uint16 ntuples;
-} xl_heap_freeze_plan;
-
-/*
- * This is what we need to know about a block being frozen during vacuum
- *
- * Backup block 0's data contains an array of xl_heap_freeze_plan structs
- * (with nplans elements), followed by one or more page offset number arrays.
- * Each such page offset number array corresponds to a single freeze plan
- * (REDO routine freezes corresponding heap tuples using freeze plan).
- */
-typedef struct xl_heap_freeze_page
-{
- TransactionId snapshotConflictHorizon;
- uint16 nplans;
- bool isCatalogRel; /* to handle recovery conflict during logical
- * decoding on standby */
-
- /*
- * In payload of blk 0 : FREEZE PLANS and OFFSET NUMBER ARRAY
- */
-} xl_heap_freeze_page;
-
-#define SizeOfHeapFreezePage (offsetof(xl_heap_freeze_page, isCatalogRel) + sizeof(bool))
-
/*
* This is what we need to know about setting a visibility map bit
*
@@ -418,4 +486,12 @@ extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
TransactionId snapshotConflictHorizon,
uint8 vmflags);
+/* in heapdesc.c, so it can be shared between frontend/backend code */
+extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint8 flags,
+ int *nredirected, OffsetNumber **redirected,
+ int *ndead, OffsetNumber **nowdead,
+ int *nunused, OffsetNumber **nowunused,
+ int *nplans, xlhp_freeze_plan **plans,
+ OffsetNumber **frz_offsets);
+
#endif /* HEAPAM_XLOG_H */
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index b88b24f0c1e..fd720d87dbb 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -31,7 +31,8 @@
/*
* Each page of XLOG file has a header like this:
*/
-#define XLOG_PAGE_MAGIC 0xD114 /* can be used as WAL version indicator */
+/* FIXME: make sure this is still larger than on 'master' before committing! */
+#define XLOG_PAGE_MAGIC 0xD115 /* can be used as WAL version indicator */
typedef struct XLogPageHeaderData
{
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index e2a0525dd4a..26fc5f250c8 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3968,8 +3968,6 @@ xl_hash_update_meta_page
xl_hash_vacuum_one_page
xl_heap_confirm
xl_heap_delete
-xl_heap_freeze_page
-xl_heap_freeze_plan
xl_heap_header
xl_heap_inplace
xl_heap_insert
@@ -3981,7 +3979,6 @@ xl_heap_prune
xl_heap_rewrite_mapping
xl_heap_truncate
xl_heap_update
-xl_heap_vacuum
xl_heap_visible
xl_invalid_page
xl_invalid_page_key
@@ -4021,6 +4018,9 @@ xl_xact_stats_items
xl_xact_subxacts
xl_xact_twophase
xl_xact_xinfo
+xlhp_freeze_plan
+xlhp_freeze_plans
+xlhp_prune_items
xmlBuffer
xmlBufferPtr
xmlChar
--
2.39.2
On Thu, Mar 21, 2024 at 9:28 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
In heap_page_prune_and_freeze(), we now do some extra work on each live
tuple, to set the all_visible_except_removable correctly. And also to
update live_tuples, recently_dead_tuples and hastup. When we're not
freezing, that's a waste of cycles, the caller doesn't care. I hope it's
enough that it doesn't matter, but is it?
Last year on an early version of the patch set I did some pgbench
tpcb-like benchmarks -- since there is a lot of on-access pruning in
that workload -- and I don't remember it being a showstopper. The code
has changed a fair bit since then. However, I think it might be safer
to pass a flag "by_vacuum" to heap_page_prune_and_freeze() and skip
the rest of the loop after heap_prune_satisfies_vacuum() when
on-access pruning invokes it. I had avoided that because it felt ugly
and error-prone, however it addresses a few other of your points as
well.
For example, I think we should set a bit in the prune/freeze WAL
record's flags to indicate if pruning was done by vacuum or on access
(mentioned in another of your recent emails).
The first commit (after the WAL format changes) changes the all-visible
check to use GlobalVisTestIsRemovableXid. The commit message says that
it's because we don't have 'cutoffs' available, but we only care about
that when we're freezing, and when we're freezing, we actually do have
'cutoffs' in HeapPageFreeze. Using GlobalVisTestIsRemovableXid seems
sensible anyway, because that's what we use in
heap_prune_satisfies_vacuum() too, but just wanted to point that out.
Yes, this is true. If we skip this part of the loop when on-access
pruning invokes it, we can go back to using the OldestXmin. I have
done that as well as some other changes in the attached patch,
conflict_horizon_updates.diff. Note that this patch may not apply on
your latest patch as I wrote it on top of an older version. Switching
back to using OldestXmin for page visibility determination makes this
patch set more similar to master as well. We could keep the
alternative check (with GlobalVisState) to maintain the illusion that
callers passing by_vacuum as True can pass NULL for pagefrz, but I was
worried about the extra overhead.
It would be nice to pick a single reasonable visibility horizon (the
oldest running xid we compare things to) at the beginning of
heap_page_prune_and_freeze() and use it for determining if we can
remove tuples, if we can freeze tuples, and if the page is all
visible. It makes it confusing that we use OldestXmin for freezing and
setting the page visibility in the VM and GlobalVisState for
determining prunability. I think using GlobalVisState for freezing has
been discussed before and ruled out for a variety of reasons, and I
definitely don't suggest making that change in this patch set.
The 'frz_conflict_horizon' stuff is still fuzzy to me. (Not necessarily
these patches's fault). This at least is wrong, because Max(a, b)
doesn't handle XID wraparound correctly:

if (do_freeze)
    conflict_xid = Max(prstate.snapshotConflictHorizon,
                       presult->frz_conflict_horizon);
else
    conflict_xid = prstate.snapshotConflictHorizon;

Then there's this in lazy_scan_prune():

/* Using same cutoff when setting VM is now unnecessary */
if (presult.all_frozen)
    presult.frz_conflict_horizon = InvalidTransactionId;

This does the right thing in the end, but if all the tuples are frozen
shouldn't frz_conflict_horizon already be InvalidTransactionId? The
comment says it's "newest xmin on the page", and if everything was
frozen, all xmins are FrozenTransactionId. In other words, that should
be moved to heap_page_prune_and_freeze() so that it doesn't lie to its
caller. Also, frz_conflict_horizon is only set correctly if
'all_frozen==true', would be good to mention that in the comments too.
Yes, this is a good point. I've spent some time swapping all of this
back into my head. I think we should change the names of all these
conflict horizon variables and introduce some local variables again.
In the attached patch, I've updated the name of the variable in
PruneFreezeResult to vm_conflict_horizon, as it is only used for
emitting a VM update record. Now, I don't set it until the end of
heap_page_prune_and_freeze(). It is only updated from
InvalidTransactionId if the page is not all frozen. As you say, if the
page is all frozen, there can be no conflict.
I've also changed PruneState->snapshotConflictHorizon to
PruneState->latest_xid_removed.
And I introduced the local variables visibility_cutoff_xid and
frz_conflict_horizon. I think it is important we distinguish between
the latest xid pruned, the latest xmin of tuples frozen, and the
latest xid of all live tuples on the page.
Though we end up using visibility_cutoff_xid as the freeze conflict
horizon if the page is all frozen, I think it is more clear to have
the three variables and name them what they are. Then, we calculate
the correct one for the combined WAL record before emitting it. I've
done that in the attached. I've tried to reduce the scope of the
variables as much as possible to keep it as clear as I could.
I think I've also fixed the issue with using Max() to compare
TransactionIds by using TransactionIdFollows() instead.
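Concretely, the selection in the attached diff now looks roughly like this
(sketch of the relevant hunk, variable names as in the attachment):

/*
 * TransactionIdFollows() compares XIDs in modulo-2^32 space, so this picks
 * whichever horizon is logically newer instead of numerically larger.
 */
if (TransactionIdFollows(frz_conflict_horizon, prstate.latest_xid_removed))
    conflict_xid = frz_conflict_horizon;
else
    conflict_xid = prstate.latest_xid_removed;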
Note that I still don't think we have a resolution on what to
correctly update new_relfrozenxid and new_relminmxid to at the end
when presult->nfrozen == 0 and presult->all_frozen is true.
if (presult->nfrozen > 0)
{
presult->new_relfrozenxid = pagefrz->FreezePageRelfrozenXid;
presult->new_relminmxid = pagefrz->FreezePageRelminMxid;
}
else
{
presult->new_relfrozenxid = pagefrz->NoFreezePageRelfrozenXid;
presult->new_relminmxid = pagefrz->NoFreezePageRelminMxid;
}
- Melanie
Attachments:
conflict_horizon_updates.diff (text/x-patch; charset=US-ASCII)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 2b5f8ef1e80..06ee8565b80 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -38,7 +38,7 @@ typedef struct
bool mark_unused_now;
TransactionId new_prune_xid; /* new prune hint value for page */
- TransactionId snapshotConflictHorizon; /* latest xid removed */
+ TransactionId latest_xid_removed;
int nredirected; /* numbers of entries in arrays below */
int ndead;
int nunused;
@@ -176,7 +176,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* not the relation has indexes, since we cannot safely determine
* that during on-access pruning with the current implementation.
*/
- heap_page_prune_and_freeze(relation, buffer, vistest, false, NULL,
+ heap_page_prune_and_freeze(relation, buffer, false, vistest, false, NULL,
&presult, NULL);
/*
@@ -214,6 +214,15 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* Prune and repair fragmentation and potentially freeze tuples on the
* specified page.
*
+ * heap_page_prune_and_freeze() may do all of the following: prune tuples,
+ * freeze tuples, determine the all-visible/all-frozen status of the page and
+ * the associated snapshot conflict horizon, determine if truncating this page
+ * would be safe, and identify any new potential values for relfrozenxid and
+ * relminmxid. Not all of these responsibilities are useful to all callers. If
+ * by_vacuum is passed as True, heap_page_prune_and_freeze() will do all of
+ * these. Otherwise, the other parameters will determine which of these it
+ * does.
+ *
* If the page can be marked all-frozen in the visibility map, we may
* opportunistically freeze tuples on the page if either its tuples are old
* enough or freezing will be cheap enough.
@@ -242,6 +251,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
*/
void
heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ bool by_vacuum,
GlobalVisState *vistest,
bool mark_unused_now,
HeapPageFreeze *pagefrz,
@@ -254,6 +264,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
maxoff;
PruneState prstate;
HeapTupleData tup;
+ TransactionId visibility_cutoff_xid;
bool do_freeze;
bool do_prune;
bool do_hint;
@@ -275,7 +286,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.new_prune_xid = InvalidTransactionId;
prstate.vistest = vistest;
prstate.mark_unused_now = mark_unused_now;
- prstate.snapshotConflictHorizon = InvalidTransactionId;
+ prstate.latest_xid_removed = InvalidTransactionId;
prstate.nredirected = prstate.ndead = prstate.nunused = 0;
memset(prstate.marked, 0, sizeof(prstate.marked));
@@ -302,8 +313,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* all_visible is also set to true.
*/
presult->all_frozen = true;
- /* for recovery conflicts */
- presult->frz_conflict_horizon = InvalidTransactionId;
+
+ /*
+ * The visibility cutoff xid is the newest xmin of live tuples on the
+ * page. In the common case, this will be set as the conflict horizon the
+ * caller can use for updating the VM. If, at the end of freezing and
+ * pruning, the page is all-frozen, there is no possibility that any
+ * running transaction on the standby does not see tuples on the page as
+ * all-visible, so the conflict horizon remains InvalidTransactionId.
+ */
+ presult->vm_conflict_horizon = visibility_cutoff_xid = InvalidTransactionId;
/* For advancing relfrozenxid and relminmxid */
presult->new_relfrozenxid = InvalidTransactionId;
@@ -362,6 +381,13 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.htsv[offnum] = heap_prune_satisfies_vacuum(&prstate, &tup,
buffer);
+ /*
+ * On-access pruning does not update the VM nor provide pagefrz to
+ * consider freezing tuples, so skip the rest of the loop.
+ */
+ if (!by_vacuum)
+ continue;
+
/*
* The criteria for counting a tuple as live in this block need to
* match what analyze.c's acquire_sample_rows() does, otherwise VACUUM
@@ -434,17 +460,19 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* don't consider the page all-visible.
*/
xmin = HeapTupleHeaderGetXmin(htup);
- if (xmin != FrozenTransactionId &&
- !GlobalVisTestIsRemovableXid(vistest, xmin))
+
+ /* For now always use pagefrz->cutoffs */
+ Assert(pagefrz);
+ if (!TransactionIdPrecedes(xmin, pagefrz->cutoffs->OldestXmin))
{
all_visible_except_removable = false;
break;
}
/* Track newest xmin on page. */
- if (TransactionIdFollows(xmin, presult->frz_conflict_horizon) &&
+ if (TransactionIdFollows(xmin, visibility_cutoff_xid) &&
TransactionIdIsNormal(xmin))
- presult->frz_conflict_horizon = xmin;
+ visibility_cutoff_xid = xmin;
}
break;
case HEAPTUPLE_RECENTLY_DEAD:
@@ -673,10 +701,9 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (do_prune || do_freeze)
{
- /*
- * Apply the planned item changes, then repair page fragmentation, and
- * update the page's hint bit about whether it has free line pointers.
- */
+ TransactionId frz_conflict_horizon = InvalidTransactionId;
+
+ /* Apply the planned item changes and repair page fragmentation. */
if (do_prune)
{
heap_page_prune_execute(buffer, false,
@@ -692,12 +719,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* when the whole page is eligible to become all-frozen in the VM
* once we're done with it. Otherwise we generate a conservative
* cutoff by stepping back from OldestXmin. This avoids false
- * conflicts when hot_standby_feedback is in use.
+ * conflicts when hot_standby_feedback is in use. MFIXME: it is
+ * possible to use presult->all_visible by now, but is it clearer
+ * to use all_visible_except_removable?
*/
- if (!(all_visible_except_removable && presult->all_frozen))
+ if (all_visible_except_removable && presult->all_frozen)
+ frz_conflict_horizon = visibility_cutoff_xid;
+ else
{
- presult->frz_conflict_horizon = pagefrz->cutoffs->OldestXmin;
- TransactionIdRetreat(presult->frz_conflict_horizon);
+ frz_conflict_horizon = pagefrz->cutoffs->OldestXmin;
+ TransactionIdRetreat(frz_conflict_horizon);
}
heap_freeze_prepared_tuples(buffer, prstate.frozen, presult->nfrozen);
}
@@ -709,22 +740,22 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*/
if (RelationNeedsWAL(relation))
{
+ TransactionId conflict_xid = InvalidTransactionId;
+
/*
- * The snapshotConflictHorizon for the whole record should be the most
- * conservative of all the horizons calculated for any of the possible
- * modifications. If this record will prune tuples, any transactions on
- * the standby older than the youngest xmax of the most recently removed
- * tuple this record will prune will conflict. If this record will freeze
- * tuples, any transactions on the standby with xids older than the
- * youngest tuple this record will freeze will conflict.
+ * The snapshot conflict horizon for the whole record should be
+ * the newest xid of all the horizons calculated for any of the
+ * possible modifications. If this record will prune tuples, any
+ * transactions on the standby older than the youngest xmax of the
+ * most recently removed tuple this record will prune will
+ * conflict. If this record will freeze tuples, any transactions
+ * on the standby with xids older than the youngest tuple this
+ * record will freeze will conflict.
*/
- TransactionId conflict_xid;
-
- if (do_freeze)
- conflict_xid = Max(prstate.snapshotConflictHorizon,
- presult->frz_conflict_horizon);
+ if (TransactionIdFollows(frz_conflict_horizon, prstate.latest_xid_removed))
+ conflict_xid = frz_conflict_horizon;
else
- conflict_xid = prstate.snapshotConflictHorizon;
+ conflict_xid = prstate.latest_xid_removed;
log_heap_prune_and_freeze(relation, buffer,
conflict_xid,
@@ -738,6 +769,15 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
END_CRIT_SECTION();
+ /*
+ * For callers planning to update the visibility map, the conflict horizon
+ * for that record must be the newest xmin on the page. However, if the
+ * page is completely frozen, there can be no conflict and the
+ * vm_conflict_horizon should remain InvalidTransactionId.
+ */
+ if (!presult->all_frozen)
+ presult->vm_conflict_horizon = visibility_cutoff_xid;
+
/*
* If we froze tuples on the page, the caller can advance relfrozenxid and
* relminmxid to the values in pagefrz->FreezePageRelfrozenXid and
@@ -876,7 +916,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
{
heap_prune_record_unused(prstate, rootoffnum);
HeapTupleHeaderAdvanceConflictHorizon(htup,
- &prstate->snapshotConflictHorizon);
+ &prstate->latest_xid_removed);
ndeleted++;
}
@@ -1029,7 +1069,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
{
latestdead = offnum;
HeapTupleHeaderAdvanceConflictHorizon(htup,
- &prstate->snapshotConflictHorizon);
+ &prstate->latest_xid_removed);
}
else if (!recent_dead)
break;
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index c3da64102cf..ee4fdc00073 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1421,7 +1421,7 @@ lazy_scan_prune(LVRelState *vacrel,
* items LP_UNUSED, so mark_unused_now should be true if no indexes and
* false otherwise.
*/
- heap_page_prune_and_freeze(rel, buf, vacrel->vistest, vacrel->nindexes == 0,
+ heap_page_prune_and_freeze(rel, buf, true, vacrel->vistest, vacrel->nindexes == 0,
&pagefrz, &presult, &vacrel->offnum);
/*
@@ -1444,10 +1444,6 @@ lazy_scan_prune(LVRelState *vacrel,
* (don't confuse that with pages newly set all-frozen in VM).
*/
vacrel->frozen_pages++;
-
- /* Using same cutoff when setting VM is now unnecessary */
- if (presult.all_frozen)
- presult.frz_conflict_horizon = InvalidTransactionId;
}
/*
@@ -1473,7 +1469,7 @@ lazy_scan_prune(LVRelState *vacrel,
Assert(presult.all_frozen == debug_all_frozen);
Assert(!TransactionIdIsValid(debug_cutoff) ||
- debug_cutoff == presult.frz_conflict_horizon);
+ debug_cutoff == presult.vm_conflict_horizon);
}
#endif
@@ -1527,7 +1523,7 @@ lazy_scan_prune(LVRelState *vacrel,
if (presult.all_frozen)
{
- Assert(!TransactionIdIsValid(presult.frz_conflict_horizon));
+ Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
flags |= VISIBILITYMAP_ALL_FROZEN;
}
@@ -1547,7 +1543,7 @@ lazy_scan_prune(LVRelState *vacrel,
PageSetAllVisible(page);
MarkBufferDirty(buf);
visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
- vmbuffer, presult.frz_conflict_horizon,
+ vmbuffer, presult.vm_conflict_horizon,
flags);
}
@@ -1612,11 +1608,11 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* Set the page all-frozen (and all-visible) in the VM.
*
- * We can pass InvalidTransactionId as our frz_conflict_horizon, since
- * a snapshotConflictHorizon sufficient to make everything safe for
- * REDO was logged when the page's tuples were frozen.
+ * We can pass InvalidTransactionId as our snapshot conflict horizon,
+ * since a snapshot conflict horizon sufficient to make everything
+ * safe for REDO was logged when the page's tuples were frozen.
*/
- Assert(!TransactionIdIsValid(presult.frz_conflict_horizon));
+ Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
vmbuffer, InvalidTransactionId,
VISIBILITYMAP_ALL_VISIBLE |
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index b2015f5a1ac..ba2ce01af04 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -212,8 +212,8 @@ typedef struct PruneFreezeResult
/* Whether or not the page can be set all frozen in the VM */
bool all_frozen;
- /* Number of newly frozen tuples */
- TransactionId frz_conflict_horizon; /* Newest xmin on the page */
+ /* Newest xmin on the page */
+ TransactionId vm_conflict_horizon;
/* New value of relfrozenxid found by heap_page_prune_and_freeze() */
TransactionId new_relfrozenxid;
@@ -322,6 +322,7 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
struct GlobalVisState;
extern void heap_page_prune_opt(Relation relation, Buffer buffer);
extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ bool by_vacuum,
struct GlobalVisState *vistest,
bool mark_unused_now,
HeapPageFreeze *pagefrz,
On Sat, Mar 23, 2024 at 01:09:30AM +0200, Heikki Linnakangas wrote:
On 20/03/2024 21:17, Melanie Plageman wrote:
Attached patch changes-for-0001.patch has a bunch of updated comments --
especially for heapam_xlog.h (including my promised diagram). And a few
suggestions (mostly changes that I should have made before).

New version of these WAL format changes attached. Squashed to one patch now.
I spent more time on the comments throughout the patch. And one
notable code change: I replaced the XLHP_LP_TRUNCATE_ONLY flag with
XLHP_CLEANUP_LOCK. XLHP_CLEANUP_LOCK directly indicates if you need a
cleanup lock to replay the record. It must always be set when
XLHP_HAS_REDIRECTIONS or XLHP_HAS_DEAD_ITEMS is set, because replaying those
always needs a cleanup lock. That felt easier to document and understand
than XLHP_LP_TRUNCATE_ONLY.
Makes sense to me.
From b26e36ba8614d907a6e15810ed4f684f8f628dd2 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed, 20 Mar 2024 14:53:31 +0200
Subject: [PATCH v5 08/26] minor refactoring in log_heap_prune_and_freeze()

Mostly to make local variables more tightly-scoped.
So, I don't think you can move those sub-records into the tighter scope.
If you run tests with this applied, you'll see it crashes and a lot of
the data in the record is wrong. If you move the sub-record declarations
out to a wider scope, the tests pass.

The WAL record data isn't actually copied into the buffer until

recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_PRUNE_FREEZE);

after registering everything.
So all of those sub-records you made are out of scope by the time it tries
to copy from them.

I spent the better part of a day last week trying to figure out what was
happening after I did the exact same thing. I must say that I found the
xloginsert API incredibly unhelpful on this point.

Oops. I had that in mind and that was actually why I moved the
XLogRegisterData() call to the end of the function, because I found it
confusing to register the struct before filling it in completely, even
though it works perfectly fine. But then I missed it anyway when I moved the
local variables. I added a brief comment on that.
Comment was a good idea.
There is another patch in the commitfest that touches this area: "Recording
whether Heap2/PRUNE records are from VACUUM or from opportunistic pruning"
[1]. That actually goes in the opposite direction than this patch. That
patch wants to add more information, to show whether a record was emitted by
VACUUM or on-access pruning, while this patch makes the freezing and
VACUUM's 2nd phase records also look the same. We could easily add more
flags to xl_heap_prune to distinguish them. Or assign different xl_info code
for them, like that other patch proposed. But I don't think that needs to
block this patch, that can be added as a separate patch.

[1] /messages/by-id/CAH2-Wzmsevhox+HJpFmQgCxWWDgNzP0R9F+VBnpOK6TgxMPrRw@mail.gmail.com
I think it would be very helpful to distinguish amongst vacuum pass 1,
2, and on-access pruning. I often want to know what did most of the
pruning -- and I could see also wanting to know if the first or second
vacuum pass was responsible for removing the tuples. I agree it could be
done separately, but it would be very helpful to have as soon as
possible now that the record type will be the same for all three.
From 042185d3de14dcb7088bbe50e9c64e365ac42c2a Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Fri, 22 Mar 2024 23:10:22 +0200
Subject: [PATCH v6] Merge prune, freeze and vacuum WAL record formats

 /*
- * Handles XLOG_HEAP2_PRUNE record type.
- *
- * Acquires a full cleanup lock.
+ * Replay XLOG_HEAP2_PRUNE_FREEZE record.
 */
 static void
-heap_xlog_prune(XLogReaderState *record)
+heap_xlog_prune_freeze(XLogReaderState *record)
 {
 XLogRecPtr lsn = record->EndRecPtr;
- xl_heap_prune *xlrec = (xl_heap_prune *) XLogRecGetData(record);
+ char *ptr;
+ xl_heap_prune *xlrec;
 Buffer buffer;
 RelFileLocator rlocator;
 BlockNumber blkno;
 XLogRedoAction action;

 XLogRecGetBlockTag(record, 0, &rlocator, NULL, &blkno);
+ ptr = XLogRecGetData(record);
I don't love having two different pointers that alias each other and we
don't know which one is for what. Perhaps we could memcpy xlrec like in
my attached diff (log_updates.diff). It also might perform slightly
better than accessing flags through a xl_heap_prune
* -- since it wouldn't be doing pointer dereferencing.
+ xlrec = (xl_heap_prune *) ptr;
+ ptr += SizeOfHeapPrune;

 /*
- * We're about to remove tuples. In Hot Standby mode, ensure that there's
- * no queries running for which the removed tuples are still visible.
+ * We will take an ordinary exclusive lock or a cleanup lock depending on
+ * whether the XLHP_CLEANUP_LOCK flag is set. With an ordinary exclusive
+ * lock, we better not be doing anything that requires moving existing
+ * tuple data.
 */
- if (InHotStandby)
- ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
- xlrec->isCatalogRel,
+ Assert((xlrec->flags & XLHP_CLEANUP_LOCK) != 0 ||
+ (xlrec->flags & (XLHP_HAS_REDIRECTIONS | XLHP_HAS_DEAD_ITEMS)) == 0);
+
+ /*
+ * We are about to remove and/or freeze tuples. In Hot Standby mode,
+ * ensure that there are no queries running for which the removed tuples
+ * are still visible or which still consider the frozen xids as running.
+ * The conflict horizon XID comes after xl_heap_prune.
+ */
+ if (InHotStandby && (xlrec->flags & XLHP_HAS_CONFLICT_HORIZON) != 0)
+ {
My attached patch has a TODO here for the comment. It sticks out that
the serialization and deserialization conditions are different for the
snapshot conflict horizon. We don't deserialize if InHotStandby is
false. That's still correct now because we don't write anything else to
the main data chunk afterward. But, if someone were to add another
member after snapshot_conflict_horizon, they would want to know to
deserialize snapshot_conflict_horizon first even if InHotStandby is
false.
+ TransactionId snapshot_conflict_horizon;
+
I would throw a comment in about the memcpy being required because the
snapshot_conflict_horizon is in the buffer unaligned.
+ memcpy(&snapshot_conflict_horizon, ptr, sizeof(TransactionId));
+ ResolveRecoveryConflictWithSnapshot(snapshot_conflict_horizon,
+ (xlrec->flags & XLHP_IS_CATALOG_REL) != 0,
+ rlocator);
+ }

/*
+
+/*
+ * Given a MAXALIGNed buffer returned by XLogRecGetBlockData() and pointed to
+ * by cursor and any xl_heap_prune flags, deserialize the arrays of
+ * OffsetNumbers contained in an XLOG_HEAP2_PRUNE_FREEZE record.
+ *
+ * This is in heapdesc.c so it can be shared between heap2_redo and heap2_desc
+ * code, the latter of which is used in frontend (pg_waldump) code.
+ */
+void
+heap_xlog_deserialize_prune_and_freeze(char *cursor, uint8 flags,
+ int *nredirected, OffsetNumber **redirected,
+ int *ndead, OffsetNumber **nowdead,
+ int *nunused, OffsetNumber **nowunused,
+ int *nplans, xlhp_freeze_plan **plans,
+ OffsetNumber **frz_offsets)
+{
+ if (flags & XLHP_HAS_FREEZE_PLANS)
+ {
+ xlhp_freeze_plans *freeze_plans = (xlhp_freeze_plans *) cursor;
+
+ *nplans = freeze_plans->nplans;
+ Assert(*nplans > 0);
+ *plans = freeze_plans->plans;
+
+ cursor += offsetof(xlhp_freeze_plans, plans);
+ cursor += sizeof(xlhp_freeze_plan) * *nplans;
+ }
I noticed you decided to set these in the else statements. Is that to
emphasize that it is important to correctness that they be properly
initialized?
+ else
+ {
+ *nplans = 0;
+ *plans = NULL;
+ }
+
Thanks again for all your work on this set!
- Melanie
Attachments:
log_updates.diff (text/x-diff; charset=us-ascii)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index a09ef75ac37..fb72d16c113 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -8594,7 +8594,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
{
XLogRecPtr lsn = record->EndRecPtr;
char *ptr;
- xl_heap_prune *xlrec;
+ xl_heap_prune xlrec;
Buffer buffer;
RelFileLocator rlocator;
BlockNumber blkno;
@@ -8602,8 +8602,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
XLogRecGetBlockTag(record, 0, &rlocator, NULL, &blkno);
ptr = XLogRecGetData(record);
- xlrec = (xl_heap_prune *) ptr;
- ptr += SizeOfHeapPrune;
+ memcpy(&xlrec, ptr, SizeOfHeapPrune);
/*
* We will take an ordinary exclusive lock or a cleanup lock depending on
@@ -8611,22 +8610,24 @@ heap_xlog_prune_freeze(XLogReaderState *record)
* lock, we better not be doing anything that requires moving existing
* tuple data.
*/
- Assert((xlrec->flags & XLHP_CLEANUP_LOCK) != 0 ||
- (xlrec->flags & (XLHP_HAS_REDIRECTIONS | XLHP_HAS_DEAD_ITEMS)) == 0);
+ Assert((xlrec.flags & XLHP_CLEANUP_LOCK) != 0 ||
+ (xlrec.flags & (XLHP_HAS_REDIRECTIONS | XLHP_HAS_DEAD_ITEMS)) == 0);
/*
* We are about to remove and/or freeze tuples. In Hot Standby mode,
* ensure that there are no queries running for which the removed tuples
* are still visible or which still consider the frozen xids as running.
* The conflict horizon XID comes after xl_heap_prune.
+ * TODO: comment about deserialization conditions differing
*/
- if (InHotStandby && (xlrec->flags & XLHP_HAS_CONFLICT_HORIZON) != 0)
+ if (InHotStandby && (xlrec.flags & XLHP_HAS_CONFLICT_HORIZON) != 0)
{
TransactionId snapshot_conflict_horizon;
- memcpy(&snapshot_conflict_horizon, ptr, sizeof(TransactionId));
+ // TODO: comment about unaligned so must memcpy
+ memcpy(&snapshot_conflict_horizon, ptr + SizeOfHeapPrune, sizeof(TransactionId));
ResolveRecoveryConflictWithSnapshot(snapshot_conflict_horizon,
- (xlrec->flags & XLHP_IS_CATALOG_REL) != 0,
+ (xlrec.flags & XLHP_IS_CATALOG_REL) != 0,
rlocator);
}
@@ -8634,7 +8635,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
* If we have a full-page image, restore it and we're done.
*/
action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
- (xlrec->flags & XLHP_CLEANUP_LOCK) != 0,
+ (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
&buffer);
if (action == BLK_NEEDS_REDO)
{
@@ -8651,7 +8652,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
OffsetNumber *frz_offsets;
char *dataptr = XLogRecGetBlockData(record, 0, &datalen);
- heap_xlog_deserialize_prune_and_freeze(dataptr, xlrec->flags,
+ heap_xlog_deserialize_prune_and_freeze(dataptr, xlrec.flags,
&nredirected, &redirected,
&ndead, &nowdead,
&nunused, &nowunused,
@@ -8663,7 +8664,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
*/
if (nredirected > 0 || ndead > 0 || nunused > 0)
heap_page_prune_execute(buffer,
- (xlrec->flags & XLHP_CLEANUP_LOCK) == 0,
+ (xlrec.flags & XLHP_CLEANUP_LOCK) == 0,
redirected, nredirected,
nowdead, ndead,
nowunused, nunused);
@@ -8715,7 +8716,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
*/
if (BufferIsValid(buffer))
{
- if (xlrec->flags & (XLHP_HAS_REDIRECTIONS |
+ if (xlrec.flags & (XLHP_HAS_REDIRECTIONS |
XLHP_HAS_DEAD_ITEMS |
XLHP_HAS_NOW_UNUSED_ITEMS))
{
On 24/03/2024 21:55, Melanie Plageman wrote:
On Sat, Mar 23, 2024 at 01:09:30AM +0200, Heikki Linnakangas wrote:
On 20/03/2024 21:17, Melanie Plageman wrote:
There is another patch in the commitfest that touches this area: "Recording
whether Heap2/PRUNE records are from VACUUM or from opportunistic pruning"
[1]. That actually goes in the opposite direction than this patch. That
patch wants to add more information, to show whether a record was emitted by
VACUUM or on-access pruning, while this patch makes the freezing and
VACUUM's 2nd phase records also look the same. We could easily add more
flags to xl_heap_prune to distinguish them. Or assign different xl_info code
for them, like that other patch proposed. But I don't think that needs to
block this patch, that can be added as a separate patch.

[1] /messages/by-id/CAH2-Wzmsevhox+HJpFmQgCxWWDgNzP0R9F+VBnpOK6TgxMPrRw@mail.gmail.com
I think it would be very helpful to distinguish amongst vacuum pass 1,
2, and on-access pruning. I often want to know what did most of the
pruning -- and I could see also wanting to know if the first or second
vacuum pass was responsible for removing the tuples. I agree it could be
done separately, but it would be very helpful to have as soon as
possible now that the record type will be the same for all three.
Ok, I used separate 'info' codes for records generated on on-access
pruning and vacuum's 1st and 2nd pass. Similar to Peter's patch on that
other thread, except that I didn't reserve the whole high bit for this,
but used three different 'info' codes. Freezing uses the same
XLOG_HEAP2_PRUNE_VACUUM_SCAN as the pruning in vacuum's 1st pass. You
can distinguish them based on whether the record has nfrozen > 0, and
with the rest of the patches, they will be merged anyway.
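For reference, the record layout side of that looks roughly like the following;
only XLOG_HEAP2_PRUNE_VACUUM_SCAN is named above, so the other two identifiers
and the exact values are illustrative assumptions rather than the committed names:

/* One 'info' code per originator of the combined prune/freeze record. */
#define XLOG_HEAP2_PRUNE_ON_ACCESS      0x10    /* assumed name */
#define XLOG_HEAP2_PRUNE_VACUUM_SCAN    0x20    /* named in this message */
#define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP 0x30    /* assumed name */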
From 042185d3de14dcb7088bbe50e9c64e365ac42c2a Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Fri, 22 Mar 2024 23:10:22 +0200
Subject: [PATCH v6] Merge prune, freeze and vacuum WAL record formats

 /*
- * Handles XLOG_HEAP2_PRUNE record type.
- *
- * Acquires a full cleanup lock.
+ * Replay XLOG_HEAP2_PRUNE_FREEZE record.
 */
 static void
-heap_xlog_prune(XLogReaderState *record)
+heap_xlog_prune_freeze(XLogReaderState *record)
 {
 XLogRecPtr lsn = record->EndRecPtr;
- xl_heap_prune *xlrec = (xl_heap_prune *) XLogRecGetData(record);
+ char *ptr;
+ xl_heap_prune *xlrec;
 Buffer buffer;
 RelFileLocator rlocator;
 BlockNumber blkno;
 XLogRedoAction action;

 XLogRecGetBlockTag(record, 0, &rlocator, NULL, &blkno);
+ ptr = XLogRecGetData(record);

I don't love having two different pointers that alias each other and we
don't know which one is for what. Perhaps we could memcpy xlrec like in
my attached diff (log_updates.diff). It also might perform slightly
better than accessing flags through a xl_heap_prune
* -- since it wouldn't be doing pointer dereferencing.
Ok.
 /*
- * We're about to remove tuples. In Hot Standby mode, ensure that there's
- * no queries running for which the removed tuples are still visible.
+ * We will take an ordinary exclusive lock or a cleanup lock depending on
+ * whether the XLHP_CLEANUP_LOCK flag is set. With an ordinary exclusive
+ * lock, we better not be doing anything that requires moving existing
+ * tuple data.
 */
- if (InHotStandby)
- ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
- xlrec->isCatalogRel,
+ Assert((xlrec->flags & XLHP_CLEANUP_LOCK) != 0 ||
+ (xlrec->flags & (XLHP_HAS_REDIRECTIONS | XLHP_HAS_DEAD_ITEMS)) == 0);
+
+ /*
+ * We are about to remove and/or freeze tuples. In Hot Standby mode,
+ * ensure that there are no queries running for which the removed tuples
+ * are still visible or which still consider the frozen xids as running.
+ * The conflict horizon XID comes after xl_heap_prune.
+ */
+ if (InHotStandby && (xlrec->flags & XLHP_HAS_CONFLICT_HORIZON) != 0)
+ {

My attached patch has a TODO here for the comment. It sticks out that
the serialization and deserialization conditions are different for the
snapshot conflict horizon. We don't deserialize if InHotStandby is
false. That's still correct now because we don't write anything else to
the main data chunk afterward. But, if someone were to add another
member after snapshot_conflict_horizon, they would want to know to
deserialize snapshot_conflict_horizon first even if InHotStandby is
false.
Good point. Fixed so that 'snapshot_conflict_horizon' is deserialized
even if !InHotStandby. A memcpy is cheap, and is probably optimized away
by the compiler anyway.
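So the redo path now reads the XID unconditionally and only acts on it in Hot
Standby, roughly like this (a sketch, not necessarily the exact committed hunk):

    if ((xlrec.flags & XLHP_HAS_CONFLICT_HORIZON) != 0)
    {
        TransactionId snapshot_conflict_horizon;

        /* memcpy because the XID is stored unaligned after the main struct */
        memcpy(&snapshot_conflict_horizon, ptr, sizeof(TransactionId));
        ptr += sizeof(TransactionId);

        if (InHotStandby)
            ResolveRecoveryConflictWithSnapshot(snapshot_conflict_horizon,
                                                (xlrec.flags & XLHP_IS_CATALOG_REL) != 0,
                                                rlocator);
    }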
+ TransactionId snapshot_conflict_horizon;
+

I would throw a comment in about the memcpy being required because the
snapshot_conflict_horizon is in the buffer unaligned.
Added.
+/*
+ * Given a MAXALIGNed buffer returned by XLogRecGetBlockData() and pointed to
+ * by cursor and any xl_heap_prune flags, deserialize the arrays of
+ * OffsetNumbers contained in an XLOG_HEAP2_PRUNE_FREEZE record.
+ *
+ * This is in heapdesc.c so it can be shared between heap2_redo and heap2_desc
+ * code, the latter of which is used in frontend (pg_waldump) code.
+ */
+void
+heap_xlog_deserialize_prune_and_freeze(char *cursor, uint8 flags,
+ int *nredirected, OffsetNumber **redirected,
+ int *ndead, OffsetNumber **nowdead,
+ int *nunused, OffsetNumber **nowunused,
+ int *nplans, xlhp_freeze_plan **plans,
+ OffsetNumber **frz_offsets)
+{
+ if (flags & XLHP_HAS_FREEZE_PLANS)
+ {
+ xlhp_freeze_plans *freeze_plans = (xlhp_freeze_plans *) cursor;
+
+ *nplans = freeze_plans->nplans;
+ Assert(*nplans > 0);
+ *plans = freeze_plans->plans;
+
+ cursor += offsetof(xlhp_freeze_plans, plans);
+ cursor += sizeof(xlhp_freeze_plan) * *nplans;
+ }

I noticed you decided to set these in the else statements. Is that to
emphasize that it is important to correctness that they be properly
initialized?
The point was to always initialize *nplans et al in the function. You
required the caller to initialize them to zero, but that seems error-prone.
I made one more last minute change: I changed the order of the array
arguments in heap_xlog_deserialize_prune_and_freeze() to match the order
in log_heap_prune_and_freeze().
Committed with the above little changes. Thank you! Now, back to the
rest of the patches :-).
--
Heikki Linnakangas
Neon (https://neon.tech)
On 24/03/2024 18:32, Melanie Plageman wrote:
On Thu, Mar 21, 2024 at 9:28 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
In heap_page_prune_and_freeze(), we now do some extra work on each live
tuple, to set the all_visible_except_removable correctly. And also to
update live_tuples, recently_dead_tuples and hastup. When we're not
freezing, that's a waste of cycles, the caller doesn't care. I hope it's
enough that it doesn't matter, but is it?

Last year on an early version of the patch set I did some pgbench
tpcb-like benchmarks -- since there is a lot of on-access pruning in
that workload -- and I don't remember it being a showstopper. The code
has changed a fair bit since then. However, I think it might be safer
to pass a flag "by_vacuum" to heap_page_prune_and_freeze() and skip
the rest of the loop after heap_prune_satisfies_vacuum() when
on-access pruning invokes it. I had avoided that because it felt ugly
and error-prone, however it addresses a few other of your points as
well.
Ok. I'm not a fan of the name 'by_vacuum' though. It'd be nice if the
argument described what it does, rather than who it's for. For example,
'need_all_visible'. If set to true, the function determines
'all_visible', otherwise it does not.
I started to look closer at the loops in heap_prune_chain() and how they
update all the various flags and counters. There's a lot going on there.
We have:
- live_tuples counter
- recently_dead_tuples counter
- all_visible[_except_removable]
- all_frozen
- visibility_cutoff_xid
- hastup
- prstate.frozen array
- nnewlpdead
- deadoffsets array
And that doesn't even include all the local variables and the final
dead/redirected arrays.
Some of those are set in the first loop that initializes 'htsv' for each
tuple on the page. Others are updated in heap_prune_chain(). Some are
updated in both. It's hard to follow which are set where.
I think recently_dead_tuples is updated incorrectly, for tuples that are
part of a completely dead HOT chain. For example, imagine a hot chain
with two tuples: RECENTLY_DEAD -> DEAD. heap_prune_chain() would follow
the chain, see the DEAD tuple at the end of the chain, and mark both
tuples for pruning. However, we already updated 'recently_dead_tuples'
in the first loop, which is wrong if we remove the tuple.
Maybe that's the only bug like this, but I'm a little scared. Is there
something we could do to make this simpler? Maybe move all the new work
that we added to the first loop, into heap_prune_chain() ? Maybe
introduce a few more helper heap_prune_record_*() functions, to update
the flags and counters also for live and insert/delete-in-progress
tuples and for dead line pointers? Something like
heap_prune_record_live() and heap_prune_record_lp_dead().
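For instance, heap_prune_record_live() could look roughly like this (a sketch
only -- the field names and which struct each counter lives in are
illustrative, not a worked-out patch):

/*
 * Funnel the per-tuple bookkeeping for live tuples through one helper so
 * each item is counted exactly once, whichever loop reaches it.
 */
static void
heap_prune_record_live(PruneState *prstate, PruneFreezeResult *presult,
                       HeapTupleHeader htup)
{
    TransactionId xmin = HeapTupleHeaderGetXmin(htup);

    presult->live_tuples++;
    presult->hastup = true;

    /* Track the newest xmin among live tuples for the VM conflict horizon. */
    if (TransactionIdIsNormal(xmin) &&
        TransactionIdFollows(xmin, prstate->visibility_cutoff_xid))
        prstate->visibility_cutoff_xid = xmin;
}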
The 'frz_conflict_horizon' stuff is still fuzzy to me. (Not necessarily
these patches's fault). This at least is wrong, because Max(a, b)
doesn't handle XID wraparound correctly:

if (do_freeze)
    conflict_xid = Max(prstate.snapshotConflictHorizon,
                       presult->frz_conflict_horizon);
else
    conflict_xid = prstate.snapshotConflictHorizon;

Then there's this in lazy_scan_prune():

/* Using same cutoff when setting VM is now unnecessary */
if (presult.all_frozen)
    presult.frz_conflict_horizon = InvalidTransactionId;

This does the right thing in the end, but if all the tuples are frozen
shouldn't frz_conflict_horizon already be InvalidTransactionId? The
comment says it's "newest xmin on the page", and if everything was
frozen, all xmins are FrozenTransactionId. In other words, that should
be moved to heap_page_prune_and_freeze() so that it doesn't lie to its
caller. Also, frz_conflict_horizon is only set correctly if
'all_frozen==true', would be good to mention that in the comments too.Yes, this is a good point. I've spent some time swapping all of this
back into my head. I think we should change the names of all these
conflict horizon variables and introduce some local variables again.
In the attached patch, I've updated the name of the variable in
PruneFreezeResult to vm_conflict_horizon, as it is only used for
emitting a VM update record. Now, I don't set it until the end of
heap_page_prune_and_freeze(). It is only updated from
InvalidTransactionId if the page is not all frozen. As you say, if the
page is all frozen, there can be no conflict.
Makes sense.
I've also changed PruneState->snapshotConflictHorizon to
PruneState->latest_xid_removed.

And I introduced the local variables visibility_cutoff_xid and
frz_conflict_horizon. I think it is important we distinguish between
the latest xid pruned, the latest xmin of tuples frozen, and the
latest xid of all live tuples on the page.

Though we end up using visibility_cutoff_xid as the freeze conflict
horizon if the page is all frozen, I think it is more clear to have
the three variables and name them what they are.
Agreed.
Note that I still don't think we have a resolution on what to
correctly update new_relfrozenxid and new_relminmxid to at the end
when presult->nfrozen == 0 and presult->all_frozen is true.

if (presult->nfrozen > 0)
{
presult->new_relfrozenxid = pagefrz->FreezePageRelfrozenXid;
presult->new_relminmxid = pagefrz->FreezePageRelminMxid;
}
else
{
presult->new_relfrozenxid = pagefrz->NoFreezePageRelfrozenXid;
presult->new_relminmxid = pagefrz->NoFreezePageRelminMxid;
}
One approach is to take them out of the PageFreezeResult struct again,
and pass them as pointers:
void
heap_page_prune_and_freeze(Relation relation, Buffer buffer,
...
TransactionId *new_relfrozenxid,
MultiXactId *new_relminmxid,
...
)
That would be natural for the caller too, as it wouldn't need to set up
the old values to HeapPageFreeze before each call, nor copy the new
values to 'vacrel' after the call. I'm thinking that we'd move the
responsibility of setting up HeapPageFreeze to
heap_page_prune_and_freeze(), instead of having the caller set it up. So
the caller would look something like this:
heap_page_prune_and_freeze(rel, buf, vacrel->vistest,
&vacrel->cutoffs, &vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid,
&presult,
PRUNE_VACUUM_SCAN, flags,
&vacrel->offnum);
In this setup, heap_page_prune_and_freeze() would update
*new_relfrozenxid and *new_relminmxid when it has a new value for them,
and leave them unchanged otherwise.
--
Heikki Linnakangas
Neon (https://neon.tech)
Thanks for committing the new WAL format!
On Mon, Mar 25, 2024 at 3:33 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
On 24/03/2024 18:32, Melanie Plageman wrote:
On Thu, Mar 21, 2024 at 9:28 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
In heap_page_prune_and_freeze(), we now do some extra work on each live
tuple, to set the all_visible_except_removable correctly. And also to
update live_tuples, recently_dead_tuples and hastup. When we're not
freezing, that's a waste of cycles, the caller doesn't care. I hope it's
enough that it doesn't matter, but is it?

Last year on an early version of the patch set I did some pgbench
tpcb-like benchmarks -- since there is a lot of on-access pruning in
that workload -- and I don't remember it being a showstopper. The code
has changed a fair bit since then. However, I think it might be safer
to pass a flag "by_vacuum" to heap_page_prune_and_freeze() and skip
the rest of the loop after heap_prune_satisfies_vacuum() when
on-access pruning invokes it. I had avoided that because it felt ugly
and error-prone, however it addresses a few other of your points as
well.

Ok. I'm not a fan of the name 'by_vacuum' though. It'd be nice if the
argument described what it does, rather than who it's for. For example,
'need_all_visible'. If set to true, the function determines
'all_visible', otherwise it does not.
I like that way of putting it -- describing what it does instead of
who it is for. However, we now have PruneReason as an argument to
heap_page_prune(), which would be usable for this purpose (for
skipping the rest of the first loop). It is not descriptive of how we
would use it in this scenario, though.
I started to look closer at the loops in heap_prune_chain() and how they
update all the various flags and counters. There's a lot going on there.
We have:

- live_tuples counter
- recently_dead_tuples counter
- all_visible[_except_removable]
- all_frozen
- visibility_cutoff_xid
- hastup
- prstate.frozen array
- nnewlpdead
- deadoffsets array

And that doesn't even include all the local variables and the final
dead/redirected arrays.
Yes, there are a lot of things happening. In an early version, I had
hoped for the first loop to be just getting the visibility information
and then to do most of the other stuff as we went in
heap_prune_chain() as you mention below. I couldn't quite get a
version of that working that looked nice. I agree that the whole thing
feels a bit brittle and error-prone. It's hard to be objective after
fiddling with something over the course of a year. I'm trying to take
a step back now and rethink it.
Some of those are set in the first loop that initializes 'htsv' for each
tuple on the page. Others are updated in heap_prune_chain(). Some are
updated in both. It's hard to follow which are set where.
Yep.
I think recently_dead_tuples is updated incorrectly, for tuples that are
part of a completely dead HOT chain. For example, imagine a hot chain
with two tuples: RECENTLY_DEAD -> DEAD. heap_prune_chain() would follow
the chain, see the DEAD tuple at the end of the chain, and mark both
tuples for pruning. However, we already updated 'recently_dead_tuples'
in the first loop, which is wrong if we remove the tuple.

Maybe that's the only bug like this, but I'm a little scared. Is there
something we could do to make this simpler? Maybe move all the new work
that we added to the first loop, into heap_prune_chain() ? Maybe
introduce a few more helper heap_prune_record_*() functions, to update
the flags and counters also for live and insert/delete-in-progress
tuples and for dead line pointers? Something like
heap_prune_record_live() and heap_prune_record_lp_dead().
I had discarded previous attempts to get everything done in
heap_prune_chain() because it was hard to make sure I was doing the
right thing given that it visits the line pointers out of order so
making sure you've considered all of them once and only once was hard.
I hadn't thought of the approach you suggested with record_live() --
that might help. I will work on this tomorrow. I had hoped to get
something out today, but I am still in the middle of rebasing the back
20 patches from your v5 over current master and then adding in the
suggestions that I made in the various diffs on the thread.
Note that I still don't think we have a resolution on what to
correctly update new_relfrozenxid and new_relminmxid to at the end
when presult->nfrozen == 0 and presult->all_frozen is true.

if (presult->nfrozen > 0)
{
presult->new_relfrozenxid = pagefrz->FreezePageRelfrozenXid;
presult->new_relminmxid = pagefrz->FreezePageRelminMxid;
}
else
{
presult->new_relfrozenxid = pagefrz->NoFreezePageRelfrozenXid;
presult->new_relminmxid = pagefrz->NoFreezePageRelminMxid;
}

One approach is to take them out of the PageFreezeResult struct again,
and pass them as pointers:

void
heap_page_prune_and_freeze(Relation relation, Buffer buffer,
...
TransactionId *new_relfrozenxid,
MultiXactId *new_relminmxid,
...
)
What about the question about whether or not we should be using
FreezePageRelfrozenXid when all_frozen is true and nfrozen == 0? I
was confused about whether or not we had to do this by the comment in
HeapPageFreeze.
That would be natural for the caller too, as it wouldn't need to set up
the old values to HeapPageFreeze before each call, nor copy the new
values to 'vacrel' after the call. I'm thinking that we'd move the
responsibility of setting up HeapPageFreeze to
heap_page_prune_and_freeze(), instead of having the caller set it up. So
the caller would look something like this:heap_page_prune_and_freeze(rel, buf, vacrel->vistest,
&vacrel->cutoffs, &vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid,
&presult,
PRUNE_VACUUM_SCAN, flags,
&vacrel->offnum);In this setup, heap_page_prune_and_freeze() would update
*new_relfrozenxid and *new_relminmxid when it has a new value for them,
and leave them unchanged otherwise.
I do prefer having heap_page_prune_and_freeze() own HeapPageFreeze.
- Melanie
On Mon, Mar 25, 2024 at 09:33:38PM +0200, Heikki Linnakangas wrote:
On 24/03/2024 18:32, Melanie Plageman wrote:
On Thu, Mar 21, 2024 at 9:28 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
In heap_page_prune_and_freeze(), we now do some extra work on each live
tuple, to set the all_visible_except_removable correctly. And also to
update live_tuples, recently_dead_tuples and hastup. When we're not
freezing, that's a waste of cycles, the caller doesn't care. I hope it's
enough that it doesn't matter, but is it?

Last year on an early version of the patch set I did some pgbench
tpcb-like benchmarks -- since there is a lot of on-access pruning in
that workload -- and I don't remember it being a showstopper. The code
has changed a fair bit since then. However, I think it might be safer
to pass a flag "by_vacuum" to heap_page_prune_and_freeze() and skip
the rest of the loop after heap_prune_satisfies_vacuum() when
on-access pruning invokes it. I had avoided that because it felt ugly
and error-prone, however it addresses a few other of your points as
well.

Ok. I'm not a fan of the name 'by_vacuum' though. It'd be nice if the
argument described what it does, rather than who it's for. For example,
'need_all_visible'. If set to true, the function determines 'all_visible',
otherwise it does not.
A very rough v7 is attached. The whole thing is rebased over master and
then 0016 contains an attempt at the refactor we discussed in this
email.
Instead of just using the PruneReason to avoid doing the extra steps
when on-access pruning calls heap_page_prune_and_freeze(), I've made an
"actions" variable and defined different flags for it. One of them is
a replacement for the existing mark_unused_now flag. I defined another
one, PRUNE_DO_TRY_FREEZE, which could be used in place of checking if
pagefrz is NULL.
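Roughly like this (PRUNE_DO_TRY_FREEZE is the name I used; the name of the
mark-unused flag below is just a placeholder for the flag that replaces the
old mark_unused_now boolean):

/* Sketch of the "actions" bitmask passed to heap_page_prune_and_freeze(). */
#define PRUNE_DO_MARK_UNUSED_NOW (1 << 0)  /* mark LP_DEAD items LP_UNUSED */
#define PRUNE_DO_TRY_FREEZE      (1 << 1)  /* caller supplied a valid pagefrz */

Vacuum's first pass would pass PRUNE_DO_TRY_FREEZE (plus the mark-unused flag
when the table has no indexes); on-access pruning would pass neither.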
There is a whole group of activities that only the vacuum caller does
outside of freezing -- setting hastup, counting live and recently dead
tuples, determining whole page visibility and a snapshot conflict
horizon for updating the VM. But I didn't want to introduce separate
flags for each of them, because then I would have to check each of them
before taking the action. That would be lots of extra branching and
on-access pruning does none of those actions while vacuum does all of
them.
I started to look closer at the loops in heap_prune_chain() and how they
update all the various flags and counters. There's a lot going on there. We
have:

- live_tuples counter
- recently_dead_tuples counter
- all_visible[_except_removable]
- all_frozen
- visibility_cutoff_xid
- hastup
- prstate.frozen array
- nnewlpdead
- deadoffsets array

And that doesn't even include all the local variables and the final
dead/redirected arrays.

Some of those are set in the first loop that initializes 'htsv' for each
tuple on the page. Others are updated in heap_prune_chain(). Some are
updated in both. It's hard to follow which are set where.

I think recently_dead_tuples is updated incorrectly, for tuples that are
part of a completely dead HOT chain. For example, imagine a hot chain with
two tuples: RECENTLY_DEAD -> DEAD. heap_prune_chain() would follow the
chain, see the DEAD tuple at the end of the chain, and mark both tuples for
pruning. However, we already updated 'recently_dead_tuples' in the first
loop, which is wrong if we remove the tuple.
Ah, yes, you are so right about this bug.
Maybe that's the only bug like this, but I'm a little scared. Is there
something we could do to make this simpler? Maybe move all the new work that
we added to the first loop, into heap_prune_chain() ? Maybe introduce a few
more helper heap_prune_record_*() functions, to update the flags and
counters also for live and insert/delete-in-progress tuples and for dead
line pointers? Something like heap_prune_record_live() and
heap_prune_record_lp_dead().
I like the idea of a heap_prune_record_live_or_recently_dead() function.
That's what I've attempted to implement in the attached 0016. I haven't
updated and cleaned up everything (especially comments) in the refactor,
but there are two major issues:
1) In heap_prune_chain(), a heap-only tuple which is not HOT updated may
end up being a live tuple not part of any chain or it may end up the
redirect target in a HOT chain. At the top of heap_prune_chain(), we
return if (HeapTupleHeaderIsHeapOnly(htup)). We may come back to this
tuple later if it is part of a chain. If we don't, we need to have
called heap_prune_record_live_or_recently_dead(). However, there are
other tuples that get redirected to which do not meet this criteria, so
we must call heap_prune_record_live_or_recently_dead() when setting an
item redirected to. If we call heap_prune_record_live_or_recently_dead()
in both places, we will double-count. To fix this, I introduced an
array, "counted". But that takes up extra space in the PruneState and
extra cycles to memset it.
I can't think of a way to make sure we count the right tuples without
another array. The tuples we need to count are those not pointed to by
prstate->marked + those tuples whose line pointers will be redirected to
(those are marked).
2) A large number of the members of PruneFreezeResult are only
initialized for the vacuum caller now. Even with a comment, this is a
bit confusing. And, it seems like there should be some symmetry between
the actions the caller tells heap_page_prune_and_freeze() to take and
the result parameters that are filled in.
I am concerned about adding all of the actions (setting hastup,
determining whole page visibility, etc as mentioned above) because then
I also have to check all the actions and that will add extra branching.
And out of the two callers of heap_page_prune_and_freeze(), one will do
all of the actions and one will do none of them except "main" pruning.
Note that I still don't think we have a resolution on what to
correctly update new_relfrozenxid and new_relminmxid to at the end
when presult->nfrozen == 0 and presult->all_frozen is true.

if (presult->nfrozen > 0)
{
presult->new_relfrozenxid = pagefrz->FreezePageRelfrozenXid;
presult->new_relminmxid = pagefrz->FreezePageRelminMxid;
}
else
{
presult->new_relfrozenxid = pagefrz->NoFreezePageRelfrozenXid;
presult->new_relminmxid = pagefrz->NoFreezePageRelminMxid;
}

One approach is to take them out of the PageFreezeResult struct again, and
pass them as pointers:

void
heap_page_prune_and_freeze(Relation relation, Buffer buffer,
...
TransactionId *new_relfrozenxid,
MultiXactId *new_relminmxid,
...
)

That would be natural for the caller too, as it wouldn't need to set up the
old values to HeapPageFreeze before each call, nor copy the new values to
'vacrel' after the call. I'm thinking that we'd move the responsibility of
setting up HeapPageFreeze to heap_page_prune_and_freeze(), instead of having
the caller set it up. So the caller would look something like this:heap_page_prune_and_freeze(rel, buf, vacrel->vistest,
&vacrel->cutoffs, &vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid,
&presult,
PRUNE_VACUUM_SCAN, flags,
&vacrel->offnum);In this setup, heap_page_prune_and_freeze() would update *new_relfrozenxid
and *new_relminmxid when it has a new value for them, and leave them
unchanged otherwise.
I've passed new_relfrozen_xid and new_relmin_mxid as arguments.
But as for only updating them when there is a new value, that doesn't
sound cheaper than just setting them, whenever they are passed in, to the
values from [No]FreezePageRelfrozenXid and [No]FreezePageRelminMxid --
unless you are imagining a way to simplify the current
[No]FreezePageRelfrozenXid / [No]FreezePageRelminMxid logic.
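In other words, unconditionally doing something like the following at the
end of heap_page_prune_and_freeze() (a sketch, assuming the pointer-argument
version above) seems as cheap as anything else:

    if (presult->nfrozen > 0)
    {
        *new_relfrozenxid = pagefrz->FreezePageRelfrozenXid;
        *new_relminmxid = pagefrz->FreezePageRelminMxid;
    }
    else
    {
        *new_relfrozenxid = pagefrz->NoFreezePageRelfrozenXid;
        *new_relminmxid = pagefrz->NoFreezePageRelminMxid;
    }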
- Melanie
Attachments:
v7-0001-lazy_scan_prune-tests-tuple-vis-with-GlobalVisTes.patch
From 454c4e3b7eedcc97dc107eec2f9418193ccc9efc Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 6 Jan 2024 13:14:47 -0500
Subject: [PATCH v7 01/16] lazy_scan_prune tests tuple vis with GlobalVisTest
One requirement for eventually combining the prune and freeze records
is that we must check during pruning if live tuples on the page are
visible to everyone and thus, whether or not the page is all visible. We
only consider opportunistically freezing tuples if the whole page is all
visible and could be set all frozen.
During pruning (in heap_page_prune()), we do not have access to
VacuumCutoffs -- as on access pruning also calls heap_page_prune(). We
do, however, have access to a GlobalVisState. This can be used to
determine whether or not the tuple is visible to everyone. It also has
the potential of being more up-to-date than VacuumCutoffs->OldestXmin.
This commit simply modifies lazy_scan_prune() to use GlobalVisState
instead of OldestXmin. Future commits will move the
all_visible/all_frozen calculation into heap_page_prune().
---
src/backend/access/heap/vacuumlazy.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index ba5b7083a3a..a7451743e25 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1579,11 +1579,15 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* The inserter definitely committed. But is it old enough
- * that everyone sees it as committed?
+ * that everyone sees it as committed? A
+ * FrozenTransactionId is seen as committed to everyone.
+ * Otherwise, we check if there is a snapshot that
+ * considers this xid to still be running, and if so, we
+ * don't consider the page all-visible.
*/
xmin = HeapTupleHeaderGetXmin(htup);
- if (!TransactionIdPrecedes(xmin,
- vacrel->cutoffs.OldestXmin))
+ if (xmin != FrozenTransactionId &&
+ !GlobalVisTestIsRemovableXid(vacrel->vistest, xmin))
{
all_visible = false;
break;
--
2.40.1
v7-0002-Pass-heap_prune_chain-PruneResult-output-paramete.patch
From fd5c22d54df25bcf091024390730cecdf53c3aa4 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 6 Jan 2024 13:39:59 -0500
Subject: [PATCH v7 02/16] Pass heap_prune_chain() PruneResult output parameter
Future commits will set other members of PruneResult in
heap_prune_chain(), so start passing it as an output parameter now. This
eliminates the output parameter htsv -- the array of HTSV_Results --
since that is a member of the PruneResult.
---
src/backend/access/heap/pruneheap.c | 15 +++++++--------
1 file changed, 7 insertions(+), 8 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 4e58c2c2ff4..c1542b95af8 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -59,8 +59,7 @@ static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
Buffer buffer);
static int heap_prune_chain(Buffer buffer,
OffsetNumber rootoffnum,
- int8 *htsv,
- PruneState *prstate);
+ PruneState *prstate, PruneResult *presult);
static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
static void heap_prune_record_redirect(PruneState *prstate,
OffsetNumber offnum, OffsetNumber rdoffnum);
@@ -325,7 +324,7 @@ heap_page_prune(Relation relation, Buffer buffer,
/* Process this item or chain of items */
presult->ndeleted += heap_prune_chain(buffer, offnum,
- presult->htsv, &prstate);
+ &prstate, presult);
}
/* Clear the offset information once we have processed the given page. */
@@ -427,7 +426,7 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
/*
* Prune specified line pointer or a HOT chain originating at line pointer.
*
- * Tuple visibility information is provided in htsv.
+ * Tuple visibility information is provided in presult->htsv.
*
* If the item is an index-referenced tuple (i.e. not a heap-only tuple),
* the HOT chain is pruned by removing all DEAD tuples at the start of the HOT
@@ -457,7 +456,7 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
*/
static int
heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
- int8 *htsv, PruneState *prstate)
+ PruneState *prstate, PruneResult *presult)
{
int ndeleted = 0;
Page dp = (Page) BufferGetPage(buffer);
@@ -478,7 +477,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
*/
if (ItemIdIsNormal(rootlp))
{
- Assert(htsv[rootoffnum] != -1);
+ Assert(presult->htsv[rootoffnum] != -1);
htup = (HeapTupleHeader) PageGetItem(dp, rootlp);
if (HeapTupleHeaderIsHeapOnly(htup))
@@ -501,7 +500,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
* either here or while following a chain below. Whichever path
* gets there first will mark the tuple unused.
*/
- if (htsv[rootoffnum] == HEAPTUPLE_DEAD &&
+ if (presult->htsv[rootoffnum] == HEAPTUPLE_DEAD &&
!HeapTupleHeaderIsHotUpdated(htup))
{
heap_prune_record_unused(prstate, rootoffnum);
@@ -598,7 +597,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
*/
tupdead = recent_dead = false;
- switch (htsv_get_valid_status(htsv[offnum]))
+ switch (htsv_get_valid_status(presult->htsv[offnum]))
{
case HEAPTUPLE_DEAD:
tupdead = true;
--
2.40.1
v7-0003-Rename-PruneState-snapshotConflictHorizon-to-late.patch
From 3fa6bd308c7ac1bebf0b04efb7e943029cd68f8d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 25 Mar 2024 18:02:09 -0400
Subject: [PATCH v7 03/16] Rename PruneState->snapshotConflictHorizon to
latest_xid_removed
In anticipation of combining pruning and freezing and emitting a single
WAL record, rename PruneState->snapshotConflictHorizon to
latest_xid_removed. After pruning and freezing, we will choose a
combined record snapshot conflict horizon taking into account both
values.
---
src/backend/access/heap/pruneheap.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index c1542b95af8..ca4301bb8a9 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -35,7 +35,7 @@ typedef struct
bool mark_unused_now;
TransactionId new_prune_xid; /* new prune hint value for page */
- TransactionId snapshotConflictHorizon; /* latest xid removed */
+ TransactionId latest_xid_removed;
int nredirected; /* numbers of entries in arrays below */
int ndead;
int nunused;
@@ -238,7 +238,7 @@ heap_page_prune(Relation relation, Buffer buffer,
prstate.new_prune_xid = InvalidTransactionId;
prstate.vistest = vistest;
prstate.mark_unused_now = mark_unused_now;
- prstate.snapshotConflictHorizon = InvalidTransactionId;
+ prstate.latest_xid_removed = InvalidTransactionId;
prstate.nredirected = prstate.ndead = prstate.nunused = 0;
memset(prstate.marked, 0, sizeof(prstate.marked));
@@ -367,7 +367,7 @@ heap_page_prune(Relation relation, Buffer buffer,
if (RelationNeedsWAL(relation))
{
log_heap_prune_and_freeze(relation, buffer,
- prstate.snapshotConflictHorizon,
+ prstate.latest_xid_removed,
true, reason,
NULL, 0,
prstate.redirected, prstate.nredirected,
@@ -505,7 +505,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
{
heap_prune_record_unused(prstate, rootoffnum);
HeapTupleHeaderAdvanceConflictHorizon(htup,
- &prstate->snapshotConflictHorizon);
+ &prstate->latest_xid_removed);
ndeleted++;
}
@@ -651,7 +651,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
{
latestdead = offnum;
HeapTupleHeaderAdvanceConflictHorizon(htup,
- &prstate->snapshotConflictHorizon);
+ &prstate->latest_xid_removed);
}
else if (!recent_dead)
break;
--
2.40.1
v7-0004-heap_page_prune-sets-all_visible-and-visibility_c.patch
From 550baedebd3bd9b98541ad267ac8face3b58fbe8 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 25 Mar 2024 18:31:24 -0400
Subject: [PATCH v7 04/16] heap_page_prune sets all_visible and
visibility_cutoff_xid
In order to combine the prune and freeze records, we must know if the
page is eligible to be opportunistically frozen before finishing
pruning. Save all_visible in the PruneResult and set it to false when we
see non-removable tuples which are not visible to everyone.
We will also need to ensure that the snapshotConflictHorizon for the combined
prune + freeze record is the more conservative of that calculated for each of
pruning and freezing. Calculate the visibility_cutoff_xid for the purposes of
freezing -- the newest xmin on the page -- in heap_page_prune() and save it in
PruneResult.visibility_cutoff_xid.
Note that these are only needed by vacuum callers of heap_page_prune(),
so don't update them for on-access pruning.
---
src/backend/access/heap/pruneheap.c | 131 +++++++++++++++++++++++++--
src/backend/access/heap/vacuumlazy.c | 113 +++++------------------
src/include/access/heapam.h | 21 +++++
3 files changed, 169 insertions(+), 96 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index ca4301bb8a9..5776ae84f4d 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -63,8 +63,10 @@ static int heap_prune_chain(Buffer buffer,
static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
static void heap_prune_record_redirect(PruneState *prstate,
OffsetNumber offnum, OffsetNumber rdoffnum);
-static void heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum);
-static void heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum);
+static void heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
+ PruneResult *presult);
+static void heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
+ PruneResult *presult);
static void heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum);
static void page_verify_redirects(Page page);
@@ -249,6 +251,14 @@ heap_page_prune(Relation relation, Buffer buffer,
presult->ndeleted = 0;
presult->nnewlpdead = 0;
+ /*
+ * Keep track of whether or not the page is all_visible in case the caller
+ * wants to use this information to update the VM.
+ */
+ presult->all_visible = true;
+ /* for recovery conflicts */
+ presult->visibility_cutoff_xid = InvalidTransactionId;
+
maxoff = PageGetMaxOffsetNumber(page);
tup.t_tableOid = RelationGetRelid(relation);
@@ -300,8 +310,101 @@ heap_page_prune(Relation relation, Buffer buffer,
presult->htsv[offnum] = heap_prune_satisfies_vacuum(&prstate, &tup,
buffer);
+
+ if (reason == PRUNE_ON_ACCESS)
+ continue;
+
+ switch (presult->htsv[offnum])
+ {
+ case HEAPTUPLE_DEAD:
+
+ /*
+ * Deliberately delay unsetting all_visible until later during
+ * pruning. Removable dead tuples shouldn't preclude freezing
+ * the page. After finishing this first pass of tuple
+ * visibility checks, initialize all_visible_except_removable
+ * with the current value of all_visible to indicate whether
+ * or not the page is all visible except for dead tuples. This
+ * will allow us to attempt to freeze the page after pruning.
+ * Later during pruning, if we encounter an LP_DEAD item or
+ * are setting an item LP_DEAD, we will unset all_visible. As
+ * long as we unset it before updating the visibility map,
+ * this will be correct.
+ */
+ break;
+ case HEAPTUPLE_LIVE:
+
+ /*
+ * Is the tuple definitely visible to all transactions?
+ *
+ * NB: Like with per-tuple hint bits, we can't set the
+ * PD_ALL_VISIBLE flag if the inserter committed
+ * asynchronously. See SetHintBits for more info. Check that
+ * the tuple is hinted xmin-committed because of that.
+ */
+ if (presult->all_visible)
+ {
+ TransactionId xmin;
+
+ if (!HeapTupleHeaderXminCommitted(htup))
+ {
+ presult->all_visible = false;
+ break;
+ }
+
+ /*
+ * The inserter definitely committed. But is it old enough
+ * that everyone sees it as committed? A
+ * FrozenTransactionId is seen as committed to everyone.
+ * Otherwise, we check if there is a snapshot that
+ * considers this xid to still be running, and if so, we
+ * don't consider the page all-visible.
+ */
+ xmin = HeapTupleHeaderGetXmin(htup);
+ if (xmin != FrozenTransactionId &&
+ !GlobalVisTestIsRemovableXid(vistest, xmin))
+ {
+ presult->all_visible = false;
+ break;
+ }
+
+ /* Track newest xmin on page. */
+ if (TransactionIdFollows(xmin, presult->visibility_cutoff_xid) &&
+ TransactionIdIsNormal(xmin))
+ presult->visibility_cutoff_xid = xmin;
+ }
+ break;
+ case HEAPTUPLE_RECENTLY_DEAD:
+ presult->all_visible = false;
+ break;
+ case HEAPTUPLE_INSERT_IN_PROGRESS:
+ presult->all_visible = false;
+ break;
+ case HEAPTUPLE_DELETE_IN_PROGRESS:
+ /* This is an expected case during concurrent vacuum */
+ presult->all_visible = false;
+ break;
+ default:
+ elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
+ break;
+ }
}
+ /*
+ * For vacuum, if the whole page will become frozen, we consider
+ * opportunistically freezing tuples. Dead tuples which will be removed by
+ * the end of vacuuming should not preclude us from opportunistically
+ * freezing. We will not be able to freeze the whole page if there are
+ * tuples present which are not visible to everyone or if there are dead
+ * tuples which are not yet removable. We need all_visible to be false if
+ * LP_DEAD tuples remain after pruning so that we do not incorrectly
+ * update the visibility map or page hint bit. So, we will update
+ * presult->all_visible to reflect the presence of LP_DEAD items while
+ * pruning and keep all_visible_except_removable to permit freezing if the
+ * whole page will eventually become all visible after removing tuples.
+ */
+ presult->all_visible_except_removable = presult->all_visible;
+
/* Scan the page */
for (offnum = FirstOffsetNumber;
offnum <= maxoff;
@@ -569,10 +672,14 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
/*
* If the caller set mark_unused_now true, we can set dead line
* pointers LP_UNUSED now. We don't increment ndeleted here since
- * the LP was already marked dead.
+ * the LP was already marked dead. If it will not be marked
+ * LP_UNUSED, it will remain LP_DEAD, making the page not
+ * all_visible.
*/
if (unlikely(prstate->mark_unused_now))
heap_prune_record_unused(prstate, offnum);
+ else
+ presult->all_visible = false;
break;
}
@@ -709,7 +816,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
* redirect the root to the correct chain member.
*/
if (i >= nchain)
- heap_prune_record_dead_or_unused(prstate, rootoffnum);
+ heap_prune_record_dead_or_unused(prstate, rootoffnum, presult);
else
heap_prune_record_redirect(prstate, rootoffnum, chainitems[i]);
}
@@ -722,7 +829,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
* redirect item. We can clean up by setting the redirect item to
* DEAD state or LP_UNUSED if the caller indicated.
*/
- heap_prune_record_dead_or_unused(prstate, rootoffnum);
+ heap_prune_record_dead_or_unused(prstate, rootoffnum, presult);
}
return ndeleted;
@@ -759,13 +866,20 @@ heap_prune_record_redirect(PruneState *prstate,
/* Record line pointer to be marked dead */
static void
-heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum)
+heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
+ PruneResult *presult)
{
Assert(prstate->ndead < MaxHeapTuplesPerPage);
prstate->nowdead[prstate->ndead] = offnum;
prstate->ndead++;
Assert(!prstate->marked[offnum]);
prstate->marked[offnum] = true;
+
+ /*
+ * Setting the line pointer LP_DEAD means the page will definitely not be
+ * all_visible.
+ */
+ presult->all_visible = false;
}
/*
@@ -775,7 +889,8 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum)
* pointers LP_DEAD if mark_unused_now is true.
*/
static void
-heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum)
+heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
+ PruneResult *presult)
{
/*
* If the caller set mark_unused_now to true, we can remove dead tuples
@@ -786,7 +901,7 @@ heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum)
if (unlikely(prstate->mark_unused_now))
heap_prune_record_unused(prstate, offnum);
else
- heap_prune_record_dead(prstate, offnum);
+ heap_prune_record_dead(prstate, offnum, presult);
}
/* Record line pointer to be marked unused */
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index a7451743e25..17fb0b4f7b7 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1422,9 +1422,7 @@ lazy_scan_prune(LVRelState *vacrel,
recently_dead_tuples;
HeapPageFreeze pagefrz;
bool hastup = false;
- bool all_visible,
- all_frozen;
- TransactionId visibility_cutoff_xid;
+ bool all_frozen;
int64 fpi_before = pgWalUsage.wal_fpi;
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
@@ -1465,17 +1463,16 @@ lazy_scan_prune(LVRelState *vacrel,
&presult, PRUNE_VACUUM_SCAN, &vacrel->offnum);
/*
- * We will update the VM after collecting LP_DEAD items and freezing
- * tuples. Keep track of whether or not the page is all_visible and
- * all_frozen and use this information to update the VM. all_visible
- * implies 0 lpdead_items, but don't trust all_frozen result unless
- * all_visible is also set to true.
+ * Now scan the page to collect LP_DEAD items and check for tuples
+ * requiring freezing among remaining tuples with storage. We will update
+ * the VM after collecting LP_DEAD items and freezing tuples. Pruning will
+ * have determined whether or not the page is all_visible. Keep track of
+ * whether or not the page is all_frozen and use this information to
+ * update the VM. all_visible implies lpdead_items == 0, but don't trust
+ * all_frozen result unless all_visible is also set to true.
*
- * Also keep track of the visibility cutoff xid for recovery conflicts.
*/
- all_visible = true;
all_frozen = true;
- visibility_cutoff_xid = InvalidTransactionId;
/*
* Now scan the page to collect LP_DEAD items and update the variables set
@@ -1516,11 +1513,6 @@ lazy_scan_prune(LVRelState *vacrel,
* will only happen every other VACUUM, at most. Besides, VACUUM
* must treat hastup/nonempty_pages as provisional no matter how
* LP_DEAD items are handled (handled here, or handled later on).
- *
- * Also deliberately delay unsetting all_visible until just before
- * we return to lazy_scan_heap caller, as explained in full below.
- * (This is another case where it's useful to anticipate that any
- * LP_DEAD items will become LP_UNUSED during the ongoing VACUUM.)
*/
deadoffsets[lpdead_items++] = offnum;
continue;
@@ -1558,46 +1550,6 @@ lazy_scan_prune(LVRelState *vacrel,
* what acquire_sample_rows() does.
*/
live_tuples++;
-
- /*
- * Is the tuple definitely visible to all transactions?
- *
- * NB: Like with per-tuple hint bits, we can't set the
- * PD_ALL_VISIBLE flag if the inserter committed
- * asynchronously. See SetHintBits for more info. Check that
- * the tuple is hinted xmin-committed because of that.
- */
- if (all_visible)
- {
- TransactionId xmin;
-
- if (!HeapTupleHeaderXminCommitted(htup))
- {
- all_visible = false;
- break;
- }
-
- /*
- * The inserter definitely committed. But is it old enough
- * that everyone sees it as committed? A
- * FrozenTransactionId is seen as committed to everyone.
- * Otherwise, we check if there is a snapshot that
- * considers this xid to still be running, and if so, we
- * don't consider the page all-visible.
- */
- xmin = HeapTupleHeaderGetXmin(htup);
- if (xmin != FrozenTransactionId &&
- !GlobalVisTestIsRemovableXid(vacrel->vistest, xmin))
- {
- all_visible = false;
- break;
- }
-
- /* Track newest xmin on page. */
- if (TransactionIdFollows(xmin, visibility_cutoff_xid) &&
- TransactionIdIsNormal(xmin))
- visibility_cutoff_xid = xmin;
- }
break;
case HEAPTUPLE_RECENTLY_DEAD:
@@ -1607,7 +1559,6 @@ lazy_scan_prune(LVRelState *vacrel,
* pruning.)
*/
recently_dead_tuples++;
- all_visible = false;
break;
case HEAPTUPLE_INSERT_IN_PROGRESS:
@@ -1618,16 +1569,13 @@ lazy_scan_prune(LVRelState *vacrel,
* results. This assumption is a bit shaky, but it is what
* acquire_sample_rows() does, so be consistent.
*/
- all_visible = false;
break;
case HEAPTUPLE_DELETE_IN_PROGRESS:
- /* This is an expected case during concurrent vacuum */
- all_visible = false;
/*
- * Count such rows as live. As above, we assume the deleting
- * transaction will commit and update the counters after we
- * report.
+ * This is an expected case during concurrent vacuum. Count such
+ * rows as live. As above, we assume the deleting transaction
+ * will commit and update the counters after we report.
*/
live_tuples++;
break;
@@ -1670,7 +1618,7 @@ lazy_scan_prune(LVRelState *vacrel,
* page all-frozen afterwards (might not happen until final heap pass).
*/
if (pagefrz.freeze_required || tuples_frozen == 0 ||
- (all_visible && all_frozen &&
+ (presult.all_visible_except_removable && all_frozen &&
fpi_before != pgWalUsage.wal_fpi))
{
/*
@@ -1708,11 +1656,11 @@ lazy_scan_prune(LVRelState *vacrel,
* once we're done with it. Otherwise we generate a conservative
* cutoff by stepping back from OldestXmin.
*/
- if (all_visible && all_frozen)
+ if (presult.all_visible_except_removable && all_frozen)
{
/* Using same cutoff when setting VM is now unnecessary */
- snapshotConflictHorizon = visibility_cutoff_xid;
- visibility_cutoff_xid = InvalidTransactionId;
+ snapshotConflictHorizon = presult.visibility_cutoff_xid;
+ presult.visibility_cutoff_xid = InvalidTransactionId;
}
else
{
@@ -1748,17 +1696,19 @@ lazy_scan_prune(LVRelState *vacrel,
*/
#ifdef USE_ASSERT_CHECKING
/* Note that all_frozen value does not matter when !all_visible */
- if (all_visible && lpdead_items == 0)
+ if (presult.all_visible)
{
TransactionId debug_cutoff;
bool debug_all_frozen;
+ Assert(lpdead_items == 0);
+
if (!heap_page_is_all_visible(vacrel, buf,
&debug_cutoff, &debug_all_frozen))
Assert(false);
Assert(!TransactionIdIsValid(debug_cutoff) ||
- debug_cutoff == visibility_cutoff_xid);
+ debug_cutoff == presult.visibility_cutoff_xid);
}
#endif
@@ -1783,19 +1733,6 @@ lazy_scan_prune(LVRelState *vacrel,
Assert(dead_items->num_items <= dead_items->max_items);
pgstat_progress_update_param(PROGRESS_VACUUM_NUM_DEAD_TUPLES,
dead_items->num_items);
-
- /*
- * It was convenient to ignore LP_DEAD items in all_visible earlier on
- * to make the choice of whether or not to freeze the page unaffected
- * by the short-term presence of LP_DEAD items. These LP_DEAD items
- * were effectively assumed to be LP_UNUSED items in the making. It
- * doesn't matter which heap pass (initial pass or final pass) ends up
- * setting the page all-frozen, as long as the ongoing VACUUM does it.
- *
- * Now that freezing has been finalized, unset all_visible. It needs
- * to reflect the present state of things, as expected by our caller.
- */
- all_visible = false;
}
/* Finally, add page-local counts to whole-VACUUM counts */
@@ -1812,20 +1749,20 @@ lazy_scan_prune(LVRelState *vacrel,
/* Did we find LP_DEAD items? */
*has_lpdead_items = (lpdead_items > 0);
- Assert(!all_visible || !(*has_lpdead_items));
+ Assert(!presult.all_visible || !(*has_lpdead_items));
/*
* Handle setting visibility map bit based on information from the VM (as
* of last heap_vac_scan_next_block() call), and from all_visible and
* all_frozen variables
*/
- if (!all_visible_according_to_vm && all_visible)
+ if (!all_visible_according_to_vm && presult.all_visible)
{
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
if (all_frozen)
{
- Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+ Assert(!TransactionIdIsValid(presult.visibility_cutoff_xid));
flags |= VISIBILITYMAP_ALL_FROZEN;
}
@@ -1845,7 +1782,7 @@ lazy_scan_prune(LVRelState *vacrel,
PageSetAllVisible(page);
MarkBufferDirty(buf);
visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
- vmbuffer, visibility_cutoff_xid,
+ vmbuffer, presult.visibility_cutoff_xid,
flags);
}
@@ -1893,7 +1830,7 @@ lazy_scan_prune(LVRelState *vacrel,
* it as all-frozen. Note that all_frozen is only valid if all_visible is
* true, so we must check both all_visible and all_frozen.
*/
- else if (all_visible_according_to_vm && all_visible &&
+ else if (all_visible_according_to_vm && presult.all_visible &&
all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
{
/*
@@ -1914,7 +1851,7 @@ lazy_scan_prune(LVRelState *vacrel,
* since a snapshotConflictHorizon sufficient to make everything safe
* for REDO was logged when the page's tuples were frozen.
*/
- Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+ Assert(!TransactionIdIsValid(presult.visibility_cutoff_xid));
visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
vmbuffer, InvalidTransactionId,
VISIBILITYMAP_ALL_VISIBLE |
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 368c570a0f4..8d0dd40ba6d 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -199,6 +199,27 @@ typedef struct PruneResult
int ndeleted; /* Number of tuples deleted from the page */
int nnewlpdead; /* Number of newly LP_DEAD items */
+ /*
+ * The rest of the fields in PruneResult are only guaranteed to be
+ * initialized if heap_page_prune is passed PruneReason VACUUM_SCAN.
+ */
+
+ /*
+ * Whether or not the page is truly all-visible after pruning. If there
+ * are LP_DEAD items on the page which cannot be removed until vacuum's
+ * second pass, this will be false.
+ */
+ bool all_visible;
+
+ /*
+ * Whether or not the page is all-visible except for tuples which will be
+ * removed during vacuum's second pass. This is used by VACUUM to
+ * determine whether or not to consider opportunistically freezing the
+ * page.
+ */
+ bool all_visible_except_removable;
+ TransactionId visibility_cutoff_xid; /* Newest xmin on the page */
+
/*
* Tuple visibility is only computed once for each tuple, for correctness
* and efficiency reasons; see comment in heap_page_prune() for details.
--
2.40.1
v7-0005-Add-reference-to-VacuumCutoffs-in-HeapPageFreeze.patch
From a87f670388f2c3683c4eecdb95ed81a0874335c3 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 6 Jan 2024 16:22:17 -0500
Subject: [PATCH v7 05/16] Add reference to VacuumCutoffs in HeapPageFreeze
Future commits will move opportunistic freezing into the main path of
pruning in heap_page_prune(). Because on-access pruning will not do
opportunistic freezing, it is cleaner to keep the visibility information
required for calling heap_prepare_freeze_tuple() inside of the
HeapPageFreeze structure itself by saving a reference to VacuumCutoffs.
---
src/backend/access/heap/heapam.c | 16 ++++++++--------
src/backend/access/heap/vacuumlazy.c | 3 ++-
src/include/access/heapam.h | 2 +-
3 files changed, 11 insertions(+), 10 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index cc67dd813d2..e38c710c192 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6020,9 +6020,9 @@ heap_inplace_update(Relation relation, HeapTuple tuple)
*/
static TransactionId
FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
- const struct VacuumCutoffs *cutoffs, uint16 *flags,
- HeapPageFreeze *pagefrz)
+ uint16 *flags, HeapPageFreeze *pagefrz)
{
+ const struct VacuumCutoffs *cutoffs = pagefrz->cutoffs;
TransactionId newxmax;
MultiXactMember *members;
int nmembers;
@@ -6370,10 +6370,10 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
*/
bool
heap_prepare_freeze_tuple(HeapTupleHeader tuple,
- const struct VacuumCutoffs *cutoffs,
HeapPageFreeze *pagefrz,
HeapTupleFreeze *frz, bool *totally_frozen)
{
+ const struct VacuumCutoffs *cutoffs = pagefrz->cutoffs;
bool xmin_already_frozen = false,
xmax_already_frozen = false;
bool freeze_xmin = false,
@@ -6445,8 +6445,7 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
* perform no-op xmax processing. The only constraint is that the
* FreezeLimit/MultiXactCutoff postcondition must never be violated.
*/
- newxmax = FreezeMultiXactId(xid, tuple->t_infomask, cutoffs,
- &flags, pagefrz);
+ newxmax = FreezeMultiXactId(xid, tuple->t_infomask, &flags, pagefrz);
if (flags & FRM_NOOP)
{
@@ -6624,7 +6623,7 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
* Does this tuple force caller to freeze the entire page?
*/
pagefrz->freeze_required =
- heap_tuple_should_freeze(tuple, cutoffs,
+ heap_tuple_should_freeze(tuple, pagefrz->cutoffs,
&pagefrz->NoFreezePageRelfrozenXid,
&pagefrz->NoFreezePageRelminMxid);
}
@@ -6785,8 +6784,9 @@ heap_freeze_tuple(HeapTupleHeader tuple,
pagefrz.NoFreezePageRelfrozenXid = FreezeLimit;
pagefrz.NoFreezePageRelminMxid = MultiXactCutoff;
- do_freeze = heap_prepare_freeze_tuple(tuple, &cutoffs,
- &pagefrz, &frz, &totally_frozen);
+ pagefrz.cutoffs = &cutoffs;
+
+ do_freeze = heap_prepare_freeze_tuple(tuple, &pagefrz, &frz, &totally_frozen);
/*
* Note that because this is not a WAL-logged operation, we don't need to
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 17fb0b4f7b7..1b060124a3f 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1442,6 +1442,7 @@ lazy_scan_prune(LVRelState *vacrel,
pagefrz.FreezePageRelminMxid = vacrel->NewRelminMxid;
pagefrz.NoFreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
pagefrz.NoFreezePageRelminMxid = vacrel->NewRelminMxid;
+ pagefrz.cutoffs = &vacrel->cutoffs;
tuples_frozen = 0;
lpdead_items = 0;
live_tuples = 0;
@@ -1587,7 +1588,7 @@ lazy_scan_prune(LVRelState *vacrel,
hastup = true; /* page makes rel truncation unsafe */
/* Tuple with storage -- consider need to freeze */
- if (heap_prepare_freeze_tuple(htup, &vacrel->cutoffs, &pagefrz,
+ if (heap_prepare_freeze_tuple(htup, &pagefrz,
&frozen[tuples_frozen], &totally_frozen))
{
/* Save prepared freeze plan for later */
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 8d0dd40ba6d..3f510e8e197 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -189,6 +189,7 @@ typedef struct HeapPageFreeze
TransactionId NoFreezePageRelfrozenXid;
MultiXactId NoFreezePageRelminMxid;
+ struct VacuumCutoffs *cutoffs;
} HeapPageFreeze;
/*
@@ -321,7 +322,6 @@ extern TM_Result heap_lock_tuple(Relation relation, HeapTuple tuple,
extern void heap_inplace_update(Relation relation, HeapTuple tuple);
extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
- const struct VacuumCutoffs *cutoffs,
HeapPageFreeze *pagefrz,
HeapTupleFreeze *frz, bool *totally_frozen);
extern void heap_freeze_execute_prepared(Relation rel, Buffer buffer,
--
2.40.1
v7-0006-Prepare-freeze-tuples-in-heap_page_prune.patch
From bf07b2d814d978a9bf21460801d6f65a71fb0e8b Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 25 Mar 2024 19:23:11 -0400
Subject: [PATCH v7 06/16] Prepare freeze tuples in heap_page_prune()
In order to combine the freeze and prune records, we must determine
which tuples are freezable before actually executing pruning. All of the
page modifications should be made in the same critical section along
with emitting the combined WAL. Determine whether or not tuples should
or must be frozen and whether or not the page will be all frozen as a
consequence during pruning.
---
src/backend/access/heap/pruneheap.c | 41 +++++++++++++++--
src/backend/access/heap/vacuumlazy.c | 68 ++++++----------------------
src/include/access/heapam.h | 12 +++++
3 files changed, 64 insertions(+), 57 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 5776ae84f4d..457650ab651 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -153,7 +153,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* not the relation has indexes, since we cannot safely determine
* that during on-access pruning with the current implementation.
*/
- heap_page_prune(relation, buffer, vistest, false,
+ heap_page_prune(relation, buffer, vistest, false, NULL,
&presult, PRUNE_ON_ACCESS, NULL);
/*
@@ -201,6 +201,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* mark_unused_now indicates whether or not dead items can be set LP_UNUSED
* during pruning.
*
+ * pagefrz contains both input and output parameters used if the caller is
+ * interested in potentially freezing tuples on the page.
+ *
* presult contains output parameters needed by callers such as the number of
* tuples removed and the number of line pointers newly marked LP_DEAD.
* heap_page_prune() is responsible for initializing it.
@@ -215,6 +218,7 @@ void
heap_page_prune(Relation relation, Buffer buffer,
GlobalVisState *vistest,
bool mark_unused_now,
+ HeapPageFreeze *pagefrz,
PruneResult *presult,
PruneReason reason,
OffsetNumber *off_loc)
@@ -250,11 +254,16 @@ heap_page_prune(Relation relation, Buffer buffer,
*/
presult->ndeleted = 0;
presult->nnewlpdead = 0;
+ presult->nfrozen = 0;
/*
- * Keep track of whether or not the page is all_visible in case the caller
- * wants to use this information to update the VM.
+ * Caller will update the VM after pruning, collecting LP_DEAD items, and
+ * freezing tuples. Keep track of whether or not the page is all_visible
+ * and all_frozen and use this information to update the VM. all_visible
+ * implies lpdead_items == 0, but don't trust all_frozen result unless
+ * all_visible is also set to true.
*/
+ presult->all_frozen = true;
presult->all_visible = true;
/* for recovery conflicts */
presult->visibility_cutoff_xid = InvalidTransactionId;
@@ -388,6 +397,32 @@ heap_page_prune(Relation relation, Buffer buffer,
elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
break;
}
+
+ /*
+ * Consider freezing any normal tuples which will not be removed
+ */
+ if (presult->htsv[offnum] != HEAPTUPLE_DEAD && pagefrz)
+ {
+ bool totally_frozen;
+
+ /* Tuple with storage -- consider need to freeze */
+ if ((heap_prepare_freeze_tuple(htup, pagefrz,
+ &presult->frozen[presult->nfrozen],
+ &totally_frozen)))
+ {
+ /* Save prepared freeze plan for later */
+ presult->frozen[presult->nfrozen++].offset = offnum;
+ }
+
+ /*
+ * If any tuple isn't either totally frozen already or eligible to
+ * become totally frozen (according to its freeze plan), then the
+ * page definitely cannot be set all-frozen in the visibility map
+ * later on
+ */
+ if (!totally_frozen)
+ presult->all_frozen = false;
+ }
}
/*
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 1b060124a3f..2a3cc5c7cd3 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1416,16 +1416,13 @@ lazy_scan_prune(LVRelState *vacrel,
maxoff;
ItemId itemid;
PruneResult presult;
- int tuples_frozen,
- lpdead_items,
+ int lpdead_items,
live_tuples,
recently_dead_tuples;
HeapPageFreeze pagefrz;
bool hastup = false;
- bool all_frozen;
int64 fpi_before = pgWalUsage.wal_fpi;
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
- HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
Assert(BufferGetBlockNumber(buf) == blkno);
@@ -1443,7 +1440,6 @@ lazy_scan_prune(LVRelState *vacrel,
pagefrz.NoFreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
pagefrz.NoFreezePageRelminMxid = vacrel->NewRelminMxid;
pagefrz.cutoffs = &vacrel->cutoffs;
- tuples_frozen = 0;
lpdead_items = 0;
live_tuples = 0;
recently_dead_tuples = 0;
@@ -1460,21 +1456,9 @@ lazy_scan_prune(LVRelState *vacrel,
* items LP_UNUSED, so mark_unused_now should be true if no indexes and
* false otherwise.
*/
- heap_page_prune(rel, buf, vacrel->vistest, vacrel->nindexes == 0,
+ heap_page_prune(rel, buf, vacrel->vistest, vacrel->nindexes == 0, &pagefrz,
&presult, PRUNE_VACUUM_SCAN, &vacrel->offnum);
- /*
- * Now scan the page to collect LP_DEAD items and check for tuples
- * requiring freezing among remaining tuples with storage. We will update
- * the VM after collecting LP_DEAD items and freezing tuples. Pruning will
- * have determined whether or not the page is all_visible. Keep track of
- * whether or not the page is all_frozen and use this information to
- * update the VM. all_visible implies lpdead_items == 0, but don't trust
- * all_frozen result unless all_visible is also set to true.
- *
- */
- all_frozen = true;
-
/*
* Now scan the page to collect LP_DEAD items and update the variables set
* just above.
@@ -1483,9 +1467,6 @@ lazy_scan_prune(LVRelState *vacrel,
offnum <= maxoff;
offnum = OffsetNumberNext(offnum))
{
- HeapTupleHeader htup;
- bool totally_frozen;
-
/*
* Set the offset number so that we can display it along with any
* error that occurred while processing this tuple.
@@ -1521,8 +1502,6 @@ lazy_scan_prune(LVRelState *vacrel,
Assert(ItemIdIsNormal(itemid));
- htup = (HeapTupleHeader) PageGetItem(page, itemid);
-
/*
* The criteria for counting a tuple as live in this block need to
* match what analyze.c's acquire_sample_rows() does, otherwise VACUUM
@@ -1587,29 +1566,8 @@ lazy_scan_prune(LVRelState *vacrel,
hastup = true; /* page makes rel truncation unsafe */
- /* Tuple with storage -- consider need to freeze */
- if (heap_prepare_freeze_tuple(htup, &pagefrz,
- &frozen[tuples_frozen], &totally_frozen))
- {
- /* Save prepared freeze plan for later */
- frozen[tuples_frozen++].offset = offnum;
- }
-
- /*
- * If any tuple isn't either totally frozen already or eligible to
- * become totally frozen (according to its freeze plan), then the page
- * definitely cannot be set all-frozen in the visibility map later on
- */
- if (!totally_frozen)
- all_frozen = false;
}
- /*
- * We have now divided every item on the page into either an LP_DEAD item
- * that will need to be vacuumed in indexes later, or a LP_NORMAL tuple
- * that remains and needs to be considered for freezing now (LP_UNUSED and
- * LP_REDIRECT items also remain, but are of no further interest to us).
- */
vacrel->offnum = InvalidOffsetNumber;
/*
@@ -1618,8 +1576,8 @@ lazy_scan_prune(LVRelState *vacrel,
* freeze when pruning generated an FPI, if doing so means that we set the
* page all-frozen afterwards (might not happen until final heap pass).
*/
- if (pagefrz.freeze_required || tuples_frozen == 0 ||
- (presult.all_visible_except_removable && all_frozen &&
+ if (pagefrz.freeze_required || presult.nfrozen == 0 ||
+ (presult.all_visible_except_removable && presult.all_frozen &&
fpi_before != pgWalUsage.wal_fpi))
{
/*
@@ -1629,7 +1587,7 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->NewRelfrozenXid = pagefrz.FreezePageRelfrozenXid;
vacrel->NewRelminMxid = pagefrz.FreezePageRelminMxid;
- if (tuples_frozen == 0)
+ if (presult.nfrozen == 0)
{
/*
* We have no freeze plans to execute, so there's no added cost
@@ -1657,7 +1615,7 @@ lazy_scan_prune(LVRelState *vacrel,
* once we're done with it. Otherwise we generate a conservative
* cutoff by stepping back from OldestXmin.
*/
- if (presult.all_visible_except_removable && all_frozen)
+ if (presult.all_visible_except_removable && presult.all_frozen)
{
/* Using same cutoff when setting VM is now unnecessary */
snapshotConflictHorizon = presult.visibility_cutoff_xid;
@@ -1673,7 +1631,7 @@ lazy_scan_prune(LVRelState *vacrel,
/* Execute all freeze plans for page as a single atomic action */
heap_freeze_execute_prepared(vacrel->rel, buf,
snapshotConflictHorizon,
- frozen, tuples_frozen);
+ presult.frozen, presult.nfrozen);
}
}
else
@@ -1684,8 +1642,8 @@ lazy_scan_prune(LVRelState *vacrel,
*/
vacrel->NewRelfrozenXid = pagefrz.NoFreezePageRelfrozenXid;
vacrel->NewRelminMxid = pagefrz.NoFreezePageRelminMxid;
- all_frozen = false;
- tuples_frozen = 0; /* avoid miscounts in instrumentation */
+ presult.all_frozen = false;
+ presult.nfrozen = 0; /* avoid miscounts in instrumentation */
}
/*
@@ -1708,6 +1666,8 @@ lazy_scan_prune(LVRelState *vacrel,
&debug_cutoff, &debug_all_frozen))
Assert(false);
+ Assert(presult.all_frozen == debug_all_frozen);
+
Assert(!TransactionIdIsValid(debug_cutoff) ||
debug_cutoff == presult.visibility_cutoff_xid);
}
@@ -1738,7 +1698,7 @@ lazy_scan_prune(LVRelState *vacrel,
/* Finally, add page-local counts to whole-VACUUM counts */
vacrel->tuples_deleted += presult.ndeleted;
- vacrel->tuples_frozen += tuples_frozen;
+ vacrel->tuples_frozen += presult.nfrozen;
vacrel->lpdead_items += lpdead_items;
vacrel->live_tuples += live_tuples;
vacrel->recently_dead_tuples += recently_dead_tuples;
@@ -1761,7 +1721,7 @@ lazy_scan_prune(LVRelState *vacrel,
{
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
- if (all_frozen)
+ if (presult.all_frozen)
{
Assert(!TransactionIdIsValid(presult.visibility_cutoff_xid));
flags |= VISIBILITYMAP_ALL_FROZEN;
@@ -1832,7 +1792,7 @@ lazy_scan_prune(LVRelState *vacrel,
* true, so we must check both all_visible and all_frozen.
*/
else if (all_visible_according_to_vm && presult.all_visible &&
- all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
+ presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
{
/*
* Avoid relying on all_visible_according_to_vm as a proxy for the
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 3f510e8e197..59c81f38e51 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -219,6 +219,9 @@ typedef struct PruneResult
* page.
*/
bool all_visible_except_removable;
+
+ /* Whether or not the page can be set all-frozen in the VM */
+ bool all_frozen;
TransactionId visibility_cutoff_xid; /* Newest xmin on the page */
/*
@@ -231,6 +234,14 @@ typedef struct PruneResult
* 1. Otherwise every access would need to subtract 1.
*/
int8 htsv[MaxHeapTuplesPerPage + 1];
+
+ /* Number of tuples we may freeze */
+ int nfrozen;
+
+ /*
+ * One entry for every tuple that we may freeze.
+ */
+ HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
} PruneResult;
/* 'reason' codes for heap_page_prune() */
@@ -350,6 +361,7 @@ extern void heap_page_prune_opt(Relation relation, Buffer buffer);
extern void heap_page_prune(Relation relation, Buffer buffer,
struct GlobalVisState *vistest,
bool mark_unused_now,
+ HeapPageFreeze *pagefrz,
PruneResult *presult,
PruneReason reason,
OffsetNumber *off_loc);
--
2.40.1
v7-0007-lazy_scan_prune-reorder-freeze-execution-logic.patch
From 1a3855df690c912e1189633058aa3ed9a4a77151 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 25 Mar 2024 19:39:25 -0400
Subject: [PATCH v7 07/16] lazy_scan_prune reorder freeze execution logic
To combine the prune and freeze records, freezing must be done before a
pruning WAL record is emitted. We will move the freeze execution into
heap_page_prune() in future commits. lazy_scan_prune() currently
executes freezing, updates vacrel->NewRelfrozenXid and
vacrel->NewRelminMxid, and resets the snapshotConflictHorizon that the
visibility map update record may use in the same block of if statements.
This commit starts reordering that logic so that the freeze execution
can be separated from the other updates which should not be done in
pruning.
---
src/backend/access/heap/vacuumlazy.c | 93 +++++++++++++++-------------
1 file changed, 50 insertions(+), 43 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 2a3cc5c7cd3..f474e661428 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1421,6 +1421,7 @@ lazy_scan_prune(LVRelState *vacrel,
recently_dead_tuples;
HeapPageFreeze pagefrz;
bool hastup = false;
+ bool do_freeze;
int64 fpi_before = pgWalUsage.wal_fpi;
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
@@ -1576,10 +1577,15 @@ lazy_scan_prune(LVRelState *vacrel,
* freeze when pruning generated an FPI, if doing so means that we set the
* page all-frozen afterwards (might not happen until final heap pass).
*/
- if (pagefrz.freeze_required || presult.nfrozen == 0 ||
+ do_freeze = pagefrz.freeze_required ||
(presult.all_visible_except_removable && presult.all_frozen &&
- fpi_before != pgWalUsage.wal_fpi))
+ presult.nfrozen > 0 &&
+ fpi_before != pgWalUsage.wal_fpi);
+
+ if (do_freeze)
{
+ TransactionId snapshotConflictHorizon;
+
/*
* We're freezing the page. Our final NewRelfrozenXid doesn't need to
* be affected by the XIDs that are just about to be frozen anyway.
@@ -1587,52 +1593,53 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->NewRelfrozenXid = pagefrz.FreezePageRelfrozenXid;
vacrel->NewRelminMxid = pagefrz.FreezePageRelminMxid;
- if (presult.nfrozen == 0)
- {
- /*
- * We have no freeze plans to execute, so there's no added cost
- * from following the freeze path. That's why it was chosen. This
- * is important in the case where the page only contains totally
- * frozen tuples at this point (perhaps only following pruning).
- * Such pages can be marked all-frozen in the VM by our caller,
- * even though none of its tuples were newly frozen here (note
- * that the "no freeze" path never sets pages all-frozen).
- *
- * We never increment the frozen_pages instrumentation counter
- * here, since it only counts pages with newly frozen tuples
- * (don't confuse that with pages newly set all-frozen in VM).
- */
- }
+ vacrel->frozen_pages++;
+
+ /*
+ * We can use frz_conflict_horizon as our cutoff for conflicts when
+ * the whole page is eligible to become all-frozen in the VM once
+ * we're done with it. Otherwise we generate a conservative cutoff by
+ * stepping back from OldestXmin.
+ */
+ if (presult.all_visible_except_removable && presult.all_frozen)
+ snapshotConflictHorizon = presult.visibility_cutoff_xid;
else
{
- TransactionId snapshotConflictHorizon;
+ /* Avoids false conflicts when hot_standby_feedback in use */
+ snapshotConflictHorizon = pagefrz.cutoffs->OldestXmin;
+ TransactionIdRetreat(snapshotConflictHorizon);
+ }
- vacrel->frozen_pages++;
+ /* Using same cutoff when setting VM is now unnecessary */
+ if (presult.all_visible_except_removable && presult.all_frozen)
+ presult.visibility_cutoff_xid = InvalidTransactionId;
- /*
- * We can use visibility_cutoff_xid as our cutoff for conflicts
- * when the whole page is eligible to become all-frozen in the VM
- * once we're done with it. Otherwise we generate a conservative
- * cutoff by stepping back from OldestXmin.
- */
- if (presult.all_visible_except_removable && presult.all_frozen)
- {
- /* Using same cutoff when setting VM is now unnecessary */
- snapshotConflictHorizon = presult.visibility_cutoff_xid;
- presult.visibility_cutoff_xid = InvalidTransactionId;
- }
- else
- {
- /* Avoids false conflicts when hot_standby_feedback in use */
- snapshotConflictHorizon = vacrel->cutoffs.OldestXmin;
- TransactionIdRetreat(snapshotConflictHorizon);
- }
+ /* Execute all freeze plans for page as a single atomic action */
+ heap_freeze_execute_prepared(vacrel->rel, buf,
+ snapshotConflictHorizon,
+ presult.frozen, presult.nfrozen);
- /* Execute all freeze plans for page as a single atomic action */
- heap_freeze_execute_prepared(vacrel->rel, buf,
- snapshotConflictHorizon,
- presult.frozen, presult.nfrozen);
- }
+ }
+ else if (presult.all_frozen && presult.nfrozen == 0)
+ {
+ /* Page should be all visible except to-be-removed tuples */
+ Assert(presult.all_visible_except_removable);
+
+ /*
+ * We have no freeze plans to execute, so there's no added cost from
+ * following the freeze path. That's why it was chosen. This is
+ * important in the case where the page only contains totally frozen
+ * tuples at this point (perhaps only following pruning). Such pages
+ * can be marked all-frozen in the VM by our caller, even though none
+ * of its tuples were newly frozen here (note that the "no freeze"
+ * path never sets pages all-frozen).
+ *
+ * We never increment the frozen_pages instrumentation counter here,
+ * since it only counts pages with newly frozen tuples (don't confuse
+ * that with pages newly set all-frozen in VM).
+ */
+ vacrel->NewRelfrozenXid = pagefrz.FreezePageRelfrozenXid;
+ vacrel->NewRelminMxid = pagefrz.FreezePageRelminMxid;
}
else
{
--
2.40.1
v7-0008-Execute-freezing-in-heap_page_prune.patch
From 674b2272f9a40310dd01860a221d7221563571c6 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 25 Mar 2024 20:32:11 -0400
Subject: [PATCH v7 08/16] Execute freezing in heap_page_prune()
As a step toward combining the prune and freeze WAL records, execute
freezing in heap_page_prune(). The logic to determine whether or not to
execute freeze plans was moved from lazy_scan_prune() over to
heap_page_prune() with little modification.
---
src/backend/access/heap/heapam_handler.c | 2 +-
src/backend/access/heap/pruneheap.c | 189 ++++++++++++++++++-----
src/backend/access/heap/vacuumlazy.c | 150 +++++-------------
src/backend/storage/ipc/procarray.c | 6 +-
src/include/access/heapam.h | 53 ++++---
src/tools/pgindent/typedefs.list | 2 +-
6 files changed, 225 insertions(+), 177 deletions(-)
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 2b7c7026429..4802d789963 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -1048,7 +1048,7 @@ heapam_scan_analyze_next_tuple(TableScanDesc scan, TransactionId OldestXmin,
* We ignore unused and redirect line pointers. DEAD line pointers
* should be counted as dead, because we need vacuum to run to get rid
* of them. Note that this rule agrees with the way that
- * heap_page_prune() counts things.
+ * heap_page_prune_and_freeze() counts things.
*/
if (!ItemIdIsNormal(itemid))
{
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 457650ab651..e009c7579dd 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -17,16 +17,19 @@
#include "access/heapam.h"
#include "access/heapam_xlog.h"
#include "access/htup_details.h"
+#include "access/multixact.h"
#include "access/transam.h"
#include "access/xlog.h"
+#include "commands/vacuum.h"
#include "access/xloginsert.h"
+#include "executor/instrument.h"
#include "miscadmin.h"
#include "pgstat.h"
#include "storage/bufmgr.h"
#include "utils/rel.h"
#include "utils/snapmgr.h"
-/* Working data for heap_page_prune and subroutines */
+/* Working data for heap_page_prune_and_freeze() and subroutines */
typedef struct
{
/* tuple visibility test, initialized for the relation */
@@ -51,6 +54,11 @@ typedef struct
* 1. Otherwise every access would need to subtract 1.
*/
bool marked[MaxHeapTuplesPerPage + 1];
+
+ /*
+ * One entry for every tuple that we may freeze.
+ */
+ HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
} PruneState;
/* Local functions */
@@ -59,14 +67,15 @@ static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
Buffer buffer);
static int heap_prune_chain(Buffer buffer,
OffsetNumber rootoffnum,
- PruneState *prstate, PruneResult *presult);
+ PruneState *prstate, PruneFreezeResult *presult);
+
static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
static void heap_prune_record_redirect(PruneState *prstate,
OffsetNumber offnum, OffsetNumber rdoffnum);
static void heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
- PruneResult *presult);
+ PruneFreezeResult *presult);
static void heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
- PruneResult *presult);
+ PruneFreezeResult *presult);
static void heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum);
static void page_verify_redirects(Page page);
@@ -146,15 +155,15 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
*/
if (PageIsFull(page) || PageGetHeapFreeSpace(page) < minfree)
{
- PruneResult presult;
+ PruneFreezeResult presult;
/*
* For now, pass mark_unused_now as false regardless of whether or
* not the relation has indexes, since we cannot safely determine
* that during on-access pruning with the current implementation.
*/
- heap_page_prune(relation, buffer, vistest, false, NULL,
- &presult, PRUNE_ON_ACCESS, NULL);
+ heap_page_prune_and_freeze(relation, buffer, vistest, false, NULL,
+ &presult, PRUNE_ON_ACCESS, NULL);
/*
* Report the number of tuples reclaimed to pgstats. This is
@@ -188,7 +197,12 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
/*
- * Prune and repair fragmentation in the specified page.
+ * Prune and repair fragmentation and potentially freeze tuples on the
+ * specified page.
+ *
+ * If the page can be marked all-frozen in the visibility map, we may
+ * opportunistically freeze tuples on the page if either its tuples are old
+ * enough or freezing will be cheap enough.
*
* Caller must have pin and buffer cleanup lock on the page. Note that we
* don't update the FSM information for page on caller's behalf. Caller might
@@ -201,12 +215,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* mark_unused_now indicates whether or not dead items can be set LP_UNUSED
* during pruning.
*
- * pagefrz contains both input and output parameters used if the caller is
- * interested in potentially freezing tuples on the page.
+ * pagefrz is an input parameter containing visibility cutoff information and
+ * the current relfrozenxid and relminmxids used if the caller is interested in
+ * freezing tuples on the page.
*
* presult contains output parameters needed by callers such as the number of
* tuples removed and the number of line pointers newly marked LP_DEAD.
- * heap_page_prune() is responsible for initializing it.
+ * heap_page_prune_and_freeze() is responsible for initializing it.
*
* reason indicates why the pruning is performed. It is included in the WAL
* record for debugging and analysis purposes, but otherwise has no effect.
@@ -215,13 +230,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* callback.
*/
void
-heap_page_prune(Relation relation, Buffer buffer,
- GlobalVisState *vistest,
- bool mark_unused_now,
- HeapPageFreeze *pagefrz,
- PruneResult *presult,
- PruneReason reason,
- OffsetNumber *off_loc)
+heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ GlobalVisState *vistest,
+ bool mark_unused_now,
+ HeapPageFreeze *pagefrz,
+ PruneFreezeResult *presult,
+ PruneReason reason,
+ OffsetNumber *off_loc)
{
Page page = BufferGetPage(buffer);
BlockNumber blockno = BufferGetBlockNumber(buffer);
@@ -229,6 +244,10 @@ heap_page_prune(Relation relation, Buffer buffer,
maxoff;
PruneState prstate;
HeapTupleData tup;
+ TransactionId visibility_cutoff_xid;
+ bool do_freeze;
+ bool all_visible_except_removable;
+ int64 fpi_before = pgWalUsage.wal_fpi;
/*
* Our strategy is to scan the page and make lists of items to change,
@@ -264,9 +283,20 @@ heap_page_prune(Relation relation, Buffer buffer,
* all_visible is also set to true.
*/
presult->all_frozen = true;
- presult->all_visible = true;
- /* for recovery conflicts */
- presult->visibility_cutoff_xid = InvalidTransactionId;
+
+ /*
+ * The visibility cutoff xid is the newest xmin of live tuples on the
+ * page. In the common case, this will be set as the conflict horizon the
+ * caller can use for updating the VM. If, at the end of freezing and
+ * pruning, the page is all-frozen, there is no possibility that any
+ * running transaction on the standby does not see tuples on the page as
+ * all-visible, so the conflict horizon remains InvalidTransactionId.
+ */
+ presult->vm_conflict_horizon = visibility_cutoff_xid = InvalidTransactionId;
+
+ /* For advancing relfrozenxid and relminmxid */
+ presult->new_relfrozenxid = InvalidTransactionId;
+ presult->new_relminmxid = InvalidMultiXactId;
maxoff = PageGetMaxOffsetNumber(page);
tup.t_tableOid = RelationGetRelid(relation);
@@ -291,6 +321,7 @@ heap_page_prune(Relation relation, Buffer buffer,
* prefetching efficiency significantly / decreases the number of cache
* misses.
*/
+ all_visible_except_removable = true;
for (offnum = maxoff;
offnum >= FirstOffsetNumber;
offnum = OffsetNumberPrev(offnum))
@@ -351,13 +382,13 @@ heap_page_prune(Relation relation, Buffer buffer,
* asynchronously. See SetHintBits for more info. Check that
* the tuple is hinted xmin-committed because of that.
*/
- if (presult->all_visible)
+ if (all_visible_except_removable)
{
TransactionId xmin;
if (!HeapTupleHeaderXminCommitted(htup))
{
- presult->all_visible = false;
+ all_visible_except_removable = false;
break;
}
@@ -373,25 +404,25 @@ heap_page_prune(Relation relation, Buffer buffer,
if (xmin != FrozenTransactionId &&
!GlobalVisTestIsRemovableXid(vistest, xmin))
{
- presult->all_visible = false;
+ all_visible_except_removable = false;
break;
}
/* Track newest xmin on page. */
- if (TransactionIdFollows(xmin, presult->visibility_cutoff_xid) &&
+ if (TransactionIdFollows(xmin, visibility_cutoff_xid) &&
TransactionIdIsNormal(xmin))
- presult->visibility_cutoff_xid = xmin;
+ visibility_cutoff_xid = xmin;
}
break;
case HEAPTUPLE_RECENTLY_DEAD:
- presult->all_visible = false;
+ all_visible_except_removable = false;
break;
case HEAPTUPLE_INSERT_IN_PROGRESS:
- presult->all_visible = false;
+ all_visible_except_removable = false;
break;
case HEAPTUPLE_DELETE_IN_PROGRESS:
/* This is an expected case during concurrent vacuum */
- presult->all_visible = false;
+ all_visible_except_removable = false;
break;
default:
elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
@@ -407,11 +438,11 @@ heap_page_prune(Relation relation, Buffer buffer,
/* Tuple with storage -- consider need to freeze */
if ((heap_prepare_freeze_tuple(htup, pagefrz,
- &presult->frozen[presult->nfrozen],
+ &prstate.frozen[presult->nfrozen],
&totally_frozen)))
{
/* Save prepared freeze plan for later */
- presult->frozen[presult->nfrozen++].offset = offnum;
+ prstate.frozen[presult->nfrozen++].offset = offnum;
}
/*
@@ -438,7 +469,7 @@ heap_page_prune(Relation relation, Buffer buffer,
* pruning and keep all_visible_except_removable to permit freezing if the
* whole page will eventually become all visible after removing tuples.
*/
- presult->all_visible_except_removable = presult->all_visible;
+ presult->all_visible = all_visible_except_removable;
/* Scan the page */
for (offnum = FirstOffsetNumber;
@@ -537,6 +568,86 @@ heap_page_prune(Relation relation, Buffer buffer,
/* Record number of newly-set-LP_DEAD items for caller */
presult->nnewlpdead = prstate.ndead;
+
+ /*
+ * Freeze the page when heap_prepare_freeze_tuple indicates that at least
+ * one XID/MXID from before FreezeLimit/MultiXactCutoff is present. Also
+ * freeze when pruning generated an FPI, if doing so means that we set the
+ * page all-frozen afterwards (might not happen until final heap pass).
+ */
+ if (pagefrz)
+ do_freeze = pagefrz->freeze_required ||
+ (all_visible_except_removable && presult->all_frozen &&
+ presult->nfrozen > 0 &&
+ fpi_before != pgWalUsage.wal_fpi);
+ else
+ do_freeze = false;
+
+ if (do_freeze)
+ {
+ TransactionId frz_conflict_horizon = InvalidTransactionId;
+
+ /*
+ * We can use the visibility_cutoff_xid as our cutoff for conflicts
+ * when the whole page is eligible to become all-frozen in the VM once
+ * we're done with it. Otherwise we generate a conservative cutoff by
+ * stepping back from OldestXmin. This avoids false conflicts when
+ * hot_standby_feedback is in use.
+ */
+ if (all_visible_except_removable && presult->all_frozen)
+ frz_conflict_horizon = visibility_cutoff_xid;
+ else
+ {
+ /* Avoids false conflicts when hot_standby_feedback in use */
+ frz_conflict_horizon = pagefrz->cutoffs->OldestXmin;
+ TransactionIdRetreat(frz_conflict_horizon);
+ }
+
+ /* Execute all freeze plans for page as a single atomic action */
+ heap_freeze_execute_prepared(relation, buffer,
+ frz_conflict_horizon,
+ prstate.frozen, presult->nfrozen);
+ }
+ else if (!pagefrz || !presult->all_frozen || presult->nfrozen > 0)
+ {
+ /*
+ * If we will neither freeze tuples on the page nor set the page all
+ * frozen in the visibility map, the page is not all frozen and there
+ * will be no newly frozen tuples.
+ */
+ presult->all_frozen = false;
+ presult->nfrozen = 0; /* avoid miscounts in instrumentation */
+ }
+
+ /*
+ * For callers planning to update the visibility map, the conflict horizon
+ * for that record must be the newest xmin on the page. However, if the
+ * page is completely frozen, there can be no conflict and the
+ * vm_conflict_horizon should remain InvalidTransactionId.
+ */
+ if (!presult->all_frozen)
+ presult->vm_conflict_horizon = visibility_cutoff_xid;
+
+ if (pagefrz)
+ {
+ /*
+ * If we will freeze tuples on the page or, even if we don't freeze
+ * tuples on the page, if we will set the page all-frozen in the
+ * visibility map, we can advance relfrozenxid and relminmxid to the
+ * values in pagefrz->FreezePageRelfrozenXid and
+ * pagefrz->FreezePageRelminMxid.
+ */
+ if (presult->all_frozen || presult->nfrozen > 0)
+ {
+ presult->new_relfrozenxid = pagefrz->FreezePageRelfrozenXid;
+ presult->new_relminmxid = pagefrz->FreezePageRelminMxid;
+ }
+ else
+ {
+ presult->new_relfrozenxid = pagefrz->NoFreezePageRelfrozenXid;
+ presult->new_relminmxid = pagefrz->NoFreezePageRelminMxid;
+ }
+ }
}
@@ -594,7 +705,7 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
*/
static int
heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
- PruneState *prstate, PruneResult *presult)
+ PruneState *prstate, PruneFreezeResult *presult)
{
int ndeleted = 0;
Page dp = (Page) BufferGetPage(buffer);
@@ -859,10 +970,10 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
{
/*
* We found a redirect item that doesn't point to a valid follow-on
- * item. This can happen if the loop in heap_page_prune caused us to
- * visit the dead successor of a redirect item before visiting the
- * redirect item. We can clean up by setting the redirect item to
- * DEAD state or LP_UNUSED if the caller indicated.
+ * item. This can happen if the loop in heap_page_prune_and_freeze()
+ * caused us to visit the dead successor of a redirect item before
+ * visiting the redirect item. We can clean up by setting the
+ * redirect item to DEAD state or LP_UNUSED if the caller indicated.
*/
heap_prune_record_dead_or_unused(prstate, rootoffnum, presult);
}
@@ -902,7 +1013,7 @@ heap_prune_record_redirect(PruneState *prstate,
/* Record line pointer to be marked dead */
static void
heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
- PruneResult *presult)
+ PruneFreezeResult *presult)
{
Assert(prstate->ndead < MaxHeapTuplesPerPage);
prstate->nowdead[prstate->ndead] = offnum;
@@ -925,7 +1036,7 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
*/
static void
heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
- PruneResult *presult)
+ PruneFreezeResult *presult)
{
/*
* If the caller set mark_unused_now to true, we can remove dead tuples
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index f474e661428..8beef4093ae 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -430,12 +430,12 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
* as an upper bound on the XIDs stored in the pages we'll actually scan
* (NewRelfrozenXid tracking must never be allowed to miss unfrozen XIDs).
*
- * Next acquire vistest, a related cutoff that's used in heap_page_prune.
- * We expect vistest will always make heap_page_prune remove any deleted
- * tuple whose xmax is < OldestXmin. lazy_scan_prune must never become
- * confused about whether a tuple should be frozen or removed. (In the
- * future we might want to teach lazy_scan_prune to recompute vistest from
- * time to time, to increase the number of dead tuples it can prune away.)
+ * Next acquire vistest, a related cutoff that's used in
+ * heap_page_prune_and_freeze(). We expect vistest will always make
+ * heap_page_prune_and_freeze() remove any deleted tuple whose xmax is <
+ * OldestXmin. (In the future we might want to teach lazy_scan_prune to
+ * recompute vistest from time to time, to increase the number of dead
+ * tuples it can prune away.)
*/
vacrel->aggressive = vacuum_get_cutoffs(rel, params, &vacrel->cutoffs);
vacrel->rel_pages = orig_rel_pages = RelationGetNumberOfBlocks(rel);
@@ -1378,21 +1378,21 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
*
* Caller must hold pin and buffer cleanup lock on the buffer.
*
- * Prior to PostgreSQL 14 there were very rare cases where heap_page_prune()
- * was allowed to disagree with our HeapTupleSatisfiesVacuum() call about
- * whether or not a tuple should be considered DEAD. This happened when an
- * inserting transaction concurrently aborted (after our heap_page_prune()
- * call, before our HeapTupleSatisfiesVacuum() call). There was rather a lot
- * of complexity just so we could deal with tuples that were DEAD to VACUUM,
- * but nevertheless were left with storage after pruning.
+ * Prior to PostgreSQL 14 there were very rare cases where
+ * heap_page_prune_and_freeze() was allowed to disagree with our
+ * HeapTupleSatisfiesVacuum() call about whether or not a tuple should be
+ * considered DEAD. This happened when an inserting transaction concurrently
+ * aborted (after our heap_page_prune_and_freeze() call, before our
+ * HeapTupleSatisfiesVacuum() call). There was rather a lot of complexity just
+ * so we could deal with tuples that were DEAD to VACUUM, but nevertheless were
+ * left with storage after pruning.
*
* As of Postgres 17, we circumvent this problem altogether by reusing the
- * result of heap_page_prune()'s visibility check. Without the second call to
- * HeapTupleSatisfiesVacuum(), there is no new HTSV_Result and there can be no
- * disagreement. We'll just handle such tuples as if they had become fully dead
- * right after this operation completes instead of in the middle of it. Note that
- * any tuple that becomes dead after the call to heap_page_prune() can't need to
- * be frozen, because it was visible to another session when vacuum started.
+ * result of heap_page_prune_and_freeze()'s visibility check. Without the
+ * second call to HeapTupleSatisfiesVacuum(), there is no new HTSV_Result and
+ * there can be no disagreement. We'll just handle such tuples as if they had
+ * become fully dead right after this operation completes instead of in the
+ * middle of it.
*
* vmbuffer is the buffer containing the VM block with visibility information
* for the heap block, blkno. all_visible_according_to_vm is the saved
@@ -1415,26 +1415,24 @@ lazy_scan_prune(LVRelState *vacrel,
OffsetNumber offnum,
maxoff;
ItemId itemid;
- PruneResult presult;
+ PruneFreezeResult presult;
int lpdead_items,
live_tuples,
recently_dead_tuples;
HeapPageFreeze pagefrz;
bool hastup = false;
- bool do_freeze;
- int64 fpi_before = pgWalUsage.wal_fpi;
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
Assert(BufferGetBlockNumber(buf) == blkno);
/*
* maxoff might be reduced following line pointer array truncation in
- * heap_page_prune. That's safe for us to ignore, since the reclaimed
- * space will continue to look like LP_UNUSED items below.
+ * heap_page_prune_and_freeze(). That's safe for us to ignore, since the
+ * reclaimed space will continue to look like LP_UNUSED items below.
*/
maxoff = PageGetMaxOffsetNumber(page);
- /* Initialize (or reset) page-level state */
+ /* Initialize pagefrz */
pagefrz.freeze_required = false;
pagefrz.FreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
pagefrz.FreezePageRelminMxid = vacrel->NewRelminMxid;
@@ -1446,7 +1444,7 @@ lazy_scan_prune(LVRelState *vacrel,
recently_dead_tuples = 0;
/*
- * Prune all HOT-update chains in this page.
+ * Prune all HOT-update chains and potentially freeze tuples on this page.
*
* We count the number of tuples removed from the page by the pruning step
* in presult.ndeleted. It should not be confused with lpdead_items;
@@ -1457,8 +1455,8 @@ lazy_scan_prune(LVRelState *vacrel,
* items LP_UNUSED, so mark_unused_now should be true if no indexes and
* false otherwise.
*/
- heap_page_prune(rel, buf, vacrel->vistest, vacrel->nindexes == 0, &pagefrz,
- &presult, PRUNE_VACUUM_SCAN, &vacrel->offnum);
+ heap_page_prune_and_freeze(rel, buf, vacrel->vistest, vacrel->nindexes == 0,
+ &pagefrz, &presult, PRUNE_VACUUM_SCAN, &vacrel->offnum);
/*
* Now scan the page to collect LP_DEAD items and update the variables set
@@ -1571,86 +1569,20 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->offnum = InvalidOffsetNumber;
- /*
- * Freeze the page when heap_prepare_freeze_tuple indicates that at least
- * one XID/MXID from before FreezeLimit/MultiXactCutoff is present. Also
- * freeze when pruning generated an FPI, if doing so means that we set the
- * page all-frozen afterwards (might not happen until final heap pass).
- */
- do_freeze = pagefrz.freeze_required ||
- (presult.all_visible_except_removable && presult.all_frozen &&
- presult.nfrozen > 0 &&
- fpi_before != pgWalUsage.wal_fpi);
+ Assert(MultiXactIdIsValid(presult.new_relminmxid));
+ vacrel->NewRelfrozenXid = presult.new_relfrozenxid;
+ Assert(TransactionIdIsValid(presult.new_relfrozenxid));
+ vacrel->NewRelminMxid = presult.new_relminmxid;
- if (do_freeze)
+ if (presult.nfrozen > 0)
{
- TransactionId snapshotConflictHorizon;
-
/*
- * We're freezing the page. Our final NewRelfrozenXid doesn't need to
- * be affected by the XIDs that are just about to be frozen anyway.
+ * We never increment the frozen_pages instrumentation counter when
+ * nfrozen == 0, since it only counts pages with newly frozen tuples
+ * (don't confuse that with pages newly set all-frozen in VM).
*/
- vacrel->NewRelfrozenXid = pagefrz.FreezePageRelfrozenXid;
- vacrel->NewRelminMxid = pagefrz.FreezePageRelminMxid;
-
vacrel->frozen_pages++;
- /*
- * We can use frz_conflict_horizon as our cutoff for conflicts when
- * the whole page is eligible to become all-frozen in the VM once
- * we're done with it. Otherwise we generate a conservative cutoff by
- * stepping back from OldestXmin.
- */
- if (presult.all_visible_except_removable && presult.all_frozen)
- snapshotConflictHorizon = presult.visibility_cutoff_xid;
- else
- {
- /* Avoids false conflicts when hot_standby_feedback in use */
- snapshotConflictHorizon = pagefrz.cutoffs->OldestXmin;
- TransactionIdRetreat(snapshotConflictHorizon);
- }
-
- /* Using same cutoff when setting VM is now unnecessary */
- if (presult.all_visible_except_removable && presult.all_frozen)
- presult.visibility_cutoff_xid = InvalidTransactionId;
-
- /* Execute all freeze plans for page as a single atomic action */
- heap_freeze_execute_prepared(vacrel->rel, buf,
- snapshotConflictHorizon,
- presult.frozen, presult.nfrozen);
-
- }
- else if (presult.all_frozen && presult.nfrozen == 0)
- {
- /* Page should be all visible except to-be-removed tuples */
- Assert(presult.all_visible_except_removable);
-
- /*
- * We have no freeze plans to execute, so there's no added cost from
- * following the freeze path. That's why it was chosen. This is
- * important in the case where the page only contains totally frozen
- * tuples at this point (perhaps only following pruning). Such pages
- * can be marked all-frozen in the VM by our caller, even though none
- * of its tuples were newly frozen here (note that the "no freeze"
- * path never sets pages all-frozen).
- *
- * We never increment the frozen_pages instrumentation counter here,
- * since it only counts pages with newly frozen tuples (don't confuse
- * that with pages newly set all-frozen in VM).
- */
- vacrel->NewRelfrozenXid = pagefrz.FreezePageRelfrozenXid;
- vacrel->NewRelminMxid = pagefrz.FreezePageRelminMxid;
- }
- else
- {
- /*
- * Page requires "no freeze" processing. It might be set all-visible
- * in the visibility map, but it can never be set all-frozen.
- */
- vacrel->NewRelfrozenXid = pagefrz.NoFreezePageRelfrozenXid;
- vacrel->NewRelminMxid = pagefrz.NoFreezePageRelminMxid;
- presult.all_frozen = false;
- presult.nfrozen = 0; /* avoid miscounts in instrumentation */
}
/*
@@ -1676,7 +1608,7 @@ lazy_scan_prune(LVRelState *vacrel,
Assert(presult.all_frozen == debug_all_frozen);
Assert(!TransactionIdIsValid(debug_cutoff) ||
- debug_cutoff == presult.visibility_cutoff_xid);
+ debug_cutoff == presult.vm_conflict_horizon);
}
#endif
@@ -1730,7 +1662,7 @@ lazy_scan_prune(LVRelState *vacrel,
if (presult.all_frozen)
{
- Assert(!TransactionIdIsValid(presult.visibility_cutoff_xid));
+ Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
flags |= VISIBILITYMAP_ALL_FROZEN;
}
@@ -1750,7 +1682,7 @@ lazy_scan_prune(LVRelState *vacrel,
PageSetAllVisible(page);
MarkBufferDirty(buf);
visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
- vmbuffer, presult.visibility_cutoff_xid,
+ vmbuffer, presult.vm_conflict_horizon,
flags);
}
@@ -1815,11 +1747,11 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* Set the page all-frozen (and all-visible) in the VM.
*
- * We can pass InvalidTransactionId as our visibility_cutoff_xid,
- * since a snapshotConflictHorizon sufficient to make everything safe
- * for REDO was logged when the page's tuples were frozen.
+ * We can pass InvalidTransactionId as our vm_conflict_horizon, since
+ * a snapshotConflictHorizon sufficient to make everything safe for
+ * REDO was logged when the page's tuples were frozen.
*/
- Assert(!TransactionIdIsValid(presult.visibility_cutoff_xid));
+ Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
vmbuffer, InvalidTransactionId,
VISIBILITYMAP_ALL_VISIBLE |
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index b3cd248fb64..88a6d504dff 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -1715,9 +1715,9 @@ TransactionIdIsActive(TransactionId xid)
* Note: the approximate horizons (see definition of GlobalVisState) are
* updated by the computations done here. That's currently required for
* correctness and a small optimization. Without doing so it's possible that
- * heap vacuum's call to heap_page_prune() uses a more conservative horizon
- * than later when deciding which tuples can be removed - which the code
- * doesn't expect (breaking HOT).
+ * heap vacuum's call to heap_page_prune_and_freeze() uses a more conservative
+ * horizon than later when deciding which tuples can be removed - which the
+ * code doesn't expect (breaking HOT).
*/
static void
ComputeXidHorizons(ComputeXidHorizonsResult *h)
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 59c81f38e51..6f9c66a872b 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -195,13 +195,13 @@ typedef struct HeapPageFreeze
/*
* Per-page state returned from pruning
*/
-typedef struct PruneResult
+typedef struct PruneFreezeResult
{
int ndeleted; /* Number of tuples deleted from the page */
int nnewlpdead; /* Number of newly LP_DEAD items */
/*
- * The rest of the fields in PruneResult are only guaranteed to be
+ * The rest of the fields in PruneFreezeResult are only guaranteed to be
* initialized if heap_page_prune is passed PruneReason VACUUM_SCAN.
*/
@@ -212,23 +212,22 @@ typedef struct PruneResult
*/
bool all_visible;
- /*
- * Whether or not the page is all-visible except for tuples which will be
- * removed during vacuum's second pass. This is used by VACUUM to
- * determine whether or not to consider opportunistically freezing the
- * page.
- */
- bool all_visible_except_removable;
-
/* Whether or not the page can be set all-frozen in the VM */
bool all_frozen;
- TransactionId visibility_cutoff_xid; /* Newest xmin on the page */
+
+ /*
+ * If the page is all-visible but not all-frozen, this is the newest xmin
+ * among live tuples on the page. It is to be used as the snapshot conflict
+ * horizon when emitting an XLOG_HEAP2_VISIBLE record.
+ */
+ TransactionId vm_conflict_horizon;
/*
* Tuple visibility is only computed once for each tuple, for correctness
- * and efficiency reasons; see comment in heap_page_prune() for details.
- * This is of type int8[], instead of HTSV_Result[], so we can use -1 to
- * indicate no visibility has been computed, e.g. for LP_DEAD items.
+ * and efficiency reasons; see comment in heap_page_prune_and_freeze() for
+ * details. This is of type int8[], instead of HTSV_Result[], so we can
+ * use -1 to indicate no visibility has been computed, e.g. for LP_DEAD
+ * items.
*
* This needs to be MaxHeapTuplesPerPage + 1 long as FirstOffsetNumber is
* 1. Otherwise every access would need to subtract 1.
@@ -242,9 +241,14 @@ typedef struct PruneResult
* One entry for every tuple that we may freeze.
*/
HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
-} PruneResult;
+ /* New value of relfrozenxid found by heap_page_prune_and_freeze() */
+ TransactionId new_relfrozenxid;
-/* 'reason' codes for heap_page_prune() */
+ /* New value of relminmxid found by heap_page_prune_and_freeze() */
+ MultiXactId new_relminmxid;
+} PruneFreezeResult;
+
+/* 'reason' codes for heap_page_prune_and_freeze() */
typedef enum
{
PRUNE_ON_ACCESS, /* on-access pruning */
@@ -254,7 +258,7 @@ typedef enum
/*
* Pruning calculates tuple visibility once and saves the results in an array
- * of int8. See PruneResult.htsv for details. This helper function is meant to
+ * of int8. See PruneFreezeResult.htsv for details. This helper function is meant to
* guard against examining visibility status array members which have not yet
* been computed.
*/
@@ -332,6 +336,7 @@ extern TM_Result heap_lock_tuple(Relation relation, HeapTuple tuple,
Buffer *buffer, struct TM_FailureData *tmfd);
extern void heap_inplace_update(Relation relation, HeapTuple tuple);
+
extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
HeapPageFreeze *pagefrz,
HeapTupleFreeze *frz, bool *totally_frozen);
@@ -358,13 +363,13 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
/* in heap/pruneheap.c */
struct GlobalVisState;
extern void heap_page_prune_opt(Relation relation, Buffer buffer);
-extern void heap_page_prune(Relation relation, Buffer buffer,
- struct GlobalVisState *vistest,
- bool mark_unused_now,
- HeapPageFreeze *pagefrz,
- PruneResult *presult,
- PruneReason reason,
- OffsetNumber *off_loc);
+extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ struct GlobalVisState *vistest,
+ bool mark_unused_now,
+ HeapPageFreeze *pagefrz,
+ PruneFreezeResult *presult,
+ PruneReason reason,
+ OffsetNumber *off_loc);
extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
OffsetNumber *redirected, int nredirected,
OffsetNumber *nowdead, int ndead,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 4679660837c..cc6a33ab3ee 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2192,8 +2192,8 @@ ProjectionPath
PromptInterruptContext
ProtocolVersion
PrsStorage
+PruneFreezeResult
PruneReason
-PruneResult
PruneState
PruneStepResult
PsqlScanCallbacks
--
2.40.1
v7-0009-Make-opp-freeze-heuristic-compatible-with-prune-f.patch
From e0801f4e1efd83272b945233a39ea0134d271ff8 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 25 Mar 2024 20:48:11 -0400
Subject: [PATCH v7 09/16] Make opp freeze heuristic compatible with
prune+freeze record
Once the prune and freeze records are combined, we will no longer be
able to check whether pruning emitted an FPI in order to decide whether
to opportunistically freeze a freezable page.
While this heuristic should be improved, for now approximate the
previous logic: keep track of whether a hint bit FPI was emitted during
the visibility checks (when checksums are enabled) and combine that with
checking XLogCheckBufferNeedsBackup(). If, having just decided whether
to prune, the buffer would need an FPI once modified, it is likely that
pruning would have emitted an FPI.
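For reference, a condensed sketch of the new decision, pulled together
from the hunks below (variable names are as set up earlier in
heap_page_prune_and_freeze(); this is not standalone code):

    /* Did setting hint bits during the visibility checks emit an FPI? */
    hint_bit_fpi = fpi_before != pgWalUsage.wal_fpi;

    /* Only pay for XLogCheckBufferNeedsBackup() if we could act on it */
    if (do_prune && pagefrz)
        prune_fpi = XLogCheckBufferNeedsBackup(buffer);

    do_freeze = pagefrz &&
        (pagefrz->freeze_required ||
         (whole_page_freezable && presult->nfrozen > 0 &&
          (prune_fpi || hint_bit_fpi)));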
---
src/backend/access/heap/pruneheap.c | 57 +++++++++++++++++++++--------
1 file changed, 42 insertions(+), 15 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index e009c7579dd..d38de9b063d 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -247,6 +247,10 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
TransactionId visibility_cutoff_xid;
bool do_freeze;
bool all_visible_except_removable;
+ bool do_prune;
+ bool whole_page_freezable;
+ bool hint_bit_fpi;
+ bool prune_fpi = false;
int64 fpi_before = pgWalUsage.wal_fpi;
/*
@@ -456,6 +460,13 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
}
}
+ /*
+ * If checksums are enabled, heap_prune_satisfies_vacuum() may have caused
+ * an FPI to be emitted. Then reset fpi_before for no prune case.
+ */
+ hint_bit_fpi = fpi_before != pgWalUsage.wal_fpi;
+ fpi_before = pgWalUsage.wal_fpi;
+
/*
* For vacuum, if the whole page will become frozen, we consider
* opportunistically freezing tuples. Dead tuples which will be removed by
@@ -500,11 +511,41 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (off_loc)
*off_loc = InvalidOffsetNumber;
+ do_prune = prstate.nredirected > 0 ||
+ prstate.ndead > 0 ||
+ prstate.nunused > 0;
+
+ /*
+ * Only incur overhead of checking if we will do an FPI if we might use
+ * the information.
+ */
+ if (do_prune && pagefrz)
+ prune_fpi = XLogCheckBufferNeedsBackup(buffer);
+
+ /* Is the whole page freezable? And is there something to freeze */
+ whole_page_freezable = all_visible_except_removable &&
+ presult->all_frozen;
+
+ /*
+ * Freeze the page when heap_prepare_freeze_tuple indicates that at least
+ * one XID/MXID from before FreezeLimit/MultiXactCutoff is present. Also
+ * freeze when pruning generated an FPI, if doing so means that we set the
+ * page all-frozen afterwards (might not happen until final heap pass).
+ * XXX: Previously, we knew if pruning emitted an FPI by checking
+ * pgWalUsage.wal_fpi before and after pruning. Once the freeze and prune
+ * records are combined, this heuristic couldn't be used anymore. The
+ * opportunistic freeze heuristic must be improved; however, for now, try
+ * to approximate it.
+ */
+ do_freeze = pagefrz &&
+ (pagefrz->freeze_required ||
+ (whole_page_freezable && presult->nfrozen > 0 && (prune_fpi || hint_bit_fpi)));
+
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
/* Have we found any prunable items? */
- if (prstate.nredirected > 0 || prstate.ndead > 0 || prstate.nunused > 0)
+ if (do_prune)
{
/*
* Apply the planned item changes, then repair page fragmentation, and
@@ -569,20 +610,6 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/* Record number of newly-set-LP_DEAD items for caller */
presult->nnewlpdead = prstate.ndead;
- /*
- * Freeze the page when heap_prepare_freeze_tuple indicates that at least
- * one XID/MXID from before FreezeLimit/MultiXactCutoff is present. Also
- * freeze when pruning generated an FPI, if doing so means that we set the
- * page all-frozen afterwards (might not happen until final heap pass).
- */
- if (pagefrz)
- do_freeze = pagefrz->freeze_required ||
- (all_visible_except_removable && presult->all_frozen &&
- presult->nfrozen > 0 &&
- fpi_before != pgWalUsage.wal_fpi);
- else
- do_freeze = false;
-
if (do_freeze)
{
TransactionId frz_conflict_horizon = InvalidTransactionId;
--
2.40.1
v7-0010-Separate-tuple-pre-freeze-checks-and-invoke-earli.patch
From b6dc8c44859d83440cf572ebd3d4c0e0f47e99db Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 25 Mar 2024 20:54:37 -0400
Subject: [PATCH v7 10/16] Separate tuple pre freeze checks and invoke earlier
When the prune and freeze records are combined, their critical sections
will have to be combined as well. heap_freeze_execute_prepared() performs
a set of pre-freeze validations before starting its critical section.
Move these validations into a helper function, heap_pre_freeze_checks(),
and invoke it in heap_page_prune_and_freeze() before the pruning critical
section.
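A rough sketch of the resulting order of operations in
heap_page_prune_and_freeze() after this patch (condensed from the hunks
below; bookkeeping and the no-freeze paths are omitted):

    /* Expensive pg_xact sanity checks stay outside any critical section */
    if (do_freeze)
        heap_pre_freeze_checks(buffer, prstate.frozen, presult->nfrozen);

    START_CRIT_SECTION();
    /* ... apply prune plans and WAL-log the pruning ... */
    END_CRIT_SECTION();

    if (do_freeze)
        heap_freeze_execute_prepared(relation, buffer, frz_conflict_horizon,
                                     prstate.frozen, presult->nfrozen);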
---
src/backend/access/heap/heapam.c | 58 ++++++++++++++++-------------
src/backend/access/heap/pruneheap.c | 41 +++++++++++---------
src/include/access/heapam.h | 3 ++
3 files changed, 59 insertions(+), 43 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index e38c710c192..be48098f7f3 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6657,35 +6657,19 @@ heap_execute_freeze_tuple(HeapTupleHeader tuple, HeapTupleFreeze *frz)
}
/*
- * heap_freeze_execute_prepared
- *
- * Executes freezing of one or more heap tuples on a page on behalf of caller.
- * Caller passes an array of tuple plans from heap_prepare_freeze_tuple.
- * Caller must set 'offset' in each plan for us. Note that we destructively
- * sort caller's tuples array in-place, so caller had better be done with it.
- *
- * WAL-logs the changes so that VACUUM can advance the rel's relfrozenxid
- * later on without any risk of unsafe pg_xact lookups, even following a hard
- * crash (or when querying from a standby). We represent freezing by setting
- * infomask bits in tuple headers, but this shouldn't be thought of as a hint.
- * See section on buffer access rules in src/backend/storage/buffer/README.
- */
+* Perform xmin/xmax XID status sanity checks before calling
+* heap_freeze_execute_prepared().
+*
+* heap_prepare_freeze_tuple doesn't perform these checks directly because
+* pg_xact lookups are relatively expensive. They shouldn't be repeated
+* by successive VACUUMs that each decide against freezing the same page.
+*/
void
-heap_freeze_execute_prepared(Relation rel, Buffer buffer,
- TransactionId snapshotConflictHorizon,
- HeapTupleFreeze *tuples, int ntuples)
+heap_pre_freeze_checks(Buffer buffer,
+ HeapTupleFreeze *tuples, int ntuples)
{
Page page = BufferGetPage(buffer);
- Assert(ntuples > 0);
-
- /*
- * Perform xmin/xmax XID status sanity checks before critical section.
- *
- * heap_prepare_freeze_tuple doesn't perform these checks directly because
- * pg_xact lookups are relatively expensive. They shouldn't be repeated
- * by successive VACUUMs that each decide against freezing the same page.
- */
for (int i = 0; i < ntuples; i++)
{
HeapTupleFreeze *frz = tuples + i;
@@ -6724,6 +6708,30 @@ heap_freeze_execute_prepared(Relation rel, Buffer buffer,
xmax)));
}
}
+}
+
+/*
+ * heap_freeze_execute_prepared
+ *
+ * Executes freezing of one or more heap tuples on a page on behalf of caller.
+ * Caller passes an array of tuple plans from heap_prepare_freeze_tuple.
+ * Caller must set 'offset' in each plan for us. Note that we destructively
+ * sort caller's tuples array in-place, so caller had better be done with it.
+ *
+ * WAL-logs the changes so that VACUUM can advance the rel's relfrozenxid
+ * later on without any risk of unsafe pg_xact lookups, even following a hard
+ * crash (or when querying from a standby). We represent freezing by setting
+ * infomask bits in tuple headers, but this shouldn't be thought of as a hint.
+ * See section on buffer access rules in src/backend/storage/buffer/README.
+ */
+void
+heap_freeze_execute_prepared(Relation rel, Buffer buffer,
+ TransactionId snapshotConflictHorizon,
+ HeapTupleFreeze *tuples, int ntuples)
+{
+ Page page = BufferGetPage(buffer);
+
+ Assert(ntuples > 0);
START_CRIT_SECTION();
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index d38de9b063d..fe463ad7146 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -245,6 +245,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
PruneState prstate;
HeapTupleData tup;
TransactionId visibility_cutoff_xid;
+ TransactionId frz_conflict_horizon;
bool do_freeze;
bool all_visible_except_removable;
bool do_prune;
@@ -297,6 +298,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* all-visible, so the conflict horizon remains InvalidTransactionId.
*/
presult->vm_conflict_horizon = visibility_cutoff_xid = InvalidTransactionId;
+ frz_conflict_horizon = InvalidTransactionId;
/* For advancing relfrozenxid and relminmxid */
presult->new_relfrozenxid = InvalidTransactionId;
@@ -541,6 +543,27 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
(pagefrz->freeze_required ||
(whole_page_freezable && presult->nfrozen > 0 && (prune_fpi || hint_bit_fpi)));
+ if (do_freeze)
+ {
+ heap_pre_freeze_checks(buffer, prstate.frozen, presult->nfrozen);
+
+ /*
+ * We can use the visibility_cutoff_xid as our cutoff for conflicts
+ * when the whole page is eligible to become all-frozen in the VM once
+ * we're done with it. Otherwise we generate a conservative cutoff by
+ * stepping back from OldestXmin. This avoids false conflicts when
+ * hot_standby_feedback is in use.
+ */
+ if (all_visible_except_removable && presult->all_frozen)
+ frz_conflict_horizon = visibility_cutoff_xid;
+ else
+ {
+ /* Avoids false conflicts when hot_standby_feedback in use */
+ frz_conflict_horizon = pagefrz->cutoffs->OldestXmin;
+ TransactionIdRetreat(frz_conflict_horizon);
+ }
+ }
+
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
@@ -612,24 +635,6 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (do_freeze)
{
- TransactionId frz_conflict_horizon = InvalidTransactionId;
-
- /*
- * We can use the visibility_cutoff_xid as our cutoff for conflicts
- * when the whole page is eligible to become all-frozen in the VM once
- * we're done with it. Otherwise we generate a conservative cutoff by
- * stepping back from OldestXmin. This avoids false conflicts when
- * hot_standby_feedback is in use.
- */
- if (all_visible_except_removable && presult->all_frozen)
- frz_conflict_horizon = visibility_cutoff_xid;
- else
- {
- /* Avoids false conflicts when hot_standby_feedback in use */
- frz_conflict_horizon = pagefrz->cutoffs->OldestXmin;
- TransactionIdRetreat(frz_conflict_horizon);
- }
-
/* Execute all freeze plans for page as a single atomic action */
heap_freeze_execute_prepared(relation, buffer,
frz_conflict_horizon,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 6f9c66a872b..dbf6323b5ff 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -340,6 +340,9 @@ extern void heap_inplace_update(Relation relation, HeapTuple tuple);
extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
HeapPageFreeze *pagefrz,
HeapTupleFreeze *frz, bool *totally_frozen);
+
+extern void heap_pre_freeze_checks(Buffer buffer,
+ HeapTupleFreeze *tuples, int ntuples);
extern void heap_freeze_execute_prepared(Relation rel, Buffer buffer,
TransactionId snapshotConflictHorizon,
HeapTupleFreeze *tuples, int ntuples);
--
2.40.1
v7-0011-Remove-heap_freeze_execute_prepared.patch
From 5d0ff10bab645988f228050f3cd3084163bcc2e2 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 26 Mar 2024 09:10:14 -0400
Subject: [PATCH v7 11/16] Remove heap_freeze_execute_prepared()
In order to merge the freeze and prune records, the execution of tuple
freezing must be separated from the WAL logging of the page changes, so
that the WAL logging can be combined with the prune WAL logging. This
commit adds a helper for executing the tuple freezing and then inlines
the remaining contents of heap_freeze_execute_prepared() at its call site
in heap_page_prune_and_freeze().
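The freeze path at the call site then looks roughly like this (condensed
from the hunks below; the prune-only and no-op paths are unchanged):

    START_CRIT_SECTION();

    heap_freeze_prepared_tuples(buffer, prstate.frozen, presult->nfrozen);
    MarkBufferDirty(buffer);

    if (RelationNeedsWAL(relation))
        log_heap_prune_and_freeze(relation, buffer,
                                  frz_conflict_horizon, false, reason,
                                  prstate.frozen, presult->nfrozen,
                                  NULL, 0,   /* redirected */
                                  NULL, 0,   /* dead */
                                  NULL, 0);  /* unused */

    END_CRIT_SECTION();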
---
src/backend/access/heap/heapam.c | 49 +++++++----------------------
src/backend/access/heap/pruneheap.c | 22 ++++++++++---
src/include/access/heapam.h | 28 +++++++++--------
3 files changed, 44 insertions(+), 55 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index be48098f7f3..1c1785994b1 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6340,9 +6340,9 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
* XIDs or MultiXactIds that will need to be processed by a future VACUUM.
*
* VACUUM caller must assemble HeapTupleFreeze freeze plan entries for every
- * tuple that we returned true for, and call heap_freeze_execute_prepared to
- * execute freezing. Caller must initialize pagefrz fields for page as a
- * whole before first call here for each heap page.
+ * tuple that we returned true for, and then execute freezing. Caller must
+ * initialize pagefrz fields for page as a whole before first call here for
+ * each heap page.
*
* VACUUM caller decides on whether or not to freeze the page as a whole.
* We'll often prepare freeze plans for a page that caller just discards.
@@ -6657,8 +6657,8 @@ heap_execute_freeze_tuple(HeapTupleHeader tuple, HeapTupleFreeze *frz)
}
/*
-* Perform xmin/xmax XID status sanity checks before calling
-* heap_freeze_execute_prepared().
+* Perform xmin/xmax XID status sanity checks before actually executing freeze
+* plans.
*
* heap_prepare_freeze_tuple doesn't perform these checks directly because
* pg_xact lookups are relatively expensive. They shouldn't be repeated
@@ -6711,30 +6711,17 @@ heap_pre_freeze_checks(Buffer buffer,
}
/*
- * heap_freeze_execute_prepared
- *
- * Executes freezing of one or more heap tuples on a page on behalf of caller.
- * Caller passes an array of tuple plans from heap_prepare_freeze_tuple.
- * Caller must set 'offset' in each plan for us. Note that we destructively
- * sort caller's tuples array in-place, so caller had better be done with it.
- *
- * WAL-logs the changes so that VACUUM can advance the rel's relfrozenxid
- * later on without any risk of unsafe pg_xact lookups, even following a hard
- * crash (or when querying from a standby). We represent freezing by setting
- * infomask bits in tuple headers, but this shouldn't be thought of as a hint.
- * See section on buffer access rules in src/backend/storage/buffer/README.
+ * Helper which executes freezing of one or more heap tuples on a page on
+ * behalf of caller. Caller passes an array of tuple plans from
+ * heap_prepare_freeze_tuple. Caller must set 'offset' in each plan for us.
+ * Must be called in a critical section that also marks the buffer dirty and,
+ * if needed, emits WAL.
*/
void
-heap_freeze_execute_prepared(Relation rel, Buffer buffer,
- TransactionId snapshotConflictHorizon,
- HeapTupleFreeze *tuples, int ntuples)
+heap_freeze_prepared_tuples(Buffer buffer, HeapTupleFreeze *tuples, int ntuples)
{
Page page = BufferGetPage(buffer);
- Assert(ntuples > 0);
-
- START_CRIT_SECTION();
-
for (int i = 0; i < ntuples; i++)
{
HeapTupleFreeze *frz = tuples + i;
@@ -6746,20 +6733,6 @@ heap_freeze_execute_prepared(Relation rel, Buffer buffer,
}
MarkBufferDirty(buffer);
-
- /* Now WAL-log freezing if necessary */
- if (RelationNeedsWAL(rel))
- {
- log_heap_prune_and_freeze(rel, buffer, snapshotConflictHorizon,
- false, /* no cleanup lock required */
- PRUNE_VACUUM_SCAN,
- tuples, ntuples,
- NULL, 0, /* redirected */
- NULL, 0, /* dead */
- NULL, 0); /* unused */
- }
-
- END_CRIT_SECTION();
}
/*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index fe463ad7146..8914d4bf5c8 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -635,10 +635,24 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (do_freeze)
{
- /* Execute all freeze plans for page as a single atomic action */
- heap_freeze_execute_prepared(relation, buffer,
- frz_conflict_horizon,
- prstate.frozen, presult->nfrozen);
+ START_CRIT_SECTION();
+
+ Assert(presult->nfrozen > 0);
+
+ heap_freeze_prepared_tuples(buffer, prstate.frozen, presult->nfrozen);
+
+ MarkBufferDirty(buffer);
+
+ /* Now WAL-log freezing if necessary */
+ if (RelationNeedsWAL(relation))
+ log_heap_prune_and_freeze(relation, buffer,
+ frz_conflict_horizon, false, reason,
+ prstate.frozen, presult->nfrozen,
+ NULL, 0, /* redirected */
+ NULL, 0, /* dead */
+ NULL, 0); /* unused */
+
+ END_CRIT_SECTION();
}
else if (!pagefrz || !presult->all_frozen || presult->nfrozen > 0)
{
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index dbf6323b5ff..ac0ef6e4281 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -14,6 +14,7 @@
#ifndef HEAPAM_H
#define HEAPAM_H
+#include "access/heapam_xlog.h"
#include "access/relation.h" /* for backward compatibility */
#include "access/relscan.h"
#include "access/sdir.h"
@@ -101,8 +102,8 @@ typedef enum
} HTSV_Result;
/*
- * heap_prepare_freeze_tuple may request that heap_freeze_execute_prepared
- * check any tuple's to-be-frozen xmin and/or xmax status using pg_xact
+ * heap_prepare_freeze_tuple may request that any tuple's to-be-frozen xmin
+ * and/or xmax status is checked using pg_xact during freezing execution.
*/
#define HEAP_FREEZE_CHECK_XMIN_COMMITTED 0x01
#define HEAP_FREEZE_CHECK_XMAX_ABORTED 0x02
@@ -154,14 +155,14 @@ typedef struct HeapPageFreeze
/*
* "Freeze" NewRelfrozenXid/NewRelminMxid trackers.
*
- * Trackers used when heap_freeze_execute_prepared freezes, or when there
- * are zero freeze plans for a page. It is always valid for vacuumlazy.c
- * to freeze any page, by definition. This even includes pages that have
- * no tuples with storage to consider in the first place. That way the
- * 'totally_frozen' results from heap_prepare_freeze_tuple can always be
- * used in the same way, even when no freeze plans need to be executed to
- * "freeze the page". Only the "freeze" path needs to consider the need
- * to set pages all-frozen in the visibility map under this scheme.
+ * Trackers used when tuples will be frozen, or when there are zero freeze
+ * plans for a page. It is always valid for vacuumlazy.c to freeze any
+ * page, by definition. This even includes pages that have no tuples with
+ * storage to consider in the first place. That way the 'totally_frozen'
+ * results from heap_prepare_freeze_tuple can always be used in the same
+ * way, even when no freeze plans need to be executed to "freeze the
+ * page". Only the "freeze" path needs to consider the need to set pages
+ * all-frozen in the visibility map under this scheme.
*
* When we freeze a page, we generally freeze all XIDs < OldestXmin, only
* leaving behind XIDs that are ineligible for freezing, if any. And so
@@ -343,12 +344,13 @@ extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
extern void heap_pre_freeze_checks(Buffer buffer,
HeapTupleFreeze *tuples, int ntuples);
-extern void heap_freeze_execute_prepared(Relation rel, Buffer buffer,
- TransactionId snapshotConflictHorizon,
- HeapTupleFreeze *tuples, int ntuples);
+
+extern void heap_freeze_prepared_tuples(Buffer buffer,
+ HeapTupleFreeze *tuples, int ntuples);
extern bool heap_freeze_tuple(HeapTupleHeader tuple,
TransactionId relfrozenxid, TransactionId relminmxid,
TransactionId FreezeLimit, TransactionId MultiXactCutoff);
+
extern bool heap_tuple_should_freeze(HeapTupleHeader tuple,
const struct VacuumCutoffs *cutoffs,
TransactionId *NoFreezePageRelfrozenXid,
--
2.40.1
v7-0012-Merge-prune-and-freeze-records.patch
From 0b8c552855594f7da4244362705681ccd48596c4 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 26 Mar 2024 09:37:46 -0400
Subject: [PATCH v7 12/16] Merge prune and freeze records
When a page is both pruned and frozen, a single, combined WAL record is
now emitted for both operations, reducing the number of WAL records
emitted.
When the record contains only freeze plans and no pruning actions, we can
avoid taking a full cleanup lock when replaying it.
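In particular, when a single record covers both operations, its conflict
horizon must be the more conservative of the freeze and prune horizons.
A condensed sketch of that logic, taken from the hunks below:

    if (RelationNeedsWAL(relation))
    {
        TransactionId conflict_xid;

        /* Use the newer of the freeze and prune conflict horizons */
        if (TransactionIdFollows(frz_conflict_horizon,
                                 prstate.latest_xid_removed))
            conflict_xid = frz_conflict_horizon;
        else
            conflict_xid = prstate.latest_xid_removed;

        log_heap_prune_and_freeze(relation, buffer, conflict_xid,
                                  true, reason,
                                  prstate.frozen, presult->nfrozen,
                                  prstate.redirected, prstate.nredirected,
                                  prstate.nowdead, prstate.ndead,
                                  prstate.nowunused, prstate.nunused);
    }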
---
src/backend/access/heap/heapam.c | 2 -
src/backend/access/heap/pruneheap.c | 215 +++++++++++++++-------------
2 files changed, 114 insertions(+), 103 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 1c1785994b1..5d8f183085d 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6731,8 +6731,6 @@ heap_freeze_prepared_tuples(Buffer buffer, HeapTupleFreeze *tuples, int ntuples)
htup = (HeapTupleHeader) PageGetItem(page, itemid);
heap_execute_freeze_tuple(htup, frz);
}
-
- MarkBufferDirty(buffer);
}
/*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 8914d4bf5c8..db8a182a197 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -249,9 +249,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
bool do_freeze;
bool all_visible_except_removable;
bool do_prune;
- bool whole_page_freezable;
+ bool do_hint;
bool hint_bit_fpi;
- bool prune_fpi = false;
int64 fpi_before = pgWalUsage.wal_fpi;
/*
@@ -464,10 +463,9 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/*
* If checksums are enabled, heap_prune_satisfies_vacuum() may have caused
- * an FPI to be emitted. Then reset fpi_before for no prune case.
+ * an FPI to be emitted.
*/
hint_bit_fpi = fpi_before != pgWalUsage.wal_fpi;
- fpi_before = pgWalUsage.wal_fpi;
/*
* For vacuum, if the whole page will become frozen, we consider
@@ -517,16 +515,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.ndead > 0 ||
prstate.nunused > 0;
+ /* Record number of newly-set-LP_DEAD items for caller */
+ presult->nnewlpdead = prstate.ndead;
+
/*
- * Only incur overhead of checking if we will do an FPI if we might use
- * the information.
+ * Even if we don't prune anything, if we found a new value for the
+ * pd_prune_xid field or the page was marked full, we will update the hint
+ * bit.
*/
- if (do_prune && pagefrz)
- prune_fpi = XLogCheckBufferNeedsBackup(buffer);
-
- /* Is the whole page freezable? And is there something to freeze */
- whole_page_freezable = all_visible_except_removable &&
- presult->all_frozen;
+ do_hint = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
+ PageIsFull(page);
/*
* Freeze the page when heap_prepare_freeze_tuple indicates that at least
@@ -539,46 +537,57 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* opportunistic freeze heuristic must be improved; however, for now, try
* to approximate it.
*/
- do_freeze = pagefrz &&
- (pagefrz->freeze_required ||
- (whole_page_freezable && presult->nfrozen > 0 && (prune_fpi || hint_bit_fpi)));
- if (do_freeze)
+ do_freeze = false;
+ if (pagefrz)
{
- heap_pre_freeze_checks(buffer, prstate.frozen, presult->nfrozen);
+ /* Is the whole page freezable? And is there something to freeze? */
+ bool whole_page_freezable = all_visible_except_removable &&
+ presult->all_frozen;
- /*
- * We can use the visibility_cutoff_xid as our cutoff for conflicts
- * when the whole page is eligible to become all-frozen in the VM once
- * we're done with it. Otherwise we generate a conservative cutoff by
- * stepping back from OldestXmin. This avoids false conflicts when
- * hot_standby_feedback is in use.
- */
- if (all_visible_except_removable && presult->all_frozen)
- frz_conflict_horizon = visibility_cutoff_xid;
- else
+ if (pagefrz->freeze_required)
+ do_freeze = true;
+ else if (whole_page_freezable && presult->nfrozen > 0)
{
- /* Avoids false conflicts when hot_standby_feedback in use */
- frz_conflict_horizon = pagefrz->cutoffs->OldestXmin;
- TransactionIdRetreat(frz_conflict_horizon);
+ /*
+ * Freezing would make the page all-frozen. In this case, we will
+ * freeze if we have already emitted an FPI or will do so anyway.
+ * Be sure only to incur the overhead of checking if we will do an
+ * FPI if we may use that information.
+ */
+ if (hint_bit_fpi ||
+ ((do_prune || do_hint) && XLogCheckBufferNeedsBackup(buffer)))
+ {
+ do_freeze = true;
+ }
}
}
- /* Any error while applying the changes is critical */
- START_CRIT_SECTION();
+ /*
+ * Validate the tuples we are considering freezing. We do this even if
+ * pruning and hint bit setting have not emitted an FPI so far because we
+ * still may emit an FPI while setting the page hint bit later. But we
+ * want to avoid doing the pre-freeze checks in a critical section.
+ */
+ if (do_freeze)
+ heap_pre_freeze_checks(buffer, prstate.frozen, presult->nfrozen);
- /* Have we found any prunable items? */
- if (do_prune)
+ if (!do_freeze && (!pagefrz || !presult->all_frozen || presult->nfrozen > 0))
{
/*
- * Apply the planned item changes, then repair page fragmentation, and
- * update the page's hint bit about whether it has free line pointers.
+ * If we will neither freeze tuples on the page nor set the page all
+ * frozen in the visibility map, the page is not all-frozen and there
+ * will be no newly frozen tuples.
*/
- heap_page_prune_execute(buffer, false,
- prstate.redirected, prstate.nredirected,
- prstate.nowdead, prstate.ndead,
- prstate.nowunused, prstate.nunused);
+ presult->all_frozen = false;
+ presult->nfrozen = 0; /* avoid miscounts in instrumentation */
+ }
+
+ /* Any error while applying the changes is critical */
+ START_CRIT_SECTION();
+ if (do_hint)
+ {
/*
* Update the page's pd_prune_xid field to either zero, or the lowest
* XID of any soon-prunable tuple.
@@ -586,12 +595,52 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
((PageHeader) page)->pd_prune_xid = prstate.new_prune_xid;
/*
- * Also clear the "page is full" flag, since there's no point in
- * repeating the prune/defrag process until something else happens to
- * the page.
+ * Clear the "page is full" flag if it is set since there's no point
+ * in repeating the prune/defrag process until something else happens
+ * to the page.
*/
PageClearFull(page);
+ /*
+ * We only needed to update pd_prune_xid and clear the page-is-full
+ * hint bit; this is a non-WAL-logged hint. If we will also freeze or
+ * prune the page, we will mark the buffer dirty below.
+ */
+ if (!do_freeze && !do_prune)
+ MarkBufferDirtyHint(buffer, true);
+ }
+
+ if (do_prune || do_freeze)
+ {
+ /* Apply the planned item changes, then repair page fragmentation. */
+ if (do_prune)
+ {
+ heap_page_prune_execute(buffer, false,
+ prstate.redirected, prstate.nredirected,
+ prstate.nowdead, prstate.ndead,
+ prstate.nowunused, prstate.nunused);
+ }
+
+ if (do_freeze)
+ {
+ /*
+ * We can use the visibility_cutoff_xid as our cutoff for
+ * conflicts when the whole page is eligible to become all-frozen
+ * in the VM once we're done with it. Otherwise we generate a
+ * conservative cutoff by stepping back from OldestXmin. This
+ * avoids false conflicts when hot_standby_feedback is in use.
+ */
+ if (all_visible_except_removable && presult->all_frozen)
+ frz_conflict_horizon = visibility_cutoff_xid;
+ else
+ {
+ /* Avoids false conflicts when hot_standby_feedback in use */
+ frz_conflict_horizon = pagefrz->cutoffs->OldestXmin;
+ TransactionIdRetreat(frz_conflict_horizon);
+ }
+ heap_freeze_prepared_tuples(buffer, prstate.frozen, presult->nfrozen);
+ }
+
MarkBufferDirty(buffer);
/*
@@ -599,72 +648,35 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*/
if (RelationNeedsWAL(relation))
{
+ /*
+ * The snapshotConflictHorizon for the whole record should be the
+ * most conservative of all the horizons calculated for any of the
+ * possible modifications. If this record will prune tuples, any
+ * transactions on the standby older than the youngest xmax of the
+ * most recently removed tuple this record will prune will
+ * conflict. If this record will freeze tuples, any transactions
+ * on the standby with xids older than the youngest tuple this
+ * record will freeze will conflict.
+ */
+ TransactionId conflict_xid;
+
+ if (TransactionIdFollows(frz_conflict_horizon, prstate.latest_xid_removed))
+ conflict_xid = frz_conflict_horizon;
+ else
+ conflict_xid = prstate.latest_xid_removed;
+
log_heap_prune_and_freeze(relation, buffer,
- prstate.latest_xid_removed,
+ conflict_xid,
true, reason,
- NULL, 0,
+ prstate.frozen, presult->nfrozen,
prstate.redirected, prstate.nredirected,
prstate.nowdead, prstate.ndead,
prstate.nowunused, prstate.nunused);
}
}
- else
- {
- /*
- * If we didn't prune anything, but have found a new value for the
- * pd_prune_xid field, update it and mark the buffer dirty. This is
- * treated as a non-WAL-logged hint.
- *
- * Also clear the "page is full" flag if it is set, since there's no
- * point in repeating the prune/defrag process until something else
- * happens to the page.
- */
- if (((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
- PageIsFull(page))
- {
- ((PageHeader) page)->pd_prune_xid = prstate.new_prune_xid;
- PageClearFull(page);
- MarkBufferDirtyHint(buffer, true);
- }
- }
END_CRIT_SECTION();
- /* Record number of newly-set-LP_DEAD items for caller */
- presult->nnewlpdead = prstate.ndead;
-
- if (do_freeze)
- {
- START_CRIT_SECTION();
-
- Assert(presult->nfrozen > 0);
-
- heap_freeze_prepared_tuples(buffer, prstate.frozen, presult->nfrozen);
-
- MarkBufferDirty(buffer);
-
- /* Now WAL-log freezing if necessary */
- if (RelationNeedsWAL(relation))
- log_heap_prune_and_freeze(relation, buffer,
- frz_conflict_horizon, false, reason,
- prstate.frozen, presult->nfrozen,
- NULL, 0, /* redirected */
- NULL, 0, /* dead */
- NULL, 0); /* unused */
-
- END_CRIT_SECTION();
- }
- else if (!pagefrz || !presult->all_frozen || presult->nfrozen > 0)
- {
- /*
- * If we will neither freeze tuples on the page nor set the page all
- * frozen in the visibility map, the page is not all frozen and there
- * will be no newly frozen tuples.
- */
- presult->all_frozen = false;
- presult->nfrozen = 0; /* avoid miscounts in instrumentation */
- }
-
/*
* For callers planning to update the visibility map, the conflict horizon
* for that record must be the newest xmin on the page. However, if the
@@ -681,9 +693,10 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* tuples on the page, if we will set the page all-frozen in the
* visibility map, we can advance relfrozenxid and relminmxid to the
* values in pagefrz->FreezePageRelfrozenXid and
- * pagefrz->FreezePageRelminMxid.
+ * pagefrz->FreezePageRelminMxid. MFIXME: which one should we pick if
+ * presult->nfrozen == 0 and presult->all_frozen is true?
*/
- if (presult->all_frozen || presult->nfrozen > 0)
+ if (presult->nfrozen > 0)
{
presult->new_relfrozenxid = pagefrz->FreezePageRelfrozenXid;
presult->new_relminmxid = pagefrz->FreezePageRelminMxid;
--
2.40.1
v7-0013-Set-hastup-in-heap_page_prune.patch
From 1841cfac42ee067b4fcb099e0cd680e5b3a80918 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 26 Mar 2024 09:56:02 -0400
Subject: [PATCH v7 13/16] Set hastup in heap_page_prune
lazy_scan_prune() loops through the line pointers and tuple visibility
information for each tuple on a page, setting hastup to true if there
are any LP_REDIRECT line pointers or tuples with storage that will not
be removed. We want to remove this extra loop from lazy_scan_prune(),
and we know about non-removable tuples during
heap_page_prune_and_freeze() anyway. Set hastup when recording
LP_REDIRECT line pointers in heap_prune_chain() and when LP_NORMAL line
pointers refer to tuples whose visibility status is not HEAPTUPLE_DEAD.
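A minimal sketch of where the flag is now set, condensed from the hunks
below (presult is the PruneFreezeResult the caller receives):

    /* In the per-tuple visibility pass of heap_page_prune_and_freeze():
     * any LP_NORMAL tuple that will survive pruning makes rel truncation
     * unsafe. */
    if (presult->htsv[offnum] != HEAPTUPLE_DEAD)
        presult->hastup = true;

    /* And in heap_prune_record_redirect(), since LP_REDIRECT items also
     * make rel truncation unsafe. */
    presult->hastup = true;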
---
src/backend/access/heap/pruneheap.c | 64 ++++++++++++++++++----------
src/backend/access/heap/vacuumlazy.c | 24 +----------
src/include/access/heapam.h | 3 ++
3 files changed, 46 insertions(+), 45 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index db8a182a197..f8966d06cd2 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -71,7 +71,8 @@ static int heap_prune_chain(Buffer buffer,
static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
static void heap_prune_record_redirect(PruneState *prstate,
- OffsetNumber offnum, OffsetNumber rdoffnum);
+ OffsetNumber offnum, OffsetNumber rdoffnum,
+ PruneFreezeResult *presult);
static void heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
PruneFreezeResult *presult);
static void heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
@@ -279,6 +280,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
presult->nnewlpdead = 0;
presult->nfrozen = 0;
+ presult->hastup = false;
+
/*
* Caller will update the VM after pruning, collecting LP_DEAD items, and
* freezing tuples. Keep track of whether or not the page is all_visible
@@ -434,30 +437,42 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
break;
}
- /*
- * Consider freezing any normal tuples which will not be removed
- */
- if (presult->htsv[offnum] != HEAPTUPLE_DEAD && pagefrz)
+ if (presult->htsv[offnum] != HEAPTUPLE_DEAD)
{
- bool totally_frozen;
+ /*
+ * Deliberately don't set hastup for LP_DEAD items. We make the
+ * soft assumption that any LP_DEAD items encountered here will
+ * become LP_UNUSED later on, before count_nondeletable_pages is
+ * reached. If we don't make this assumption then rel truncation
+ * will only happen every other VACUUM, at most. Besides, VACUUM
+ * must treat hastup/nonempty_pages as provisional no matter how
+ * LP_DEAD items are handled (handled here, or handled later on).
+ */
+ presult->hastup = true;
- /* Tuple with storage -- consider need to freeze */
- if ((heap_prepare_freeze_tuple(htup, pagefrz,
- &prstate.frozen[presult->nfrozen],
- &totally_frozen)))
+ /* Consider freezing any normal tuples which will not be removed */
+ if (pagefrz)
{
- /* Save prepared freeze plan for later */
- prstate.frozen[presult->nfrozen++].offset = offnum;
- }
+ bool totally_frozen;
- /*
- * If any tuple isn't either totally frozen already or eligible to
- * become totally frozen (according to its freeze plan), then the
- * page definitely cannot be set all-frozen in the visibility map
- * later on
- */
- if (!totally_frozen)
- presult->all_frozen = false;
+ /* Tuple with storage -- consider need to freeze */
+ if ((heap_prepare_freeze_tuple(htup, pagefrz,
+ &prstate.frozen[presult->nfrozen],
+ &totally_frozen)))
+ {
+ /* Save prepared freeze plan for later */
+ prstate.frozen[presult->nfrozen++].offset = offnum;
+ }
+
+ /*
+ * If any tuple isn't either totally frozen already or
+ * eligible to become totally frozen (according to its freeze
+ * plan), then the page definitely cannot be set all-frozen in
+ * the visibility map later on
+ */
+ if (!totally_frozen)
+ presult->all_frozen = false;
+ }
}
}
@@ -1023,7 +1038,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
if (i >= nchain)
heap_prune_record_dead_or_unused(prstate, rootoffnum, presult);
else
- heap_prune_record_redirect(prstate, rootoffnum, chainitems[i]);
+ heap_prune_record_redirect(prstate, rootoffnum, chainitems[i], presult);
}
else if (nchain < 2 && ItemIdIsRedirected(rootlp))
{
@@ -1057,7 +1072,8 @@ heap_prune_record_prunable(PruneState *prstate, TransactionId xid)
/* Record line pointer to be redirected */
static void
heap_prune_record_redirect(PruneState *prstate,
- OffsetNumber offnum, OffsetNumber rdoffnum)
+ OffsetNumber offnum, OffsetNumber rdoffnum,
+ PruneFreezeResult *presult)
{
Assert(prstate->nredirected < MaxHeapTuplesPerPage);
prstate->redirected[prstate->nredirected * 2] = offnum;
@@ -1067,6 +1083,8 @@ heap_prune_record_redirect(PruneState *prstate,
prstate->marked[offnum] = true;
Assert(!prstate->marked[rdoffnum]);
prstate->marked[rdoffnum] = true;
+
+ presult->hastup = true;
}
/* Record line pointer to be marked dead */
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8beef4093ae..68258d083ab 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1420,7 +1420,6 @@ lazy_scan_prune(LVRelState *vacrel,
live_tuples,
recently_dead_tuples;
HeapPageFreeze pagefrz;
- bool hastup = false;
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
Assert(BufferGetBlockNumber(buf) == blkno);
@@ -1473,28 +1472,12 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->offnum = offnum;
itemid = PageGetItemId(page, offnum);
- if (!ItemIdIsUsed(itemid))
- continue;
-
/* Redirect items mustn't be touched */
- if (ItemIdIsRedirected(itemid))
- {
- /* page makes rel truncation unsafe */
- hastup = true;
+ if (ItemIdIsRedirected(itemid) || !ItemIdIsUsed(itemid))
continue;
- }
if (ItemIdIsDead(itemid))
{
- /*
- * Deliberately don't set hastup for LP_DEAD items. We make the
- * soft assumption that any LP_DEAD items encountered here will
- * become LP_UNUSED later on, before count_nondeletable_pages is
- * reached. If we don't make this assumption then rel truncation
- * will only happen every other VACUUM, at most. Besides, VACUUM
- * must treat hastup/nonempty_pages as provisional no matter how
- * LP_DEAD items are handled (handled here, or handled later on).
- */
deadoffsets[lpdead_items++] = offnum;
continue;
}
@@ -1562,9 +1545,6 @@ lazy_scan_prune(LVRelState *vacrel,
elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
break;
}
-
- hastup = true; /* page makes rel truncation unsafe */
-
}
vacrel->offnum = InvalidOffsetNumber;
@@ -1643,7 +1623,7 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->recently_dead_tuples += recently_dead_tuples;
/* Can't truncate this page */
- if (hastup)
+ if (presult.hastup)
vacrel->nonempty_pages = blkno + 1;
/* Did we find LP_DEAD items? */
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index ac0ef6e4281..567faa34664 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -216,6 +216,9 @@ typedef struct PruneFreezeResult
/* Whether or not the page can be set all-frozen in the VM */
bool all_frozen;
+ /* Whether or not the page makes rel truncation unsafe */
+ bool hastup;
+
/*
* If the page is all-visible and not all-frozen this is the oldest xid
* that can see the page as all-visible. It is to be used as the snapshot
--
2.40.1
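
To see what 0013 changes for the caller: pruning now sets hastup itself (for
redirected items and for normal items whose tuples are not DEAD), and
lazy_scan_prune() only reads the flag from the result struct instead of
re-deriving it in its own loop. A rough standalone model with simplified
stand-in types (the real code works with ItemId and HTSV_Result):

#include <stdbool.h>
#include <stdio.h>

/* Stand-ins for line pointer kinds and tuple visibility statuses */
typedef enum { LP_UNUSED, LP_NORMAL, LP_REDIRECT, LP_DEAD } LpKind;
typedef enum { TUP_DEAD, TUP_LIVE, TUP_RECENTLY_DEAD } TupStatus;

typedef struct
{
    bool hastup;        /* page makes rel truncation unsafe? */
} PruneResultModel;

static void
prune_item(LpKind lp, TupStatus htsv, PruneResultModel *presult)
{
    if (lp == LP_REDIRECT)
        presult->hastup = true;     /* redirect keeps the page non-empty */
    else if (lp == LP_NORMAL && htsv != TUP_DEAD)
        presult->hastup = true;     /* surviving tuple with storage */
    /* LP_DEAD and LP_UNUSED deliberately do not set hastup */
}

int
main(void)
{
    PruneResultModel presult = {.hastup = false};

    prune_item(LP_DEAD, TUP_DEAD, &presult);    /* ignored */
    prune_item(LP_NORMAL, TUP_LIVE, &presult);  /* sets hastup */

    /* Caller side (lazy_scan_prune() in the real code) just reads the flag */
    if (presult.hastup)
        printf("page makes rel truncation unsafe\n");
    return 0;
}
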
Attachment: v7-0014-Count-tuples-for-vacuum-logging-in-heap_page_prun.patch (text/x-diff)
From 183538e7189874394f6311984b8475b66bbca5ad Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 26 Mar 2024 10:04:38 -0400
Subject: [PATCH v7 14/16] Count tuples for vacuum logging in heap_page_prune
lazy_scan_prune() loops through all of the tuple visibility information
that was recorded in heap_page_prune() and then counts live and recently
dead tuples. That information is available in heap_page_prune(), so just
record it there. Add live and recently dead tuple counters to the
PruneResult. Doing this counting in heap_page_prune() eliminates the
need for saving the tuple visibility status information in the
PruneResult. Instead, save it in the PruneState where it can be
referenced by heap_prune_chain().
---
src/backend/access/heap/pruneheap.c | 98 ++++++++++++++++++++++++----
src/backend/access/heap/vacuumlazy.c | 93 +-------------------------
src/include/access/heapam.h | 36 ++--------
3 files changed, 97 insertions(+), 130 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index f8966d06cd2..ee557c9ed35 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -55,6 +55,18 @@ typedef struct
*/
bool marked[MaxHeapTuplesPerPage + 1];
+ /*
+ * Tuple visibility is only computed once for each tuple, for correctness
+ * and efficiency reasons; see comment in heap_page_prune_and_freeze() for
+ * details. This is of type int8[], instead of HTSV_Result[], so we can
+ * use -1 to indicate no visibility has been computed, e.g. for LP_DEAD
+ * items.
+ *
+ * This needs to be MaxHeapTuplesPerPage + 1 long as FirstOffsetNumber is
+ * 1. Otherwise every access would need to subtract 1.
+ */
+ int8 htsv[MaxHeapTuplesPerPage + 1];
+
/*
* One entry for every tuple that we may freeze.
*/
@@ -69,6 +81,7 @@ static int heap_prune_chain(Buffer buffer,
OffsetNumber rootoffnum,
PruneState *prstate, PruneFreezeResult *presult);
+static inline HTSV_Result htsv_get_valid_status(int status);
static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
static void heap_prune_record_redirect(PruneState *prstate,
OffsetNumber offnum, OffsetNumber rdoffnum,
@@ -273,7 +286,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
memset(prstate.marked, 0, sizeof(prstate.marked));
/*
- * presult->htsv is not initialized here because all ntuple spots in the
+ * prstate.htsv is not initialized here because all ntuple spots in the
* array will be set either to a valid HTSV_Result value or -1.
*/
presult->ndeleted = 0;
@@ -282,6 +295,9 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
presult->hastup = false;
+ presult->live_tuples = 0;
+ presult->recently_dead_tuples = 0;
+
/*
* Caller will update the VM after pruning, collecting LP_DEAD items, and
* freezing tuples. Keep track of whether or not the page is all_visible
@@ -340,7 +356,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/* Nothing to do if slot doesn't contain a tuple */
if (!ItemIdIsNormal(itemid))
{
- presult->htsv[offnum] = -1;
+ prstate.htsv[offnum] = -1;
continue;
}
@@ -356,13 +372,32 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (off_loc)
*off_loc = offnum;
- presult->htsv[offnum] = heap_prune_satisfies_vacuum(&prstate, &tup,
- buffer);
+ prstate.htsv[offnum] = heap_prune_satisfies_vacuum(&prstate, &tup,
+ buffer);
if (reason == PRUNE_ON_ACCESS)
continue;
- switch (presult->htsv[offnum])
+ /*
+ * The criteria for counting a tuple as live in this block need to
+ * match what analyze.c's acquire_sample_rows() does, otherwise VACUUM
+ * and ANALYZE may produce wildly different reltuples values, e.g.
+ * when there are many recently-dead tuples.
+ *
+ * The logic here is a bit simpler than acquire_sample_rows(), as
+ * VACUUM can't run inside a transaction block, which makes some cases
+ * impossible (e.g. in-progress insert from the same transaction).
+ *
+ * We treat LP_DEAD items (which are the closest thing to DEAD tuples
+ * that might be seen here) differently, too: we assume that they'll
+ * become LP_UNUSED before VACUUM finishes. This difference is only
+ * superficial. VACUUM effectively agrees with ANALYZE about DEAD
+ * items, in the end. VACUUM won't remember LP_DEAD items, but only
+ * because they're not supposed to be left behind when it is done.
+ * (Cases where we bypass index vacuuming will violate this optimistic
+ * assumption, but the overall impact of that should be negligible.)
+ */
+ switch (prstate.htsv[offnum])
{
case HEAPTUPLE_DEAD:
@@ -382,6 +417,12 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
break;
case HEAPTUPLE_LIVE:
+ /*
+ * Count it as live. Not only is this natural, but it's also
+ * what acquire_sample_rows() does.
+ */
+ presult->live_tuples++;
+
/*
* Is the tuple definitely visible to all transactions?
*
@@ -423,13 +464,34 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
}
break;
case HEAPTUPLE_RECENTLY_DEAD:
+
+ /*
+ * If tuple is recently dead then we must not remove it from
+ * the relation. (We only remove items that are LP_DEAD from
+ * pruning.)
+ */
+ presult->recently_dead_tuples++;
all_visible_except_removable = false;
break;
case HEAPTUPLE_INSERT_IN_PROGRESS:
+
+ /*
+ * We do not count these rows as live, because we expect the
+ * inserting transaction to update the counters at commit, and
+ * we assume that will happen only after we report our
+ * results. This assumption is a bit shaky, but it is what
+ * acquire_sample_rows() does, so be consistent.
+ */
all_visible_except_removable = false;
break;
case HEAPTUPLE_DELETE_IN_PROGRESS:
- /* This is an expected case during concurrent vacuum */
+
+ /*
+ * This an expected case during concurrent vacuum. Count such
+ * rows as live. As above, we assume the deleting transaction
+ * will commit and update the counters after we report.
+ */
+ presult->live_tuples++;
all_visible_except_removable = false;
break;
default:
@@ -437,7 +499,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
break;
}
- if (presult->htsv[offnum] != HEAPTUPLE_DEAD)
+ if (prstate.htsv[offnum] != HEAPTUPLE_DEAD)
{
/*
* Deliberately don't set hastup for LP_DEAD items. We make the
@@ -746,10 +808,24 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
}
+/*
+ * Pruning calculates tuple visibility once and saves the results in an array
+ * of int8. See PruneState.htsv for details. This helper function is meant to
+ * guard against examining visibility status array members which have not yet
+ * been computed.
+ */
+static inline HTSV_Result
+htsv_get_valid_status(int status)
+{
+ Assert(status >= HEAPTUPLE_DEAD &&
+ status <= HEAPTUPLE_DELETE_IN_PROGRESS);
+ return (HTSV_Result) status;
+}
+
/*
* Prune specified line pointer or a HOT chain originating at line pointer.
*
- * Tuple visibility information is provided in presult->htsv.
+ * Tuple visibility information is provided in prstate->htsv.
*
* If the item is an index-referenced tuple (i.e. not a heap-only tuple),
* the HOT chain is pruned by removing all DEAD tuples at the start of the HOT
@@ -800,7 +876,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
*/
if (ItemIdIsNormal(rootlp))
{
- Assert(presult->htsv[rootoffnum] != -1);
+ Assert(prstate->htsv[rootoffnum] != -1);
htup = (HeapTupleHeader) PageGetItem(dp, rootlp);
if (HeapTupleHeaderIsHeapOnly(htup))
@@ -823,7 +899,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
* either here or while following a chain below. Whichever path
* gets there first will mark the tuple unused.
*/
- if (presult->htsv[rootoffnum] == HEAPTUPLE_DEAD &&
+ if (prstate->htsv[rootoffnum] == HEAPTUPLE_DEAD &&
!HeapTupleHeaderIsHotUpdated(htup))
{
heap_prune_record_unused(prstate, rootoffnum);
@@ -924,7 +1000,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
*/
tupdead = recent_dead = false;
- switch (htsv_get_valid_status(presult->htsv[offnum]))
+ switch (htsv_get_valid_status(prstate->htsv[offnum]))
{
case HEAPTUPLE_DEAD:
tupdead = true;
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 68258d083ab..c28e786a1e0 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1378,22 +1378,6 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
*
* Caller must hold pin and buffer cleanup lock on the buffer.
*
- * Prior to PostgreSQL 14 there were very rare cases where
- * heap_page_prune_and_freeze() was allowed to disagree with our
- * HeapTupleSatisfiesVacuum() call about whether or not a tuple should be
- * considered DEAD. This happened when an inserting transaction concurrently
- * aborted (after our heap_page_prune_and_freeze() call, before our
- * HeapTupleSatisfiesVacuum() call). There was rather a lot of complexity just
- * so we could deal with tuples that were DEAD to VACUUM, but nevertheless were
- * left with storage after pruning.
- *
- * As of Postgres 17, we circumvent this problem altogether by reusing the
- * result of heap_page_prune_and_freeze()'s visibility check. Without the
- * second call to HeapTupleSatisfiesVacuum(), there is no new HTSV_Result and
- * there can be no disagreement. We'll just handle such tuples as if they had
- * become fully dead right after this operation completes instead of in the
- * middle of it.
- *
* vmbuffer is the buffer containing the VM block with visibility information
* for the heap block, blkno. all_visible_according_to_vm is the saved
* visibility status of the heap block looked up earlier by the caller. We
@@ -1415,10 +1399,8 @@ lazy_scan_prune(LVRelState *vacrel,
OffsetNumber offnum,
maxoff;
ItemId itemid;
+ int lpdead_items = 0;
PruneFreezeResult presult;
- int lpdead_items,
- live_tuples,
- recently_dead_tuples;
HeapPageFreeze pagefrz;
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
@@ -1438,9 +1420,6 @@ lazy_scan_prune(LVRelState *vacrel,
pagefrz.NoFreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
pagefrz.NoFreezePageRelminMxid = vacrel->NewRelminMxid;
pagefrz.cutoffs = &vacrel->cutoffs;
- lpdead_items = 0;
- live_tuples = 0;
- recently_dead_tuples = 0;
/*
* Prune all HOT-update chains and potentially freeze tuples on this page.
@@ -1472,9 +1451,6 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->offnum = offnum;
itemid = PageGetItemId(page, offnum);
- /* Redirect items mustn't be touched */
- if (ItemIdIsRedirected(itemid) || !ItemIdIsUsed(itemid))
- continue;
if (ItemIdIsDead(itemid))
{
@@ -1482,69 +1458,6 @@ lazy_scan_prune(LVRelState *vacrel,
continue;
}
- Assert(ItemIdIsNormal(itemid));
-
- /*
- * The criteria for counting a tuple as live in this block need to
- * match what analyze.c's acquire_sample_rows() does, otherwise VACUUM
- * and ANALYZE may produce wildly different reltuples values, e.g.
- * when there are many recently-dead tuples.
- *
- * The logic here is a bit simpler than acquire_sample_rows(), as
- * VACUUM can't run inside a transaction block, which makes some cases
- * impossible (e.g. in-progress insert from the same transaction).
- *
- * We treat LP_DEAD items (which are the closest thing to DEAD tuples
- * that might be seen here) differently, too: we assume that they'll
- * become LP_UNUSED before VACUUM finishes. This difference is only
- * superficial. VACUUM effectively agrees with ANALYZE about DEAD
- * items, in the end. VACUUM won't remember LP_DEAD items, but only
- * because they're not supposed to be left behind when it is done.
- * (Cases where we bypass index vacuuming will violate this optimistic
- * assumption, but the overall impact of that should be negligible.)
- */
- switch (htsv_get_valid_status(presult.htsv[offnum]))
- {
- case HEAPTUPLE_LIVE:
-
- /*
- * Count it as live. Not only is this natural, but it's also
- * what acquire_sample_rows() does.
- */
- live_tuples++;
- break;
- case HEAPTUPLE_RECENTLY_DEAD:
-
- /*
- * If tuple is recently dead then we must not remove it from
- * the relation. (We only remove items that are LP_DEAD from
- * pruning.)
- */
- recently_dead_tuples++;
- break;
- case HEAPTUPLE_INSERT_IN_PROGRESS:
-
- /*
- * We do not count these rows as live, because we expect the
- * inserting transaction to update the counters at commit, and
- * we assume that will happen only after we report our
- * results. This assumption is a bit shaky, but it is what
- * acquire_sample_rows() does, so be consistent.
- */
- break;
- case HEAPTUPLE_DELETE_IN_PROGRESS:
-
- /*
- * This an expected case during concurrent vacuum. Count such
- * rows as live. As above, we assume the deleting transaction
- * will commit and update the counters after we report.
- */
- live_tuples++;
- break;
- default:
- elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
- break;
- }
}
vacrel->offnum = InvalidOffsetNumber;
@@ -1619,8 +1532,8 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->tuples_deleted += presult.ndeleted;
vacrel->tuples_frozen += presult.nfrozen;
vacrel->lpdead_items += lpdead_items;
- vacrel->live_tuples += live_tuples;
- vacrel->recently_dead_tuples += recently_dead_tuples;
+ vacrel->live_tuples += presult.live_tuples;
+ vacrel->recently_dead_tuples += presult.recently_dead_tuples;
/* Can't truncate this page */
if (presult.hastup)
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 567faa34664..06d75f2ad04 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -203,8 +203,14 @@ typedef struct PruneFreezeResult
/*
* The rest of the fields in PruneFreezeResult are only guaranteed to be
- * initialized if heap_page_prune is passed PruneReason VACUUM_SCAN.
+ * initialized if heap_page_prune_and_freeze() is passed a PruneReason
+ * other than PRUNE_ON_ACCESS.
*/
+ int live_tuples;
+ int recently_dead_tuples;
+
+ /* Number of tuples we froze */
+ int nfrozen;
/*
* Whether or not the page is truly all-visible after pruning. If there
@@ -226,21 +232,6 @@ typedef struct PruneFreezeResult
*/
TransactionId vm_conflict_horizon;
- /*
- * Tuple visibility is only computed once for each tuple, for correctness
- * and efficiency reasons; see comment in heap_page_prune_and_freeze() for
- * details. This is of type int8[], instead of HTSV_Result[], so we can
- * use -1 to indicate no visibility has been computed, e.g. for LP_DEAD
- * items.
- *
- * This needs to be MaxHeapTuplesPerPage + 1 long as FirstOffsetNumber is
- * 1. Otherwise every access would need to subtract 1.
- */
- int8 htsv[MaxHeapTuplesPerPage + 1];
-
- /* Number of tuples we may freeze */
- int nfrozen;
-
/*
* One entry for every tuple that we may freeze.
*/
@@ -260,19 +251,6 @@ typedef enum
PRUNE_VACUUM_CLEANUP, /* VACUUM 2nd heap pass */
} PruneReason;
-/*
- * Pruning calculates tuple visibility once and saves the results in an array
- * of int8. See PruneFreezeResult.htsv for details. This helper function is meant to
- * guard against examining visibility status array members which have not yet
- * been computed.
- */
-static inline HTSV_Result
-htsv_get_valid_status(int status)
-{
- Assert(status >= HEAPTUPLE_DEAD &&
- status <= HEAPTUPLE_DELETE_IN_PROGRESS);
- return (HTSV_Result) status;
-}
/* ----------------
* function prototypes for heap access method
--
2.40.1
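
To illustrate the htsv[] convention that 0014 moves into PruneState:
visibility results are stored in an int8 array indexed by offset number, with
-1 meaning "not computed" (e.g. for non-normal line pointers), and an
assert-guarded accessor keeps us from reading slots that were never filled in.
A rough standalone sketch; the enum mirrors HTSV_Result but is only a stand-in
here:

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

enum
{
    HTSV_DEAD = 0,
    HTSV_LIVE,
    HTSV_RECENTLY_DEAD,
    HTSV_INSERT_IN_PROGRESS,
    HTSV_DELETE_IN_PROGRESS
};

#define MAX_ITEMS 8

/* +1 because offset numbers start at 1; avoids subtracting 1 on every access */
static int8_t htsv[MAX_ITEMS + 1];

static int
htsv_get_valid_status(int status)
{
    /* Guard against reading a slot whose visibility was never computed (-1) */
    assert(status >= HTSV_DEAD && status <= HTSV_DELETE_IN_PROGRESS);
    return status;
}

int
main(void)
{
    for (int off = 1; off <= MAX_ITEMS; off++)
        htsv[off] = -1;         /* nothing computed yet */

    htsv[3] = HTSV_LIVE;        /* pretend offset 3 held a live tuple */

    printf("offset 3 status: %d\n", htsv_get_valid_status(htsv[3]));
    return 0;
}
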
Attachment: v7-0015-Save-dead-tuple-offsets-during-heap_page_prune.patch (text/x-diff)
From 3d68b53359f06d627d8751b08ad913caa571b61e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 26 Mar 2024 10:16:11 -0400
Subject: [PATCH v7 15/16] Save dead tuple offsets during heap_page_prune
After heap_page_prune() returned, lazy_scan_prune() looped through all
of the offsets of LP_DEAD items which it later added to
LVRelState->dead_items. Instead, take care of this when marking a line
pointer LP_DEAD or when an existing non-removable LP_DEAD item is
encountered in heap_prune_chain().
---
src/backend/access/heap/pruneheap.c | 7 ++++
src/backend/access/heap/vacuumlazy.c | 60 +++++++---------------------
src/include/access/heapam.h | 2 +
3 files changed, 23 insertions(+), 46 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index ee557c9ed35..6d5f8ba4417 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -297,6 +297,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
presult->live_tuples = 0;
presult->recently_dead_tuples = 0;
+ presult->lpdead_items = 0;
/*
* Caller will update the VM after pruning, collecting LP_DEAD items, and
@@ -975,7 +976,10 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
if (unlikely(prstate->mark_unused_now))
heap_prune_record_unused(prstate, offnum);
else
+ {
presult->all_visible = false;
+ presult->deadoffsets[presult->lpdead_items++] = offnum;
+ }
break;
}
@@ -1179,6 +1183,9 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
* all_visible.
*/
presult->all_visible = false;
+
+ /* Record the dead offset for vacuum */
+ presult->deadoffsets[presult->lpdead_items++] = offnum;
}
/*
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index c28e786a1e0..0fb5a7dd24d 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1396,23 +1396,11 @@ lazy_scan_prune(LVRelState *vacrel,
bool *has_lpdead_items)
{
Relation rel = vacrel->rel;
- OffsetNumber offnum,
- maxoff;
- ItemId itemid;
- int lpdead_items = 0;
PruneFreezeResult presult;
HeapPageFreeze pagefrz;
- OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
Assert(BufferGetBlockNumber(buf) == blkno);
- /*
- * maxoff might be reduced following line pointer array truncation in
- * heap_page_prune_and_freeze(). That's safe for us to ignore, since the
- * reclaimed space will continue to look like LP_UNUSED items below.
- */
- maxoff = PageGetMaxOffsetNumber(page);
-
/* Initialize pagefrz */
pagefrz.freeze_required = false;
pagefrz.FreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
@@ -1425,41 +1413,21 @@ lazy_scan_prune(LVRelState *vacrel,
* Prune all HOT-update chains and potentially freeze tuples on this page.
*
* We count the number of tuples removed from the page by the pruning step
- * in presult.ndeleted. It should not be confused with lpdead_items;
- * lpdead_items's final value can be thought of as the number of tuples
- * that were deleted from indexes.
+ * in presult.ndeleted. It should not be confused with
+ * presult.lpdead_items; presult.lpdead_items's final value can be thought
+ * of as the number of tuples that were deleted from indexes.
*
* If the relation has no indexes, we can immediately mark would-be dead
* items LP_UNUSED, so mark_unused_now should be true if no indexes and
* false otherwise.
+ *
+ * We will update the VM after collecting LP_DEAD items and freezing
+ * tuples. Pruning will have determined whether or not the page is
+ * all-visible.
*/
heap_page_prune_and_freeze(rel, buf, vacrel->vistest, vacrel->nindexes == 0,
&pagefrz, &presult, PRUNE_VACUUM_SCAN, &vacrel->offnum);
- /*
- * Now scan the page to collect LP_DEAD items and update the variables set
- * just above.
- */
- for (offnum = FirstOffsetNumber;
- offnum <= maxoff;
- offnum = OffsetNumberNext(offnum))
- {
- /*
- * Set the offset number so that we can display it along with any
- * error that occurred while processing this tuple.
- */
- vacrel->offnum = offnum;
- itemid = PageGetItemId(page, offnum);
-
-
- if (ItemIdIsDead(itemid))
- {
- deadoffsets[lpdead_items++] = offnum;
- continue;
- }
-
- }
-
vacrel->offnum = InvalidOffsetNumber;
Assert(MultiXactIdIsValid(presult.new_relminmxid));
@@ -1492,7 +1460,7 @@ lazy_scan_prune(LVRelState *vacrel,
TransactionId debug_cutoff;
bool debug_all_frozen;
- Assert(lpdead_items == 0);
+ Assert(presult.lpdead_items == 0);
if (!heap_page_is_all_visible(vacrel, buf,
&debug_cutoff, &debug_all_frozen))
@@ -1508,7 +1476,7 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* Now save details of the LP_DEAD items from the page in vacrel
*/
- if (lpdead_items > 0)
+ if (presult.lpdead_items > 0)
{
VacDeadItems *dead_items = vacrel->dead_items;
ItemPointerData tmp;
@@ -1517,9 +1485,9 @@ lazy_scan_prune(LVRelState *vacrel,
ItemPointerSetBlockNumber(&tmp, blkno);
- for (int i = 0; i < lpdead_items; i++)
+ for (int i = 0; i < presult.lpdead_items; i++)
{
- ItemPointerSetOffsetNumber(&tmp, deadoffsets[i]);
+ ItemPointerSetOffsetNumber(&tmp, presult.deadoffsets[i]);
dead_items->items[dead_items->num_items++] = tmp;
}
@@ -1531,7 +1499,7 @@ lazy_scan_prune(LVRelState *vacrel,
/* Finally, add page-local counts to whole-VACUUM counts */
vacrel->tuples_deleted += presult.ndeleted;
vacrel->tuples_frozen += presult.nfrozen;
- vacrel->lpdead_items += lpdead_items;
+ vacrel->lpdead_items += presult.lpdead_items;
vacrel->live_tuples += presult.live_tuples;
vacrel->recently_dead_tuples += presult.recently_dead_tuples;
@@ -1540,7 +1508,7 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->nonempty_pages = blkno + 1;
/* Did we find LP_DEAD items? */
- *has_lpdead_items = (lpdead_items > 0);
+ *has_lpdead_items = (presult.lpdead_items > 0);
Assert(!presult.all_visible || !(*has_lpdead_items));
@@ -1608,7 +1576,7 @@ lazy_scan_prune(LVRelState *vacrel,
* There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
* however.
*/
- else if (lpdead_items > 0 && PageIsAllVisible(page))
+ else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
{
elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
vacrel->relname, blkno);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 06d75f2ad04..2740eaac13e 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -241,6 +241,8 @@ typedef struct PruneFreezeResult
/* New value of relminmxid found by heap_page_prune_and_freeze() */
MultiXactId new_relminmxid;
+ int lpdead_items; /* includes existing LP_DEAD items */
+ OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
} PruneFreezeResult;
/* 'reason' codes for heap_page_prune_and_freeze() */
--
2.40.1
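
To illustrate the handoff 0015 sets up: pruning appends the offset number of
every LP_DEAD item it leaves behind (newly marked or pre-existing) to
presult.deadoffsets, and the caller only has to turn those offsets into TIDs
for the dead_items array rather than rescanning the page. A rough standalone
sketch with simplified stand-in types:

#include <stdint.h>
#include <stdio.h>

#define MAX_ITEMS 16

typedef struct
{
    int      lpdead_items;
    uint16_t deadoffsets[MAX_ITEMS];    /* stand-in for OffsetNumber[] */
} PruneResultModel;

static void
record_dead(PruneResultModel *presult, uint16_t offnum)
{
    presult->deadoffsets[presult->lpdead_items++] = offnum;
}

int
main(void)
{
    PruneResultModel presult = {0};
    unsigned int blkno = 42;

    record_dead(&presult, 2);   /* e.g. a tuple newly marked LP_DEAD */
    record_dead(&presult, 5);   /* e.g. a pre-existing LP_DEAD item */

    /* Caller side: build (block, offset) pairs, as lazy_scan_prune() does
     * when filling vacrel->dead_items */
    for (int i = 0; i < presult.lpdead_items; i++)
        printf("dead item: (%u,%u)\n", blkno,
               (unsigned int) presult.deadoffsets[i]);

    return 0;
}
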
Attachment: v7-0016-move-live-tuple-accounting-to-heap_prune_chain.patch (text/x-diff)
From a5cb8877b001e9ad5e46ba565778f41bfa47ffec Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 26 Mar 2024 13:54:19 -0400
Subject: [PATCH v7 16/16] move live tuple accounting to heap_prune_chain()
ci-os-only:
---
src/backend/access/heap/pruneheap.c | 636 ++++++++++++++++-----------
src/backend/access/heap/vacuumlazy.c | 38 +-
src/include/access/heapam.h | 59 ++-
3 files changed, 424 insertions(+), 309 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 6d5f8ba4417..744f3b5fabd 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -34,8 +34,9 @@ typedef struct
{
/* tuple visibility test, initialized for the relation */
GlobalVisState *vistest;
- /* whether or not dead items can be set LP_UNUSED during pruning */
- bool mark_unused_now;
+ uint8 actions;
+ TransactionId visibility_cutoff_xid;
+ bool all_visible_except_removable;
TransactionId new_prune_xid; /* new prune hint value for page */
TransactionId latest_xid_removed;
@@ -67,10 +68,14 @@ typedef struct
*/
int8 htsv[MaxHeapTuplesPerPage + 1];
+ HeapPageFreeze pagefrz;
+
/*
- * One entry for every tuple that we may freeze.
+ * Whether or not this tuple has been counted toward vacuum stats. In
+ * heap_prune_chain(), we have to be sure that Heap Only Tuples that are
+ * not part of any chain are counted correctly.
*/
- HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
+ bool counted[MaxHeapTuplesPerPage + 1];
} PruneState;
/* Local functions */
@@ -83,7 +88,7 @@ static int heap_prune_chain(Buffer buffer,
static inline HTSV_Result htsv_get_valid_status(int status);
static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
-static void heap_prune_record_redirect(PruneState *prstate,
+static void heap_prune_record_redirect(Page page, PruneState *prstate,
OffsetNumber offnum, OffsetNumber rdoffnum,
PruneFreezeResult *presult);
static void heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
@@ -91,6 +96,9 @@ static void heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
static void heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
PruneFreezeResult *presult);
static void heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum);
+
+static void heap_prune_record_live_or_recently_dead(Page page, PruneState *prstate,
+ OffsetNumber offnum, PruneFreezeResult *presult);
static void page_verify_redirects(Page page);
@@ -172,12 +180,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
PruneFreezeResult presult;
/*
- * For now, pass mark_unused_now as false regardless of whether or
- * not the relation has indexes, since we cannot safely determine
- * that during on-access pruning with the current implementation.
+ * For now, do not set PRUNE_DO_MARK_UNUSED_NOW regardless of
+ * whether or not the relation has indexes, since we cannot safely
+ * determine that during on-access pruning with the current
+ * implementation.
*/
- heap_page_prune_and_freeze(relation, buffer, vistest, false, NULL,
- &presult, PRUNE_ON_ACCESS, NULL);
+ heap_page_prune_and_freeze(relation, buffer, 0, vistest,
+ NULL, &presult, PRUNE_ON_ACCESS, NULL, NULL, NULL);
/*
* Report the number of tuples reclaimed to pgstats. This is
@@ -209,7 +218,6 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
}
}
-
/*
* Prune and repair fragmentation and potentially freeze tuples on the
* specified page.
@@ -223,16 +231,12 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* also need to account for a reduction in the length of the line pointer
* array following array truncation by us.
*
+ * actions are the pruning actions that heap_page_prune_and_freeze() should
+ * take.
+ *
* vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
* (see heap_prune_satisfies_vacuum).
*
- * mark_unused_now indicates whether or not dead items can be set LP_UNUSED
- * during pruning.
- *
- * pagefrz is an input parameter containing visibility cutoff information and
- * the current relfrozenxid and relminmxids used if the caller is interested in
- * freezing tuples on the page.
- *
* presult contains output parameters needed by callers such as the number of
* tuples removed and the number of line pointers newly marked LP_DEAD.
* heap_page_prune_and_freeze() is responsible for initializing it.
@@ -242,15 +246,21 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
*
* off_loc is the offset location required by the caller to use in error
* callback.
+ *
+ * new_relfrozen_xid and new_relmin_mxid are provided by the caller if they
+ * would like the current values of those updated as part of advancing
+ * relfrozenxid/relminmxid.
*/
void
heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ uint8 actions,
GlobalVisState *vistest,
- bool mark_unused_now,
- HeapPageFreeze *pagefrz,
+ struct VacuumCutoffs *cutoffs,
PruneFreezeResult *presult,
PruneReason reason,
- OffsetNumber *off_loc)
+ OffsetNumber *off_loc,
+ TransactionId *new_relfrozen_xid,
+ MultiXactId *new_relmin_mxid)
{
Page page = BufferGetPage(buffer);
BlockNumber blockno = BufferGetBlockNumber(buffer);
@@ -258,15 +268,43 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
maxoff;
PruneState prstate;
HeapTupleData tup;
- TransactionId visibility_cutoff_xid;
TransactionId frz_conflict_horizon;
bool do_freeze;
- bool all_visible_except_removable;
bool do_prune;
bool do_hint;
bool hint_bit_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
+ /*
+ * pagefrz contains visibility cutoff information and the current
+ * relfrozenxid and relminmxids used if the caller is interested in
+ * freezing tuples on the page.
+ */
+ prstate.pagefrz.cutoffs = cutoffs;
+ prstate.pagefrz.freeze_required = false;
+
+ if (new_relmin_mxid)
+ {
+ prstate.pagefrz.FreezePageRelminMxid = *new_relmin_mxid;
+ prstate.pagefrz.NoFreezePageRelminMxid = *new_relmin_mxid;
+ }
+ else
+ {
+ prstate.pagefrz.FreezePageRelminMxid = InvalidMultiXactId;
+ prstate.pagefrz.NoFreezePageRelminMxid = InvalidMultiXactId;
+ }
+
+ if (new_relfrozen_xid)
+ {
+ prstate.pagefrz.FreezePageRelfrozenXid = *new_relfrozen_xid;
+ prstate.pagefrz.NoFreezePageRelfrozenXid = *new_relfrozen_xid;
+ }
+ else
+ {
+ prstate.pagefrz.FreezePageRelfrozenXid = InvalidTransactionId;
+ prstate.pagefrz.NoFreezePageRelfrozenXid = InvalidTransactionId;
+ }
+
/*
* Our strategy is to scan the page and make lists of items to change,
* then apply the changes within a critical section. This keeps as much
@@ -280,10 +318,11 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*/
prstate.new_prune_xid = InvalidTransactionId;
prstate.vistest = vistest;
- prstate.mark_unused_now = mark_unused_now;
+ prstate.actions = actions;
prstate.latest_xid_removed = InvalidTransactionId;
prstate.nredirected = prstate.ndead = prstate.nunused = 0;
memset(prstate.marked, 0, sizeof(prstate.marked));
+ memset(prstate.counted, 0, sizeof(prstate.counted));
/*
* prstate.htsv is not initialized here because all ntuple spots in the
@@ -291,7 +330,6 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*/
presult->ndeleted = 0;
presult->nnewlpdead = 0;
- presult->nfrozen = 0;
presult->hastup = false;
@@ -300,13 +338,45 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
presult->lpdead_items = 0;
/*
- * Caller will update the VM after pruning, collecting LP_DEAD items, and
+ * Caller may update the VM after pruning, collecting LP_DEAD items, and
* freezing tuples. Keep track of whether or not the page is all_visible
* and all_frozen and use this information to update the VM. all_visible
* implies lpdead_items == 0, but don't trust all_frozen result unless
- * all_visible is also set to true.
+ * all_visible is also set to true. If we won't even try freezing,
+ * initialize all_frozen to false.
+ *
+ * For vacuum, if the whole page will become frozen, we consider
+ * opportunistically freezing tuples. Dead tuples which will be removed by
+ * the end of vacuuming should not preclude us from opportunistically
+ * freezing. We will not be able to freeze the whole page if there are
+ * tuples present which are not visible to everyone or if there are dead
+ * tuples which are not yet removable. We need all_visible to be false if
+ * LP_DEAD tuples remain after pruning so that we do not incorrectly
+ * update the visibility map or page hint bit. So, we will update
+ * presult->all_visible to reflect the presence of LP_DEAD items while
+ * pruning and keep all_visible_except_removable to permit freezing if the
+ * whole page will eventually become all visible after removing tuples.
*/
- presult->all_frozen = true;
+ presult->all_visible = true;
+
+ if (prstate.actions & PRUNE_DO_TRY_FREEZE)
+ presult->set_all_frozen = true;
+ else
+ presult->set_all_frozen = false;
+ presult->nfrozen = 0;
+
+ /*
+ * Deliberately delay unsetting all_visible until later during pruning.
+ * Removable dead tuples shouldn't preclude freezing the page. After
+ * finishing this first pass of tuple visibility checks, initialize
+ * all_visible_except_removable with the current value of all_visible to
+ * indicate whether or not the page is all visible except for dead tuples.
+ * This will allow us to attempt to freeze the page after pruning. Later
+ * during pruning, if we encounter an LP_DEAD item or are setting an item
+ * LP_DEAD, we will unset all_visible. As long as we unset it before
+ * updating the visibility map, this will be correct.
+ */
+ prstate.all_visible_except_removable = true;
/*
* The visibility cutoff xid is the newest xmin of live tuples on the
@@ -316,13 +386,9 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* running transaction on the standby does not see tuples on the page as
* all-visible, so the conflict horizon remains InvalidTransactionId.
*/
- presult->vm_conflict_horizon = visibility_cutoff_xid = InvalidTransactionId;
+ presult->vm_conflict_horizon = prstate.visibility_cutoff_xid = InvalidTransactionId;
frz_conflict_horizon = InvalidTransactionId;
- /* For advancing relfrozenxid and relminmxid */
- presult->new_relfrozenxid = InvalidTransactionId;
- presult->new_relminmxid = InvalidMultiXactId;
-
maxoff = PageGetMaxOffsetNumber(page);
tup.t_tableOid = RelationGetRelid(relation);
@@ -346,7 +412,6 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* prefetching efficiency significantly / decreases the number of cache
* misses.
*/
- all_visible_except_removable = true;
for (offnum = maxoff;
offnum >= FirstOffsetNumber;
offnum = OffsetNumberPrev(offnum))
@@ -375,168 +440,6 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.htsv[offnum] = heap_prune_satisfies_vacuum(&prstate, &tup,
buffer);
-
- if (reason == PRUNE_ON_ACCESS)
- continue;
-
- /*
- * The criteria for counting a tuple as live in this block need to
- * match what analyze.c's acquire_sample_rows() does, otherwise VACUUM
- * and ANALYZE may produce wildly different reltuples values, e.g.
- * when there are many recently-dead tuples.
- *
- * The logic here is a bit simpler than acquire_sample_rows(), as
- * VACUUM can't run inside a transaction block, which makes some cases
- * impossible (e.g. in-progress insert from the same transaction).
- *
- * We treat LP_DEAD items (which are the closest thing to DEAD tuples
- * that might be seen here) differently, too: we assume that they'll
- * become LP_UNUSED before VACUUM finishes. This difference is only
- * superficial. VACUUM effectively agrees with ANALYZE about DEAD
- * items, in the end. VACUUM won't remember LP_DEAD items, but only
- * because they're not supposed to be left behind when it is done.
- * (Cases where we bypass index vacuuming will violate this optimistic
- * assumption, but the overall impact of that should be negligible.)
- */
- switch (prstate.htsv[offnum])
- {
- case HEAPTUPLE_DEAD:
-
- /*
- * Deliberately delay unsetting all_visible until later during
- * pruning. Removable dead tuples shouldn't preclude freezing
- * the page. After finishing this first pass of tuple
- * visibility checks, initialize all_visible_except_removable
- * with the current value of all_visible to indicate whether
- * or not the page is all visible except for dead tuples. This
- * will allow us to attempt to freeze the page after pruning.
- * Later during pruning, if we encounter an LP_DEAD item or
- * are setting an item LP_DEAD, we will unset all_visible. As
- * long as we unset it before updating the visibility map,
- * this will be correct.
- */
- break;
- case HEAPTUPLE_LIVE:
-
- /*
- * Count it as live. Not only is this natural, but it's also
- * what acquire_sample_rows() does.
- */
- presult->live_tuples++;
-
- /*
- * Is the tuple definitely visible to all transactions?
- *
- * NB: Like with per-tuple hint bits, we can't set the
- * PD_ALL_VISIBLE flag if the inserter committed
- * asynchronously. See SetHintBits for more info. Check that
- * the tuple is hinted xmin-committed because of that.
- */
- if (all_visible_except_removable)
- {
- TransactionId xmin;
-
- if (!HeapTupleHeaderXminCommitted(htup))
- {
- all_visible_except_removable = false;
- break;
- }
-
- /*
- * The inserter definitely committed. But is it old enough
- * that everyone sees it as committed? A
- * FrozenTransactionId is seen as committed to everyone.
- * Otherwise, we check if there is a snapshot that
- * considers this xid to still be running, and if so, we
- * don't consider the page all-visible.
- */
- xmin = HeapTupleHeaderGetXmin(htup);
- if (xmin != FrozenTransactionId &&
- !GlobalVisTestIsRemovableXid(vistest, xmin))
- {
- all_visible_except_removable = false;
- break;
- }
-
- /* Track newest xmin on page. */
- if (TransactionIdFollows(xmin, visibility_cutoff_xid) &&
- TransactionIdIsNormal(xmin))
- visibility_cutoff_xid = xmin;
- }
- break;
- case HEAPTUPLE_RECENTLY_DEAD:
-
- /*
- * If tuple is recently dead then we must not remove it from
- * the relation. (We only remove items that are LP_DEAD from
- * pruning.)
- */
- presult->recently_dead_tuples++;
- all_visible_except_removable = false;
- break;
- case HEAPTUPLE_INSERT_IN_PROGRESS:
-
- /*
- * We do not count these rows as live, because we expect the
- * inserting transaction to update the counters at commit, and
- * we assume that will happen only after we report our
- * results. This assumption is a bit shaky, but it is what
- * acquire_sample_rows() does, so be consistent.
- */
- all_visible_except_removable = false;
- break;
- case HEAPTUPLE_DELETE_IN_PROGRESS:
-
- /*
- * This an expected case during concurrent vacuum. Count such
- * rows as live. As above, we assume the deleting transaction
- * will commit and update the counters after we report.
- */
- presult->live_tuples++;
- all_visible_except_removable = false;
- break;
- default:
- elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
- break;
- }
-
- if (prstate.htsv[offnum] != HEAPTUPLE_DEAD)
- {
- /*
- * Deliberately don't set hastup for LP_DEAD items. We make the
- * soft assumption that any LP_DEAD items encountered here will
- * become LP_UNUSED later on, before count_nondeletable_pages is
- * reached. If we don't make this assumption then rel truncation
- * will only happen every other VACUUM, at most. Besides, VACUUM
- * must treat hastup/nonempty_pages as provisional no matter how
- * LP_DEAD items are handled (handled here, or handled later on).
- */
- presult->hastup = true;
-
- /* Consider freezing any normal tuples which will not be removed */
- if (pagefrz)
- {
- bool totally_frozen;
-
- /* Tuple with storage -- consider need to freeze */
- if ((heap_prepare_freeze_tuple(htup, pagefrz,
- &prstate.frozen[presult->nfrozen],
- &totally_frozen)))
- {
- /* Save prepared freeze plan for later */
- prstate.frozen[presult->nfrozen++].offset = offnum;
- }
-
- /*
- * If any tuple isn't either totally frozen already or
- * eligible to become totally frozen (according to its freeze
- * plan), then the page definitely cannot be set all-frozen in
- * the visibility map later on
- */
- if (!totally_frozen)
- presult->all_frozen = false;
- }
- }
}
/*
@@ -545,21 +448,6 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*/
hint_bit_fpi = fpi_before != pgWalUsage.wal_fpi;
- /*
- * For vacuum, if the whole page will become frozen, we consider
- * opportunistically freezing tuples. Dead tuples which will be removed by
- * the end of vacuuming should not preclude us from opportunistically
- * freezing. We will not be able to freeze the whole page if there are
- * tuples present which are not visible to everyone or if there are dead
- * tuples which are not yet removable. We need all_visible to be false if
- * LP_DEAD tuples remain after pruning so that we do not incorrectly
- * update the visibility map or page hint bit. So, we will update
- * presult->all_visible to reflect the presence of LP_DEAD items while
- * pruning and keep all_visible_except_removable to permit freezing if the
- * whole page will eventually become all visible after removing tuples.
- */
- presult->all_visible = all_visible_except_removable;
-
/* Scan the page */
for (offnum = FirstOffsetNumber;
offnum <= maxoff;
@@ -615,15 +503,14 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* opportunistic freeze heuristic must be improved; however, for now, try
* to approximate it.
*/
-
do_freeze = false;
- if (pagefrz)
+ if (prstate.actions & PRUNE_DO_TRY_FREEZE)
{
/* Is the whole page freezable? And is there something to freeze? */
- bool whole_page_freezable = all_visible_except_removable &&
- presult->all_frozen;
+ bool whole_page_freezable = prstate.all_visible_except_removable &&
+ presult->set_all_frozen;
- if (pagefrz->freeze_required)
+ if (prstate.pagefrz.freeze_required)
do_freeze = true;
else if (whole_page_freezable && presult->nfrozen > 0)
{
@@ -648,17 +535,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* want to avoid doing the pre-freeze checks in a critical section.
*/
if (do_freeze)
- heap_pre_freeze_checks(buffer, prstate.frozen, presult->nfrozen);
-
- if (!do_freeze && (!pagefrz || !presult->all_frozen || presult->nfrozen > 0))
+ heap_pre_freeze_checks(buffer, prstate.pagefrz.frozen, presult->nfrozen);
+ else if (!presult->set_all_frozen || presult->nfrozen > 0)
{
/*
* If we will neither freeze tuples on the page nor set the page all
* frozen in the visibility map, the page is not all-frozen and there
* will be no newly frozen tuples.
*/
- presult->all_frozen = false;
- presult->nfrozen = 0; /* avoid miscounts in instrumentation */
+ presult->set_all_frozen = false;
+ presult->nfrozen = 0; /* avoid miscounts in instrumentation */
}
/* Any error while applying the changes is critical */
@@ -708,15 +594,15 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* conservative cutoff by stepping back from OldestXmin. This
* avoids false conflicts when hot_standby_feedback is in use.
*/
- if (all_visible_except_removable && presult->all_frozen)
- frz_conflict_horizon = visibility_cutoff_xid;
+ if (prstate.all_visible_except_removable && presult->set_all_frozen)
+ frz_conflict_horizon = prstate.visibility_cutoff_xid;
else
{
/* Avoids false conflicts when hot_standby_feedback in use */
- frz_conflict_horizon = pagefrz->cutoffs->OldestXmin;
+ frz_conflict_horizon = prstate.pagefrz.cutoffs->OldestXmin;
TransactionIdRetreat(frz_conflict_horizon);
}
- heap_freeze_prepared_tuples(buffer, prstate.frozen, presult->nfrozen);
+ heap_freeze_prepared_tuples(buffer, prstate.pagefrz.frozen, presult->nfrozen);
}
MarkBufferDirty(buffer);
@@ -746,7 +632,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
log_heap_prune_and_freeze(relation, buffer,
conflict_xid,
true, reason,
- prstate.frozen, presult->nfrozen,
+ prstate.pagefrz.frozen, presult->nfrozen,
prstate.redirected, prstate.nredirected,
prstate.nowdead, prstate.ndead,
prstate.nowunused, prstate.nunused);
@@ -761,29 +647,31 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* page is completely frozen, there can be no conflict and the
* vm_conflict_horizon should remain InvalidTransactionId.
*/
- if (!presult->all_frozen)
- presult->vm_conflict_horizon = visibility_cutoff_xid;
+ if (!presult->set_all_frozen)
+ presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
+
+ /*
+ * If we will freeze tuples on the page or, even if we don't freeze tuples
+ * on the page, if we will set the page all-frozen in the visibility map,
+ * we can advance relfrozenxid and relminmxid to the values in
+ * pagefrz->FreezePageRelfrozenXid and pagefrz->FreezePageRelminMxid.
+ * MFIXME: which one should we pick if presult->nfrozen == 0 and
+ * presult->all_frozen = True.
+ */
+ if (new_relfrozen_xid)
+ {
+ if (presult->nfrozen > 0)
+ *new_relfrozen_xid = prstate.pagefrz.FreezePageRelfrozenXid;
+ else
+ *new_relfrozen_xid = prstate.pagefrz.NoFreezePageRelfrozenXid;
+ }
- if (pagefrz)
+ if (new_relmin_mxid)
{
- /*
- * If we will freeze tuples on the page or, even if we don't freeze
- * tuples on the page, if we will set the page all-frozen in the
- * visibility map, we can advance relfrozenxid and relminmxid to the
- * values in pagefrz->FreezePageRelfrozenXid and
- * pagefrz->FreezePageRelminMxid. MFIXME: which one should we pick if
- * presult->nfrozen == 0 and presult->all_frozen = True.
- */
if (presult->nfrozen > 0)
- {
- presult->new_relfrozenxid = pagefrz->FreezePageRelfrozenXid;
- presult->new_relminmxid = pagefrz->FreezePageRelminMxid;
- }
+ *new_relmin_mxid = prstate.pagefrz.FreezePageRelminMxid;
else
- {
- presult->new_relfrozenxid = pagefrz->NoFreezePageRelfrozenXid;
- presult->new_relminmxid = pagefrz->NoFreezePageRelminMxid;
- }
+ *new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
}
}
@@ -900,13 +788,32 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
* either here or while following a chain below. Whichever path
* gets there first will mark the tuple unused.
*/
- if (prstate->htsv[rootoffnum] == HEAPTUPLE_DEAD &&
- !HeapTupleHeaderIsHotUpdated(htup))
+ if (!HeapTupleHeaderIsHotUpdated(htup))
{
- heap_prune_record_unused(prstate, rootoffnum);
- HeapTupleHeaderAdvanceConflictHorizon(htup,
- &prstate->latest_xid_removed);
- ndeleted++;
+ if (prstate->htsv[rootoffnum] == HEAPTUPLE_DEAD)
+ {
+ heap_prune_record_unused(prstate, rootoffnum);
+ HeapTupleHeaderAdvanceConflictHorizon(htup,
+ &prstate->latest_xid_removed);
+ ndeleted++;
+ }
+ else
+ {
+ Assert(!prstate->marked[rootoffnum]);
+
+ /*
+ * MFIXME: not sure if this is right -- maybe counting too
+ * many
+ */
+
+ /*
+ * Ensure that this tuple is counted. If it is later
+ * redirected to, it would have been counted then, but we
+ * won't double count because we check if it has already
+ * been counted first.
+ */
+ heap_prune_record_live_or_recently_dead(dp, prstate, rootoffnum, presult);
+ }
}
/* Nothing more to do */
@@ -967,13 +874,13 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
if (ItemIdIsDead(lp))
{
/*
- * If the caller set mark_unused_now true, we can set dead line
- * pointers LP_UNUSED now. We don't increment ndeleted here since
- * the LP was already marked dead. If it will not be marked
+ * If the caller set PRUNE_DO_MARK_UNUSED_NOW, we can set dead
+ * line pointers LP_UNUSED now. We don't increment ndeleted here
+ * since the LP was already marked dead. If it will not be marked
* LP_UNUSED, it will remain LP_DEAD, making the page not
* all_visible.
*/
- if (unlikely(prstate->mark_unused_now))
+ if (unlikely(prstate->actions & PRUNE_DO_MARK_UNUSED_NOW))
heap_prune_record_unused(prstate, offnum);
else
{
@@ -1118,7 +1025,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
if (i >= nchain)
heap_prune_record_dead_or_unused(prstate, rootoffnum, presult);
else
- heap_prune_record_redirect(prstate, rootoffnum, chainitems[i], presult);
+ heap_prune_record_redirect(dp, prstate, rootoffnum, chainitems[i], presult);
}
else if (nchain < 2 && ItemIdIsRedirected(rootlp))
{
@@ -1132,6 +1039,14 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
heap_prune_record_dead_or_unused(prstate, rootoffnum, presult);
}
+ /*
+ * If not marked for pruning, consider if the tuple should be counted as
+ * live or recently dead. Note that line pointers redirected to will
+ * already have been counted.
+ */
+ if (ItemIdIsNormal(rootlp) && !prstate->marked[rootoffnum])
+ heap_prune_record_live_or_recently_dead(dp, prstate, rootoffnum, presult);
+
return ndeleted;
}
@@ -1151,13 +1066,15 @@ heap_prune_record_prunable(PruneState *prstate, TransactionId xid)
/* Record line pointer to be redirected */
static void
-heap_prune_record_redirect(PruneState *prstate,
+heap_prune_record_redirect(Page page, PruneState *prstate,
OffsetNumber offnum, OffsetNumber rdoffnum,
PruneFreezeResult *presult)
{
Assert(prstate->nredirected < MaxHeapTuplesPerPage);
prstate->redirected[prstate->nredirected * 2] = offnum;
prstate->redirected[prstate->nredirected * 2 + 1] = rdoffnum;
+ heap_prune_record_live_or_recently_dead(page, prstate, rdoffnum, presult);
+
prstate->nredirected++;
Assert(!prstate->marked[offnum]);
prstate->marked[offnum] = true;
@@ -1189,22 +1106,22 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
}
/*
- * Depending on whether or not the caller set mark_unused_now to true, record that a
- * line pointer should be marked LP_DEAD or LP_UNUSED. There are other cases in
- * which we will mark line pointers LP_UNUSED, but we will not mark line
- * pointers LP_DEAD if mark_unused_now is true.
+ * Depending on whether or not the caller set PRUNE_DO_MARK_UNUSED_NOW, record
+ * that a line pointer should be marked LP_DEAD or LP_UNUSED. There are other
+ * cases in which we will mark line pointers LP_UNUSED, but we will not mark
+ * line pointers LP_DEAD if PRUNE_DO_MARK_UNUSED_NOW is set.
*/
static void
heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
PruneFreezeResult *presult)
{
/*
- * If the caller set mark_unused_now to true, we can remove dead tuples
+ * If the caller set PRUNE_DO_MARK_UNUSED_NOW, we can remove dead tuples
* during pruning instead of marking their line pointers dead. Set this
* tuple's line pointer LP_UNUSED. We hint that this option is less
* likely.
*/
- if (unlikely(prstate->mark_unused_now))
+ if (unlikely(prstate->actions & PRUNE_DO_MARK_UNUSED_NOW))
heap_prune_record_unused(prstate, offnum);
else
heap_prune_record_dead(prstate, offnum, presult);
@@ -1221,6 +1138,187 @@ heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum)
prstate->marked[offnum] = true;
}
+static void
+heap_prune_record_live_or_recently_dead(Page page, PruneState *prstate, OffsetNumber offnum,
+ PruneFreezeResult *presult)
+{
+ HTSV_Result status;
+ HeapTupleHeader htup;
+ bool totally_frozen;
+
+ /* This could happen for items which are redirected to. */
+ if (prstate->counted[offnum])
+ return;
+
+ prstate->counted[offnum] = true;
+
+ /*
+ * If we don't want to do any of the special defined actions, we don't
+ * need to continue.
+ */
+ if (prstate->actions == 0)
+ return;
+
+ status = htsv_get_valid_status(prstate->htsv[offnum]);
+
+ Assert(status != HEAPTUPLE_DEAD);
+
+ /*
+ * Deliberately don't set hastup for LP_DEAD items. We make the soft
+ * assumption that any LP_DEAD items encountered here will become
+ * LP_UNUSED later on, before count_nondeletable_pages is reached. If we
+ * don't make this assumption then rel truncation will only happen every
+ * other VACUUM, at most. Besides, VACUUM must treat
+ * hastup/nonempty_pages as provisional no matter how LP_DEAD items are
+ * handled (handled here, or handled later on).
+ */
+ presult->hastup = true;
+
+ /*
+ * The criteria for counting a tuple as live in this block need to match
+ * what analyze.c's acquire_sample_rows() does, otherwise VACUUM and
+ * ANALYZE may produce wildly different reltuples values, e.g. when there
+ * are many recently-dead tuples.
+ *
+ * The logic here is a bit simpler than acquire_sample_rows(), as VACUUM
+ * can't run inside a transaction block, which makes some cases impossible
+ * (e.g. in-progress insert from the same transaction).
+ *
+ * We treat LP_DEAD items (which are the closest thing to DEAD tuples that
+ * might be seen here) differently, too: we assume that they'll become
+ * LP_UNUSED before VACUUM finishes. This difference is only superficial.
+ * VACUUM effectively agrees with ANALYZE about DEAD items, in the end.
+ * VACUUM won't remember LP_DEAD items, but only because they're not
+ * supposed to be left behind when it is done. (Cases where we bypass
+ * index vacuuming will violate this optimistic assumption, but the
+ * overall impact of that should be negligible.)
+ *
+ * HEAPTUPLE_LIVE tuples are naturally counted as live. This is also what
+ * acquire_sample_rows() does.
+ *
+ * HEAPTUPLE_DELETE_IN_PROGRESS tuples are expected during concurrent
+ * vacuum. We expect the deleting transaction to update the counters at
+ * commit after we report our results, so count these tuples as live to
+ * ensure the math works out. The assumption that the transaction will
+ * commit and update the counters after we report is a bit shaky; but it
+ * is what acquire_sample_rows() does, so we do the same to be consistent.
+ */
+ htup = (HeapTupleHeader) PageGetItem(page, PageGetItemId(page, offnum));
+
+ switch (status)
+ {
+ case HEAPTUPLE_LIVE:
+
+ /*
+ * Count it as live. Not only is this natural, but it's also what
+ * acquire_sample_rows() does.
+ */
+ presult->live_tuples++;
+
+ /*
+ * Is the tuple definitely visible to all transactions?
+ *
+ * NB: Like with per-tuple hint bits, we can't set the
+ * PD_ALL_VISIBLE flag if the inserter committed asynchronously.
+ * See SetHintBits for more info. Check that the tuple is hinted
+ * xmin-committed because of that.
+ */
+ if (prstate->all_visible_except_removable)
+ {
+ TransactionId xmin;
+
+ if (!HeapTupleHeaderXminCommitted(htup))
+ {
+ prstate->all_visible_except_removable = false;
+ presult->all_visible = false;
+ break;
+ }
+
+ /*
+ * The inserter definitely committed. But is it old enough
+ * that everyone sees it as committed? A FrozenTransactionId
+ * is seen as committed to everyone. Otherwise, we check if
+ * there is a snapshot that considers this xid to still be
+ * running, and if so, we don't consider the page all-visible.
+ */
+ xmin = HeapTupleHeaderGetXmin(htup);
+
+ /* For now always use pagefrz->cutoffs */
+ Assert(prstate->pagefrz.cutoffs);
+ if (!TransactionIdPrecedes(xmin, prstate->pagefrz.cutoffs->OldestXmin))
+ {
+ prstate->all_visible_except_removable = false;
+ presult->all_visible = false;
+ break;
+ }
+
+ /* Track newest xmin on page. */
+ if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
+ TransactionIdIsNormal(xmin))
+ prstate->visibility_cutoff_xid = xmin;
+ }
+ break;
+ case HEAPTUPLE_RECENTLY_DEAD:
+
+ /*
+ * If tuple is recently dead then we must not remove it from the
+ * relation. (We only remove items that are LP_DEAD from
+ * pruning.)
+ */
+ presult->recently_dead_tuples++;
+ prstate->all_visible_except_removable = false;
+ presult->all_visible = false;
+ break;
+ case HEAPTUPLE_INSERT_IN_PROGRESS:
+
+ /*
+ * We do not count these rows as live, because we expect the
+ * inserting transaction to update the counters at commit, and we
+ * assume that will happen only after we report our results. This
+ * assumption is a bit shaky, but it is what acquire_sample_rows()
+ * does, so be consistent.
+ */
+ prstate->all_visible_except_removable = false;
+ presult->all_visible = false;
+ break;
+ case HEAPTUPLE_DELETE_IN_PROGRESS:
+
+ /*
+ * This is an expected case during concurrent vacuum. Count such rows
+ * as live. As above, we assume the deleting transaction will
+ * commit and update the counters after we report.
+ */
+ presult->live_tuples++;
+ prstate->all_visible_except_removable = false;
+ presult->all_visible = false;
+ break;
+ default:
+ elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
+ break;
+ }
+
+ /* Consider freezing any normal tuples which will not be removed */
+ if (prstate->actions & PRUNE_DO_TRY_FREEZE)
+ {
+ /* Tuple with storage -- consider need to freeze */
+ if ((heap_prepare_freeze_tuple(htup, &prstate->pagefrz,
+ &prstate->pagefrz.frozen[presult->nfrozen],
+ &totally_frozen)))
+ {
+ /* Save prepared freeze plan for later */
+ prstate->pagefrz.frozen[presult->nfrozen++].offset = offnum;
+ }
+
+ /*
+ * If any tuple isn't either totally frozen already or eligible to
+ * become totally frozen (according to its freeze plan), then the page
+ * definitely cannot be set all-frozen in the visibility map later on
+ */
+ if (!totally_frozen)
+ presult->set_all_frozen = false;
+ }
+
+}
/*
* Perform the actual page changes needed by heap_page_prune.
@@ -1354,12 +1452,12 @@ heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
else
{
/*
- * When heap_page_prune() was called, mark_unused_now may have
- * been passed as true, which allows would-be LP_DEAD items to be
- * made LP_UNUSED instead. This is only possible if the relation
- * has no indexes. If there are any dead items, then
- * mark_unused_now was not true and every item being marked
- * LP_UNUSED must refer to a heap-only tuple.
+ * When heap_page_prune() was called, PRUNE_DO_MARK_UNUSED_NOW may
+ * have been set, which allows would-be LP_DEAD items to be made
+ * LP_UNUSED instead. This is only possible if the relation has
+ * no indexes. If there are any dead items, then
+ * PRUNE_DO_MARK_UNUSED_NOW was not set and every item being
+ * marked LP_UNUSED must refer to a heap-only tuple.
*/
if (ndead > 0)
{
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 0fb5a7dd24d..04e86347a0b 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1397,18 +1397,10 @@ lazy_scan_prune(LVRelState *vacrel,
{
Relation rel = vacrel->rel;
PruneFreezeResult presult;
- HeapPageFreeze pagefrz;
+ uint8 actions = 0;
Assert(BufferGetBlockNumber(buf) == blkno);
- /* Initialize pagefrz */
- pagefrz.freeze_required = false;
- pagefrz.FreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
- pagefrz.FreezePageRelminMxid = vacrel->NewRelminMxid;
- pagefrz.NoFreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
- pagefrz.NoFreezePageRelminMxid = vacrel->NewRelminMxid;
- pagefrz.cutoffs = &vacrel->cutoffs;
-
/*
* Prune all HOT-update chains and potentially freeze tuples on this page.
*
@@ -1418,22 +1410,26 @@ lazy_scan_prune(LVRelState *vacrel,
* of as the number of tuples that were deleted from indexes.
*
* If the relation has no indexes, we can immediately mark would-be dead
- * items LP_UNUSED, so mark_unused_now should be true if no indexes and
- * false otherwise.
+ * items LP_UNUSED, so PRUNE_DO_MARK_UNUSED_NOW should be set if no
+ * indexes and unset otherwise.
*
* We will update the VM after collecting LP_DEAD items and freezing
* tuples. Pruning will have determined whether or not the page is
* all-visible.
*/
- heap_page_prune_and_freeze(rel, buf, vacrel->vistest, vacrel->nindexes == 0,
- &pagefrz, &presult, PRUNE_VACUUM_SCAN, &vacrel->offnum);
+ actions |= PRUNE_DO_TRY_FREEZE;
- vacrel->offnum = InvalidOffsetNumber;
+ if (vacrel->nindexes == 0)
+ actions |= PRUNE_DO_MARK_UNUSED_NOW;
- Assert(MultiXactIdIsValid(presult.new_relminmxid));
- vacrel->NewRelfrozenXid = presult.new_relfrozenxid;
- Assert(TransactionIdIsValid(presult.new_relfrozenxid));
- vacrel->NewRelminMxid = presult.new_relminmxid;
+ heap_page_prune_and_freeze(rel, buf, actions, vacrel->vistest,
+ &vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN, &vacrel->offnum,
+ &vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
+
+ Assert(MultiXactIdIsValid(vacrel->NewRelminMxid));
+ Assert(TransactionIdIsValid(vacrel->NewRelfrozenXid));
+
+ vacrel->offnum = InvalidOffsetNumber;
if (presult.nfrozen > 0)
{
@@ -1466,7 +1462,7 @@ lazy_scan_prune(LVRelState *vacrel,
&debug_cutoff, &debug_all_frozen))
Assert(false);
- Assert(presult.all_frozen == debug_all_frozen);
+ Assert(presult.set_all_frozen == debug_all_frozen);
Assert(!TransactionIdIsValid(debug_cutoff) ||
debug_cutoff == presult.vm_conflict_horizon);
@@ -1521,7 +1517,7 @@ lazy_scan_prune(LVRelState *vacrel,
{
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
- if (presult.all_frozen)
+ if (presult.set_all_frozen)
{
Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
flags |= VISIBILITYMAP_ALL_FROZEN;
@@ -1592,7 +1588,7 @@ lazy_scan_prune(LVRelState *vacrel,
* true, so we must check both all_visible and all_frozen.
*/
else if (all_visible_according_to_vm && presult.all_visible &&
- presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
+ presult.set_all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
{
/*
* Avoid relying on all_visible_according_to_vm as a proxy for the
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 2740eaac13e..747a9ea0052 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -191,8 +191,35 @@ typedef struct HeapPageFreeze
MultiXactId NoFreezePageRelminMxid;
struct VacuumCutoffs *cutoffs;
+
+ /*
+ * One entry for every tuple that we may freeze.
+ */
+ HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
} HeapPageFreeze;
+/*
+ * Actions that can be taken during pruning and freezing. By default, we will
+ * at least attempt regular pruning.
+ */
+
+/*
+ * PRUNE_DO_MARK_UNUSED_NOW indicates whether or not dead items can be set
+ * LP_UNUSED during pruning.
+ */
+#define PRUNE_DO_MARK_UNUSED_NOW (1 << 1)
+
+/*
+ * Freeze if advantageous or required and try to advance relfrozenxid and
+ * relminmxid. To attempt freezing, we will need to determine if the page is
+ * all frozen. So, if this action is set, we will also inform the caller if the
+ * page is all-visible and/or all-frozen and calculate a snapshot conflict
+ * horizon for updating the visibility map. While doing this, we also count if
+ * tuples are live or recently dead.
+ */
+#define PRUNE_DO_TRY_FREEZE (1 << 2)
+
+
/*
* Per-page state returned from pruning
*/
@@ -203,14 +230,17 @@ typedef struct PruneFreezeResult
/*
* The rest of the fields in PruneFreezeResult are only guaranteed to be
- * initialized if heap_page_prune_and_freeze() is passed a PruneReason
- * other than PRUNE_ON_ACCESS.
+ * initialized if heap_page_prune_and_freeze() is passed
+ * PRUNE_DO_TRY_FREEZE.
*/
- int live_tuples;
- int recently_dead_tuples;
-
/* Number of tuples we froze */
int nfrozen;
+ /* Whether or not the page should be set all-frozen in the VM */
+ bool set_all_frozen;
+
+ /* Number of live and recently dead tuples */
+ int live_tuples;
+ int recently_dead_tuples;
/*
* Whether or not the page is truly all-visible after pruning. If there
@@ -219,8 +249,6 @@ typedef struct PruneFreezeResult
*/
bool all_visible;
- /* Whether or not the page can be set all-frozen in the VM */
- bool all_frozen;
/* Whether or not the page makes rel truncation unsafe */
bool hastup;
@@ -232,15 +260,6 @@ typedef struct PruneFreezeResult
*/
TransactionId vm_conflict_horizon;
- /*
- * One entry for every tuple that we may freeze.
- */
- HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
- /* New value of relfrozenxid found by heap_page_prune_and_freeze() */
- TransactionId new_relfrozenxid;
-
- /* New value of relminmxid found by heap_page_prune_and_freeze() */
- MultiXactId new_relminmxid;
int lpdead_items; /* includes existing LP_DEAD items */
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
} PruneFreezeResult;
@@ -352,12 +371,14 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
struct GlobalVisState;
extern void heap_page_prune_opt(Relation relation, Buffer buffer);
extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ uint8 actions,
struct GlobalVisState *vistest,
- bool mark_unused_now,
- HeapPageFreeze *pagefrz,
+ struct VacuumCutoffs *cutoffs,
PruneFreezeResult *presult,
PruneReason reason,
- OffsetNumber *off_loc);
+ OffsetNumber *off_loc,
+ TransactionId *new_relfrozen_xid,
+ MultiXactId *new_relmin_mxid);
extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
OffsetNumber *redirected, int nredirected,
OffsetNumber *nowdead, int ndead,
--
2.40.1
On Tue, Mar 26, 2024 at 5:46 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
On Mon, Mar 25, 2024 at 09:33:38PM +0200, Heikki Linnakangas wrote:
On 24/03/2024 18:32, Melanie Plageman wrote:
On Thu, Mar 21, 2024 at 9:28 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
In heap_page_prune_and_freeze(), we now do some extra work on each live
tuple, to set the all_visible_except_removable correctly. And also to
update live_tuples, recently_dead_tuples and hastup. When we're not
freezing, that's a waste of cycles, the caller doesn't care. I hope it's
enough that it doesn't matter, but is it?

Last year on an early version of the patch set I did some pgbench
tpcb-like benchmarks -- since there is a lot of on-access pruning in
that workload -- and I don't remember it being a showstopper. The code
has changed a fair bit since then. However, I think it might be safer
to pass a flag "by_vacuum" to heap_page_prune_and_freeze() and skip
the rest of the loop after heap_prune_satisfies_vacuum() when
on-access pruning invokes it. I had avoided that because it felt ugly
and error-prone; however, it addresses a few of your other points as
well.

Ok. I'm not a fan of the name 'by_vacuum' though. It'd be nice if the
argument described what it does, rather than who it's for. For example,
'need_all_visible'. If set to true, the function determines 'all_visible',
otherwise it does not.

A very rough v7 is attached. The whole thing is rebased over master and
then 0016 contains an attempt at the refactor we discussed in this
email.

Instead of just using the PruneReason to avoid doing the extra steps
when on-access pruning calls heap_page_prune_and_freeze(), I've made an
"actions" variable and defined different flags for it. One of them is
a replacement for the existing mark_unused_now flag. I defined another
one, PRUNE_DO_TRY_FREEZE, which could be used in place of checking if
pagefrz is NULL.

There is a whole group of activities that only the vacuum caller does
outside of freezing -- setting hastup, counting live and recently dead
tuples, determining whole page visibility and a snapshot conflict
horizon for updating the VM. But I didn't want to introduce separate
flags for each of them, because then I would have to check each of them
before taking the action. That would be lots of extra branching and
on-access pruning does none of those actions while vacuum does all of
them.

I started to look closer at the loops in heap_prune_chain() and how they
update all the various flags and counters. There's a lot going on there. We
have:
- live_tuples counter
- recently_dead_tuples counter
- all_visible[_except_removable]
- all_frozen
- visibility_cutoff_xid
- hastup
- prstate.frozen array
- nnewlpdead
- deadoffsets array

And that doesn't even include all the local variables and the final
dead/redirected arrays.

Some of those are set in the first loop that initializes 'htsv' for each
tuple on the page. Others are updated in heap_prune_chain(). Some are
updated in both. It's hard to follow which are set where.

I think recently_dead_tuples is updated incorrectly, for tuples that are
part of a completely dead HOT chain. For example, imagine a hot chain with
two tuples: RECENTLY_DEAD -> DEAD. heap_prune_chain() would follow the
chain, see the DEAD tuple at the end of the chain, and mark both tuples for
pruning. However, we already updated 'recently_dead_tuples' in the first
loop, which is wrong if we remove the tuple.

Ah, yes, you are so right about this bug.
Maybe that's the only bug like this, but I'm a little scared. Is there
something we could do to make this simpler? Maybe move all the new work that
we added to the first loop, into heap_prune_chain() ? Maybe introduce a few
more helper heap_prune_record_*() functions, to update the flags and
counters also for live and insert/delete-in-progress tuples and for dead
line pointers? Something like heap_prune_record_live() and
heap_prune_record_lp_dead().

I like the idea of a heap_prune_record_live_or_recently_dead() function.
That's what I've attempted to implement in the attached 0016. I haven't
updated and cleaned up everything (especially comments) in the refactor,
but there are two major issues:

1) In heap_prune_chain(), a heap-only tuple which is not HOT updated may
end up being a live tuple not part of any chain or it may end up the
redirect target in a HOT chain. At the top of heap_prune_chain(), we
return if (HeapTupleHeaderIsHeapOnly(htup)). We may come back to this
tuple later if it is part of a chain. If we don't, we need to have
called heap_prune_record_live_or_recently_dead(). However, there are
other tuples that get redirected to which do not meet these criteria, so
we must call heap_prune_record_live_or_recently_dead() when setting an
item redirected to. If we call heap_prune_record_live_or_recently_dead()
in both places, we will double-count. To fix this, I introduced an
array, "counted". But that takes up extra space in the PruneState and
extra cycles to memset it.

I can't think of a way to make sure we count the right tuples without
another array. The tuples we need to count are those not pointed to by
prstate->marked + those tuples whose line pointers will be redirected to
(those are marked).

2) A large number of the members of PruneFreezeResult are only
initialized for the vacuum caller now. Even with a comment, this is a
bit confusing. And, it seems like there should be some symmetry between
the actions the caller tells heap_page_prune_and_freeze() to take and
the result parameters that are filled in.

I am concerned about adding all of the actions (setting hastup,
determining whole page visibility, etc as mentioned above) because then
I also have to check all the actions and that will add extra branching.
And out of the two callers of heap_page_prune_and_freeze(), one will do
all of the actions and one will do none of them except "main" pruning.
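To illustrate the asymmetry between the two callers (a rough sketch, not
literally what is in the attached patches -- the helper and the by_vacuum
parameter are made up for illustration):

static uint8
prune_actions_for_caller(bool by_vacuum, int nindexes)
{
	uint8		actions = 0;

	if (by_vacuum)
	{
		/* vacuum wants freezing plus all the extra accounting */
		actions |= PRUNE_DO_TRY_FREEZE;

		/* and can set dead items LP_UNUSED right away if no indexes */
		if (nindexes == 0)
			actions |= PRUNE_DO_MARK_UNUSED_NOW;
	}

	/* on-access pruning requests nothing beyond regular pruning */
	return actions;
}

With that shape, the shared per-tuple bookkeeping only needs the single
"if (prstate->actions == 0) return;" test to skip everything for the
on-access caller, rather than one branch per action.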
This morning I worked on a version of this patchset which moved the
counting of live and recently dead tuples and the calculation of the
vm conflict horizon back to lazy_scan_prune() but kept the freezing
and dead offset collection in heap_prune_chain(). I encountered the
same problem with ensuring each tuple was considered for freezing
exactly once. It also made me realize that my patch set (v7) still has
the same problem in which all_visible_except_removable will be
incorrectly set to false and recently dead tuples incorrectly
incremented when encountering HEAPTUPLE_RECENTLY_DEAD tuples whose
line pointers get set LP_DEAD during pruning. And I think I am
incorrectly calling heap_prepare_freeze_tuple() on them too.
I need some way to modify the control flow or accounting such that I
know which HEAPTUPLE_RECENTLY_DEAD tuples will not be marked LP_DEAD.
And a way to consider freezing and do live tuple accounting for these
and HEAPTUPLE_LIVE tuples exactly once.
- Melanie
On 27/03/2024 17:18, Melanie Plageman wrote:
I need some way to modify the control flow or accounting such that I
know which HEAPTUPLE_RECENTLY_DEAD tuples will not be marked LP_DEAD.
And a way to consider freezing and do live tuple accounting for these
and HEAPTUPLE_LIVE tuples exactly once.
Just a quick update: I've been massaging this some more today, and I
think I'm onto something palatable. I'll send an updated patch later
today, but the key is to note that for each item on the page, there is
one point where we determine the fate of the item, whether it's pruned
or not. That can happen at different points in heap_page_prune().
That's also when we set marked[offnum] = true. Whenever that happens, we
always call one of the heap_prune_record_*() subroutines. We already
have those subroutines for when a tuple is marked as dead or unused, but
let's add similar subroutines for the case that we're leaving the tuple
unchanged. If we move all the bookkeeping logic to those subroutines, we
can ensure that it gets done exactly once for each tuple, and at that
point we know what we are going to do to the tuple, so we can count it
correctly. So heap_prune_chain() decides what to do with each tuple, and
ensures that each tuple is marked only once, and the subroutines update
all the variables, add the item to the correct arrays etc. depending on
what we're doing with it.
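Very roughly, I'm thinking of something like this (just a sketch to show
the shape, not the actual code; the function name is made up):

static void
heap_prune_record_unchanged(PruneState *prstate, OffsetNumber offnum,
							PruneResult *presult)
{
	/* each item's fate is decided, and recorded, exactly once */
	Assert(!prstate->marked[offnum]);
	prstate->marked[offnum] = true;

	/*
	 * All the bookkeeping for tuples that we leave in place happens here:
	 * live/recently-dead counters, hastup, the all-visible determination
	 * and visibility cutoff xid, and the freeze plan for the tuple.
	 */
}

heap_prune_chain() would then call heap_prune_record_unchanged() at every
point where it decides to leave an item alone, just like it already calls
heap_prune_record_dead() or heap_prune_record_unused() when it changes one.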
--
Heikki Linnakangas
Neon (https://neon.tech)
On Tue, Mar 19, 2024 at 9:36 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
* "Freeze" NewRelfrozenXid/NewRelminMxid trackers.
*
* Trackers used when heap_freeze_execute_prepared freezes, or when there
* are zero freeze plans for a page. It is always valid for vacuumlazy.c
* to freeze any page, by definition. This even includes pages that have
* no tuples with storage to consider in the first place. That way the
* 'totally_frozen' results from heap_prepare_freeze_tuple can always be
* used in the same way, even when no freeze plans need to be executed to
* "freeze the page". Only the "freeze" path needs to consider the need
* to set pages all-frozen in the visibility map under this scheme.
*
* When we freeze a page, we generally freeze all XIDs < OldestXmin, only
* leaving behind XIDs that are ineligible for freezing, if any. And so
* you might wonder why these trackers are necessary at all; why should
* _any_ page that VACUUM freezes _ever_ be left with XIDs/MXIDs that
* ratchet back the top-level NewRelfrozenXid/NewRelminMxid trackers?
*
* It is useful to use a definition of "freeze the page" that does not
* overspecify how MultiXacts are affected. heap_prepare_freeze_tuple
* generally prefers to remove Multis eagerly, but lazy processing is used
* in cases where laziness allows VACUUM to avoid allocating a new Multi.
* The "freeze the page" trackers enable this flexibility.
*/

So, I don't really know if it is right to just check presult->nfrozen >
0 when updating relminmxid. I have changed it to the way you suggested.
But we can change it back.
I think that this is just about safe. I had to check, though. I see
that the FRM_NOOP case (within
FreezeMultiXactId/heap_prepare_freeze_tuple) will ratchet back both
sets of trackers (both the freeze and no freeze variants). However,
it's rather hard to see that this is true.
The intent here was that cases where "presult->nfrozen == 0" would
always take the "freeze" path. That seems more natural to me, at
least, since I think of the freeze path as the default choice. By
definition, lazy_scan_prune() can always take the freeze path -- even
when the page has no tuples with storage. But it cannot always take
the no-freeze path -- "disobeying" pagefrz.freeze_required creates the
risk that relfrozenxid/relminmxid will be advanced to unsafe values at
the end of the VACUUM. IMV you should stick with that approach now,
even if it is currently safe to do it the other way around.
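To spell that out, the choice between the two tracker pairs in
lazy_scan_prune() on master looks roughly like this (simplified sketch,
variable names as in the existing code):

if (pagefrz.freeze_required || tuples_frozen == 0 ||
	(all_visible && all_frozen && fpi_before != pgWalUsage.wal_fpi))
{
	/* "freeze" path -- always a safe choice, even with zero freeze plans */
	vacrel->NewRelfrozenXid = pagefrz.FreezePageRelfrozenXid;
	vacrel->NewRelminMxid = pagefrz.FreezePageRelminMxid;
}
else
{
	/* "no freeze" path -- only safe because freeze_required is false */
	vacrel->NewRelfrozenXid = pagefrz.NoFreezePageRelfrozenXid;
	vacrel->NewRelminMxid = pagefrz.NoFreezePageRelminMxid;
}

The first branch is the one that must remain reachable whenever
nfrozen == 0.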
--
Peter Geoghegan
On Wed, Mar 27, 2024 at 12:18 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
On 27/03/2024 17:18, Melanie Plageman wrote:
I need some way to modify the control flow or accounting such that I
know which HEAPTUPLE_RECENTLY_DEAD tuples will not be marked LP_DEAD.
And a way to consider freezing and do live tuple accounting for these
and HEAPTUPLE_LIVE tuples exactly once.

Just a quick update: I've been massaging this some more today, and I
think I'm onto something palatable. I'll send an updated patch later
today, but the key is to note that for each item on the page, there is
one point where we determine the fate of the item, whether it's pruned
or not. That can happen at different points in heap_page_prune().
That's also when we set marked[offnum] = true. Whenever that happens, we
always call one of the heap_prune_record_*() subroutines. We already
have those subroutines for when a tuple is marked as dead or unused, but
let's add similar subroutines for the case that we're leaving the tuple
unchanged. If we move all the bookkeeping logic to those subroutines, we
can ensure that it gets done exactly once for each tuple, and at that
point we know what we are going to do to the tuple, so we can count it
correctly. So heap_prune_chain() decides what to do with each tuple, and
ensures that each tuple is marked only once, and the subroutines update
all the variables, add the item to the correct arrays etc. depending on
what we're doing with it.
Yes, this would be ideal.
I was doing some experimentation with pageinspect today (trying to
find that single place where live tuples fates are decided) and it
seems like a heap-only tuple that is not HOT-updated will usually be
the one at the end of the chain. Which seems like it would be covered
by adding a record_live() type function call in the loop of
heap_prune_chain():
/*
* If the tuple is not HOT-updated, then we are at the end of this
* HOT-update chain.
*/
if (!HeapTupleHeaderIsHotUpdated(htup))
{
heap_prune_record_live_or_recently_dead(dp, prstate,
offnum, presult);
break;
}
but that doesn't end up producing the same results as
if (HeapTupleHeaderIsHeapOnly(htup)
&& !HeapTupleHeaderIsHotUpdated(htup) &&
presult->htsv[rootoffnum] == HEAPTUPLE_DEAD)
heap_prune_record_live_or_recently_dead(dp, prstate,
offnum, presult);
at the top of heap_prune_chain().
- Melanie
On Wed, Mar 27, 2024 at 2:26 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
On Wed, Mar 27, 2024 at 12:18 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
On 27/03/2024 17:18, Melanie Plageman wrote:
I need some way to modify the control flow or accounting such that I
know which HEAPTUPLE_RECENTLY_DEAD tuples will not be marked LP_DEAD.
And a way to consider freezing and do live tuple accounting for these
and HEAPTUPLE_LIVE tuples exactly once.

Just a quick update: I've been massaging this some more today, and I
think I'm onto something palatable. I'll send an updated patch later
today, but the key is to note that for each item on the page, there is
one point where we determine the fate of the item, whether it's pruned
or not. That can happen at different points in heap_page_prune().
That's also when we set marked[offnum] = true. Whenever that happens, we
always call one of the heap_prune_record_*() subroutines. We already
have those subroutines for when a tuple is marked as dead or unused, but
let's add similar subroutines for the case that we're leaving the tuple
unchanged. If we move all the bookkeeping logic to those subroutines, we
can ensure that it gets done exactly once for each tuple, and at that
point we know what we are going to do to the tuple, so we can count it
correctly. So heap_prune_chain() decides what to do with each tuple, and
ensures that each tuple is marked only once, and the subroutines update
all the variables, add the item to the correct arrays etc. depending on
what we're doing with it.

Yes, this would be ideal.
I was doing some experimentation with pageinspect today (trying to
find that single place where live tuples fates are decided) and it
seems like a heap-only tuple that is not HOT-updated will usually be
the one at the end of the chain. Which seems like it would be covered
by adding a record_live() type function call in the loop of
heap_prune_chain():

/*
* If the tuple is not HOT-updated, then we are at the end of this
* HOT-update chain.
*/
if (!HeapTupleHeaderIsHotUpdated(htup))
{
heap_prune_record_live_or_recently_dead(dp, prstate,
offnum, presult);
break;
}

but that doesn't end up producing the same results as
if (HeapTupleHeaderIsHeapOnly(htup)
&& !HeapTupleHeaderIsHotUpdated(htup) &&
presult->htsv[rootoffnum] == HEAPTUPLE_DEAD)
heap_prune_record_live_or_recently_dead(dp, prstate,
offnum, presult);
sorry, that should say presult->htsv[rootoffnum] != HEAPTUPLE_DEAD.
The latter should be a subset of the former. But instead it seems
there are cases I missed by doing only the former.
- Melanie
On 27/03/2024 20:26, Melanie Plageman wrote:
On Wed, Mar 27, 2024 at 12:18 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
On 27/03/2024 17:18, Melanie Plageman wrote:
I need some way to modify the control flow or accounting such that I
know which HEAPTUPLE_RECENTLY_DEAD tuples will not be marked LP_DEAD.
And a way to consider freezing and do live tuple accounting for these
and HEAPTUPLE_LIVE tuples exactly once.

Just a quick update: I've been massaging this some more today, and I
think I'm onto something palatable. I'll send an updated patch later
today, but the key is to note that for each item on the page, there is
one point where we determine the fate of the item, whether it's pruned
or not. That can happen at different points in heap_page_prune().
That's also when we set marked[offnum] = true. Whenever that happens, we
always call one of the heap_prune_record_*() subroutines. We already
have those subroutines for when a tuple is marked as dead or unused, but
let's add similar subroutines for the case that we're leaving the tuple
unchanged. If we move all the bookkeeping logic to those subroutines, we
can ensure that it gets done exactly once for each tuple, and at that
point we know what we are going to do to the tuple, so we can count it
correctly. So heap_prune_chain() decides what to do with each tuple, and
ensures that each tuple is marked only once, and the subroutines update
all the variables, add the item to the correct arrays etc. depending on
what we're doing with it.

Yes, this would be ideal.
Well, that took me a lot longer than expected. My approach of "make sure
you call the right heap_prune_record_*() subroutine in all cases" didn't
work out quite as easily as I thought. Because, as you pointed out, it's
difficult to know if a non-DEAD tuple that is part of a HOT chain will
be visited later as part of the chain processing, or needs to be counted
at the top of heap_prune_chain().
The solution I came up with is to add a third phase to pruning. At the
top of heap_prune_chain(), if we see a live heap-only tuple, and we're
not sure if it will be counted later as part of a HOT chain, we stash it
away and revisit it later, after processing all the hot chains. That's
somewhat similar to your 'counted' array, but not quite.
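In pseudocode, the shape is roughly this (a sketch of the idea, not the
exact code in the attached patch; whether "marked" is the right check is
a detail):

OffsetNumber deferred[MaxHeapTuplesPerPage];
int			ndeferred = 0;

/*
 * In heap_prune_chain(): a live heap-only tuple might still be reached
 * later while following some HOT chain, so don't do the bookkeeping for
 * it yet, just remember it.
 */
deferred[ndeferred++] = offnum;

/* Third phase, after all HOT chains have been processed: */
for (int i = 0; i < ndeferred; i++)
{
	OffsetNumber off = deferred[i];

	/* skip items that chain processing already took care of */
	if (prstate.marked[off])
		continue;

	heap_prune_record_live_or_recently_dead(page, &prstate, off, presult);
}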
Attached is that approach, on top of v7. It's a bit messy, I made a
bunch of other changes too and didn't fully separate them out to
separate patches. Sorry about that.
One change with this is that live_tuples and many of the other fields
are now again updated, even if the caller doesn't need them. It was hard
to skip them in a way that would save any cycles, with the other
refactorings.
Some other notable changes are mentioned in the commit message.
I was doing some experimentation with pageinspect today (trying to
find that single place where live tuples fates are decided) and it
seems like a heap-only tuple that is not HOT-updated will usually be
the one at the end of the chain. Which seems like it would be covered
by adding a record_live() type function call in the loop of
heap_prune_chain():

/*
* If the tuple is not HOT-updated, then we are at the end of this
* HOT-update chain.
*/
if (!HeapTupleHeaderIsHotUpdated(htup))
{
heap_prune_record_live_or_recently_dead(dp, prstate,
offnum, presult);
break;
}

but that doesn't end up producing the same results as
if (HeapTupleHeaderIsHeapOnly(htup)
&& !HeapTupleHeaderIsHotUpdated(htup) &&
presult->htsv[rootoffnum] == HEAPTUPLE_DEAD)
heap_prune_record_live_or_recently_dead(dp, prstate,
offnum, presult);

at the top of heap_prune_chain().
Yep, this is tricky, I also spent a lot of time trying to find a good
"choke point" where we could say for sure that a live tuple is processed
exactly once, but fumbled just like you.
--
Heikki Linnakangas
Neon (https://neon.tech)
Attachments:
v8-0001-lazy_scan_prune-tests-tuple-vis-with-GlobalVisTes.patch
From 14ab9ed8b24b7ee3b104c950e98cc4e03062c413 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 6 Jan 2024 13:14:47 -0500
Subject: [PATCH v8 01/22] lazy_scan_prune tests tuple vis with GlobalVisTest
One requirement for eventually combining the prune and freeze records
is that we must check during pruning if live tuples on the page are
visible to everyone and thus, whether or not the page is all visible. We
only consider opportunistically freezing tuples if the whole page is all
visible and could be set all frozen.
During pruning (in heap_page_prune()), we do not have access to
VacuumCutoffs -- as on-access pruning also calls heap_page_prune(). We
do, however, have access to a GlobalVisState. This can be used to
determine whether or not the tuple is visible to everyone. It also has
the potential of being more up-to-date than VacuumCutoffs->OldestXmin.
This commit simply modifies lazy_scan_prune() to use GlobalVisState
instead of OldestXmin. Future commits will move the
all_visible/all_frozen calculation into heap_page_prune().
---
src/backend/access/heap/vacuumlazy.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index ba5b7083a3a..a7451743e25 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1579,11 +1579,15 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* The inserter definitely committed. But is it old enough
- * that everyone sees it as committed?
+ * that everyone sees it as committed? A
+ * FrozenTransactionId is seen as committed to everyone.
+ * Otherwise, we check if there is a snapshot that
+ * considers this xid to still be running, and if so, we
+ * don't consider the page all-visible.
*/
xmin = HeapTupleHeaderGetXmin(htup);
- if (!TransactionIdPrecedes(xmin,
- vacrel->cutoffs.OldestXmin))
+ if (xmin != FrozenTransactionId &&
+ !GlobalVisTestIsRemovableXid(vacrel->vistest, xmin))
{
all_visible = false;
break;
--
2.39.2
v8-0002-Pass-heap_prune_chain-PruneResult-output-paramete.patch
From 4f3d886d4562264439875d545a6f04f303e26526 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 6 Jan 2024 13:39:59 -0500
Subject: [PATCH v8 02/22] Pass heap_prune_chain() PruneResult output parameter
Future commits will set other members of PruneResult in
heap_prune_chain(), so start passing it as an output parameter now. This
eliminates the output parameter htsv -- the array of HTSV_Results --
since that is a member of the PruneResult.
---
src/backend/access/heap/pruneheap.c | 15 +++++++--------
1 file changed, 7 insertions(+), 8 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 4e58c2c2ff4..c1542b95af8 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -59,8 +59,7 @@ static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
Buffer buffer);
static int heap_prune_chain(Buffer buffer,
OffsetNumber rootoffnum,
- int8 *htsv,
- PruneState *prstate);
+ PruneState *prstate, PruneResult *presult);
static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
static void heap_prune_record_redirect(PruneState *prstate,
OffsetNumber offnum, OffsetNumber rdoffnum);
@@ -325,7 +324,7 @@ heap_page_prune(Relation relation, Buffer buffer,
/* Process this item or chain of items */
presult->ndeleted += heap_prune_chain(buffer, offnum,
- presult->htsv, &prstate);
+ &prstate, presult);
}
/* Clear the offset information once we have processed the given page. */
@@ -427,7 +426,7 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
/*
* Prune specified line pointer or a HOT chain originating at line pointer.
*
- * Tuple visibility information is provided in htsv.
+ * Tuple visibility information is provided in presult->htsv.
*
* If the item is an index-referenced tuple (i.e. not a heap-only tuple),
* the HOT chain is pruned by removing all DEAD tuples at the start of the HOT
@@ -457,7 +456,7 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
*/
static int
heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
- int8 *htsv, PruneState *prstate)
+ PruneState *prstate, PruneResult *presult)
{
int ndeleted = 0;
Page dp = (Page) BufferGetPage(buffer);
@@ -478,7 +477,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
*/
if (ItemIdIsNormal(rootlp))
{
- Assert(htsv[rootoffnum] != -1);
+ Assert(presult->htsv[rootoffnum] != -1);
htup = (HeapTupleHeader) PageGetItem(dp, rootlp);
if (HeapTupleHeaderIsHeapOnly(htup))
@@ -501,7 +500,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
* either here or while following a chain below. Whichever path
* gets there first will mark the tuple unused.
*/
- if (htsv[rootoffnum] == HEAPTUPLE_DEAD &&
+ if (presult->htsv[rootoffnum] == HEAPTUPLE_DEAD &&
!HeapTupleHeaderIsHotUpdated(htup))
{
heap_prune_record_unused(prstate, rootoffnum);
@@ -598,7 +597,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
*/
tupdead = recent_dead = false;
- switch (htsv_get_valid_status(htsv[offnum]))
+ switch (htsv_get_valid_status(presult->htsv[offnum]))
{
case HEAPTUPLE_DEAD:
tupdead = true;
--
2.39.2
v8-0003-Rename-PruneState-snapshotConflictHorizon-to-late.patch
From 24040bea932f533ce2de033edcfb5c142d860a81 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 25 Mar 2024 18:02:09 -0400
Subject: [PATCH v8 03/22] Rename PruneState->snapshotConflictHorizon to
latest_xid_removed
In anticipation of combining pruning and freezing and emitting a single
WAL record, rename PruneState->snapshotConflictHorizon to
latest_xid_removed. After pruning and freezing, we will choose a
combined record snapshot conflict horizon taking into account both
values.
---
src/backend/access/heap/pruneheap.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index c1542b95af8..ca4301bb8a9 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -35,7 +35,7 @@ typedef struct
bool mark_unused_now;
TransactionId new_prune_xid; /* new prune hint value for page */
- TransactionId snapshotConflictHorizon; /* latest xid removed */
+ TransactionId latest_xid_removed;
int nredirected; /* numbers of entries in arrays below */
int ndead;
int nunused;
@@ -238,7 +238,7 @@ heap_page_prune(Relation relation, Buffer buffer,
prstate.new_prune_xid = InvalidTransactionId;
prstate.vistest = vistest;
prstate.mark_unused_now = mark_unused_now;
- prstate.snapshotConflictHorizon = InvalidTransactionId;
+ prstate.latest_xid_removed = InvalidTransactionId;
prstate.nredirected = prstate.ndead = prstate.nunused = 0;
memset(prstate.marked, 0, sizeof(prstate.marked));
@@ -367,7 +367,7 @@ heap_page_prune(Relation relation, Buffer buffer,
if (RelationNeedsWAL(relation))
{
log_heap_prune_and_freeze(relation, buffer,
- prstate.snapshotConflictHorizon,
+ prstate.latest_xid_removed,
true, reason,
NULL, 0,
prstate.redirected, prstate.nredirected,
@@ -505,7 +505,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
{
heap_prune_record_unused(prstate, rootoffnum);
HeapTupleHeaderAdvanceConflictHorizon(htup,
- &prstate->snapshotConflictHorizon);
+ &prstate->latest_xid_removed);
ndeleted++;
}
@@ -651,7 +651,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
{
latestdead = offnum;
HeapTupleHeaderAdvanceConflictHorizon(htup,
- &prstate->snapshotConflictHorizon);
+ &prstate->latest_xid_removed);
}
else if (!recent_dead)
break;
--
2.39.2
v8-0004-heap_page_prune-sets-all_visible-and-visibility_c.patch
From db92ef7111b56d8a12b915e797b6c5e5dd11667f Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 25 Mar 2024 18:31:24 -0400
Subject: [PATCH v8 04/22] heap_page_prune sets all_visible and
visibility_cutoff_xid
In order to combine the prune and freeze records, we must know if the
page is eligible to be opportunistically frozen before finishing
pruning. Save all_visible in the PruneResult and set it to false when we
see non-removable tuples which are not visible to everyone.
We will also need to ensure that the snapshotConflictHorizon for the combined
prune + freeze record is the more conservative of that calculated for each of
pruning and freezing. Calculate the visibility_cutoff_xid for the purposes of
freezing -- the newest xmin on the page -- in heap_page_prune() and save it in
PruneResult.visibility_cutoff_xid.
Note that these are only needed by vacuum callers of heap_page_prune(),
so don't update them for on-access pruning.
---
src/backend/access/heap/pruneheap.c | 131 +++++++++++++++++++++++++--
src/backend/access/heap/vacuumlazy.c | 113 +++++------------------
src/include/access/heapam.h | 21 +++++
3 files changed, 169 insertions(+), 96 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index ca4301bb8a9..5776ae84f4d 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -63,8 +63,10 @@ static int heap_prune_chain(Buffer buffer,
static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
static void heap_prune_record_redirect(PruneState *prstate,
OffsetNumber offnum, OffsetNumber rdoffnum);
-static void heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum);
-static void heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum);
+static void heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
+ PruneResult *presult);
+static void heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
+ PruneResult *presult);
static void heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum);
static void page_verify_redirects(Page page);
@@ -249,6 +251,14 @@ heap_page_prune(Relation relation, Buffer buffer,
presult->ndeleted = 0;
presult->nnewlpdead = 0;
+ /*
+ * Keep track of whether or not the page is all_visible in case the caller
+ * wants to use this information to update the VM.
+ */
+ presult->all_visible = true;
+ /* for recovery conflicts */
+ presult->visibility_cutoff_xid = InvalidTransactionId;
+
maxoff = PageGetMaxOffsetNumber(page);
tup.t_tableOid = RelationGetRelid(relation);
@@ -300,8 +310,101 @@ heap_page_prune(Relation relation, Buffer buffer,
presult->htsv[offnum] = heap_prune_satisfies_vacuum(&prstate, &tup,
buffer);
+
+ if (reason == PRUNE_ON_ACCESS)
+ continue;
+
+ switch (presult->htsv[offnum])
+ {
+ case HEAPTUPLE_DEAD:
+
+ /*
+ * Deliberately delay unsetting all_visible until later during
+ * pruning. Removable dead tuples shouldn't preclude freezing
+ * the page. After finishing this first pass of tuple
+ * visibility checks, initialize all_visible_except_removable
+ * with the current value of all_visible to indicate whether
+ * or not the page is all visible except for dead tuples. This
+ * will allow us to attempt to freeze the page after pruning.
+ * Later during pruning, if we encounter an LP_DEAD item or
+ * are setting an item LP_DEAD, we will unset all_visible. As
+ * long as we unset it before updating the visibility map,
+ * this will be correct.
+ */
+ break;
+ case HEAPTUPLE_LIVE:
+
+ /*
+ * Is the tuple definitely visible to all transactions?
+ *
+ * NB: Like with per-tuple hint bits, we can't set the
+ * PD_ALL_VISIBLE flag if the inserter committed
+ * asynchronously. See SetHintBits for more info. Check that
+ * the tuple is hinted xmin-committed because of that.
+ */
+ if (presult->all_visible)
+ {
+ TransactionId xmin;
+
+ if (!HeapTupleHeaderXminCommitted(htup))
+ {
+ presult->all_visible = false;
+ break;
+ }
+
+ /*
+ * The inserter definitely committed. But is it old enough
+ * that everyone sees it as committed? A
+ * FrozenTransactionId is seen as committed to everyone.
+ * Otherwise, we check if there is a snapshot that
+ * considers this xid to still be running, and if so, we
+ * don't consider the page all-visible.
+ */
+ xmin = HeapTupleHeaderGetXmin(htup);
+ if (xmin != FrozenTransactionId &&
+ !GlobalVisTestIsRemovableXid(vistest, xmin))
+ {
+ presult->all_visible = false;
+ break;
+ }
+
+ /* Track newest xmin on page. */
+ if (TransactionIdFollows(xmin, presult->visibility_cutoff_xid) &&
+ TransactionIdIsNormal(xmin))
+ presult->visibility_cutoff_xid = xmin;
+ }
+ break;
+ case HEAPTUPLE_RECENTLY_DEAD:
+ presult->all_visible = false;
+ break;
+ case HEAPTUPLE_INSERT_IN_PROGRESS:
+ presult->all_visible = false;
+ break;
+ case HEAPTUPLE_DELETE_IN_PROGRESS:
+ /* This is an expected case during concurrent vacuum */
+ presult->all_visible = false;
+ break;
+ default:
+ elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
+ break;
+ }
}
+ /*
+ * For vacuum, if the whole page will become frozen, we consider
+ * opportunistically freezing tuples. Dead tuples which will be removed by
+ * the end of vacuuming should not preclude us from opportunistically
+ * freezing. We will not be able to freeze the whole page if there are
+ * tuples present which are not visible to everyone or if there are dead
+ * tuples which are not yet removable. We need all_visible to be false if
+ * LP_DEAD tuples remain after pruning so that we do not incorrectly
+ * update the visibility map or page hint bit. So, we will update
+ * presult->all_visible to reflect the presence of LP_DEAD items while
+ * pruning and keep all_visible_except_removable to permit freezing if the
+ * whole page will eventually become all visible after removing tuples.
+ */
+ presult->all_visible_except_removable = presult->all_visible;
+
/* Scan the page */
for (offnum = FirstOffsetNumber;
offnum <= maxoff;
@@ -569,10 +672,14 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
/*
* If the caller set mark_unused_now true, we can set dead line
* pointers LP_UNUSED now. We don't increment ndeleted here since
- * the LP was already marked dead.
+ * the LP was already marked dead. If it will not be marked
+ * LP_UNUSED, it will remain LP_DEAD, making the page not
+ * all_visible.
*/
if (unlikely(prstate->mark_unused_now))
heap_prune_record_unused(prstate, offnum);
+ else
+ presult->all_visible = false;
break;
}
@@ -709,7 +816,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
* redirect the root to the correct chain member.
*/
if (i >= nchain)
- heap_prune_record_dead_or_unused(prstate, rootoffnum);
+ heap_prune_record_dead_or_unused(prstate, rootoffnum, presult);
else
heap_prune_record_redirect(prstate, rootoffnum, chainitems[i]);
}
@@ -722,7 +829,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
* redirect item. We can clean up by setting the redirect item to
* DEAD state or LP_UNUSED if the caller indicated.
*/
- heap_prune_record_dead_or_unused(prstate, rootoffnum);
+ heap_prune_record_dead_or_unused(prstate, rootoffnum, presult);
}
return ndeleted;
@@ -759,13 +866,20 @@ heap_prune_record_redirect(PruneState *prstate,
/* Record line pointer to be marked dead */
static void
-heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum)
+heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
+ PruneResult *presult)
{
Assert(prstate->ndead < MaxHeapTuplesPerPage);
prstate->nowdead[prstate->ndead] = offnum;
prstate->ndead++;
Assert(!prstate->marked[offnum]);
prstate->marked[offnum] = true;
+
+ /*
+ * Setting the line pointer LP_DEAD means the page will definitely not be
+ * all_visible.
+ */
+ presult->all_visible = false;
}
/*
@@ -775,7 +889,8 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum)
* pointers LP_DEAD if mark_unused_now is true.
*/
static void
-heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum)
+heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
+ PruneResult *presult)
{
/*
* If the caller set mark_unused_now to true, we can remove dead tuples
@@ -786,7 +901,7 @@ heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum)
if (unlikely(prstate->mark_unused_now))
heap_prune_record_unused(prstate, offnum);
else
- heap_prune_record_dead(prstate, offnum);
+ heap_prune_record_dead(prstate, offnum, presult);
}
/* Record line pointer to be marked unused */
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index a7451743e25..17fb0b4f7b7 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1422,9 +1422,7 @@ lazy_scan_prune(LVRelState *vacrel,
recently_dead_tuples;
HeapPageFreeze pagefrz;
bool hastup = false;
- bool all_visible,
- all_frozen;
- TransactionId visibility_cutoff_xid;
+ bool all_frozen;
int64 fpi_before = pgWalUsage.wal_fpi;
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
@@ -1465,17 +1463,16 @@ lazy_scan_prune(LVRelState *vacrel,
&presult, PRUNE_VACUUM_SCAN, &vacrel->offnum);
/*
- * We will update the VM after collecting LP_DEAD items and freezing
- * tuples. Keep track of whether or not the page is all_visible and
- * all_frozen and use this information to update the VM. all_visible
- * implies 0 lpdead_items, but don't trust all_frozen result unless
- * all_visible is also set to true.
+ * Now scan the page to collect LP_DEAD items and check for tuples
+ * requiring freezing among remaining tuples with storage. We will update
+ * the VM after collecting LP_DEAD items and freezing tuples. Pruning will
+ * have determined whether or not the page is all_visible. Keep track of
+ * whether or not the page is all_frozen and use this information to
+ * update the VM. all_visible implies lpdead_items == 0, but don't trust
+ * all_frozen result unless all_visible is also set to true.
*
- * Also keep track of the visibility cutoff xid for recovery conflicts.
*/
- all_visible = true;
all_frozen = true;
- visibility_cutoff_xid = InvalidTransactionId;
/*
* Now scan the page to collect LP_DEAD items and update the variables set
@@ -1516,11 +1513,6 @@ lazy_scan_prune(LVRelState *vacrel,
* will only happen every other VACUUM, at most. Besides, VACUUM
* must treat hastup/nonempty_pages as provisional no matter how
* LP_DEAD items are handled (handled here, or handled later on).
- *
- * Also deliberately delay unsetting all_visible until just before
- * we return to lazy_scan_heap caller, as explained in full below.
- * (This is another case where it's useful to anticipate that any
- * LP_DEAD items will become LP_UNUSED during the ongoing VACUUM.)
*/
deadoffsets[lpdead_items++] = offnum;
continue;
@@ -1558,46 +1550,6 @@ lazy_scan_prune(LVRelState *vacrel,
* what acquire_sample_rows() does.
*/
live_tuples++;
-
- /*
- * Is the tuple definitely visible to all transactions?
- *
- * NB: Like with per-tuple hint bits, we can't set the
- * PD_ALL_VISIBLE flag if the inserter committed
- * asynchronously. See SetHintBits for more info. Check that
- * the tuple is hinted xmin-committed because of that.
- */
- if (all_visible)
- {
- TransactionId xmin;
-
- if (!HeapTupleHeaderXminCommitted(htup))
- {
- all_visible = false;
- break;
- }
-
- /*
- * The inserter definitely committed. But is it old enough
- * that everyone sees it as committed? A
- * FrozenTransactionId is seen as committed to everyone.
- * Otherwise, we check if there is a snapshot that
- * considers this xid to still be running, and if so, we
- * don't consider the page all-visible.
- */
- xmin = HeapTupleHeaderGetXmin(htup);
- if (xmin != FrozenTransactionId &&
- !GlobalVisTestIsRemovableXid(vacrel->vistest, xmin))
- {
- all_visible = false;
- break;
- }
-
- /* Track newest xmin on page. */
- if (TransactionIdFollows(xmin, visibility_cutoff_xid) &&
- TransactionIdIsNormal(xmin))
- visibility_cutoff_xid = xmin;
- }
break;
case HEAPTUPLE_RECENTLY_DEAD:
@@ -1607,7 +1559,6 @@ lazy_scan_prune(LVRelState *vacrel,
* pruning.)
*/
recently_dead_tuples++;
- all_visible = false;
break;
case HEAPTUPLE_INSERT_IN_PROGRESS:
@@ -1618,16 +1569,13 @@ lazy_scan_prune(LVRelState *vacrel,
* results. This assumption is a bit shaky, but it is what
* acquire_sample_rows() does, so be consistent.
*/
- all_visible = false;
break;
case HEAPTUPLE_DELETE_IN_PROGRESS:
- /* This is an expected case during concurrent vacuum */
- all_visible = false;
/*
- * Count such rows as live. As above, we assume the deleting
- * transaction will commit and update the counters after we
- * report.
+ * This is an expected case during concurrent vacuum. Count such
+ * rows as live. As above, we assume the deleting transaction
+ * will commit and update the counters after we report.
*/
live_tuples++;
break;
@@ -1670,7 +1618,7 @@ lazy_scan_prune(LVRelState *vacrel,
* page all-frozen afterwards (might not happen until final heap pass).
*/
if (pagefrz.freeze_required || tuples_frozen == 0 ||
- (all_visible && all_frozen &&
+ (presult.all_visible_except_removable && all_frozen &&
fpi_before != pgWalUsage.wal_fpi))
{
/*
@@ -1708,11 +1656,11 @@ lazy_scan_prune(LVRelState *vacrel,
* once we're done with it. Otherwise we generate a conservative
* cutoff by stepping back from OldestXmin.
*/
- if (all_visible && all_frozen)
+ if (presult.all_visible_except_removable && all_frozen)
{
/* Using same cutoff when setting VM is now unnecessary */
- snapshotConflictHorizon = visibility_cutoff_xid;
- visibility_cutoff_xid = InvalidTransactionId;
+ snapshotConflictHorizon = presult.visibility_cutoff_xid;
+ presult.visibility_cutoff_xid = InvalidTransactionId;
}
else
{
@@ -1748,17 +1696,19 @@ lazy_scan_prune(LVRelState *vacrel,
*/
#ifdef USE_ASSERT_CHECKING
/* Note that all_frozen value does not matter when !all_visible */
- if (all_visible && lpdead_items == 0)
+ if (presult.all_visible)
{
TransactionId debug_cutoff;
bool debug_all_frozen;
+ Assert(lpdead_items == 0);
+
if (!heap_page_is_all_visible(vacrel, buf,
&debug_cutoff, &debug_all_frozen))
Assert(false);
Assert(!TransactionIdIsValid(debug_cutoff) ||
- debug_cutoff == visibility_cutoff_xid);
+ debug_cutoff == presult.visibility_cutoff_xid);
}
#endif
@@ -1783,19 +1733,6 @@ lazy_scan_prune(LVRelState *vacrel,
Assert(dead_items->num_items <= dead_items->max_items);
pgstat_progress_update_param(PROGRESS_VACUUM_NUM_DEAD_TUPLES,
dead_items->num_items);
-
- /*
- * It was convenient to ignore LP_DEAD items in all_visible earlier on
- * to make the choice of whether or not to freeze the page unaffected
- * by the short-term presence of LP_DEAD items. These LP_DEAD items
- * were effectively assumed to be LP_UNUSED items in the making. It
- * doesn't matter which heap pass (initial pass or final pass) ends up
- * setting the page all-frozen, as long as the ongoing VACUUM does it.
- *
- * Now that freezing has been finalized, unset all_visible. It needs
- * to reflect the present state of things, as expected by our caller.
- */
- all_visible = false;
}
/* Finally, add page-local counts to whole-VACUUM counts */
@@ -1812,20 +1749,20 @@ lazy_scan_prune(LVRelState *vacrel,
/* Did we find LP_DEAD items? */
*has_lpdead_items = (lpdead_items > 0);
- Assert(!all_visible || !(*has_lpdead_items));
+ Assert(!presult.all_visible || !(*has_lpdead_items));
/*
* Handle setting visibility map bit based on information from the VM (as
* of last heap_vac_scan_next_block() call), and from all_visible and
* all_frozen variables
*/
- if (!all_visible_according_to_vm && all_visible)
+ if (!all_visible_according_to_vm && presult.all_visible)
{
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
if (all_frozen)
{
- Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+ Assert(!TransactionIdIsValid(presult.visibility_cutoff_xid));
flags |= VISIBILITYMAP_ALL_FROZEN;
}
@@ -1845,7 +1782,7 @@ lazy_scan_prune(LVRelState *vacrel,
PageSetAllVisible(page);
MarkBufferDirty(buf);
visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
- vmbuffer, visibility_cutoff_xid,
+ vmbuffer, presult.visibility_cutoff_xid,
flags);
}
@@ -1893,7 +1830,7 @@ lazy_scan_prune(LVRelState *vacrel,
* it as all-frozen. Note that all_frozen is only valid if all_visible is
* true, so we must check both all_visible and all_frozen.
*/
- else if (all_visible_according_to_vm && all_visible &&
+ else if (all_visible_according_to_vm && presult.all_visible &&
all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
{
/*
@@ -1914,7 +1851,7 @@ lazy_scan_prune(LVRelState *vacrel,
* since a snapshotConflictHorizon sufficient to make everything safe
* for REDO was logged when the page's tuples were frozen.
*/
- Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+ Assert(!TransactionIdIsValid(presult.visibility_cutoff_xid));
visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
vmbuffer, InvalidTransactionId,
VISIBILITYMAP_ALL_VISIBLE |
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index f1122453738..29daab7aeb8 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -199,6 +199,27 @@ typedef struct PruneResult
int ndeleted; /* Number of tuples deleted from the page */
int nnewlpdead; /* Number of newly LP_DEAD items */
+ /*
+ * The rest of the fields in PruneResult are only guaranteed to be
+ * initialized if heap_page_prune() is passed PruneReason PRUNE_VACUUM_SCAN.
+ */
+
+ /*
+ * Whether or not the page is truly all-visible after pruning. If there
+ * are LP_DEAD items on the page which cannot be removed until vacuum's
+ * second pass, this will be false.
+ */
+ bool all_visible;
+
+ /*
+ * Whether or not the page is all-visible except for tuples which will be
+ * removed during vacuum's second pass. This is used by VACUUM to
+ * determine whether or not to consider opportunistically freezing the
+ * page.
+ */
+ bool all_visible_except_removable;
+ TransactionId visibility_cutoff_xid; /* Newest xmin on the page */
+
/*
* Tuple visibility is only computed once for each tuple, for correctness
* and efficiency reasons; see comment in heap_page_prune() for details.
--
2.39.2
v8-0005-Add-reference-to-VacuumCutoffs-in-HeapPageFreeze.patch
From 28641a13e3a6332f8031e36cdce27929c727a916 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 6 Jan 2024 16:22:17 -0500
Subject: [PATCH v8 05/22] Add reference to VacuumCutoffs in HeapPageFreeze
Future commits will move opportunistic freezing into the main path of
pruning in heap_page_prune(). Because on-access pruning will not do
opportunistic freezing, it is cleaner to keep the visibility information
required for calling heap_prepare_freeze_tuple() inside the
HeapPageFreeze structure itself, by saving a reference to VacuumCutoffs.
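For illustration, the shape of the change is roughly the following
(condensed from the hunks below; not a standalone compilable excerpt):

typedef struct HeapPageFreeze
{
    ...
    TransactionId NoFreezePageRelfrozenXid;
    MultiXactId   NoFreezePageRelminMxid;

    /* cutoffs used when preparing freeze plans */
    struct VacuumCutoffs *cutoffs;
} HeapPageFreeze;

/* lazy_scan_prune() sets the reference once per page ... */
pagefrz.cutoffs = &vacrel->cutoffs;

/* ... so heap_prepare_freeze_tuple() no longer takes cutoffs explicitly */
heap_prepare_freeze_tuple(htup, &pagefrz,
                          &frozen[tuples_frozen], &totally_frozen);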
---
src/backend/access/heap/heapam.c | 16 ++++++++--------
src/backend/access/heap/vacuumlazy.c | 3 ++-
src/include/access/heapam.h | 2 +-
3 files changed, 11 insertions(+), 10 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 2f6527df0dc..bb856690234 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6125,9 +6125,9 @@ heap_inplace_update(Relation relation, HeapTuple tuple)
*/
static TransactionId
FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
- const struct VacuumCutoffs *cutoffs, uint16 *flags,
- HeapPageFreeze *pagefrz)
+ uint16 *flags, HeapPageFreeze *pagefrz)
{
+ const struct VacuumCutoffs *cutoffs = pagefrz->cutoffs;
TransactionId newxmax;
MultiXactMember *members;
int nmembers;
@@ -6475,10 +6475,10 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
*/
bool
heap_prepare_freeze_tuple(HeapTupleHeader tuple,
- const struct VacuumCutoffs *cutoffs,
HeapPageFreeze *pagefrz,
HeapTupleFreeze *frz, bool *totally_frozen)
{
+ const struct VacuumCutoffs *cutoffs = pagefrz->cutoffs;
bool xmin_already_frozen = false,
xmax_already_frozen = false;
bool freeze_xmin = false,
@@ -6550,8 +6550,7 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
* perform no-op xmax processing. The only constraint is that the
* FreezeLimit/MultiXactCutoff postcondition must never be violated.
*/
- newxmax = FreezeMultiXactId(xid, tuple->t_infomask, cutoffs,
- &flags, pagefrz);
+ newxmax = FreezeMultiXactId(xid, tuple->t_infomask, &flags, pagefrz);
if (flags & FRM_NOOP)
{
@@ -6729,7 +6728,7 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
* Does this tuple force caller to freeze the entire page?
*/
pagefrz->freeze_required =
- heap_tuple_should_freeze(tuple, cutoffs,
+ heap_tuple_should_freeze(tuple, pagefrz->cutoffs,
&pagefrz->NoFreezePageRelfrozenXid,
&pagefrz->NoFreezePageRelminMxid);
}
@@ -6890,8 +6889,9 @@ heap_freeze_tuple(HeapTupleHeader tuple,
pagefrz.NoFreezePageRelfrozenXid = FreezeLimit;
pagefrz.NoFreezePageRelminMxid = MultiXactCutoff;
- do_freeze = heap_prepare_freeze_tuple(tuple, &cutoffs,
- &pagefrz, &frz, &totally_frozen);
+ pagefrz.cutoffs = &cutoffs;
+
+ do_freeze = heap_prepare_freeze_tuple(tuple, &pagefrz, &frz, &totally_frozen);
/*
* Note that because this is not a WAL-logged operation, we don't need to
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 17fb0b4f7b7..1b060124a3f 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1442,6 +1442,7 @@ lazy_scan_prune(LVRelState *vacrel,
pagefrz.FreezePageRelminMxid = vacrel->NewRelminMxid;
pagefrz.NoFreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
pagefrz.NoFreezePageRelminMxid = vacrel->NewRelminMxid;
+ pagefrz.cutoffs = &vacrel->cutoffs;
tuples_frozen = 0;
lpdead_items = 0;
live_tuples = 0;
@@ -1587,7 +1588,7 @@ lazy_scan_prune(LVRelState *vacrel,
hastup = true; /* page makes rel truncation unsafe */
/* Tuple with storage -- consider need to freeze */
- if (heap_prepare_freeze_tuple(htup, &vacrel->cutoffs, &pagefrz,
+ if (heap_prepare_freeze_tuple(htup, &pagefrz,
&frozen[tuples_frozen], &totally_frozen))
{
/* Save prepared freeze plan for later */
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 29daab7aeb8..689427e2512 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -189,6 +189,7 @@ typedef struct HeapPageFreeze
TransactionId NoFreezePageRelfrozenXid;
MultiXactId NoFreezePageRelminMxid;
+ struct VacuumCutoffs *cutoffs;
} HeapPageFreeze;
/*
@@ -324,7 +325,6 @@ extern TM_Result heap_lock_tuple(Relation relation, ItemPointer tid,
extern void heap_inplace_update(Relation relation, HeapTuple tuple);
extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
- const struct VacuumCutoffs *cutoffs,
HeapPageFreeze *pagefrz,
HeapTupleFreeze *frz, bool *totally_frozen);
extern void heap_freeze_execute_prepared(Relation rel, Buffer buffer,
--
2.39.2
Attachment: v8-0006-Prepare-freeze-tuples-in-heap_page_prune.patch (text/x-patch)
From c77c74630273cc861b1c9570243882a2e58851d1 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 25 Mar 2024 19:23:11 -0400
Subject: [PATCH v8 06/22] Prepare freeze tuples in heap_page_prune()
In order to combine the freeze and prune records, we must determine
which tuples are freezable before actually executing pruning: all of
the page modifications should be made in the same critical section that
emits the combined WAL record. So, while pruning, determine whether
tuples should or must be frozen, and whether the page will be all-frozen
as a consequence.
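In rough outline, the prune loop now considers freezing each surviving
tuple as it goes (a sketch condensed from the hunks below; surrounding
code elided):

/* inside heap_page_prune()'s per-offset loop, after the HTSV switch */
if (presult->htsv[offnum] != HEAPTUPLE_DEAD && pagefrz)
{
    bool totally_frozen;

    if (heap_prepare_freeze_tuple(htup, pagefrz,
                                  &presult->frozen[presult->nfrozen],
                                  &totally_frozen))
        presult->frozen[presult->nfrozen++].offset = offnum;

    /* any tuple that can't end up fully frozen blocks the all-frozen bit */
    if (!totally_frozen)
        presult->all_frozen = false;
}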
---
src/backend/access/heap/pruneheap.c | 41 +++++++++++++++--
src/backend/access/heap/vacuumlazy.c | 68 ++++++----------------------
src/include/access/heapam.h | 12 +++++
3 files changed, 64 insertions(+), 57 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 5776ae84f4d..457650ab651 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -153,7 +153,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* not the relation has indexes, since we cannot safely determine
* that during on-access pruning with the current implementation.
*/
- heap_page_prune(relation, buffer, vistest, false,
+ heap_page_prune(relation, buffer, vistest, false, NULL,
&presult, PRUNE_ON_ACCESS, NULL);
/*
@@ -201,6 +201,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* mark_unused_now indicates whether or not dead items can be set LP_UNUSED
* during pruning.
*
+ * pagefrz contains both input and output parameters used if the caller is
+ * interested in potentially freezing tuples on the page.
+ *
* presult contains output parameters needed by callers such as the number of
* tuples removed and the number of line pointers newly marked LP_DEAD.
* heap_page_prune() is responsible for initializing it.
@@ -215,6 +218,7 @@ void
heap_page_prune(Relation relation, Buffer buffer,
GlobalVisState *vistest,
bool mark_unused_now,
+ HeapPageFreeze *pagefrz,
PruneResult *presult,
PruneReason reason,
OffsetNumber *off_loc)
@@ -250,11 +254,16 @@ heap_page_prune(Relation relation, Buffer buffer,
*/
presult->ndeleted = 0;
presult->nnewlpdead = 0;
+ presult->nfrozen = 0;
/*
- * Keep track of whether or not the page is all_visible in case the caller
- * wants to use this information to update the VM.
+ * Caller will update the VM after pruning, collecting LP_DEAD items, and
+ * freezing tuples. Keep track of whether or not the page is all_visible
+ * and all_frozen and use this information to update the VM. all_visible
+ * implies lpdead_items == 0, but don't trust all_frozen result unless
+ * all_visible is also set to true.
*/
+ presult->all_frozen = true;
presult->all_visible = true;
/* for recovery conflicts */
presult->visibility_cutoff_xid = InvalidTransactionId;
@@ -388,6 +397,32 @@ heap_page_prune(Relation relation, Buffer buffer,
elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
break;
}
+
+ /*
+ * Consider freezing any normal tuples which will not be removed
+ */
+ if (presult->htsv[offnum] != HEAPTUPLE_DEAD && pagefrz)
+ {
+ bool totally_frozen;
+
+ /* Tuple with storage -- consider need to freeze */
+ if ((heap_prepare_freeze_tuple(htup, pagefrz,
+ &presult->frozen[presult->nfrozen],
+ &totally_frozen)))
+ {
+ /* Save prepared freeze plan for later */
+ presult->frozen[presult->nfrozen++].offset = offnum;
+ }
+
+ /*
+ * If any tuple isn't either totally frozen already or eligible to
+ * become totally frozen (according to its freeze plan), then the
+ * page definitely cannot be set all-frozen in the visibility map
+ * later on
+ */
+ if (!totally_frozen)
+ presult->all_frozen = false;
+ }
}
/*
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 1b060124a3f..2a3cc5c7cd3 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1416,16 +1416,13 @@ lazy_scan_prune(LVRelState *vacrel,
maxoff;
ItemId itemid;
PruneResult presult;
- int tuples_frozen,
- lpdead_items,
+ int lpdead_items,
live_tuples,
recently_dead_tuples;
HeapPageFreeze pagefrz;
bool hastup = false;
- bool all_frozen;
int64 fpi_before = pgWalUsage.wal_fpi;
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
- HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
Assert(BufferGetBlockNumber(buf) == blkno);
@@ -1443,7 +1440,6 @@ lazy_scan_prune(LVRelState *vacrel,
pagefrz.NoFreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
pagefrz.NoFreezePageRelminMxid = vacrel->NewRelminMxid;
pagefrz.cutoffs = &vacrel->cutoffs;
- tuples_frozen = 0;
lpdead_items = 0;
live_tuples = 0;
recently_dead_tuples = 0;
@@ -1460,21 +1456,9 @@ lazy_scan_prune(LVRelState *vacrel,
* items LP_UNUSED, so mark_unused_now should be true if no indexes and
* false otherwise.
*/
- heap_page_prune(rel, buf, vacrel->vistest, vacrel->nindexes == 0,
+ heap_page_prune(rel, buf, vacrel->vistest, vacrel->nindexes == 0, &pagefrz,
&presult, PRUNE_VACUUM_SCAN, &vacrel->offnum);
- /*
- * Now scan the page to collect LP_DEAD items and check for tuples
- * requiring freezing among remaining tuples with storage. We will update
- * the VM after collecting LP_DEAD items and freezing tuples. Pruning will
- * have determined whether or not the page is all_visible. Keep track of
- * whether or not the page is all_frozen and use this information to
- * update the VM. all_visible implies lpdead_items == 0, but don't trust
- * all_frozen result unless all_visible is also set to true.
- *
- */
- all_frozen = true;
-
/*
* Now scan the page to collect LP_DEAD items and update the variables set
* just above.
@@ -1483,9 +1467,6 @@ lazy_scan_prune(LVRelState *vacrel,
offnum <= maxoff;
offnum = OffsetNumberNext(offnum))
{
- HeapTupleHeader htup;
- bool totally_frozen;
-
/*
* Set the offset number so that we can display it along with any
* error that occurred while processing this tuple.
@@ -1521,8 +1502,6 @@ lazy_scan_prune(LVRelState *vacrel,
Assert(ItemIdIsNormal(itemid));
- htup = (HeapTupleHeader) PageGetItem(page, itemid);
-
/*
* The criteria for counting a tuple as live in this block need to
* match what analyze.c's acquire_sample_rows() does, otherwise VACUUM
@@ -1587,29 +1566,8 @@ lazy_scan_prune(LVRelState *vacrel,
hastup = true; /* page makes rel truncation unsafe */
- /* Tuple with storage -- consider need to freeze */
- if (heap_prepare_freeze_tuple(htup, &pagefrz,
- &frozen[tuples_frozen], &totally_frozen))
- {
- /* Save prepared freeze plan for later */
- frozen[tuples_frozen++].offset = offnum;
- }
-
- /*
- * If any tuple isn't either totally frozen already or eligible to
- * become totally frozen (according to its freeze plan), then the page
- * definitely cannot be set all-frozen in the visibility map later on
- */
- if (!totally_frozen)
- all_frozen = false;
}
- /*
- * We have now divided every item on the page into either an LP_DEAD item
- * that will need to be vacuumed in indexes later, or a LP_NORMAL tuple
- * that remains and needs to be considered for freezing now (LP_UNUSED and
- * LP_REDIRECT items also remain, but are of no further interest to us).
- */
vacrel->offnum = InvalidOffsetNumber;
/*
@@ -1618,8 +1576,8 @@ lazy_scan_prune(LVRelState *vacrel,
* freeze when pruning generated an FPI, if doing so means that we set the
* page all-frozen afterwards (might not happen until final heap pass).
*/
- if (pagefrz.freeze_required || tuples_frozen == 0 ||
- (presult.all_visible_except_removable && all_frozen &&
+ if (pagefrz.freeze_required || presult.nfrozen == 0 ||
+ (presult.all_visible_except_removable && presult.all_frozen &&
fpi_before != pgWalUsage.wal_fpi))
{
/*
@@ -1629,7 +1587,7 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->NewRelfrozenXid = pagefrz.FreezePageRelfrozenXid;
vacrel->NewRelminMxid = pagefrz.FreezePageRelminMxid;
- if (tuples_frozen == 0)
+ if (presult.nfrozen == 0)
{
/*
* We have no freeze plans to execute, so there's no added cost
@@ -1657,7 +1615,7 @@ lazy_scan_prune(LVRelState *vacrel,
* once we're done with it. Otherwise we generate a conservative
* cutoff by stepping back from OldestXmin.
*/
- if (presult.all_visible_except_removable && all_frozen)
+ if (presult.all_visible_except_removable && presult.all_frozen)
{
/* Using same cutoff when setting VM is now unnecessary */
snapshotConflictHorizon = presult.visibility_cutoff_xid;
@@ -1673,7 +1631,7 @@ lazy_scan_prune(LVRelState *vacrel,
/* Execute all freeze plans for page as a single atomic action */
heap_freeze_execute_prepared(vacrel->rel, buf,
snapshotConflictHorizon,
- frozen, tuples_frozen);
+ presult.frozen, presult.nfrozen);
}
}
else
@@ -1684,8 +1642,8 @@ lazy_scan_prune(LVRelState *vacrel,
*/
vacrel->NewRelfrozenXid = pagefrz.NoFreezePageRelfrozenXid;
vacrel->NewRelminMxid = pagefrz.NoFreezePageRelminMxid;
- all_frozen = false;
- tuples_frozen = 0; /* avoid miscounts in instrumentation */
+ presult.all_frozen = false;
+ presult.nfrozen = 0; /* avoid miscounts in instrumentation */
}
/*
@@ -1708,6 +1666,8 @@ lazy_scan_prune(LVRelState *vacrel,
&debug_cutoff, &debug_all_frozen))
Assert(false);
+ Assert(presult.all_frozen == debug_all_frozen);
+
Assert(!TransactionIdIsValid(debug_cutoff) ||
debug_cutoff == presult.visibility_cutoff_xid);
}
@@ -1738,7 +1698,7 @@ lazy_scan_prune(LVRelState *vacrel,
/* Finally, add page-local counts to whole-VACUUM counts */
vacrel->tuples_deleted += presult.ndeleted;
- vacrel->tuples_frozen += tuples_frozen;
+ vacrel->tuples_frozen += presult.nfrozen;
vacrel->lpdead_items += lpdead_items;
vacrel->live_tuples += live_tuples;
vacrel->recently_dead_tuples += recently_dead_tuples;
@@ -1761,7 +1721,7 @@ lazy_scan_prune(LVRelState *vacrel,
{
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
- if (all_frozen)
+ if (presult.all_frozen)
{
Assert(!TransactionIdIsValid(presult.visibility_cutoff_xid));
flags |= VISIBILITYMAP_ALL_FROZEN;
@@ -1832,7 +1792,7 @@ lazy_scan_prune(LVRelState *vacrel,
* true, so we must check both all_visible and all_frozen.
*/
else if (all_visible_according_to_vm && presult.all_visible &&
- all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
+ presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
{
/*
* Avoid relying on all_visible_according_to_vm as a proxy for the
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 689427e2512..9d047621ea5 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -219,6 +219,9 @@ typedef struct PruneResult
* page.
*/
bool all_visible_except_removable;
+
+ /* Whether or not the page can be set all-frozen in the VM */
+ bool all_frozen;
TransactionId visibility_cutoff_xid; /* Newest xmin on the page */
/*
@@ -231,6 +234,14 @@ typedef struct PruneResult
* 1. Otherwise every access would need to subtract 1.
*/
int8 htsv[MaxHeapTuplesPerPage + 1];
+
+ /* Number of tuples we may freeze */
+ int nfrozen;
+
+ /*
+ * One entry for every tuple that we may freeze.
+ */
+ HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
} PruneResult;
/* 'reason' codes for heap_page_prune() */
@@ -353,6 +364,7 @@ extern void heap_page_prune_opt(Relation relation, Buffer buffer);
extern void heap_page_prune(Relation relation, Buffer buffer,
struct GlobalVisState *vistest,
bool mark_unused_now,
+ HeapPageFreeze *pagefrz,
PruneResult *presult,
PruneReason reason,
OffsetNumber *off_loc);
--
2.39.2
Attachment: v8-0007-lazy_scan_prune-reorder-freeze-execution-logic.patch (text/x-patch)
From de4d688cc4e88604d15dde45ffff2b9d3d870958 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 25 Mar 2024 19:39:25 -0400
Subject: [PATCH v8 07/22] lazy_scan_prune reorder freeze execution logic
To combine the prune and freeze records, freezing must be done before
the pruning WAL record is emitted. We will move the freeze execution
into heap_page_prune() in future commits. lazy_scan_prune() currently
executes freezing, updates vacrel->NewRelfrozenXid and
vacrel->NewRelminMxid, and resets the snapshotConflictHorizon that the
visibility map update record may use, all in the same block of if
statements. This commit starts reordering that logic so that the freeze
execution can be separated from the other updates, which should not be
done during pruning.
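The reordering boils down to computing the freeze decision up front and
then branching on it (a sketch condensed from the hunks below):

do_freeze = pagefrz.freeze_required ||
    (presult.all_visible_except_removable && presult.all_frozen &&
     presult.nfrozen > 0 &&
     fpi_before != pgWalUsage.wal_fpi);

if (do_freeze)
{
    /* pick conflict horizon, bump frozen_pages, execute freeze plans */
}
else if (presult.all_frozen && presult.nfrozen == 0)
{
    /* page already all-frozen after pruning; no plans to execute */
}
else
{
    /* "no freeze" path: page can never be set all-frozen this cycle */
}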
---
src/backend/access/heap/vacuumlazy.c | 93 +++++++++++++++-------------
1 file changed, 50 insertions(+), 43 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 2a3cc5c7cd3..f474e661428 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1421,6 +1421,7 @@ lazy_scan_prune(LVRelState *vacrel,
recently_dead_tuples;
HeapPageFreeze pagefrz;
bool hastup = false;
+ bool do_freeze;
int64 fpi_before = pgWalUsage.wal_fpi;
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
@@ -1576,10 +1577,15 @@ lazy_scan_prune(LVRelState *vacrel,
* freeze when pruning generated an FPI, if doing so means that we set the
* page all-frozen afterwards (might not happen until final heap pass).
*/
- if (pagefrz.freeze_required || presult.nfrozen == 0 ||
+ do_freeze = pagefrz.freeze_required ||
(presult.all_visible_except_removable && presult.all_frozen &&
- fpi_before != pgWalUsage.wal_fpi))
+ presult.nfrozen > 0 &&
+ fpi_before != pgWalUsage.wal_fpi);
+
+ if (do_freeze)
{
+ TransactionId snapshotConflictHorizon;
+
/*
* We're freezing the page. Our final NewRelfrozenXid doesn't need to
* be affected by the XIDs that are just about to be frozen anyway.
@@ -1587,52 +1593,53 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->NewRelfrozenXid = pagefrz.FreezePageRelfrozenXid;
vacrel->NewRelminMxid = pagefrz.FreezePageRelminMxid;
- if (presult.nfrozen == 0)
- {
- /*
- * We have no freeze plans to execute, so there's no added cost
- * from following the freeze path. That's why it was chosen. This
- * is important in the case where the page only contains totally
- * frozen tuples at this point (perhaps only following pruning).
- * Such pages can be marked all-frozen in the VM by our caller,
- * even though none of its tuples were newly frozen here (note
- * that the "no freeze" path never sets pages all-frozen).
- *
- * We never increment the frozen_pages instrumentation counter
- * here, since it only counts pages with newly frozen tuples
- * (don't confuse that with pages newly set all-frozen in VM).
- */
- }
+ vacrel->frozen_pages++;
+
+ /*
+ * We can use frz_conflict_horizon as our cutoff for conflicts when
+ * the whole page is eligible to become all-frozen in the VM once
+ * we're done with it. Otherwise we generate a conservative cutoff by
+ * stepping back from OldestXmin.
+ */
+ if (presult.all_visible_except_removable && presult.all_frozen)
+ snapshotConflictHorizon = presult.visibility_cutoff_xid;
else
{
- TransactionId snapshotConflictHorizon;
+ /* Avoids false conflicts when hot_standby_feedback in use */
+ snapshotConflictHorizon = pagefrz.cutoffs->OldestXmin;
+ TransactionIdRetreat(snapshotConflictHorizon);
+ }
- vacrel->frozen_pages++;
+ /* Using same cutoff when setting VM is now unnecessary */
+ if (presult.all_visible_except_removable && presult.all_frozen)
+ presult.visibility_cutoff_xid = InvalidTransactionId;
- /*
- * We can use visibility_cutoff_xid as our cutoff for conflicts
- * when the whole page is eligible to become all-frozen in the VM
- * once we're done with it. Otherwise we generate a conservative
- * cutoff by stepping back from OldestXmin.
- */
- if (presult.all_visible_except_removable && presult.all_frozen)
- {
- /* Using same cutoff when setting VM is now unnecessary */
- snapshotConflictHorizon = presult.visibility_cutoff_xid;
- presult.visibility_cutoff_xid = InvalidTransactionId;
- }
- else
- {
- /* Avoids false conflicts when hot_standby_feedback in use */
- snapshotConflictHorizon = vacrel->cutoffs.OldestXmin;
- TransactionIdRetreat(snapshotConflictHorizon);
- }
+ /* Execute all freeze plans for page as a single atomic action */
+ heap_freeze_execute_prepared(vacrel->rel, buf,
+ snapshotConflictHorizon,
+ presult.frozen, presult.nfrozen);
- /* Execute all freeze plans for page as a single atomic action */
- heap_freeze_execute_prepared(vacrel->rel, buf,
- snapshotConflictHorizon,
- presult.frozen, presult.nfrozen);
- }
+ }
+ else if (presult.all_frozen && presult.nfrozen == 0)
+ {
+ /* Page should be all visible except to-be-removed tuples */
+ Assert(presult.all_visible_except_removable);
+
+ /*
+ * We have no freeze plans to execute, so there's no added cost from
+ * following the freeze path. That's why it was chosen. This is
+ * important in the case where the page only contains totally frozen
+ * tuples at this point (perhaps only following pruning). Such pages
+ * can be marked all-frozen in the VM by our caller, even though none
+ * of its tuples were newly frozen here (note that the "no freeze"
+ * path never sets pages all-frozen).
+ *
+ * We never increment the frozen_pages instrumentation counter here,
+ * since it only counts pages with newly frozen tuples (don't confuse
+ * that with pages newly set all-frozen in VM).
+ */
+ vacrel->NewRelfrozenXid = pagefrz.FreezePageRelfrozenXid;
+ vacrel->NewRelminMxid = pagefrz.FreezePageRelminMxid;
}
else
{
--
2.39.2
Attachment: v8-0008-Execute-freezing-in-heap_page_prune.patch (text/x-patch)
From 715f95ad939e35760647fb02a2c62a808cd13566 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 25 Mar 2024 20:32:11 -0400
Subject: [PATCH v8 08/22] Execute freezing in heap_page_prune()
As a step toward combining the prune and freeze WAL records, execute
freezing in heap_page_prune(). The logic to determine whether or not to
execute freeze plans was moved from lazy_scan_prune() over to
heap_page_prune() with little modification.
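After this patch, lazy_scan_prune() consumes the freeze outcome from the
result struct instead of computing it itself (condensed from the hunks
below):

heap_page_prune_and_freeze(rel, buf, vacrel->vistest, vacrel->nindexes == 0,
                           &pagefrz, &presult, PRUNE_VACUUM_SCAN,
                           &vacrel->offnum);

Assert(MultiXactIdIsValid(presult.new_relminmxid));
Assert(TransactionIdIsValid(presult.new_relfrozenxid));
vacrel->NewRelfrozenXid = presult.new_relfrozenxid;
vacrel->NewRelminMxid = presult.new_relminmxid;

if (presult.nfrozen > 0)
    vacrel->frozen_pages++;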
---
src/backend/access/heap/heapam_handler.c | 2 +-
src/backend/access/heap/pruneheap.c | 189 ++++++++++++++++++-----
src/backend/access/heap/vacuumlazy.c | 150 +++++-------------
src/backend/storage/ipc/procarray.c | 6 +-
src/include/access/heapam.h | 52 ++++---
src/tools/pgindent/typedefs.list | 2 +-
6 files changed, 224 insertions(+), 177 deletions(-)
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 6abfe36dec7..a793c0f56ee 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -1106,7 +1106,7 @@ heapam_scan_analyze_next_tuple(TableScanDesc scan, TransactionId OldestXmin,
* We ignore unused and redirect line pointers. DEAD line pointers
* should be counted as dead, because we need vacuum to run to get rid
* of them. Note that this rule agrees with the way that
- * heap_page_prune() counts things.
+ * heap_page_prune_and_freeze() counts things.
*/
if (!ItemIdIsNormal(itemid))
{
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 457650ab651..e009c7579dd 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -17,16 +17,19 @@
#include "access/heapam.h"
#include "access/heapam_xlog.h"
#include "access/htup_details.h"
+#include "access/multixact.h"
#include "access/transam.h"
#include "access/xlog.h"
+#include "commands/vacuum.h"
#include "access/xloginsert.h"
+#include "executor/instrument.h"
#include "miscadmin.h"
#include "pgstat.h"
#include "storage/bufmgr.h"
#include "utils/rel.h"
#include "utils/snapmgr.h"
-/* Working data for heap_page_prune and subroutines */
+/* Working data for heap_page_prune_and_freeze() and subroutines */
typedef struct
{
/* tuple visibility test, initialized for the relation */
@@ -51,6 +54,11 @@ typedef struct
* 1. Otherwise every access would need to subtract 1.
*/
bool marked[MaxHeapTuplesPerPage + 1];
+
+ /*
+ * One entry for every tuple that we may freeze.
+ */
+ HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
} PruneState;
/* Local functions */
@@ -59,14 +67,15 @@ static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
Buffer buffer);
static int heap_prune_chain(Buffer buffer,
OffsetNumber rootoffnum,
- PruneState *prstate, PruneResult *presult);
+ PruneState *prstate, PruneFreezeResult *presult);
+
static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
static void heap_prune_record_redirect(PruneState *prstate,
OffsetNumber offnum, OffsetNumber rdoffnum);
static void heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
- PruneResult *presult);
+ PruneFreezeResult *presult);
static void heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
- PruneResult *presult);
+ PruneFreezeResult *presult);
static void heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum);
static void page_verify_redirects(Page page);
@@ -146,15 +155,15 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
*/
if (PageIsFull(page) || PageGetHeapFreeSpace(page) < minfree)
{
- PruneResult presult;
+ PruneFreezeResult presult;
/*
* For now, pass mark_unused_now as false regardless of whether or
* not the relation has indexes, since we cannot safely determine
* that during on-access pruning with the current implementation.
*/
- heap_page_prune(relation, buffer, vistest, false, NULL,
- &presult, PRUNE_ON_ACCESS, NULL);
+ heap_page_prune_and_freeze(relation, buffer, vistest, false, NULL,
+ &presult, PRUNE_ON_ACCESS, NULL);
/*
* Report the number of tuples reclaimed to pgstats. This is
@@ -188,7 +197,12 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
/*
- * Prune and repair fragmentation in the specified page.
+ * Prune and repair fragmentation and potentially freeze tuples on the
+ * specified page.
+ *
+ * If the page can be marked all-frozen in the visibility map, we may
+ * opportunistically freeze tuples on the page if either its tuples are old
+ * enough or freezing will be cheap enough.
*
* Caller must have pin and buffer cleanup lock on the page. Note that we
* don't update the FSM information for page on caller's behalf. Caller might
@@ -201,12 +215,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* mark_unused_now indicates whether or not dead items can be set LP_UNUSED
* during pruning.
*
- * pagefrz contains both input and output parameters used if the caller is
- * interested in potentially freezing tuples on the page.
+ * pagefrz is an input parameter containing visibility cutoff information and
+ * the current relfrozenxid and relminmxids used if the caller is interested in
+ * freezing tuples on the page.
*
* presult contains output parameters needed by callers such as the number of
* tuples removed and the number of line pointers newly marked LP_DEAD.
- * heap_page_prune() is responsible for initializing it.
+ * heap_page_prune_and_freeze() is responsible for initializing it.
*
* reason indicates why the pruning is performed. It is included in the WAL
* record for debugging and analysis purposes, but otherwise has no effect.
@@ -215,13 +230,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* callback.
*/
void
-heap_page_prune(Relation relation, Buffer buffer,
- GlobalVisState *vistest,
- bool mark_unused_now,
- HeapPageFreeze *pagefrz,
- PruneResult *presult,
- PruneReason reason,
- OffsetNumber *off_loc)
+heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ GlobalVisState *vistest,
+ bool mark_unused_now,
+ HeapPageFreeze *pagefrz,
+ PruneFreezeResult *presult,
+ PruneReason reason,
+ OffsetNumber *off_loc)
{
Page page = BufferGetPage(buffer);
BlockNumber blockno = BufferGetBlockNumber(buffer);
@@ -229,6 +244,10 @@ heap_page_prune(Relation relation, Buffer buffer,
maxoff;
PruneState prstate;
HeapTupleData tup;
+ TransactionId visibility_cutoff_xid;
+ bool do_freeze;
+ bool all_visible_except_removable;
+ int64 fpi_before = pgWalUsage.wal_fpi;
/*
* Our strategy is to scan the page and make lists of items to change,
@@ -264,9 +283,20 @@ heap_page_prune(Relation relation, Buffer buffer,
* all_visible is also set to true.
*/
presult->all_frozen = true;
- presult->all_visible = true;
- /* for recovery conflicts */
- presult->visibility_cutoff_xid = InvalidTransactionId;
+
+ /*
+ * The visibility cutoff xid is the newest xmin of live tuples on the
+ * page. In the common case, this will be set as the conflict horizon the
+ * caller can use for updating the VM. If, at the end of freezing and
+ * pruning, the page is all-frozen, there is no possibility that any
+ * running transaction on the standby does not see tuples on the page as
+ * all-visible, so the conflict horizon remains InvalidTransactionId.
+ */
+ presult->vm_conflict_horizon = visibility_cutoff_xid = InvalidTransactionId;
+
+ /* For advancing relfrozenxid and relminmxid */
+ presult->new_relfrozenxid = InvalidTransactionId;
+ presult->new_relminmxid = InvalidMultiXactId;
maxoff = PageGetMaxOffsetNumber(page);
tup.t_tableOid = RelationGetRelid(relation);
@@ -291,6 +321,7 @@ heap_page_prune(Relation relation, Buffer buffer,
* prefetching efficiency significantly / decreases the number of cache
* misses.
*/
+ all_visible_except_removable = true;
for (offnum = maxoff;
offnum >= FirstOffsetNumber;
offnum = OffsetNumberPrev(offnum))
@@ -351,13 +382,13 @@ heap_page_prune(Relation relation, Buffer buffer,
* asynchronously. See SetHintBits for more info. Check that
* the tuple is hinted xmin-committed because of that.
*/
- if (presult->all_visible)
+ if (all_visible_except_removable)
{
TransactionId xmin;
if (!HeapTupleHeaderXminCommitted(htup))
{
- presult->all_visible = false;
+ all_visible_except_removable = false;
break;
}
@@ -373,25 +404,25 @@ heap_page_prune(Relation relation, Buffer buffer,
if (xmin != FrozenTransactionId &&
!GlobalVisTestIsRemovableXid(vistest, xmin))
{
- presult->all_visible = false;
+ all_visible_except_removable = false;
break;
}
/* Track newest xmin on page. */
- if (TransactionIdFollows(xmin, presult->visibility_cutoff_xid) &&
+ if (TransactionIdFollows(xmin, visibility_cutoff_xid) &&
TransactionIdIsNormal(xmin))
- presult->visibility_cutoff_xid = xmin;
+ visibility_cutoff_xid = xmin;
}
break;
case HEAPTUPLE_RECENTLY_DEAD:
- presult->all_visible = false;
+ all_visible_except_removable = false;
break;
case HEAPTUPLE_INSERT_IN_PROGRESS:
- presult->all_visible = false;
+ all_visible_except_removable = false;
break;
case HEAPTUPLE_DELETE_IN_PROGRESS:
/* This is an expected case during concurrent vacuum */
- presult->all_visible = false;
+ all_visible_except_removable = false;
break;
default:
elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
@@ -407,11 +438,11 @@ heap_page_prune(Relation relation, Buffer buffer,
/* Tuple with storage -- consider need to freeze */
if ((heap_prepare_freeze_tuple(htup, pagefrz,
- &presult->frozen[presult->nfrozen],
+ &prstate.frozen[presult->nfrozen],
&totally_frozen)))
{
/* Save prepared freeze plan for later */
- presult->frozen[presult->nfrozen++].offset = offnum;
+ prstate.frozen[presult->nfrozen++].offset = offnum;
}
/*
@@ -438,7 +469,7 @@ heap_page_prune(Relation relation, Buffer buffer,
* pruning and keep all_visible_except_removable to permit freezing if the
* whole page will eventually become all visible after removing tuples.
*/
- presult->all_visible_except_removable = presult->all_visible;
+ presult->all_visible = all_visible_except_removable;
/* Scan the page */
for (offnum = FirstOffsetNumber;
@@ -537,6 +568,86 @@ heap_page_prune(Relation relation, Buffer buffer,
/* Record number of newly-set-LP_DEAD items for caller */
presult->nnewlpdead = prstate.ndead;
+
+ /*
+ * Freeze the page when heap_prepare_freeze_tuple indicates that at least
+ * one XID/MXID from before FreezeLimit/MultiXactCutoff is present. Also
+ * freeze when pruning generated an FPI, if doing so means that we set the
+ * page all-frozen afterwards (might not happen until final heap pass).
+ */
+ if (pagefrz)
+ do_freeze = pagefrz->freeze_required ||
+ (all_visible_except_removable && presult->all_frozen &&
+ presult->nfrozen > 0 &&
+ fpi_before != pgWalUsage.wal_fpi);
+ else
+ do_freeze = false;
+
+ if (do_freeze)
+ {
+ TransactionId frz_conflict_horizon = InvalidTransactionId;
+
+ /*
+ * We can use the visibility_cutoff_xid as our cutoff for conflicts
+ * when the whole page is eligible to become all-frozen in the VM once
+ * we're done with it. Otherwise we generate a conservative cutoff by
+ * stepping back from OldestXmin. This avoids false conflicts when
+ * hot_standby_feedback is in use.
+ */
+ if (all_visible_except_removable && presult->all_frozen)
+ frz_conflict_horizon = visibility_cutoff_xid;
+ else
+ {
+ /* Avoids false conflicts when hot_standby_feedback in use */
+ frz_conflict_horizon = pagefrz->cutoffs->OldestXmin;
+ TransactionIdRetreat(frz_conflict_horizon);
+ }
+
+ /* Execute all freeze plans for page as a single atomic action */
+ heap_freeze_execute_prepared(relation, buffer,
+ frz_conflict_horizon,
+ prstate.frozen, presult->nfrozen);
+ }
+ else if (!pagefrz || !presult->all_frozen || presult->nfrozen > 0)
+ {
+ /*
+ * If we will neither freeze tuples on the page nor set the page all
+ * frozen in the visibility map, the page is not all frozen and there
+ * will be no newly frozen tuples.
+ */
+ presult->all_frozen = false;
+ presult->nfrozen = 0; /* avoid miscounts in instrumentation */
+ }
+
+ /*
+ * For callers planning to update the visibility map, the conflict horizon
+ * for that record must be the newest xmin on the page. However, if the
+ * page is completely frozen, there can be no conflict and the
+ * vm_conflict_horizon should remain InvalidTransactionId.
+ */
+ if (!presult->all_frozen)
+ presult->vm_conflict_horizon = visibility_cutoff_xid;
+
+ if (pagefrz)
+ {
+ /*
+ * If we will freeze tuples on the page or, even if we don't freeze
+ * tuples on the page, if we will set the page all-frozen in the
+ * visibility map, we can advance relfrozenxid and relminmxid to the
+ * values in pagefrz->FreezePageRelfrozenXid and
+ * pagefrz->FreezePageRelminMxid.
+ */
+ if (presult->all_frozen || presult->nfrozen > 0)
+ {
+ presult->new_relfrozenxid = pagefrz->FreezePageRelfrozenXid;
+ presult->new_relminmxid = pagefrz->FreezePageRelminMxid;
+ }
+ else
+ {
+ presult->new_relfrozenxid = pagefrz->NoFreezePageRelfrozenXid;
+ presult->new_relminmxid = pagefrz->NoFreezePageRelminMxid;
+ }
+ }
}
@@ -594,7 +705,7 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
*/
static int
heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
- PruneState *prstate, PruneResult *presult)
+ PruneState *prstate, PruneFreezeResult *presult)
{
int ndeleted = 0;
Page dp = (Page) BufferGetPage(buffer);
@@ -859,10 +970,10 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
{
/*
* We found a redirect item that doesn't point to a valid follow-on
- * item. This can happen if the loop in heap_page_prune caused us to
- * visit the dead successor of a redirect item before visiting the
- * redirect item. We can clean up by setting the redirect item to
- * DEAD state or LP_UNUSED if the caller indicated.
+ * item. This can happen if the loop in heap_page_prune_and_freeze()
+ * caused us to visit the dead successor of a redirect item before
+ * visiting the redirect item. We can clean up by setting the
+ * redirect item to DEAD state or LP_UNUSED if the caller indicated.
*/
heap_prune_record_dead_or_unused(prstate, rootoffnum, presult);
}
@@ -902,7 +1013,7 @@ heap_prune_record_redirect(PruneState *prstate,
/* Record line pointer to be marked dead */
static void
heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
- PruneResult *presult)
+ PruneFreezeResult *presult)
{
Assert(prstate->ndead < MaxHeapTuplesPerPage);
prstate->nowdead[prstate->ndead] = offnum;
@@ -925,7 +1036,7 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
*/
static void
heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
- PruneResult *presult)
+ PruneFreezeResult *presult)
{
/*
* If the caller set mark_unused_now to true, we can remove dead tuples
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index f474e661428..8beef4093ae 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -430,12 +430,12 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
* as an upper bound on the XIDs stored in the pages we'll actually scan
* (NewRelfrozenXid tracking must never be allowed to miss unfrozen XIDs).
*
- * Next acquire vistest, a related cutoff that's used in heap_page_prune.
- * We expect vistest will always make heap_page_prune remove any deleted
- * tuple whose xmax is < OldestXmin. lazy_scan_prune must never become
- * confused about whether a tuple should be frozen or removed. (In the
- * future we might want to teach lazy_scan_prune to recompute vistest from
- * time to time, to increase the number of dead tuples it can prune away.)
+ * Next acquire vistest, a related cutoff that's used in
+ * heap_page_prune_and_freeze(). We expect vistest will always make
+ * heap_page_prune_and_freeze() remove any deleted tuple whose xmax is <
+ * OldestXmin. (In the future we might want to teach lazy_scan_prune to
+ * recompute vistest from time to time, to increase the number of dead
+ * tuples it can prune away.)
*/
vacrel->aggressive = vacuum_get_cutoffs(rel, params, &vacrel->cutoffs);
vacrel->rel_pages = orig_rel_pages = RelationGetNumberOfBlocks(rel);
@@ -1378,21 +1378,21 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
*
* Caller must hold pin and buffer cleanup lock on the buffer.
*
- * Prior to PostgreSQL 14 there were very rare cases where heap_page_prune()
- * was allowed to disagree with our HeapTupleSatisfiesVacuum() call about
- * whether or not a tuple should be considered DEAD. This happened when an
- * inserting transaction concurrently aborted (after our heap_page_prune()
- * call, before our HeapTupleSatisfiesVacuum() call). There was rather a lot
- * of complexity just so we could deal with tuples that were DEAD to VACUUM,
- * but nevertheless were left with storage after pruning.
+ * Prior to PostgreSQL 14 there were very rare cases where
+ * heap_page_prune_and_freeze() was allowed to disagree with our
+ * HeapTupleSatisfiesVacuum() call about whether or not a tuple should be
+ * considered DEAD. This happened when an inserting transaction concurrently
+ * aborted (after our heap_page_prune_and_freeze() call, before our
+ * HeapTupleSatisfiesVacuum() call). There was rather a lot of complexity just
+ * so we could deal with tuples that were DEAD to VACUUM, but nevertheless were
+ * left with storage after pruning.
*
* As of Postgres 17, we circumvent this problem altogether by reusing the
- * result of heap_page_prune()'s visibility check. Without the second call to
- * HeapTupleSatisfiesVacuum(), there is no new HTSV_Result and there can be no
- * disagreement. We'll just handle such tuples as if they had become fully dead
- * right after this operation completes instead of in the middle of it. Note that
- * any tuple that becomes dead after the call to heap_page_prune() can't need to
- * be frozen, because it was visible to another session when vacuum started.
+ * result of heap_page_prune_and_freeze()'s visibility check. Without the
+ * second call to HeapTupleSatisfiesVacuum(), there is no new HTSV_Result and
+ * there can be no disagreement. We'll just handle such tuples as if they had
+ * become fully dead right after this operation completes instead of in the
+ * middle of it.
*
* vmbuffer is the buffer containing the VM block with visibility information
* for the heap block, blkno. all_visible_according_to_vm is the saved
@@ -1415,26 +1415,24 @@ lazy_scan_prune(LVRelState *vacrel,
OffsetNumber offnum,
maxoff;
ItemId itemid;
- PruneResult presult;
+ PruneFreezeResult presult;
int lpdead_items,
live_tuples,
recently_dead_tuples;
HeapPageFreeze pagefrz;
bool hastup = false;
- bool do_freeze;
- int64 fpi_before = pgWalUsage.wal_fpi;
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
Assert(BufferGetBlockNumber(buf) == blkno);
/*
* maxoff might be reduced following line pointer array truncation in
- * heap_page_prune. That's safe for us to ignore, since the reclaimed
- * space will continue to look like LP_UNUSED items below.
+ * heap_page_prune_and_freeze(). That's safe for us to ignore, since the
+ * reclaimed space will continue to look like LP_UNUSED items below.
*/
maxoff = PageGetMaxOffsetNumber(page);
- /* Initialize (or reset) page-level state */
+ /* Initialize pagefrz */
pagefrz.freeze_required = false;
pagefrz.FreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
pagefrz.FreezePageRelminMxid = vacrel->NewRelminMxid;
@@ -1446,7 +1444,7 @@ lazy_scan_prune(LVRelState *vacrel,
recently_dead_tuples = 0;
/*
- * Prune all HOT-update chains in this page.
+ * Prune all HOT-update chains and potentially freeze tuples on this page.
*
* We count the number of tuples removed from the page by the pruning step
* in presult.ndeleted. It should not be confused with lpdead_items;
@@ -1457,8 +1455,8 @@ lazy_scan_prune(LVRelState *vacrel,
* items LP_UNUSED, so mark_unused_now should be true if no indexes and
* false otherwise.
*/
- heap_page_prune(rel, buf, vacrel->vistest, vacrel->nindexes == 0, &pagefrz,
- &presult, PRUNE_VACUUM_SCAN, &vacrel->offnum);
+ heap_page_prune_and_freeze(rel, buf, vacrel->vistest, vacrel->nindexes == 0,
+ &pagefrz, &presult, PRUNE_VACUUM_SCAN, &vacrel->offnum);
/*
* Now scan the page to collect LP_DEAD items and update the variables set
@@ -1571,86 +1569,20 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->offnum = InvalidOffsetNumber;
- /*
- * Freeze the page when heap_prepare_freeze_tuple indicates that at least
- * one XID/MXID from before FreezeLimit/MultiXactCutoff is present. Also
- * freeze when pruning generated an FPI, if doing so means that we set the
- * page all-frozen afterwards (might not happen until final heap pass).
- */
- do_freeze = pagefrz.freeze_required ||
- (presult.all_visible_except_removable && presult.all_frozen &&
- presult.nfrozen > 0 &&
- fpi_before != pgWalUsage.wal_fpi);
+ Assert(MultiXactIdIsValid(presult.new_relminmxid));
+ vacrel->NewRelfrozenXid = presult.new_relfrozenxid;
+ Assert(TransactionIdIsValid(presult.new_relfrozenxid));
+ vacrel->NewRelminMxid = presult.new_relminmxid;
- if (do_freeze)
+ if (presult.nfrozen > 0)
{
- TransactionId snapshotConflictHorizon;
-
/*
- * We're freezing the page. Our final NewRelfrozenXid doesn't need to
- * be affected by the XIDs that are just about to be frozen anyway.
+ * We never increment the frozen_pages instrumentation counter when
+ * nfrozen == 0, since it only counts pages with newly frozen tuples
+ * (don't confuse that with pages newly set all-frozen in VM).
*/
- vacrel->NewRelfrozenXid = pagefrz.FreezePageRelfrozenXid;
- vacrel->NewRelminMxid = pagefrz.FreezePageRelminMxid;
-
vacrel->frozen_pages++;
- /*
- * We can use frz_conflict_horizon as our cutoff for conflicts when
- * the whole page is eligible to become all-frozen in the VM once
- * we're done with it. Otherwise we generate a conservative cutoff by
- * stepping back from OldestXmin.
- */
- if (presult.all_visible_except_removable && presult.all_frozen)
- snapshotConflictHorizon = presult.visibility_cutoff_xid;
- else
- {
- /* Avoids false conflicts when hot_standby_feedback in use */
- snapshotConflictHorizon = pagefrz.cutoffs->OldestXmin;
- TransactionIdRetreat(snapshotConflictHorizon);
- }
-
- /* Using same cutoff when setting VM is now unnecessary */
- if (presult.all_visible_except_removable && presult.all_frozen)
- presult.visibility_cutoff_xid = InvalidTransactionId;
-
- /* Execute all freeze plans for page as a single atomic action */
- heap_freeze_execute_prepared(vacrel->rel, buf,
- snapshotConflictHorizon,
- presult.frozen, presult.nfrozen);
-
- }
- else if (presult.all_frozen && presult.nfrozen == 0)
- {
- /* Page should be all visible except to-be-removed tuples */
- Assert(presult.all_visible_except_removable);
-
- /*
- * We have no freeze plans to execute, so there's no added cost from
- * following the freeze path. That's why it was chosen. This is
- * important in the case where the page only contains totally frozen
- * tuples at this point (perhaps only following pruning). Such pages
- * can be marked all-frozen in the VM by our caller, even though none
- * of its tuples were newly frozen here (note that the "no freeze"
- * path never sets pages all-frozen).
- *
- * We never increment the frozen_pages instrumentation counter here,
- * since it only counts pages with newly frozen tuples (don't confuse
- * that with pages newly set all-frozen in VM).
- */
- vacrel->NewRelfrozenXid = pagefrz.FreezePageRelfrozenXid;
- vacrel->NewRelminMxid = pagefrz.FreezePageRelminMxid;
- }
- else
- {
- /*
- * Page requires "no freeze" processing. It might be set all-visible
- * in the visibility map, but it can never be set all-frozen.
- */
- vacrel->NewRelfrozenXid = pagefrz.NoFreezePageRelfrozenXid;
- vacrel->NewRelminMxid = pagefrz.NoFreezePageRelminMxid;
- presult.all_frozen = false;
- presult.nfrozen = 0; /* avoid miscounts in instrumentation */
}
/*
@@ -1676,7 +1608,7 @@ lazy_scan_prune(LVRelState *vacrel,
Assert(presult.all_frozen == debug_all_frozen);
Assert(!TransactionIdIsValid(debug_cutoff) ||
- debug_cutoff == presult.visibility_cutoff_xid);
+ debug_cutoff == presult.vm_conflict_horizon);
}
#endif
@@ -1730,7 +1662,7 @@ lazy_scan_prune(LVRelState *vacrel,
if (presult.all_frozen)
{
- Assert(!TransactionIdIsValid(presult.visibility_cutoff_xid));
+ Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
flags |= VISIBILITYMAP_ALL_FROZEN;
}
@@ -1750,7 +1682,7 @@ lazy_scan_prune(LVRelState *vacrel,
PageSetAllVisible(page);
MarkBufferDirty(buf);
visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
- vmbuffer, presult.visibility_cutoff_xid,
+ vmbuffer, presult.vm_conflict_horizon,
flags);
}
@@ -1815,11 +1747,11 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* Set the page all-frozen (and all-visible) in the VM.
*
- * We can pass InvalidTransactionId as our visibility_cutoff_xid,
- * since a snapshotConflictHorizon sufficient to make everything safe
- * for REDO was logged when the page's tuples were frozen.
+ * We can pass InvalidTransactionId as our vm_conflict_horizon, since
+ * a snapshotConflictHorizon sufficient to make everything safe for
+ * REDO was logged when the page's tuples were frozen.
*/
- Assert(!TransactionIdIsValid(presult.visibility_cutoff_xid));
+ Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
vmbuffer, InvalidTransactionId,
VISIBILITYMAP_ALL_VISIBLE |
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index b3cd248fb64..88a6d504dff 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -1715,9 +1715,9 @@ TransactionIdIsActive(TransactionId xid)
* Note: the approximate horizons (see definition of GlobalVisState) are
* updated by the computations done here. That's currently required for
* correctness and a small optimization. Without doing so it's possible that
- * heap vacuum's call to heap_page_prune() uses a more conservative horizon
- * than later when deciding which tuples can be removed - which the code
- * doesn't expect (breaking HOT).
+ * heap vacuum's call to heap_page_prune_and_freeze() uses a more conservative
+ * horizon than later when deciding which tuples can be removed - which the
+ * code doesn't expect (breaking HOT).
*/
static void
ComputeXidHorizons(ComputeXidHorizonsResult *h)
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 9d047621ea5..de11c166575 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -195,13 +195,13 @@ typedef struct HeapPageFreeze
/*
* Per-page state returned from pruning
*/
-typedef struct PruneResult
+typedef struct PruneFreezeResult
{
int ndeleted; /* Number of tuples deleted from the page */
int nnewlpdead; /* Number of newly LP_DEAD items */
/*
- * The rest of the fields in PruneResult are only guaranteed to be
+ * The rest of the fields in PruneFreezeResult are only guaranteed to be
* initialized if heap_page_prune is passed PruneReason VACUUM_SCAN.
*/
@@ -212,23 +212,22 @@ typedef struct PruneResult
*/
bool all_visible;
- /*
- * Whether or not the page is all-visible except for tuples which will be
- * removed during vacuum's second pass. This is used by VACUUM to
- * determine whether or not to consider opportunistically freezing the
- * page.
- */
- bool all_visible_except_removable;
-
/* Whether or not the page can be set all-frozen in the VM */
bool all_frozen;
- TransactionId visibility_cutoff_xid; /* Newest xmin on the page */
+
+ /*
+ * If the page is all-visible and not all-frozen this is the oldest xid
+ * that can see the page as all-visible. It is to be used as the snapshot
+ * conflict horizon when emitting a XLOG_HEAP2_VISIBLE record.
+ */
+ TransactionId vm_conflict_horizon;
/*
* Tuple visibility is only computed once for each tuple, for correctness
- * and efficiency reasons; see comment in heap_page_prune() for details.
- * This is of type int8[], instead of HTSV_Result[], so we can use -1 to
- * indicate no visibility has been computed, e.g. for LP_DEAD items.
+ * and efficiency reasons; see comment in heap_page_prune_and_freeze() for
+ * details. This is of type int8[], instead of HTSV_Result[], so we can
+ * use -1 to indicate no visibility has been computed, e.g. for LP_DEAD
+ * items.
*
* This needs to be MaxHeapTuplesPerPage + 1 long as FirstOffsetNumber is
* 1. Otherwise every access would need to subtract 1.
@@ -242,9 +241,14 @@ typedef struct PruneResult
* One entry for every tuple that we may freeze.
*/
HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
-} PruneResult;
+ /* New value of relfrozenxid found by heap_page_prune_and_freeze() */
+ TransactionId new_relfrozenxid;
+
+ /* New value of relminmxid found by heap_page_prune_and_freeze() */
+ MultiXactId new_relminmxid;
+} PruneFreezeResult;
-/* 'reason' codes for heap_page_prune() */
+/* 'reason' codes for heap_page_prune_and_freeze() */
typedef enum
{
PRUNE_ON_ACCESS, /* on-access pruning */
@@ -254,7 +258,7 @@ typedef enum
/*
* Pruning calculates tuple visibility once and saves the results in an array
- * of int8. See PruneResult.htsv for details. This helper function is meant to
+ * of int8. See PruneFreezeResult.htsv for details. This helper function is meant to
* guard against examining visibility status array members which have not yet
* been computed.
*/
@@ -361,13 +365,13 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
/* in heap/pruneheap.c */
struct GlobalVisState;
extern void heap_page_prune_opt(Relation relation, Buffer buffer);
-extern void heap_page_prune(Relation relation, Buffer buffer,
- struct GlobalVisState *vistest,
- bool mark_unused_now,
- HeapPageFreeze *pagefrz,
- PruneResult *presult,
- PruneReason reason,
- OffsetNumber *off_loc);
+extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ struct GlobalVisState *vistest,
+ bool mark_unused_now,
+ HeapPageFreeze *pagefrz,
+ PruneFreezeResult *presult,
+ PruneReason reason,
+ OffsetNumber *off_loc);
extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
OffsetNumber *redirected, int nredirected,
OffsetNumber *nowdead, int ndead,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index cfa9d5aaeac..5737bc5b945 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2191,8 +2191,8 @@ ProjectionPath
PromptInterruptContext
ProtocolVersion
PrsStorage
+PruneFreezeResult
PruneReason
-PruneResult
PruneState
PruneStepResult
PsqlScanCallbacks
--
2.39.2
Attachment: v8-0009-Make-opp-freeze-heuristic-compatible-with-prune-f.patch (text/x-patch)
From 11154f36cb24f38ce67ba54aa2e5e603643ed71a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 25 Mar 2024 20:48:11 -0400
Subject: [PATCH v8 09/22] Make opp freeze heuristic compatible with
prune+freeze record
Once the prune and freeze records are combined, we will no longer be
able to test whether pruning emitted an FPI to decide whether to
opportunistically freeze a freezable page.
While this heuristic should be improved, for now, approximate the
previous logic by keeping track of whether a hint-bit FPI was emitted
during visibility checks (when checksums are on) and combining that
with checking XLogCheckBufferNeedsBackup(). If we just finished
deciding whether to prune and the current buffer seems to need an FPI
after modification, it is likely that pruning would have emitted one.
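Concretely, the approximated heuristic looks like this (condensed from
the hunks below):

/* did setting hint bits during visibility checks already force an FPI? */
hint_bit_fpi = fpi_before != pgWalUsage.wal_fpi;

/* would modifying the buffer now require a backup block? */
if (do_prune && pagefrz)
    prune_fpi = XLogCheckBufferNeedsBackup(buffer);

do_freeze = pagefrz &&
    (pagefrz->freeze_required ||
     (whole_page_freezable && presult->nfrozen > 0 &&
      (prune_fpi || hint_bit_fpi)));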
---
src/backend/access/heap/pruneheap.c | 57 +++++++++++++++++++++--------
1 file changed, 42 insertions(+), 15 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index e009c7579dd..d38de9b063d 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -247,6 +247,10 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
TransactionId visibility_cutoff_xid;
bool do_freeze;
bool all_visible_except_removable;
+ bool do_prune;
+ bool whole_page_freezable;
+ bool hint_bit_fpi;
+ bool prune_fpi = false;
int64 fpi_before = pgWalUsage.wal_fpi;
/*
@@ -456,6 +460,13 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
}
}
+ /*
+ * If checksums are enabled, heap_prune_satisfies_vacuum() may have caused
+ * an FPI to be emitted. Then reset fpi_before for no prune case.
+ */
+ hint_bit_fpi = fpi_before != pgWalUsage.wal_fpi;
+ fpi_before = pgWalUsage.wal_fpi;
+
/*
* For vacuum, if the whole page will become frozen, we consider
* opportunistically freezing tuples. Dead tuples which will be removed by
@@ -500,11 +511,41 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (off_loc)
*off_loc = InvalidOffsetNumber;
+ do_prune = prstate.nredirected > 0 ||
+ prstate.ndead > 0 ||
+ prstate.nunused > 0;
+
+ /*
+ * Only incur overhead of checking if we will do an FPI if we might use
+ * the information.
+ */
+ if (do_prune && pagefrz)
+ prune_fpi = XLogCheckBufferNeedsBackup(buffer);
+
+ /* Is the whole page freezable? And is there something to freeze */
+ whole_page_freezable = all_visible_except_removable &&
+ presult->all_frozen;
+
+ /*
+ * Freeze the page when heap_prepare_freeze_tuple indicates that at least
+ * one XID/MXID from before FreezeLimit/MultiXactCutoff is present. Also
+ * freeze when pruning generated an FPI, if doing so means that we set the
+ * page all-frozen afterwards (might not happen until final heap pass).
+ * XXX: Previously, we knew if pruning emitted an FPI by checking
+ * pgWalUsage.wal_fpi before and after pruning. Once the freeze and prune
+ * records are combined, this heuristic couldn't be used anymore. The
+ * opportunistic freeze heuristic must be improved; however, for now, try
+ * to approximate it.
+ */
+ do_freeze = pagefrz &&
+ (pagefrz->freeze_required ||
+ (whole_page_freezable && presult->nfrozen > 0 && (prune_fpi || hint_bit_fpi)));
+
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
/* Have we found any prunable items? */
- if (prstate.nredirected > 0 || prstate.ndead > 0 || prstate.nunused > 0)
+ if (do_prune)
{
/*
* Apply the planned item changes, then repair page fragmentation, and
@@ -569,20 +610,6 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/* Record number of newly-set-LP_DEAD items for caller */
presult->nnewlpdead = prstate.ndead;
- /*
- * Freeze the page when heap_prepare_freeze_tuple indicates that at least
- * one XID/MXID from before FreezeLimit/MultiXactCutoff is present. Also
- * freeze when pruning generated an FPI, if doing so means that we set the
- * page all-frozen afterwards (might not happen until final heap pass).
- */
- if (pagefrz)
- do_freeze = pagefrz->freeze_required ||
- (all_visible_except_removable && presult->all_frozen &&
- presult->nfrozen > 0 &&
- fpi_before != pgWalUsage.wal_fpi);
- else
- do_freeze = false;
-
if (do_freeze)
{
TransactionId frz_conflict_horizon = InvalidTransactionId;
--
2.39.2
Attachment: v8-0010-Separate-tuple-pre-freeze-checks-and-invoke-earli.patch (text/x-patch)
From 958eb843b36b5e40f1af837a3a4d2a139402c5a4 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 25 Mar 2024 20:54:37 -0400
Subject: [PATCH v8 10/22] Separate tuple pre freeze checks and invoke earlier
When combining the prune and freeze records, their critical sections will
have to be combined as well. heap_freeze_execute_prepared() does a set of
pre-freeze validations before starting its critical section. Move these
validations into a helper function, heap_pre_freeze_checks(), and invoke
it in heap_page_prune() before the pruning critical section.
---
src/backend/access/heap/heapam.c | 58 ++++++++++++++++-------------
src/backend/access/heap/pruneheap.c | 41 +++++++++++---------
src/include/access/heapam.h | 3 ++
3 files changed, 59 insertions(+), 43 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index bb856690234..b3119de2aa6 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6762,35 +6762,19 @@ heap_execute_freeze_tuple(HeapTupleHeader tuple, HeapTupleFreeze *frz)
}
/*
- * heap_freeze_execute_prepared
- *
- * Executes freezing of one or more heap tuples on a page on behalf of caller.
- * Caller passes an array of tuple plans from heap_prepare_freeze_tuple.
- * Caller must set 'offset' in each plan for us. Note that we destructively
- * sort caller's tuples array in-place, so caller had better be done with it.
- *
- * WAL-logs the changes so that VACUUM can advance the rel's relfrozenxid
- * later on without any risk of unsafe pg_xact lookups, even following a hard
- * crash (or when querying from a standby). We represent freezing by setting
- * infomask bits in tuple headers, but this shouldn't be thought of as a hint.
- * See section on buffer access rules in src/backend/storage/buffer/README.
- */
+* Perform xmin/xmax XID status sanity checks before calling
+* heap_freeze_execute_prepared().
+*
+* heap_prepare_freeze_tuple doesn't perform these checks directly because
+* pg_xact lookups are relatively expensive. They shouldn't be repeated
+* by successive VACUUMs that each decide against freezing the same page.
+*/
void
-heap_freeze_execute_prepared(Relation rel, Buffer buffer,
- TransactionId snapshotConflictHorizon,
- HeapTupleFreeze *tuples, int ntuples)
+heap_pre_freeze_checks(Buffer buffer,
+ HeapTupleFreeze *tuples, int ntuples)
{
Page page = BufferGetPage(buffer);
- Assert(ntuples > 0);
-
- /*
- * Perform xmin/xmax XID status sanity checks before critical section.
- *
- * heap_prepare_freeze_tuple doesn't perform these checks directly because
- * pg_xact lookups are relatively expensive. They shouldn't be repeated
- * by successive VACUUMs that each decide against freezing the same page.
- */
for (int i = 0; i < ntuples; i++)
{
HeapTupleFreeze *frz = tuples + i;
@@ -6829,6 +6813,30 @@ heap_freeze_execute_prepared(Relation rel, Buffer buffer,
xmax)));
}
}
+}
+
+/*
+ * heap_freeze_execute_prepared
+ *
+ * Executes freezing of one or more heap tuples on a page on behalf of caller.
+ * Caller passes an array of tuple plans from heap_prepare_freeze_tuple.
+ * Caller must set 'offset' in each plan for us. Note that we destructively
+ * sort caller's tuples array in-place, so caller had better be done with it.
+ *
+ * WAL-logs the changes so that VACUUM can advance the rel's relfrozenxid
+ * later on without any risk of unsafe pg_xact lookups, even following a hard
+ * crash (or when querying from a standby). We represent freezing by setting
+ * infomask bits in tuple headers, but this shouldn't be thought of as a hint.
+ * See section on buffer access rules in src/backend/storage/buffer/README.
+ */
+void
+heap_freeze_execute_prepared(Relation rel, Buffer buffer,
+ TransactionId snapshotConflictHorizon,
+ HeapTupleFreeze *tuples, int ntuples)
+{
+ Page page = BufferGetPage(buffer);
+
+ Assert(ntuples > 0);
START_CRIT_SECTION();
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index d38de9b063d..fe463ad7146 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -245,6 +245,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
PruneState prstate;
HeapTupleData tup;
TransactionId visibility_cutoff_xid;
+ TransactionId frz_conflict_horizon;
bool do_freeze;
bool all_visible_except_removable;
bool do_prune;
@@ -297,6 +298,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* all-visible, so the conflict horizon remains InvalidTransactionId.
*/
presult->vm_conflict_horizon = visibility_cutoff_xid = InvalidTransactionId;
+ frz_conflict_horizon = InvalidTransactionId;
/* For advancing relfrozenxid and relminmxid */
presult->new_relfrozenxid = InvalidTransactionId;
@@ -541,6 +543,27 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
(pagefrz->freeze_required ||
(whole_page_freezable && presult->nfrozen > 0 && (prune_fpi || hint_bit_fpi)));
+ if (do_freeze)
+ {
+ heap_pre_freeze_checks(buffer, prstate.frozen, presult->nfrozen);
+
+ /*
+ * We can use the visibility_cutoff_xid as our cutoff for conflicts
+ * when the whole page is eligible to become all-frozen in the VM once
+ * we're done with it. Otherwise we generate a conservative cutoff by
+ * stepping back from OldestXmin. This avoids false conflicts when
+ * hot_standby_feedback is in use.
+ */
+ if (all_visible_except_removable && presult->all_frozen)
+ frz_conflict_horizon = visibility_cutoff_xid;
+ else
+ {
+ /* Avoids false conflicts when hot_standby_feedback in use */
+ frz_conflict_horizon = pagefrz->cutoffs->OldestXmin;
+ TransactionIdRetreat(frz_conflict_horizon);
+ }
+ }
+
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
@@ -612,24 +635,6 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (do_freeze)
{
- TransactionId frz_conflict_horizon = InvalidTransactionId;
-
- /*
- * We can use the visibility_cutoff_xid as our cutoff for conflicts
- * when the whole page is eligible to become all-frozen in the VM once
- * we're done with it. Otherwise we generate a conservative cutoff by
- * stepping back from OldestXmin. This avoids false conflicts when
- * hot_standby_feedback is in use.
- */
- if (all_visible_except_removable && presult->all_frozen)
- frz_conflict_horizon = visibility_cutoff_xid;
- else
- {
- /* Avoids false conflicts when hot_standby_feedback in use */
- frz_conflict_horizon = pagefrz->cutoffs->OldestXmin;
- TransactionIdRetreat(frz_conflict_horizon);
- }
-
/* Execute all freeze plans for page as a single atomic action */
heap_freeze_execute_prepared(relation, buffer,
frz_conflict_horizon,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index de11c166575..cc3b3346bc4 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -342,6 +342,9 @@ extern void heap_inplace_update(Relation relation, HeapTuple tuple);
extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
HeapPageFreeze *pagefrz,
HeapTupleFreeze *frz, bool *totally_frozen);
+
+extern void heap_pre_freeze_checks(Buffer buffer,
+ HeapTupleFreeze *tuples, int ntuples);
extern void heap_freeze_execute_prepared(Relation rel, Buffer buffer,
TransactionId snapshotConflictHorizon,
HeapTupleFreeze *tuples, int ntuples);
--
2.39.2
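The ordering this patch establishes matters because, once prune and freeze share one critical section (0012), anything that can ereport(ERROR) must run before entering it. A toy, self-contained sketch of just that constraint -- FreezePlan, pre_freeze_checks() and apply_freeze_plans() are made-up stand-ins, not PostgreSQL APIs:

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Toy stand-in for HeapTupleFreeze; only the piece we "validate". */
typedef struct
{
    int offset;
} FreezePlan;

/*
 * Stand-in for heap_pre_freeze_checks(): may fail (here: exit), which is
 * only acceptable outside a critical section.
 */
static void
pre_freeze_checks(const FreezePlan *plans, int nplans)
{
    for (int i = 0; i < nplans; i++)
    {
        if (plans[i].offset <= 0)
        {
            fprintf(stderr, "freeze plan %d failed validation\n", i);
            exit(EXIT_FAILURE);     /* analogous to ereport(ERROR, ...) */
        }
    }
}

/* Stand-in for the no-fail work done between START_/END_CRIT_SECTION(). */
static void
apply_freeze_plans(FreezePlan *plans, int nplans)
{
    for (int i = 0; i < nplans; i++)
        plans[i].offset = -plans[i].offset;     /* "execute" the plan */
}

int
main(void)
{
    FreezePlan plans[] = {{1}, {2}, {3}};

    pre_freeze_checks(plans, 3);    /* errors are still recoverable here */
    /* START_CRIT_SECTION();        -- from here on, nothing may fail */
    apply_freeze_plans(plans, 3);
    /* END_CRIT_SECTION(); */
    return 0;
}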
Attachment: v8-0011-Remove-heap_freeze_execute_prepared.patch (text/x-patch)
From d70c6dba26926c5141d2664104cd01bf521a3b6e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 26 Mar 2024 09:10:14 -0400
Subject: [PATCH v8 11/22] Remove heap_freeze_execute_prepared()
In order to merge the freeze and prune records, the execution of tuple
freezing and the WAL logging of the changes to the page must be
separated, so that the WAL logging can be combined with the prune WAL
logging. This commit adds a helper for tuple freezing and then inlines
the remaining contents of heap_freeze_execute_prepared() at its call
site in heap_page_prune().
---
src/backend/access/heap/heapam.c | 49 +++++++----------------------
src/backend/access/heap/pruneheap.c | 22 ++++++++++---
src/include/access/heapam.h | 28 +++++++++--------
3 files changed, 44 insertions(+), 55 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index b3119de2aa6..41c1c7d286f 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6445,9 +6445,9 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
* XIDs or MultiXactIds that will need to be processed by a future VACUUM.
*
* VACUUM caller must assemble HeapTupleFreeze freeze plan entries for every
- * tuple that we returned true for, and call heap_freeze_execute_prepared to
- * execute freezing. Caller must initialize pagefrz fields for page as a
- * whole before first call here for each heap page.
+ * tuple that we returned true for, and then execute freezing. Caller must
+ * initialize pagefrz fields for page as a whole before first call here for
+ * each heap page.
*
* VACUUM caller decides on whether or not to freeze the page as a whole.
* We'll often prepare freeze plans for a page that caller just discards.
@@ -6762,8 +6762,8 @@ heap_execute_freeze_tuple(HeapTupleHeader tuple, HeapTupleFreeze *frz)
}
/*
-* Perform xmin/xmax XID status sanity checks before calling
-* heap_freeze_execute_prepared().
+* Perform xmin/xmax XID status sanity checks before actually executing freeze
+* plans.
*
* heap_prepare_freeze_tuple doesn't perform these checks directly because
* pg_xact lookups are relatively expensive. They shouldn't be repeated
@@ -6816,30 +6816,17 @@ heap_pre_freeze_checks(Buffer buffer,
}
/*
- * heap_freeze_execute_prepared
- *
- * Executes freezing of one or more heap tuples on a page on behalf of caller.
- * Caller passes an array of tuple plans from heap_prepare_freeze_tuple.
- * Caller must set 'offset' in each plan for us. Note that we destructively
- * sort caller's tuples array in-place, so caller had better be done with it.
- *
- * WAL-logs the changes so that VACUUM can advance the rel's relfrozenxid
- * later on without any risk of unsafe pg_xact lookups, even following a hard
- * crash (or when querying from a standby). We represent freezing by setting
- * infomask bits in tuple headers, but this shouldn't be thought of as a hint.
- * See section on buffer access rules in src/backend/storage/buffer/README.
+ * Helper which executes freezing of one or more heap tuples on a page on
+ * behalf of caller. Caller passes an array of tuple plans from
+ * heap_prepare_freeze_tuple. Caller must set 'offset' in each plan for us.
+ * Must be called in a critical section that also marks the buffer dirty and,
+ * if needed, emits WAL.
*/
void
-heap_freeze_execute_prepared(Relation rel, Buffer buffer,
- TransactionId snapshotConflictHorizon,
- HeapTupleFreeze *tuples, int ntuples)
+heap_freeze_prepared_tuples(Buffer buffer, HeapTupleFreeze *tuples, int ntuples)
{
Page page = BufferGetPage(buffer);
- Assert(ntuples > 0);
-
- START_CRIT_SECTION();
-
for (int i = 0; i < ntuples; i++)
{
HeapTupleFreeze *frz = tuples + i;
@@ -6851,20 +6838,6 @@ heap_freeze_execute_prepared(Relation rel, Buffer buffer,
}
MarkBufferDirty(buffer);
-
- /* Now WAL-log freezing if necessary */
- if (RelationNeedsWAL(rel))
- {
- log_heap_prune_and_freeze(rel, buffer, snapshotConflictHorizon,
- false, /* no cleanup lock required */
- PRUNE_VACUUM_SCAN,
- tuples, ntuples,
- NULL, 0, /* redirected */
- NULL, 0, /* dead */
- NULL, 0); /* unused */
- }
-
- END_CRIT_SECTION();
}
/*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index fe463ad7146..8914d4bf5c8 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -635,10 +635,24 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (do_freeze)
{
- /* Execute all freeze plans for page as a single atomic action */
- heap_freeze_execute_prepared(relation, buffer,
- frz_conflict_horizon,
- prstate.frozen, presult->nfrozen);
+ START_CRIT_SECTION();
+
+ Assert(presult->nfrozen > 0);
+
+ heap_freeze_prepared_tuples(buffer, prstate.frozen, presult->nfrozen);
+
+ MarkBufferDirty(buffer);
+
+ /* Now WAL-log freezing if necessary */
+ if (RelationNeedsWAL(relation))
+ log_heap_prune_and_freeze(relation, buffer,
+ frz_conflict_horizon, false, reason,
+ prstate.frozen, presult->nfrozen,
+ NULL, 0, /* redirected */
+ NULL, 0, /* dead */
+ NULL, 0); /* unused */
+
+ END_CRIT_SECTION();
}
else if (!pagefrz || !presult->all_frozen || presult->nfrozen > 0)
{
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index cc3b3346bc4..897f3bc50c9 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -14,6 +14,7 @@
#ifndef HEAPAM_H
#define HEAPAM_H
+#include "access/heapam_xlog.h"
#include "access/relation.h" /* for backward compatibility */
#include "access/relscan.h"
#include "access/sdir.h"
@@ -101,8 +102,8 @@ typedef enum
} HTSV_Result;
/*
- * heap_prepare_freeze_tuple may request that heap_freeze_execute_prepared
- * check any tuple's to-be-frozen xmin and/or xmax status using pg_xact
+ * heap_prepare_freeze_tuple may request that any tuple's to-be-frozen xmin
+ * and/or xmax status is checked using pg_xact during freezing execution.
*/
#define HEAP_FREEZE_CHECK_XMIN_COMMITTED 0x01
#define HEAP_FREEZE_CHECK_XMAX_ABORTED 0x02
@@ -154,14 +155,14 @@ typedef struct HeapPageFreeze
/*
* "Freeze" NewRelfrozenXid/NewRelminMxid trackers.
*
- * Trackers used when heap_freeze_execute_prepared freezes, or when there
- * are zero freeze plans for a page. It is always valid for vacuumlazy.c
- * to freeze any page, by definition. This even includes pages that have
- * no tuples with storage to consider in the first place. That way the
- * 'totally_frozen' results from heap_prepare_freeze_tuple can always be
- * used in the same way, even when no freeze plans need to be executed to
- * "freeze the page". Only the "freeze" path needs to consider the need
- * to set pages all-frozen in the visibility map under this scheme.
+ * Trackers used when tuples will be frozen, or when there are zero freeze
+ * plans for a page. It is always valid for vacuumlazy.c to freeze any
+ * page, by definition. This even includes pages that have no tuples with
+ * storage to consider in the first place. That way the 'totally_frozen'
+ * results from heap_prepare_freeze_tuple can always be used in the same
+ * way, even when no freeze plans need to be executed to "freeze the
+ * page". Only the "freeze" path needs to consider the need to set pages
+ * all-frozen in the visibility map under this scheme.
*
* When we freeze a page, we generally freeze all XIDs < OldestXmin, only
* leaving behind XIDs that are ineligible for freezing, if any. And so
@@ -345,12 +346,13 @@ extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
extern void heap_pre_freeze_checks(Buffer buffer,
HeapTupleFreeze *tuples, int ntuples);
-extern void heap_freeze_execute_prepared(Relation rel, Buffer buffer,
- TransactionId snapshotConflictHorizon,
- HeapTupleFreeze *tuples, int ntuples);
+
+extern void heap_freeze_prepared_tuples(Buffer buffer,
+ HeapTupleFreeze *tuples, int ntuples);
extern bool heap_freeze_tuple(HeapTupleHeader tuple,
TransactionId relfrozenxid, TransactionId relminmxid,
TransactionId FreezeLimit, TransactionId MultiXactCutoff);
+
extern bool heap_tuple_should_freeze(HeapTupleHeader tuple,
const struct VacuumCutoffs *cutoffs,
TransactionId *NoFreezePageRelfrozenXid,
--
2.39.2
Attachment: v8-0012-Merge-prune-and-freeze-records.patch (text/x-patch)
From b66212de1d046dea5bbc238a6f26f1d8d9f712a7 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 26 Mar 2024 09:37:46 -0400
Subject: [PATCH v8 12/22] Merge prune and freeze records
When a page is both pruned and frozen, a single, combined WAL record is
now emitted for both operations, reducing the number of WAL records
emitted.
When the record contains only tuples to freeze, replay can avoid taking
a full cleanup lock.
---
src/backend/access/heap/heapam.c | 2 -
src/backend/access/heap/pruneheap.c | 215 +++++++++++++++-------------
2 files changed, 114 insertions(+), 103 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 41c1c7d286f..aefc0be0dd3 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6836,8 +6836,6 @@ heap_freeze_prepared_tuples(Buffer buffer, HeapTupleFreeze *tuples, int ntuples)
htup = (HeapTupleHeader) PageGetItem(page, itemid);
heap_execute_freeze_tuple(htup, frz);
}
-
- MarkBufferDirty(buffer);
}
/*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 8914d4bf5c8..db8a182a197 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -249,9 +249,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
bool do_freeze;
bool all_visible_except_removable;
bool do_prune;
- bool whole_page_freezable;
+ bool do_hint;
bool hint_bit_fpi;
- bool prune_fpi = false;
int64 fpi_before = pgWalUsage.wal_fpi;
/*
@@ -464,10 +463,9 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/*
* If checksums are enabled, heap_prune_satisfies_vacuum() may have caused
- * an FPI to be emitted. Then reset fpi_before for no prune case.
+ * an FPI to be emitted.
*/
hint_bit_fpi = fpi_before != pgWalUsage.wal_fpi;
- fpi_before = pgWalUsage.wal_fpi;
/*
* For vacuum, if the whole page will become frozen, we consider
@@ -517,16 +515,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.ndead > 0 ||
prstate.nunused > 0;
+ /* Record number of newly-set-LP_DEAD items for caller */
+ presult->nnewlpdead = prstate.ndead;
+
/*
- * Only incur overhead of checking if we will do an FPI if we might use
- * the information.
+ * Even if we don't prune anything, if we found a new value for the
+ * pd_prune_xid field or the page was marked full, we will update the hint
+ * bit.
*/
- if (do_prune && pagefrz)
- prune_fpi = XLogCheckBufferNeedsBackup(buffer);
-
- /* Is the whole page freezable? And is there something to freeze */
- whole_page_freezable = all_visible_except_removable &&
- presult->all_frozen;
+ do_hint = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
+ PageIsFull(page);
/*
* Freeze the page when heap_prepare_freeze_tuple indicates that at least
@@ -539,46 +537,57 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* opportunistic freeze heuristic must be improved; however, for now, try
* to approximate it.
*/
- do_freeze = pagefrz &&
- (pagefrz->freeze_required ||
- (whole_page_freezable && presult->nfrozen > 0 && (prune_fpi || hint_bit_fpi)));
- if (do_freeze)
+ do_freeze = false;
+ if (pagefrz)
{
- heap_pre_freeze_checks(buffer, prstate.frozen, presult->nfrozen);
+ /* Is the whole page freezable? And is there something to freeze? */
+ bool whole_page_freezable = all_visible_except_removable &&
+ presult->all_frozen;
- /*
- * We can use the visibility_cutoff_xid as our cutoff for conflicts
- * when the whole page is eligible to become all-frozen in the VM once
- * we're done with it. Otherwise we generate a conservative cutoff by
- * stepping back from OldestXmin. This avoids false conflicts when
- * hot_standby_feedback is in use.
- */
- if (all_visible_except_removable && presult->all_frozen)
- frz_conflict_horizon = visibility_cutoff_xid;
- else
+ if (pagefrz->freeze_required)
+ do_freeze = true;
+ else if (whole_page_freezable && presult->nfrozen > 0)
{
- /* Avoids false conflicts when hot_standby_feedback in use */
- frz_conflict_horizon = pagefrz->cutoffs->OldestXmin;
- TransactionIdRetreat(frz_conflict_horizon);
+ /*
+ * Freezing would make the page all-frozen. In this case, we will
+ * freeze if we have already emitted an FPI or will do so anyway.
+ * Be sure only to incur the overhead of checking if we will do an
+ * FPI if we may use that information.
+ */
+ if (hint_bit_fpi ||
+ ((do_prune || do_hint) && XLogCheckBufferNeedsBackup(buffer)))
+ {
+ do_freeze = true;
+ }
}
}
- /* Any error while applying the changes is critical */
- START_CRIT_SECTION();
+ /*
+ * Validate the tuples we are considering freezing. We do this even if
+ * pruning and hint bit setting have not emitted an FPI so far because we
+ * still may emit an FPI while setting the page hint bit later. But we
+ * want to avoid doing the pre-freeze checks in a critical section.
+ */
+ if (do_freeze)
+ heap_pre_freeze_checks(buffer, prstate.frozen, presult->nfrozen);
- /* Have we found any prunable items? */
- if (do_prune)
+ if (!do_freeze && (!pagefrz || !presult->all_frozen || presult->nfrozen > 0))
{
/*
- * Apply the planned item changes, then repair page fragmentation, and
- * update the page's hint bit about whether it has free line pointers.
+ * If we will neither freeze tuples on the page nor set the page all
+ * frozen in the visibility map, the page is not all-frozen and there
+ * will be no newly frozen tuples.
*/
- heap_page_prune_execute(buffer, false,
- prstate.redirected, prstate.nredirected,
- prstate.nowdead, prstate.ndead,
- prstate.nowunused, prstate.nunused);
+ presult->all_frozen = false;
+ presult->nfrozen = 0; /* avoid miscounts in instrumentation */
+ }
+
+ /* Any error while applying the changes is critical */
+ START_CRIT_SECTION();
+ if (do_hint)
+ {
/*
* Update the page's pd_prune_xid field to either zero, or the lowest
* XID of any soon-prunable tuple.
@@ -586,12 +595,52 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
((PageHeader) page)->pd_prune_xid = prstate.new_prune_xid;
/*
- * Also clear the "page is full" flag, since there's no point in
- * repeating the prune/defrag process until something else happens to
- * the page.
+ * Clear the "page is full" flag if it is set since there's no point
+ * in repeating the prune/defrag process until something else happens
+ * to the page.
*/
PageClearFull(page);
+ /*
+ * We only needed to update pd_prune_xid and clear the page-is-full
+ * hint bit, this is a non-WAL-logged hint. If we will also freeze or
+ * prune the page, we will mark the buffer dirty below.
+ */
+ if (!do_freeze && !do_prune)
+ MarkBufferDirtyHint(buffer, true);
+ }
+
+ if (do_prune || do_freeze)
+ {
+ /* Apply the planned item changes, then repair page fragmentation. */
+ if (do_prune)
+ {
+ heap_page_prune_execute(buffer, false,
+ prstate.redirected, prstate.nredirected,
+ prstate.nowdead, prstate.ndead,
+ prstate.nowunused, prstate.nunused);
+ }
+
+ if (do_freeze)
+ {
+ /*
+ * We can use the visibility_cutoff_xid as our cutoff for
+ * conflicts when the whole page is eligible to become all-frozen
+ * in the VM once we're done with it. Otherwise we generate a
+ * conservative cutoff by stepping back from OldestXmin. This
+ * avoids false conflicts when hot_standby_feedback is in use.
+ */
+ if (all_visible_except_removable && presult->all_frozen)
+ frz_conflict_horizon = visibility_cutoff_xid;
+ else
+ {
+ /* Avoids false conflicts when hot_standby_feedback in use */
+ frz_conflict_horizon = pagefrz->cutoffs->OldestXmin;
+ TransactionIdRetreat(frz_conflict_horizon);
+ }
+ heap_freeze_prepared_tuples(buffer, prstate.frozen, presult->nfrozen);
+ }
+
MarkBufferDirty(buffer);
/*
@@ -599,72 +648,35 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*/
if (RelationNeedsWAL(relation))
{
+ /*
+ * The snapshotConflictHorizon for the whole record should be the
+ * most conservative of all the horizons calculated for any of the
+ * possible modifications. If this record will prune tuples, any
+ * transactions on the standby older than the youngest xmax of the
+ * most recently removed tuple this record will prune will
+ * conflict. If this record will freeze tuples, any transactions
+ * on the standby with xids older than the youngest tuple this
+ * record will freeze will conflict.
+ */
+ TransactionId conflict_xid;
+
+ if (TransactionIdFollows(frz_conflict_horizon, prstate.latest_xid_removed))
+ conflict_xid = frz_conflict_horizon;
+ else
+ conflict_xid = prstate.latest_xid_removed;
+
log_heap_prune_and_freeze(relation, buffer,
- prstate.latest_xid_removed,
+ conflict_xid,
true, reason,
- NULL, 0,
+ prstate.frozen, presult->nfrozen,
prstate.redirected, prstate.nredirected,
prstate.nowdead, prstate.ndead,
prstate.nowunused, prstate.nunused);
}
}
- else
- {
- /*
- * If we didn't prune anything, but have found a new value for the
- * pd_prune_xid field, update it and mark the buffer dirty. This is
- * treated as a non-WAL-logged hint.
- *
- * Also clear the "page is full" flag if it is set, since there's no
- * point in repeating the prune/defrag process until something else
- * happens to the page.
- */
- if (((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
- PageIsFull(page))
- {
- ((PageHeader) page)->pd_prune_xid = prstate.new_prune_xid;
- PageClearFull(page);
- MarkBufferDirtyHint(buffer, true);
- }
- }
END_CRIT_SECTION();
- /* Record number of newly-set-LP_DEAD items for caller */
- presult->nnewlpdead = prstate.ndead;
-
- if (do_freeze)
- {
- START_CRIT_SECTION();
-
- Assert(presult->nfrozen > 0);
-
- heap_freeze_prepared_tuples(buffer, prstate.frozen, presult->nfrozen);
-
- MarkBufferDirty(buffer);
-
- /* Now WAL-log freezing if necessary */
- if (RelationNeedsWAL(relation))
- log_heap_prune_and_freeze(relation, buffer,
- frz_conflict_horizon, false, reason,
- prstate.frozen, presult->nfrozen,
- NULL, 0, /* redirected */
- NULL, 0, /* dead */
- NULL, 0); /* unused */
-
- END_CRIT_SECTION();
- }
- else if (!pagefrz || !presult->all_frozen || presult->nfrozen > 0)
- {
- /*
- * If we will neither freeze tuples on the page nor set the page all
- * frozen in the visibility map, the page is not all frozen and there
- * will be no newly frozen tuples.
- */
- presult->all_frozen = false;
- presult->nfrozen = 0; /* avoid miscounts in instrumentation */
- }
-
/*
* For callers planning to update the visibility map, the conflict horizon
* for that record must be the newest xmin on the page. However, if the
@@ -681,9 +693,10 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* tuples on the page, if we will set the page all-frozen in the
* visibility map, we can advance relfrozenxid and relminmxid to the
* values in pagefrz->FreezePageRelfrozenXid and
- * pagefrz->FreezePageRelminMxid.
+ * pagefrz->FreezePageRelminMxid. MFIXME: which one should be pick if
+ * presult->nfrozen == 0 and presult->all_frozen = True.
*/
- if (presult->all_frozen || presult->nfrozen > 0)
+ if (presult->nfrozen > 0)
{
presult->new_relfrozenxid = pagefrz->FreezePageRelfrozenXid;
presult->new_relminmxid = pagefrz->FreezePageRelminMxid;
--
2.39.2
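For the combined record, the interesting new piece is how the single snapshotConflictHorizon is chosen from the prune and freeze horizons. A hypothetical standalone model of that choice follows; the real code uses TransactionIdFollows(), which is wraparound-aware, so the plain ">" here is only to keep the sketch self-contained:

#include <stdint.h>

typedef uint32_t TransactionId;

/*
 * Hypothetical model of the combined record's conflict horizon choice.
 */
static TransactionId
combined_conflict_horizon(TransactionId frz_conflict_horizon,
                          TransactionId latest_xid_removed)
{
    /*
     * The newer XID is the more conservative horizon: standby transactions
     * older than it might still see a tuple that this record prunes or
     * freezes, so they must be cancelled.
     */
    return (frz_conflict_horizon > latest_xid_removed) ?
        frz_conflict_horizon : latest_xid_removed;
}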
Attachment: v8-0013-Set-hastup-in-heap_page_prune.patch (text/x-patch)
From d03659b9c41a083d07421a2d18da5c81663b5a4d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 26 Mar 2024 09:56:02 -0400
Subject: [PATCH v8 13/22] Set hastup in heap_page_prune
lazy_scan_prune() loops through the line pointers and tuple visibility
information for each tuple on a page, setting hastup to true if there
are any LP_REDIRECT line pointers or tuples with storage which will not
be removed. We want to remove this extra loop from lazy_scan_prune(),
and we know about non-removable tuples during heap_page_prune() anyway.
Set hastup when recording LP_REDIRECT line pointers in
heap_prune_chain() and when LP_NORMAL line pointers refer to tuples
whose visibility status is not HEAPTUPLE_DEAD.
---
src/backend/access/heap/pruneheap.c | 64 ++++++++++++++++++----------
src/backend/access/heap/vacuumlazy.c | 24 +----------
src/include/access/heapam.h | 3 ++
3 files changed, 46 insertions(+), 45 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index db8a182a197..f8966d06cd2 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -71,7 +71,8 @@ static int heap_prune_chain(Buffer buffer,
static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
static void heap_prune_record_redirect(PruneState *prstate,
- OffsetNumber offnum, OffsetNumber rdoffnum);
+ OffsetNumber offnum, OffsetNumber rdoffnum,
+ PruneFreezeResult *presult);
static void heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
PruneFreezeResult *presult);
static void heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
@@ -279,6 +280,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
presult->nnewlpdead = 0;
presult->nfrozen = 0;
+ presult->hastup = false;
+
/*
* Caller will update the VM after pruning, collecting LP_DEAD items, and
* freezing tuples. Keep track of whether or not the page is all_visible
@@ -434,30 +437,42 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
break;
}
- /*
- * Consider freezing any normal tuples which will not be removed
- */
- if (presult->htsv[offnum] != HEAPTUPLE_DEAD && pagefrz)
+ if (presult->htsv[offnum] != HEAPTUPLE_DEAD)
{
- bool totally_frozen;
+ /*
+ * Deliberately don't set hastup for LP_DEAD items. We make the
+ * soft assumption that any LP_DEAD items encountered here will
+ * become LP_UNUSED later on, before count_nondeletable_pages is
+ * reached. If we don't make this assumption then rel truncation
+ * will only happen every other VACUUM, at most. Besides, VACUUM
+ * must treat hastup/nonempty_pages as provisional no matter how
+ * LP_DEAD items are handled (handled here, or handled later on).
+ */
+ presult->hastup = true;
- /* Tuple with storage -- consider need to freeze */
- if ((heap_prepare_freeze_tuple(htup, pagefrz,
- &prstate.frozen[presult->nfrozen],
- &totally_frozen)))
+ /* Consider freezing any normal tuples which will not be removed */
+ if (pagefrz)
{
- /* Save prepared freeze plan for later */
- prstate.frozen[presult->nfrozen++].offset = offnum;
- }
+ bool totally_frozen;
- /*
- * If any tuple isn't either totally frozen already or eligible to
- * become totally frozen (according to its freeze plan), then the
- * page definitely cannot be set all-frozen in the visibility map
- * later on
- */
- if (!totally_frozen)
- presult->all_frozen = false;
+ /* Tuple with storage -- consider need to freeze */
+ if ((heap_prepare_freeze_tuple(htup, pagefrz,
+ &prstate.frozen[presult->nfrozen],
+ &totally_frozen)))
+ {
+ /* Save prepared freeze plan for later */
+ prstate.frozen[presult->nfrozen++].offset = offnum;
+ }
+
+ /*
+ * If any tuple isn't either totally frozen already or
+ * eligible to become totally frozen (according to its freeze
+ * plan), then the page definitely cannot be set all-frozen in
+ * the visibility map later on
+ */
+ if (!totally_frozen)
+ presult->all_frozen = false;
+ }
}
}
@@ -1023,7 +1038,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
if (i >= nchain)
heap_prune_record_dead_or_unused(prstate, rootoffnum, presult);
else
- heap_prune_record_redirect(prstate, rootoffnum, chainitems[i]);
+ heap_prune_record_redirect(prstate, rootoffnum, chainitems[i], presult);
}
else if (nchain < 2 && ItemIdIsRedirected(rootlp))
{
@@ -1057,7 +1072,8 @@ heap_prune_record_prunable(PruneState *prstate, TransactionId xid)
/* Record line pointer to be redirected */
static void
heap_prune_record_redirect(PruneState *prstate,
- OffsetNumber offnum, OffsetNumber rdoffnum)
+ OffsetNumber offnum, OffsetNumber rdoffnum,
+ PruneFreezeResult *presult)
{
Assert(prstate->nredirected < MaxHeapTuplesPerPage);
prstate->redirected[prstate->nredirected * 2] = offnum;
@@ -1067,6 +1083,8 @@ heap_prune_record_redirect(PruneState *prstate,
prstate->marked[offnum] = true;
Assert(!prstate->marked[rdoffnum]);
prstate->marked[rdoffnum] = true;
+
+ presult->hastup = true;
}
/* Record line pointer to be marked dead */
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8beef4093ae..68258d083ab 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1420,7 +1420,6 @@ lazy_scan_prune(LVRelState *vacrel,
live_tuples,
recently_dead_tuples;
HeapPageFreeze pagefrz;
- bool hastup = false;
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
Assert(BufferGetBlockNumber(buf) == blkno);
@@ -1473,28 +1472,12 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->offnum = offnum;
itemid = PageGetItemId(page, offnum);
- if (!ItemIdIsUsed(itemid))
- continue;
-
/* Redirect items mustn't be touched */
- if (ItemIdIsRedirected(itemid))
- {
- /* page makes rel truncation unsafe */
- hastup = true;
+ if (ItemIdIsRedirected(itemid) || !ItemIdIsUsed(itemid))
continue;
- }
if (ItemIdIsDead(itemid))
{
- /*
- * Deliberately don't set hastup for LP_DEAD items. We make the
- * soft assumption that any LP_DEAD items encountered here will
- * become LP_UNUSED later on, before count_nondeletable_pages is
- * reached. If we don't make this assumption then rel truncation
- * will only happen every other VACUUM, at most. Besides, VACUUM
- * must treat hastup/nonempty_pages as provisional no matter how
- * LP_DEAD items are handled (handled here, or handled later on).
- */
deadoffsets[lpdead_items++] = offnum;
continue;
}
@@ -1562,9 +1545,6 @@ lazy_scan_prune(LVRelState *vacrel,
elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
break;
}
-
- hastup = true; /* page makes rel truncation unsafe */
-
}
vacrel->offnum = InvalidOffsetNumber;
@@ -1643,7 +1623,7 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->recently_dead_tuples += recently_dead_tuples;
/* Can't truncate this page */
- if (hastup)
+ if (presult.hastup)
vacrel->nonempty_pages = blkno + 1;
/* Did we find LP_DEAD items? */
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 897f3bc50c9..71c59793da7 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -216,6 +216,9 @@ typedef struct PruneFreezeResult
/* Whether or not the page can be set all-frozen in the VM */
bool all_frozen;
+ /* Whether or not the page makes rel truncation unsafe */
+ bool hastup;
+
/*
* If the page is all-visible and not all-frozen this is the oldest xid
* that can see the page as all-visible. It is to be used as the snapshot
--
2.39.2
Attachment: v8-0014-Count-tuples-for-vacuum-logging-in-heap_page_prun.patch (text/x-patch)
From d73fbc0fe0e2df16b993c53940e6c20850fadbff Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 26 Mar 2024 10:04:38 -0400
Subject: [PATCH v8 14/22] Count tuples for vacuum logging in heap_page_prune
lazy_scan_prune() loops through all of the tuple visibility information
that was recorded in heap_page_prune() and then counts live and recently
dead tuples. That information is available in heap_page_prune(), so just
record it there. Add live and recently dead tuple counters to the
PruneResult. Doing this counting in heap_page_prune() eliminates the
need for saving the tuple visibility status information in the
PruneResult. Instead, save it in the PruneState where it can be
referenced by heap_prune_chain().
---
src/backend/access/heap/pruneheap.c | 98 ++++++++++++++++++++++++----
src/backend/access/heap/vacuumlazy.c | 93 +-------------------------
src/include/access/heapam.h | 36 ++--------
3 files changed, 97 insertions(+), 130 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index f8966d06cd2..ee557c9ed35 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -55,6 +55,18 @@ typedef struct
*/
bool marked[MaxHeapTuplesPerPage + 1];
+ /*
+ * Tuple visibility is only computed once for each tuple, for correctness
+ * and efficiency reasons; see comment in heap_page_prune_and_freeze() for
+ * details. This is of type int8[], instead of HTSV_Result[], so we can
+ * use -1 to indicate no visibility has been computed, e.g. for LP_DEAD
+ * items.
+ *
+ * This needs to be MaxHeapTuplesPerPage + 1 long as FirstOffsetNumber is
+ * 1. Otherwise every access would need to subtract 1.
+ */
+ int8 htsv[MaxHeapTuplesPerPage + 1];
+
/*
* One entry for every tuple that we may freeze.
*/
@@ -69,6 +81,7 @@ static int heap_prune_chain(Buffer buffer,
OffsetNumber rootoffnum,
PruneState *prstate, PruneFreezeResult *presult);
+static inline HTSV_Result htsv_get_valid_status(int status);
static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
static void heap_prune_record_redirect(PruneState *prstate,
OffsetNumber offnum, OffsetNumber rdoffnum,
@@ -273,7 +286,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
memset(prstate.marked, 0, sizeof(prstate.marked));
/*
- * presult->htsv is not initialized here because all ntuple spots in the
+ * prstate.htsv is not initialized here because all ntuple spots in the
* array will be set either to a valid HTSV_Result value or -1.
*/
presult->ndeleted = 0;
@@ -282,6 +295,9 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
presult->hastup = false;
+ presult->live_tuples = 0;
+ presult->recently_dead_tuples = 0;
+
/*
* Caller will update the VM after pruning, collecting LP_DEAD items, and
* freezing tuples. Keep track of whether or not the page is all_visible
@@ -340,7 +356,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/* Nothing to do if slot doesn't contain a tuple */
if (!ItemIdIsNormal(itemid))
{
- presult->htsv[offnum] = -1;
+ prstate.htsv[offnum] = -1;
continue;
}
@@ -356,13 +372,32 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (off_loc)
*off_loc = offnum;
- presult->htsv[offnum] = heap_prune_satisfies_vacuum(&prstate, &tup,
- buffer);
+ prstate.htsv[offnum] = heap_prune_satisfies_vacuum(&prstate, &tup,
+ buffer);
if (reason == PRUNE_ON_ACCESS)
continue;
- switch (presult->htsv[offnum])
+ /*
+ * The criteria for counting a tuple as live in this block need to
+ * match what analyze.c's acquire_sample_rows() does, otherwise VACUUM
+ * and ANALYZE may produce wildly different reltuples values, e.g.
+ * when there are many recently-dead tuples.
+ *
+ * The logic here is a bit simpler than acquire_sample_rows(), as
+ * VACUUM can't run inside a transaction block, which makes some cases
+ * impossible (e.g. in-progress insert from the same transaction).
+ *
+ * We treat LP_DEAD items (which are the closest thing to DEAD tuples
+ * that might be seen here) differently, too: we assume that they'll
+ * become LP_UNUSED before VACUUM finishes. This difference is only
+ * superficial. VACUUM effectively agrees with ANALYZE about DEAD
+ * items, in the end. VACUUM won't remember LP_DEAD items, but only
+ * because they're not supposed to be left behind when it is done.
+ * (Cases where we bypass index vacuuming will violate this optimistic
+ * assumption, but the overall impact of that should be negligible.)
+ */
+ switch (prstate.htsv[offnum])
{
case HEAPTUPLE_DEAD:
@@ -382,6 +417,12 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
break;
case HEAPTUPLE_LIVE:
+ /*
+ * Count it as live. Not only is this natural, but it's also
+ * what acquire_sample_rows() does.
+ */
+ presult->live_tuples++;
+
/*
* Is the tuple definitely visible to all transactions?
*
@@ -423,13 +464,34 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
}
break;
case HEAPTUPLE_RECENTLY_DEAD:
+
+ /*
+ * If tuple is recently dead then we must not remove it from
+ * the relation. (We only remove items that are LP_DEAD from
+ * pruning.)
+ */
+ presult->recently_dead_tuples++;
all_visible_except_removable = false;
break;
case HEAPTUPLE_INSERT_IN_PROGRESS:
+
+ /*
+ * We do not count these rows as live, because we expect the
+ * inserting transaction to update the counters at commit, and
+ * we assume that will happen only after we report our
+ * results. This assumption is a bit shaky, but it is what
+ * acquire_sample_rows() does, so be consistent.
+ */
all_visible_except_removable = false;
break;
case HEAPTUPLE_DELETE_IN_PROGRESS:
- /* This is an expected case during concurrent vacuum */
+
+ /*
+ * This an expected case during concurrent vacuum. Count such
+ * rows as live. As above, we assume the deleting transaction
+ * will commit and update the counters after we report.
+ */
+ presult->live_tuples++;
all_visible_except_removable = false;
break;
default:
@@ -437,7 +499,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
break;
}
- if (presult->htsv[offnum] != HEAPTUPLE_DEAD)
+ if (prstate.htsv[offnum] != HEAPTUPLE_DEAD)
{
/*
* Deliberately don't set hastup for LP_DEAD items. We make the
@@ -746,10 +808,24 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
}
+/*
+ * Pruning calculates tuple visibility once and saves the results in an array
+ * of int8. See PruneState.htsv for details. This helper function is meant to
+ * guard against examining visibility status array members which have not yet
+ * been computed.
+ */
+static inline HTSV_Result
+htsv_get_valid_status(int status)
+{
+ Assert(status >= HEAPTUPLE_DEAD &&
+ status <= HEAPTUPLE_DELETE_IN_PROGRESS);
+ return (HTSV_Result) status;
+}
+
/*
* Prune specified line pointer or a HOT chain originating at line pointer.
*
- * Tuple visibility information is provided in presult->htsv.
+ * Tuple visibility information is provided in prstate->htsv.
*
* If the item is an index-referenced tuple (i.e. not a heap-only tuple),
* the HOT chain is pruned by removing all DEAD tuples at the start of the HOT
@@ -800,7 +876,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
*/
if (ItemIdIsNormal(rootlp))
{
- Assert(presult->htsv[rootoffnum] != -1);
+ Assert(prstate->htsv[rootoffnum] != -1);
htup = (HeapTupleHeader) PageGetItem(dp, rootlp);
if (HeapTupleHeaderIsHeapOnly(htup))
@@ -823,7 +899,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
* either here or while following a chain below. Whichever path
* gets there first will mark the tuple unused.
*/
- if (presult->htsv[rootoffnum] == HEAPTUPLE_DEAD &&
+ if (prstate->htsv[rootoffnum] == HEAPTUPLE_DEAD &&
!HeapTupleHeaderIsHotUpdated(htup))
{
heap_prune_record_unused(prstate, rootoffnum);
@@ -924,7 +1000,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
*/
tupdead = recent_dead = false;
- switch (htsv_get_valid_status(presult->htsv[offnum]))
+ switch (htsv_get_valid_status(prstate->htsv[offnum]))
{
case HEAPTUPLE_DEAD:
tupdead = true;
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 68258d083ab..c28e786a1e0 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1378,22 +1378,6 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
*
* Caller must hold pin and buffer cleanup lock on the buffer.
*
- * Prior to PostgreSQL 14 there were very rare cases where
- * heap_page_prune_and_freeze() was allowed to disagree with our
- * HeapTupleSatisfiesVacuum() call about whether or not a tuple should be
- * considered DEAD. This happened when an inserting transaction concurrently
- * aborted (after our heap_page_prune_and_freeze() call, before our
- * HeapTupleSatisfiesVacuum() call). There was rather a lot of complexity just
- * so we could deal with tuples that were DEAD to VACUUM, but nevertheless were
- * left with storage after pruning.
- *
- * As of Postgres 17, we circumvent this problem altogether by reusing the
- * result of heap_page_prune_and_freeze()'s visibility check. Without the
- * second call to HeapTupleSatisfiesVacuum(), there is no new HTSV_Result and
- * there can be no disagreement. We'll just handle such tuples as if they had
- * become fully dead right after this operation completes instead of in the
- * middle of it.
- *
* vmbuffer is the buffer containing the VM block with visibility information
* for the heap block, blkno. all_visible_according_to_vm is the saved
* visibility status of the heap block looked up earlier by the caller. We
@@ -1415,10 +1399,8 @@ lazy_scan_prune(LVRelState *vacrel,
OffsetNumber offnum,
maxoff;
ItemId itemid;
+ int lpdead_items = 0;
PruneFreezeResult presult;
- int lpdead_items,
- live_tuples,
- recently_dead_tuples;
HeapPageFreeze pagefrz;
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
@@ -1438,9 +1420,6 @@ lazy_scan_prune(LVRelState *vacrel,
pagefrz.NoFreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
pagefrz.NoFreezePageRelminMxid = vacrel->NewRelminMxid;
pagefrz.cutoffs = &vacrel->cutoffs;
- lpdead_items = 0;
- live_tuples = 0;
- recently_dead_tuples = 0;
/*
* Prune all HOT-update chains and potentially freeze tuples on this page.
@@ -1472,9 +1451,6 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->offnum = offnum;
itemid = PageGetItemId(page, offnum);
- /* Redirect items mustn't be touched */
- if (ItemIdIsRedirected(itemid) || !ItemIdIsUsed(itemid))
- continue;
if (ItemIdIsDead(itemid))
{
@@ -1482,69 +1458,6 @@ lazy_scan_prune(LVRelState *vacrel,
continue;
}
- Assert(ItemIdIsNormal(itemid));
-
- /*
- * The criteria for counting a tuple as live in this block need to
- * match what analyze.c's acquire_sample_rows() does, otherwise VACUUM
- * and ANALYZE may produce wildly different reltuples values, e.g.
- * when there are many recently-dead tuples.
- *
- * The logic here is a bit simpler than acquire_sample_rows(), as
- * VACUUM can't run inside a transaction block, which makes some cases
- * impossible (e.g. in-progress insert from the same transaction).
- *
- * We treat LP_DEAD items (which are the closest thing to DEAD tuples
- * that might be seen here) differently, too: we assume that they'll
- * become LP_UNUSED before VACUUM finishes. This difference is only
- * superficial. VACUUM effectively agrees with ANALYZE about DEAD
- * items, in the end. VACUUM won't remember LP_DEAD items, but only
- * because they're not supposed to be left behind when it is done.
- * (Cases where we bypass index vacuuming will violate this optimistic
- * assumption, but the overall impact of that should be negligible.)
- */
- switch (htsv_get_valid_status(presult.htsv[offnum]))
- {
- case HEAPTUPLE_LIVE:
-
- /*
- * Count it as live. Not only is this natural, but it's also
- * what acquire_sample_rows() does.
- */
- live_tuples++;
- break;
- case HEAPTUPLE_RECENTLY_DEAD:
-
- /*
- * If tuple is recently dead then we must not remove it from
- * the relation. (We only remove items that are LP_DEAD from
- * pruning.)
- */
- recently_dead_tuples++;
- break;
- case HEAPTUPLE_INSERT_IN_PROGRESS:
-
- /*
- * We do not count these rows as live, because we expect the
- * inserting transaction to update the counters at commit, and
- * we assume that will happen only after we report our
- * results. This assumption is a bit shaky, but it is what
- * acquire_sample_rows() does, so be consistent.
- */
- break;
- case HEAPTUPLE_DELETE_IN_PROGRESS:
-
- /*
- * This an expected case during concurrent vacuum. Count such
- * rows as live. As above, we assume the deleting transaction
- * will commit and update the counters after we report.
- */
- live_tuples++;
- break;
- default:
- elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
- break;
- }
}
vacrel->offnum = InvalidOffsetNumber;
@@ -1619,8 +1532,8 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->tuples_deleted += presult.ndeleted;
vacrel->tuples_frozen += presult.nfrozen;
vacrel->lpdead_items += lpdead_items;
- vacrel->live_tuples += live_tuples;
- vacrel->recently_dead_tuples += recently_dead_tuples;
+ vacrel->live_tuples += presult.live_tuples;
+ vacrel->recently_dead_tuples += presult.recently_dead_tuples;
/* Can't truncate this page */
if (presult.hastup)
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 71c59793da7..79ec4049f12 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -203,8 +203,14 @@ typedef struct PruneFreezeResult
/*
* The rest of the fields in PruneFreezeResult are only guaranteed to be
- * initialized if heap_page_prune is passed PruneReason VACUUM_SCAN.
+ * initialized if heap_page_prune_and_freeze() is passed a PruneReason
+ * other than PRUNE_ON_ACCESS.
*/
+ int live_tuples;
+ int recently_dead_tuples;
+
+ /* Number of tuples we froze */
+ int nfrozen;
/*
* Whether or not the page is truly all-visible after pruning. If there
@@ -226,21 +232,6 @@ typedef struct PruneFreezeResult
*/
TransactionId vm_conflict_horizon;
- /*
- * Tuple visibility is only computed once for each tuple, for correctness
- * and efficiency reasons; see comment in heap_page_prune_and_freeze() for
- * details. This is of type int8[], instead of HTSV_Result[], so we can
- * use -1 to indicate no visibility has been computed, e.g. for LP_DEAD
- * items.
- *
- * This needs to be MaxHeapTuplesPerPage + 1 long as FirstOffsetNumber is
- * 1. Otherwise every access would need to subtract 1.
- */
- int8 htsv[MaxHeapTuplesPerPage + 1];
-
- /* Number of tuples we may freeze */
- int nfrozen;
-
/*
* One entry for every tuple that we may freeze.
*/
@@ -260,19 +251,6 @@ typedef enum
PRUNE_VACUUM_CLEANUP, /* VACUUM 2nd heap pass */
} PruneReason;
-/*
- * Pruning calculates tuple visibility once and saves the results in an array
- * of int8. See PruneFreezeResult.htsv for details. This helper function is meant to
- * guard against examining visibility status array members which have not yet
- * been computed.
- */
-static inline HTSV_Result
-htsv_get_valid_status(int status)
-{
- Assert(status >= HEAPTUPLE_DEAD &&
- status <= HEAPTUPLE_DELETE_IN_PROGRESS);
- return (HTSV_Result) status;
-}
/* ----------------
* function prototypes for heap access method
--
2.39.2
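Since the counting rules that move into pruning here are spread across a long switch in the diff, here is a compact, self-contained model of them. TupleStatus and count_tuple() are hypothetical stand-ins mirroring HTSV_Result and the per-tuple accounting; the rules intentionally match analyze.c's acquire_sample_rows():

#include <stdbool.h>

/* Local mirror of HTSV_Result so the sketch stands alone. */
typedef enum
{
    TUPLE_DEAD,
    TUPLE_RECENTLY_DEAD,
    TUPLE_LIVE,
    TUPLE_INSERT_IN_PROGRESS,
    TUPLE_DELETE_IN_PROGRESS
} TupleStatus;

/*
 * In-progress inserters bump the counters themselves at commit, and
 * in-progress deleters are assumed to commit only after we report, so
 * their tuples still count as live.
 */
static void
count_tuple(TupleStatus status, int *live_tuples, int *recently_dead_tuples)
{
    switch (status)
    {
        case TUPLE_LIVE:
        case TUPLE_DELETE_IN_PROGRESS:
            (*live_tuples)++;
            break;
        case TUPLE_RECENTLY_DEAD:
            (*recently_dead_tuples)++;
            break;
        case TUPLE_DEAD:                /* handled via LP_DEAD/LP_UNUSED */
        case TUPLE_INSERT_IN_PROGRESS:  /* inserter reports at commit */
            break;
    }
}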
Attachment: v8-0015-Save-dead-tuple-offsets-during-heap_page_prune.patch (text/x-patch)
From b46b208d8f5733b798555c978747af47e51b411d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 26 Mar 2024 10:16:11 -0400
Subject: [PATCH v8 15/22] Save dead tuple offsets during heap_page_prune
After heap_page_prune() returned, lazy_scan_prune() looped through all
of the offsets of LP_DEAD items which it later added to
LVRelState->dead_items. Instead take care of this when marking a line
pointer or when an existing non-removable LP_DEAD item is encountered in
heap_prune_chain().
---
src/backend/access/heap/pruneheap.c | 7 ++++
src/backend/access/heap/vacuumlazy.c | 60 +++++++---------------------
src/include/access/heapam.h | 2 +
3 files changed, 23 insertions(+), 46 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index ee557c9ed35..6d5f8ba4417 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -297,6 +297,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
presult->live_tuples = 0;
presult->recently_dead_tuples = 0;
+ presult->lpdead_items = 0;
/*
* Caller will update the VM after pruning, collecting LP_DEAD items, and
@@ -975,7 +976,10 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
if (unlikely(prstate->mark_unused_now))
heap_prune_record_unused(prstate, offnum);
else
+ {
presult->all_visible = false;
+ presult->deadoffsets[presult->lpdead_items++] = offnum;
+ }
break;
}
@@ -1179,6 +1183,9 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
* all_visible.
*/
presult->all_visible = false;
+
+ /* Record the dead offset for vacuum */
+ presult->deadoffsets[presult->lpdead_items++] = offnum;
}
/*
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index c28e786a1e0..0fb5a7dd24d 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1396,23 +1396,11 @@ lazy_scan_prune(LVRelState *vacrel,
bool *has_lpdead_items)
{
Relation rel = vacrel->rel;
- OffsetNumber offnum,
- maxoff;
- ItemId itemid;
- int lpdead_items = 0;
PruneFreezeResult presult;
HeapPageFreeze pagefrz;
- OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
Assert(BufferGetBlockNumber(buf) == blkno);
- /*
- * maxoff might be reduced following line pointer array truncation in
- * heap_page_prune_and_freeze(). That's safe for us to ignore, since the
- * reclaimed space will continue to look like LP_UNUSED items below.
- */
- maxoff = PageGetMaxOffsetNumber(page);
-
/* Initialize pagefrz */
pagefrz.freeze_required = false;
pagefrz.FreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
@@ -1425,41 +1413,21 @@ lazy_scan_prune(LVRelState *vacrel,
* Prune all HOT-update chains and potentially freeze tuples on this page.
*
* We count the number of tuples removed from the page by the pruning step
- * in presult.ndeleted. It should not be confused with lpdead_items;
- * lpdead_items's final value can be thought of as the number of tuples
- * that were deleted from indexes.
+ * in presult.ndeleted. It should not be confused with
+ * presult.lpdead_items; presult.lpdead_items's final value can be thought
+ * of as the number of tuples that were deleted from indexes.
*
* If the relation has no indexes, we can immediately mark would-be dead
* items LP_UNUSED, so mark_unused_now should be true if no indexes and
* false otherwise.
+ *
+ * We will update the VM after collecting LP_DEAD items and freezing
+ * tuples. Pruning will have determined whether or not the page is
+ * all-visible.
*/
heap_page_prune_and_freeze(rel, buf, vacrel->vistest, vacrel->nindexes == 0,
&pagefrz, &presult, PRUNE_VACUUM_SCAN, &vacrel->offnum);
- /*
- * Now scan the page to collect LP_DEAD items and update the variables set
- * just above.
- */
- for (offnum = FirstOffsetNumber;
- offnum <= maxoff;
- offnum = OffsetNumberNext(offnum))
- {
- /*
- * Set the offset number so that we can display it along with any
- * error that occurred while processing this tuple.
- */
- vacrel->offnum = offnum;
- itemid = PageGetItemId(page, offnum);
-
-
- if (ItemIdIsDead(itemid))
- {
- deadoffsets[lpdead_items++] = offnum;
- continue;
- }
-
- }
-
vacrel->offnum = InvalidOffsetNumber;
Assert(MultiXactIdIsValid(presult.new_relminmxid));
@@ -1492,7 +1460,7 @@ lazy_scan_prune(LVRelState *vacrel,
TransactionId debug_cutoff;
bool debug_all_frozen;
- Assert(lpdead_items == 0);
+ Assert(presult.lpdead_items == 0);
if (!heap_page_is_all_visible(vacrel, buf,
&debug_cutoff, &debug_all_frozen))
@@ -1508,7 +1476,7 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* Now save details of the LP_DEAD items from the page in vacrel
*/
- if (lpdead_items > 0)
+ if (presult.lpdead_items > 0)
{
VacDeadItems *dead_items = vacrel->dead_items;
ItemPointerData tmp;
@@ -1517,9 +1485,9 @@ lazy_scan_prune(LVRelState *vacrel,
ItemPointerSetBlockNumber(&tmp, blkno);
- for (int i = 0; i < lpdead_items; i++)
+ for (int i = 0; i < presult.lpdead_items; i++)
{
- ItemPointerSetOffsetNumber(&tmp, deadoffsets[i]);
+ ItemPointerSetOffsetNumber(&tmp, presult.deadoffsets[i]);
dead_items->items[dead_items->num_items++] = tmp;
}
@@ -1531,7 +1499,7 @@ lazy_scan_prune(LVRelState *vacrel,
/* Finally, add page-local counts to whole-VACUUM counts */
vacrel->tuples_deleted += presult.ndeleted;
vacrel->tuples_frozen += presult.nfrozen;
- vacrel->lpdead_items += lpdead_items;
+ vacrel->lpdead_items += presult.lpdead_items;
vacrel->live_tuples += presult.live_tuples;
vacrel->recently_dead_tuples += presult.recently_dead_tuples;
@@ -1540,7 +1508,7 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->nonempty_pages = blkno + 1;
/* Did we find LP_DEAD items? */
- *has_lpdead_items = (lpdead_items > 0);
+ *has_lpdead_items = (presult.lpdead_items > 0);
Assert(!presult.all_visible || !(*has_lpdead_items));
@@ -1608,7 +1576,7 @@ lazy_scan_prune(LVRelState *vacrel,
* There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
* however.
*/
- else if (lpdead_items > 0 && PageIsAllVisible(page))
+ else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
{
elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
vacrel->relname, blkno);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 79ec4049f12..68b4d5b859c 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -241,6 +241,8 @@ typedef struct PruneFreezeResult
/* New value of relminmxid found by heap_page_prune_and_freeze() */
MultiXactId new_relminmxid;
+ int lpdead_items; /* includes existing LP_DEAD items */
+ OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
} PruneFreezeResult;
/* 'reason' codes for heap_page_prune_and_freeze() */
--
2.39.2
Attachment: v8-0016-move-live-tuple-accounting-to-heap_prune_chain.patch (text/x-patch)
From c1ed1a7d4dcb1687516425c5a6bbba136f3303d8 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 26 Mar 2024 13:54:19 -0400
Subject: [PATCH v8 16/22] move live tuple accounting to heap_prune_chain()
ci-os-only:
---
src/backend/access/heap/pruneheap.c | 636 ++++++++++++++++-----------
src/backend/access/heap/vacuumlazy.c | 38 +-
src/include/access/heapam.h | 59 ++-
3 files changed, 424 insertions(+), 309 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 6d5f8ba4417..744f3b5fabd 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -34,8 +34,9 @@ typedef struct
{
/* tuple visibility test, initialized for the relation */
GlobalVisState *vistest;
- /* whether or not dead items can be set LP_UNUSED during pruning */
- bool mark_unused_now;
+ uint8 actions;
+ TransactionId visibility_cutoff_xid;
+ bool all_visible_except_removable;
TransactionId new_prune_xid; /* new prune hint value for page */
TransactionId latest_xid_removed;
@@ -67,10 +68,14 @@ typedef struct
*/
int8 htsv[MaxHeapTuplesPerPage + 1];
+ HeapPageFreeze pagefrz;
+
/*
- * One entry for every tuple that we may freeze.
+ * Whether or not this tuple has been counted toward vacuum stats. In
+ * heap_prune_chain(), we have to be sure that Heap Only Tuples that are
+ * not part of any chain are counted correctly.
*/
- HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
+ bool counted[MaxHeapTuplesPerPage + 1];
} PruneState;
/* Local functions */
@@ -83,7 +88,7 @@ static int heap_prune_chain(Buffer buffer,
static inline HTSV_Result htsv_get_valid_status(int status);
static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
-static void heap_prune_record_redirect(PruneState *prstate,
+static void heap_prune_record_redirect(Page page, PruneState *prstate,
OffsetNumber offnum, OffsetNumber rdoffnum,
PruneFreezeResult *presult);
static void heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
@@ -91,6 +96,9 @@ static void heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
static void heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
PruneFreezeResult *presult);
static void heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum);
+
+static void heap_prune_record_live_or_recently_dead(Page page, PruneState *prstate,
+ OffsetNumber offnum, PruneFreezeResult *presult);
static void page_verify_redirects(Page page);
@@ -172,12 +180,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
PruneFreezeResult presult;
/*
- * For now, pass mark_unused_now as false regardless of whether or
- * not the relation has indexes, since we cannot safely determine
- * that during on-access pruning with the current implementation.
+ * For now, do not set PRUNE_DO_MARK_UNUSED_NOW regardless of
+ * whether or not the relation has indexes, since we cannot safely
+ * determine that during on-access pruning with the current
+ * implementation.
*/
- heap_page_prune_and_freeze(relation, buffer, vistest, false, NULL,
- &presult, PRUNE_ON_ACCESS, NULL);
+ heap_page_prune_and_freeze(relation, buffer, 0, vistest,
+ NULL, &presult, PRUNE_ON_ACCESS, NULL, NULL, NULL);
/*
* Report the number of tuples reclaimed to pgstats. This is
@@ -209,7 +218,6 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
}
}
-
/*
* Prune and repair fragmentation and potentially freeze tuples on the
* specified page.
@@ -223,16 +231,12 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* also need to account for a reduction in the length of the line pointer
* array following array truncation by us.
*
+ * actions are the pruning actions that heap_page_prune_and_freeze() should
+ * take.
+ *
* vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
* (see heap_prune_satisfies_vacuum).
*
- * mark_unused_now indicates whether or not dead items can be set LP_UNUSED
- * during pruning.
- *
- * pagefrz is an input parameter containing visibility cutoff information and
- * the current relfrozenxid and relminmxids used if the caller is interested in
- * freezing tuples on the page.
- *
* presult contains output parameters needed by callers such as the number of
* tuples removed and the number of line pointers newly marked LP_DEAD.
* heap_page_prune_and_freeze() is responsible for initializing it.
@@ -242,15 +246,21 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
*
* off_loc is the offset location required by the caller to use in error
* callback.
+ *
+ * new_relfrozen_xid and new_relmin_xid are provided by the caller if they
+ * would like the current values of those updated as part of advancing
+ * relfrozenxid/relminmxid.
*/
void
heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ uint8 actions,
GlobalVisState *vistest,
- bool mark_unused_now,
- HeapPageFreeze *pagefrz,
+ struct VacuumCutoffs *cutoffs,
PruneFreezeResult *presult,
PruneReason reason,
- OffsetNumber *off_loc)
+ OffsetNumber *off_loc,
+ TransactionId *new_relfrozen_xid,
+ MultiXactId *new_relmin_mxid)
{
Page page = BufferGetPage(buffer);
BlockNumber blockno = BufferGetBlockNumber(buffer);
@@ -258,15 +268,43 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
maxoff;
PruneState prstate;
HeapTupleData tup;
- TransactionId visibility_cutoff_xid;
TransactionId frz_conflict_horizon;
bool do_freeze;
- bool all_visible_except_removable;
bool do_prune;
bool do_hint;
bool hint_bit_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
+ /*
+ * pagefrz contains visibility cutoff information and the current
+ * relfrozenxid and relminmxids used if the caller is interested in
+ * freezing tuples on the page.
+ */
+ prstate.pagefrz.cutoffs = cutoffs;
+ prstate.pagefrz.freeze_required = false;
+
+ if (new_relmin_mxid)
+ {
+ prstate.pagefrz.FreezePageRelminMxid = *new_relmin_mxid;
+ prstate.pagefrz.NoFreezePageRelminMxid = *new_relmin_mxid;
+ }
+ else
+ {
+ prstate.pagefrz.FreezePageRelminMxid = InvalidMultiXactId;
+ prstate.pagefrz.NoFreezePageRelminMxid = InvalidMultiXactId;
+ }
+
+ if (new_relfrozen_xid)
+ {
+ prstate.pagefrz.FreezePageRelfrozenXid = *new_relfrozen_xid;
+ prstate.pagefrz.NoFreezePageRelfrozenXid = *new_relfrozen_xid;
+ }
+ else
+ {
+ prstate.pagefrz.FreezePageRelfrozenXid = InvalidTransactionId;
+ prstate.pagefrz.NoFreezePageRelfrozenXid = InvalidTransactionId;
+ }
+
/*
* Our strategy is to scan the page and make lists of items to change,
* then apply the changes within a critical section. This keeps as much
@@ -280,10 +318,11 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*/
prstate.new_prune_xid = InvalidTransactionId;
prstate.vistest = vistest;
- prstate.mark_unused_now = mark_unused_now;
+ prstate.actions = actions;
prstate.latest_xid_removed = InvalidTransactionId;
prstate.nredirected = prstate.ndead = prstate.nunused = 0;
memset(prstate.marked, 0, sizeof(prstate.marked));
+ memset(prstate.counted, 0, sizeof(prstate.counted));
/*
* prstate.htsv is not initialized here because all ntuple spots in the
@@ -291,7 +330,6 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*/
presult->ndeleted = 0;
presult->nnewlpdead = 0;
- presult->nfrozen = 0;
presult->hastup = false;
@@ -300,13 +338,45 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
presult->lpdead_items = 0;
/*
- * Caller will update the VM after pruning, collecting LP_DEAD items, and
+ * Caller may update the VM after pruning, collecting LP_DEAD items, and
* freezing tuples. Keep track of whether or not the page is all_visible
* and all_frozen and use this information to update the VM. all_visible
* implies lpdead_items == 0, but don't trust all_frozen result unless
- * all_visible is also set to true.
+ * all_visible is also set to true. If we won't even try freezing,
+ * initialize all_frozen to false.
+ *
+ * For vacuum, if the whole page will become frozen, we consider
+ * opportunistically freezing tuples. Dead tuples which will be removed by
+ * the end of vacuuming should not preclude us from opportunistically
+ * freezing. We will not be able to freeze the whole page if there are
+ * tuples present which are not visible to everyone or if there are dead
+ * tuples which are not yet removable. We need all_visible to be false if
+ * LP_DEAD tuples remain after pruning so that we do not incorrectly
+ * update the visibility map or page hint bit. So, we will update
+ * presult->all_visible to reflect the presence of LP_DEAD items while
+ * pruning and keep all_visible_except_removable to permit freezing if the
+ * whole page will eventually become all visible after removing tuples.
*/
- presult->all_frozen = true;
+ presult->all_visible = true;
+
+ if (prstate.actions & PRUNE_DO_TRY_FREEZE)
+ presult->set_all_frozen = true;
+ else
+ presult->set_all_frozen = false;
+ presult->nfrozen = 0;
+
+ /*
+ * Deliberately delay unsetting all_visible until later during pruning.
+ * Removable dead tuples shouldn't preclude freezing the page. After
+ * finishing this first pass of tuple visibility checks, initialize
+ * all_visible_except_removable with the current value of all_visible to
+ * indicate whether or not the page is all visible except for dead tuples.
+ * This will allow us to attempt to freeze the page after pruning. Later
+ * during pruning, if we encounter an LP_DEAD item or are setting an item
+ * LP_DEAD, we will unset all_visible. As long as we unset it before
+ * updating the visibility map, this will be correct.
+ */
+ prstate.all_visible_except_removable = true;
/*
* The visibility cutoff xid is the newest xmin of live tuples on the
@@ -316,13 +386,9 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* running transaction on the standby does not see tuples on the page as
* all-visible, so the conflict horizon remains InvalidTransactionId.
*/
- presult->vm_conflict_horizon = visibility_cutoff_xid = InvalidTransactionId;
+ presult->vm_conflict_horizon = prstate.visibility_cutoff_xid = InvalidTransactionId;
frz_conflict_horizon = InvalidTransactionId;
- /* For advancing relfrozenxid and relminmxid */
- presult->new_relfrozenxid = InvalidTransactionId;
- presult->new_relminmxid = InvalidMultiXactId;
-
maxoff = PageGetMaxOffsetNumber(page);
tup.t_tableOid = RelationGetRelid(relation);
@@ -346,7 +412,6 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* prefetching efficiency significantly / decreases the number of cache
* misses.
*/
- all_visible_except_removable = true;
for (offnum = maxoff;
offnum >= FirstOffsetNumber;
offnum = OffsetNumberPrev(offnum))
@@ -375,168 +440,6 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.htsv[offnum] = heap_prune_satisfies_vacuum(&prstate, &tup,
buffer);
-
- if (reason == PRUNE_ON_ACCESS)
- continue;
-
- /*
- * The criteria for counting a tuple as live in this block need to
- * match what analyze.c's acquire_sample_rows() does, otherwise VACUUM
- * and ANALYZE may produce wildly different reltuples values, e.g.
- * when there are many recently-dead tuples.
- *
- * The logic here is a bit simpler than acquire_sample_rows(), as
- * VACUUM can't run inside a transaction block, which makes some cases
- * impossible (e.g. in-progress insert from the same transaction).
- *
- * We treat LP_DEAD items (which are the closest thing to DEAD tuples
- * that might be seen here) differently, too: we assume that they'll
- * become LP_UNUSED before VACUUM finishes. This difference is only
- * superficial. VACUUM effectively agrees with ANALYZE about DEAD
- * items, in the end. VACUUM won't remember LP_DEAD items, but only
- * because they're not supposed to be left behind when it is done.
- * (Cases where we bypass index vacuuming will violate this optimistic
- * assumption, but the overall impact of that should be negligible.)
- */
- switch (prstate.htsv[offnum])
- {
- case HEAPTUPLE_DEAD:
-
- /*
- * Deliberately delay unsetting all_visible until later during
- * pruning. Removable dead tuples shouldn't preclude freezing
- * the page. After finishing this first pass of tuple
- * visibility checks, initialize all_visible_except_removable
- * with the current value of all_visible to indicate whether
- * or not the page is all visible except for dead tuples. This
- * will allow us to attempt to freeze the page after pruning.
- * Later during pruning, if we encounter an LP_DEAD item or
- * are setting an item LP_DEAD, we will unset all_visible. As
- * long as we unset it before updating the visibility map,
- * this will be correct.
- */
- break;
- case HEAPTUPLE_LIVE:
-
- /*
- * Count it as live. Not only is this natural, but it's also
- * what acquire_sample_rows() does.
- */
- presult->live_tuples++;
-
- /*
- * Is the tuple definitely visible to all transactions?
- *
- * NB: Like with per-tuple hint bits, we can't set the
- * PD_ALL_VISIBLE flag if the inserter committed
- * asynchronously. See SetHintBits for more info. Check that
- * the tuple is hinted xmin-committed because of that.
- */
- if (all_visible_except_removable)
- {
- TransactionId xmin;
-
- if (!HeapTupleHeaderXminCommitted(htup))
- {
- all_visible_except_removable = false;
- break;
- }
-
- /*
- * The inserter definitely committed. But is it old enough
- * that everyone sees it as committed? A
- * FrozenTransactionId is seen as committed to everyone.
- * Otherwise, we check if there is a snapshot that
- * considers this xid to still be running, and if so, we
- * don't consider the page all-visible.
- */
- xmin = HeapTupleHeaderGetXmin(htup);
- if (xmin != FrozenTransactionId &&
- !GlobalVisTestIsRemovableXid(vistest, xmin))
- {
- all_visible_except_removable = false;
- break;
- }
-
- /* Track newest xmin on page. */
- if (TransactionIdFollows(xmin, visibility_cutoff_xid) &&
- TransactionIdIsNormal(xmin))
- visibility_cutoff_xid = xmin;
- }
- break;
- case HEAPTUPLE_RECENTLY_DEAD:
-
- /*
- * If tuple is recently dead then we must not remove it from
- * the relation. (We only remove items that are LP_DEAD from
- * pruning.)
- */
- presult->recently_dead_tuples++;
- all_visible_except_removable = false;
- break;
- case HEAPTUPLE_INSERT_IN_PROGRESS:
-
- /*
- * We do not count these rows as live, because we expect the
- * inserting transaction to update the counters at commit, and
- * we assume that will happen only after we report our
- * results. This assumption is a bit shaky, but it is what
- * acquire_sample_rows() does, so be consistent.
- */
- all_visible_except_removable = false;
- break;
- case HEAPTUPLE_DELETE_IN_PROGRESS:
-
- /*
- * This an expected case during concurrent vacuum. Count such
- * rows as live. As above, we assume the deleting transaction
- * will commit and update the counters after we report.
- */
- presult->live_tuples++;
- all_visible_except_removable = false;
- break;
- default:
- elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
- break;
- }
-
- if (prstate.htsv[offnum] != HEAPTUPLE_DEAD)
- {
- /*
- * Deliberately don't set hastup for LP_DEAD items. We make the
- * soft assumption that any LP_DEAD items encountered here will
- * become LP_UNUSED later on, before count_nondeletable_pages is
- * reached. If we don't make this assumption then rel truncation
- * will only happen every other VACUUM, at most. Besides, VACUUM
- * must treat hastup/nonempty_pages as provisional no matter how
- * LP_DEAD items are handled (handled here, or handled later on).
- */
- presult->hastup = true;
-
- /* Consider freezing any normal tuples which will not be removed */
- if (pagefrz)
- {
- bool totally_frozen;
-
- /* Tuple with storage -- consider need to freeze */
- if ((heap_prepare_freeze_tuple(htup, pagefrz,
- &prstate.frozen[presult->nfrozen],
- &totally_frozen)))
- {
- /* Save prepared freeze plan for later */
- prstate.frozen[presult->nfrozen++].offset = offnum;
- }
-
- /*
- * If any tuple isn't either totally frozen already or
- * eligible to become totally frozen (according to its freeze
- * plan), then the page definitely cannot be set all-frozen in
- * the visibility map later on
- */
- if (!totally_frozen)
- presult->all_frozen = false;
- }
- }
}
/*
@@ -545,21 +448,6 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*/
hint_bit_fpi = fpi_before != pgWalUsage.wal_fpi;
- /*
- * For vacuum, if the whole page will become frozen, we consider
- * opportunistically freezing tuples. Dead tuples which will be removed by
- * the end of vacuuming should not preclude us from opportunistically
- * freezing. We will not be able to freeze the whole page if there are
- * tuples present which are not visible to everyone or if there are dead
- * tuples which are not yet removable. We need all_visible to be false if
- * LP_DEAD tuples remain after pruning so that we do not incorrectly
- * update the visibility map or page hint bit. So, we will update
- * presult->all_visible to reflect the presence of LP_DEAD items while
- * pruning and keep all_visible_except_removable to permit freezing if the
- * whole page will eventually become all visible after removing tuples.
- */
- presult->all_visible = all_visible_except_removable;
-
/* Scan the page */
for (offnum = FirstOffsetNumber;
offnum <= maxoff;
@@ -615,15 +503,14 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* opportunistic freeze heuristic must be improved; however, for now, try
* to approximate it.
*/
-
do_freeze = false;
- if (pagefrz)
+ if (prstate.actions & PRUNE_DO_TRY_FREEZE)
{
/* Is the whole page freezable? And is there something to freeze? */
- bool whole_page_freezable = all_visible_except_removable &&
- presult->all_frozen;
+ bool whole_page_freezable = prstate.all_visible_except_removable &&
+ presult->set_all_frozen;
- if (pagefrz->freeze_required)
+ if (prstate.pagefrz.freeze_required)
do_freeze = true;
else if (whole_page_freezable && presult->nfrozen > 0)
{
@@ -648,17 +535,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* want to avoid doing the pre-freeze checks in a critical section.
*/
if (do_freeze)
- heap_pre_freeze_checks(buffer, prstate.frozen, presult->nfrozen);
-
- if (!do_freeze && (!pagefrz || !presult->all_frozen || presult->nfrozen > 0))
+ heap_pre_freeze_checks(buffer, prstate.pagefrz.frozen, presult->nfrozen);
+ else if (!presult->set_all_frozen || presult->nfrozen > 0)
{
/*
* If we will neither freeze tuples on the page nor set the page all
* frozen in the visibility map, the page is not all-frozen and there
* will be no newly frozen tuples.
*/
- presult->all_frozen = false;
- presult->nfrozen = 0; /* avoid miscounts in instrumentation */
+ presult->set_all_frozen = false;
+ presult->nfrozen = 0; /* avoid miscounts in instrumenation */
}
/* Any error while applying the changes is critical */
@@ -708,15 +594,15 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* conservative cutoff by stepping back from OldestXmin. This
* avoids false conflicts when hot_standby_feedback is in use.
*/
- if (all_visible_except_removable && presult->all_frozen)
- frz_conflict_horizon = visibility_cutoff_xid;
+ if (prstate.all_visible_except_removable && presult->set_all_frozen)
+ frz_conflict_horizon = prstate.visibility_cutoff_xid;
else
{
/* Avoids false conflicts when hot_standby_feedback in use */
- frz_conflict_horizon = pagefrz->cutoffs->OldestXmin;
+ frz_conflict_horizon = prstate.pagefrz.cutoffs->OldestXmin;
TransactionIdRetreat(frz_conflict_horizon);
}
- heap_freeze_prepared_tuples(buffer, prstate.frozen, presult->nfrozen);
+ heap_freeze_prepared_tuples(buffer, prstate.pagefrz.frozen, presult->nfrozen);
}
MarkBufferDirty(buffer);
@@ -746,7 +632,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
log_heap_prune_and_freeze(relation, buffer,
conflict_xid,
true, reason,
- prstate.frozen, presult->nfrozen,
+ prstate.pagefrz.frozen, presult->nfrozen,
prstate.redirected, prstate.nredirected,
prstate.nowdead, prstate.ndead,
prstate.nowunused, prstate.nunused);
@@ -761,29 +647,31 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* page is completely frozen, there can be no conflict and the
* vm_conflict_horizon should remain InvalidTransactionId.
*/
- if (!presult->all_frozen)
- presult->vm_conflict_horizon = visibility_cutoff_xid;
+ if (!presult->set_all_frozen)
+ presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
+
+ /*
+ * If we will freeze tuples on the page or, even if we don't freeze tuples
+ * on the page, if we will set the page all-frozen in the visibility map,
+ * we can advance relfrozenxid and relminmxid to the values in
+ * pagefrz->FreezePageRelfrozenXid and pagefrz->FreezePageRelminMxid.
+ * MFIXME: which one should be pick if presult->nfrozen == 0 and
+ * presult->all_frozen = True.
+ */
+ if (new_relfrozen_xid)
+ {
+ if (presult->nfrozen > 0)
+ *new_relfrozen_xid = prstate.pagefrz.FreezePageRelfrozenXid;
+ else
+ *new_relfrozen_xid = prstate.pagefrz.NoFreezePageRelfrozenXid;
+ }
- if (pagefrz)
+ if (new_relmin_mxid)
{
- /*
- * If we will freeze tuples on the page or, even if we don't freeze
- * tuples on the page, if we will set the page all-frozen in the
- * visibility map, we can advance relfrozenxid and relminmxid to the
- * values in pagefrz->FreezePageRelfrozenXid and
- * pagefrz->FreezePageRelminMxid. MFIXME: which one should be pick if
- * presult->nfrozen == 0 and presult->all_frozen = True.
- */
if (presult->nfrozen > 0)
- {
- presult->new_relfrozenxid = pagefrz->FreezePageRelfrozenXid;
- presult->new_relminmxid = pagefrz->FreezePageRelminMxid;
- }
+ *new_relmin_mxid = prstate.pagefrz.FreezePageRelminMxid;
else
- {
- presult->new_relfrozenxid = pagefrz->NoFreezePageRelfrozenXid;
- presult->new_relminmxid = pagefrz->NoFreezePageRelminMxid;
- }
+ *new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
}
}
@@ -900,13 +788,32 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
* either here or while following a chain below. Whichever path
* gets there first will mark the tuple unused.
*/
- if (prstate->htsv[rootoffnum] == HEAPTUPLE_DEAD &&
- !HeapTupleHeaderIsHotUpdated(htup))
+ if (!HeapTupleHeaderIsHotUpdated(htup))
{
- heap_prune_record_unused(prstate, rootoffnum);
- HeapTupleHeaderAdvanceConflictHorizon(htup,
- &prstate->latest_xid_removed);
- ndeleted++;
+ if (prstate->htsv[rootoffnum] == HEAPTUPLE_DEAD)
+ {
+ heap_prune_record_unused(prstate, rootoffnum);
+ HeapTupleHeaderAdvanceConflictHorizon(htup,
+ &prstate->latest_xid_removed);
+ ndeleted++;
+ }
+ else
+ {
+ Assert(!prstate->marked[rootoffnum]);
+
+ /*
+ * MFIXME: not sure if this is right -- maybe counting too
+ * many
+ */
+
+ /*
+ * Ensure that this tuple is counted. If it is later
+ * redirected to, it would have been counted then, but we
+ * won't double count because we check if it has already
+ * been counted first.
+ */
+ heap_prune_record_live_or_recently_dead(dp, prstate, rootoffnum, presult);
+ }
}
/* Nothing more to do */
@@ -967,13 +874,13 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
if (ItemIdIsDead(lp))
{
/*
- * If the caller set mark_unused_now true, we can set dead line
- * pointers LP_UNUSED now. We don't increment ndeleted here since
- * the LP was already marked dead. If it will not be marked
+ * If the caller set PRUNE_DO_MARK_UNUSED_NOW, we can set dead
+ * line pointers LP_UNUSED now. We don't increment ndeleted here
+ * since the LP was already marked dead. If it will not be marked
* LP_UNUSED, it will remain LP_DEAD, making the page not
* all_visible.
*/
- if (unlikely(prstate->mark_unused_now))
+ if (unlikely(prstate->actions & PRUNE_DO_MARK_UNUSED_NOW))
heap_prune_record_unused(prstate, offnum);
else
{
@@ -1118,7 +1025,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
if (i >= nchain)
heap_prune_record_dead_or_unused(prstate, rootoffnum, presult);
else
- heap_prune_record_redirect(prstate, rootoffnum, chainitems[i], presult);
+ heap_prune_record_redirect(dp, prstate, rootoffnum, chainitems[i], presult);
}
else if (nchain < 2 && ItemIdIsRedirected(rootlp))
{
@@ -1132,6 +1039,14 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
heap_prune_record_dead_or_unused(prstate, rootoffnum, presult);
}
+ /*
+ * If not marked for pruning, consider if the tuple should be counted as
+ * live or recently dead. Note that line pointers redirected to will
+ * already have been counted.
+ */
+ if (ItemIdIsNormal(rootlp) && !prstate->marked[rootoffnum])
+ heap_prune_record_live_or_recently_dead(dp, prstate, rootoffnum, presult);
+
return ndeleted;
}
@@ -1151,13 +1066,15 @@ heap_prune_record_prunable(PruneState *prstate, TransactionId xid)
/* Record line pointer to be redirected */
static void
-heap_prune_record_redirect(PruneState *prstate,
+heap_prune_record_redirect(Page page, PruneState *prstate,
OffsetNumber offnum, OffsetNumber rdoffnum,
PruneFreezeResult *presult)
{
Assert(prstate->nredirected < MaxHeapTuplesPerPage);
prstate->redirected[prstate->nredirected * 2] = offnum;
prstate->redirected[prstate->nredirected * 2 + 1] = rdoffnum;
+ heap_prune_record_live_or_recently_dead(page, prstate, rdoffnum, presult);
+
prstate->nredirected++;
Assert(!prstate->marked[offnum]);
prstate->marked[offnum] = true;
@@ -1189,22 +1106,22 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
}
/*
- * Depending on whether or not the caller set mark_unused_now to true, record that a
- * line pointer should be marked LP_DEAD or LP_UNUSED. There are other cases in
- * which we will mark line pointers LP_UNUSED, but we will not mark line
- * pointers LP_DEAD if mark_unused_now is true.
+ * Depending on whether or not the caller set PRUNE_DO_MARK_UNUSED_NOW, record
+ * that a line pointer should be marked LP_DEAD or LP_UNUSED. There are other
+ * cases in which we will mark line pointers LP_UNUSED, but we will not mark
+ * line pointers LP_DEAD if PRUNE_DO_MARK_UNUSED_NOW is set.
*/
static void
heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
PruneFreezeResult *presult)
{
/*
- * If the caller set mark_unused_now to true, we can remove dead tuples
+ * If the caller set PRUNE_DO_MARK_UNUSED_NOW, we can remove dead tuples
* during pruning instead of marking their line pointers dead. Set this
* tuple's line pointer LP_UNUSED. We hint that this option is less
* likely.
*/
- if (unlikely(prstate->mark_unused_now))
+ if (unlikely(prstate->actions & PRUNE_DO_MARK_UNUSED_NOW))
heap_prune_record_unused(prstate, offnum);
else
heap_prune_record_dead(prstate, offnum, presult);
@@ -1221,6 +1138,187 @@ heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum)
prstate->marked[offnum] = true;
}
+static void
+heap_prune_record_live_or_recently_dead(Page page, PruneState *prstate, OffsetNumber offnum,
+ PruneFreezeResult *presult)
+{
+ HTSV_Result status;
+ HeapTupleHeader htup;
+ bool totally_frozen;
+
+ /* This could happen for items which are redirected to. */
+ if (prstate->counted[offnum])
+ return;
+
+ prstate->counted[offnum] = true;
+
+ /*
+ * If we don't want to do any of the special defined actions, we don't
+ * need to continue.
+ */
+ if (prstate->actions == 0)
+ return;
+
+ status = htsv_get_valid_status(prstate->htsv[offnum]);
+
+ Assert(status != HEAPTUPLE_DEAD);
+
+ /*
+ * Deliberately don't set hastup for LP_DEAD items. We make the soft
+ * assumption that any LP_DEAD items encountered here will become
+ * LP_UNUSED later on, before count_nondeletable_pages is reached. If we
+ * don't make this assumption then rel truncation will only happen every
+ * other VACUUM, at most. Besides, VACUUM must treat
+ * hastup/nonempty_pages as provisional no matter how LP_DEAD items are
+ * handled (handled here, or handled later on).
+ */
+ presult->hastup = true;
+
+ /*
+ * The criteria for counting a tuple as live in this block need to match
+ * what analyze.c's acquire_sample_rows() does, otherwise VACUUM and
+ * ANALYZE may produce wildly different reltuples values, e.g. when there
+ * are many recently-dead tuples.
+ *
+ * The logic here is a bit simpler than acquire_sample_rows(), as VACUUM
+ * can't run inside a transaction block, which makes some cases impossible
+ * (e.g. in-progress insert from the same transaction).
+ *
+ * We treat LP_DEAD items (which are the closest thing to DEAD tuples that
+ * might be seen here) differently, too: we assume that they'll become
+ * LP_UNUSED before VACUUM finishes. This difference is only superficial.
+ * VACUUM effectively agrees with ANALYZE about DEAD items, in the end.
+ * VACUUM won't remember LP_DEAD items, but only because they're not
+ * supposed to be left behind when it is done. (Cases where we bypass
+ * index vacuuming will violate this optimistic assumption, but the
+ * overall impact of that should be negligible.)
+ *
+ * HEAPTUPLE_LIVE tuples are naturally counted as live. This is also what
+ * acquire_sample_rows() does.
+ *
+ * HEAPTUPLE_DELETE_IN_PROGRESS tuples are expected during concurrent
+ * vacuum. We expect the deleting transaction to update the counters at
+ * commit after we report our results, so count these tuples as live to
+ * ensure the math works out. The assumption that the transaction will
+ * commit and update the counters after we report is a bit shaky; but it
+ * is what acquire_sample_rows() does, so we do the same to be consistent.
+ */
+ htup = (HeapTupleHeader) PageGetItem(page, PageGetItemId(page, offnum));
+
+ switch (status)
+ {
+ case HEAPTUPLE_LIVE:
+
+ /*
+ * Count it as live. Not only is this natural, but it's also what
+ * acquire_sample_rows() does.
+ */
+ presult->live_tuples++;
+
+ /*
+ * Is the tuple definitely visible to all transactions?
+ *
+ * NB: Like with per-tuple hint bits, we can't set the
+ * PD_ALL_VISIBLE flag if the inserter committed asynchronously.
+ * See SetHintBits for more info. Check that the tuple is hinted
+ * xmin-committed because of that.
+ */
+ if (prstate->all_visible_except_removable)
+ {
+ TransactionId xmin;
+
+ if (!HeapTupleHeaderXminCommitted(htup))
+ {
+ prstate->all_visible_except_removable = false;
+ presult->all_visible = false;
+ break;
+ }
+
+ /*
+ * The inserter definitely committed. But is it old enough
+ * that everyone sees it as committed? A FrozenTransactionId
+ * is seen as committed to everyone. Otherwise, we check if
+ * there is a snapshot that considers this xid to still be
+ * running, and if so, we don't consider the page all-visible.
+ */
+ xmin = HeapTupleHeaderGetXmin(htup);
+
+ /* For now always use pagefrz->cutoffs */
+ Assert(prstate->pagefrz.cutoffs);
+ if (!TransactionIdPrecedes(xmin, prstate->pagefrz.cutoffs->OldestXmin))
+ {
+ prstate->all_visible_except_removable = false;
+ presult->all_visible = false;
+ break;
+ }
+
+ /* Track newest xmin on page. */
+ if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
+ TransactionIdIsNormal(xmin))
+ prstate->visibility_cutoff_xid = xmin;
+ }
+ break;
+ case HEAPTUPLE_RECENTLY_DEAD:
+
+ /*
+ * If tuple is recently dead then we must not remove it from the
+ * relation. (We only remove items that are LP_DEAD from
+ * pruning.)
+ */
+ presult->recently_dead_tuples++;
+ prstate->all_visible_except_removable = false;
+ presult->all_visible = false;
+ break;
+ case HEAPTUPLE_INSERT_IN_PROGRESS:
+
+ /*
+ * We do not count these rows as live, because we expect the
+ * inserting transaction to update the counters at commit, and we
+ * assume that will happen only after we report our results. This
+ * assumption is a bit shaky, but it is what acquire_sample_rows()
+ * does, so be consistent.
+ */
+ prstate->all_visible_except_removable = false;
+ presult->all_visible = false;
+ break;
+ case HEAPTUPLE_DELETE_IN_PROGRESS:
+
+ /*
+ * This an expected case during concurrent vacuum. Count such rows
+ * as live. As above, we assume the deleting transaction will
+ * commit and update the counters after we report.
+ */
+ presult->live_tuples++;
+ prstate->all_visible_except_removable = false;
+ presult->all_visible = false;
+ break;
+ default:
+ elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
+ break;
+ }
+
+ /* Consider freezing any normal tuples which will not be removed */
+ if (prstate->actions & PRUNE_DO_TRY_FREEZE)
+ {
+ /* Tuple with storage -- consider need to freeze */
+ if ((heap_prepare_freeze_tuple(htup, &prstate->pagefrz,
+ &prstate->pagefrz.frozen[presult->nfrozen],
+ &totally_frozen)))
+ {
+ /* Save prepared freeze plan for later */
+ prstate->pagefrz.frozen[presult->nfrozen++].offset = offnum;
+ }
+
+ /*
+ * If any tuple isn't either totally frozen already or eligible to
+ * become totally frozen (according to its freeze plan), then the page
+ * definitely cannot be set all-frozen in the visibility map later on
+ */
+ if (!totally_frozen)
+ presult->set_all_frozen = false;
+ }
+
+}
/*
* Perform the actual page changes needed by heap_page_prune.
@@ -1354,12 +1452,12 @@ heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
else
{
/*
- * When heap_page_prune() was called, mark_unused_now may have
- * been passed as true, which allows would-be LP_DEAD items to be
- * made LP_UNUSED instead. This is only possible if the relation
- * has no indexes. If there are any dead items, then
- * mark_unused_now was not true and every item being marked
- * LP_UNUSED must refer to a heap-only tuple.
+ * When heap_page_prune() was called, PRUNE_DO_MARK_UNUSED_NOW may
+ * have been set, which allows would-be LP_DEAD items to be made
+ * LP_UNUSED instead. This is only possible if the relation has
+ * no indexes. If there are any dead items, then
+ * PRUNE_DO_MARK_UNUSED_NOW was not set and every item being
+ * marked LP_UNUSED must refer to a heap-only tuple.
*/
if (ndead > 0)
{
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 0fb5a7dd24d..04e86347a0b 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1397,18 +1397,10 @@ lazy_scan_prune(LVRelState *vacrel,
{
Relation rel = vacrel->rel;
PruneFreezeResult presult;
- HeapPageFreeze pagefrz;
+ uint8 actions = 0;
Assert(BufferGetBlockNumber(buf) == blkno);
- /* Initialize pagefrz */
- pagefrz.freeze_required = false;
- pagefrz.FreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
- pagefrz.FreezePageRelminMxid = vacrel->NewRelminMxid;
- pagefrz.NoFreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
- pagefrz.NoFreezePageRelminMxid = vacrel->NewRelminMxid;
- pagefrz.cutoffs = &vacrel->cutoffs;
-
/*
* Prune all HOT-update chains and potentially freeze tuples on this page.
*
@@ -1418,22 +1410,26 @@ lazy_scan_prune(LVRelState *vacrel,
* of as the number of tuples that were deleted from indexes.
*
* If the relation has no indexes, we can immediately mark would-be dead
- * items LP_UNUSED, so mark_unused_now should be true if no indexes and
- * false otherwise.
+ * items LP_UNUSED, so PRUNE_DO_MARK_UNUSED_NOW should be set if no
+ * indexes and unset otherwise.
*
* We will update the VM after collecting LP_DEAD items and freezing
* tuples. Pruning will have determined whether or not the page is
* all-visible.
*/
- heap_page_prune_and_freeze(rel, buf, vacrel->vistest, vacrel->nindexes == 0,
- &pagefrz, &presult, PRUNE_VACUUM_SCAN, &vacrel->offnum);
+ actions |= PRUNE_DO_TRY_FREEZE;
- vacrel->offnum = InvalidOffsetNumber;
+ if (vacrel->nindexes == 0)
+ actions |= PRUNE_DO_MARK_UNUSED_NOW;
- Assert(MultiXactIdIsValid(presult.new_relminmxid));
- vacrel->NewRelfrozenXid = presult.new_relfrozenxid;
- Assert(TransactionIdIsValid(presult.new_relfrozenxid));
- vacrel->NewRelminMxid = presult.new_relminmxid;
+ heap_page_prune_and_freeze(rel, buf, actions, vacrel->vistest,
+ &vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN, &vacrel->offnum,
+ &vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
+
+ Assert(MultiXactIdIsValid(vacrel->NewRelminMxid));
+ Assert(TransactionIdIsValid(vacrel->NewRelfrozenXid));
+
+ vacrel->offnum = InvalidOffsetNumber;
if (presult.nfrozen > 0)
{
@@ -1466,7 +1462,7 @@ lazy_scan_prune(LVRelState *vacrel,
&debug_cutoff, &debug_all_frozen))
Assert(false);
- Assert(presult.all_frozen == debug_all_frozen);
+ Assert(presult.set_all_frozen == debug_all_frozen);
Assert(!TransactionIdIsValid(debug_cutoff) ||
debug_cutoff == presult.vm_conflict_horizon);
@@ -1521,7 +1517,7 @@ lazy_scan_prune(LVRelState *vacrel,
{
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
- if (presult.all_frozen)
+ if (presult.set_all_frozen)
{
Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
flags |= VISIBILITYMAP_ALL_FROZEN;
@@ -1592,7 +1588,7 @@ lazy_scan_prune(LVRelState *vacrel,
* true, so we must check both all_visible and all_frozen.
*/
else if (all_visible_according_to_vm && presult.all_visible &&
- presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
+ presult.set_all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
{
/*
* Avoid relying on all_visible_according_to_vm as a proxy for the
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 68b4d5b859c..a0420bea2eb 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -191,8 +191,35 @@ typedef struct HeapPageFreeze
MultiXactId NoFreezePageRelminMxid;
struct VacuumCutoffs *cutoffs;
+
+ /*
+ * One entry for every tuple that we may freeze.
+ */
+ HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
} HeapPageFreeze;
+/*
+ * Actions that can be taken during pruning and freezing. By default, we will
+ * at least attempt regular pruning.
+ */
+
+/*
+ * mark_unused_now indicates whether or not dead items can be set LP_UNUSED
+ * during pruning.
+ */
+#define PRUNE_DO_MARK_UNUSED_NOW (1 << 1)
+
+/*
+ * Freeze if advantageous or required and try to advance relfrozenxid and
+ * relminmxid. To attempt freezing, we will need to determine if the page is
+ * all frozen. So, if this action is set, we will also inform the caller if the
+ * page is all-visible and/or all-frozen and calculate a snapshot conflict
+ * horizon for updating the visibility map. While doing this, we also count if
+ * tuples are live or recently dead.
+ */
+#define PRUNE_DO_TRY_FREEZE (1 << 2)
+
+
/*
* Per-page state returned from pruning
*/
@@ -203,14 +230,17 @@ typedef struct PruneFreezeResult
/*
* The rest of the fields in PruneFreezeResult are only guaranteed to be
- * initialized if heap_page_prune_and_freeze() is passed a PruneReason
- * other than PRUNE_ON_ACCESS.
+ * initialized if heap_page_prune_and_freeze() is passed
+ * PRUNE_DO_TRY_FREEZE.
*/
- int live_tuples;
- int recently_dead_tuples;
-
/* Number of tuples we froze */
int nfrozen;
+ /* Whether or not the page should be set all-frozen in the VM */
+ bool set_all_frozen;
+
+ /* Number of live and recently dead tuples */
+ int live_tuples;
+ int recently_dead_tuples;
/*
* Whether or not the page is truly all-visible after pruning. If there
@@ -219,8 +249,6 @@ typedef struct PruneFreezeResult
*/
bool all_visible;
- /* Whether or not the page can be set all-frozen in the VM */
- bool all_frozen;
/* Whether or not the page makes rel truncation unsafe */
bool hastup;
@@ -232,15 +260,6 @@ typedef struct PruneFreezeResult
*/
TransactionId vm_conflict_horizon;
- /*
- * One entry for every tuple that we may freeze.
- */
- HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
- /* New value of relfrozenxid found by heap_page_prune_and_freeze() */
- TransactionId new_relfrozenxid;
-
- /* New value of relminmxid found by heap_page_prune_and_freeze() */
- MultiXactId new_relminmxid;
int lpdead_items; /* includes existing LP_DEAD items */
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
} PruneFreezeResult;
@@ -354,12 +373,14 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
struct GlobalVisState;
extern void heap_page_prune_opt(Relation relation, Buffer buffer);
extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ uint8 actions,
struct GlobalVisState *vistest,
- bool mark_unused_now,
- HeapPageFreeze *pagefrz,
+ struct VacuumCutoffs *cutoffs,
PruneFreezeResult *presult,
PruneReason reason,
- OffsetNumber *off_loc);
+ OffsetNumber *off_loc,
+ TransactionId *new_relfrozen_xid,
+ MultiXactId *new_relmin_mxid);
extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
OffsetNumber *redirected, int nredirected,
OffsetNumber *nowdead, int ndead,
--
2.39.2
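To summarize the interface change in the patch above: callers now pass a
bitmask of PRUNE_DO_* actions plus optional relfrozenxid/relminmxid pointers,
rather than a bool and a HeapPageFreeze. A condensed, illustrative rendering of
the two call sites taken from the diff (vacuum's first pass and on-access
pruning):

/* vacuum: always try to freeze; tables without indexes may drop dead items now */
actions |= PRUNE_DO_TRY_FREEZE;
if (vacrel->nindexes == 0)
    actions |= PRUNE_DO_MARK_UNUSED_NOW;

heap_page_prune_and_freeze(rel, buf, actions, vacrel->vistest,
                           &vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
                           &vacrel->offnum,
                           &vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);

/* on-access pruning: no actions, no cutoffs, no relfrozenxid/relminmxid tracking */
heap_page_prune_and_freeze(relation, buffer, 0, vistest,
                           NULL, &presult, PRUNE_ON_ACCESS, NULL, NULL, NULL);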
Attachment: v8-0017-Move-frozen-array-to-PruneState.patch (text/x-patch)
From 7c65489e552457d4d28c7f204250742831d97894 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed, 27 Mar 2024 23:37:35 +0200
Subject: [PATCH v8 17/22] Move 'frozen' array to PruneState.
It can be internal to heap_page_prune_and_freeze(), like the other
arrays. The freeze subroutines don't need it.
---
src/backend/access/heap/pruneheap.c | 22 ++++++++++++----------
src/include/access/heapam.h | 8 +-------
2 files changed, 13 insertions(+), 17 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 744f3b5fabd..e242d752f9f 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -43,10 +43,12 @@ typedef struct
int nredirected; /* numbers of entries in arrays below */
int ndead;
int nunused;
+ int nfrozen;
/* arrays that accumulate indexes of items to be changed */
OffsetNumber redirected[MaxHeapTuplesPerPage * 2];
OffsetNumber nowdead[MaxHeapTuplesPerPage];
OffsetNumber nowunused[MaxHeapTuplesPerPage];
+ HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
/*
* marked[i] is true if item i is entered in one of the above arrays.
@@ -320,7 +322,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.vistest = vistest;
prstate.actions = actions;
prstate.latest_xid_removed = InvalidTransactionId;
- prstate.nredirected = prstate.ndead = prstate.nunused = 0;
+ prstate.nredirected = prstate.ndead = prstate.nunused = prstate.nfrozen = 0;
memset(prstate.marked, 0, sizeof(prstate.marked));
memset(prstate.counted, 0, sizeof(prstate.counted));
@@ -363,7 +365,6 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
presult->set_all_frozen = true;
else
presult->set_all_frozen = false;
- presult->nfrozen = 0;
/*
* Deliberately delay unsetting all_visible until later during pruning.
@@ -512,7 +513,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (prstate.pagefrz.freeze_required)
do_freeze = true;
- else if (whole_page_freezable && presult->nfrozen > 0)
+ else if (whole_page_freezable && prstate.nfrozen > 0)
{
/*
* Freezing would make the page all-frozen. In this case, we will
@@ -535,8 +536,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* want to avoid doing the pre-freeze checks in a critical section.
*/
if (do_freeze)
- heap_pre_freeze_checks(buffer, prstate.pagefrz.frozen, presult->nfrozen);
- else if (!presult->set_all_frozen || presult->nfrozen > 0)
+ heap_pre_freeze_checks(buffer, prstate.frozen, prstate.nfrozen);
+ else if (!presult->set_all_frozen || prstate.nfrozen > 0)
{
/*
* If we will neither freeze tuples on the page nor set the page all
@@ -544,7 +545,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* will be no newly frozen tuples.
*/
presult->set_all_frozen = false;
- presult->nfrozen = 0; /* avoid miscounts in instrumenation */
+ prstate.nfrozen = 0; /* avoid miscounts in instrumenation */
}
/* Any error while applying the changes is critical */
@@ -602,7 +603,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
frz_conflict_horizon = prstate.pagefrz.cutoffs->OldestXmin;
TransactionIdRetreat(frz_conflict_horizon);
}
- heap_freeze_prepared_tuples(buffer, prstate.pagefrz.frozen, presult->nfrozen);
+ heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
}
MarkBufferDirty(buffer);
@@ -632,7 +633,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
log_heap_prune_and_freeze(relation, buffer,
conflict_xid,
true, reason,
- prstate.pagefrz.frozen, presult->nfrozen,
+ prstate.frozen, prstate.nfrozen,
prstate.redirected, prstate.nredirected,
prstate.nowdead, prstate.ndead,
prstate.nowunused, prstate.nunused);
@@ -649,6 +650,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*/
if (!presult->set_all_frozen)
presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
+ presult->nfrozen = prstate.nfrozen;
/*
* If we will freeze tuples on the page or, even if we don't freeze tuples
@@ -1302,11 +1304,11 @@ heap_prune_record_live_or_recently_dead(Page page, PruneState *prstate, OffsetNu
{
/* Tuple with storage -- consider need to freeze */
if ((heap_prepare_freeze_tuple(htup, &prstate->pagefrz,
- &prstate->pagefrz.frozen[presult->nfrozen],
+ &prstate->frozen[presult->nfrozen],
&totally_frozen)))
{
/* Save prepared freeze plan for later */
- prstate->pagefrz.frozen[presult->nfrozen++].offset = offnum;
+ prstate->frozen[presult->nfrozen++].offset = offnum;
}
/*
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index a0420bea2eb..ef61e0277ee 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -191,11 +191,6 @@ typedef struct HeapPageFreeze
MultiXactId NoFreezePageRelminMxid;
struct VacuumCutoffs *cutoffs;
-
- /*
- * One entry for every tuple that we may freeze.
- */
- HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
} HeapPageFreeze;
/*
@@ -227,14 +222,13 @@ typedef struct PruneFreezeResult
{
int ndeleted; /* Number of tuples deleted from the page */
int nnewlpdead; /* Number of newly LP_DEAD items */
+ int nfrozen; /* Number of tuples we froze */
/*
* The rest of the fields in PruneFreezeResult are only guaranteed to be
* initialized if heap_page_prune_and_freeze() is passed
* PRUNE_DO_TRY_FREEZE.
*/
- /* Number of tuples we froze */
- int nfrozen;
/* Whether or not the page should be set all-frozen in the VM */
bool set_all_frozen;
--
2.39.2
Attachment: v8-0018-Cosmetic-fixes.patch (text/x-patch)
From a515ea3d52a0f131a71ebfca57a71159afa07dde Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed, 27 Mar 2024 23:41:15 +0200
Subject: [PATCH v8 18/22] Cosmetic fixes
---
src/backend/access/heap/heapam.c | 14 +++++++-------
src/backend/access/heap/pruneheap.c | 2 +-
2 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index aefc0be0dd3..ed4045925bd 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6762,13 +6762,13 @@ heap_execute_freeze_tuple(HeapTupleHeader tuple, HeapTupleFreeze *frz)
}
/*
-* Perform xmin/xmax XID status sanity checks before actually executing freeze
-* plans.
-*
-* heap_prepare_freeze_tuple doesn't perform these checks directly because
-* pg_xact lookups are relatively expensive. They shouldn't be repeated
-* by successive VACUUMs that each decide against freezing the same page.
-*/
+ * Perform xmin/xmax XID status sanity checks before actually executing freeze
+ * plans.
+ *
+ * heap_prepare_freeze_tuple doesn't perform these checks directly because
+ * pg_xact lookups are relatively expensive. They shouldn't be repeated
+ * by successive VACUUMs that each decide against freezing the same page.
+ */
void
heap_pre_freeze_checks(Buffer buffer,
HeapTupleFreeze *tuples, int ntuples)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index e242d752f9f..e05224c1f38 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -545,7 +545,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* will be no newly frozen tuples.
*/
presult->set_all_frozen = false;
- prstate.nfrozen = 0; /* avoid miscounts in instrumenation */
+ prstate.nfrozen = 0; /* avoid miscounts in instrumentation */
}
/* Any error while applying the changes is critical */
--
2.39.2
Attachment: v8-0019-Almost-cosmetic-fixes.patch (text/x-patch)
From 01797e2ed58855f214842f0caf9a0c790571546e Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed, 27 Mar 2024 23:44:17 +0200
Subject: [PATCH v8 19/22] Almost cosmetic fixes
---
src/backend/access/heap/pruneheap.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index e05224c1f38..07fc6b139bd 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -38,7 +38,10 @@ typedef struct
TransactionId visibility_cutoff_xid;
bool all_visible_except_removable;
- TransactionId new_prune_xid; /* new prune hint value for page */
+ /*
+ * Fields describing what to do to the page
+ */
+ TransactionId new_prune_xid; /* new prune hint value */
TransactionId latest_xid_removed;
int nredirected; /* numbers of entries in arrays below */
int ndead;
@@ -61,7 +64,7 @@ typedef struct
/*
* Tuple visibility is only computed once for each tuple, for correctness
* and efficiency reasons; see comment in heap_page_prune_and_freeze() for
- * details. This is of type int8[], instead of HTSV_Result[], so we can
+ * details. This is of type int8[], instead of HTSV_Result[], so we can
* use -1 to indicate no visibility has been computed, e.g. for LP_DEAD
* items.
*
--
2.39.2
Attachment: v8-0020-Move-frz_conflict_horizon-to-tighter-scope.patch (text/x-patch)
From 2e998811b71ff0bfc6d2d88be489ec726494cc01 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed, 27 Mar 2024 23:47:24 +0200
Subject: [PATCH v8 20/22] Move 'frz_conflict_horizon' to tighter scope
---
src/backend/access/heap/pruneheap.c | 38 ++++++++++++++---------------
1 file changed, 19 insertions(+), 19 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 07fc6b139bd..e37ba655a7d 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -273,7 +273,6 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
maxoff;
PruneState prstate;
HeapTupleData tup;
- TransactionId frz_conflict_horizon;
bool do_freeze;
bool do_prune;
bool do_hint;
@@ -391,7 +390,6 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* all-visible, so the conflict horizon remains InvalidTransactionId.
*/
presult->vm_conflict_horizon = prstate.visibility_cutoff_xid = InvalidTransactionId;
- frz_conflict_horizon = InvalidTransactionId;
maxoff = PageGetMaxOffsetNumber(page);
tup.t_tableOid = RelationGetRelid(relation);
@@ -590,24 +588,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
}
if (do_freeze)
- {
- /*
- * We can use the visibility_cutoff_xid as our cutoff for
- * conflicts when the whole page is eligible to become all-frozen
- * in the VM once we're done with it. Otherwise we generate a
- * conservative cutoff by stepping back from OldestXmin. This
- * avoids false conflicts when hot_standby_feedback is in use.
- */
- if (prstate.all_visible_except_removable && presult->set_all_frozen)
- frz_conflict_horizon = prstate.visibility_cutoff_xid;
- else
- {
- /* Avoids false conflicts when hot_standby_feedback in use */
- frz_conflict_horizon = prstate.pagefrz.cutoffs->OldestXmin;
- TransactionIdRetreat(frz_conflict_horizon);
- }
heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
- }
MarkBufferDirty(buffer);
@@ -626,8 +607,27 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* on the standby with xids older than the youngest tuple this
* record will freeze will conflict.
*/
+ TransactionId frz_conflict_horizon = InvalidTransactionId;
TransactionId conflict_xid;
+ /*
+ * We can use the visibility_cutoff_xid as our cutoff for
+ * conflicts when the whole page is eligible to become all-frozen
+ * in the VM once we're done with it. Otherwise we generate a
+ * conservative cutoff by stepping back from OldestXmin.
+ */
+ if (do_freeze)
+ {
+ if (prstate.all_visible_except_removable && presult->set_all_frozen)
+ frz_conflict_horizon = prstate.visibility_cutoff_xid;
+ else
+ {
+ /* Avoids false conflicts when hot_standby_feedback in use */
+ frz_conflict_horizon = prstate.pagefrz.cutoffs->OldestXmin;
+ TransactionIdRetreat(frz_conflict_horizon);
+ }
+ }
+
if (TransactionIdFollows(frz_conflict_horizon, prstate.latest_xid_removed))
conflict_xid = frz_conflict_horizon;
else
--
2.39.2
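In other words, the relocated logic chooses the freeze conflict horizon as
follows (sketch only, mirroring the hunk above): if the page will be marked
all-frozen in the VM, the newest live xmin on the page
(visibility_cutoff_xid) is a sufficient horizon; otherwise fall back to
OldestXmin stepped back by one, which avoids false conflicts when
hot_standby_feedback is in use.

if (do_freeze)
{
    if (prstate.all_visible_except_removable && presult->set_all_frozen)
        frz_conflict_horizon = prstate.visibility_cutoff_xid;
    else
    {
        /* conservative fallback: OldestXmin - 1 */
        frz_conflict_horizon = prstate.pagefrz.cutoffs->OldestXmin;
        TransactionIdRetreat(frz_conflict_horizon);
    }
}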
Attachment: v8-0021-Add-comment-about-a-pre-existing-issue.patch (text/x-patch)
From 2f38628373ccfb6e8f8fd883955056030092569d Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Thu, 28 Mar 2024 00:16:09 +0200
Subject: [PATCH v8 21/22] Add comment about a pre-existing issue
Not sure if we want to keep this, but I wanted to document it for
discussion at least.
---
src/backend/access/heap/pruneheap.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index e37ba655a7d..2b720ab6aa1 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -792,6 +792,23 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
* Note that we might first arrive at a dead heap-only tuple
* either here or while following a chain below. Whichever path
* gets there first will mark the tuple unused.
+ *
+ * FIXME: The next paragraph isn't new with these patches, but
+ * just something I realized while looking at this. But maybe we should
+ * add a comment like this? Or is it too much detail?
+ *
+ * Whether we arrive at the dead HOT tuple first here or while
+ * following a chain below affects whether preceding RECENTLY_DEAD
+ * tuples in the chain can be removed or not. Imagine that you
+ * have a chain with two tuples: RECENTLY_DEAD -> DEAD. If we
+ * reach the RECENTLY_DEAD tuple first, the chain-following logic
+ * will find the DEAD tuple and conclude that both tuples are in
+ * fact dead and can be removed. But if we reach the DEAD tuple
+ * at the end of the chain first, when we reach the RECENTLY_DEAD
+ * tuple later, we will not follow the chain because the DEAD
+ * TUPLE is already 'marked', and will not remove the
+ * RECENTLY_DEAD tuple. This is not a correctness issue, and the
+ * RECENTLY_DEAD tuple will be removed by a later VACUUM.
*/
if (!HeapTupleHeaderIsHotUpdated(htup))
{
--
2.39.2
Attachment: v8-0022-WIP.patch (text/x-patch)
From c2ee7508456d0e76de985f9997a6840450e342a8 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Thu, 28 Mar 2024 00:45:26 +0200
Subject: [PATCH v8 22/22] WIP
- Got rid of all_visible_except_removable again. We're back to
roughly the same mechanism as on 'master', where the all_visible
doesn't include LP_DEAD items, but at the end of
heap_page_prune_and_freeze() when we return to the caller, we clear
it if there were any LP_DEAD items. I considered calling the
variable 'all_visible_except_lp_dead', which would be more accurate,
but it's also very long.
- I duplicated all the fields from PruneFreezeResult to PruneState. Now
heap_prune_chain() and all the subroutines don't need the
PruneFreezeResult argument, and you don't need to remember which
struct each field is kept in. It's all now in PruneState, and the
fields that the caller is interested in are copied to
PruneFreezeResult at the end of heap_page_prune_and_freeze().
- Move more of the bookkeeping of live and dead tuples to the
heap_prune_record_*() subroutines.
- Replaced the 'counted' array with a 'revisit' array. I thought I could
get rid of it altogether, by just being careful to call the right
heap_prune_record_*() subroutine for each tuple in heap_prune_chain(),
but with live and recently-dead tuples that are part of a HOT chain,
we might visit the tuple as part of the HOT chain or not, depending
on its position in the chain. So I invented a new revisit
phase. All live heap-only tuples that we find, that haven't already
been processed as part of a hot chain, are stashed away in the
'revisit' array. After processing all the HOT chains, the 'revisit'
tuples are re-checked, and counted if they haven't already been counted.
- Live tuples are now also marked in the 'marked' array, when they are
counted. This gives a nice invariant: all tuples must be marked
exactly once, as part of a hot chain or otherwise. Added an
assertion for that.
---
src/backend/access/heap/pruneheap.c | 706 +++++++++++++++++----------
src/backend/access/heap/vacuumlazy.c | 6 +-
src/include/access/heapam.h | 37 +-
3 files changed, 464 insertions(+), 285 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 2b720ab6aa1..a8ed11a1858 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -32,16 +32,16 @@
/* Working data for heap_page_prune_and_freeze() and subroutines */
typedef struct
{
+ /* PRUNE_DO_* arguments */
+ uint8 actions;
+
/* tuple visibility test, initialized for the relation */
GlobalVisState *vistest;
- uint8 actions;
- TransactionId visibility_cutoff_xid;
- bool all_visible_except_removable;
/*
* Fields describing what to do to the page
*/
- TransactionId new_prune_xid; /* new prune hint value */
+ TransactionId new_prune_xid; /* new prune hint value */
TransactionId latest_xid_removed;
int nredirected; /* numbers of entries in arrays below */
int ndead;
@@ -53,14 +53,20 @@ typedef struct
OffsetNumber nowunused[MaxHeapTuplesPerPage];
HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
+ HeapPageFreeze pagefrz;
+
/*
- * marked[i] is true if item i is entered in one of the above arrays.
+ * marked[i] is true when heap_prune_chain() has already processed item i.
*
* This needs to be MaxHeapTuplesPerPage + 1 long as FirstOffsetNumber is
* 1. Otherwise every access would need to subtract 1.
*/
bool marked[MaxHeapTuplesPerPage + 1];
+ /* Live tuples stashed for later processing in heap_prune_chain() */
+ int nrevisit;
+ OffsetNumber revisit[MaxHeapTuplesPerPage];
+
/*
* Tuple visibility is only computed once for each tuple, for correctness
* and efficiency reasons; see comment in heap_page_prune_and_freeze() for
@@ -73,37 +79,73 @@ typedef struct
*/
int8 htsv[MaxHeapTuplesPerPage + 1];
- HeapPageFreeze pagefrz;
+ /*
+ * The rest of the fields are not used by pruning itself, but are used to
+ * collect information about what was pruned and what state the page is in
+ * after pruning, for the benefit of the caller. They are copied to
+ * PruneFreezeResult at the end.
+ */
+
+ int ndeleted; /* Number of tuples deleted from the page */
+
+ /* Number of live and recently dead tuples, after pruning */
+ int live_tuples;
+ int recently_dead_tuples;
+
+ /* Whether or not the page makes rel truncation unsafe */
+ bool hastup;
+
+ /*
+ * LP_DEAD items on the page after pruning. Includes existing LP_DEAD
+ * items
+ */
+ int lpdead_items; /* includes existing LP_DEAD items */
+ OffsetNumber *deadoffsets; /* points directly to PruneResult->deadoffsets */
/*
- * Whether or not this tuple has been counted toward vacuum stats. In
- * heap_prune_chain(), we have to be sure that Heap Only Tuples that are
- * not part of any chain are counted correctly.
+ * all_visible and all_frozen indicate if the all-visible and all-frozen
+ * bits in the visibility map can be set for this page, after pruning.
+ *
+ * visibility_cutoff_xid is the newest xmin of live tuples on the page.
+ * The caller can use it as the conflict horizon, when setting the VM
+ * bits. It is only valid if we froze some tuples, and all_frozen is
+ * true.
+ *
+ * These are only set if the PRUNE_DO_TRY_FREEZE action flag is set.
+ *
+ * NOTE: This 'all_visible' doesn't include LP_DEAD items. That's
+ * convenient for heap_page_prune_and_freeze(), to use this to decide
+ * whether to freeze the page or not. The 'all_visible' value returned to
+ * the caller is adjusted to include LP_DEAD items at the end.
*/
- bool counted[MaxHeapTuplesPerPage + 1];
+ bool all_visible;
+ bool all_frozen;
+ TransactionId visibility_cutoff_xid;
+
} PruneState;
/* Local functions */
static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
HeapTuple tup,
Buffer buffer);
-static int heap_prune_chain(Buffer buffer,
- OffsetNumber rootoffnum,
- PruneState *prstate, PruneFreezeResult *presult);
-
static inline HTSV_Result htsv_get_valid_status(int status);
+static void heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
+ PruneState *prstate);
+
static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
-static void heap_prune_record_redirect(Page page, PruneState *prstate,
+static void heap_prune_record_redirect(PruneState *prstate,
OffsetNumber offnum, OffsetNumber rdoffnum,
- PruneFreezeResult *presult);
+ bool was_normal);
static void heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
- PruneFreezeResult *presult);
+ bool was_normal);
static void heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
- PruneFreezeResult *presult);
-static void heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum);
+ bool was_normal);
+static void heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum, bool was_normal);
+
+static void heap_prune_record_unchanged(Page page, PruneState *prstate, OffsetNumber offnum);
+static void heap_prune_record_unchanged_lp_dead(PruneState *prstate, OffsetNumber offnum);
+static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetNumber offnum);
-static void heap_prune_record_live_or_recently_dead(Page page, PruneState *prstate,
- OffsetNumber offnum, PruneFreezeResult *presult);
static void page_verify_redirects(Page page);
@@ -242,6 +284,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
* (see heap_prune_satisfies_vacuum).
*
+ * cutoffs TODO
+ *
* presult contains output parameters needed by callers such as the number of
* tuples removed and the number of line pointers newly marked LP_DEAD.
* heap_page_prune_and_freeze() is responsible for initializing it.
@@ -326,70 +370,63 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.latest_xid_removed = InvalidTransactionId;
prstate.nredirected = prstate.ndead = prstate.nunused = prstate.nfrozen = 0;
memset(prstate.marked, 0, sizeof(prstate.marked));
- memset(prstate.counted, 0, sizeof(prstate.counted));
+ prstate.nrevisit = 0;
/*
* prstate.htsv is not initialized here because all ntuple spots in the
* array will be set either to a valid HTSV_Result value or -1.
*/
- presult->ndeleted = 0;
- presult->nnewlpdead = 0;
-
- presult->hastup = false;
- presult->live_tuples = 0;
- presult->recently_dead_tuples = 0;
- presult->lpdead_items = 0;
+ prstate.ndeleted = 0;
+ prstate.hastup = false;
+ prstate.live_tuples = 0;
+ prstate.recently_dead_tuples = 0;
+ prstate.lpdead_items = 0;
+ prstate.deadoffsets = presult->deadoffsets;
/*
- * Caller may update the VM after pruning, collecting LP_DEAD items, and
- * freezing tuples. Keep track of whether or not the page is all_visible
- * and all_frozen and use this information to update the VM. all_visible
- * implies lpdead_items == 0, but don't trust all_frozen result unless
- * all_visible is also set to true. If we won't even try freezing,
- * initialize all_frozen to false.
+ * Caller may update the VM after we're done. We keep track of whether
+ * the page will be all_visible and all_frozen, once we're done with the
+ * pruning and freezing, to help the caller to do that.
*
- * For vacuum, if the whole page will become frozen, we consider
- * opportunistically freezing tuples. Dead tuples which will be removed by
- * the end of vacuuming should not preclude us from opportunistically
- * freezing. We will not be able to freeze the whole page if there are
- * tuples present which are not visible to everyone or if there are dead
- * tuples which are not yet removable. We need all_visible to be false if
- * LP_DEAD tuples remain after pruning so that we do not incorrectly
- * update the visibility map or page hint bit. So, we will update
- * presult->all_visible to reflect the presence of LP_DEAD items while
- * pruning and keep all_visible_except_removable to permit freezing if the
- * whole page will eventually become all visible after removing tuples.
+ * Currently, only VACUUM sets the VM bits. To save the effort, only do
+ * the bookkeeping if the caller needs it. Currently, that's tied to
+ * PRUNE_DO_TRY_FREEZE, but it could be a separate flag, if you wanted to
+ * update the VM bits without also freezing, or freezing without setting
+ * the VM bits.
+ *
+ * In addition to telling the caller whether it can set the VM bit, we
+ * also use 'all_visible' and 'all_frozen' for our own decision-making. If
+ * the whole page will become frozen, we consider opportunistically
+ * freezing tuples. We will not be able to freeze the whole page if there
+ * are tuples present which are not visible to everyone or if there are
+ * dead tuples which are not yet removable. However, dead tuples which
+ * will be removed by the end of vacuuming should not preclude us from
+ * opportunistically freezing. Because of that, we do not clear
+ * all_visible when we see LP_DEAD items. We fix that at the end of the
+ * function, when we return the value to the caller, so that the caller
+ * doesn't set the VM bit incorrectly.
*/
- presult->all_visible = true;
-
if (prstate.actions & PRUNE_DO_TRY_FREEZE)
- presult->set_all_frozen = true;
+ {
+ prstate.all_visible = true;
+ prstate.all_frozen = true;
+ }
else
- presult->set_all_frozen = false;
-
- /*
- * Deliberately delay unsetting all_visible until later during pruning.
- * Removable dead tuples shouldn't preclude freezing the page. After
- * finishing this first pass of tuple visibility checks, initialize
- * all_visible_except_removable with the current value of all_visible to
- * indicate whether or not the page is all visible except for dead tuples.
- * This will allow us to attempt to freeze the page after pruning. Later
- * during pruning, if we encounter an LP_DEAD item or are setting an item
- * LP_DEAD, we will unset all_visible. As long as we unset it before
- * updating the visibility map, this will be correct.
- */
- prstate.all_visible_except_removable = true;
+ {
+ prstate.all_visible = false;
+ prstate.all_frozen = false;
+ }
/*
* The visibility cutoff xid is the newest xmin of live tuples on the
* page. In the common case, this will be set as the conflict horizon the
- * caller can use for updating the VM. If, at the end of freezing and
+ * caller can use for updating the VM. If, at the end of freezing and
* pruning, the page is all-frozen, there is no possibility that any
* running transaction on the standby does not see tuples on the page as
* all-visible, so the conflict horizon remains InvalidTransactionId.
*/
- presult->vm_conflict_horizon = prstate.visibility_cutoff_xid = InvalidTransactionId;
+ prstate.visibility_cutoff_xid = InvalidTransactionId;
maxoff = PageGetMaxOffsetNumber(page);
tup.t_tableOid = RelationGetRelid(relation);
@@ -450,7 +487,13 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*/
hint_bit_fpi = fpi_before != pgWalUsage.wal_fpi;
- /* Scan the page */
+ /*
+ * Scan the page, processing each tuple.
+ *
+ * heap_prune_chain() decides for each tuple, whether it can be pruned,
+ * redirected or frozen. It follows HOT chains, processing each HOT chain
+ * as a unit.
+ */
for (offnum = FirstOffsetNumber;
offnum <= maxoff;
offnum = OffsetNumberNext(offnum))
@@ -471,10 +514,38 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
continue;
/* Process this item or chain of items */
- presult->ndeleted += heap_prune_chain(buffer, offnum,
- &prstate, presult);
+ heap_prune_chain(buffer, offnum, &prstate);
}
+ /*
+ * If heap_prune_chain() stashed any live tuples, recheck and count them
+ * now.
+ */
+ for (int i = 0; i < prstate.nrevisit; i++)
+ {
+ offnum = prstate.revisit[i];
+ if (!prstate.marked[offnum])
+ heap_prune_record_unchanged(page, &prstate, offnum);
+ }
+
+ /* We should now have processed every tuple exactly once */
+#ifdef USE_ASSERT_CHECKING
+ for (offnum = FirstOffsetNumber;
+ offnum <= maxoff;
+ offnum = OffsetNumberNext(offnum))
+ {
+ ItemId itemid;
+
+ if (off_loc)
+ *off_loc = offnum;
+ itemid = PageGetItemId(page, offnum);
+ if (ItemIdIsUsed(itemid))
+ Assert(prstate.marked[offnum]);
+ else
+ Assert(!prstate.marked[offnum]);
+ }
+#endif
+
/* Clear the offset information once we have processed the given page. */
if (off_loc)
*off_loc = InvalidOffsetNumber;
@@ -483,9 +554,6 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.ndead > 0 ||
prstate.nunused > 0;
- /* Record number of newly-set-LP_DEAD items for caller */
- presult->nnewlpdead = prstate.ndead;
-
/*
* Even if we don't prune anything, if we found a new value for the
* pd_prune_xid field or the page was marked full, we will update the hint
@@ -509,8 +577,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (prstate.actions & PRUNE_DO_TRY_FREEZE)
{
/* Is the whole page freezable? And is there something to freeze? */
- bool whole_page_freezable = prstate.all_visible_except_removable &&
- presult->set_all_frozen;
+ bool whole_page_freezable = prstate.all_visible &&
+ prstate.all_frozen;
if (prstate.pagefrz.freeze_required)
do_freeze = true;
@@ -538,14 +606,14 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*/
if (do_freeze)
heap_pre_freeze_checks(buffer, prstate.frozen, prstate.nfrozen);
- else if (!presult->set_all_frozen || prstate.nfrozen > 0)
+ else if (!prstate.all_frozen || prstate.nfrozen > 0)
{
/*
* If we will neither freeze tuples on the page nor set the page all
* frozen in the visibility map, the page is not all-frozen and there
* will be no newly frozen tuples.
*/
- presult->set_all_frozen = false;
+ prstate.all_frozen = false;
prstate.nfrozen = 0; /* avoid miscounts in instrumentation */
}
@@ -618,7 +686,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*/
if (do_freeze)
{
- if (prstate.all_visible_except_removable && presult->set_all_frozen)
+ if (prstate.all_visible && prstate.all_frozen)
frz_conflict_horizon = prstate.visibility_cutoff_xid;
else
{
@@ -645,15 +713,52 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
END_CRIT_SECTION();
+ /* Copy data back to 'presult' */
+ presult->ndeleted = prstate.ndeleted;
+ presult->nnewlpdead = prstate.ndead;
+ presult->nfrozen = prstate.nfrozen;
+ presult->live_tuples = prstate.live_tuples;
+ presult->recently_dead_tuples = prstate.recently_dead_tuples;
+
+ /*
+ * It was convenient to ignore LP_DEAD items in all_visible earlier on to
+ * make the choice of whether or not to freeze the page unaffected by the
+ * short-term presence of LP_DEAD items. These LP_DEAD items were
+ * effectively assumed to be LP_UNUSED items in the making. It doesn't
+ * matter which heap pass (initial pass or final pass) ends up setting the
+ * page all-frozen, as long as the ongoing VACUUM does it.
+ *
+ * Now that freezing has been finalized, unset all_visible if there are
+ * any LP_DEAD items on the page. It needs to reflect the present state
+ * of things, as expected by our caller.
+ */
+ if (prstate.lpdead_items == 0)
+ {
+ presult->all_visible = prstate.all_visible;
+ presult->all_frozen = prstate.all_frozen;
+ }
+ else
+ {
+ presult->all_visible = false;
+ presult->all_frozen = false;
+ }
+ presult->hastup = prstate.hastup;
+
/*
* For callers planning to update the visibility map, the conflict horizon
- * for that record must be the newest xmin on the page. However, if the
+ * for that record must be the newest xmin on the page. However, if the
* page is completely frozen, there can be no conflict and the
- * vm_conflict_horizon should remain InvalidTransactionId.
+ * vm_conflict_horizon should remain InvalidTransactionId. This includes
+ * the case that we just froze all the tuples; the prune-freeze record
+ * included the conflict XID already so the caller doesn't need it.
*/
- if (!presult->set_all_frozen)
+ if (!presult->all_frozen)
presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
- presult->nfrozen = prstate.nfrozen;
+ else
+ presult->vm_conflict_horizon = InvalidTransactionId;
+
+ presult->lpdead_items = prstate.lpdead_items;
+ /* the presult->deadoffsets array was already filled in */
/*
* If we will freeze tuples on the page or, even if we don't freeze tuples
@@ -670,7 +775,6 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
else
*new_relfrozen_xid = prstate.pagefrz.NoFreezePageRelfrozenXid;
}
-
if (new_relmin_mxid)
{
if (presult->nfrozen > 0)
@@ -728,10 +832,12 @@ htsv_get_valid_status(int status)
* DEAD, our visibility test is just too coarse to detect it.
*
* In general, pruning must never leave behind a DEAD tuple that still has
- * tuple storage. VACUUM isn't prepared to deal with that case. That's why
+ * tuple storage. VACUUM isn't prepared to deal with that case (FIXME: it no longer cares, right?).
+ * That's why
* VACUUM prunes the same heap page a second time (without dropping its lock
* in the interim) when it sees a newly DEAD tuple that we initially saw as
- * in-progress. Retrying pruning like this can only happen when an inserting
+ * in-progress (FIXME: Really? Does it still do that?).
+ * Retrying pruning like this can only happen when an inserting
* transaction concurrently aborts.
*
* The root line pointer is redirected to the tuple immediately after the
@@ -743,15 +849,18 @@ htsv_get_valid_status(int status)
* prstate showing the changes to be made. Items to be redirected are added
* to the redirected[] array (two entries per redirection); items to be set to
* LP_DEAD state are added to nowdead[]; and items to be set to LP_UNUSED
- * state are added to nowunused[].
- *
- * Returns the number of tuples (to be) deleted from the page.
+ * state are added to nowunused[]. We perform bookkeeping of live tuples,
+ * visibility etc. based on what the page will look like after the changes
+ * applied. All that bookkeeping is performed in the heap_prune_record_*()
+ * subroutines. The division of labor is that heap_prune_chain() decides the
+ * fate of each tuple, ie. whether it's going to be removed, redirected or
+ * left unchanged, and the heap_prune_record_*() subroutines update PruneState
+ * based on that outcome.
*/
-static int
+static void
heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
- PruneState *prstate, PruneFreezeResult *presult)
+ PruneState *prstate)
{
- int ndeleted = 0;
Page dp = (Page) BufferGetPage(buffer);
TransactionId priorXmax = InvalidTransactionId;
ItemId rootlp;
@@ -794,8 +903,8 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
* gets there first will mark the tuple unused.
*
* FIXME: The next paragraph isn't new with these patches, but
- * just something I realized while looking at this. But maybe we should
- * add a comment like this? Or is it too much detail?
+ * just something I realized while looking at this. But maybe we
+ * should add a comment like this? Or is it too much detail?
*
* Whether we arrive at the dead HOT tuple first here or while
* following a chain below affects whether preceding RECENTLY_DEAD
@@ -809,43 +918,52 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
* TUPLE is already 'marked', and will not remove the
* RECENTLY_DEAD tuple. This is not a correctness issue, and the
* RECENTLY_DEAD tuple will be removed by a later VACUUM.
+ *
+ * FIXME: Now that we have the 'revisit' array, we could stash
+ * these DEAD items there too, instead of processing them here
+ * immediately. That way, DEAD tuples that are still part of a
+ * chain would always get processed as part of the chain.
*/
if (!HeapTupleHeaderIsHotUpdated(htup))
{
if (prstate->htsv[rootoffnum] == HEAPTUPLE_DEAD)
{
- heap_prune_record_unused(prstate, rootoffnum);
+ heap_prune_record_unused(prstate, rootoffnum, true);
HeapTupleHeaderAdvanceConflictHorizon(htup,
&prstate->latest_xid_removed);
- ndeleted++;
- }
- else
- {
- Assert(!prstate->marked[rootoffnum]);
-
- /*
- * MFIXME: not sure if this is right -- maybe counting too
- * many
- */
-
- /*
- * Ensure that this tuple is counted. If it is later
- * redirected to, it would have been counted then, but we
- * won't double count because we check if it has already
- * been counted first.
- */
- heap_prune_record_live_or_recently_dead(dp, prstate, rootoffnum, presult);
+ return;
}
}
+ /*
+ * This tuple might be processed as part of a tuple chain later,
+ * so we don't want to mark it as processed just yet. We'll
+ * revisit it after processing all the chains, and count it then
+ * if it's still uncounted.
+ */
+ prstate->revisit[prstate->nrevisit++] = rootoffnum;
+
/* Nothing more to do */
- return ndeleted;
+ return;
}
}
/* Start from the root tuple */
offnum = rootoffnum;
+ /*----
+ * FIXME: this helped me to visualize how different chains might look like
+ * here. It's not an exhaustive list, just some examples to help with
+ * thinking. Remove this comment from final version, or refine.
+ *
+ * REDIRECT -> LIVE (stop) -> ...
+ * REDIRECT -> RECENTY_DEAD -> LIVE (stop) -> ...
+ * REDIRECT -> RECENTY_DEAD -> RECENTLY_DEAD
+ * REDIRECT -> RECENTY_DEAD -> DEAD
+ * REDIRECT -> RECENTY_DEAD -> DEAD -> RECENTLY_DEAD -> DEAD
+ * RECENTLY_DEAD -> ...
+ */
+
/* while not end of the chain */
for (;;)
{
@@ -897,19 +1015,12 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
{
/*
* If the caller set PRUNE_DO_MARK_UNUSED_NOW, we can set dead
- * line pointers LP_UNUSED now. We don't increment ndeleted here
- * since the LP was already marked dead. If it will not be marked
- * LP_UNUSED, it will remain LP_DEAD, making the page not
- * all_visible.
+ * line pointers LP_UNUSED now.
*/
if (unlikely(prstate->actions & PRUNE_DO_MARK_UNUSED_NOW))
- heap_prune_record_unused(prstate, offnum);
+ heap_prune_record_unused(prstate, offnum, false);
else
- {
- presult->all_visible = false;
- presult->deadoffsets[presult->lpdead_items++] = offnum;
- }
-
+ heap_prune_record_unchanged_lp_dead(prstate, offnum);
break;
}
@@ -941,34 +1052,11 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
case HEAPTUPLE_RECENTLY_DEAD:
recent_dead = true;
-
- /*
- * This tuple may soon become DEAD. Update the hint field so
- * that the page is reconsidered for pruning in future.
- */
- heap_prune_record_prunable(prstate,
- HeapTupleHeaderGetUpdateXid(htup));
break;
case HEAPTUPLE_DELETE_IN_PROGRESS:
-
- /*
- * This tuple may soon become DEAD. Update the hint field so
- * that the page is reconsidered for pruning in future.
- */
- heap_prune_record_prunable(prstate,
- HeapTupleHeaderGetUpdateXid(htup));
- break;
-
case HEAPTUPLE_LIVE:
case HEAPTUPLE_INSERT_IN_PROGRESS:
-
- /*
- * If we wanted to optimize for aborts, we might consider
- * marking the page prunable when we see INSERT_IN_PROGRESS.
- * But we don't. See related decisions about when to mark the
- * page prunable in heapam.c.
- */
break;
default:
@@ -981,7 +1069,18 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
* RECENTLY_DEAD tuples just in case there's a DEAD one after them;
* but we can't advance past anything else. We have to make sure that
* we don't miss any DEAD tuples, since DEAD tuples that still have
- * tuple storage after pruning will confuse VACUUM.
+ * tuple storage after pruning will confuse VACUUM (FIXME: not anymore
+ * I think?).
+ *
+ * FIXME: Not a new issue, but spotted it now : If there is a chain
+ * like RECENTLY_DEAD -> DEAD, we will remove both tuples, but will
+ * not call HeapTupleHeaderAdvanceConflictHorizon() for the
+ * RECENTLY_DEAD tuple. Is that OK? I think it is. In a HOT chain,
+ * we know that the later tuple committed before any earlier tuples in
+ * the chain, therefore it ought to be enough to set the conflict
+ * horizon based on the later tuple. If all snapshots on the standby
+ * see the deleter of the last tuple as committed, they must consider
+ * all the earlier ones as committed too.
*/
if (tupdead)
{
@@ -1026,18 +1125,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
* right candidate for redirection.
*/
for (i = 1; (i < nchain) && (chainitems[i - 1] != latestdead); i++)
- {
- heap_prune_record_unused(prstate, chainitems[i]);
- ndeleted++;
- }
-
- /*
- * If the root entry had been a normal tuple, we are deleting it, so
- * count it in the result. But changing a redirect (even to DEAD
- * state) doesn't count.
- */
- if (ItemIdIsNormal(rootlp))
- ndeleted++;
+ heap_prune_record_unused(prstate, chainitems[i], true);
/*
* If the DEAD tuple is at the end of the chain, the entire chain is
@@ -1045,31 +1133,41 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
* redirect the root to the correct chain member.
*/
if (i >= nchain)
- heap_prune_record_dead_or_unused(prstate, rootoffnum, presult);
+ heap_prune_record_dead_or_unused(prstate, rootoffnum, ItemIdIsNormal(rootlp));
else
- heap_prune_record_redirect(dp, prstate, rootoffnum, chainitems[i], presult);
+ {
+ heap_prune_record_redirect(prstate, rootoffnum, chainitems[i], ItemIdIsNormal(rootlp));
+
+ /* the rest of tuples in the chain are normal, unchanged tuples */
+ for (; i < nchain; i++)
+ heap_prune_record_unchanged(dp, prstate, chainitems[i]);
+ }
}
- else if (nchain < 2 && ItemIdIsRedirected(rootlp))
+ else
{
- /*
- * We found a redirect item that doesn't point to a valid follow-on
- * item. This can happen if the loop in heap_page_prune_and_freeze()
- * caused us to visit the dead successor of a redirect item before
- * visiting the redirect item. We can clean up by setting the
- * redirect item to DEAD state or LP_UNUSED if the caller indicated.
- */
- heap_prune_record_dead_or_unused(prstate, rootoffnum, presult);
+ i = 0;
+ if (ItemIdIsRedirected(rootlp))
+ {
+ if (nchain < 2)
+ {
+ /*
+ * We found a redirect item that doesn't point to a valid
+ * follow-on item. This can happen if the loop in
+ * heap_page_prune_and_freeze() caused us to visit the dead
+ * successor of a redirect item before visiting the redirect
+ * item. We can clean up by setting the redirect item to DEAD
+ * state or LP_UNUSED if the caller indicated.
+ */
+ heap_prune_record_dead_or_unused(prstate, rootoffnum, false);
+ }
+ else
+ heap_prune_record_unchanged_lp_redirect(prstate, rootoffnum);
+ i++;
+ }
+ /* the rest of tuples in the chain are normal, unchanged tuples */
+ for (; i < nchain; i++)
+ heap_prune_record_unchanged(dp, prstate, chainitems[i]);
}
-
- /*
- * If not marked for pruning, consider if the tuple should be counted as
- * live or recently dead. Note that line pointers redirected to will
- * already have been counted.
- */
- if (ItemIdIsNormal(rootlp) && !prstate->marked[rootoffnum])
- heap_prune_record_live_or_recently_dead(dp, prstate, rootoffnum, presult);
-
- return ndeleted;
}
/* Record lowest soon-prunable XID */
@@ -1088,43 +1186,69 @@ heap_prune_record_prunable(PruneState *prstate, TransactionId xid)
/* Record line pointer to be redirected */
static void
-heap_prune_record_redirect(Page page, PruneState *prstate,
+heap_prune_record_redirect(PruneState *prstate,
OffsetNumber offnum, OffsetNumber rdoffnum,
- PruneFreezeResult *presult)
+ bool was_normal)
{
+ Assert(!prstate->marked[offnum]);
+ prstate->marked[offnum] = true;
+
+ /*
+ * Do not mark the redirect target here. It needs to be counted
+ * separately as an unchanged tuple.
+ */
+
Assert(prstate->nredirected < MaxHeapTuplesPerPage);
prstate->redirected[prstate->nredirected * 2] = offnum;
prstate->redirected[prstate->nredirected * 2 + 1] = rdoffnum;
- heap_prune_record_live_or_recently_dead(page, prstate, rdoffnum, presult);
prstate->nredirected++;
- Assert(!prstate->marked[offnum]);
- prstate->marked[offnum] = true;
- Assert(!prstate->marked[rdoffnum]);
- prstate->marked[rdoffnum] = true;
- presult->hastup = true;
+ /*
+ * If the root entry had been a normal tuple, we are deleting it, so count
+ * it in the result. But changing a redirect (even to DEAD state) doesn't
+ * count.
+ */
+ if (was_normal)
+ prstate->ndeleted++;
+
+ prstate->hastup = true;
}
/* Record line pointer to be marked dead */
static void
heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
- PruneFreezeResult *presult)
+ bool was_normal)
{
+ Assert(!prstate->marked[offnum]);
+ prstate->marked[offnum] = true;
+
Assert(prstate->ndead < MaxHeapTuplesPerPage);
prstate->nowdead[prstate->ndead] = offnum;
prstate->ndead++;
- Assert(!prstate->marked[offnum]);
- prstate->marked[offnum] = true;
/*
- * Setting the line pointer LP_DEAD means the page will definitely not be
- * all_visible.
+ * Deliberately delay unsetting all_visible until later during pruning.
+ * Removable dead tuples shouldn't preclude freezing the page. After
+ * finishing this first pass of tuple visibility checks, initialize
+ * all_visible_except_removable with the current value of all_visible to
+ * indicate whether or not the page is all visible except for dead tuples.
+ * This will allow us to attempt to freeze the page after pruning. Later
+ * during pruning, if we encounter an LP_DEAD item or are setting an item
+ * LP_DEAD, we will unset all_visible. As long as we unset it before
+ * updating the visibility map, this will be correct.
*/
- presult->all_visible = false;
/* Record the dead offset for vacuum */
- presult->deadoffsets[presult->lpdead_items++] = offnum;
+ prstate->deadoffsets[prstate->lpdead_items++] = offnum;
+
+ /*
+ * If the root entry had been a normal tuple, we are deleting it, so count
+ * it in the result. But changing a redirect (even to DEAD state) doesn't
+ * count.
+ */
+ if (was_normal)
+ prstate->ndeleted++;
}
/*
@@ -1135,7 +1259,7 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
*/
static void
heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
- PruneFreezeResult *presult)
+ bool was_normal)
{
/*
* If the caller set PRUNE_DO_MARK_UNUSED_NOW, we can remove dead tuples
@@ -1144,57 +1268,45 @@ heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
* likely.
*/
if (unlikely(prstate->actions & PRUNE_DO_MARK_UNUSED_NOW))
- heap_prune_record_unused(prstate, offnum);
+ heap_prune_record_unused(prstate, offnum, was_normal);
else
- heap_prune_record_dead(prstate, offnum, presult);
+ heap_prune_record_dead(prstate, offnum, was_normal);
}
/* Record line pointer to be marked unused */
static void
-heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum)
+heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum, bool was_normal)
{
+ Assert(!prstate->marked[offnum]);
+ prstate->marked[offnum] = true;
+
Assert(prstate->nunused < MaxHeapTuplesPerPage);
prstate->nowunused[prstate->nunused] = offnum;
prstate->nunused++;
- Assert(!prstate->marked[offnum]);
- prstate->marked[offnum] = true;
+
+ /*
+ * If the root entry had been a normal tuple, we are deleting it, so count
+ * it in the result. But changing a redirect (even to DEAD state) doesn't
+ * count.
+ */
+ if (was_normal)
+ prstate->ndeleted++;
}
+
+/*
+ * Record line pointer that is left unchanged. We consider freezing it, and
+ * update bookkeeping of tuple counts and page visibility.
+ */
static void
-heap_prune_record_live_or_recently_dead(Page page, PruneState *prstate, OffsetNumber offnum,
- PruneFreezeResult *presult)
+heap_prune_record_unchanged(Page page, PruneState *prstate, OffsetNumber offnum)
{
- HTSV_Result status;
HeapTupleHeader htup;
- bool totally_frozen;
-
- /* This could happen for items which are redirected to. */
- if (prstate->counted[offnum])
- return;
-
- prstate->counted[offnum] = true;
-
- /*
- * If we don't want to do any of the special defined actions, we don't
- * need to continue.
- */
- if (prstate->actions == 0)
- return;
- status = htsv_get_valid_status(prstate->htsv[offnum]);
+ Assert(!prstate->marked[offnum]);
+ prstate->marked[offnum] = true;
- Assert(status != HEAPTUPLE_DEAD);
-
- /*
- * Deliberately don't set hastup for LP_DEAD items. We make the soft
- * assumption that any LP_DEAD items encountered here will become
- * LP_UNUSED later on, before count_nondeletable_pages is reached. If we
- * don't make this assumption then rel truncation will only happen every
- * other VACUUM, at most. Besides, VACUUM must treat
- * hastup/nonempty_pages as provisional no matter how LP_DEAD items are
- * handled (handled here, or handled later on).
- */
- presult->hastup = true;
+ prstate->hastup = true; /* the page is not empty */
/*
* The criteria for counting a tuple as live in this block need to match
@@ -1206,15 +1318,6 @@ heap_prune_record_live_or_recently_dead(Page page, PruneState *prstate, OffsetNu
* can't run inside a transaction block, which makes some cases impossible
* (e.g. in-progress insert from the same transaction).
*
- * We treat LP_DEAD items (which are the closest thing to DEAD tuples that
- * might be seen here) differently, too: we assume that they'll become
- * LP_UNUSED before VACUUM finishes. This difference is only superficial.
- * VACUUM effectively agrees with ANALYZE about DEAD items, in the end.
- * VACUUM won't remember LP_DEAD items, but only because they're not
- * supposed to be left behind when it is done. (Cases where we bypass
- * index vacuuming will violate this optimistic assumption, but the
- * overall impact of that should be negligible.)
- *
* HEAPTUPLE_LIVE tuples are naturally counted as live. This is also what
* acquire_sample_rows() does.
*
@@ -1224,10 +1327,21 @@ heap_prune_record_live_or_recently_dead(Page page, PruneState *prstate, OffsetNu
* ensure the math works out. The assumption that the transaction will
* commit and update the counters after we report is a bit shaky; but it
* is what acquire_sample_rows() does, so we do the same to be consistent.
+ *
+ * HEAPTUPLE_DEAD are handled by the other heap_prune_record_*()
+ * subroutines. They don't count dead items like acquire_sample_rows()
+ * does, because we assume that all dead items will become LP_UNUSED
+ * before VACUUM finishes. This difference is only superficial. VACUUM
+ * effectively agrees with ANALYZE about DEAD items, in the end. VACUUM
+ * won't remember LP_DEAD items, but only because they're not supposed to
+ * be left behind when it is done. (Cases where we bypass index vacuuming
+ * will violate this optimistic assumption, but the overall impact of that
+ * should be negligible.) FIXME: I don't understand that last sentence in
+ * parens. It was copied from elsewhere.
*/
htup = (HeapTupleHeader) PageGetItem(page, PageGetItemId(page, offnum));
- switch (status)
+ switch (prstate->htsv[offnum])
{
case HEAPTUPLE_LIVE:
@@ -1235,7 +1349,7 @@ heap_prune_record_live_or_recently_dead(Page page, PruneState *prstate, OffsetNu
* Count it as live. Not only is this natural, but it's also what
* acquire_sample_rows() does.
*/
- presult->live_tuples++;
+ prstate->live_tuples++;
/*
* Is the tuple definitely visible to all transactions?
@@ -1245,14 +1359,13 @@ heap_prune_record_live_or_recently_dead(Page page, PruneState *prstate, OffsetNu
* See SetHintBits for more info. Check that the tuple is hinted
* xmin-committed because of that.
*/
- if (prstate->all_visible_except_removable)
+ if (prstate->all_visible)
{
TransactionId xmin;
if (!HeapTupleHeaderXminCommitted(htup))
{
- prstate->all_visible_except_removable = false;
- presult->all_visible = false;
+ prstate->all_visible = false;
break;
}
@@ -1269,8 +1382,7 @@ heap_prune_record_live_or_recently_dead(Page page, PruneState *prstate, OffsetNu
Assert(prstate->pagefrz.cutoffs);
if (!TransactionIdPrecedes(xmin, prstate->pagefrz.cutoffs->OldestXmin))
{
- prstate->all_visible_except_removable = false;
- presult->all_visible = false;
+ prstate->all_visible = false;
break;
}
@@ -1280,6 +1392,7 @@ heap_prune_record_live_or_recently_dead(Page page, PruneState *prstate, OffsetNu
prstate->visibility_cutoff_xid = xmin;
}
break;
+
case HEAPTUPLE_RECENTLY_DEAD:
/*
@@ -1287,10 +1400,35 @@ heap_prune_record_live_or_recently_dead(Page page, PruneState *prstate, OffsetNu
* relation. (We only remove items that are LP_DEAD from
* pruning.)
*/
- presult->recently_dead_tuples++;
- prstate->all_visible_except_removable = false;
- presult->all_visible = false;
+ prstate->recently_dead_tuples++;
+ prstate->all_visible = false;
+
+ /*
+ * This tuple may soon become DEAD. Update the hint field so that
+ * the page is reconsidered for pruning in future.
+ */
+ heap_prune_record_prunable(prstate,
+ HeapTupleHeaderGetUpdateXid(htup));
break;
+
+ case HEAPTUPLE_DELETE_IN_PROGRESS:
+
+ /*
+ * This an expected case during concurrent vacuum. Count such rows
+ * as live. As above, we assume the deleting transaction will
+ * commit and update the counters after we report.
+ */
+ prstate->live_tuples++;
+ prstate->all_visible = false;
+
+ /*
+ * This tuple may soon become DEAD. Update the hint field so that
+ * the page is reconsidered for pruning in future.
+ */
+ heap_prune_record_prunable(prstate,
+ HeapTupleHeaderGetUpdateXid(htup));
+ break;
+
case HEAPTUPLE_INSERT_IN_PROGRESS:
/*
@@ -1300,22 +1438,23 @@ heap_prune_record_live_or_recently_dead(Page page, PruneState *prstate, OffsetNu
* assumption is a bit shaky, but it is what acquire_sample_rows()
* does, so be consistent.
*/
- prstate->all_visible_except_removable = false;
- presult->all_visible = false;
- break;
- case HEAPTUPLE_DELETE_IN_PROGRESS:
+ prstate->all_visible = false;
/*
- * This an expected case during concurrent vacuum. Count such rows
- * as live. As above, we assume the deleting transaction will
- * commit and update the counters after we report.
+ * If we wanted to optimize for aborts, we might consider marking
+ * the page prunable when we see INSERT_IN_PROGRESS. But we
+ * don't. See related decisions about when to mark the page
+ * prunable in heapam.c.
*/
- presult->live_tuples++;
- prstate->all_visible_except_removable = false;
- presult->all_visible = false;
break;
+
default:
- elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
+
+ /*
+ * DEAD tuples should've been passed to heap_prune_record_dead()
+ * or heap_prune_record_unused() instead.
+ */
+ elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result %d", prstate->htsv[offnum]);
break;
}
@@ -1323,12 +1462,14 @@ heap_prune_record_live_or_recently_dead(Page page, PruneState *prstate, OffsetNu
if (prstate->actions & PRUNE_DO_TRY_FREEZE)
{
/* Tuple with storage -- consider need to freeze */
+ bool totally_frozen;
+
if ((heap_prepare_freeze_tuple(htup, &prstate->pagefrz,
- &prstate->frozen[presult->nfrozen],
+ &prstate->frozen[prstate->nfrozen],
&totally_frozen)))
{
/* Save prepared freeze plan for later */
- prstate->frozen[presult->nfrozen++].offset = offnum;
+ prstate->frozen[prstate->nfrozen++].offset = offnum;
}
/*
@@ -1337,9 +1478,50 @@ heap_prune_record_live_or_recently_dead(Page page, PruneState *prstate, OffsetNu
* definitely cannot be set all-frozen in the visibility map later on
*/
if (!totally_frozen)
- presult->set_all_frozen = false;
+ prstate->all_frozen = false;
}
+}
+/*
+ * Record line pointer that was already LP_DEAD and is left unchanged.
+ */
+static void
+heap_prune_record_unchanged_lp_dead(PruneState *prstate, OffsetNumber offnum)
+{
+ /*
+ * Deliberately don't set hastup for LP_DEAD items. We make the soft
+ * assumption that any LP_DEAD items encountered here will become
+ * LP_UNUSED later on, before count_nondeletable_pages is reached. If we
+ * don't make this assumption then rel truncation will only happen every
+ * other VACUUM, at most. Besides, VACUUM must treat
+ * hastup/nonempty_pages as provisional no matter how LP_DEAD items are
+ * handled (handled here, or handled later on).
+ *
+ * Similarly, don't unset all_visible until later, at the end of
+ * heap_page_prune_and_freeze(). This will allow us to attempt to freeze
+ * the page after pruning. As long as we unset it before updating the
+ * visibility map, this will be correct.
+ */
+
+ Assert(!prstate->marked[offnum]);
+ prstate->marked[offnum] = true;
+
+ prstate->deadoffsets[prstate->lpdead_items++] = offnum;
+}
+
+static void
+heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetNumber offnum)
+{
+ /*
+ * A redirect line pointer doesn't count as a live tuple.
+ *
+ * If we leave a redirect line pointer in place, there will be another
+ * tuple on the page that it points to. We will do the bookkeeping for
+ * that separately. So we have nothing to do here, except remember that we
+ * processed this item.
+ */
+ Assert(!prstate->marked[offnum]);
+ prstate->marked[offnum] = true;
}
/*
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 04e86347a0b..92e02863e2d 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1462,7 +1462,7 @@ lazy_scan_prune(LVRelState *vacrel,
&debug_cutoff, &debug_all_frozen))
Assert(false);
- Assert(presult.set_all_frozen == debug_all_frozen);
+ Assert(presult.all_frozen == debug_all_frozen);
Assert(!TransactionIdIsValid(debug_cutoff) ||
debug_cutoff == presult.vm_conflict_horizon);
@@ -1517,7 +1517,7 @@ lazy_scan_prune(LVRelState *vacrel,
{
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
- if (presult.set_all_frozen)
+ if (presult.all_frozen)
{
Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
flags |= VISIBILITYMAP_ALL_FROZEN;
@@ -1588,7 +1588,7 @@ lazy_scan_prune(LVRelState *vacrel,
* true, so we must check both all_visible and all_frozen.
*/
else if (all_visible_according_to_vm && presult.all_visible &&
- presult.set_all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
+ presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
{
/*
* Avoid relying on all_visible_according_to_vm as a proxy for the
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index ef61e0277ee..a1765886447 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -224,37 +224,34 @@ typedef struct PruneFreezeResult
int nnewlpdead; /* Number of newly LP_DEAD items */
int nfrozen; /* Number of tuples we froze */
- /*
- * The rest of the fields in PruneFreezeResult are only guaranteed to be
- * initialized if heap_page_prune_and_freeze() is passed
- * PRUNE_DO_TRY_FREEZE.
- */
- /* Whether or not the page should be set all-frozen in the VM */
- bool set_all_frozen;
-
- /* Number of live and recently dead tuples */
+ /* Number of live and recently dead tuples on the page, after pruning */
int live_tuples;
int recently_dead_tuples;
/*
- * Whether or not the page is truly all-visible after pruning. If there
- * are LP_DEAD items on the page which cannot be removed until vacuum's
- * second pass, this will be false.
+ * Whether or not the page makes rel truncation unsafe
+ *
+ * This is set to 'true', even if the page contains LP_DEAD items. VACUUM
+ * will remove them before attempting to truncate.
*/
- bool all_visible;
-
-
- /* Whether or not the page makes rel truncation unsafe */
bool hastup;
/*
- * If the page is all-visible and not all-frozen this is the oldest xid
- * that can see the page as all-visible. It is to be used as the snapshot
- * conflict horizon when emitting a XLOG_HEAP2_VISIBLE record.
+ * all_visible and all_frozen indicate if the all-visible and all-frozen
+ * bits in the visibility map can be set for this page, after pruning.
+ *
+ * vm_conflict_horizon is the newest xmin of live tuples on the page. The
+ * caller can use it as the conflict horizon, when setting the VM bits.
+ * It is only valid if we froze some tuples, and all_frozen is true.
+ *
+ * These are only set if the PRUNE_DO_TRY_FREEZE action flag is set.
*/
+ bool all_visible;
+ bool all_frozen;
TransactionId vm_conflict_horizon;
- int lpdead_items; /* includes existing LP_DEAD items */
+ /* LP_DEAD items on the page after pruning. Includes existing LP_DEAD items */
+ int lpdead_items;
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
} PruneFreezeResult;
--
2.39.2
On Thu, Mar 28, 2024 at 01:04:04AM +0200, Heikki Linnakangas wrote:
On 27/03/2024 20:26, Melanie Plageman wrote:
On Wed, Mar 27, 2024 at 12:18 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
On 27/03/2024 17:18, Melanie Plageman wrote:
I need some way to modify the control flow or accounting such that I
know which HEAPTUPLE_RECENTLY_DEAD tuples will not be marked LP_DEAD.
And a way to consider freezing and do live tuple accounting for these
and HEAPTUPLE_LIVE tuples exactly once.

Just a quick update: I've been massaging this some more today, and I
think I'm onto something palatable. I'll send an updated patch later
today, but the key is to note that for each item on the page, there is
one point where we determine the fate of the item, whether it's pruned
or not. That can happen at different points in heap_page_prune().
That's also when we set marked[offnum] = true. Whenever that happens, we
call one of the heap_prune_record_*() subroutines. We already
have those subroutines for when a tuple is marked as dead or unused, but
let's add similar subroutines for the case that we're leaving the tuple
unchanged. If we move all the bookkeeping logic to those subroutines, we
can ensure that it gets done exactly once for each tuple, and at that
point we know what we are going to do to the tuple, so we can count it
correctly. So heap_prune_chain() decides what to do with each tuple, and
ensures that each tuple is marked only once, and the subroutines update
all the variables, add the item to the correct arrays etc. depending on
what we're doing with it.

Yes, this would be ideal.
Well, that took me a lot longer than expected. My approach of "make sure you
call the right heap_prune_record_*() subroutine in all cases" didn't work out
quite as easily as I thought. Because, as you pointed out, it's difficult to
know if a non-DEAD tuple that is part of a HOT chain will be visited later
as part of the chain processing, or needs to be counted at the top of
heap_prune_chain().

The solution I came up with is to add a third phase to pruning. At the top
of heap_prune_chain(), if we see a live heap-only tuple, and we're not sure
if it will be counted later as part of a HOT chain, we stash it away and
revisit it later, after processing all the hot chains. That's somewhat
similar to your 'counted' array, but not quite.
I like this idea (the third phase). I've just started reviewing this but
I thought I would give the initial thoughts I had inline.
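
To make sure I'm reading the new control flow correctly, here is a
stripped-down, standalone sketch of how I understand the revisit mechanism.
This is toy code only -- the page layout, item numbers and helper names are
all made up for illustration, it is not the real pruneheap.c:

    #include <assert.h>
    #include <stdbool.h>
    #include <stdio.h>

    #define NITEMS 8

    /* Toy page: which items are heap-only tuples (no chain rooted at them). */
    static const bool heap_only[NITEMS] =
        {false, false, true, false, false, true, false, false};

    static bool marked[NITEMS];
    static int  revisit[NITEMS];
    static int  nrevisit = 0;
    static int  live_tuples = 0;

    static void
    record_unchanged(int item)
    {
        assert(!marked[item]);      /* each item's fate is decided exactly once */
        marked[item] = true;
        live_tuples++;
    }

    static void
    prune_chain(int rootitem)
    {
        if (heap_only[rootitem])
        {
            /* Might still be reached from a chain rooted elsewhere: stash it. */
            revisit[nrevisit++] = rootitem;
            return;
        }

        /* Pretend the chain rooted at item 4 redirects to heap-only item 2. */
        if (rootitem == 4)
        {
            marked[rootitem] = true;    /* the redirect line pointer itself */
            record_unchanged(2);        /* chain member, counted with the chain */
            return;
        }

        record_unchanged(rootitem);     /* ordinary single-tuple "chain" */
    }

    int
    main(void)
    {
        /* First pass: chains decide the fate of every item they reach. */
        for (int i = 0; i < NITEMS; i++)
            prune_chain(i);

        /* Third phase: count stashed tuples no chain got to (item 5 here). */
        for (int i = 0; i < nrevisit; i++)
            if (!marked[revisit[i]])
                record_unchanged(revisit[i]);

        /* The invariant the new assertion enforces: marked exactly once. */
        for (int i = 0; i < NITEMS; i++)
            assert(marked[i]);

        printf("live_tuples = %d\n", live_tuples);
        return 0;
    }

The point being that every item gets exactly one heap_prune_record_*() call,
either from the chain that reaches it or from the revisit loop, which is what
the new assertion checks.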
One change with this is that live_tuples and many of the other fields are
now again updated, even if the caller doesn't need them. It was hard to skip
them in a way that would save any cycles, with the other refactorings.
I am worried we are writing checks that are going to have to come out of
SELECT queries' bank accounts, but I'll do some benchmarking when we're
all done with major refactoring.
From 2f38628373ccfb6e8f8fd883955056030092569d Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Thu, 28 Mar 2024 00:16:09 +0200
Subject: [PATCH v8 21/22] Add comment about a pre-existing issue

Not sure if we want to keep this, but I wanted to document it for
discussion at least.
---
src/backend/access/heap/pruneheap.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index e37ba655a7d..2b720ab6aa1 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -792,6 +792,23 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
 * Note that we might first arrive at a dead heap-only tuple
 * either here or while following a chain below. Whichever path
 * gets there first will mark the tuple unused.
+ *
+ * FIXME: The next paragraph isn't new with these patches, but
+ * just something I realized while looking at this. But maybe we should
+ * add a comment like this? Or is it too much detail?
I think a comment is a good idea.
+ *
+ * Whether we arrive at the dead HOT tuple first here or while
+ * following a chain below affects whether preceding RECENTLY_DEAD
+ * tuples in the chain can be removed or not. Imagine that you
+ * have a chain with two tuples: RECENTLY_DEAD -> DEAD. If we
+ * reach the RECENTLY_DEAD tuple first, the chain-following logic
+ * will find the DEAD tuple and conclude that both tuples are in
+ * fact dead and can be removed. But if we reach the DEAD tuple
+ * at the end of the chain first, when we reach the RECENTLY_DEAD
+ * tuple later, we will not follow the chain because the DEAD
+ * TUPLE is already 'marked', and will not remove the
+ * RECENTLY_DEAD tuple. This is not a correctness issue, and the
+ * RECENTLY_DEAD tuple will be removed by a later VACUUM.
 */
 if (!HeapTupleHeaderIsHotUpdated(htup))
Is this intentional? Like would it be correct to remove the
RECENTLY_DEAD tuple during the current vacuum?
From c2ee7508456d0e76de985f9997a6840450e342a8 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Thu, 28 Mar 2024 00:45:26 +0200
Subject: [PATCH v8 22/22] WIP

- Got rid of all_visible_except_removable again. We're back to
roughly the same mechanism as on 'master', where the all_visible
doesn't include LP_DEAD items, but at the end of
heap_page_prune_and_freeze() when we return to the caller, we clear
it if there were any LP_DEAD items. I considered calling the
variable 'all_visible_except_lp_dead', which would be more accurate,
but it's also very long.
not longer than all_visible_except_removable. I would be happy to keep
it more exact, but I'm also okay with just all_visible.
- I duplicated all the fields from PageFreezeResult to PruneState. Now
heap_prune_chain() and all the subroutines don't need the
PageFreezeResult argument, and you don't need to remember which
struct each field is kept in. It's all now in PruneState, and the
fields that the caller is interested in are copied to
PageFreezeResult at the end of heap_page_prune_and_freeze()
yea, this makes sense to me. Makes me wonder if we shouldn't just have
PruneFreezeResult->live_tuples/recently_dead_tuples/etc be pointers and
then lazy_scan_prune() can pass the actual vacrel->live_tuples counter
and heap_page_prune_and_freeze() can increment it itself. Maybe that's
weird though.
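
Roughly what I have in mind, as a hypothetical sketch -- none of these names
exist, it's just to show the shape of the idea:

    #include <stdio.h>

    /*
     * The caller wires the result struct up to its own accumulators, and the
     * prune/freeze code bumps them directly, so nothing is copied back at the
     * end of heap_page_prune_and_freeze().
     */
    typedef struct
    {
        int   *live_tuples;            /* e.g. &vacrel->live_tuples */
        int   *recently_dead_tuples;   /* NULL if the caller doesn't care */
    } PruneCounters;

    static void
    record_live(PruneCounters *counters)
    {
        if (counters->live_tuples)
            (*counters->live_tuples)++;
    }

    int
    main(void)
    {
        int           vacrel_live_tuples = 0;  /* stand-in for vacrel->live_tuples */
        PruneCounters counters = {&vacrel_live_tuples, NULL};

        record_live(&counters);
        record_live(&counters);
        printf("vacrel live_tuples = %d\n", vacrel_live_tuples);
        return 0;
    }

A NULL pointer could then double as "caller doesn't need this", though it does
add a branch per tuple, which is the same worry as above.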
- Replaced the 'counted' array with 'revisit' array. I thought I could
get rid of it altogether, by just being careful to call the right
heap_prune_record_*() subroutine for each tuple in heap_prune_chain(),
but with live and recently-dead tuples that are part of a HOT chain,
we might visit the tuple as part of the HOT chain or not, depending
on what it's position in the chain is. So I invented a new revisit
phase. All live heap-only tuples that we find, that haven't already
been processed as part of a hot chain, are stashed away in the
'revisit' array. After processing all the HOT chains, the 'revisit'
tuples are re-checked, and counted if they haven't already been counted.
makes sense.
- Live tuples are now also marked in the 'marked' array, when they are
counted. This gives a nice invariant: all tuples must be marked
exactly once, as part of a hot chain or otherwise. Added an
assertion for that.
this is a nice thing to have.
---
src/backend/access/heap/pruneheap.c | 706 +++++++++++++++++----------
src/backend/access/heap/vacuumlazy.c | 6 +-
src/include/access/heapam.h | 37 +-
3 files changed, 464 insertions(+), 285 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 2b720ab6aa1..a8ed11a1858 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
-static void heap_prune_record_live_or_recently_dead(Page page, PruneState *prstate,
- OffsetNumber offnum, PruneFreezeResult *presult);
static void page_verify_redirects(Page page);

@@ -242,6 +284,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
* (see heap_prune_satisfies_vacuum).
*
What's this "cutoffs TODO"?
+ * cutoffs TODO
+ *
* presult contains output parameters needed by callers such as the number of
* tuples removed and the number of line pointers newly marked LP_DEAD.
* heap_page_prune_and_freeze() is responsible for initializing it.
@@ -326,70 +370,63 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.latest_xid_removed = InvalidTransactionId;
prstate.nredirected = prstate.ndead = prstate.nunused = prstate.nfrozen = 0;
memset(prstate.marked, 0, sizeof(prstate.marked));
- memset(prstate.counted, 0, sizeof(prstate.counted));
+ prstate.nrevisit = 0;
if (presult->nfrozen > 0)

@@ -728,10 +832,12 @@ htsv_get_valid_status(int status)
* DEAD, our visibility test is just too coarse to detect it.
*
* In general, pruning must never leave behind a DEAD tuple that still has
- * tuple storage. VACUUM isn't prepared to deal with that case. That's why
+ * tuple storage. VACUUM isn't prepared to deal with that case (FIXME: it no longer cares, right?).
+ * That's why
* VACUUM prunes the same heap page a second time (without dropping its lock
* in the interim) when it sees a newly DEAD tuple that we initially saw as
- * in-progress. Retrying pruning like this can only happen when an inserting
+ * in-progress (FIXME: Really? Does it still do that?).
Yea, I think all that is no longer true. I missed this comment back
then.
+ * Retrying pruning like this can only happen when an inserting
* transaction concurrently aborts.
*
* The root line pointer is redirected to the tuple immediately after the
@@ -743,15 +849,18 @@ htsv_get_valid_status(int status)
* prstate showing the changes to be made. Items to be redirected are added
* to the redirected[] array (two entries per redirection); items to be set to
* LP_DEAD state are added to nowdead[]; and items to be set to LP_UNUSED
- * state are added to nowunused[].
- *
- * Returns the number of tuples (to be) deleted from the page.
+ * state are added to nowunused[]. We perform bookkeeping of live tuples,
+ * visibility etc. based on what the page will look like after the changes
+ * applied. All that bookkeeping is performed in the heap_prune_record_*()
+ * subroutines. The division of labor is that heap_prune_chain() decides the
+ * fate of each tuple, ie. whether it's going to be removed, redirected or
+ * left unchanged, and the heap_prune_record_*() subroutines update PruneState
+ * based on that outcome.
 */
-static int
+static void
heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
- PruneState *prstate, PruneFreezeResult *presult)
+ PruneState *prstate)
{
- int ndeleted = 0;
Page dp = (Page) BufferGetPage(buffer);
TransactionId priorXmax = InvalidTransactionId;
ItemId rootlp;
@@ -794,8 +903,8 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
* gets there first will mark the tuple unused.
*
* FIXME: The next paragraph isn't new with these patches, but
- * just something I realized while looking at this. But maybe we should
- * add a comment like this? Or is it too much detail?
+ * just something I realized while looking at this. But maybe we
+ * should add a comment like this? Or is it too much detail?
I don't think it is too much detail.
* * Whether we arrive at the dead HOT tuple first here or while * following a chain below affects whether preceding RECENTLY_DEAD @@ -809,43 +918,52 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum, * TUPLE is already 'marked', and will not remove the * RECENTLY_DEAD tuple. This is not a correctness issue, and the * RECENTLY_DEAD tuple will be removed by a later VACUUM. + * + * FIXME: Now that we have the 'revisit' array, we could stash + * these DEAD items there too, instead of processing them here + * immediately. That way, DEAD tuples that are still part of a + * chain would always get processed as part of the chain. */
I really like this idea!
if (!HeapTupleHeaderIsHotUpdated(htup))
{
if (prstate->htsv[rootoffnum] == HEAPTUPLE_DEAD)
{
- heap_prune_record_unused(prstate, rootoffnum);
+ heap_prune_record_unused(prstate, rootoffnum, true);
HeapTupleHeaderAdvanceConflictHorizon(htup,
&prstate->latest_xid_removed);
- ndeleted++;
- }
I think we could really do with some more comments with examples like
this in the pruning code (that go through an example series of steps).
Not least of which because you can't see RECENTLY_DEAD in pageinspect
(you'd have to create some kind of status for it from the different
tuples on the page).
For example, I hadn't thought of this one:
REDIRECT -> RECENTY_DEAD -> DEAD -> RECENTLY_DEAD -> DEAD
+ /*----
+ * FIXME: this helped me to visualize how different chains might look like
+ * here. It's not an exhaustive list, just some examples to help with
+ * thinking. Remove this comment from final version, or refine.
+ *
+ * REDIRECT -> LIVE (stop) -> ...
+ * REDIRECT -> RECENTY_DEAD -> LIVE (stop) -> ...
+ * REDIRECT -> RECENTY_DEAD -> RECENTLY_DEAD
+ * REDIRECT -> RECENTY_DEAD -> DEAD
+ * REDIRECT -> RECENTY_DEAD -> DEAD -> RECENTLY_DEAD -> DEAD
+ * RECENTLY_DEAD -> ...
+ */
+
/* while not end of the chain */
for (;;)
{
@@ -897,19 +1015,12 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,

case HEAPTUPLE_RECENTLY_DEAD:
recent_dead = true;
-
- /*
- * This tuple may soon become DEAD. Update the hint field so
- * that the page is reconsidered for pruning in future.
- */
- heap_prune_record_prunable(prstate,
- HeapTupleHeaderGetUpdateXid(htup));
break;case HEAPTUPLE_DELETE_IN_PROGRESS:
-
- /*
- * This tuple may soon become DEAD. Update the hint field so
- * that the page is reconsidered for pruning in future.
- */
- heap_prune_record_prunable(prstate,
- HeapTupleHeaderGetUpdateXid(htup));
- break;
-
case HEAPTUPLE_LIVE:
case HEAPTUPLE_INSERT_IN_PROGRESS:
-
- /*
- * If we wanted to optimize for aborts, we might consider
- * marking the page prunable when we see INSERT_IN_PROGRESS.
- * But we don't. See related decisions about when to mark the
- * page prunable in heapam.c.
- */
                break;
            default:
@@ -981,7 +1069,18 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
         * RECENTLY_DEAD tuples just in case there's a DEAD one after them;
         * but we can't advance past anything else. We have to make sure that
         * we don't miss any DEAD tuples, since DEAD tuples that still have
-        * tuple storage after pruning will confuse VACUUM.
+        * tuple storage after pruning will confuse VACUUM (FIXME: not anymore
+        * I think?).
Meaning, it won't confuse vacuum anymore or there won't be DEAD tuples
with storage after pruning anymore?
+        *
+        * FIXME: Not a new issue, but spotted it now: If there is a chain
+        * like RECENTLY_DEAD -> DEAD, we will remove both tuples, but will
+        * not call HeapTupleHeaderAdvanceConflictHorizon() for the
+        * RECENTLY_DEAD tuple. Is that OK? I think it is. In a HOT chain,
+        * we know that the later tuple committed before any earlier tuples in
+        * the chain, therefore it ought to be enough to set the conflict
+        * horizon based on the later tuple. If all snapshots on the standby
+        * see the deleter of the last tuple as committed, they must consider
+        * all the earlier ones as committed too.
         */
This makes sense to me (if all the snapshots on the standby see the
deleter of the last tuple as committed, then they consider the earlier
ones deleted too). Probably wouldn't hurt to call this out here, though.
@@ -1206,15 +1318,6 @@ heap_prune_record_live_or_recently_dead(Page page, PruneState *prstate, OffsetNu
 * can't run inside a transaction block, which makes some cases impossible
 * (e.g. in-progress insert from the same transaction).
 *
- * We treat LP_DEAD items (which are the closest thing to DEAD tuples that
- * might be seen here) differently, too: we assume that they'll become
- * LP_UNUSED before VACUUM finishes. This difference is only superficial.
- * VACUUM effectively agrees with ANALYZE about DEAD items, in the end.
- * VACUUM won't remember LP_DEAD items, but only because they're not
- * supposed to be left behind when it is done. (Cases where we bypass
- * index vacuuming will violate this optimistic assumption, but the
- * overall impact of that should be negligible.)
- *
 * HEAPTUPLE_LIVE tuples are naturally counted as live. This is also what
 * acquire_sample_rows() does.
 *
@@ -1224,10 +1327,21 @@ heap_prune_record_live_or_recently_dead(Page page, PruneState *prstate, OffsetNu
 * ensure the math works out. The assumption that the transaction will
 * commit and update the counters after we report is a bit shaky; but it
 * is what acquire_sample_rows() does, so we do the same to be consistent.
+ *
+ * HEAPTUPLE_DEAD are handled by the other heap_prune_record_*()
+ * subroutines. They don't count dead items like acquire_sample_rows()
+ * does, because we assume that all dead items will become LP_UNUSED
+ * before VACUUM finishes. This difference is only superficial. VACUUM
+ * effectively agrees with ANALYZE about DEAD items, in the end. VACUUM
+ * won't remember LP_DEAD items, but only because they're not supposed to
+ * be left behind when it is done. (Cases where we bypass index vacuuming
+ * will violate this optimistic assumption, but the overall impact of that
+ * should be negligible.) FIXME: I don't understand that last sentence in
+ * parens. It was copied from elsewhere.
If we bypass index vacuuming, there will be LP_DEAD items left behind
when we are done because we didn't do index vacuuming and then reaping
of those dead items. All of these comments are kind of a copypasta,
though.
*/
    htup = (HeapTupleHeader) PageGetItem(page, PageGetItemId(page, offnum));
-   switch (status)
Okay, that's all for now. I'll do more in-depth review tomorrow.
- Melanie
On 28/03/2024 03:53, Melanie Plageman wrote:
On Thu, Mar 28, 2024 at 01:04:04AM +0200, Heikki Linnakangas wrote:
One change with this is that live_tuples and many of the other fields are
now again updated, even if the caller doesn't need them. It was hard to skip
them in a way that would save any cycles, with the other refactorings.
I am worried we are writing checks that are going to have to come out of
SELECT queries' bank accounts, but I'll do some benchmarking when we're
all done with major refactoring.
Sounds good, thanks.
+            *
+            * Whether we arrive at the dead HOT tuple first here or while
+            * following a chain below affects whether preceding RECENTLY_DEAD
+            * tuples in the chain can be removed or not. Imagine that you
+            * have a chain with two tuples: RECENTLY_DEAD -> DEAD. If we
+            * reach the RECENTLY_DEAD tuple first, the chain-following logic
+            * will find the DEAD tuple and conclude that both tuples are in
+            * fact dead and can be removed. But if we reach the DEAD tuple
+            * at the end of the chain first, when we reach the RECENTLY_DEAD
+            * tuple later, we will not follow the chain because the DEAD
+            * TUPLE is already 'marked', and will not remove the
+            * RECENTLY_DEAD tuple. This is not a correctness issue, and the
+            * RECENTLY_DEAD tuple will be removed by a later VACUUM.
             */
            if (!HeapTupleHeaderIsHotUpdated(htup))
Is this intentional? Like would it be correct to remove the
RECENTLY_DEAD tuple during the current vacuum?
Yes, it would be correct. And if we happen to visit the items in
different order, the RECENTLY_DEAD tuple first, it will get removed.
(This is just based on me reading the code, I'm not sure if I've missed
something. Would be nice to construct a test case for that and step
through it with a debugger to really see what happens. But this is not a
new issue, doesn't need to be part of this patch)
From c2ee7508456d0e76de985f9997a6840450e342a8 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Thu, 28 Mar 2024 00:45:26 +0200
Subject: [PATCH v8 22/22] WIP

- Got rid of all_visible_except_removable again. We're back to
roughly the same mechanism as on 'master', where the all_visible
doesn't include LP_DEAD items, but at the end of
heap_page_prune_and_freeze() when we return to the caller, we clear
it if there were any LP_DEAD items. I considered calling the
variable 'all_visible_except_lp_dead', which would be more accurate,
but it's also very long.
Not longer than all_visible_except_removable. I would be happy to keep
it more exact, but I'm also okay with just all_visible.
Ok, let's make it 'all_visible_except_lp_dead' for clarity.
What's this "cutoffs TODO"?
+ * cutoffs TODO
+ *
* presult contains output parameters needed by callers such as the number of
* tuples removed and the number of line pointers newly marked LP_DEAD.
* heap_page_prune_and_freeze() is responsible for initializing it.
All the other arguments are documented in the comment, except 'cutoffs'.
@@ -728,10 +832,12 @@ htsv_get_valid_status(int status)
 * DEAD, our visibility test is just too coarse to detect it.
 *
 * In general, pruning must never leave behind a DEAD tuple that still has
- * tuple storage. VACUUM isn't prepared to deal with that case. That's why
+ * tuple storage. VACUUM isn't prepared to deal with that case (FIXME: it no longer cares, right?).
+ * That's why
 * VACUUM prunes the same heap page a second time (without dropping its lock
 * in the interim) when it sees a newly DEAD tuple that we initially saw as
- * in-progress. Retrying pruning like this can only happen when an inserting
+ * in-progress (FIXME: Really? Does it still do that?).
Yea, I think all that is no longer true. I missed this comment back
then.
Committed a patch to remove it.
@@ -981,7 +1069,18 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
 * RECENTLY_DEAD tuples just in case there's a DEAD one after them;
 * but we can't advance past anything else. We have to make sure that
 * we don't miss any DEAD tuples, since DEAD tuples that still have
- * tuple storage after pruning will confuse VACUUM.
+ * tuple storage after pruning will confuse VACUUM (FIXME: not anymore
+ * I think?).
Meaning, it won't confuse vacuum anymore or there won't be DEAD tuples
with storage after pruning anymore?
I meant that it won't confuse VACUUM anymore. lazy_scan_prune() doesn't
loop through the items on the page checking their visibility anymore.
Hmm, one confusion remains though: In the 2nd phase of vacuum, we remove
all the dead line pointers that we have now removed from the indexes.
When we do that, we assume them all to be dead line pointers, without
storage, rather than normal tuples that happen to be HEAPTUPLE_DEAD. So
it's important that if pruning would leave behind HEAPTUPLE_DEAD tuples,
they are not included in 'deadoffsets'.
In any case, let's just make sure that pruning doesn't leave
HEAPTUPLE_DEAD tuples. There's no reason it should.
@@ -1224,10 +1327,21 @@ heap_prune_record_live_or_recently_dead(Page page, PruneState *prstate, OffsetNu
 * ensure the math works out. The assumption that the transaction will
 * commit and update the counters after we report is a bit shaky; but it
 * is what acquire_sample_rows() does, so we do the same to be consistent.
+ *
+ * HEAPTUPLE_DEAD are handled by the other heap_prune_record_*()
+ * subroutines. They don't count dead items like acquire_sample_rows()
+ * does, because we assume that all dead items will become LP_UNUSED
+ * before VACUUM finishes. This difference is only superficial. VACUUM
+ * effectively agrees with ANALYZE about DEAD items, in the end. VACUUM
+ * won't remember LP_DEAD items, but only because they're not supposed to
+ * be left behind when it is done. (Cases where we bypass index vacuuming
+ * will violate this optimistic assumption, but the overall impact of that
+ * should be negligible.) FIXME: I don't understand that last sentence in
+ * parens. It was copied from elsewhere.
If we bypass index vacuuming, there will be LP_DEAD items left behind
when we are done because we didn't do index vacuuming and then reaping
of those dead items. All of these comments are kind of a copypasta,
though.
Ah, gotcha, makes sense now. I didn't remember that we sometimes bypass
index vacuuming if there are very few dead items. I thought this talked
about the case that there are no indexes, but that case is OK.
--
Heikki Linnakangas
Neon (https://neon.tech)
On Thu, Mar 28, 2024 at 4:49 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
On 28/03/2024 03:53, Melanie Plageman wrote:
On Thu, Mar 28, 2024 at 01:04:04AM +0200, Heikki Linnakangas wrote:
One change with this is that live_tuples and many of the other fields are
now again updated, even if the caller doesn't need them. It was hard to skip
them in a way that would save any cycles, with the other refactorings.
I am worried we are writing checks that are going to have to come out of
SELECT queries' bank accounts, but I'll do some benchmarking when we're
all done with major refactoring.
Sounds good, thanks.
Actually, after having looked at it again, there really are only a few
more increments of various counters, the memory consumed by revisit,
and the additional setting of offsets in marked. I think a few
carefully constructed queries which do on-access pruning could test
the impact of this (as opposed to a bigger benchmarking endeavor).
I also wonder if there would be any actual impact of marking the
various record_*() routines inline.
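For concreteness, a minimal sketch of what I mean (nothing measured, just the
declaration change; the forward declaration would need the same treatment, and
the body here is abbreviated to the fragment quoted further down):

static inline void
heap_prune_record_unchanged_lp_dead(ItemId itemid, PruneState *prstate,
                                    OffsetNumber offnum)
{
    /* same bookkeeping as in the patch, just marked inline */
    Assert(!prstate->marked[offnum]);
    prstate->marked[offnum] = true;

    Assert(!ItemIdHasStorage(itemid));
    prstate->deadoffsets[prstate->lpdead_items++] = offnum;
}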
@@ -728,10 +832,12 @@ htsv_get_valid_status(int status)
 * DEAD, our visibility test is just too coarse to detect it.
 *
 * In general, pruning must never leave behind a DEAD tuple that still has
- * tuple storage. VACUUM isn't prepared to deal with that case. That's why
+ * tuple storage. VACUUM isn't prepared to deal with that case (FIXME: it no longer cares, right?).
+ * That's why
 * VACUUM prunes the same heap page a second time (without dropping its lock
 * in the interim) when it sees a newly DEAD tuple that we initially saw as
- * in-progress. Retrying pruning like this can only happen when an inserting
+ * in-progress (FIXME: Really? Does it still do that?).
Yea, I think all that is no longer true. I missed this comment back
then.
Committed a patch to remove it.
@@ -981,7 +1069,18 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
 * RECENTLY_DEAD tuples just in case there's a DEAD one after them;
 * but we can't advance past anything else. We have to make sure that
 * we don't miss any DEAD tuples, since DEAD tuples that still have
- * tuple storage after pruning will confuse VACUUM.
+ * tuple storage after pruning will confuse VACUUM (FIXME: not anymore
+ * I think?).
Meaning, it won't confuse vacuum anymore or there won't be DEAD tuples
with storage after pruning anymore?
I meant that it won't confuse VACUUM anymore. lazy_scan_prune() doesn't
loop through the items on the page checking their visibility anymore.
Hmm, one confusion remains though: In the 2nd phase of vacuum, we remove
all the dead line pointers that we have now removed from the indexes.
When we do that, we assume them all to be dead line pointers, without
storage, rather than normal tuples that happen to be HEAPTUPLE_DEAD. So
it's important that if pruning would leave behind HEAPTUPLE_DEAD tuples,
they are not included in 'deadoffsets'.
It seems like master only adds items it is marking LP_DEAD to
deadoffsets. And things marked LP_DEAD have lp_len set to 0.
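(Quoting the line-pointer macros from src/include/storage/itemid.h from memory
here, so worth double-checking, but I believe that's where this comes from:
pruning marks items dead with ItemIdSetDead(), which zeroes lp_len, and lp_len
is exactly what ItemIdHasStorage() tests.)

#define ItemIdSetDead(itemId) \
( \
    (itemId)->lp_flags = LP_DEAD, \
    (itemId)->lp_off = 0, \
    (itemId)->lp_len = 0 \
)

#define ItemIdHasStorage(itemId) \
    ((itemId)->lp_len != 0)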
In any case, let's just make sure that pruning doesn't leave
HEAPTUPLE_DEAD tuples. There's no reason it should.
Maybe worth adding an assert to
static void
heap_prune_record_unchanged_lp_dead(ItemId itemid, PruneState *prstate, OffsetNumber offnum)
{
...
Assert(!ItemIdHasStorage(itemid));
prstate->deadoffsets[prstate->lpdead_items++] = offnum;
}
By the way, I wasn't quite sure about the purpose of
heap_prune_record_unchanged_lp_dead(). It is called in
heap_prune_chain() in a place where we didn't add things to
deadoffsets before, no?
/*
* Likewise, a dead line pointer can't be part of the chain. (We
* already eliminated the case of dead root tuple outside this
* function.)
*/
if (ItemIdIsDead(lp))
{
/*
* If the caller set PRUNE_DO_MARK_UNUSED_NOW, we can set dead
* line pointers LP_UNUSED now.
*/
if (unlikely(prstate->actions & PRUNE_DO_MARK_UNUSED_NOW))
heap_prune_record_unused(prstate, offnum, false);
else
heap_prune_record_unchanged_lp_dead(lp, prstate, offnum);
break;
}
@@ -1224,10 +1327,21 @@ heap_prune_record_live_or_recently_dead(Page page, PruneState *prstate, OffsetNu
 * ensure the math works out. The assumption that the transaction will
 * commit and update the counters after we report is a bit shaky; but it
 * is what acquire_sample_rows() does, so we do the same to be consistent.
+ *
+ * HEAPTUPLE_DEAD are handled by the other heap_prune_record_*()
+ * subroutines. They don't count dead items like acquire_sample_rows()
+ * does, because we assume that all dead items will become LP_UNUSED
+ * before VACUUM finishes. This difference is only superficial. VACUUM
+ * effectively agrees with ANALYZE about DEAD items, in the end. VACUUM
+ * won't remember LP_DEAD items, but only because they're not supposed to
+ * be left behind when it is done. (Cases where we bypass index vacuuming
+ * will violate this optimistic assumption, but the overall impact of that
+ * should be negligible.) FIXME: I don't understand that last sentence in
+ * parens. It was copied from elsewhere.
If we bypass index vacuuming, there will be LP_DEAD items left behind
when we are done because we didn't do index vacuuming and then reaping
of those dead items. All of these comments are kind of a copypasta,
though.
Ah, gotcha, makes sense now. I didn't remember that we sometimes bypass
index vacuuming if there are very few dead items. I thought this talked
about the case that there are no indexes, but that case is OK.
These comments could use another pass. I had added some extra
(probably redundant) content when I thought I was refactoring it a
certain way and then changed my mind.
Attached is a diff with some ideas I had for a bit of code simplification.
Are you working on cleaning this patch up or should I pick it up?
I wonder if it makes sense to move some of the changes to
heap_prune_chain() that set things in marked/revisited to the
start of the patch set (to be committed first).
- Melanie
Attachments:
suggested_edits.diff (text/x-patch)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index a8ed11a1858..2f477aa43b1 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -143,7 +143,7 @@ static void heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber o
static void heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum, bool was_normal);
static void heap_prune_record_unchanged(Page page, PruneState *prstate, OffsetNumber offnum);
-static void heap_prune_record_unchanged_lp_dead(PruneState *prstate, OffsetNumber offnum);
+static void heap_prune_record_unchanged_lp_dead(ItemId itemid, PruneState *prstate, OffsetNumber offnum);
static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetNumber offnum);
static void page_verify_redirects(Page page);
@@ -766,7 +766,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* we can advance relfrozenxid and relminmxid to the values in
* pagefrz->FreezePageRelfrozenXid and pagefrz->FreezePageRelminMxid.
* MFIXME: which one should be pick if presult->nfrozen == 0 and
- * presult->all_frozen = True.
+ * presult->all_frozen = True. MTODO: see Peter's response here
+ * https://www.postgresql.org/message-id/CAH2-Wz%3DLmOs%3DiJ%3DFfCERnma0q7QjaNSnCgWEp7zOK7hD24YC_w%40mail.gmail.com
*/
if (new_relfrozen_xid)
{
@@ -868,9 +869,12 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
OffsetNumber latestdead = InvalidOffsetNumber,
maxoff = PageGetMaxOffsetNumber(dp),
offnum;
+
+ /* TODO: while maybe self-explanatory, I would prefer if chainitems and */
+ /* nchain had a comment up here */
OffsetNumber chainitems[MaxHeapTuplesPerPage];
- int nchain = 0,
- i;
+ int nchain = 0;
+ int i = 0;
rootlp = PageGetItemId(dp, rootoffnum);
@@ -943,6 +947,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
*/
prstate->revisit[prstate->nrevisit++] = rootoffnum;
+ /* TODO: I don't like this comment now */
/* Nothing more to do */
return;
}
@@ -1020,7 +1025,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
if (unlikely(prstate->actions & PRUNE_DO_MARK_UNUSED_NOW))
heap_prune_record_unused(prstate, offnum, false);
else
- heap_prune_record_unchanged_lp_dead(prstate, offnum);
+ heap_prune_record_unchanged_lp_dead(lp, prstate, offnum);
break;
}
@@ -1044,6 +1049,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
*/
tupdead = recent_dead = false;
+ /* TODO: maybe this should just be an if statement now */
switch (htsv_get_valid_status(prstate->htsv[offnum]))
{
case HEAPTUPLE_DEAD:
@@ -1132,42 +1138,34 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
* dead and the root line pointer can be marked dead. Otherwise just
* redirect the root to the correct chain member.
*/
- if (i >= nchain)
- heap_prune_record_dead_or_unused(prstate, rootoffnum, ItemIdIsNormal(rootlp));
- else
- {
+ if (i < nchain)
heap_prune_record_redirect(prstate, rootoffnum, chainitems[i], ItemIdIsNormal(rootlp));
-
- /* the rest of tuples in the chain are normal, unchanged tuples */
- for (; i < nchain; i++)
- heap_prune_record_unchanged(dp, prstate, chainitems[i]);
- }
+ else
+ heap_prune_record_dead_or_unused(prstate, rootoffnum, ItemIdIsNormal(rootlp));
}
- else
+ else if ((i = ItemIdIsRedirected(rootlp)))
{
- i = 0;
- if (ItemIdIsRedirected(rootlp))
+ if (i < nchain)
{
- if (nchain < 2)
- {
- /*
- * We found a redirect item that doesn't point to a valid
- * follow-on item. This can happen if the loop in
- * heap_page_prune_and_freeze() caused us to visit the dead
- * successor of a redirect item before visiting the redirect
- * item. We can clean up by setting the redirect item to DEAD
- * state or LP_UNUSED if the caller indicated.
- */
- heap_prune_record_dead_or_unused(prstate, rootoffnum, false);
- }
- else
- heap_prune_record_unchanged_lp_redirect(prstate, rootoffnum);
- i++;
+ heap_prune_record_unchanged_lp_redirect(prstate, rootoffnum);
+ }
+ else
+ {
+ /*
+ * We found a redirect item that doesn't point to a valid
+ * follow-on item. This can happen if the loop in
+ * heap_page_prune_and_freeze() caused us to visit the dead
+ * successor of a redirect item before visiting the redirect item.
+ * We can clean up by setting the redirect item to DEAD state or
+ * LP_UNUSED if the caller indicated.
+ */
+ heap_prune_record_dead_or_unused(prstate, rootoffnum, false);
}
- /* the rest of tuples in the chain are normal, unchanged tuples */
- for (; i < nchain; i++)
- heap_prune_record_unchanged(dp, prstate, chainitems[i]);
}
+
+ /* the rest of tuples in the chain are normal, unchanged tuples */
+ for (; i < nchain; i++)
+ heap_prune_record_unchanged(dp, prstate, chainitems[i]);
}
/* Record lowest soon-prunable XID */
@@ -1486,7 +1484,7 @@ heap_prune_record_unchanged(Page page, PruneState *prstate, OffsetNumber offnum)
* Record line pointer that was already LP_DEAD and is left unchanged.
*/
static void
-heap_prune_record_unchanged_lp_dead(PruneState *prstate, OffsetNumber offnum)
+heap_prune_record_unchanged_lp_dead(ItemId itemid, PruneState *prstate, OffsetNumber offnum)
{
/*
* Deliberately don't set hastup for LP_DEAD items. We make the soft
@@ -1506,6 +1504,7 @@ heap_prune_record_unchanged_lp_dead(PruneState *prstate, OffsetNumber offnum)
Assert(!prstate->marked[offnum]);
prstate->marked[offnum] = true;
+ Assert(!ItemIdHasStorage(itemid));
prstate->deadoffsets[prstate->lpdead_items++] = offnum;
}
On Thu, Mar 28, 2024 at 11:07:10AM -0400, Melanie Plageman wrote:
These comments could use another pass. I had added some extra
(probably redundant) content when I thought I was refactoring it a
certain way and then changed my mind.
Attached is a diff with some ideas I had for a bit of code simplification.
Are you working on cleaning this patch up or should I pick it up?
Attached v9 is rebased over master. But, more importantly, I took
another pass at heap_prune_chain() and am pretty happy with what I came
up with. See 0021. I simplified the traversal logic and then grouped the
chain processing into three branches at the end. I find it much easier
to understand what we are doing for different types of HOT chains.
I got rid of revisited. We can put it back, but I was thinking: we stash
all HOT tuples and then loop over them later, calling record_unchanged()
on the ones that aren't marked. But, if we have a lot of HOT tuples, is
this really that much better than just looping through all the offsets
and calling record_unchanged() on just the ones that aren't marked?
I've done that in my version. While testing this, I found that only
on-access pruning needed this final loop calling record_unchanged() on
items not yet marked. I know we can't skip this final loop entirely in
the ON ACCESS case because it calls record_prunable(), but we could
consider moving that back out into heap_prune_chain()? Or what do you
think?
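To be concrete about the shape of that final pass, here is a rough sketch
only (not the exact code in the attached patch; the real version presumably
also needs to skip unused items and pick the right record_unchanged_*()
variant for redirect and dead line pointers):

    /* After all chains are processed, sweep up anything not yet marked. */
    for (offnum = FirstOffsetNumber;
         offnum <= maxoff;
         offnum = OffsetNumberNext(offnum))
    {
        if (!prstate.marked[offnum])
            heap_prune_record_unchanged(page, &prstate, offnum);
    }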
I haven't finished updating all the comments, but I am really interested
to know what you think about heap_prune_chain() now.
Note that patches 0001-0020 are still the same as before. Only 0021 is
the new changes I made (they are built on top of your v8 0022).
Tomorrow I will start first thing figuring out how to break this down
into parts that can apply on master and then rebase the rest of the
patches on top of it.
- Melanie
Attachments:
v9-0001-lazy_scan_prune-tests-tuple-vis-with-GlobalVisTes.patch (text/x-diff)
From 37d167f1916f79b9b535a633de6a11e0596b8b74 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 6 Jan 2024 13:14:47 -0500
Subject: [PATCH v9 01/21] lazy_scan_prune tests tuple vis with GlobalVisTest
One requirement for eventually combining the prune and freeze records
is that we must check during pruning whether live tuples on the page are
visible to everyone and thus whether or not the page is all visible. We
only consider opportunistically freezing tuples if the whole page is all
visible and could be set all frozen.
During pruning (in heap_page_prune()), we do not have access to
VacuumCutoffs -- as on-access pruning also calls heap_page_prune(). We
do, however, have access to a GlobalVisState. This can be used to
determine whether or not the tuple is visible to everyone. It also has
the potential of being more up-to-date than VacuumCutoffs->OldestXmin.
This commit simply modifies lazy_scan_prune() to use GlobalVisState
instead of OldestXmin. Future commits will move the
all_visible/all_frozen calculation into heap_page_prune().
---
src/backend/access/heap/vacuumlazy.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index ba5b7083a3a..a7451743e25 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1579,11 +1579,15 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* The inserter definitely committed. But is it old enough
- * that everyone sees it as committed?
+ * that everyone sees it as committed? A
+ * FrozenTransactionId is seen as committed to everyone.
+ * Otherwise, we check if there is a snapshot that
+ * considers this xid to still be running, and if so, we
+ * don't consider the page all-visible.
*/
xmin = HeapTupleHeaderGetXmin(htup);
- if (!TransactionIdPrecedes(xmin,
- vacrel->cutoffs.OldestXmin))
+ if (xmin != FrozenTransactionId &&
+ !GlobalVisTestIsRemovableXid(vacrel->vistest, xmin))
{
all_visible = false;
break;
--
2.40.1
v9-0002-Pass-heap_prune_chain-PruneResult-output-paramete.patch (text/x-diff)
From 0289deb1a4636aec0b1abcea11a01b6127ed2e0b Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 6 Jan 2024 13:39:59 -0500
Subject: [PATCH v9 02/21] Pass heap_prune_chain() PruneResult output parameter
Future commits will set other members of PruneResult in
heap_prune_chain(), so start passing it as an output parameter now. This
eliminates the output parameter htsv -- the array of HTSV_Results --
since that is a member of the PruneResult.
---
src/backend/access/heap/pruneheap.c | 15 +++++++--------
1 file changed, 7 insertions(+), 8 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index ef816c2fa9c..29c3c98b0e7 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -59,8 +59,7 @@ static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
Buffer buffer);
static int heap_prune_chain(Buffer buffer,
OffsetNumber rootoffnum,
- int8 *htsv,
- PruneState *prstate);
+ PruneState *prstate, PruneResult *presult);
static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
static void heap_prune_record_redirect(PruneState *prstate,
OffsetNumber offnum, OffsetNumber rdoffnum);
@@ -325,7 +324,7 @@ heap_page_prune(Relation relation, Buffer buffer,
/* Process this item or chain of items */
presult->ndeleted += heap_prune_chain(buffer, offnum,
- presult->htsv, &prstate);
+ &prstate, presult);
}
/* Clear the offset information once we have processed the given page. */
@@ -427,7 +426,7 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
/*
* Prune specified line pointer or a HOT chain originating at line pointer.
*
- * Tuple visibility information is provided in htsv.
+ * Tuple visibility information is provided in presult->htsv.
*
* If the item is an index-referenced tuple (i.e. not a heap-only tuple),
* the HOT chain is pruned by removing all DEAD tuples at the start of the HOT
@@ -453,7 +452,7 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
*/
static int
heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
- int8 *htsv, PruneState *prstate)
+ PruneState *prstate, PruneResult *presult)
{
int ndeleted = 0;
Page dp = (Page) BufferGetPage(buffer);
@@ -474,7 +473,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
*/
if (ItemIdIsNormal(rootlp))
{
- Assert(htsv[rootoffnum] != -1);
+ Assert(presult->htsv[rootoffnum] != -1);
htup = (HeapTupleHeader) PageGetItem(dp, rootlp);
if (HeapTupleHeaderIsHeapOnly(htup))
@@ -497,7 +496,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
* either here or while following a chain below. Whichever path
* gets there first will mark the tuple unused.
*/
- if (htsv[rootoffnum] == HEAPTUPLE_DEAD &&
+ if (presult->htsv[rootoffnum] == HEAPTUPLE_DEAD &&
!HeapTupleHeaderIsHotUpdated(htup))
{
heap_prune_record_unused(prstate, rootoffnum);
@@ -594,7 +593,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
*/
tupdead = recent_dead = false;
- switch (htsv_get_valid_status(htsv[offnum]))
+ switch (htsv_get_valid_status(presult->htsv[offnum]))
{
case HEAPTUPLE_DEAD:
tupdead = true;
--
2.40.1
v9-0003-Rename-PruneState-snapshotConflictHorizon-to-late.patch (text/x-diff)
From 76d0c1b59e8f8c080ea502f0ad54527a5a0bfab3 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 25 Mar 2024 18:02:09 -0400
Subject: [PATCH v9 03/21] Rename PruneState->snapshotConflictHorizon to
latest_xid_removed
In anticipation of combining pruning and freezing and emitting a single
WAL record, rename PruneState->snapshotConflictHorizon to
latest_xid_removed. After pruning and freezing, we will choose a
combined record snapshot conflict horizon taking into account both
values.
---
src/backend/access/heap/pruneheap.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 29c3c98b0e7..7d7e1d2744c 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -35,7 +35,7 @@ typedef struct
bool mark_unused_now;
TransactionId new_prune_xid; /* new prune hint value for page */
- TransactionId snapshotConflictHorizon; /* latest xid removed */
+ TransactionId latest_xid_removed;
int nredirected; /* numbers of entries in arrays below */
int ndead;
int nunused;
@@ -238,7 +238,7 @@ heap_page_prune(Relation relation, Buffer buffer,
prstate.new_prune_xid = InvalidTransactionId;
prstate.vistest = vistest;
prstate.mark_unused_now = mark_unused_now;
- prstate.snapshotConflictHorizon = InvalidTransactionId;
+ prstate.latest_xid_removed = InvalidTransactionId;
prstate.nredirected = prstate.ndead = prstate.nunused = 0;
memset(prstate.marked, 0, sizeof(prstate.marked));
@@ -367,7 +367,7 @@ heap_page_prune(Relation relation, Buffer buffer,
if (RelationNeedsWAL(relation))
{
log_heap_prune_and_freeze(relation, buffer,
- prstate.snapshotConflictHorizon,
+ prstate.latest_xid_removed,
true, reason,
NULL, 0,
prstate.redirected, prstate.nredirected,
@@ -501,7 +501,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
{
heap_prune_record_unused(prstate, rootoffnum);
HeapTupleHeaderAdvanceConflictHorizon(htup,
- &prstate->snapshotConflictHorizon);
+ &prstate->latest_xid_removed);
ndeleted++;
}
@@ -647,7 +647,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
{
latestdead = offnum;
HeapTupleHeaderAdvanceConflictHorizon(htup,
- &prstate->snapshotConflictHorizon);
+ &prstate->latest_xid_removed);
}
else if (!recent_dead)
break;
--
2.40.1
v9-0004-heap_page_prune-sets-all_visible-and-visibility_c.patch (text/x-diff)
From b09e18aa153a9cf6f30d6cdb6c6346603daee83c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 25 Mar 2024 18:31:24 -0400
Subject: [PATCH v9 04/21] heap_page_prune sets all_visible and
visibility_cutoff_xid
In order to combine the prune and freeze records, we must know if the
page is eligible to be opportunistically frozen before finishing
pruning. Save all_visible in the PruneResult and set it to false when we
see non-removable tuples which are not visible to everyone.
We will also need to ensure that the snapshotConflictHorizon for the combined
prune + freeze record is the more conservative of that calculated for each of
pruning and freezing. Calculate the visibility_cutoff_xid for the purposes of
freezing -- the newest xmin on the page -- in heap_page_prune() and save it in
PruneResult.visibility_cutoff_xid.
Note that these are only needed by vacuum callers of heap_page_prune(),
so don't update them for on-access pruning.
---
src/backend/access/heap/pruneheap.c | 131 +++++++++++++++++++++++++--
src/backend/access/heap/vacuumlazy.c | 113 +++++------------------
src/include/access/heapam.h | 21 +++++
3 files changed, 169 insertions(+), 96 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 7d7e1d2744c..52513fcdc90 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -63,8 +63,10 @@ static int heap_prune_chain(Buffer buffer,
static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
static void heap_prune_record_redirect(PruneState *prstate,
OffsetNumber offnum, OffsetNumber rdoffnum);
-static void heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum);
-static void heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum);
+static void heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
+ PruneResult *presult);
+static void heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
+ PruneResult *presult);
static void heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum);
static void page_verify_redirects(Page page);
@@ -249,6 +251,14 @@ heap_page_prune(Relation relation, Buffer buffer,
presult->ndeleted = 0;
presult->nnewlpdead = 0;
+ /*
+ * Keep track of whether or not the page is all_visible in case the caller
+ * wants to use this information to update the VM.
+ */
+ presult->all_visible = true;
+ /* for recovery conflicts */
+ presult->visibility_cutoff_xid = InvalidTransactionId;
+
maxoff = PageGetMaxOffsetNumber(page);
tup.t_tableOid = RelationGetRelid(relation);
@@ -300,8 +310,101 @@ heap_page_prune(Relation relation, Buffer buffer,
presult->htsv[offnum] = heap_prune_satisfies_vacuum(&prstate, &tup,
buffer);
+
+ if (reason == PRUNE_ON_ACCESS)
+ continue;
+
+ switch (presult->htsv[offnum])
+ {
+ case HEAPTUPLE_DEAD:
+
+ /*
+ * Deliberately delay unsetting all_visible until later during
+ * pruning. Removable dead tuples shouldn't preclude freezing
+ * the page. After finishing this first pass of tuple
+ * visibility checks, initialize all_visible_except_removable
+ * with the current value of all_visible to indicate whether
+ * or not the page is all visible except for dead tuples. This
+ * will allow us to attempt to freeze the page after pruning.
+ * Later during pruning, if we encounter an LP_DEAD item or
+ * are setting an item LP_DEAD, we will unset all_visible. As
+ * long as we unset it before updating the visibility map,
+ * this will be correct.
+ */
+ break;
+ case HEAPTUPLE_LIVE:
+
+ /*
+ * Is the tuple definitely visible to all transactions?
+ *
+ * NB: Like with per-tuple hint bits, we can't set the
+ * PD_ALL_VISIBLE flag if the inserter committed
+ * asynchronously. See SetHintBits for more info. Check that
+ * the tuple is hinted xmin-committed because of that.
+ */
+ if (presult->all_visible)
+ {
+ TransactionId xmin;
+
+ if (!HeapTupleHeaderXminCommitted(htup))
+ {
+ presult->all_visible = false;
+ break;
+ }
+
+ /*
+ * The inserter definitely committed. But is it old enough
+ * that everyone sees it as committed? A
+ * FrozenTransactionId is seen as committed to everyone.
+ * Otherwise, we check if there is a snapshot that
+ * considers this xid to still be running, and if so, we
+ * don't consider the page all-visible.
+ */
+ xmin = HeapTupleHeaderGetXmin(htup);
+ if (xmin != FrozenTransactionId &&
+ !GlobalVisTestIsRemovableXid(vistest, xmin))
+ {
+ presult->all_visible = false;
+ break;
+ }
+
+ /* Track newest xmin on page. */
+ if (TransactionIdFollows(xmin, presult->visibility_cutoff_xid) &&
+ TransactionIdIsNormal(xmin))
+ presult->visibility_cutoff_xid = xmin;
+ }
+ break;
+ case HEAPTUPLE_RECENTLY_DEAD:
+ presult->all_visible = false;
+ break;
+ case HEAPTUPLE_INSERT_IN_PROGRESS:
+ presult->all_visible = false;
+ break;
+ case HEAPTUPLE_DELETE_IN_PROGRESS:
+ /* This is an expected case during concurrent vacuum */
+ presult->all_visible = false;
+ break;
+ default:
+ elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
+ break;
+ }
}
+ /*
+ * For vacuum, if the whole page will become frozen, we consider
+ * opportunistically freezing tuples. Dead tuples which will be removed by
+ * the end of vacuuming should not preclude us from opportunistically
+ * freezing. We will not be able to freeze the whole page if there are
+ * tuples present which are not visible to everyone or if there are dead
+ * tuples which are not yet removable. We need all_visible to be false if
+ * LP_DEAD tuples remain after pruning so that we do not incorrectly
+ * update the visibility map or page hint bit. So, we will update
+ * presult->all_visible to reflect the presence of LP_DEAD items while
+ * pruning and keep all_visible_except_removable to permit freezing if the
+ * whole page will eventually become all visible after removing tuples.
+ */
+ presult->all_visible_except_removable = presult->all_visible;
+
/* Scan the page */
for (offnum = FirstOffsetNumber;
offnum <= maxoff;
@@ -565,10 +668,14 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
/*
* If the caller set mark_unused_now true, we can set dead line
* pointers LP_UNUSED now. We don't increment ndeleted here since
- * the LP was already marked dead.
+ * the LP was already marked dead. If it will not be marked
+ * LP_UNUSED, it will remain LP_DEAD, making the page not
+ * all_visible.
*/
if (unlikely(prstate->mark_unused_now))
heap_prune_record_unused(prstate, offnum);
+ else
+ presult->all_visible = false;
break;
}
@@ -705,7 +812,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
* redirect the root to the correct chain member.
*/
if (i >= nchain)
- heap_prune_record_dead_or_unused(prstate, rootoffnum);
+ heap_prune_record_dead_or_unused(prstate, rootoffnum, presult);
else
heap_prune_record_redirect(prstate, rootoffnum, chainitems[i]);
}
@@ -718,7 +825,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
* redirect item. We can clean up by setting the redirect item to
* DEAD state or LP_UNUSED if the caller indicated.
*/
- heap_prune_record_dead_or_unused(prstate, rootoffnum);
+ heap_prune_record_dead_or_unused(prstate, rootoffnum, presult);
}
return ndeleted;
@@ -755,13 +862,20 @@ heap_prune_record_redirect(PruneState *prstate,
/* Record line pointer to be marked dead */
static void
-heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum)
+heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
+ PruneResult *presult)
{
Assert(prstate->ndead < MaxHeapTuplesPerPage);
prstate->nowdead[prstate->ndead] = offnum;
prstate->ndead++;
Assert(!prstate->marked[offnum]);
prstate->marked[offnum] = true;
+
+ /*
+ * Setting the line pointer LP_DEAD means the page will definitely not be
+ * all_visible.
+ */
+ presult->all_visible = false;
}
/*
@@ -771,7 +885,8 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum)
* pointers LP_DEAD if mark_unused_now is true.
*/
static void
-heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum)
+heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
+ PruneResult *presult)
{
/*
* If the caller set mark_unused_now to true, we can remove dead tuples
@@ -782,7 +897,7 @@ heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum)
if (unlikely(prstate->mark_unused_now))
heap_prune_record_unused(prstate, offnum);
else
- heap_prune_record_dead(prstate, offnum);
+ heap_prune_record_dead(prstate, offnum, presult);
}
/* Record line pointer to be marked unused */
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index a7451743e25..17fb0b4f7b7 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1422,9 +1422,7 @@ lazy_scan_prune(LVRelState *vacrel,
recently_dead_tuples;
HeapPageFreeze pagefrz;
bool hastup = false;
- bool all_visible,
- all_frozen;
- TransactionId visibility_cutoff_xid;
+ bool all_frozen;
int64 fpi_before = pgWalUsage.wal_fpi;
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
@@ -1465,17 +1463,16 @@ lazy_scan_prune(LVRelState *vacrel,
&presult, PRUNE_VACUUM_SCAN, &vacrel->offnum);
/*
- * We will update the VM after collecting LP_DEAD items and freezing
- * tuples. Keep track of whether or not the page is all_visible and
- * all_frozen and use this information to update the VM. all_visible
- * implies 0 lpdead_items, but don't trust all_frozen result unless
- * all_visible is also set to true.
+ * Now scan the page to collect LP_DEAD items and check for tuples
+ * requiring freezing among remaining tuples with storage. We will update
+ * the VM after collecting LP_DEAD items and freezing tuples. Pruning will
+ * have determined whether or not the page is all_visible. Keep track of
+ * whether or not the page is all_frozen and use this information to
+ * update the VM. all_visible implies lpdead_items == 0, but don't trust
+ * all_frozen result unless all_visible is also set to true.
*
- * Also keep track of the visibility cutoff xid for recovery conflicts.
*/
- all_visible = true;
all_frozen = true;
- visibility_cutoff_xid = InvalidTransactionId;
/*
* Now scan the page to collect LP_DEAD items and update the variables set
@@ -1516,11 +1513,6 @@ lazy_scan_prune(LVRelState *vacrel,
* will only happen every other VACUUM, at most. Besides, VACUUM
* must treat hastup/nonempty_pages as provisional no matter how
* LP_DEAD items are handled (handled here, or handled later on).
- *
- * Also deliberately delay unsetting all_visible until just before
- * we return to lazy_scan_heap caller, as explained in full below.
- * (This is another case where it's useful to anticipate that any
- * LP_DEAD items will become LP_UNUSED during the ongoing VACUUM.)
*/
deadoffsets[lpdead_items++] = offnum;
continue;
@@ -1558,46 +1550,6 @@ lazy_scan_prune(LVRelState *vacrel,
* what acquire_sample_rows() does.
*/
live_tuples++;
-
- /*
- * Is the tuple definitely visible to all transactions?
- *
- * NB: Like with per-tuple hint bits, we can't set the
- * PD_ALL_VISIBLE flag if the inserter committed
- * asynchronously. See SetHintBits for more info. Check that
- * the tuple is hinted xmin-committed because of that.
- */
- if (all_visible)
- {
- TransactionId xmin;
-
- if (!HeapTupleHeaderXminCommitted(htup))
- {
- all_visible = false;
- break;
- }
-
- /*
- * The inserter definitely committed. But is it old enough
- * that everyone sees it as committed? A
- * FrozenTransactionId is seen as committed to everyone.
- * Otherwise, we check if there is a snapshot that
- * considers this xid to still be running, and if so, we
- * don't consider the page all-visible.
- */
- xmin = HeapTupleHeaderGetXmin(htup);
- if (xmin != FrozenTransactionId &&
- !GlobalVisTestIsRemovableXid(vacrel->vistest, xmin))
- {
- all_visible = false;
- break;
- }
-
- /* Track newest xmin on page. */
- if (TransactionIdFollows(xmin, visibility_cutoff_xid) &&
- TransactionIdIsNormal(xmin))
- visibility_cutoff_xid = xmin;
- }
break;
case HEAPTUPLE_RECENTLY_DEAD:
@@ -1607,7 +1559,6 @@ lazy_scan_prune(LVRelState *vacrel,
* pruning.)
*/
recently_dead_tuples++;
- all_visible = false;
break;
case HEAPTUPLE_INSERT_IN_PROGRESS:
@@ -1618,16 +1569,13 @@ lazy_scan_prune(LVRelState *vacrel,
* results. This assumption is a bit shaky, but it is what
* acquire_sample_rows() does, so be consistent.
*/
- all_visible = false;
break;
case HEAPTUPLE_DELETE_IN_PROGRESS:
- /* This is an expected case during concurrent vacuum */
- all_visible = false;
/*
- * Count such rows as live. As above, we assume the deleting
- * transaction will commit and update the counters after we
- * report.
+ * This an expected case during concurrent vacuum. Count such
+ * rows as live. As above, we assume the deleting transaction
+ * will commit and update the counters after we report.
*/
live_tuples++;
break;
@@ -1670,7 +1618,7 @@ lazy_scan_prune(LVRelState *vacrel,
* page all-frozen afterwards (might not happen until final heap pass).
*/
if (pagefrz.freeze_required || tuples_frozen == 0 ||
- (all_visible && all_frozen &&
+ (presult.all_visible_except_removable && all_frozen &&
fpi_before != pgWalUsage.wal_fpi))
{
/*
@@ -1708,11 +1656,11 @@ lazy_scan_prune(LVRelState *vacrel,
* once we're done with it. Otherwise we generate a conservative
* cutoff by stepping back from OldestXmin.
*/
- if (all_visible && all_frozen)
+ if (presult.all_visible_except_removable && all_frozen)
{
/* Using same cutoff when setting VM is now unnecessary */
- snapshotConflictHorizon = visibility_cutoff_xid;
- visibility_cutoff_xid = InvalidTransactionId;
+ snapshotConflictHorizon = presult.visibility_cutoff_xid;
+ presult.visibility_cutoff_xid = InvalidTransactionId;
}
else
{
@@ -1748,17 +1696,19 @@ lazy_scan_prune(LVRelState *vacrel,
*/
#ifdef USE_ASSERT_CHECKING
/* Note that all_frozen value does not matter when !all_visible */
- if (all_visible && lpdead_items == 0)
+ if (presult.all_visible)
{
TransactionId debug_cutoff;
bool debug_all_frozen;
+ Assert(lpdead_items == 0);
+
if (!heap_page_is_all_visible(vacrel, buf,
&debug_cutoff, &debug_all_frozen))
Assert(false);
Assert(!TransactionIdIsValid(debug_cutoff) ||
- debug_cutoff == visibility_cutoff_xid);
+ debug_cutoff == presult.visibility_cutoff_xid);
}
#endif
@@ -1783,19 +1733,6 @@ lazy_scan_prune(LVRelState *vacrel,
Assert(dead_items->num_items <= dead_items->max_items);
pgstat_progress_update_param(PROGRESS_VACUUM_NUM_DEAD_TUPLES,
dead_items->num_items);
-
- /*
- * It was convenient to ignore LP_DEAD items in all_visible earlier on
- * to make the choice of whether or not to freeze the page unaffected
- * by the short-term presence of LP_DEAD items. These LP_DEAD items
- * were effectively assumed to be LP_UNUSED items in the making. It
- * doesn't matter which heap pass (initial pass or final pass) ends up
- * setting the page all-frozen, as long as the ongoing VACUUM does it.
- *
- * Now that freezing has been finalized, unset all_visible. It needs
- * to reflect the present state of things, as expected by our caller.
- */
- all_visible = false;
}
/* Finally, add page-local counts to whole-VACUUM counts */
@@ -1812,20 +1749,20 @@ lazy_scan_prune(LVRelState *vacrel,
/* Did we find LP_DEAD items? */
*has_lpdead_items = (lpdead_items > 0);
- Assert(!all_visible || !(*has_lpdead_items));
+ Assert(!presult.all_visible || !(*has_lpdead_items));
/*
* Handle setting visibility map bit based on information from the VM (as
* of last heap_vac_scan_next_block() call), and from all_visible and
* all_frozen variables
*/
- if (!all_visible_according_to_vm && all_visible)
+ if (!all_visible_according_to_vm && presult.all_visible)
{
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
if (all_frozen)
{
- Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+ Assert(!TransactionIdIsValid(presult.visibility_cutoff_xid));
flags |= VISIBILITYMAP_ALL_FROZEN;
}
@@ -1845,7 +1782,7 @@ lazy_scan_prune(LVRelState *vacrel,
PageSetAllVisible(page);
MarkBufferDirty(buf);
visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
- vmbuffer, visibility_cutoff_xid,
+ vmbuffer, presult.visibility_cutoff_xid,
flags);
}
@@ -1893,7 +1830,7 @@ lazy_scan_prune(LVRelState *vacrel,
* it as all-frozen. Note that all_frozen is only valid if all_visible is
* true, so we must check both all_visible and all_frozen.
*/
- else if (all_visible_according_to_vm && all_visible &&
+ else if (all_visible_according_to_vm && presult.all_visible &&
all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
{
/*
@@ -1914,7 +1851,7 @@ lazy_scan_prune(LVRelState *vacrel,
* since a snapshotConflictHorizon sufficient to make everything safe
* for REDO was logged when the page's tuples were frozen.
*/
- Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+ Assert(!TransactionIdIsValid(presult.visibility_cutoff_xid));
visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
vmbuffer, InvalidTransactionId,
VISIBILITYMAP_ALL_VISIBLE |
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index f1122453738..29daab7aeb8 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -199,6 +199,27 @@ typedef struct PruneResult
int ndeleted; /* Number of tuples deleted from the page */
int nnewlpdead; /* Number of newly LP_DEAD items */
+ /*
+ * The rest of the fields in PruneResult are only guaranteed to be
+ * initialized if heap_page_prune is passed PruneReason VACUUM_SCAN.
+ */
+
+ /*
+ * Whether or not the page is truly all-visible after pruning. If there
+ * are LP_DEAD items on the page which cannot be removed until vacuum's
+ * second pass, this will be false.
+ */
+ bool all_visible;
+
+ /*
+ * Whether or not the page is all-visible except for tuples which will be
+ * removed during vacuum's second pass. This is used by VACUUM to
+ * determine whether or not to consider opportunistically freezing the
+ * page.
+ */
+ bool all_visible_except_removable;
+ TransactionId visibility_cutoff_xid; /* Newest xmin on the page */
+
/*
* Tuple visibility is only computed once for each tuple, for correctness
* and efficiency reasons; see comment in heap_page_prune() for details.
--
2.40.1
v9-0005-Add-reference-to-VacuumCutoffs-in-HeapPageFreeze.patch (text/x-diff)
From 5844448db2bbaa79e1de90c41f837f67957d2308 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 6 Jan 2024 16:22:17 -0500
Subject: [PATCH v9 05/21] Add reference to VacuumCutoffs in HeapPageFreeze
Future commits will move opportunistic freezing into the main path of
pruning in heap_page_prune(). Because on-access pruning will not do
opportunistic freezing, it is cleaner to keep the visibility information
required for calling heap_prepare_freeze_tuple() inside of the
HeapPageFreeze structure itself by saving a reference to VacuumCutoffs.
---
src/backend/access/heap/heapam.c | 16 ++++++++--------
src/backend/access/heap/vacuumlazy.c | 3 ++-
src/include/access/heapam.h | 2 +-
3 files changed, 11 insertions(+), 10 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 2f6527df0dc..bb856690234 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6125,9 +6125,9 @@ heap_inplace_update(Relation relation, HeapTuple tuple)
*/
static TransactionId
FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
- const struct VacuumCutoffs *cutoffs, uint16 *flags,
- HeapPageFreeze *pagefrz)
+ uint16 *flags, HeapPageFreeze *pagefrz)
{
+ const struct VacuumCutoffs *cutoffs = pagefrz->cutoffs;
TransactionId newxmax;
MultiXactMember *members;
int nmembers;
@@ -6475,10 +6475,10 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
*/
bool
heap_prepare_freeze_tuple(HeapTupleHeader tuple,
- const struct VacuumCutoffs *cutoffs,
HeapPageFreeze *pagefrz,
HeapTupleFreeze *frz, bool *totally_frozen)
{
+ const struct VacuumCutoffs *cutoffs = pagefrz->cutoffs;
bool xmin_already_frozen = false,
xmax_already_frozen = false;
bool freeze_xmin = false,
@@ -6550,8 +6550,7 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
* perform no-op xmax processing. The only constraint is that the
* FreezeLimit/MultiXactCutoff postcondition must never be violated.
*/
- newxmax = FreezeMultiXactId(xid, tuple->t_infomask, cutoffs,
- &flags, pagefrz);
+ newxmax = FreezeMultiXactId(xid, tuple->t_infomask, &flags, pagefrz);
if (flags & FRM_NOOP)
{
@@ -6729,7 +6728,7 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
* Does this tuple force caller to freeze the entire page?
*/
pagefrz->freeze_required =
- heap_tuple_should_freeze(tuple, cutoffs,
+ heap_tuple_should_freeze(tuple, pagefrz->cutoffs,
&pagefrz->NoFreezePageRelfrozenXid,
&pagefrz->NoFreezePageRelminMxid);
}
@@ -6890,8 +6889,9 @@ heap_freeze_tuple(HeapTupleHeader tuple,
pagefrz.NoFreezePageRelfrozenXid = FreezeLimit;
pagefrz.NoFreezePageRelminMxid = MultiXactCutoff;
- do_freeze = heap_prepare_freeze_tuple(tuple, &cutoffs,
- &pagefrz, &frz, &totally_frozen);
+ pagefrz.cutoffs = &cutoffs;
+
+ do_freeze = heap_prepare_freeze_tuple(tuple, &pagefrz, &frz, &totally_frozen);
/*
* Note that because this is not a WAL-logged operation, we don't need to
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 17fb0b4f7b7..1b060124a3f 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1442,6 +1442,7 @@ lazy_scan_prune(LVRelState *vacrel,
pagefrz.FreezePageRelminMxid = vacrel->NewRelminMxid;
pagefrz.NoFreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
pagefrz.NoFreezePageRelminMxid = vacrel->NewRelminMxid;
+ pagefrz.cutoffs = &vacrel->cutoffs;
tuples_frozen = 0;
lpdead_items = 0;
live_tuples = 0;
@@ -1587,7 +1588,7 @@ lazy_scan_prune(LVRelState *vacrel,
hastup = true; /* page makes rel truncation unsafe */
/* Tuple with storage -- consider need to freeze */
- if (heap_prepare_freeze_tuple(htup, &vacrel->cutoffs, &pagefrz,
+ if (heap_prepare_freeze_tuple(htup, &pagefrz,
&frozen[tuples_frozen], &totally_frozen))
{
/* Save prepared freeze plan for later */
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 29daab7aeb8..689427e2512 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -189,6 +189,7 @@ typedef struct HeapPageFreeze
TransactionId NoFreezePageRelfrozenXid;
MultiXactId NoFreezePageRelminMxid;
+ struct VacuumCutoffs *cutoffs;
} HeapPageFreeze;
/*
@@ -324,7 +325,6 @@ extern TM_Result heap_lock_tuple(Relation relation, ItemPointer tid,
extern void heap_inplace_update(Relation relation, HeapTuple tuple);
extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
- const struct VacuumCutoffs *cutoffs,
HeapPageFreeze *pagefrz,
HeapTupleFreeze *frz, bool *totally_frozen);
extern void heap_freeze_execute_prepared(Relation rel, Buffer buffer,
--
2.40.1
v9-0006-Prepare-freeze-tuples-in-heap_page_prune.patch (text/x-diff)
From 7a74b4a420b5a19270e8f648483cdfa31b5a5892 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 25 Mar 2024 19:23:11 -0400
Subject: [PATCH v9 06/21] Prepare freeze tuples in heap_page_prune()
In order to combine the freeze and prune records, we must determine
which tuples are freezable before actually executing pruning. All of the
page modifications should be made in the same critical section along
with emitting the combined WAL. Determine whether or not tuples should
or must be frozen and whether or not the page will be all frozen as a
consequence during pruning.
---
src/backend/access/heap/pruneheap.c | 41 +++++++++++++++--
src/backend/access/heap/vacuumlazy.c | 68 ++++++----------------------
src/include/access/heapam.h | 12 +++++
3 files changed, 64 insertions(+), 57 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 52513fcdc90..eb09713311b 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -153,7 +153,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* not the relation has indexes, since we cannot safely determine
* that during on-access pruning with the current implementation.
*/
- heap_page_prune(relation, buffer, vistest, false,
+ heap_page_prune(relation, buffer, vistest, false, NULL,
&presult, PRUNE_ON_ACCESS, NULL);
/*
@@ -201,6 +201,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* mark_unused_now indicates whether or not dead items can be set LP_UNUSED
* during pruning.
*
+ * pagefrz contains both input and output parameters used if the caller is
+ * interested in potentially freezing tuples on the page.
+ *
* presult contains output parameters needed by callers such as the number of
* tuples removed and the number of line pointers newly marked LP_DEAD.
* heap_page_prune() is responsible for initializing it.
@@ -215,6 +218,7 @@ void
heap_page_prune(Relation relation, Buffer buffer,
GlobalVisState *vistest,
bool mark_unused_now,
+ HeapPageFreeze *pagefrz,
PruneResult *presult,
PruneReason reason,
OffsetNumber *off_loc)
@@ -250,11 +254,16 @@ heap_page_prune(Relation relation, Buffer buffer,
*/
presult->ndeleted = 0;
presult->nnewlpdead = 0;
+ presult->nfrozen = 0;
/*
- * Keep track of whether or not the page is all_visible in case the caller
- * wants to use this information to update the VM.
+ * Caller will update the VM after pruning, collecting LP_DEAD items, and
+ * freezing tuples. Keep track of whether or not the page is all_visible
+ * and all_frozen and use this information to update the VM. all_visible
+ * implies lpdead_items == 0, but don't trust all_frozen result unless
+ * all_visible is also set to true.
*/
+ presult->all_frozen = true;
presult->all_visible = true;
/* for recovery conflicts */
presult->visibility_cutoff_xid = InvalidTransactionId;
@@ -388,6 +397,32 @@ heap_page_prune(Relation relation, Buffer buffer,
elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
break;
}
+
+ /*
+ * Consider freezing any normal tuples which will not be removed
+ */
+ if (presult->htsv[offnum] != HEAPTUPLE_DEAD && pagefrz)
+ {
+ bool totally_frozen;
+
+ /* Tuple with storage -- consider need to freeze */
+ if ((heap_prepare_freeze_tuple(htup, pagefrz,
+ &presult->frozen[presult->nfrozen],
+ &totally_frozen)))
+ {
+ /* Save prepared freeze plan for later */
+ presult->frozen[presult->nfrozen++].offset = offnum;
+ }
+
+ /*
+ * If any tuple isn't either totally frozen already or eligible to
+ * become totally frozen (according to its freeze plan), then the
+ * page definitely cannot be set all-frozen in the visibility map
+ * later on
+ */
+ if (!totally_frozen)
+ presult->all_frozen = false;
+ }
}
/*
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 1b060124a3f..2a3cc5c7cd3 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1416,16 +1416,13 @@ lazy_scan_prune(LVRelState *vacrel,
maxoff;
ItemId itemid;
PruneResult presult;
- int tuples_frozen,
- lpdead_items,
+ int lpdead_items,
live_tuples,
recently_dead_tuples;
HeapPageFreeze pagefrz;
bool hastup = false;
- bool all_frozen;
int64 fpi_before = pgWalUsage.wal_fpi;
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
- HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
Assert(BufferGetBlockNumber(buf) == blkno);
@@ -1443,7 +1440,6 @@ lazy_scan_prune(LVRelState *vacrel,
pagefrz.NoFreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
pagefrz.NoFreezePageRelminMxid = vacrel->NewRelminMxid;
pagefrz.cutoffs = &vacrel->cutoffs;
- tuples_frozen = 0;
lpdead_items = 0;
live_tuples = 0;
recently_dead_tuples = 0;
@@ -1460,21 +1456,9 @@ lazy_scan_prune(LVRelState *vacrel,
* items LP_UNUSED, so mark_unused_now should be true if no indexes and
* false otherwise.
*/
- heap_page_prune(rel, buf, vacrel->vistest, vacrel->nindexes == 0,
+ heap_page_prune(rel, buf, vacrel->vistest, vacrel->nindexes == 0, &pagefrz,
&presult, PRUNE_VACUUM_SCAN, &vacrel->offnum);
- /*
- * Now scan the page to collect LP_DEAD items and check for tuples
- * requiring freezing among remaining tuples with storage. We will update
- * the VM after collecting LP_DEAD items and freezing tuples. Pruning will
- * have determined whether or not the page is all_visible. Keep track of
- * whether or not the page is all_frozen and use this information to
- * update the VM. all_visible implies lpdead_items == 0, but don't trust
- * all_frozen result unless all_visible is also set to true.
- *
- */
- all_frozen = true;
-
/*
* Now scan the page to collect LP_DEAD items and update the variables set
* just above.
@@ -1483,9 +1467,6 @@ lazy_scan_prune(LVRelState *vacrel,
offnum <= maxoff;
offnum = OffsetNumberNext(offnum))
{
- HeapTupleHeader htup;
- bool totally_frozen;
-
/*
* Set the offset number so that we can display it along with any
* error that occurred while processing this tuple.
@@ -1521,8 +1502,6 @@ lazy_scan_prune(LVRelState *vacrel,
Assert(ItemIdIsNormal(itemid));
- htup = (HeapTupleHeader) PageGetItem(page, itemid);
-
/*
* The criteria for counting a tuple as live in this block need to
* match what analyze.c's acquire_sample_rows() does, otherwise VACUUM
@@ -1587,29 +1566,8 @@ lazy_scan_prune(LVRelState *vacrel,
hastup = true; /* page makes rel truncation unsafe */
- /* Tuple with storage -- consider need to freeze */
- if (heap_prepare_freeze_tuple(htup, &pagefrz,
- &frozen[tuples_frozen], &totally_frozen))
- {
- /* Save prepared freeze plan for later */
- frozen[tuples_frozen++].offset = offnum;
- }
-
- /*
- * If any tuple isn't either totally frozen already or eligible to
- * become totally frozen (according to its freeze plan), then the page
- * definitely cannot be set all-frozen in the visibility map later on
- */
- if (!totally_frozen)
- all_frozen = false;
}
- /*
- * We have now divided every item on the page into either an LP_DEAD item
- * that will need to be vacuumed in indexes later, or a LP_NORMAL tuple
- * that remains and needs to be considered for freezing now (LP_UNUSED and
- * LP_REDIRECT items also remain, but are of no further interest to us).
- */
vacrel->offnum = InvalidOffsetNumber;
/*
@@ -1618,8 +1576,8 @@ lazy_scan_prune(LVRelState *vacrel,
* freeze when pruning generated an FPI, if doing so means that we set the
* page all-frozen afterwards (might not happen until final heap pass).
*/
- if (pagefrz.freeze_required || tuples_frozen == 0 ||
- (presult.all_visible_except_removable && all_frozen &&
+ if (pagefrz.freeze_required || presult.nfrozen == 0 ||
+ (presult.all_visible_except_removable && presult.all_frozen &&
fpi_before != pgWalUsage.wal_fpi))
{
/*
@@ -1629,7 +1587,7 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->NewRelfrozenXid = pagefrz.FreezePageRelfrozenXid;
vacrel->NewRelminMxid = pagefrz.FreezePageRelminMxid;
- if (tuples_frozen == 0)
+ if (presult.nfrozen == 0)
{
/*
* We have no freeze plans to execute, so there's no added cost
@@ -1657,7 +1615,7 @@ lazy_scan_prune(LVRelState *vacrel,
* once we're done with it. Otherwise we generate a conservative
* cutoff by stepping back from OldestXmin.
*/
- if (presult.all_visible_except_removable && all_frozen)
+ if (presult.all_visible_except_removable && presult.all_frozen)
{
/* Using same cutoff when setting VM is now unnecessary */
snapshotConflictHorizon = presult.visibility_cutoff_xid;
@@ -1673,7 +1631,7 @@ lazy_scan_prune(LVRelState *vacrel,
/* Execute all freeze plans for page as a single atomic action */
heap_freeze_execute_prepared(vacrel->rel, buf,
snapshotConflictHorizon,
- frozen, tuples_frozen);
+ presult.frozen, presult.nfrozen);
}
}
else
@@ -1684,8 +1642,8 @@ lazy_scan_prune(LVRelState *vacrel,
*/
vacrel->NewRelfrozenXid = pagefrz.NoFreezePageRelfrozenXid;
vacrel->NewRelminMxid = pagefrz.NoFreezePageRelminMxid;
- all_frozen = false;
- tuples_frozen = 0; /* avoid miscounts in instrumentation */
+ presult.all_frozen = false;
+ presult.nfrozen = 0; /* avoid miscounts in instrumentation */
}
/*
@@ -1708,6 +1666,8 @@ lazy_scan_prune(LVRelState *vacrel,
&debug_cutoff, &debug_all_frozen))
Assert(false);
+ Assert(presult.all_frozen == debug_all_frozen);
+
Assert(!TransactionIdIsValid(debug_cutoff) ||
debug_cutoff == presult.visibility_cutoff_xid);
}
@@ -1738,7 +1698,7 @@ lazy_scan_prune(LVRelState *vacrel,
/* Finally, add page-local counts to whole-VACUUM counts */
vacrel->tuples_deleted += presult.ndeleted;
- vacrel->tuples_frozen += tuples_frozen;
+ vacrel->tuples_frozen += presult.nfrozen;
vacrel->lpdead_items += lpdead_items;
vacrel->live_tuples += live_tuples;
vacrel->recently_dead_tuples += recently_dead_tuples;
@@ -1761,7 +1721,7 @@ lazy_scan_prune(LVRelState *vacrel,
{
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
- if (all_frozen)
+ if (presult.all_frozen)
{
Assert(!TransactionIdIsValid(presult.visibility_cutoff_xid));
flags |= VISIBILITYMAP_ALL_FROZEN;
@@ -1832,7 +1792,7 @@ lazy_scan_prune(LVRelState *vacrel,
* true, so we must check both all_visible and all_frozen.
*/
else if (all_visible_according_to_vm && presult.all_visible &&
- all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
+ presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
{
/*
* Avoid relying on all_visible_according_to_vm as a proxy for the
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 689427e2512..9d047621ea5 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -219,6 +219,9 @@ typedef struct PruneResult
* page.
*/
bool all_visible_except_removable;
+
+ /* Whether or not the page can be set all-frozen in the VM */
+ bool all_frozen;
TransactionId visibility_cutoff_xid; /* Newest xmin on the page */
/*
@@ -231,6 +234,14 @@ typedef struct PruneResult
* 1. Otherwise every access would need to subtract 1.
*/
int8 htsv[MaxHeapTuplesPerPage + 1];
+
+ /* Number of tuples we may freeze */
+ int nfrozen;
+
+ /*
+ * One entry for every tuple that we may freeze.
+ */
+ HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
} PruneResult;
/* 'reason' codes for heap_page_prune() */
@@ -353,6 +364,7 @@ extern void heap_page_prune_opt(Relation relation, Buffer buffer);
extern void heap_page_prune(Relation relation, Buffer buffer,
struct GlobalVisState *vistest,
bool mark_unused_now,
+ HeapPageFreeze *pagefrz,
PruneResult *presult,
PruneReason reason,
OffsetNumber *off_loc);
--
2.40.1
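To make the caller-side change in 0006 easier to see at a glance, here is
a rough sketch of the resulting call pattern in lazy_scan_prune(). This is
not part of the patch and is simplified from the diff above; the
conflict-horizon selection and instrumentation are elided, so
snapshotConflictHorizon is assumed to have been computed as before:

    HeapPageFreeze pagefrz;
    PruneResult presult;

    /* Set up the per-page freeze tracking state, as before */
    pagefrz.freeze_required = false;
    pagefrz.FreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
    pagefrz.FreezePageRelminMxid = vacrel->NewRelminMxid;
    pagefrz.NoFreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
    pagefrz.NoFreezePageRelminMxid = vacrel->NewRelminMxid;
    pagefrz.cutoffs = &vacrel->cutoffs;

    /* Pruning now also prepares freeze plans and tracks all_frozen */
    heap_page_prune(rel, buf, vacrel->vistest, vacrel->nindexes == 0,
                    &pagefrz, &presult, PRUNE_VACUUM_SCAN, &vacrel->offnum);

    /* The freeze plans come back in presult instead of a local array */
    if (presult.nfrozen > 0)
        heap_freeze_execute_prepared(vacrel->rel, buf,
                                     snapshotConflictHorizon,
                                     presult.frozen, presult.nfrozen);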
v9-0007-lazy_scan_prune-reorder-freeze-execution-logic.patch
From 668f841634312d5b0ed4d048f10a0a4b94485cbc Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 25 Mar 2024 19:39:25 -0400
Subject: [PATCH v9 07/21] lazy_scan_prune reorder freeze execution logic
To combine the prune and freeze records, freezing must be done before
the pruning WAL record is emitted. We will move the freeze execution
into heap_page_prune() in future commits. lazy_scan_prune() currently
executes freezing, updates vacrel->NewRelfrozenXid and
vacrel->NewRelminMxid, and resets the snapshotConflictHorizon that the
visibility map update record may use, all within the same block of if
statements. This commit starts reordering that logic so that the freeze
execution can be separated from the other updates, which should not be
done during pruning. A sketch of the reordered control flow follows this
patch.
---
src/backend/access/heap/vacuumlazy.c | 93 +++++++++++++++-------------
1 file changed, 50 insertions(+), 43 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 2a3cc5c7cd3..f474e661428 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1421,6 +1421,7 @@ lazy_scan_prune(LVRelState *vacrel,
recently_dead_tuples;
HeapPageFreeze pagefrz;
bool hastup = false;
+ bool do_freeze;
int64 fpi_before = pgWalUsage.wal_fpi;
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
@@ -1576,10 +1577,15 @@ lazy_scan_prune(LVRelState *vacrel,
* freeze when pruning generated an FPI, if doing so means that we set the
* page all-frozen afterwards (might not happen until final heap pass).
*/
- if (pagefrz.freeze_required || presult.nfrozen == 0 ||
+ do_freeze = pagefrz.freeze_required ||
(presult.all_visible_except_removable && presult.all_frozen &&
- fpi_before != pgWalUsage.wal_fpi))
+ presult.nfrozen > 0 &&
+ fpi_before != pgWalUsage.wal_fpi);
+
+ if (do_freeze)
{
+ TransactionId snapshotConflictHorizon;
+
/*
* We're freezing the page. Our final NewRelfrozenXid doesn't need to
* be affected by the XIDs that are just about to be frozen anyway.
@@ -1587,52 +1593,53 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->NewRelfrozenXid = pagefrz.FreezePageRelfrozenXid;
vacrel->NewRelminMxid = pagefrz.FreezePageRelminMxid;
- if (presult.nfrozen == 0)
- {
- /*
- * We have no freeze plans to execute, so there's no added cost
- * from following the freeze path. That's why it was chosen. This
- * is important in the case where the page only contains totally
- * frozen tuples at this point (perhaps only following pruning).
- * Such pages can be marked all-frozen in the VM by our caller,
- * even though none of its tuples were newly frozen here (note
- * that the "no freeze" path never sets pages all-frozen).
- *
- * We never increment the frozen_pages instrumentation counter
- * here, since it only counts pages with newly frozen tuples
- * (don't confuse that with pages newly set all-frozen in VM).
- */
- }
+ vacrel->frozen_pages++;
+
+ /*
+ * We can use frz_conflict_horizon as our cutoff for conflicts when
+ * the whole page is eligible to become all-frozen in the VM once
+ * we're done with it. Otherwise we generate a conservative cutoff by
+ * stepping back from OldestXmin.
+ */
+ if (presult.all_visible_except_removable && presult.all_frozen)
+ snapshotConflictHorizon = presult.visibility_cutoff_xid;
else
{
- TransactionId snapshotConflictHorizon;
+ /* Avoids false conflicts when hot_standby_feedback in use */
+ snapshotConflictHorizon = pagefrz.cutoffs->OldestXmin;
+ TransactionIdRetreat(snapshotConflictHorizon);
+ }
- vacrel->frozen_pages++;
+ /* Using same cutoff when setting VM is now unnecessary */
+ if (presult.all_visible_except_removable && presult.all_frozen)
+ presult.visibility_cutoff_xid = InvalidTransactionId;
- /*
- * We can use visibility_cutoff_xid as our cutoff for conflicts
- * when the whole page is eligible to become all-frozen in the VM
- * once we're done with it. Otherwise we generate a conservative
- * cutoff by stepping back from OldestXmin.
- */
- if (presult.all_visible_except_removable && presult.all_frozen)
- {
- /* Using same cutoff when setting VM is now unnecessary */
- snapshotConflictHorizon = presult.visibility_cutoff_xid;
- presult.visibility_cutoff_xid = InvalidTransactionId;
- }
- else
- {
- /* Avoids false conflicts when hot_standby_feedback in use */
- snapshotConflictHorizon = vacrel->cutoffs.OldestXmin;
- TransactionIdRetreat(snapshotConflictHorizon);
- }
+ /* Execute all freeze plans for page as a single atomic action */
+ heap_freeze_execute_prepared(vacrel->rel, buf,
+ snapshotConflictHorizon,
+ presult.frozen, presult.nfrozen);
- /* Execute all freeze plans for page as a single atomic action */
- heap_freeze_execute_prepared(vacrel->rel, buf,
- snapshotConflictHorizon,
- presult.frozen, presult.nfrozen);
- }
+ }
+ else if (presult.all_frozen && presult.nfrozen == 0)
+ {
+ /* Page should be all visible except to-be-removed tuples */
+ Assert(presult.all_visible_except_removable);
+
+ /*
+ * We have no freeze plans to execute, so there's no added cost from
+ * following the freeze path. That's why it was chosen. This is
+ * important in the case where the page only contains totally frozen
+ * tuples at this point (perhaps only following pruning). Such pages
+ * can be marked all-frozen in the VM by our caller, even though none
+ * of its tuples were newly frozen here (note that the "no freeze"
+ * path never sets pages all-frozen).
+ *
+ * We never increment the frozen_pages instrumentation counter here,
+ * since it only counts pages with newly frozen tuples (don't confuse
+ * that with pages newly set all-frozen in VM).
+ */
+ vacrel->NewRelfrozenXid = pagefrz.FreezePageRelfrozenXid;
+ vacrel->NewRelminMxid = pagefrz.FreezePageRelminMxid;
}
else
{
--
2.40.1
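The reordered decision in 0007 boils down to the following shape. This is
a sketch only, with the comments compressed; see the diff above for the
real thing:

    do_freeze = pagefrz.freeze_required ||
        (presult.all_visible_except_removable && presult.all_frozen &&
         presult.nfrozen > 0 &&
         fpi_before != pgWalUsage.wal_fpi);

    if (do_freeze)
    {
        /* freeze path: advance relfrozenxid/relminmxid, pick a
         * snapshotConflictHorizon, execute the prepared freeze plans */
    }
    else if (presult.all_frozen && presult.nfrozen == 0)
    {
        /* page already fully frozen: take the "freeze" relfrozenxid
         * values, but there are no freeze plans to execute and
         * frozen_pages is not incremented */
    }
    else
    {
        /* "no freeze" processing: the page may become all-visible but
         * can never be set all-frozen this cycle */
    }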
v9-0008-Execute-freezing-in-heap_page_prune.patch
From 9b2a47dbf1c156c1c2453a0a3ebf0b5d21e6a166 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 25 Mar 2024 20:32:11 -0400
Subject: [PATCH v9 08/21] Execute freezing in heap_page_prune()
As a step toward combining the prune and freeze WAL records, execute
freezing in heap_page_prune(), which is renamed to
heap_page_prune_and_freeze(). The logic that decides whether or not to
execute the freeze plans is moved from lazy_scan_prune() over to
heap_page_prune_and_freeze() with little modification. The resulting
division of labor is sketched after this patch.
---
src/backend/access/heap/heapam_handler.c | 2 +-
src/backend/access/heap/pruneheap.c | 189 ++++++++++++++++++-----
src/backend/access/heap/vacuumlazy.c | 150 +++++-------------
src/backend/storage/ipc/procarray.c | 6 +-
src/include/access/heapam.h | 52 ++++---
src/tools/pgindent/typedefs.list | 2 +-
6 files changed, 224 insertions(+), 177 deletions(-)
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 6abfe36dec7..a793c0f56ee 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -1106,7 +1106,7 @@ heapam_scan_analyze_next_tuple(TableScanDesc scan, TransactionId OldestXmin,
* We ignore unused and redirect line pointers. DEAD line pointers
* should be counted as dead, because we need vacuum to run to get rid
* of them. Note that this rule agrees with the way that
- * heap_page_prune() counts things.
+ * heap_page_prune_and_freeze() counts things.
*/
if (!ItemIdIsNormal(itemid))
{
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index eb09713311b..312695f806c 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -17,16 +17,19 @@
#include "access/heapam.h"
#include "access/heapam_xlog.h"
#include "access/htup_details.h"
+#include "access/multixact.h"
#include "access/transam.h"
#include "access/xlog.h"
+#include "commands/vacuum.h"
#include "access/xloginsert.h"
+#include "executor/instrument.h"
#include "miscadmin.h"
#include "pgstat.h"
#include "storage/bufmgr.h"
#include "utils/rel.h"
#include "utils/snapmgr.h"
-/* Working data for heap_page_prune and subroutines */
+/* Working data for heap_page_prune_and_freeze() and subroutines */
typedef struct
{
/* tuple visibility test, initialized for the relation */
@@ -51,6 +54,11 @@ typedef struct
* 1. Otherwise every access would need to subtract 1.
*/
bool marked[MaxHeapTuplesPerPage + 1];
+
+ /*
+ * One entry for every tuple that we may freeze.
+ */
+ HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
} PruneState;
/* Local functions */
@@ -59,14 +67,15 @@ static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
Buffer buffer);
static int heap_prune_chain(Buffer buffer,
OffsetNumber rootoffnum,
- PruneState *prstate, PruneResult *presult);
+ PruneState *prstate, PruneFreezeResult *presult);
+
static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
static void heap_prune_record_redirect(PruneState *prstate,
OffsetNumber offnum, OffsetNumber rdoffnum);
static void heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
- PruneResult *presult);
+ PruneFreezeResult *presult);
static void heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
- PruneResult *presult);
+ PruneFreezeResult *presult);
static void heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum);
static void page_verify_redirects(Page page);
@@ -146,15 +155,15 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
*/
if (PageIsFull(page) || PageGetHeapFreeSpace(page) < minfree)
{
- PruneResult presult;
+ PruneFreezeResult presult;
/*
* For now, pass mark_unused_now as false regardless of whether or
* not the relation has indexes, since we cannot safely determine
* that during on-access pruning with the current implementation.
*/
- heap_page_prune(relation, buffer, vistest, false, NULL,
- &presult, PRUNE_ON_ACCESS, NULL);
+ heap_page_prune_and_freeze(relation, buffer, vistest, false, NULL,
+ &presult, PRUNE_ON_ACCESS, NULL);
/*
* Report the number of tuples reclaimed to pgstats. This is
@@ -188,7 +197,12 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
/*
- * Prune and repair fragmentation in the specified page.
+ * Prune and repair fragmentation and potentially freeze tuples on the
+ * specified page.
+ *
+ * If the page can be marked all-frozen in the visibility map, we may
+ * opportunistically freeze tuples on the page if either its tuples are old
+ * enough or freezing will be cheap enough.
*
* Caller must have pin and buffer cleanup lock on the page. Note that we
* don't update the FSM information for page on caller's behalf. Caller might
@@ -201,12 +215,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* mark_unused_now indicates whether or not dead items can be set LP_UNUSED
* during pruning.
*
- * pagefrz contains both input and output parameters used if the caller is
- * interested in potentially freezing tuples on the page.
+ * pagefrz is an input parameter containing visibility cutoff information and
+ * the current relfrozenxid and relminmxids used if the caller is interested in
+ * freezing tuples on the page.
*
* presult contains output parameters needed by callers such as the number of
* tuples removed and the number of line pointers newly marked LP_DEAD.
- * heap_page_prune() is responsible for initializing it.
+ * heap_page_prune_and_freeze() is responsible for initializing it.
*
* reason indicates why the pruning is performed. It is included in the WAL
* record for debugging and analysis purposes, but otherwise has no effect.
@@ -215,13 +230,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* callback.
*/
void
-heap_page_prune(Relation relation, Buffer buffer,
- GlobalVisState *vistest,
- bool mark_unused_now,
- HeapPageFreeze *pagefrz,
- PruneResult *presult,
- PruneReason reason,
- OffsetNumber *off_loc)
+heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ GlobalVisState *vistest,
+ bool mark_unused_now,
+ HeapPageFreeze *pagefrz,
+ PruneFreezeResult *presult,
+ PruneReason reason,
+ OffsetNumber *off_loc)
{
Page page = BufferGetPage(buffer);
BlockNumber blockno = BufferGetBlockNumber(buffer);
@@ -229,6 +244,10 @@ heap_page_prune(Relation relation, Buffer buffer,
maxoff;
PruneState prstate;
HeapTupleData tup;
+ TransactionId visibility_cutoff_xid;
+ bool do_freeze;
+ bool all_visible_except_removable;
+ int64 fpi_before = pgWalUsage.wal_fpi;
/*
* Our strategy is to scan the page and make lists of items to change,
@@ -264,9 +283,20 @@ heap_page_prune(Relation relation, Buffer buffer,
* all_visible is also set to true.
*/
presult->all_frozen = true;
- presult->all_visible = true;
- /* for recovery conflicts */
- presult->visibility_cutoff_xid = InvalidTransactionId;
+
+ /*
+ * The visibility cutoff xid is the newest xmin of live tuples on the
+ * page. In the common case, this will be set as the conflict horizon the
+ * caller can use for updating the VM. If, at the end of freezing and
+ * pruning, the page is all-frozen, there is no possibility that any
+ * running transaction on the standby does not see tuples on the page as
+ * all-visible, so the conflict horizon remains InvalidTransactionId.
+ */
+ presult->vm_conflict_horizon = visibility_cutoff_xid = InvalidTransactionId;
+
+ /* For advancing relfrozenxid and relminmxid */
+ presult->new_relfrozenxid = InvalidTransactionId;
+ presult->new_relminmxid = InvalidMultiXactId;
maxoff = PageGetMaxOffsetNumber(page);
tup.t_tableOid = RelationGetRelid(relation);
@@ -291,6 +321,7 @@ heap_page_prune(Relation relation, Buffer buffer,
* prefetching efficiency significantly / decreases the number of cache
* misses.
*/
+ all_visible_except_removable = true;
for (offnum = maxoff;
offnum >= FirstOffsetNumber;
offnum = OffsetNumberPrev(offnum))
@@ -351,13 +382,13 @@ heap_page_prune(Relation relation, Buffer buffer,
* asynchronously. See SetHintBits for more info. Check that
* the tuple is hinted xmin-committed because of that.
*/
- if (presult->all_visible)
+ if (all_visible_except_removable)
{
TransactionId xmin;
if (!HeapTupleHeaderXminCommitted(htup))
{
- presult->all_visible = false;
+ all_visible_except_removable = false;
break;
}
@@ -373,25 +404,25 @@ heap_page_prune(Relation relation, Buffer buffer,
if (xmin != FrozenTransactionId &&
!GlobalVisTestIsRemovableXid(vistest, xmin))
{
- presult->all_visible = false;
+ all_visible_except_removable = false;
break;
}
/* Track newest xmin on page. */
- if (TransactionIdFollows(xmin, presult->visibility_cutoff_xid) &&
+ if (TransactionIdFollows(xmin, visibility_cutoff_xid) &&
TransactionIdIsNormal(xmin))
- presult->visibility_cutoff_xid = xmin;
+ visibility_cutoff_xid = xmin;
}
break;
case HEAPTUPLE_RECENTLY_DEAD:
- presult->all_visible = false;
+ all_visible_except_removable = false;
break;
case HEAPTUPLE_INSERT_IN_PROGRESS:
- presult->all_visible = false;
+ all_visible_except_removable = false;
break;
case HEAPTUPLE_DELETE_IN_PROGRESS:
/* This is an expected case during concurrent vacuum */
- presult->all_visible = false;
+ all_visible_except_removable = false;
break;
default:
elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
@@ -407,11 +438,11 @@ heap_page_prune(Relation relation, Buffer buffer,
/* Tuple with storage -- consider need to freeze */
if ((heap_prepare_freeze_tuple(htup, pagefrz,
- &presult->frozen[presult->nfrozen],
+ &prstate.frozen[presult->nfrozen],
&totally_frozen)))
{
/* Save prepared freeze plan for later */
- presult->frozen[presult->nfrozen++].offset = offnum;
+ prstate.frozen[presult->nfrozen++].offset = offnum;
}
/*
@@ -438,7 +469,7 @@ heap_page_prune(Relation relation, Buffer buffer,
* pruning and keep all_visible_except_removable to permit freezing if the
* whole page will eventually become all visible after removing tuples.
*/
- presult->all_visible_except_removable = presult->all_visible;
+ presult->all_visible = all_visible_except_removable;
/* Scan the page */
for (offnum = FirstOffsetNumber;
@@ -537,6 +568,86 @@ heap_page_prune(Relation relation, Buffer buffer,
/* Record number of newly-set-LP_DEAD items for caller */
presult->nnewlpdead = prstate.ndead;
+
+ /*
+ * Freeze the page when heap_prepare_freeze_tuple indicates that at least
+ * one XID/MXID from before FreezeLimit/MultiXactCutoff is present. Also
+ * freeze when pruning generated an FPI, if doing so means that we set the
+ * page all-frozen afterwards (might not happen until final heap pass).
+ */
+ if (pagefrz)
+ do_freeze = pagefrz->freeze_required ||
+ (all_visible_except_removable && presult->all_frozen &&
+ presult->nfrozen > 0 &&
+ fpi_before != pgWalUsage.wal_fpi);
+ else
+ do_freeze = false;
+
+ if (do_freeze)
+ {
+ TransactionId frz_conflict_horizon = InvalidTransactionId;
+
+ /*
+ * We can use the visibility_cutoff_xid as our cutoff for conflicts
+ * when the whole page is eligible to become all-frozen in the VM once
+ * we're done with it. Otherwise we generate a conservative cutoff by
+ * stepping back from OldestXmin. This avoids false conflicts when
+ * hot_standby_feedback is in use.
+ */
+ if (all_visible_except_removable && presult->all_frozen)
+ frz_conflict_horizon = visibility_cutoff_xid;
+ else
+ {
+ /* Avoids false conflicts when hot_standby_feedback in use */
+ frz_conflict_horizon = pagefrz->cutoffs->OldestXmin;
+ TransactionIdRetreat(frz_conflict_horizon);
+ }
+
+ /* Execute all freeze plans for page as a single atomic action */
+ heap_freeze_execute_prepared(relation, buffer,
+ frz_conflict_horizon,
+ prstate.frozen, presult->nfrozen);
+ }
+ else if (!pagefrz || !presult->all_frozen || presult->nfrozen > 0)
+ {
+ /*
+ * If we will neither freeze tuples on the page nor set the page all
+ * frozen in the visibility map, the page is not all frozen and there
+ * will be no newly frozen tuples.
+ */
+ presult->all_frozen = false;
+ presult->nfrozen = 0; /* avoid miscounts in instrumentation */
+ }
+
+ /*
+ * For callers planning to update the visibility map, the conflict horizon
+ * for that record must be the newest xmin on the page. However, if the
+ * page is completely frozen, there can be no conflict and the
+ * vm_conflict_horizon should remain InvalidTransactionId.
+ */
+ if (!presult->all_frozen)
+ presult->vm_conflict_horizon = visibility_cutoff_xid;
+
+ if (pagefrz)
+ {
+ /*
+ * If we will freeze tuples on the page or, even if we don't freeze
+ * tuples on the page, if we will set the page all-frozen in the
+ * visibility map, we can advance relfrozenxid and relminmxid to the
+ * values in pagefrz->FreezePageRelfrozenXid and
+ * pagefrz->FreezePageRelminMxid.
+ */
+ if (presult->all_frozen || presult->nfrozen > 0)
+ {
+ presult->new_relfrozenxid = pagefrz->FreezePageRelfrozenXid;
+ presult->new_relminmxid = pagefrz->FreezePageRelminMxid;
+ }
+ else
+ {
+ presult->new_relfrozenxid = pagefrz->NoFreezePageRelfrozenXid;
+ presult->new_relminmxid = pagefrz->NoFreezePageRelminMxid;
+ }
+ }
}
@@ -590,7 +701,7 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
*/
static int
heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
- PruneState *prstate, PruneResult *presult)
+ PruneState *prstate, PruneFreezeResult *presult)
{
int ndeleted = 0;
Page dp = (Page) BufferGetPage(buffer);
@@ -855,10 +966,10 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
{
/*
* We found a redirect item that doesn't point to a valid follow-on
- * item. This can happen if the loop in heap_page_prune caused us to
- * visit the dead successor of a redirect item before visiting the
- * redirect item. We can clean up by setting the redirect item to
- * DEAD state or LP_UNUSED if the caller indicated.
+ * item. This can happen if the loop in heap_page_prune_and_freeze()
+ * caused us to visit the dead successor of a redirect item before
+ * visiting the redirect item. We can clean up by setting the
+ * redirect item to DEAD state or LP_UNUSED if the caller indicated.
*/
heap_prune_record_dead_or_unused(prstate, rootoffnum, presult);
}
@@ -898,7 +1009,7 @@ heap_prune_record_redirect(PruneState *prstate,
/* Record line pointer to be marked dead */
static void
heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
- PruneResult *presult)
+ PruneFreezeResult *presult)
{
Assert(prstate->ndead < MaxHeapTuplesPerPage);
prstate->nowdead[prstate->ndead] = offnum;
@@ -921,7 +1032,7 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
*/
static void
heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
- PruneResult *presult)
+ PruneFreezeResult *presult)
{
/*
* If the caller set mark_unused_now to true, we can remove dead tuples
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index f474e661428..8beef4093ae 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -430,12 +430,12 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
* as an upper bound on the XIDs stored in the pages we'll actually scan
* (NewRelfrozenXid tracking must never be allowed to miss unfrozen XIDs).
*
- * Next acquire vistest, a related cutoff that's used in heap_page_prune.
- * We expect vistest will always make heap_page_prune remove any deleted
- * tuple whose xmax is < OldestXmin. lazy_scan_prune must never become
- * confused about whether a tuple should be frozen or removed. (In the
- * future we might want to teach lazy_scan_prune to recompute vistest from
- * time to time, to increase the number of dead tuples it can prune away.)
+ * Next acquire vistest, a related cutoff that's used in
+ * heap_page_prune_and_freeze(). We expect vistest will always make
+ * heap_page_prune_and_freeze() remove any deleted tuple whose xmax is <
+ * OldestXmin. (In the future we might want to teach lazy_scan_prune to
+ * recompute vistest from time to time, to increase the number of dead
+ * tuples it can prune away.)
*/
vacrel->aggressive = vacuum_get_cutoffs(rel, params, &vacrel->cutoffs);
vacrel->rel_pages = orig_rel_pages = RelationGetNumberOfBlocks(rel);
@@ -1378,21 +1378,21 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
*
* Caller must hold pin and buffer cleanup lock on the buffer.
*
- * Prior to PostgreSQL 14 there were very rare cases where heap_page_prune()
- * was allowed to disagree with our HeapTupleSatisfiesVacuum() call about
- * whether or not a tuple should be considered DEAD. This happened when an
- * inserting transaction concurrently aborted (after our heap_page_prune()
- * call, before our HeapTupleSatisfiesVacuum() call). There was rather a lot
- * of complexity just so we could deal with tuples that were DEAD to VACUUM,
- * but nevertheless were left with storage after pruning.
+ * Prior to PostgreSQL 14 there were very rare cases where
+ * heap_page_prune_and_freeze() was allowed to disagree with our
+ * HeapTupleSatisfiesVacuum() call about whether or not a tuple should be
+ * considered DEAD. This happened when an inserting transaction concurrently
+ * aborted (after our heap_page_prune_and_freeze() call, before our
+ * HeapTupleSatisfiesVacuum() call). There was rather a lot of complexity just
+ * so we could deal with tuples that were DEAD to VACUUM, but nevertheless were
+ * left with storage after pruning.
*
* As of Postgres 17, we circumvent this problem altogether by reusing the
- * result of heap_page_prune()'s visibility check. Without the second call to
- * HeapTupleSatisfiesVacuum(), there is no new HTSV_Result and there can be no
- * disagreement. We'll just handle such tuples as if they had become fully dead
- * right after this operation completes instead of in the middle of it. Note that
- * any tuple that becomes dead after the call to heap_page_prune() can't need to
- * be frozen, because it was visible to another session when vacuum started.
+ * result of heap_page_prune_and_freeze()'s visibility check. Without the
+ * second call to HeapTupleSatisfiesVacuum(), there is no new HTSV_Result and
+ * there can be no disagreement. We'll just handle such tuples as if they had
+ * become fully dead right after this operation completes instead of in the
+ * middle of it.
*
* vmbuffer is the buffer containing the VM block with visibility information
* for the heap block, blkno. all_visible_according_to_vm is the saved
@@ -1415,26 +1415,24 @@ lazy_scan_prune(LVRelState *vacrel,
OffsetNumber offnum,
maxoff;
ItemId itemid;
- PruneResult presult;
+ PruneFreezeResult presult;
int lpdead_items,
live_tuples,
recently_dead_tuples;
HeapPageFreeze pagefrz;
bool hastup = false;
- bool do_freeze;
- int64 fpi_before = pgWalUsage.wal_fpi;
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
Assert(BufferGetBlockNumber(buf) == blkno);
/*
* maxoff might be reduced following line pointer array truncation in
- * heap_page_prune. That's safe for us to ignore, since the reclaimed
- * space will continue to look like LP_UNUSED items below.
+ * heap_page_prune_and_freeze(). That's safe for us to ignore, since the
+ * reclaimed space will continue to look like LP_UNUSED items below.
*/
maxoff = PageGetMaxOffsetNumber(page);
- /* Initialize (or reset) page-level state */
+ /* Initialize pagefrz */
pagefrz.freeze_required = false;
pagefrz.FreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
pagefrz.FreezePageRelminMxid = vacrel->NewRelminMxid;
@@ -1446,7 +1444,7 @@ lazy_scan_prune(LVRelState *vacrel,
recently_dead_tuples = 0;
/*
- * Prune all HOT-update chains in this page.
+ * Prune all HOT-update chains and potentially freeze tuples on this page.
*
* We count the number of tuples removed from the page by the pruning step
* in presult.ndeleted. It should not be confused with lpdead_items;
@@ -1457,8 +1455,8 @@ lazy_scan_prune(LVRelState *vacrel,
* items LP_UNUSED, so mark_unused_now should be true if no indexes and
* false otherwise.
*/
- heap_page_prune(rel, buf, vacrel->vistest, vacrel->nindexes == 0, &pagefrz,
- &presult, PRUNE_VACUUM_SCAN, &vacrel->offnum);
+ heap_page_prune_and_freeze(rel, buf, vacrel->vistest, vacrel->nindexes == 0,
+ &pagefrz, &presult, PRUNE_VACUUM_SCAN, &vacrel->offnum);
/*
* Now scan the page to collect LP_DEAD items and update the variables set
@@ -1571,86 +1569,20 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->offnum = InvalidOffsetNumber;
- /*
- * Freeze the page when heap_prepare_freeze_tuple indicates that at least
- * one XID/MXID from before FreezeLimit/MultiXactCutoff is present. Also
- * freeze when pruning generated an FPI, if doing so means that we set the
- * page all-frozen afterwards (might not happen until final heap pass).
- */
- do_freeze = pagefrz.freeze_required ||
- (presult.all_visible_except_removable && presult.all_frozen &&
- presult.nfrozen > 0 &&
- fpi_before != pgWalUsage.wal_fpi);
+ Assert(MultiXactIdIsValid(presult.new_relminmxid));
+ vacrel->NewRelfrozenXid = presult.new_relfrozenxid;
+ Assert(TransactionIdIsValid(presult.new_relfrozenxid));
+ vacrel->NewRelminMxid = presult.new_relminmxid;
- if (do_freeze)
+ if (presult.nfrozen > 0)
{
- TransactionId snapshotConflictHorizon;
-
/*
- * We're freezing the page. Our final NewRelfrozenXid doesn't need to
- * be affected by the XIDs that are just about to be frozen anyway.
+ * We never increment the frozen_pages instrumentation counter when
+ * nfrozen == 0, since it only counts pages with newly frozen tuples
+ * (don't confuse that with pages newly set all-frozen in VM).
*/
- vacrel->NewRelfrozenXid = pagefrz.FreezePageRelfrozenXid;
- vacrel->NewRelminMxid = pagefrz.FreezePageRelminMxid;
-
vacrel->frozen_pages++;
- /*
- * We can use frz_conflict_horizon as our cutoff for conflicts when
- * the whole page is eligible to become all-frozen in the VM once
- * we're done with it. Otherwise we generate a conservative cutoff by
- * stepping back from OldestXmin.
- */
- if (presult.all_visible_except_removable && presult.all_frozen)
- snapshotConflictHorizon = presult.visibility_cutoff_xid;
- else
- {
- /* Avoids false conflicts when hot_standby_feedback in use */
- snapshotConflictHorizon = pagefrz.cutoffs->OldestXmin;
- TransactionIdRetreat(snapshotConflictHorizon);
- }
-
- /* Using same cutoff when setting VM is now unnecessary */
- if (presult.all_visible_except_removable && presult.all_frozen)
- presult.visibility_cutoff_xid = InvalidTransactionId;
-
- /* Execute all freeze plans for page as a single atomic action */
- heap_freeze_execute_prepared(vacrel->rel, buf,
- snapshotConflictHorizon,
- presult.frozen, presult.nfrozen);
-
- }
- else if (presult.all_frozen && presult.nfrozen == 0)
- {
- /* Page should be all visible except to-be-removed tuples */
- Assert(presult.all_visible_except_removable);
-
- /*
- * We have no freeze plans to execute, so there's no added cost from
- * following the freeze path. That's why it was chosen. This is
- * important in the case where the page only contains totally frozen
- * tuples at this point (perhaps only following pruning). Such pages
- * can be marked all-frozen in the VM by our caller, even though none
- * of its tuples were newly frozen here (note that the "no freeze"
- * path never sets pages all-frozen).
- *
- * We never increment the frozen_pages instrumentation counter here,
- * since it only counts pages with newly frozen tuples (don't confuse
- * that with pages newly set all-frozen in VM).
- */
- vacrel->NewRelfrozenXid = pagefrz.FreezePageRelfrozenXid;
- vacrel->NewRelminMxid = pagefrz.FreezePageRelminMxid;
- }
- else
- {
- /*
- * Page requires "no freeze" processing. It might be set all-visible
- * in the visibility map, but it can never be set all-frozen.
- */
- vacrel->NewRelfrozenXid = pagefrz.NoFreezePageRelfrozenXid;
- vacrel->NewRelminMxid = pagefrz.NoFreezePageRelminMxid;
- presult.all_frozen = false;
- presult.nfrozen = 0; /* avoid miscounts in instrumentation */
}
/*
@@ -1676,7 +1608,7 @@ lazy_scan_prune(LVRelState *vacrel,
Assert(presult.all_frozen == debug_all_frozen);
Assert(!TransactionIdIsValid(debug_cutoff) ||
- debug_cutoff == presult.visibility_cutoff_xid);
+ debug_cutoff == presult.vm_conflict_horizon);
}
#endif
@@ -1730,7 +1662,7 @@ lazy_scan_prune(LVRelState *vacrel,
if (presult.all_frozen)
{
- Assert(!TransactionIdIsValid(presult.visibility_cutoff_xid));
+ Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
flags |= VISIBILITYMAP_ALL_FROZEN;
}
@@ -1750,7 +1682,7 @@ lazy_scan_prune(LVRelState *vacrel,
PageSetAllVisible(page);
MarkBufferDirty(buf);
visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
- vmbuffer, presult.visibility_cutoff_xid,
+ vmbuffer, presult.vm_conflict_horizon,
flags);
}
@@ -1815,11 +1747,11 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* Set the page all-frozen (and all-visible) in the VM.
*
- * We can pass InvalidTransactionId as our visibility_cutoff_xid,
- * since a snapshotConflictHorizon sufficient to make everything safe
- * for REDO was logged when the page's tuples were frozen.
+ * We can pass InvalidTransactionId as our vm_conflict_horizon, since
+ * a snapshotConflictHorizon sufficient to make everything safe for
+ * REDO was logged when the page's tuples were frozen.
*/
- Assert(!TransactionIdIsValid(presult.visibility_cutoff_xid));
+ Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
vmbuffer, InvalidTransactionId,
VISIBILITYMAP_ALL_VISIBLE |
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index b3cd248fb64..88a6d504dff 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -1715,9 +1715,9 @@ TransactionIdIsActive(TransactionId xid)
* Note: the approximate horizons (see definition of GlobalVisState) are
* updated by the computations done here. That's currently required for
* correctness and a small optimization. Without doing so it's possible that
- * heap vacuum's call to heap_page_prune() uses a more conservative horizon
- * than later when deciding which tuples can be removed - which the code
- * doesn't expect (breaking HOT).
+ * heap vacuum's call to heap_page_prune_and_freeze() uses a more conservative
+ * horizon than later when deciding which tuples can be removed - which the
+ * code doesn't expect (breaking HOT).
*/
static void
ComputeXidHorizons(ComputeXidHorizonsResult *h)
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 9d047621ea5..de11c166575 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -195,13 +195,13 @@ typedef struct HeapPageFreeze
/*
* Per-page state returned from pruning
*/
-typedef struct PruneResult
+typedef struct PruneFreezeResult
{
int ndeleted; /* Number of tuples deleted from the page */
int nnewlpdead; /* Number of newly LP_DEAD items */
/*
- * The rest of the fields in PruneResult are only guaranteed to be
+ * The rest of the fields in PruneFreezeResult are only guaranteed to be
* initialized if heap_page_prune is passed PruneReason VACUUM_SCAN.
*/
@@ -212,23 +212,22 @@ typedef struct PruneResult
*/
bool all_visible;
- /*
- * Whether or not the page is all-visible except for tuples which will be
- * removed during vacuum's second pass. This is used by VACUUM to
- * determine whether or not to consider opportunistically freezing the
- * page.
- */
- bool all_visible_except_removable;
-
/* Whether or not the page can be set all-frozen in the VM */
bool all_frozen;
- TransactionId visibility_cutoff_xid; /* Newest xmin on the page */
+
+ /*
+ * If the page is all-visible and not all-frozen this is the oldest xid
+ * that can see the page as all-visible. It is to be used as the snapshot
+ * conflict horizon when emitting a XLOG_HEAP2_VISIBLE record.
+ */
+ TransactionId vm_conflict_horizon;
/*
* Tuple visibility is only computed once for each tuple, for correctness
- * and efficiency reasons; see comment in heap_page_prune() for details.
- * This is of type int8[], instead of HTSV_Result[], so we can use -1 to
- * indicate no visibility has been computed, e.g. for LP_DEAD items.
+ * and efficiency reasons; see comment in heap_page_prune_and_freeze() for
+ * details. This is of type int8[], instead of HTSV_Result[], so we can
+ * use -1 to indicate no visibility has been computed, e.g. for LP_DEAD
+ * items.
*
* This needs to be MaxHeapTuplesPerPage + 1 long as FirstOffsetNumber is
* 1. Otherwise every access would need to subtract 1.
@@ -242,9 +241,14 @@ typedef struct PruneResult
* One entry for every tuple that we may freeze.
*/
HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
-} PruneResult;
+ /* New value of relfrozenxid found by heap_page_prune_and_freeze() */
+ TransactionId new_relfrozenxid;
+
+ /* New value of relminmxid found by heap_page_prune_and_freeze() */
+ MultiXactId new_relminmxid;
+} PruneFreezeResult;
-/* 'reason' codes for heap_page_prune() */
+/* 'reason' codes for heap_page_prune_and_freeze() */
typedef enum
{
PRUNE_ON_ACCESS, /* on-access pruning */
@@ -254,7 +258,7 @@ typedef enum
/*
* Pruning calculates tuple visibility once and saves the results in an array
- * of int8. See PruneResult.htsv for details. This helper function is meant to
+ * of int8. See PruneFreezeResult.htsv for details. This helper function is meant to
* guard against examining visibility status array members which have not yet
* been computed.
*/
@@ -361,13 +365,13 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
/* in heap/pruneheap.c */
struct GlobalVisState;
extern void heap_page_prune_opt(Relation relation, Buffer buffer);
-extern void heap_page_prune(Relation relation, Buffer buffer,
- struct GlobalVisState *vistest,
- bool mark_unused_now,
- HeapPageFreeze *pagefrz,
- PruneResult *presult,
- PruneReason reason,
- OffsetNumber *off_loc);
+extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ struct GlobalVisState *vistest,
+ bool mark_unused_now,
+ HeapPageFreeze *pagefrz,
+ PruneFreezeResult *presult,
+ PruneReason reason,
+ OffsetNumber *off_loc);
extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
OffsetNumber *redirected, int nredirected,
OffsetNumber *nowdead, int ndead,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index cfa9d5aaeac..5737bc5b945 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2191,8 +2191,8 @@ ProjectionPath
PromptInterruptContext
ProtocolVersion
PrsStorage
+PruneFreezeResult
PruneReason
-PruneResult
PruneState
PruneStepResult
PsqlScanCallbacks
--
2.40.1
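With 0008 in place, the division of labor is: heap_page_prune_and_freeze()
decides whether to freeze, executes the plans, and reports the page-level
results, while lazy_scan_prune() just copies them into the vacuum-wide
state. Roughly (a sketch, not verbatim from the patch):

    heap_page_prune_and_freeze(rel, buf, vacrel->vistest,
                               vacrel->nindexes == 0, &pagefrz, &presult,
                               PRUNE_VACUUM_SCAN, &vacrel->offnum);

    /* relfrozenxid/relminmxid advancement is now computed inside pruning */
    Assert(TransactionIdIsValid(presult.new_relfrozenxid));
    Assert(MultiXactIdIsValid(presult.new_relminmxid));
    vacrel->NewRelfrozenXid = presult.new_relfrozenxid;
    vacrel->NewRelminMxid = presult.new_relminmxid;

    /* frozen_pages still only counts pages with newly frozen tuples */
    if (presult.nfrozen > 0)
        vacrel->frozen_pages++;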
v9-0009-Make-opp-freeze-heuristic-compatible-with-prune-f.patch
From 15eaa7ff3d442580043de5270192fd83892ff142 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 25 Mar 2024 20:48:11 -0400
Subject: [PATCH v9 09/21] Make opp freeze heuristic compatible with
prune+freeze record
Once the prune and freeze records are combined, we will no longer be
able to test whether or not pruning emitted an FPI to decide whether or
not to opportunistically freeze a freezable page.
While this heuristic should be improved, for now, approximate the
previous logic by keeping track of whether or not a hint-bit FPI was
emitted during the visibility checks (when checksums are enabled) and
combining that with checking XLogCheckBufferNeedsBackup(). If we just
finished deciding whether or not to prune and the current buffer seems
to need an FPI after modification, it is likely that pruning would have
emitted an FPI. The resulting test is sketched after this patch.
---
src/backend/access/heap/pruneheap.c | 57 +++++++++++++++++++++--------
1 file changed, 42 insertions(+), 15 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 312695f806c..f0decff35dc 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -247,6 +247,10 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
TransactionId visibility_cutoff_xid;
bool do_freeze;
bool all_visible_except_removable;
+ bool do_prune;
+ bool whole_page_freezable;
+ bool hint_bit_fpi;
+ bool prune_fpi = false;
int64 fpi_before = pgWalUsage.wal_fpi;
/*
@@ -456,6 +460,13 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
}
}
+ /*
+ * If checksums are enabled, heap_prune_satisfies_vacuum() may have caused
+ * an FPI to be emitted. Then reset fpi_before for no prune case.
+ */
+ hint_bit_fpi = fpi_before != pgWalUsage.wal_fpi;
+ fpi_before = pgWalUsage.wal_fpi;
+
/*
* For vacuum, if the whole page will become frozen, we consider
* opportunistically freezing tuples. Dead tuples which will be removed by
@@ -500,11 +511,41 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (off_loc)
*off_loc = InvalidOffsetNumber;
+ do_prune = prstate.nredirected > 0 ||
+ prstate.ndead > 0 ||
+ prstate.nunused > 0;
+
+ /*
+ * Only incur overhead of checking if we will do an FPI if we might use
+ * the information.
+ */
+ if (do_prune && pagefrz)
+ prune_fpi = XLogCheckBufferNeedsBackup(buffer);
+
+ /* Is the whole page freezable? And is there something to freeze */
+ whole_page_freezable = all_visible_except_removable &&
+ presult->all_frozen;
+
+ /*
+ * Freeze the page when heap_prepare_freeze_tuple indicates that at least
+ * one XID/MXID from before FreezeLimit/MultiXactCutoff is present. Also
+ * freeze when pruning generated an FPI, if doing so means that we set the
+ * page all-frozen afterwards (might not happen until final heap pass).
+ * XXX: Previously, we knew if pruning emitted an FPI by checking
+ * pgWalUsage.wal_fpi before and after pruning. Once the freeze and prune
+ * records are combined, this heuristic couldn't be used anymore. The
+ * opportunistic freeze heuristic must be improved; however, for now, try
+ * to approximate it.
+ */
+ do_freeze = pagefrz &&
+ (pagefrz->freeze_required ||
+ (whole_page_freezable && presult->nfrozen > 0 && (prune_fpi || hint_bit_fpi)));
+
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
/* Have we found any prunable items? */
- if (prstate.nredirected > 0 || prstate.ndead > 0 || prstate.nunused > 0)
+ if (do_prune)
{
/*
* Apply the planned item changes, then repair page fragmentation, and
@@ -569,20 +610,6 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/* Record number of newly-set-LP_DEAD items for caller */
presult->nnewlpdead = prstate.ndead;
- /*
- * Freeze the page when heap_prepare_freeze_tuple indicates that at least
- * one XID/MXID from before FreezeLimit/MultiXactCutoff is present. Also
- * freeze when pruning generated an FPI, if doing so means that we set the
- * page all-frozen afterwards (might not happen until final heap pass).
- */
- if (pagefrz)
- do_freeze = pagefrz->freeze_required ||
- (all_visible_except_removable && presult->all_frozen &&
- presult->nfrozen > 0 &&
- fpi_before != pgWalUsage.wal_fpi);
- else
- do_freeze = false;
-
if (do_freeze)
{
TransactionId frz_conflict_horizon = InvalidTransactionId;
--
2.40.1
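The approximated heuristic in 0009 reduces to the following test inside
heap_page_prune_and_freeze() (a sketch of the logic in the diff above):

    /* Did setting hint bits during the visibility checks emit an FPI? */
    hint_bit_fpi = fpi_before != pgWalUsage.wal_fpi;

    /* Would modifying this buffer for pruning require an FPI right now? */
    if (do_prune && pagefrz)
        prune_fpi = XLogCheckBufferNeedsBackup(buffer);

    do_freeze = pagefrz &&
        (pagefrz->freeze_required ||
         (whole_page_freezable && presult->nfrozen > 0 &&
          (prune_fpi || hint_bit_fpi)));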
v9-0010-Separate-tuple-pre-freeze-checks-and-invoke-earli.patch
From c801ed5f85b49323f7bee132fabea6f7f9638745 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 25 Mar 2024 20:54:37 -0400
Subject: [PATCH v9 10/21] Separate tuple pre freeze checks and invoke earlier
When combining the prune and freeze records, their critical sections
will also have to be combined. heap_freeze_execute_prepared() performs a
set of pre-freeze validations before starting its critical section. Move
these validations into a helper function, heap_pre_freeze_checks(), and
invoke it in heap_page_prune_and_freeze() before the pruning critical
section. The resulting ordering is sketched after this patch.
---
src/backend/access/heap/heapam.c | 58 ++++++++++++++++-------------
src/backend/access/heap/pruneheap.c | 41 +++++++++++---------
src/include/access/heapam.h | 3 ++
3 files changed, 59 insertions(+), 43 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index bb856690234..b3119de2aa6 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6762,35 +6762,19 @@ heap_execute_freeze_tuple(HeapTupleHeader tuple, HeapTupleFreeze *frz)
}
/*
- * heap_freeze_execute_prepared
- *
- * Executes freezing of one or more heap tuples on a page on behalf of caller.
- * Caller passes an array of tuple plans from heap_prepare_freeze_tuple.
- * Caller must set 'offset' in each plan for us. Note that we destructively
- * sort caller's tuples array in-place, so caller had better be done with it.
- *
- * WAL-logs the changes so that VACUUM can advance the rel's relfrozenxid
- * later on without any risk of unsafe pg_xact lookups, even following a hard
- * crash (or when querying from a standby). We represent freezing by setting
- * infomask bits in tuple headers, but this shouldn't be thought of as a hint.
- * See section on buffer access rules in src/backend/storage/buffer/README.
- */
+* Perform xmin/xmax XID status sanity checks before calling
+* heap_freeze_execute_prepared().
+*
+* heap_prepare_freeze_tuple doesn't perform these checks directly because
+* pg_xact lookups are relatively expensive. They shouldn't be repeated
+* by successive VACUUMs that each decide against freezing the same page.
+*/
void
-heap_freeze_execute_prepared(Relation rel, Buffer buffer,
- TransactionId snapshotConflictHorizon,
- HeapTupleFreeze *tuples, int ntuples)
+heap_pre_freeze_checks(Buffer buffer,
+ HeapTupleFreeze *tuples, int ntuples)
{
Page page = BufferGetPage(buffer);
- Assert(ntuples > 0);
-
- /*
- * Perform xmin/xmax XID status sanity checks before critical section.
- *
- * heap_prepare_freeze_tuple doesn't perform these checks directly because
- * pg_xact lookups are relatively expensive. They shouldn't be repeated
- * by successive VACUUMs that each decide against freezing the same page.
- */
for (int i = 0; i < ntuples; i++)
{
HeapTupleFreeze *frz = tuples + i;
@@ -6829,6 +6813,30 @@ heap_freeze_execute_prepared(Relation rel, Buffer buffer,
xmax)));
}
}
+}
+
+/*
+ * heap_freeze_execute_prepared
+ *
+ * Executes freezing of one or more heap tuples on a page on behalf of caller.
+ * Caller passes an array of tuple plans from heap_prepare_freeze_tuple.
+ * Caller must set 'offset' in each plan for us. Note that we destructively
+ * sort caller's tuples array in-place, so caller had better be done with it.
+ *
+ * WAL-logs the changes so that VACUUM can advance the rel's relfrozenxid
+ * later on without any risk of unsafe pg_xact lookups, even following a hard
+ * crash (or when querying from a standby). We represent freezing by setting
+ * infomask bits in tuple headers, but this shouldn't be thought of as a hint.
+ * See section on buffer access rules in src/backend/storage/buffer/README.
+ */
+void
+heap_freeze_execute_prepared(Relation rel, Buffer buffer,
+ TransactionId snapshotConflictHorizon,
+ HeapTupleFreeze *tuples, int ntuples)
+{
+ Page page = BufferGetPage(buffer);
+
+ Assert(ntuples > 0);
START_CRIT_SECTION();
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index f0decff35dc..13db348b2c1 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -245,6 +245,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
PruneState prstate;
HeapTupleData tup;
TransactionId visibility_cutoff_xid;
+ TransactionId frz_conflict_horizon;
bool do_freeze;
bool all_visible_except_removable;
bool do_prune;
@@ -297,6 +298,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* all-visible, so the conflict horizon remains InvalidTransactionId.
*/
presult->vm_conflict_horizon = visibility_cutoff_xid = InvalidTransactionId;
+ frz_conflict_horizon = InvalidTransactionId;
/* For advancing relfrozenxid and relminmxid */
presult->new_relfrozenxid = InvalidTransactionId;
@@ -541,6 +543,27 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
(pagefrz->freeze_required ||
(whole_page_freezable && presult->nfrozen > 0 && (prune_fpi || hint_bit_fpi)));
+ if (do_freeze)
+ {
+ heap_pre_freeze_checks(buffer, prstate.frozen, presult->nfrozen);
+
+ /*
+ * We can use the visibility_cutoff_xid as our cutoff for conflicts
+ * when the whole page is eligible to become all-frozen in the VM once
+ * we're done with it. Otherwise we generate a conservative cutoff by
+ * stepping back from OldestXmin. This avoids false conflicts when
+ * hot_standby_feedback is in use.
+ */
+ if (all_visible_except_removable && presult->all_frozen)
+ frz_conflict_horizon = visibility_cutoff_xid;
+ else
+ {
+ /* Avoids false conflicts when hot_standby_feedback in use */
+ frz_conflict_horizon = pagefrz->cutoffs->OldestXmin;
+ TransactionIdRetreat(frz_conflict_horizon);
+ }
+ }
+
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
@@ -612,24 +635,6 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (do_freeze)
{
- TransactionId frz_conflict_horizon = InvalidTransactionId;
-
- /*
- * We can use the visibility_cutoff_xid as our cutoff for conflicts
- * when the whole page is eligible to become all-frozen in the VM once
- * we're done with it. Otherwise we generate a conservative cutoff by
- * stepping back from OldestXmin. This avoids false conflicts when
- * hot_standby_feedback is in use.
- */
- if (all_visible_except_removable && presult->all_frozen)
- frz_conflict_horizon = visibility_cutoff_xid;
- else
- {
- /* Avoids false conflicts when hot_standby_feedback in use */
- frz_conflict_horizon = pagefrz->cutoffs->OldestXmin;
- TransactionIdRetreat(frz_conflict_horizon);
- }
-
/* Execute all freeze plans for page as a single atomic action */
heap_freeze_execute_prepared(relation, buffer,
frz_conflict_horizon,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index de11c166575..cc3b3346bc4 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -342,6 +342,9 @@ extern void heap_inplace_update(Relation relation, HeapTuple tuple);
extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
HeapPageFreeze *pagefrz,
HeapTupleFreeze *frz, bool *totally_frozen);
+
+extern void heap_pre_freeze_checks(Buffer buffer,
+ HeapTupleFreeze *tuples, int ntuples);
extern void heap_freeze_execute_prepared(Relation rel, Buffer buffer,
TransactionId snapshotConflictHorizon,
HeapTupleFreeze *tuples, int ntuples);
--
2.40.1
Attachment: v9-0011-Remove-heap_freeze_execute_prepared.patch (text/x-diff)
From e00ead3b3db5fcdc8274a61878ae994df351d2de Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 26 Mar 2024 09:10:14 -0400
Subject: [PATCH v9 11/21] Remove heap_freeze_execute_prepared()
In order to merge the freeze and prune records, the execution of tuple
freezing and the WAL logging of the changes to the page must be
separated so that the WAL logging can be combined with the prune WAL
logging. This commit introduces a helper that performs only the tuple
freezing and inlines the remaining contents of
heap_freeze_execute_prepared() at its call site in heap_page_prune().
---
src/backend/access/heap/heapam.c | 49 +++++++----------------------
src/backend/access/heap/pruneheap.c | 22 ++++++++++---
src/include/access/heapam.h | 28 +++++++++--------
3 files changed, 44 insertions(+), 55 deletions(-)
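To make the resulting control flow easier to review, here is a
condensed sketch of the freezing path in heap_page_prune_and_freeze()
after this patch. It is not standalone code -- it simply restates the
hunks below, with WAL logging now done by the caller rather than by the
removed helper:

    if (do_freeze)
    {
        START_CRIT_SECTION();

        Assert(presult->nfrozen > 0);

        /* Apply the freeze plans prepared earlier; no WAL is written here */
        heap_freeze_prepared_tuples(buffer, prstate.frozen, presult->nfrozen);

        MarkBufferDirty(buffer);

        /* WAL logging is now the caller's responsibility, not the helper's */
        if (RelationNeedsWAL(relation))
            log_heap_prune_and_freeze(relation, buffer,
                                      frz_conflict_horizon, false, reason,
                                      prstate.frozen, presult->nfrozen,
                                      NULL, 0,   /* redirected */
                                      NULL, 0,   /* dead */
                                      NULL, 0);  /* unused */

        END_CRIT_SECTION();
    }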
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index b3119de2aa6..41c1c7d286f 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6445,9 +6445,9 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
* XIDs or MultiXactIds that will need to be processed by a future VACUUM.
*
* VACUUM caller must assemble HeapTupleFreeze freeze plan entries for every
- * tuple that we returned true for, and call heap_freeze_execute_prepared to
- * execute freezing. Caller must initialize pagefrz fields for page as a
- * whole before first call here for each heap page.
+ * tuple that we returned true for, and then execute freezing. Caller must
+ * initialize pagefrz fields for page as a whole before first call here for
+ * each heap page.
*
* VACUUM caller decides on whether or not to freeze the page as a whole.
* We'll often prepare freeze plans for a page that caller just discards.
@@ -6762,8 +6762,8 @@ heap_execute_freeze_tuple(HeapTupleHeader tuple, HeapTupleFreeze *frz)
}
/*
-* Perform xmin/xmax XID status sanity checks before calling
-* heap_freeze_execute_prepared().
+* Perform xmin/xmax XID status sanity checks before actually executing freeze
+* plans.
*
* heap_prepare_freeze_tuple doesn't perform these checks directly because
* pg_xact lookups are relatively expensive. They shouldn't be repeated
@@ -6816,30 +6816,17 @@ heap_pre_freeze_checks(Buffer buffer,
}
/*
- * heap_freeze_execute_prepared
- *
- * Executes freezing of one or more heap tuples on a page on behalf of caller.
- * Caller passes an array of tuple plans from heap_prepare_freeze_tuple.
- * Caller must set 'offset' in each plan for us. Note that we destructively
- * sort caller's tuples array in-place, so caller had better be done with it.
- *
- * WAL-logs the changes so that VACUUM can advance the rel's relfrozenxid
- * later on without any risk of unsafe pg_xact lookups, even following a hard
- * crash (or when querying from a standby). We represent freezing by setting
- * infomask bits in tuple headers, but this shouldn't be thought of as a hint.
- * See section on buffer access rules in src/backend/storage/buffer/README.
+ * Helper which executes freezing of one or more heap tuples on a page on
+ * behalf of caller. Caller passes an array of tuple plans from
+ * heap_prepare_freeze_tuple. Caller must set 'offset' in each plan for us.
+ * Must be called in a critical section that also marks the buffer dirty and,
+ * if needed, emits WAL.
*/
void
-heap_freeze_execute_prepared(Relation rel, Buffer buffer,
- TransactionId snapshotConflictHorizon,
- HeapTupleFreeze *tuples, int ntuples)
+heap_freeze_prepared_tuples(Buffer buffer, HeapTupleFreeze *tuples, int ntuples)
{
Page page = BufferGetPage(buffer);
- Assert(ntuples > 0);
-
- START_CRIT_SECTION();
-
for (int i = 0; i < ntuples; i++)
{
HeapTupleFreeze *frz = tuples + i;
@@ -6851,20 +6838,6 @@ heap_freeze_execute_prepared(Relation rel, Buffer buffer,
}
MarkBufferDirty(buffer);
-
- /* Now WAL-log freezing if necessary */
- if (RelationNeedsWAL(rel))
- {
- log_heap_prune_and_freeze(rel, buffer, snapshotConflictHorizon,
- false, /* no cleanup lock required */
- PRUNE_VACUUM_SCAN,
- tuples, ntuples,
- NULL, 0, /* redirected */
- NULL, 0, /* dead */
- NULL, 0); /* unused */
- }
-
- END_CRIT_SECTION();
}
/*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 13db348b2c1..5d8c881c2fc 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -635,10 +635,24 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (do_freeze)
{
- /* Execute all freeze plans for page as a single atomic action */
- heap_freeze_execute_prepared(relation, buffer,
- frz_conflict_horizon,
- prstate.frozen, presult->nfrozen);
+ START_CRIT_SECTION();
+
+ Assert(presult->nfrozen > 0);
+
+ heap_freeze_prepared_tuples(buffer, prstate.frozen, presult->nfrozen);
+
+ MarkBufferDirty(buffer);
+
+ /* Now WAL-log freezing if necessary */
+ if (RelationNeedsWAL(relation))
+ log_heap_prune_and_freeze(relation, buffer,
+ frz_conflict_horizon, false, reason,
+ prstate.frozen, presult->nfrozen,
+ NULL, 0, /* redirected */
+ NULL, 0, /* dead */
+ NULL, 0); /* unused */
+
+ END_CRIT_SECTION();
}
else if (!pagefrz || !presult->all_frozen || presult->nfrozen > 0)
{
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index cc3b3346bc4..897f3bc50c9 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -14,6 +14,7 @@
#ifndef HEAPAM_H
#define HEAPAM_H
+#include "access/heapam_xlog.h"
#include "access/relation.h" /* for backward compatibility */
#include "access/relscan.h"
#include "access/sdir.h"
@@ -101,8 +102,8 @@ typedef enum
} HTSV_Result;
/*
- * heap_prepare_freeze_tuple may request that heap_freeze_execute_prepared
- * check any tuple's to-be-frozen xmin and/or xmax status using pg_xact
+ * heap_prepare_freeze_tuple may request that any tuple's to-be-frozen xmin
+ * and/or xmax status is checked using pg_xact during freezing execution.
*/
#define HEAP_FREEZE_CHECK_XMIN_COMMITTED 0x01
#define HEAP_FREEZE_CHECK_XMAX_ABORTED 0x02
@@ -154,14 +155,14 @@ typedef struct HeapPageFreeze
/*
* "Freeze" NewRelfrozenXid/NewRelminMxid trackers.
*
- * Trackers used when heap_freeze_execute_prepared freezes, or when there
- * are zero freeze plans for a page. It is always valid for vacuumlazy.c
- * to freeze any page, by definition. This even includes pages that have
- * no tuples with storage to consider in the first place. That way the
- * 'totally_frozen' results from heap_prepare_freeze_tuple can always be
- * used in the same way, even when no freeze plans need to be executed to
- * "freeze the page". Only the "freeze" path needs to consider the need
- * to set pages all-frozen in the visibility map under this scheme.
+ * Trackers used when tuples will be frozen, or when there are zero freeze
+ * plans for a page. It is always valid for vacuumlazy.c to freeze any
+ * page, by definition. This even includes pages that have no tuples with
+ * storage to consider in the first place. That way the 'totally_frozen'
+ * results from heap_prepare_freeze_tuple can always be used in the same
+ * way, even when no freeze plans need to be executed to "freeze the
+ * page". Only the "freeze" path needs to consider the need to set pages
+ * all-frozen in the visibility map under this scheme.
*
* When we freeze a page, we generally freeze all XIDs < OldestXmin, only
* leaving behind XIDs that are ineligible for freezing, if any. And so
@@ -345,12 +346,13 @@ extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
extern void heap_pre_freeze_checks(Buffer buffer,
HeapTupleFreeze *tuples, int ntuples);
-extern void heap_freeze_execute_prepared(Relation rel, Buffer buffer,
- TransactionId snapshotConflictHorizon,
- HeapTupleFreeze *tuples, int ntuples);
+
+extern void heap_freeze_prepared_tuples(Buffer buffer,
+ HeapTupleFreeze *tuples, int ntuples);
extern bool heap_freeze_tuple(HeapTupleHeader tuple,
TransactionId relfrozenxid, TransactionId relminmxid,
TransactionId FreezeLimit, TransactionId MultiXactCutoff);
+
extern bool heap_tuple_should_freeze(HeapTupleHeader tuple,
const struct VacuumCutoffs *cutoffs,
TransactionId *NoFreezePageRelfrozenXid,
--
2.40.1
Attachment: v9-0012-Merge-prune-and-freeze-records.patch (text/x-diff)
From 3172c3319e931883231123761b29d3d2b3036e51 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 26 Mar 2024 09:37:46 -0400
Subject: [PATCH v9 12/21] Merge prune and freeze records
When both pruning and freezing are done, a single, combined WAL record
is emitted for both operations, reducing the number of WAL records
emitted.
When the record contains only tuples to freeze, we can avoid taking a
full cleanup lock when replaying it.
---
src/backend/access/heap/heapam.c | 2 -
src/backend/access/heap/pruneheap.c | 215 +++++++++++++++-------------
2 files changed, 114 insertions(+), 103 deletions(-)
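The key point for recovery conflicts is that the single record must
carry the most conservative of the prune and freeze horizons. A
condensed sketch of the WAL-logging path, matching the hunk below:

    if (RelationNeedsWAL(relation))
    {
        TransactionId conflict_xid;

        /* Use the newer (more conservative) of the freeze and prune horizons */
        if (TransactionIdFollows(frz_conflict_horizon, prstate.latest_xid_removed))
            conflict_xid = frz_conflict_horizon;
        else
            conflict_xid = prstate.latest_xid_removed;

        log_heap_prune_and_freeze(relation, buffer,
                                  conflict_xid,
                                  true, reason,
                                  prstate.frozen, presult->nfrozen,
                                  prstate.redirected, prstate.nredirected,
                                  prstate.nowdead, prstate.ndead,
                                  prstate.nowunused, prstate.nunused);
    }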
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 41c1c7d286f..aefc0be0dd3 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6836,8 +6836,6 @@ heap_freeze_prepared_tuples(Buffer buffer, HeapTupleFreeze *tuples, int ntuples)
htup = (HeapTupleHeader) PageGetItem(page, itemid);
heap_execute_freeze_tuple(htup, frz);
}
-
- MarkBufferDirty(buffer);
}
/*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 5d8c881c2fc..6085fd1a8f9 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -249,9 +249,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
bool do_freeze;
bool all_visible_except_removable;
bool do_prune;
- bool whole_page_freezable;
+ bool do_hint;
bool hint_bit_fpi;
- bool prune_fpi = false;
int64 fpi_before = pgWalUsage.wal_fpi;
/*
@@ -464,10 +463,9 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/*
* If checksums are enabled, heap_prune_satisfies_vacuum() may have caused
- * an FPI to be emitted. Then reset fpi_before for no prune case.
+ * an FPI to be emitted.
*/
hint_bit_fpi = fpi_before != pgWalUsage.wal_fpi;
- fpi_before = pgWalUsage.wal_fpi;
/*
* For vacuum, if the whole page will become frozen, we consider
@@ -517,16 +515,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.ndead > 0 ||
prstate.nunused > 0;
+ /* Record number of newly-set-LP_DEAD items for caller */
+ presult->nnewlpdead = prstate.ndead;
+
/*
- * Only incur overhead of checking if we will do an FPI if we might use
- * the information.
+ * Even if we don't prune anything, if we found a new value for the
+ * pd_prune_xid field or the page was marked full, we will update the hint
+ * bit.
*/
- if (do_prune && pagefrz)
- prune_fpi = XLogCheckBufferNeedsBackup(buffer);
-
- /* Is the whole page freezable? And is there something to freeze */
- whole_page_freezable = all_visible_except_removable &&
- presult->all_frozen;
+ do_hint = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
+ PageIsFull(page);
/*
* Freeze the page when heap_prepare_freeze_tuple indicates that at least
@@ -539,46 +537,57 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* opportunistic freeze heuristic must be improved; however, for now, try
* to approximate it.
*/
- do_freeze = pagefrz &&
- (pagefrz->freeze_required ||
- (whole_page_freezable && presult->nfrozen > 0 && (prune_fpi || hint_bit_fpi)));
- if (do_freeze)
+ do_freeze = false;
+ if (pagefrz)
{
- heap_pre_freeze_checks(buffer, prstate.frozen, presult->nfrozen);
+ /* Is the whole page freezable? And is there something to freeze? */
+ bool whole_page_freezable = all_visible_except_removable &&
+ presult->all_frozen;
- /*
- * We can use the visibility_cutoff_xid as our cutoff for conflicts
- * when the whole page is eligible to become all-frozen in the VM once
- * we're done with it. Otherwise we generate a conservative cutoff by
- * stepping back from OldestXmin. This avoids false conflicts when
- * hot_standby_feedback is in use.
- */
- if (all_visible_except_removable && presult->all_frozen)
- frz_conflict_horizon = visibility_cutoff_xid;
- else
+ if (pagefrz->freeze_required)
+ do_freeze = true;
+ else if (whole_page_freezable && presult->nfrozen > 0)
{
- /* Avoids false conflicts when hot_standby_feedback in use */
- frz_conflict_horizon = pagefrz->cutoffs->OldestXmin;
- TransactionIdRetreat(frz_conflict_horizon);
+ /*
+ * Freezing would make the page all-frozen. In this case, we will
+ * freeze if we have already emitted an FPI or will do so anyway.
+ * Be sure only to incur the overhead of checking if we will do an
+ * FPI if we may use that information.
+ */
+ if (hint_bit_fpi ||
+ ((do_prune || do_hint) && XLogCheckBufferNeedsBackup(buffer)))
+ {
+ do_freeze = true;
+ }
}
}
- /* Any error while applying the changes is critical */
- START_CRIT_SECTION();
+ /*
+ * Validate the tuples we are considering freezing. We do this even if
+ * pruning and hint bit setting have not emitted an FPI so far because we
+ * still may emit an FPI while setting the page hint bit later. But we
+ * want to avoid doing the pre-freeze checks in a critical section.
+ */
+ if (do_freeze)
+ heap_pre_freeze_checks(buffer, prstate.frozen, presult->nfrozen);
- /* Have we found any prunable items? */
- if (do_prune)
+ if (!do_freeze && (!pagefrz || !presult->all_frozen || presult->nfrozen > 0))
{
/*
- * Apply the planned item changes, then repair page fragmentation, and
- * update the page's hint bit about whether it has free line pointers.
+ * If we will neither freeze tuples on the page nor set the page all
+ * frozen in the visibility map, the page is not all-frozen and there
+ * will be no newly frozen tuples.
*/
- heap_page_prune_execute(buffer, false,
- prstate.redirected, prstate.nredirected,
- prstate.nowdead, prstate.ndead,
- prstate.nowunused, prstate.nunused);
+ presult->all_frozen = false;
+ presult->nfrozen = 0; /* avoid miscounts in instrumentation */
+ }
+
+ /* Any error while applying the changes is critical */
+ START_CRIT_SECTION();
+ if (do_hint)
+ {
/*
* Update the page's pd_prune_xid field to either zero, or the lowest
* XID of any soon-prunable tuple.
@@ -586,12 +595,52 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
((PageHeader) page)->pd_prune_xid = prstate.new_prune_xid;
/*
- * Also clear the "page is full" flag, since there's no point in
- * repeating the prune/defrag process until something else happens to
- * the page.
+ * Clear the "page is full" flag if it is set since there's no point
+ * in repeating the prune/defrag process until something else happens
+ * to the page.
*/
PageClearFull(page);
+ /*
+ * We only needed to update pd_prune_xid and clear the page-is-full
+ * hint bit, this is a non-WAL-logged hint. If we will also freeze or
+ * prune the page, we will mark the buffer dirty below.
+ */
+ if (!do_freeze && !do_prune)
+ MarkBufferDirtyHint(buffer, true);
+ }
+
+ if (do_prune || do_freeze)
+ {
+ /* Apply the planned item changes, then repair page fragmentation. */
+ if (do_prune)
+ {
+ heap_page_prune_execute(buffer, false,
+ prstate.redirected, prstate.nredirected,
+ prstate.nowdead, prstate.ndead,
+ prstate.nowunused, prstate.nunused);
+ }
+
+ if (do_freeze)
+ {
+ /*
+ * We can use the visibility_cutoff_xid as our cutoff for
+ * conflicts when the whole page is eligible to become all-frozen
+ * in the VM once we're done with it. Otherwise we generate a
+ * conservative cutoff by stepping back from OldestXmin. This
+ * avoids false conflicts when hot_standby_feedback is in use.
+ */
+ if (all_visible_except_removable && presult->all_frozen)
+ frz_conflict_horizon = visibility_cutoff_xid;
+ else
+ {
+ /* Avoids false conflicts when hot_standby_feedback in use */
+ frz_conflict_horizon = pagefrz->cutoffs->OldestXmin;
+ TransactionIdRetreat(frz_conflict_horizon);
+ }
+ heap_freeze_prepared_tuples(buffer, prstate.frozen, presult->nfrozen);
+ }
+
MarkBufferDirty(buffer);
/*
@@ -599,72 +648,35 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*/
if (RelationNeedsWAL(relation))
{
+ /*
+ * The snapshotConflictHorizon for the whole record should be the
+ * most conservative of all the horizons calculated for any of the
+ * possible modifications. If this record will prune tuples, any
+ * transactions on the standby older than the youngest xmax of the
+ * most recently removed tuple this record will prune will
+ * conflict. If this record will freeze tuples, any transactions
+ * on the standby with xids older than the youngest tuple this
+ * record will freeze will conflict.
+ */
+ TransactionId conflict_xid;
+
+ if (TransactionIdFollows(frz_conflict_horizon, prstate.latest_xid_removed))
+ conflict_xid = frz_conflict_horizon;
+ else
+ conflict_xid = prstate.latest_xid_removed;
+
log_heap_prune_and_freeze(relation, buffer,
- prstate.latest_xid_removed,
+ conflict_xid,
true, reason,
- NULL, 0,
+ prstate.frozen, presult->nfrozen,
prstate.redirected, prstate.nredirected,
prstate.nowdead, prstate.ndead,
prstate.nowunused, prstate.nunused);
}
}
- else
- {
- /*
- * If we didn't prune anything, but have found a new value for the
- * pd_prune_xid field, update it and mark the buffer dirty. This is
- * treated as a non-WAL-logged hint.
- *
- * Also clear the "page is full" flag if it is set, since there's no
- * point in repeating the prune/defrag process until something else
- * happens to the page.
- */
- if (((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
- PageIsFull(page))
- {
- ((PageHeader) page)->pd_prune_xid = prstate.new_prune_xid;
- PageClearFull(page);
- MarkBufferDirtyHint(buffer, true);
- }
- }
END_CRIT_SECTION();
- /* Record number of newly-set-LP_DEAD items for caller */
- presult->nnewlpdead = prstate.ndead;
-
- if (do_freeze)
- {
- START_CRIT_SECTION();
-
- Assert(presult->nfrozen > 0);
-
- heap_freeze_prepared_tuples(buffer, prstate.frozen, presult->nfrozen);
-
- MarkBufferDirty(buffer);
-
- /* Now WAL-log freezing if necessary */
- if (RelationNeedsWAL(relation))
- log_heap_prune_and_freeze(relation, buffer,
- frz_conflict_horizon, false, reason,
- prstate.frozen, presult->nfrozen,
- NULL, 0, /* redirected */
- NULL, 0, /* dead */
- NULL, 0); /* unused */
-
- END_CRIT_SECTION();
- }
- else if (!pagefrz || !presult->all_frozen || presult->nfrozen > 0)
- {
- /*
- * If we will neither freeze tuples on the page nor set the page all
- * frozen in the visibility map, the page is not all frozen and there
- * will be no newly frozen tuples.
- */
- presult->all_frozen = false;
- presult->nfrozen = 0; /* avoid miscounts in instrumentation */
- }
-
/*
* For callers planning to update the visibility map, the conflict horizon
* for that record must be the newest xmin on the page. However, if the
@@ -681,9 +693,10 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* tuples on the page, if we will set the page all-frozen in the
* visibility map, we can advance relfrozenxid and relminmxid to the
* values in pagefrz->FreezePageRelfrozenXid and
- * pagefrz->FreezePageRelminMxid.
+ * pagefrz->FreezePageRelminMxid. MFIXME: which one should we pick if
+ * presult->nfrozen == 0 and presult->all_frozen = True.
*/
- if (presult->all_frozen || presult->nfrozen > 0)
+ if (presult->nfrozen > 0)
{
presult->new_relfrozenxid = pagefrz->FreezePageRelfrozenXid;
presult->new_relminmxid = pagefrz->FreezePageRelminMxid;
--
2.40.1
Attachment: v9-0013-Set-hastup-in-heap_page_prune.patch (text/x-diff)
From 83210d52ee641c58152cef552a822f969e11c079 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 26 Mar 2024 09:56:02 -0400
Subject: [PATCH v9 13/21] Set hastup in heap_page_prune
lazy_scan_prune() loops through the line pointers and tuple visibility
information for each tuple on a page, setting hastup to true if there
are any LP_REDIRECT line pointers or tuples with storage which will not
be removed. We want to remove this extra loop from lazy_scan_prune(),
and we know about non-removable tuples during heap_page_prune() anyway.
Set hastup when recording LP_REDIRECT line pointers in
heap_prune_chain() and when LP_NORMAL line pointers refer to tuples
whose visibility status is not HEAPTUPLE_DEAD.
---
src/backend/access/heap/pruneheap.c | 64 ++++++++++++++++++----------
src/backend/access/heap/vacuumlazy.c | 24 +----------
src/include/access/heapam.h | 3 ++
3 files changed, 46 insertions(+), 45 deletions(-)
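In short, hastup is now set at two places during pruning instead of in
a second loop over the page in lazy_scan_prune(). A condensed sketch of
the two sites, restating the hunks below:

    /* While scanning items: LP_NORMAL tuples that will survive pruning */
    if (prstate.htsv[offnum] != HEAPTUPLE_DEAD)
    {
        /* LP_DEAD items are deliberately not counted; see comment in patch */
        presult->hastup = true;
        /* ... freeze consideration follows ... */
    }

    /* And when a root line pointer is turned into an LP_REDIRECT */
    static void
    heap_prune_record_redirect(PruneState *prstate,
                               OffsetNumber offnum, OffsetNumber rdoffnum,
                               PruneFreezeResult *presult)
    {
        /* ... existing bookkeeping ... */
        presult->hastup = true;
    }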
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 6085fd1a8f9..4814ff576c1 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -71,7 +71,8 @@ static int heap_prune_chain(Buffer buffer,
static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
static void heap_prune_record_redirect(PruneState *prstate,
- OffsetNumber offnum, OffsetNumber rdoffnum);
+ OffsetNumber offnum, OffsetNumber rdoffnum,
+ PruneFreezeResult *presult);
static void heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
PruneFreezeResult *presult);
static void heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
@@ -279,6 +280,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
presult->nnewlpdead = 0;
presult->nfrozen = 0;
+ presult->hastup = false;
+
/*
* Caller will update the VM after pruning, collecting LP_DEAD items, and
* freezing tuples. Keep track of whether or not the page is all_visible
@@ -434,30 +437,42 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
break;
}
- /*
- * Consider freezing any normal tuples which will not be removed
- */
- if (presult->htsv[offnum] != HEAPTUPLE_DEAD && pagefrz)
+ if (presult->htsv[offnum] != HEAPTUPLE_DEAD)
{
- bool totally_frozen;
+ /*
+ * Deliberately don't set hastup for LP_DEAD items. We make the
+ * soft assumption that any LP_DEAD items encountered here will
+ * become LP_UNUSED later on, before count_nondeletable_pages is
+ * reached. If we don't make this assumption then rel truncation
+ * will only happen every other VACUUM, at most. Besides, VACUUM
+ * must treat hastup/nonempty_pages as provisional no matter how
+ * LP_DEAD items are handled (handled here, or handled later on).
+ */
+ presult->hastup = true;
- /* Tuple with storage -- consider need to freeze */
- if ((heap_prepare_freeze_tuple(htup, pagefrz,
- &prstate.frozen[presult->nfrozen],
- &totally_frozen)))
+ /* Consider freezing any normal tuples which will not be removed */
+ if (pagefrz)
{
- /* Save prepared freeze plan for later */
- prstate.frozen[presult->nfrozen++].offset = offnum;
- }
+ bool totally_frozen;
- /*
- * If any tuple isn't either totally frozen already or eligible to
- * become totally frozen (according to its freeze plan), then the
- * page definitely cannot be set all-frozen in the visibility map
- * later on
- */
- if (!totally_frozen)
- presult->all_frozen = false;
+ /* Tuple with storage -- consider need to freeze */
+ if ((heap_prepare_freeze_tuple(htup, pagefrz,
+ &prstate.frozen[presult->nfrozen],
+ &totally_frozen)))
+ {
+ /* Save prepared freeze plan for later */
+ prstate.frozen[presult->nfrozen++].offset = offnum;
+ }
+
+ /*
+ * If any tuple isn't either totally frozen already or
+ * eligible to become totally frozen (according to its freeze
+ * plan), then the page definitely cannot be set all-frozen in
+ * the visibility map later on
+ */
+ if (!totally_frozen)
+ presult->all_frozen = false;
+ }
}
}
@@ -1019,7 +1034,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
if (i >= nchain)
heap_prune_record_dead_or_unused(prstate, rootoffnum, presult);
else
- heap_prune_record_redirect(prstate, rootoffnum, chainitems[i]);
+ heap_prune_record_redirect(prstate, rootoffnum, chainitems[i], presult);
}
else if (nchain < 2 && ItemIdIsRedirected(rootlp))
{
@@ -1053,7 +1068,8 @@ heap_prune_record_prunable(PruneState *prstate, TransactionId xid)
/* Record line pointer to be redirected */
static void
heap_prune_record_redirect(PruneState *prstate,
- OffsetNumber offnum, OffsetNumber rdoffnum)
+ OffsetNumber offnum, OffsetNumber rdoffnum,
+ PruneFreezeResult *presult)
{
Assert(prstate->nredirected < MaxHeapTuplesPerPage);
prstate->redirected[prstate->nredirected * 2] = offnum;
@@ -1063,6 +1079,8 @@ heap_prune_record_redirect(PruneState *prstate,
prstate->marked[offnum] = true;
Assert(!prstate->marked[rdoffnum]);
prstate->marked[rdoffnum] = true;
+
+ presult->hastup = true;
}
/* Record line pointer to be marked dead */
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8beef4093ae..68258d083ab 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1420,7 +1420,6 @@ lazy_scan_prune(LVRelState *vacrel,
live_tuples,
recently_dead_tuples;
HeapPageFreeze pagefrz;
- bool hastup = false;
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
Assert(BufferGetBlockNumber(buf) == blkno);
@@ -1473,28 +1472,12 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->offnum = offnum;
itemid = PageGetItemId(page, offnum);
- if (!ItemIdIsUsed(itemid))
- continue;
-
/* Redirect items mustn't be touched */
- if (ItemIdIsRedirected(itemid))
- {
- /* page makes rel truncation unsafe */
- hastup = true;
+ if (ItemIdIsRedirected(itemid) || !ItemIdIsUsed(itemid))
continue;
- }
if (ItemIdIsDead(itemid))
{
- /*
- * Deliberately don't set hastup for LP_DEAD items. We make the
- * soft assumption that any LP_DEAD items encountered here will
- * become LP_UNUSED later on, before count_nondeletable_pages is
- * reached. If we don't make this assumption then rel truncation
- * will only happen every other VACUUM, at most. Besides, VACUUM
- * must treat hastup/nonempty_pages as provisional no matter how
- * LP_DEAD items are handled (handled here, or handled later on).
- */
deadoffsets[lpdead_items++] = offnum;
continue;
}
@@ -1562,9 +1545,6 @@ lazy_scan_prune(LVRelState *vacrel,
elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
break;
}
-
- hastup = true; /* page makes rel truncation unsafe */
-
}
vacrel->offnum = InvalidOffsetNumber;
@@ -1643,7 +1623,7 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->recently_dead_tuples += recently_dead_tuples;
/* Can't truncate this page */
- if (hastup)
+ if (presult.hastup)
vacrel->nonempty_pages = blkno + 1;
/* Did we find LP_DEAD items? */
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 897f3bc50c9..71c59793da7 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -216,6 +216,9 @@ typedef struct PruneFreezeResult
/* Whether or not the page can be set all-frozen in the VM */
bool all_frozen;
+ /* Whether or not the page makes rel truncation unsafe */
+ bool hastup;
+
/*
* If the page is all-visible and not all-frozen this is the oldest xid
* that can see the page as all-visible. It is to be used as the snapshot
--
2.40.1
Attachment: v9-0014-Count-tuples-for-vacuum-logging-in-heap_page_prun.patch (text/x-diff)
From d230bde906511e7f33288aae66795bc4a4d1f256 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 26 Mar 2024 10:04:38 -0400
Subject: [PATCH v9 14/21] Count tuples for vacuum logging in heap_page_prune
lazy_scan_prune() loops through all of the tuple visibility information
that was recorded in heap_page_prune() and then counts live and recently
dead tuples. That information is available in heap_page_prune(), so just
record it there. Add live and recently dead tuple counters to the
PruneFreezeResult. Doing this counting in heap_page_prune() eliminates
the need to save the tuple visibility status information in the
PruneFreezeResult. Instead, save it in the PruneState where it can be
referenced by heap_prune_chain().
---
src/backend/access/heap/pruneheap.c | 98 ++++++++++++++++++++++++----
src/backend/access/heap/vacuumlazy.c | 93 +-------------------------
src/include/access/heapam.h | 36 ++--------
3 files changed, 97 insertions(+), 130 deletions(-)
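The counting now happens in the per-tuple visibility switch in
heap_page_prune_and_freeze(). A condensed sketch of just the counting,
eliding the all-visible bookkeeping that the full hunk below also does:

    switch (prstate.htsv[offnum])
    {
        case HEAPTUPLE_LIVE:
        case HEAPTUPLE_DELETE_IN_PROGRESS:
            /* counted as live, to stay consistent with acquire_sample_rows() */
            presult->live_tuples++;
            break;
        case HEAPTUPLE_RECENTLY_DEAD:
            /* not removable yet; must be kept in the relation */
            presult->recently_dead_tuples++;
            break;
        case HEAPTUPLE_INSERT_IN_PROGRESS:
            /* not counted; the inserter will update the stats at commit */
            break;
        /* ... HEAPTUPLE_DEAD and default cases as in the hunk below ... */
    }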
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 4814ff576c1..fde5f26bb5a 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -55,6 +55,18 @@ typedef struct
*/
bool marked[MaxHeapTuplesPerPage + 1];
+ /*
+ * Tuple visibility is only computed once for each tuple, for correctness
+ * and efficiency reasons; see comment in heap_page_prune_and_freeze() for
+ * details. This is of type int8[], instead of HTSV_Result[], so we can
+ * use -1 to indicate no visibility has been computed, e.g. for LP_DEAD
+ * items.
+ *
+ * This needs to be MaxHeapTuplesPerPage + 1 long as FirstOffsetNumber is
+ * 1. Otherwise every access would need to subtract 1.
+ */
+ int8 htsv[MaxHeapTuplesPerPage + 1];
+
/*
* One entry for every tuple that we may freeze.
*/
@@ -69,6 +81,7 @@ static int heap_prune_chain(Buffer buffer,
OffsetNumber rootoffnum,
PruneState *prstate, PruneFreezeResult *presult);
+static inline HTSV_Result htsv_get_valid_status(int status);
static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
static void heap_prune_record_redirect(PruneState *prstate,
OffsetNumber offnum, OffsetNumber rdoffnum,
@@ -273,7 +286,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
memset(prstate.marked, 0, sizeof(prstate.marked));
/*
- * presult->htsv is not initialized here because all ntuple spots in the
+ * prstate.htsv is not initialized here because all ntuple spots in the
* array will be set either to a valid HTSV_Result value or -1.
*/
presult->ndeleted = 0;
@@ -282,6 +295,9 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
presult->hastup = false;
+ presult->live_tuples = 0;
+ presult->recently_dead_tuples = 0;
+
/*
* Caller will update the VM after pruning, collecting LP_DEAD items, and
* freezing tuples. Keep track of whether or not the page is all_visible
@@ -340,7 +356,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/* Nothing to do if slot doesn't contain a tuple */
if (!ItemIdIsNormal(itemid))
{
- presult->htsv[offnum] = -1;
+ prstate.htsv[offnum] = -1;
continue;
}
@@ -356,13 +372,32 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (off_loc)
*off_loc = offnum;
- presult->htsv[offnum] = heap_prune_satisfies_vacuum(&prstate, &tup,
- buffer);
+ prstate.htsv[offnum] = heap_prune_satisfies_vacuum(&prstate, &tup,
+ buffer);
if (reason == PRUNE_ON_ACCESS)
continue;
- switch (presult->htsv[offnum])
+ /*
+ * The criteria for counting a tuple as live in this block need to
+ * match what analyze.c's acquire_sample_rows() does, otherwise VACUUM
+ * and ANALYZE may produce wildly different reltuples values, e.g.
+ * when there are many recently-dead tuples.
+ *
+ * The logic here is a bit simpler than acquire_sample_rows(), as
+ * VACUUM can't run inside a transaction block, which makes some cases
+ * impossible (e.g. in-progress insert from the same transaction).
+ *
+ * We treat LP_DEAD items (which are the closest thing to DEAD tuples
+ * that might be seen here) differently, too: we assume that they'll
+ * become LP_UNUSED before VACUUM finishes. This difference is only
+ * superficial. VACUUM effectively agrees with ANALYZE about DEAD
+ * items, in the end. VACUUM won't remember LP_DEAD items, but only
+ * because they're not supposed to be left behind when it is done.
+ * (Cases where we bypass index vacuuming will violate this optimistic
+ * assumption, but the overall impact of that should be negligible.)
+ */
+ switch (prstate.htsv[offnum])
{
case HEAPTUPLE_DEAD:
@@ -382,6 +417,12 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
break;
case HEAPTUPLE_LIVE:
+ /*
+ * Count it as live. Not only is this natural, but it's also
+ * what acquire_sample_rows() does.
+ */
+ presult->live_tuples++;
+
/*
* Is the tuple definitely visible to all transactions?
*
@@ -423,13 +464,34 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
}
break;
case HEAPTUPLE_RECENTLY_DEAD:
+
+ /*
+ * If tuple is recently dead then we must not remove it from
+ * the relation. (We only remove items that are LP_DEAD from
+ * pruning.)
+ */
+ presult->recently_dead_tuples++;
all_visible_except_removable = false;
break;
case HEAPTUPLE_INSERT_IN_PROGRESS:
+
+ /*
+ * We do not count these rows as live, because we expect the
+ * inserting transaction to update the counters at commit, and
+ * we assume that will happen only after we report our
+ * results. This assumption is a bit shaky, but it is what
+ * acquire_sample_rows() does, so be consistent.
+ */
all_visible_except_removable = false;
break;
case HEAPTUPLE_DELETE_IN_PROGRESS:
- /* This is an expected case during concurrent vacuum */
+
+ /*
+ * This an expected case during concurrent vacuum. Count such
+ * rows as live. As above, we assume the deleting transaction
+ * will commit and update the counters after we report.
+ */
+ presult->live_tuples++;
all_visible_except_removable = false;
break;
default:
@@ -437,7 +499,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
break;
}
- if (presult->htsv[offnum] != HEAPTUPLE_DEAD)
+ if (prstate.htsv[offnum] != HEAPTUPLE_DEAD)
{
/*
* Deliberately don't set hastup for LP_DEAD items. We make the
@@ -746,10 +808,24 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
}
+/*
+ * Pruning calculates tuple visibility once and saves the results in an array
+ * of int8. See PruneState.htsv for details. This helper function is meant to
+ * guard against examining visibility status array members which have not yet
+ * been computed.
+ */
+static inline HTSV_Result
+htsv_get_valid_status(int status)
+{
+ Assert(status >= HEAPTUPLE_DEAD &&
+ status <= HEAPTUPLE_DELETE_IN_PROGRESS);
+ return (HTSV_Result) status;
+}
+
/*
* Prune specified line pointer or a HOT chain originating at line pointer.
*
- * Tuple visibility information is provided in presult->htsv.
+ * Tuple visibility information is provided in prstate->htsv.
*
* If the item is an index-referenced tuple (i.e. not a heap-only tuple),
* the HOT chain is pruned by removing all DEAD tuples at the start of the HOT
@@ -796,7 +872,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
*/
if (ItemIdIsNormal(rootlp))
{
- Assert(presult->htsv[rootoffnum] != -1);
+ Assert(prstate->htsv[rootoffnum] != -1);
htup = (HeapTupleHeader) PageGetItem(dp, rootlp);
if (HeapTupleHeaderIsHeapOnly(htup))
@@ -819,7 +895,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
* either here or while following a chain below. Whichever path
* gets there first will mark the tuple unused.
*/
- if (presult->htsv[rootoffnum] == HEAPTUPLE_DEAD &&
+ if (prstate->htsv[rootoffnum] == HEAPTUPLE_DEAD &&
!HeapTupleHeaderIsHotUpdated(htup))
{
heap_prune_record_unused(prstate, rootoffnum);
@@ -920,7 +996,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
*/
tupdead = recent_dead = false;
- switch (htsv_get_valid_status(presult->htsv[offnum]))
+ switch (htsv_get_valid_status(prstate->htsv[offnum]))
{
case HEAPTUPLE_DEAD:
tupdead = true;
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 68258d083ab..c28e786a1e0 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1378,22 +1378,6 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
*
* Caller must hold pin and buffer cleanup lock on the buffer.
*
- * Prior to PostgreSQL 14 there were very rare cases where
- * heap_page_prune_and_freeze() was allowed to disagree with our
- * HeapTupleSatisfiesVacuum() call about whether or not a tuple should be
- * considered DEAD. This happened when an inserting transaction concurrently
- * aborted (after our heap_page_prune_and_freeze() call, before our
- * HeapTupleSatisfiesVacuum() call). There was rather a lot of complexity just
- * so we could deal with tuples that were DEAD to VACUUM, but nevertheless were
- * left with storage after pruning.
- *
- * As of Postgres 17, we circumvent this problem altogether by reusing the
- * result of heap_page_prune_and_freeze()'s visibility check. Without the
- * second call to HeapTupleSatisfiesVacuum(), there is no new HTSV_Result and
- * there can be no disagreement. We'll just handle such tuples as if they had
- * become fully dead right after this operation completes instead of in the
- * middle of it.
- *
* vmbuffer is the buffer containing the VM block with visibility information
* for the heap block, blkno. all_visible_according_to_vm is the saved
* visibility status of the heap block looked up earlier by the caller. We
@@ -1415,10 +1399,8 @@ lazy_scan_prune(LVRelState *vacrel,
OffsetNumber offnum,
maxoff;
ItemId itemid;
+ int lpdead_items = 0;
PruneFreezeResult presult;
- int lpdead_items,
- live_tuples,
- recently_dead_tuples;
HeapPageFreeze pagefrz;
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
@@ -1438,9 +1420,6 @@ lazy_scan_prune(LVRelState *vacrel,
pagefrz.NoFreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
pagefrz.NoFreezePageRelminMxid = vacrel->NewRelminMxid;
pagefrz.cutoffs = &vacrel->cutoffs;
- lpdead_items = 0;
- live_tuples = 0;
- recently_dead_tuples = 0;
/*
* Prune all HOT-update chains and potentially freeze tuples on this page.
@@ -1472,9 +1451,6 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->offnum = offnum;
itemid = PageGetItemId(page, offnum);
- /* Redirect items mustn't be touched */
- if (ItemIdIsRedirected(itemid) || !ItemIdIsUsed(itemid))
- continue;
if (ItemIdIsDead(itemid))
{
@@ -1482,69 +1458,6 @@ lazy_scan_prune(LVRelState *vacrel,
continue;
}
- Assert(ItemIdIsNormal(itemid));
-
- /*
- * The criteria for counting a tuple as live in this block need to
- * match what analyze.c's acquire_sample_rows() does, otherwise VACUUM
- * and ANALYZE may produce wildly different reltuples values, e.g.
- * when there are many recently-dead tuples.
- *
- * The logic here is a bit simpler than acquire_sample_rows(), as
- * VACUUM can't run inside a transaction block, which makes some cases
- * impossible (e.g. in-progress insert from the same transaction).
- *
- * We treat LP_DEAD items (which are the closest thing to DEAD tuples
- * that might be seen here) differently, too: we assume that they'll
- * become LP_UNUSED before VACUUM finishes. This difference is only
- * superficial. VACUUM effectively agrees with ANALYZE about DEAD
- * items, in the end. VACUUM won't remember LP_DEAD items, but only
- * because they're not supposed to be left behind when it is done.
- * (Cases where we bypass index vacuuming will violate this optimistic
- * assumption, but the overall impact of that should be negligible.)
- */
- switch (htsv_get_valid_status(presult.htsv[offnum]))
- {
- case HEAPTUPLE_LIVE:
-
- /*
- * Count it as live. Not only is this natural, but it's also
- * what acquire_sample_rows() does.
- */
- live_tuples++;
- break;
- case HEAPTUPLE_RECENTLY_DEAD:
-
- /*
- * If tuple is recently dead then we must not remove it from
- * the relation. (We only remove items that are LP_DEAD from
- * pruning.)
- */
- recently_dead_tuples++;
- break;
- case HEAPTUPLE_INSERT_IN_PROGRESS:
-
- /*
- * We do not count these rows as live, because we expect the
- * inserting transaction to update the counters at commit, and
- * we assume that will happen only after we report our
- * results. This assumption is a bit shaky, but it is what
- * acquire_sample_rows() does, so be consistent.
- */
- break;
- case HEAPTUPLE_DELETE_IN_PROGRESS:
-
- /*
- * This an expected case during concurrent vacuum. Count such
- * rows as live. As above, we assume the deleting transaction
- * will commit and update the counters after we report.
- */
- live_tuples++;
- break;
- default:
- elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
- break;
- }
}
vacrel->offnum = InvalidOffsetNumber;
@@ -1619,8 +1532,8 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->tuples_deleted += presult.ndeleted;
vacrel->tuples_frozen += presult.nfrozen;
vacrel->lpdead_items += lpdead_items;
- vacrel->live_tuples += live_tuples;
- vacrel->recently_dead_tuples += recently_dead_tuples;
+ vacrel->live_tuples += presult.live_tuples;
+ vacrel->recently_dead_tuples += presult.recently_dead_tuples;
/* Can't truncate this page */
if (presult.hastup)
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 71c59793da7..79ec4049f12 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -203,8 +203,14 @@ typedef struct PruneFreezeResult
/*
* The rest of the fields in PruneFreezeResult are only guaranteed to be
- * initialized if heap_page_prune is passed PruneReason VACUUM_SCAN.
+ * initialized if heap_page_prune_and_freeze() is passed a PruneReason
+ * other than PRUNE_ON_ACCESS.
*/
+ int live_tuples;
+ int recently_dead_tuples;
+
+ /* Number of tuples we froze */
+ int nfrozen;
/*
* Whether or not the page is truly all-visible after pruning. If there
@@ -226,21 +232,6 @@ typedef struct PruneFreezeResult
*/
TransactionId vm_conflict_horizon;
- /*
- * Tuple visibility is only computed once for each tuple, for correctness
- * and efficiency reasons; see comment in heap_page_prune_and_freeze() for
- * details. This is of type int8[], instead of HTSV_Result[], so we can
- * use -1 to indicate no visibility has been computed, e.g. for LP_DEAD
- * items.
- *
- * This needs to be MaxHeapTuplesPerPage + 1 long as FirstOffsetNumber is
- * 1. Otherwise every access would need to subtract 1.
- */
- int8 htsv[MaxHeapTuplesPerPage + 1];
-
- /* Number of tuples we may freeze */
- int nfrozen;
-
/*
* One entry for every tuple that we may freeze.
*/
@@ -260,19 +251,6 @@ typedef enum
PRUNE_VACUUM_CLEANUP, /* VACUUM 2nd heap pass */
} PruneReason;
-/*
- * Pruning calculates tuple visibility once and saves the results in an array
- * of int8. See PruneFreezeResult.htsv for details. This helper function is meant to
- * guard against examining visibility status array members which have not yet
- * been computed.
- */
-static inline HTSV_Result
-htsv_get_valid_status(int status)
-{
- Assert(status >= HEAPTUPLE_DEAD &&
- status <= HEAPTUPLE_DELETE_IN_PROGRESS);
- return (HTSV_Result) status;
-}
/* ----------------
* function prototypes for heap access method
--
2.40.1
Attachment: v9-0015-Save-dead-tuple-offsets-during-heap_page_prune.patch (text/x-diff)
From 158545b5de5cccd27a3a023875891b9c4459bd89 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 26 Mar 2024 10:16:11 -0400
Subject: [PATCH v9 15/21] Save dead tuple offsets during heap_page_prune
After heap_page_prune() returned, lazy_scan_prune() looped through the
page again to collect the offsets of LP_DEAD items, which it later added
to LVRelState->dead_items. Instead, record these offsets in
heap_prune_chain(), when marking a line pointer LP_DEAD or when
encountering an existing non-removable LP_DEAD item.
---
src/backend/access/heap/pruneheap.c | 7 ++++
src/backend/access/heap/vacuumlazy.c | 60 +++++++---------------------
src/include/access/heapam.h | 2 +
3 files changed, 23 insertions(+), 46 deletions(-)
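A condensed sketch of the two ends of this change: pruning records the
LP_DEAD offsets as it goes, and lazy_scan_prune() consumes them directly
instead of rescanning the page (this just restates the hunks below):

    /* In pruneheap.c, when an item is marked or kept LP_DEAD */
    presult->all_visible = false;
    presult->deadoffsets[presult->lpdead_items++] = offnum;

    /* In vacuumlazy.c, the second scan of the page is gone */
    for (int i = 0; i < presult.lpdead_items; i++)
    {
        ItemPointerSetOffsetNumber(&tmp, presult.deadoffsets[i]);
        dead_items->items[dead_items->num_items++] = tmp;
    }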
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index fde5f26bb5a..3529ea69520 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -297,6 +297,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
presult->live_tuples = 0;
presult->recently_dead_tuples = 0;
+ presult->lpdead_items = 0;
/*
* Caller will update the VM after pruning, collecting LP_DEAD items, and
@@ -971,7 +972,10 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
if (unlikely(prstate->mark_unused_now))
heap_prune_record_unused(prstate, offnum);
else
+ {
presult->all_visible = false;
+ presult->deadoffsets[presult->lpdead_items++] = offnum;
+ }
break;
}
@@ -1175,6 +1179,9 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
* all_visible.
*/
presult->all_visible = false;
+
+ /* Record the dead offset for vacuum */
+ presult->deadoffsets[presult->lpdead_items++] = offnum;
}
/*
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index c28e786a1e0..0fb5a7dd24d 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1396,23 +1396,11 @@ lazy_scan_prune(LVRelState *vacrel,
bool *has_lpdead_items)
{
Relation rel = vacrel->rel;
- OffsetNumber offnum,
- maxoff;
- ItemId itemid;
- int lpdead_items = 0;
PruneFreezeResult presult;
HeapPageFreeze pagefrz;
- OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
Assert(BufferGetBlockNumber(buf) == blkno);
- /*
- * maxoff might be reduced following line pointer array truncation in
- * heap_page_prune_and_freeze(). That's safe for us to ignore, since the
- * reclaimed space will continue to look like LP_UNUSED items below.
- */
- maxoff = PageGetMaxOffsetNumber(page);
-
/* Initialize pagefrz */
pagefrz.freeze_required = false;
pagefrz.FreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
@@ -1425,41 +1413,21 @@ lazy_scan_prune(LVRelState *vacrel,
* Prune all HOT-update chains and potentially freeze tuples on this page.
*
* We count the number of tuples removed from the page by the pruning step
- * in presult.ndeleted. It should not be confused with lpdead_items;
- * lpdead_items's final value can be thought of as the number of tuples
- * that were deleted from indexes.
+ * in presult.ndeleted. It should not be confused with
+ * presult.lpdead_items; presult.lpdead_items's final value can be thought
+ * of as the number of tuples that were deleted from indexes.
*
* If the relation has no indexes, we can immediately mark would-be dead
* items LP_UNUSED, so mark_unused_now should be true if no indexes and
* false otherwise.
+ *
+ * We will update the VM after collecting LP_DEAD items and freezing
+ * tuples. Pruning will have determined whether or not the page is
+ * all-visible.
*/
heap_page_prune_and_freeze(rel, buf, vacrel->vistest, vacrel->nindexes == 0,
&pagefrz, &presult, PRUNE_VACUUM_SCAN, &vacrel->offnum);
- /*
- * Now scan the page to collect LP_DEAD items and update the variables set
- * just above.
- */
- for (offnum = FirstOffsetNumber;
- offnum <= maxoff;
- offnum = OffsetNumberNext(offnum))
- {
- /*
- * Set the offset number so that we can display it along with any
- * error that occurred while processing this tuple.
- */
- vacrel->offnum = offnum;
- itemid = PageGetItemId(page, offnum);
-
-
- if (ItemIdIsDead(itemid))
- {
- deadoffsets[lpdead_items++] = offnum;
- continue;
- }
-
- }
-
vacrel->offnum = InvalidOffsetNumber;
Assert(MultiXactIdIsValid(presult.new_relminmxid));
@@ -1492,7 +1460,7 @@ lazy_scan_prune(LVRelState *vacrel,
TransactionId debug_cutoff;
bool debug_all_frozen;
- Assert(lpdead_items == 0);
+ Assert(presult.lpdead_items == 0);
if (!heap_page_is_all_visible(vacrel, buf,
&debug_cutoff, &debug_all_frozen))
@@ -1508,7 +1476,7 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* Now save details of the LP_DEAD items from the page in vacrel
*/
- if (lpdead_items > 0)
+ if (presult.lpdead_items > 0)
{
VacDeadItems *dead_items = vacrel->dead_items;
ItemPointerData tmp;
@@ -1517,9 +1485,9 @@ lazy_scan_prune(LVRelState *vacrel,
ItemPointerSetBlockNumber(&tmp, blkno);
- for (int i = 0; i < lpdead_items; i++)
+ for (int i = 0; i < presult.lpdead_items; i++)
{
- ItemPointerSetOffsetNumber(&tmp, deadoffsets[i]);
+ ItemPointerSetOffsetNumber(&tmp, presult.deadoffsets[i]);
dead_items->items[dead_items->num_items++] = tmp;
}
@@ -1531,7 +1499,7 @@ lazy_scan_prune(LVRelState *vacrel,
/* Finally, add page-local counts to whole-VACUUM counts */
vacrel->tuples_deleted += presult.ndeleted;
vacrel->tuples_frozen += presult.nfrozen;
- vacrel->lpdead_items += lpdead_items;
+ vacrel->lpdead_items += presult.lpdead_items;
vacrel->live_tuples += presult.live_tuples;
vacrel->recently_dead_tuples += presult.recently_dead_tuples;
@@ -1540,7 +1508,7 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->nonempty_pages = blkno + 1;
/* Did we find LP_DEAD items? */
- *has_lpdead_items = (lpdead_items > 0);
+ *has_lpdead_items = (presult.lpdead_items > 0);
Assert(!presult.all_visible || !(*has_lpdead_items));
@@ -1608,7 +1576,7 @@ lazy_scan_prune(LVRelState *vacrel,
* There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
* however.
*/
- else if (lpdead_items > 0 && PageIsAllVisible(page))
+ else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
{
elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
vacrel->relname, blkno);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 79ec4049f12..68b4d5b859c 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -241,6 +241,8 @@ typedef struct PruneFreezeResult
/* New value of relminmxid found by heap_page_prune_and_freeze() */
MultiXactId new_relminmxid;
+ int lpdead_items; /* includes existing LP_DEAD items */
+ OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
} PruneFreezeResult;
/* 'reason' codes for heap_page_prune_and_freeze() */
--
2.40.1
Attachment: v9-0016-move-live-tuple-accounting-to-heap_prune_chain.patch (text/x-diff)
From 3fab81d47b89e8a3ecd7120fc4df5ab829cadf9b Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 26 Mar 2024 13:54:19 -0400
Subject: [PATCH v9 16/21] move live tuple accounting to heap_prune_chain()
ci-os-only:
---
src/backend/access/heap/pruneheap.c | 636 ++++++++++++++++-----------
src/backend/access/heap/vacuumlazy.c | 38 +-
src/include/access/heapam.h | 59 ++-
3 files changed, 424 insertions(+), 309 deletions(-)
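This patch also reworks the heap_page_prune_and_freeze() interface. For
reference, the resulting signature as it appears in the hunks below:

    void
    heap_page_prune_and_freeze(Relation relation, Buffer buffer,
                               uint8 actions,
                               GlobalVisState *vistest,
                               struct VacuumCutoffs *cutoffs,
                               PruneFreezeResult *presult,
                               PruneReason reason,
                               OffsetNumber *off_loc,
                               TransactionId *new_relfrozen_xid,
                               MultiXactId *new_relmin_mxid);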
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 3529ea69520..6f039002684 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -34,8 +34,9 @@ typedef struct
{
/* tuple visibility test, initialized for the relation */
GlobalVisState *vistest;
- /* whether or not dead items can be set LP_UNUSED during pruning */
- bool mark_unused_now;
+ uint8 actions;
+ TransactionId visibility_cutoff_xid;
+ bool all_visible_except_removable;
TransactionId new_prune_xid; /* new prune hint value for page */
TransactionId latest_xid_removed;
@@ -67,10 +68,14 @@ typedef struct
*/
int8 htsv[MaxHeapTuplesPerPage + 1];
+ HeapPageFreeze pagefrz;
+
/*
- * One entry for every tuple that we may freeze.
+ * Whether or not this tuple has been counted toward vacuum stats. In
+ * heap_prune_chain(), we have to be sure that Heap Only Tuples that are
+ * not part of any chain are counted correctly.
*/
- HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
+ bool counted[MaxHeapTuplesPerPage + 1];
} PruneState;
/* Local functions */
@@ -83,7 +88,7 @@ static int heap_prune_chain(Buffer buffer,
static inline HTSV_Result htsv_get_valid_status(int status);
static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
-static void heap_prune_record_redirect(PruneState *prstate,
+static void heap_prune_record_redirect(Page page, PruneState *prstate,
OffsetNumber offnum, OffsetNumber rdoffnum,
PruneFreezeResult *presult);
static void heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
@@ -91,6 +96,9 @@ static void heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
static void heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
PruneFreezeResult *presult);
static void heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum);
+
+static void heap_prune_record_live_or_recently_dead(Page page, PruneState *prstate,
+ OffsetNumber offnum, PruneFreezeResult *presult);
static void page_verify_redirects(Page page);
@@ -172,12 +180,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
PruneFreezeResult presult;
/*
- * For now, pass mark_unused_now as false regardless of whether or
- * not the relation has indexes, since we cannot safely determine
- * that during on-access pruning with the current implementation.
+ * For now, do not set PRUNE_DO_MARK_UNUSED_NOW regardless of
+ * whether or not the relation has indexes, since we cannot safely
+ * determine that during on-access pruning with the current
+ * implementation.
*/
- heap_page_prune_and_freeze(relation, buffer, vistest, false, NULL,
- &presult, PRUNE_ON_ACCESS, NULL);
+ heap_page_prune_and_freeze(relation, buffer, 0, vistest,
+ NULL, &presult, PRUNE_ON_ACCESS, NULL, NULL, NULL);
/*
* Report the number of tuples reclaimed to pgstats. This is
@@ -209,7 +218,6 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
}
}
-
/*
* Prune and repair fragmentation and potentially freeze tuples on the
* specified page.
@@ -223,16 +231,12 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* also need to account for a reduction in the length of the line pointer
* array following array truncation by us.
*
+ * actions are the pruning actions that heap_page_prune_and_freeze() should
+ * take.
+ *
* vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
* (see heap_prune_satisfies_vacuum).
*
- * mark_unused_now indicates whether or not dead items can be set LP_UNUSED
- * during pruning.
- *
- * pagefrz is an input parameter containing visibility cutoff information and
- * the current relfrozenxid and relminmxids used if the caller is interested in
- * freezing tuples on the page.
- *
* presult contains output parameters needed by callers such as the number of
* tuples removed and the number of line pointers newly marked LP_DEAD.
* heap_page_prune_and_freeze() is responsible for initializing it.
@@ -242,15 +246,21 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
*
* off_loc is the offset location required by the caller to use in error
* callback.
+ *
+ * new_relfrozen_xid and new_relmin_xid are provided by the caller if they
+ * would like the current values of those updated as part of advancing
+ * relfrozenxid/relminmxid.
*/
void
heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ uint8 actions,
GlobalVisState *vistest,
- bool mark_unused_now,
- HeapPageFreeze *pagefrz,
+ struct VacuumCutoffs *cutoffs,
PruneFreezeResult *presult,
PruneReason reason,
- OffsetNumber *off_loc)
+ OffsetNumber *off_loc,
+ TransactionId *new_relfrozen_xid,
+ MultiXactId *new_relmin_mxid)
{
Page page = BufferGetPage(buffer);
BlockNumber blockno = BufferGetBlockNumber(buffer);
@@ -258,15 +268,43 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
maxoff;
PruneState prstate;
HeapTupleData tup;
- TransactionId visibility_cutoff_xid;
TransactionId frz_conflict_horizon;
bool do_freeze;
- bool all_visible_except_removable;
bool do_prune;
bool do_hint;
bool hint_bit_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
+ /*
+ * pagefrz contains visibility cutoff information and the current
+ * relfrozenxid and relminmxids used if the caller is interested in
+ * freezing tuples on the page.
+ */
+ prstate.pagefrz.cutoffs = cutoffs;
+ prstate.pagefrz.freeze_required = false;
+
+ if (new_relmin_mxid)
+ {
+ prstate.pagefrz.FreezePageRelminMxid = *new_relmin_mxid;
+ prstate.pagefrz.NoFreezePageRelminMxid = *new_relmin_mxid;
+ }
+ else
+ {
+ prstate.pagefrz.FreezePageRelminMxid = InvalidMultiXactId;
+ prstate.pagefrz.NoFreezePageRelminMxid = InvalidMultiXactId;
+ }
+
+ if (new_relfrozen_xid)
+ {
+ prstate.pagefrz.FreezePageRelfrozenXid = *new_relfrozen_xid;
+ prstate.pagefrz.NoFreezePageRelfrozenXid = *new_relfrozen_xid;
+ }
+ else
+ {
+ prstate.pagefrz.FreezePageRelfrozenXid = InvalidTransactionId;
+ prstate.pagefrz.NoFreezePageRelfrozenXid = InvalidTransactionId;
+ }
+
/*
* Our strategy is to scan the page and make lists of items to change,
* then apply the changes within a critical section. This keeps as much
@@ -280,10 +318,11 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*/
prstate.new_prune_xid = InvalidTransactionId;
prstate.vistest = vistest;
- prstate.mark_unused_now = mark_unused_now;
+ prstate.actions = actions;
prstate.latest_xid_removed = InvalidTransactionId;
prstate.nredirected = prstate.ndead = prstate.nunused = 0;
memset(prstate.marked, 0, sizeof(prstate.marked));
+ memset(prstate.counted, 0, sizeof(prstate.counted));
/*
* prstate.htsv is not initialized here because all ntuple spots in the
@@ -291,7 +330,6 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*/
presult->ndeleted = 0;
presult->nnewlpdead = 0;
- presult->nfrozen = 0;
presult->hastup = false;
@@ -300,13 +338,45 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
presult->lpdead_items = 0;
/*
- * Caller will update the VM after pruning, collecting LP_DEAD items, and
+ * Caller may update the VM after pruning, collecting LP_DEAD items, and
* freezing tuples. Keep track of whether or not the page is all_visible
* and all_frozen and use this information to update the VM. all_visible
* implies lpdead_items == 0, but don't trust all_frozen result unless
- * all_visible is also set to true.
+ * all_visible is also set to true. If we won't even try freezing,
+ * initialize all_frozen to false.
+ *
+ * For vacuum, if the whole page will become frozen, we consider
+ * opportunistically freezing tuples. Dead tuples which will be removed by
+ * the end of vacuuming should not preclude us from opportunistically
+ * freezing. We will not be able to freeze the whole page if there are
+ * tuples present which are not visible to everyone or if there are dead
+ * tuples which are not yet removable. We need all_visible to be false if
+ * LP_DEAD tuples remain after pruning so that we do not incorrectly
+ * update the visibility map or page hint bit. So, we will update
+ * presult->all_visible to reflect the presence of LP_DEAD items while
+ * pruning and keep all_visible_except_removable to permit freezing if the
+ * whole page will eventually become all visible after removing tuples.
*/
- presult->all_frozen = true;
+ presult->all_visible = true;
+
+ if (prstate.actions & PRUNE_DO_TRY_FREEZE)
+ presult->set_all_frozen = true;
+ else
+ presult->set_all_frozen = false;
+ presult->nfrozen = 0;
+
+ /*
+ * Deliberately delay unsetting all_visible until later during pruning.
+ * Removable dead tuples shouldn't preclude freezing the page. After
+ * finishing this first pass of tuple visibility checks, initialize
+ * all_visible_except_removable with the current value of all_visible to
+ * indicate whether or not the page is all visible except for dead tuples.
+ * This will allow us to attempt to freeze the page after pruning. Later
+ * during pruning, if we encounter an LP_DEAD item or are setting an item
+ * LP_DEAD, we will unset all_visible. As long as we unset it before
+ * updating the visibility map, this will be correct.
+ */
+ prstate.all_visible_except_removable = true;
/*
* The visibility cutoff xid is the newest xmin of live tuples on the
@@ -316,13 +386,9 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* running transaction on the standby does not see tuples on the page as
* all-visible, so the conflict horizon remains InvalidTransactionId.
*/
- presult->vm_conflict_horizon = visibility_cutoff_xid = InvalidTransactionId;
+ presult->vm_conflict_horizon = prstate.visibility_cutoff_xid = InvalidTransactionId;
frz_conflict_horizon = InvalidTransactionId;
- /* For advancing relfrozenxid and relminmxid */
- presult->new_relfrozenxid = InvalidTransactionId;
- presult->new_relminmxid = InvalidMultiXactId;
-
maxoff = PageGetMaxOffsetNumber(page);
tup.t_tableOid = RelationGetRelid(relation);
@@ -346,7 +412,6 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* prefetching efficiency significantly / decreases the number of cache
* misses.
*/
- all_visible_except_removable = true;
for (offnum = maxoff;
offnum >= FirstOffsetNumber;
offnum = OffsetNumberPrev(offnum))
@@ -375,168 +440,6 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.htsv[offnum] = heap_prune_satisfies_vacuum(&prstate, &tup,
buffer);
-
- if (reason == PRUNE_ON_ACCESS)
- continue;
-
- /*
- * The criteria for counting a tuple as live in this block need to
- * match what analyze.c's acquire_sample_rows() does, otherwise VACUUM
- * and ANALYZE may produce wildly different reltuples values, e.g.
- * when there are many recently-dead tuples.
- *
- * The logic here is a bit simpler than acquire_sample_rows(), as
- * VACUUM can't run inside a transaction block, which makes some cases
- * impossible (e.g. in-progress insert from the same transaction).
- *
- * We treat LP_DEAD items (which are the closest thing to DEAD tuples
- * that might be seen here) differently, too: we assume that they'll
- * become LP_UNUSED before VACUUM finishes. This difference is only
- * superficial. VACUUM effectively agrees with ANALYZE about DEAD
- * items, in the end. VACUUM won't remember LP_DEAD items, but only
- * because they're not supposed to be left behind when it is done.
- * (Cases where we bypass index vacuuming will violate this optimistic
- * assumption, but the overall impact of that should be negligible.)
- */
- switch (prstate.htsv[offnum])
- {
- case HEAPTUPLE_DEAD:
-
- /*
- * Deliberately delay unsetting all_visible until later during
- * pruning. Removable dead tuples shouldn't preclude freezing
- * the page. After finishing this first pass of tuple
- * visibility checks, initialize all_visible_except_removable
- * with the current value of all_visible to indicate whether
- * or not the page is all visible except for dead tuples. This
- * will allow us to attempt to freeze the page after pruning.
- * Later during pruning, if we encounter an LP_DEAD item or
- * are setting an item LP_DEAD, we will unset all_visible. As
- * long as we unset it before updating the visibility map,
- * this will be correct.
- */
- break;
- case HEAPTUPLE_LIVE:
-
- /*
- * Count it as live. Not only is this natural, but it's also
- * what acquire_sample_rows() does.
- */
- presult->live_tuples++;
-
- /*
- * Is the tuple definitely visible to all transactions?
- *
- * NB: Like with per-tuple hint bits, we can't set the
- * PD_ALL_VISIBLE flag if the inserter committed
- * asynchronously. See SetHintBits for more info. Check that
- * the tuple is hinted xmin-committed because of that.
- */
- if (all_visible_except_removable)
- {
- TransactionId xmin;
-
- if (!HeapTupleHeaderXminCommitted(htup))
- {
- all_visible_except_removable = false;
- break;
- }
-
- /*
- * The inserter definitely committed. But is it old enough
- * that everyone sees it as committed? A
- * FrozenTransactionId is seen as committed to everyone.
- * Otherwise, we check if there is a snapshot that
- * considers this xid to still be running, and if so, we
- * don't consider the page all-visible.
- */
- xmin = HeapTupleHeaderGetXmin(htup);
- if (xmin != FrozenTransactionId &&
- !GlobalVisTestIsRemovableXid(vistest, xmin))
- {
- all_visible_except_removable = false;
- break;
- }
-
- /* Track newest xmin on page. */
- if (TransactionIdFollows(xmin, visibility_cutoff_xid) &&
- TransactionIdIsNormal(xmin))
- visibility_cutoff_xid = xmin;
- }
- break;
- case HEAPTUPLE_RECENTLY_DEAD:
-
- /*
- * If tuple is recently dead then we must not remove it from
- * the relation. (We only remove items that are LP_DEAD from
- * pruning.)
- */
- presult->recently_dead_tuples++;
- all_visible_except_removable = false;
- break;
- case HEAPTUPLE_INSERT_IN_PROGRESS:
-
- /*
- * We do not count these rows as live, because we expect the
- * inserting transaction to update the counters at commit, and
- * we assume that will happen only after we report our
- * results. This assumption is a bit shaky, but it is what
- * acquire_sample_rows() does, so be consistent.
- */
- all_visible_except_removable = false;
- break;
- case HEAPTUPLE_DELETE_IN_PROGRESS:
-
- /*
- * This an expected case during concurrent vacuum. Count such
- * rows as live. As above, we assume the deleting transaction
- * will commit and update the counters after we report.
- */
- presult->live_tuples++;
- all_visible_except_removable = false;
- break;
- default:
- elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
- break;
- }
-
- if (prstate.htsv[offnum] != HEAPTUPLE_DEAD)
- {
- /*
- * Deliberately don't set hastup for LP_DEAD items. We make the
- * soft assumption that any LP_DEAD items encountered here will
- * become LP_UNUSED later on, before count_nondeletable_pages is
- * reached. If we don't make this assumption then rel truncation
- * will only happen every other VACUUM, at most. Besides, VACUUM
- * must treat hastup/nonempty_pages as provisional no matter how
- * LP_DEAD items are handled (handled here, or handled later on).
- */
- presult->hastup = true;
-
- /* Consider freezing any normal tuples which will not be removed */
- if (pagefrz)
- {
- bool totally_frozen;
-
- /* Tuple with storage -- consider need to freeze */
- if ((heap_prepare_freeze_tuple(htup, pagefrz,
- &prstate.frozen[presult->nfrozen],
- &totally_frozen)))
- {
- /* Save prepared freeze plan for later */
- prstate.frozen[presult->nfrozen++].offset = offnum;
- }
-
- /*
- * If any tuple isn't either totally frozen already or
- * eligible to become totally frozen (according to its freeze
- * plan), then the page definitely cannot be set all-frozen in
- * the visibility map later on
- */
- if (!totally_frozen)
- presult->all_frozen = false;
- }
- }
}
/*
@@ -545,21 +448,6 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*/
hint_bit_fpi = fpi_before != pgWalUsage.wal_fpi;
- /*
- * For vacuum, if the whole page will become frozen, we consider
- * opportunistically freezing tuples. Dead tuples which will be removed by
- * the end of vacuuming should not preclude us from opportunistically
- * freezing. We will not be able to freeze the whole page if there are
- * tuples present which are not visible to everyone or if there are dead
- * tuples which are not yet removable. We need all_visible to be false if
- * LP_DEAD tuples remain after pruning so that we do not incorrectly
- * update the visibility map or page hint bit. So, we will update
- * presult->all_visible to reflect the presence of LP_DEAD items while
- * pruning and keep all_visible_except_removable to permit freezing if the
- * whole page will eventually become all visible after removing tuples.
- */
- presult->all_visible = all_visible_except_removable;
-
/* Scan the page */
for (offnum = FirstOffsetNumber;
offnum <= maxoff;
@@ -615,15 +503,14 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* opportunistic freeze heuristic must be improved; however, for now, try
* to approximate it.
*/
-
do_freeze = false;
- if (pagefrz)
+ if (prstate.actions & PRUNE_DO_TRY_FREEZE)
{
/* Is the whole page freezable? And is there something to freeze? */
- bool whole_page_freezable = all_visible_except_removable &&
- presult->all_frozen;
+ bool whole_page_freezable = prstate.all_visible_except_removable &&
+ presult->set_all_frozen;
- if (pagefrz->freeze_required)
+ if (prstate.pagefrz.freeze_required)
do_freeze = true;
else if (whole_page_freezable && presult->nfrozen > 0)
{
@@ -648,17 +535,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* want to avoid doing the pre-freeze checks in a critical section.
*/
if (do_freeze)
- heap_pre_freeze_checks(buffer, prstate.frozen, presult->nfrozen);
-
- if (!do_freeze && (!pagefrz || !presult->all_frozen || presult->nfrozen > 0))
+ heap_pre_freeze_checks(buffer, prstate.pagefrz.frozen, presult->nfrozen);
+ else if (!presult->set_all_frozen || presult->nfrozen > 0)
{
/*
* If we will neither freeze tuples on the page nor set the page all
* frozen in the visibility map, the page is not all-frozen and there
* will be no newly frozen tuples.
*/
- presult->all_frozen = false;
- presult->nfrozen = 0; /* avoid miscounts in instrumentation */
+ presult->set_all_frozen = false;
+ presult->nfrozen = 0; /* avoid miscounts in instrumenation */
}
/* Any error while applying the changes is critical */
@@ -708,15 +594,15 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* conservative cutoff by stepping back from OldestXmin. This
* avoids false conflicts when hot_standby_feedback is in use.
*/
- if (all_visible_except_removable && presult->all_frozen)
- frz_conflict_horizon = visibility_cutoff_xid;
+ if (prstate.all_visible_except_removable && presult->set_all_frozen)
+ frz_conflict_horizon = prstate.visibility_cutoff_xid;
else
{
/* Avoids false conflicts when hot_standby_feedback in use */
- frz_conflict_horizon = pagefrz->cutoffs->OldestXmin;
+ frz_conflict_horizon = prstate.pagefrz.cutoffs->OldestXmin;
TransactionIdRetreat(frz_conflict_horizon);
}
- heap_freeze_prepared_tuples(buffer, prstate.frozen, presult->nfrozen);
+ heap_freeze_prepared_tuples(buffer, prstate.pagefrz.frozen, presult->nfrozen);
}
MarkBufferDirty(buffer);
@@ -746,7 +632,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
log_heap_prune_and_freeze(relation, buffer,
conflict_xid,
true, reason,
- prstate.frozen, presult->nfrozen,
+ prstate.pagefrz.frozen, presult->nfrozen,
prstate.redirected, prstate.nredirected,
prstate.nowdead, prstate.ndead,
prstate.nowunused, prstate.nunused);
@@ -761,29 +647,31 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* page is completely frozen, there can be no conflict and the
* vm_conflict_horizon should remain InvalidTransactionId.
*/
- if (!presult->all_frozen)
- presult->vm_conflict_horizon = visibility_cutoff_xid;
+ if (!presult->set_all_frozen)
+ presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
+
+ /*
+ * If we will freeze tuples on the page or, even if we don't freeze tuples
+ * on the page, if we will set the page all-frozen in the visibility map,
+ * we can advance relfrozenxid and relminmxid to the values in
+ * pagefrz->FreezePageRelfrozenXid and pagefrz->FreezePageRelminMxid.
+ * MFIXME: which one should be pick if presult->nfrozen == 0 and
+ * presult->all_frozen = True.
+ */
+ if (new_relfrozen_xid)
+ {
+ if (presult->nfrozen > 0)
+ *new_relfrozen_xid = prstate.pagefrz.FreezePageRelfrozenXid;
+ else
+ *new_relfrozen_xid = prstate.pagefrz.NoFreezePageRelfrozenXid;
+ }
- if (pagefrz)
+ if (new_relmin_mxid)
{
- /*
- * If we will freeze tuples on the page or, even if we don't freeze
- * tuples on the page, if we will set the page all-frozen in the
- * visibility map, we can advance relfrozenxid and relminmxid to the
- * values in pagefrz->FreezePageRelfrozenXid and
- * pagefrz->FreezePageRelminMxid. MFIXME: which one should be pick if
- * presult->nfrozen == 0 and presult->all_frozen = True.
- */
if (presult->nfrozen > 0)
- {
- presult->new_relfrozenxid = pagefrz->FreezePageRelfrozenXid;
- presult->new_relminmxid = pagefrz->FreezePageRelminMxid;
- }
+ *new_relmin_mxid = prstate.pagefrz.FreezePageRelminMxid;
else
- {
- presult->new_relfrozenxid = pagefrz->NoFreezePageRelfrozenXid;
- presult->new_relminmxid = pagefrz->NoFreezePageRelminMxid;
- }
+ *new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
}
}
@@ -896,13 +784,32 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
* either here or while following a chain below. Whichever path
* gets there first will mark the tuple unused.
*/
- if (prstate->htsv[rootoffnum] == HEAPTUPLE_DEAD &&
- !HeapTupleHeaderIsHotUpdated(htup))
+ if (!HeapTupleHeaderIsHotUpdated(htup))
{
- heap_prune_record_unused(prstate, rootoffnum);
- HeapTupleHeaderAdvanceConflictHorizon(htup,
- &prstate->latest_xid_removed);
- ndeleted++;
+ if (prstate->htsv[rootoffnum] == HEAPTUPLE_DEAD)
+ {
+ heap_prune_record_unused(prstate, rootoffnum);
+ HeapTupleHeaderAdvanceConflictHorizon(htup,
+ &prstate->latest_xid_removed);
+ ndeleted++;
+ }
+ else
+ {
+ Assert(!prstate->marked[rootoffnum]);
+
+ /*
+ * MFIXME: not sure if this is right -- maybe counting too
+ * many
+ */
+
+ /*
+ * Ensure that this tuple is counted. If it is later
+ * redirected to, it would have been counted then, but we
+ * won't double count because we check if it has already
+ * been counted first.
+ */
+ heap_prune_record_live_or_recently_dead(dp, prstate, rootoffnum, presult);
+ }
}
/* Nothing more to do */
@@ -963,13 +870,13 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
if (ItemIdIsDead(lp))
{
/*
- * If the caller set mark_unused_now true, we can set dead line
- * pointers LP_UNUSED now. We don't increment ndeleted here since
- * the LP was already marked dead. If it will not be marked
+ * If the caller set PRUNE_DO_MARK_UNUSED_NOW, we can set dead
+ * line pointers LP_UNUSED now. We don't increment ndeleted here
+ * since the LP was already marked dead. If it will not be marked
* LP_UNUSED, it will remain LP_DEAD, making the page not
* all_visible.
*/
- if (unlikely(prstate->mark_unused_now))
+ if (unlikely(prstate->actions & PRUNE_DO_MARK_UNUSED_NOW))
heap_prune_record_unused(prstate, offnum);
else
{
@@ -1114,7 +1021,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
if (i >= nchain)
heap_prune_record_dead_or_unused(prstate, rootoffnum, presult);
else
- heap_prune_record_redirect(prstate, rootoffnum, chainitems[i], presult);
+ heap_prune_record_redirect(dp, prstate, rootoffnum, chainitems[i], presult);
}
else if (nchain < 2 && ItemIdIsRedirected(rootlp))
{
@@ -1128,6 +1035,14 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
heap_prune_record_dead_or_unused(prstate, rootoffnum, presult);
}
+ /*
+ * If not marked for pruning, consider if the tuple should be counted as
+ * live or recently dead. Note that line pointers redirected to will
+ * already have been counted.
+ */
+ if (ItemIdIsNormal(rootlp) && !prstate->marked[rootoffnum])
+ heap_prune_record_live_or_recently_dead(dp, prstate, rootoffnum, presult);
+
return ndeleted;
}
@@ -1147,13 +1062,15 @@ heap_prune_record_prunable(PruneState *prstate, TransactionId xid)
/* Record line pointer to be redirected */
static void
-heap_prune_record_redirect(PruneState *prstate,
+heap_prune_record_redirect(Page page, PruneState *prstate,
OffsetNumber offnum, OffsetNumber rdoffnum,
PruneFreezeResult *presult)
{
Assert(prstate->nredirected < MaxHeapTuplesPerPage);
prstate->redirected[prstate->nredirected * 2] = offnum;
prstate->redirected[prstate->nredirected * 2 + 1] = rdoffnum;
+ heap_prune_record_live_or_recently_dead(page, prstate, rdoffnum, presult);
+
prstate->nredirected++;
Assert(!prstate->marked[offnum]);
prstate->marked[offnum] = true;
@@ -1185,22 +1102,22 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
}
/*
- * Depending on whether or not the caller set mark_unused_now to true, record that a
- * line pointer should be marked LP_DEAD or LP_UNUSED. There are other cases in
- * which we will mark line pointers LP_UNUSED, but we will not mark line
- * pointers LP_DEAD if mark_unused_now is true.
+ * Depending on whether or not the caller set PRUNE_DO_MARK_UNUSED_NOW, record
+ * that a line pointer should be marked LP_DEAD or LP_UNUSED. There are other
+ * cases in which we will mark line pointers LP_UNUSED, but we will not mark
+ * line pointers LP_DEAD if PRUNE_DO_MARK_UNUSED_NOW is set.
*/
static void
heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
PruneFreezeResult *presult)
{
/*
- * If the caller set mark_unused_now to true, we can remove dead tuples
+ * If the caller set PRUNE_DO_MARK_UNUSED_NOW, we can remove dead tuples
* during pruning instead of marking their line pointers dead. Set this
* tuple's line pointer LP_UNUSED. We hint that this option is less
* likely.
*/
- if (unlikely(prstate->mark_unused_now))
+ if (unlikely(prstate->actions & PRUNE_DO_MARK_UNUSED_NOW))
heap_prune_record_unused(prstate, offnum);
else
heap_prune_record_dead(prstate, offnum, presult);
@@ -1217,6 +1134,187 @@ heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum)
prstate->marked[offnum] = true;
}
+static void
+heap_prune_record_live_or_recently_dead(Page page, PruneState *prstate, OffsetNumber offnum,
+ PruneFreezeResult *presult)
+{
+ HTSV_Result status;
+ HeapTupleHeader htup;
+ bool totally_frozen;
+
+ /* This could happen for items which are redirected to. */
+ if (prstate->counted[offnum])
+ return;
+
+ prstate->counted[offnum] = true;
+
+ /*
+ * If we don't want to do any of the special defined actions, we don't
+ * need to continue.
+ */
+ if (prstate->actions == 0)
+ return;
+
+ status = htsv_get_valid_status(prstate->htsv[offnum]);
+
+ Assert(status != HEAPTUPLE_DEAD);
+
+ /*
+ * Deliberately don't set hastup for LP_DEAD items. We make the soft
+ * assumption that any LP_DEAD items encountered here will become
+ * LP_UNUSED later on, before count_nondeletable_pages is reached. If we
+ * don't make this assumption then rel truncation will only happen every
+ * other VACUUM, at most. Besides, VACUUM must treat
+ * hastup/nonempty_pages as provisional no matter how LP_DEAD items are
+ * handled (handled here, or handled later on).
+ */
+ presult->hastup = true;
+
+ /*
+ * The criteria for counting a tuple as live in this block need to match
+ * what analyze.c's acquire_sample_rows() does, otherwise VACUUM and
+ * ANALYZE may produce wildly different reltuples values, e.g. when there
+ * are many recently-dead tuples.
+ *
+ * The logic here is a bit simpler than acquire_sample_rows(), as VACUUM
+ * can't run inside a transaction block, which makes some cases impossible
+ * (e.g. in-progress insert from the same transaction).
+ *
+ * We treat LP_DEAD items (which are the closest thing to DEAD tuples that
+ * might be seen here) differently, too: we assume that they'll become
+ * LP_UNUSED before VACUUM finishes. This difference is only superficial.
+ * VACUUM effectively agrees with ANALYZE about DEAD items, in the end.
+ * VACUUM won't remember LP_DEAD items, but only because they're not
+ * supposed to be left behind when it is done. (Cases where we bypass
+ * index vacuuming will violate this optimistic assumption, but the
+ * overall impact of that should be negligible.)
+ *
+ * HEAPTUPLE_LIVE tuples are naturally counted as live. This is also what
+ * acquire_sample_rows() does.
+ *
+ * HEAPTUPLE_DELETE_IN_PROGRESS tuples are expected during concurrent
+ * vacuum. We expect the deleting transaction to update the counters at
+ * commit after we report our results, so count these tuples as live to
+ * ensure the math works out. The assumption that the transaction will
+ * commit and update the counters after we report is a bit shaky; but it
+ * is what acquire_sample_rows() does, so we do the same to be consistent.
+ */
+ htup = (HeapTupleHeader) PageGetItem(page, PageGetItemId(page, offnum));
+
+ switch (status)
+ {
+ case HEAPTUPLE_LIVE:
+
+ /*
+ * Count it as live. Not only is this natural, but it's also what
+ * acquire_sample_rows() does.
+ */
+ presult->live_tuples++;
+
+ /*
+ * Is the tuple definitely visible to all transactions?
+ *
+ * NB: Like with per-tuple hint bits, we can't set the
+ * PD_ALL_VISIBLE flag if the inserter committed asynchronously.
+ * See SetHintBits for more info. Check that the tuple is hinted
+ * xmin-committed because of that.
+ */
+ if (prstate->all_visible_except_removable)
+ {
+ TransactionId xmin;
+
+ if (!HeapTupleHeaderXminCommitted(htup))
+ {
+ prstate->all_visible_except_removable = false;
+ presult->all_visible = false;
+ break;
+ }
+
+ /*
+ * The inserter definitely committed. But is it old enough
+ * that everyone sees it as committed? A FrozenTransactionId
+ * is seen as committed to everyone. Otherwise, we check if
+ * there is a snapshot that considers this xid to still be
+ * running, and if so, we don't consider the page all-visible.
+ */
+ xmin = HeapTupleHeaderGetXmin(htup);
+
+ /* For now always use pagefrz->cutoffs */
+ Assert(prstate->pagefrz.cutoffs);
+ if (!TransactionIdPrecedes(xmin, prstate->pagefrz.cutoffs->OldestXmin))
+ {
+ prstate->all_visible_except_removable = false;
+ presult->all_visible = false;
+ break;
+ }
+
+ /* Track newest xmin on page. */
+ if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
+ TransactionIdIsNormal(xmin))
+ prstate->visibility_cutoff_xid = xmin;
+ }
+ break;
+ case HEAPTUPLE_RECENTLY_DEAD:
+
+ /*
+ * If tuple is recently dead then we must not remove it from the
+ * relation. (We only remove items that are LP_DEAD from
+ * pruning.)
+ */
+ presult->recently_dead_tuples++;
+ prstate->all_visible_except_removable = false;
+ presult->all_visible = false;
+ break;
+ case HEAPTUPLE_INSERT_IN_PROGRESS:
+
+ /*
+ * We do not count these rows as live, because we expect the
+ * inserting transaction to update the counters at commit, and we
+ * assume that will happen only after we report our results. This
+ * assumption is a bit shaky, but it is what acquire_sample_rows()
+ * does, so be consistent.
+ */
+ prstate->all_visible_except_removable = false;
+ presult->all_visible = false;
+ break;
+ case HEAPTUPLE_DELETE_IN_PROGRESS:
+
+ /*
+ * This an expected case during concurrent vacuum. Count such rows
+ * as live. As above, we assume the deleting transaction will
+ * commit and update the counters after we report.
+ */
+ presult->live_tuples++;
+ prstate->all_visible_except_removable = false;
+ presult->all_visible = false;
+ break;
+ default:
+ elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
+ break;
+ }
+
+ /* Consider freezing any normal tuples which will not be removed */
+ if (prstate->actions & PRUNE_DO_TRY_FREEZE)
+ {
+ /* Tuple with storage -- consider need to freeze */
+ if ((heap_prepare_freeze_tuple(htup, &prstate->pagefrz,
+ &prstate->pagefrz.frozen[presult->nfrozen],
+ &totally_frozen)))
+ {
+ /* Save prepared freeze plan for later */
+ prstate->pagefrz.frozen[presult->nfrozen++].offset = offnum;
+ }
+
+ /*
+ * If any tuple isn't either totally frozen already or eligible to
+ * become totally frozen (according to its freeze plan), then the page
+ * definitely cannot be set all-frozen in the visibility map later on
+ */
+ if (!totally_frozen)
+ presult->set_all_frozen = false;
+ }
+
+}
/*
* Perform the actual page changes needed by heap_page_prune.
@@ -1350,12 +1448,12 @@ heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
else
{
/*
- * When heap_page_prune() was called, mark_unused_now may have
- * been passed as true, which allows would-be LP_DEAD items to be
- * made LP_UNUSED instead. This is only possible if the relation
- * has no indexes. If there are any dead items, then
- * mark_unused_now was not true and every item being marked
- * LP_UNUSED must refer to a heap-only tuple.
+ * When heap_page_prune() was called, PRUNE_DO_MARK_UNUSED_NOW may
+ * have been set, which allows would-be LP_DEAD items to be made
+ * LP_UNUSED instead. This is only possible if the relation has
+ * no indexes. If there are any dead items, then
+ * PRUNE_DO_MARK_UNUSED_NOW was not set and every item being
+ * marked LP_UNUSED must refer to a heap-only tuple.
*/
if (ndead > 0)
{
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 0fb5a7dd24d..04e86347a0b 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1397,18 +1397,10 @@ lazy_scan_prune(LVRelState *vacrel,
{
Relation rel = vacrel->rel;
PruneFreezeResult presult;
- HeapPageFreeze pagefrz;
+ uint8 actions = 0;
Assert(BufferGetBlockNumber(buf) == blkno);
- /* Initialize pagefrz */
- pagefrz.freeze_required = false;
- pagefrz.FreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
- pagefrz.FreezePageRelminMxid = vacrel->NewRelminMxid;
- pagefrz.NoFreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
- pagefrz.NoFreezePageRelminMxid = vacrel->NewRelminMxid;
- pagefrz.cutoffs = &vacrel->cutoffs;
-
/*
* Prune all HOT-update chains and potentially freeze tuples on this page.
*
@@ -1418,22 +1410,26 @@ lazy_scan_prune(LVRelState *vacrel,
* of as the number of tuples that were deleted from indexes.
*
* If the relation has no indexes, we can immediately mark would-be dead
- * items LP_UNUSED, so mark_unused_now should be true if no indexes and
- * false otherwise.
+ * items LP_UNUSED, so PRUNE_DO_MARK_UNUSED_NOW should be set if no
+ * indexes and unset otherwise.
*
* We will update the VM after collecting LP_DEAD items and freezing
* tuples. Pruning will have determined whether or not the page is
* all-visible.
*/
- heap_page_prune_and_freeze(rel, buf, vacrel->vistest, vacrel->nindexes == 0,
- &pagefrz, &presult, PRUNE_VACUUM_SCAN, &vacrel->offnum);
+ actions |= PRUNE_DO_TRY_FREEZE;
- vacrel->offnum = InvalidOffsetNumber;
+ if (vacrel->nindexes == 0)
+ actions |= PRUNE_DO_MARK_UNUSED_NOW;
- Assert(MultiXactIdIsValid(presult.new_relminmxid));
- vacrel->NewRelfrozenXid = presult.new_relfrozenxid;
- Assert(TransactionIdIsValid(presult.new_relfrozenxid));
- vacrel->NewRelminMxid = presult.new_relminmxid;
+ heap_page_prune_and_freeze(rel, buf, actions, vacrel->vistest,
+ &vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN, &vacrel->offnum,
+ &vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
+
+ Assert(MultiXactIdIsValid(vacrel->NewRelminMxid));
+ Assert(TransactionIdIsValid(vacrel->NewRelfrozenXid));
+
+ vacrel->offnum = InvalidOffsetNumber;
if (presult.nfrozen > 0)
{
@@ -1466,7 +1462,7 @@ lazy_scan_prune(LVRelState *vacrel,
&debug_cutoff, &debug_all_frozen))
Assert(false);
- Assert(presult.all_frozen == debug_all_frozen);
+ Assert(presult.set_all_frozen == debug_all_frozen);
Assert(!TransactionIdIsValid(debug_cutoff) ||
debug_cutoff == presult.vm_conflict_horizon);
@@ -1521,7 +1517,7 @@ lazy_scan_prune(LVRelState *vacrel,
{
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
- if (presult.all_frozen)
+ if (presult.set_all_frozen)
{
Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
flags |= VISIBILITYMAP_ALL_FROZEN;
@@ -1592,7 +1588,7 @@ lazy_scan_prune(LVRelState *vacrel,
* true, so we must check both all_visible and all_frozen.
*/
else if (all_visible_according_to_vm && presult.all_visible &&
- presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
+ presult.set_all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
{
/*
* Avoid relying on all_visible_according_to_vm as a proxy for the
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 68b4d5b859c..a0420bea2eb 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -191,8 +191,35 @@ typedef struct HeapPageFreeze
MultiXactId NoFreezePageRelminMxid;
struct VacuumCutoffs *cutoffs;
+
+ /*
+ * One entry for every tuple that we may freeze.
+ */
+ HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
} HeapPageFreeze;
+/*
+ * Actions that can be taken during pruning and freezing. By default, we will
+ * at least attempt regular pruning.
+ */
+
+/*
+ * mark_unused_now indicates whether or not dead items can be set LP_UNUSED
+ * during pruning.
+ */
+#define PRUNE_DO_MARK_UNUSED_NOW (1 << 1)
+
+/*
+ * Freeze if advantageous or required and try to advance relfrozenxid and
+ * relminmxid. To attempt freezing, we will need to determine if the page is
+ * all frozen. So, if this action is set, we will also inform the caller if the
+ * page is all-visible and/or all-frozen and calculate a snapshot conflict
+ * horizon for updating the visibility map. While doing this, we also count if
+ * tuples are live or recently dead.
+ */
+#define PRUNE_DO_TRY_FREEZE (1 << 2)
+
+
/*
* Per-page state returned from pruning
*/
@@ -203,14 +230,17 @@ typedef struct PruneFreezeResult
/*
* The rest of the fields in PruneFreezeResult are only guaranteed to be
- * initialized if heap_page_prune_and_freeze() is passed a PruneReason
- * other than PRUNE_ON_ACCESS.
+ * initialized if heap_page_prune_and_freeze() is passed
+ * PRUNE_DO_TRY_FREEZE.
*/
- int live_tuples;
- int recently_dead_tuples;
-
/* Number of tuples we froze */
int nfrozen;
+ /* Whether or not the page should be set all-frozen in the VM */
+ bool set_all_frozen;
+
+ /* Number of live and recently dead tuples */
+ int live_tuples;
+ int recently_dead_tuples;
/*
* Whether or not the page is truly all-visible after pruning. If there
@@ -219,8 +249,6 @@ typedef struct PruneFreezeResult
*/
bool all_visible;
- /* Whether or not the page can be set all-frozen in the VM */
- bool all_frozen;
/* Whether or not the page makes rel truncation unsafe */
bool hastup;
@@ -232,15 +260,6 @@ typedef struct PruneFreezeResult
*/
TransactionId vm_conflict_horizon;
- /*
- * One entry for every tuple that we may freeze.
- */
- HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
- /* New value of relfrozenxid found by heap_page_prune_and_freeze() */
- TransactionId new_relfrozenxid;
-
- /* New value of relminmxid found by heap_page_prune_and_freeze() */
- MultiXactId new_relminmxid;
int lpdead_items; /* includes existing LP_DEAD items */
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
} PruneFreezeResult;
@@ -354,12 +373,14 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
struct GlobalVisState;
extern void heap_page_prune_opt(Relation relation, Buffer buffer);
extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ uint8 actions,
struct GlobalVisState *vistest,
- bool mark_unused_now,
- HeapPageFreeze *pagefrz,
+ struct VacuumCutoffs *cutoffs,
PruneFreezeResult *presult,
PruneReason reason,
- OffsetNumber *off_loc);
+ OffsetNumber *off_loc,
+ TransactionId *new_relfrozen_xid,
+ MultiXactId *new_relmin_mxid);
extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
OffsetNumber *redirected, int nredirected,
OffsetNumber *nowdead, int ndead,
--
2.40.1
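To make the new calling convention concrete: under the revised
heap_page_prune_and_freeze() signature, the two kinds of callers end up
looking roughly like the sketch below. This is only an illustration, not
part of the patches; the vacuum side follows the lazy_scan_prune() hunk
above, and the on-access side merely approximates what
heap_page_prune_opt() would pass, since that change is in an earlier patch
in the series.

    /* Vacuum caller: prune, opportunistically freeze, and -- if the table
     * has no indexes -- mark would-be LP_DEAD items LP_UNUSED right away. */
    uint8       actions = PRUNE_DO_TRY_FREEZE;

    if (vacrel->nindexes == 0)
        actions |= PRUNE_DO_MARK_UNUSED_NOW;

    heap_page_prune_and_freeze(rel, buf, actions, vacrel->vistest,
                               &vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
                               &vacrel->offnum,
                               &vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);

    /*
     * On-access caller (approximate sketch): no actions requested, so no
     * cutoffs and no relfrozenxid/relminmxid pointers are needed.
     */
    heap_page_prune_and_freeze(relation, buffer, 0, vistest,
                               NULL, &presult, PRUNE_ON_ACCESS,
                               NULL, NULL, NULL);

The key point is that the relfrozenxid/relminmxid values become plain
in/out parameters owned by the caller, rather than being handed back
through PruneFreezeResult.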
v9-0017-Move-frozen-array-to-PruneState.patch (text/x-diff; charset=us-ascii)
From 5c024ee72abe7033c1d691e2e2a83d4b9ff2085a Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed, 27 Mar 2024 23:37:35 +0200
Subject: [PATCH v9 17/21] Move 'frozen' array to PruneState.
It can be internal to heap_page_prune_and_freeze(), like the other
arrays. The freeze subroutines don't need it.
---
src/backend/access/heap/pruneheap.c | 22 ++++++++++++----------
src/include/access/heapam.h | 8 +-------
2 files changed, 13 insertions(+), 17 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 6f039002684..fd8dc0bc85b 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -43,10 +43,12 @@ typedef struct
int nredirected; /* numbers of entries in arrays below */
int ndead;
int nunused;
+ int nfrozen;
/* arrays that accumulate indexes of items to be changed */
OffsetNumber redirected[MaxHeapTuplesPerPage * 2];
OffsetNumber nowdead[MaxHeapTuplesPerPage];
OffsetNumber nowunused[MaxHeapTuplesPerPage];
+ HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
/*
* marked[i] is true if item i is entered in one of the above arrays.
@@ -320,7 +322,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.vistest = vistest;
prstate.actions = actions;
prstate.latest_xid_removed = InvalidTransactionId;
- prstate.nredirected = prstate.ndead = prstate.nunused = 0;
+ prstate.nredirected = prstate.ndead = prstate.nunused = prstate.nfrozen = 0;
memset(prstate.marked, 0, sizeof(prstate.marked));
memset(prstate.counted, 0, sizeof(prstate.counted));
@@ -363,7 +365,6 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
presult->set_all_frozen = true;
else
presult->set_all_frozen = false;
- presult->nfrozen = 0;
/*
* Deliberately delay unsetting all_visible until later during pruning.
@@ -512,7 +513,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (prstate.pagefrz.freeze_required)
do_freeze = true;
- else if (whole_page_freezable && presult->nfrozen > 0)
+ else if (whole_page_freezable && prstate.nfrozen > 0)
{
/*
* Freezing would make the page all-frozen. In this case, we will
@@ -535,8 +536,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* want to avoid doing the pre-freeze checks in a critical section.
*/
if (do_freeze)
- heap_pre_freeze_checks(buffer, prstate.pagefrz.frozen, presult->nfrozen);
- else if (!presult->set_all_frozen || presult->nfrozen > 0)
+ heap_pre_freeze_checks(buffer, prstate.frozen, prstate.nfrozen);
+ else if (!presult->set_all_frozen || prstate.nfrozen > 0)
{
/*
* If we will neither freeze tuples on the page nor set the page all
@@ -544,7 +545,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* will be no newly frozen tuples.
*/
presult->set_all_frozen = false;
- presult->nfrozen = 0; /* avoid miscounts in instrumenation */
+ prstate.nfrozen = 0; /* avoid miscounts in instrumenation */
}
/* Any error while applying the changes is critical */
@@ -602,7 +603,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
frz_conflict_horizon = prstate.pagefrz.cutoffs->OldestXmin;
TransactionIdRetreat(frz_conflict_horizon);
}
- heap_freeze_prepared_tuples(buffer, prstate.pagefrz.frozen, presult->nfrozen);
+ heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
}
MarkBufferDirty(buffer);
@@ -632,7 +633,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
log_heap_prune_and_freeze(relation, buffer,
conflict_xid,
true, reason,
- prstate.pagefrz.frozen, presult->nfrozen,
+ prstate.frozen, prstate.nfrozen,
prstate.redirected, prstate.nredirected,
prstate.nowdead, prstate.ndead,
prstate.nowunused, prstate.nunused);
@@ -649,6 +650,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*/
if (!presult->set_all_frozen)
presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
+ presult->nfrozen = prstate.nfrozen;
/*
* If we will freeze tuples on the page or, even if we don't freeze tuples
@@ -1298,11 +1300,11 @@ heap_prune_record_live_or_recently_dead(Page page, PruneState *prstate, OffsetNu
{
/* Tuple with storage -- consider need to freeze */
if ((heap_prepare_freeze_tuple(htup, &prstate->pagefrz,
- &prstate->pagefrz.frozen[presult->nfrozen],
+ &prstate->frozen[presult->nfrozen],
&totally_frozen)))
{
/* Save prepared freeze plan for later */
- prstate->pagefrz.frozen[presult->nfrozen++].offset = offnum;
+ prstate->frozen[presult->nfrozen++].offset = offnum;
}
/*
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index a0420bea2eb..ef61e0277ee 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -191,11 +191,6 @@ typedef struct HeapPageFreeze
MultiXactId NoFreezePageRelminMxid;
struct VacuumCutoffs *cutoffs;
-
- /*
- * One entry for every tuple that we may freeze.
- */
- HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
} HeapPageFreeze;
/*
@@ -227,14 +222,13 @@ typedef struct PruneFreezeResult
{
int ndeleted; /* Number of tuples deleted from the page */
int nnewlpdead; /* Number of newly LP_DEAD items */
+ int nfrozen; /* Number of tuples we froze */
/*
* The rest of the fields in PruneFreezeResult are only guaranteed to be
* initialized if heap_page_prune_and_freeze() is passed
* PRUNE_DO_TRY_FREEZE.
*/
- /* Number of tuples we froze */
- int nfrozen;
/* Whether or not the page should be set all-frozen in the VM */
bool set_all_frozen;
--
2.40.1
v9-0018-Cosmetic-fixes.patch (text/x-diff; charset=us-ascii)
From 4665c6529e7353233a78df9d15d00e5c407a5f11 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed, 27 Mar 2024 23:41:15 +0200
Subject: [PATCH v9 18/21] Cosmetic fixes
---
src/backend/access/heap/heapam.c | 14 +++++++-------
src/backend/access/heap/pruneheap.c | 2 +-
2 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index aefc0be0dd3..ed4045925bd 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6762,13 +6762,13 @@ heap_execute_freeze_tuple(HeapTupleHeader tuple, HeapTupleFreeze *frz)
}
/*
-* Perform xmin/xmax XID status sanity checks before actually executing freeze
-* plans.
-*
-* heap_prepare_freeze_tuple doesn't perform these checks directly because
-* pg_xact lookups are relatively expensive. They shouldn't be repeated
-* by successive VACUUMs that each decide against freezing the same page.
-*/
+ * Perform xmin/xmax XID status sanity checks before actually executing freeze
+ * plans.
+ *
+ * heap_prepare_freeze_tuple doesn't perform these checks directly because
+ * pg_xact lookups are relatively expensive. They shouldn't be repeated
+ * by successive VACUUMs that each decide against freezing the same page.
+ */
void
heap_pre_freeze_checks(Buffer buffer,
HeapTupleFreeze *tuples, int ntuples)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index fd8dc0bc85b..337331901ab 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -545,7 +545,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* will be no newly frozen tuples.
*/
presult->set_all_frozen = false;
- prstate.nfrozen = 0; /* avoid miscounts in instrumenation */
+ prstate.nfrozen = 0; /* avoid miscounts in instrumentation */
}
/* Any error while applying the changes is critical */
--
2.40.1
v9-0019-Almost-cosmetic-fixes.patch (text/x-diff; charset=us-ascii)
From 1184c142bbba6fa2457b869cb03829b5002c7a74 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed, 27 Mar 2024 23:44:17 +0200
Subject: [PATCH v9 19/21] Almost cosmetic fixes
---
src/backend/access/heap/pruneheap.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 337331901ab..2bd2e858bcd 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -38,7 +38,10 @@ typedef struct
TransactionId visibility_cutoff_xid;
bool all_visible_except_removable;
- TransactionId new_prune_xid; /* new prune hint value for page */
+ /*
+ * Fields describing what to do to the page
+ */
+ TransactionId new_prune_xid; /* new prune hint value */
TransactionId latest_xid_removed;
int nredirected; /* numbers of entries in arrays below */
int ndead;
@@ -61,7 +64,7 @@ typedef struct
/*
* Tuple visibility is only computed once for each tuple, for correctness
* and efficiency reasons; see comment in heap_page_prune_and_freeze() for
- * details. This is of type int8[], instead of HTSV_Result[], so we can
+ * details. This is of type int8[], instead of HTSV_Result[], so we can
* use -1 to indicate no visibility has been computed, e.g. for LP_DEAD
* items.
*
--
2.40.1
v9-0020-Move-frz_conflict_horizon-to-tighter-scope.patch (text/x-diff; charset=us-ascii)
From 87bf27b2cbb8825c624b6fc7b80155cf762051ce Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed, 27 Mar 2024 23:47:24 +0200
Subject: [PATCH v9 20/21] Move 'frz_conflict_horizon' to tighter scope
---
src/backend/access/heap/pruneheap.c | 38 ++++++++++++++---------------
1 file changed, 19 insertions(+), 19 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 2bd2e858bcd..e1eed42004f 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -273,7 +273,6 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
maxoff;
PruneState prstate;
HeapTupleData tup;
- TransactionId frz_conflict_horizon;
bool do_freeze;
bool do_prune;
bool do_hint;
@@ -391,7 +390,6 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* all-visible, so the conflict horizon remains InvalidTransactionId.
*/
presult->vm_conflict_horizon = prstate.visibility_cutoff_xid = InvalidTransactionId;
- frz_conflict_horizon = InvalidTransactionId;
maxoff = PageGetMaxOffsetNumber(page);
tup.t_tableOid = RelationGetRelid(relation);
@@ -590,24 +588,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
}
if (do_freeze)
- {
- /*
- * We can use the visibility_cutoff_xid as our cutoff for
- * conflicts when the whole page is eligible to become all-frozen
- * in the VM once we're done with it. Otherwise we generate a
- * conservative cutoff by stepping back from OldestXmin. This
- * avoids false conflicts when hot_standby_feedback is in use.
- */
- if (prstate.all_visible_except_removable && presult->set_all_frozen)
- frz_conflict_horizon = prstate.visibility_cutoff_xid;
- else
- {
- /* Avoids false conflicts when hot_standby_feedback in use */
- frz_conflict_horizon = prstate.pagefrz.cutoffs->OldestXmin;
- TransactionIdRetreat(frz_conflict_horizon);
- }
heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
- }
MarkBufferDirty(buffer);
@@ -626,8 +607,27 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* on the standby with xids older than the youngest tuple this
* record will freeze will conflict.
*/
+ TransactionId frz_conflict_horizon = InvalidTransactionId;
TransactionId conflict_xid;
+ /*
+ * We can use the visibility_cutoff_xid as our cutoff for
+ * conflicts when the whole page is eligible to become all-frozen
+ * in the VM once we're done with it. Otherwise we generate a
+ * conservative cutoff by stepping back from OldestXmin.
+ */
+ if (do_freeze)
+ {
+ if (prstate.all_visible_except_removable && presult->set_all_frozen)
+ frz_conflict_horizon = prstate.visibility_cutoff_xid;
+ else
+ {
+ /* Avoids false conflicts when hot_standby_feedback in use */
+ frz_conflict_horizon = prstate.pagefrz.cutoffs->OldestXmin;
+ TransactionIdRetreat(frz_conflict_horizon);
+ }
+ }
+
if (TransactionIdFollows(frz_conflict_horizon, prstate.latest_xid_removed))
conflict_xid = frz_conflict_horizon;
else
--
2.40.1
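Side note on 0020: the cutoff computation it relocates boils down to the
choice sketched below -- use the page's newest live xmin as the conflict
horizon when the page will be all-visible and all-frozen, and otherwise
step back one XID from OldestXmin so that hot_standby_feedback users do
not get false recovery conflicts. The helper name is made up purely for
illustration; this is not code from the patch.

    static TransactionId
    choose_frz_conflict_horizon(bool all_visible, bool all_frozen,
                                TransactionId visibility_cutoff_xid,
                                TransactionId oldest_xmin)
    {
        /* Whole page will be frozen: newest live xmin is a safe cutoff */
        if (all_visible && all_frozen)
            return visibility_cutoff_xid;

        /* Otherwise be conservative: one XID older than OldestXmin */
        TransactionIdRetreat(oldest_xmin);
        return oldest_xmin;
    }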
v9-0021-WIP-refactor.patch (text/x-diff; charset=us-ascii)
From b946b65744d2e7906d8ea39085d5749c4b6be4d5 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Thu, 28 Mar 2024 00:16:09 +0200
Subject: [PATCH v9 21/21] WIP refactor
- The revisit array is removed. With the restructuring of
heap_prune_chain(), we can ensure that record_unchanged() is called
for all of the correct tuples without stashing them away.
- In heap_prune_chain(), I cleaned up the chain traversal logic and
grouped the post-traversal chain processing into three separate
branches. I find this to be a big clarity improvement.
---
src/backend/access/heap/pruneheap.c | 794 ++++++++++++++++-----------
src/backend/access/heap/vacuumlazy.c | 6 +-
src/include/access/heapam.h | 40 +-
3 files changed, 504 insertions(+), 336 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index e1eed42004f..1cb692dd25f 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -32,16 +32,16 @@
/* Working data for heap_page_prune_and_freeze() and subroutines */
typedef struct
{
+ /* PRUNE_DO_* arguments */
+ uint8 actions;
+
/* tuple visibility test, initialized for the relation */
GlobalVisState *vistest;
- uint8 actions;
- TransactionId visibility_cutoff_xid;
- bool all_visible_except_removable;
/*
* Fields describing what to do to the page
*/
- TransactionId new_prune_xid; /* new prune hint value */
+ TransactionId new_prune_xid; /* new prune hint value */
TransactionId latest_xid_removed;
int nredirected; /* numbers of entries in arrays below */
int ndead;
@@ -53,8 +53,10 @@ typedef struct
OffsetNumber nowunused[MaxHeapTuplesPerPage];
HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
+ HeapPageFreeze pagefrz;
+
/*
- * marked[i] is true if item i is entered in one of the above arrays.
+ * marked[i] is true when heap_prune_chain() has already processed item i.
*
* This needs to be MaxHeapTuplesPerPage + 1 long as FirstOffsetNumber is
* 1. Otherwise every access would need to subtract 1.
@@ -73,37 +75,73 @@ typedef struct
*/
int8 htsv[MaxHeapTuplesPerPage + 1];
- HeapPageFreeze pagefrz;
+ /*
+ * The rest of the fields are not used by pruning itself, but are used to
+ * collect information about what was pruned and what state the page is in
+ * after pruning, for the benefit of the caller. They are copied to
+ * PruneFreezeResult at the end.
+ */
+
+ int ndeleted; /* Number of tuples deleted from the page */
+
+ /* Number of live and recently dead tuples, after pruning */
+ int live_tuples;
+ int recently_dead_tuples;
+
+ /* Whether or not the page makes rel truncation unsafe */
+ bool hastup;
/*
- * Whether or not this tuple has been counted toward vacuum stats. In
- * heap_prune_chain(), we have to be sure that Heap Only Tuples that are
- * not part of any chain are counted correctly.
+ * LP_DEAD items on the page after pruning. Includes existing LP_DEAD
+ * items
+ */
+ int lpdead_items; /* includes existing LP_DEAD items */
+ OffsetNumber *deadoffsets; /* points directly to PruneResult->deadoffsets */
+
+ /*
+ * all_visible and all_frozen indicate if the all-visible and all-frozen
+ * bits in the visibility map can be set for this page, after pruning.
+ *
+ * visibility_cutoff_xid is the newest xmin of live tuples on the page.
+ * The caller can use it as the conflict horizon, when setting the VM
+ * bits. It is only valid if we froze some tuples, and all_frozen is
+ * true.
+ *
+ * These are only set if the PRUNE_DO_TRY_FREEZE action flag is set.
+ *
+ * NOTE: This 'all_visible' doesn't include LP_DEAD items. That's
+ * convenient for heap_page_prune_and_freeze(), to use this to decide
+ * whether to freeze the page or not. The 'all_visible' value returned to
+ * the caller is adjusted to include LP_DEAD items at the end.
*/
- bool counted[MaxHeapTuplesPerPage + 1];
+ bool all_visible;
+ bool all_frozen;
+ TransactionId visibility_cutoff_xid;
+
} PruneState;
/* Local functions */
static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
HeapTuple tup,
Buffer buffer);
-static int heap_prune_chain(Buffer buffer,
- OffsetNumber rootoffnum,
- PruneState *prstate, PruneFreezeResult *presult);
-
static inline HTSV_Result htsv_get_valid_status(int status);
+static void heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
+ PruneState *prstate);
+
static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
-static void heap_prune_record_redirect(Page page, PruneState *prstate,
+static void heap_prune_record_redirect(PruneState *prstate,
OffsetNumber offnum, OffsetNumber rdoffnum,
- PruneFreezeResult *presult);
+ bool was_normal);
static void heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
- PruneFreezeResult *presult);
+ bool was_normal);
static void heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
- PruneFreezeResult *presult);
-static void heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum);
+ bool was_normal);
+static void heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum, bool was_normal);
+
+static void heap_prune_record_unchanged(Page page, PruneState *prstate, OffsetNumber offnum);
+static void heap_prune_record_unchanged_lp_dead(PruneState *prstate, OffsetNumber offnum);
+static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetNumber offnum);
-static void heap_prune_record_live_or_recently_dead(Page page, PruneState *prstate,
- OffsetNumber offnum, PruneFreezeResult *presult);
static void page_verify_redirects(Page page);
@@ -242,6 +280,10 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
* (see heap_prune_satisfies_vacuum).
*
+ * cutoffs contains the information on visibility for the whole relation
+ * collected by vacuum at the beginning of vacuuming the relation. It will be
+ * NULL for callers other than vacuum.
+ *
* presult contains output parameters needed by callers such as the number of
* tuples removed and the number of line pointers newly marked LP_DEAD.
* heap_page_prune_and_freeze() is responsible for initializing it.
@@ -326,70 +368,62 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.latest_xid_removed = InvalidTransactionId;
prstate.nredirected = prstate.ndead = prstate.nunused = prstate.nfrozen = 0;
memset(prstate.marked, 0, sizeof(prstate.marked));
- memset(prstate.counted, 0, sizeof(prstate.counted));
/*
* prstate.htsv is not initialized here because all ntuple spots in the
* array will be set either to a valid HTSV_Result value or -1.
*/
- presult->ndeleted = 0;
- presult->nnewlpdead = 0;
-
- presult->hastup = false;
- presult->live_tuples = 0;
- presult->recently_dead_tuples = 0;
- presult->lpdead_items = 0;
+ prstate.ndeleted = 0;
+ prstate.hastup = false;
+ prstate.live_tuples = 0;
+ prstate.recently_dead_tuples = 0;
+ prstate.lpdead_items = 0;
+ prstate.deadoffsets = presult->deadoffsets;
/*
- * Caller may update the VM after pruning, collecting LP_DEAD items, and
- * freezing tuples. Keep track of whether or not the page is all_visible
- * and all_frozen and use this information to update the VM. all_visible
- * implies lpdead_items == 0, but don't trust all_frozen result unless
- * all_visible is also set to true. If we won't even try freezing,
- * initialize all_frozen to false.
+ * Caller may update the VM after we're done. We keep track of whether
+ * the page will be all_visible and all_frozen, once we're done with the
+ * pruning and freezing, to help the caller to do that.
*
- * For vacuum, if the whole page will become frozen, we consider
- * opportunistically freezing tuples. Dead tuples which will be removed by
- * the end of vacuuming should not preclude us from opportunistically
- * freezing. We will not be able to freeze the whole page if there are
- * tuples present which are not visible to everyone or if there are dead
- * tuples which are not yet removable. We need all_visible to be false if
- * LP_DEAD tuples remain after pruning so that we do not incorrectly
- * update the visibility map or page hint bit. So, we will update
- * presult->all_visible to reflect the presence of LP_DEAD items while
- * pruning and keep all_visible_except_removable to permit freezing if the
- * whole page will eventually become all visible after removing tuples.
+ * Currently, only VACUUM sets the VM bits. To save effort, do the
+ * bookkeeping only if the caller needs it. Currently, that's tied to
+ * PRUNE_DO_TRY_FREEZE, but it could be a separate flag, if you wanted to
+ * update the VM bits without also freezing, or freezing without setting
+ * the VM bits.
+ *
+ * In addition to telling the caller whether it can set the VM bit, we
+ * also use 'all_visible' and 'all_frozen' for our own decision-making. If
+ * the whole page will become frozen, we consider opportunistically
+ * freezing tuples. We will not be able to freeze the whole page if there
+ * are tuples present which are not visible to everyone or if there are
+ * dead tuples which are not yet removable. However, dead tuples which
+ * will be removed by the end of vacuuming should not preclude us from
+ * opportunistically freezing. Because of that, we do not clear
+ * all_visible when we see LP_DEAD items. We fix that at the end of the
+ * function, when we return the value to the caller, so that the caller
+ * doesn't set the VM bit incorrectly.
*/
- presult->all_visible = true;
-
if (prstate.actions & PRUNE_DO_TRY_FREEZE)
- presult->set_all_frozen = true;
+ {
+ prstate.all_visible = true;
+ prstate.all_frozen = true;
+ }
else
- presult->set_all_frozen = false;
-
- /*
- * Deliberately delay unsetting all_visible until later during pruning.
- * Removable dead tuples shouldn't preclude freezing the page. After
- * finishing this first pass of tuple visibility checks, initialize
- * all_visible_except_removable with the current value of all_visible to
- * indicate whether or not the page is all visible except for dead tuples.
- * This will allow us to attempt to freeze the page after pruning. Later
- * during pruning, if we encounter an LP_DEAD item or are setting an item
- * LP_DEAD, we will unset all_visible. As long as we unset it before
- * updating the visibility map, this will be correct.
- */
- prstate.all_visible_except_removable = true;
+ {
+ prstate.all_visible = false;
+ prstate.all_frozen = false;
+ }
/*
* The visibility cutoff xid is the newest xmin of live tuples on the
* page. In the common case, this will be set as the conflict horizon the
- * caller can use for updating the VM. If, at the end of freezing and
+ * caller can use for updating the VM. If, at the end of freezing and
* pruning, the page is all-frozen, there is no possibility that any
* running transaction on the standby does not see tuples on the page as
* all-visible, so the conflict horizon remains InvalidTransactionId.
*/
- presult->vm_conflict_horizon = prstate.visibility_cutoff_xid = InvalidTransactionId;
+ prstate.visibility_cutoff_xid = InvalidTransactionId;
maxoff = PageGetMaxOffsetNumber(page);
tup.t_tableOid = RelationGetRelid(relation);
@@ -450,7 +484,13 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*/
hint_bit_fpi = fpi_before != pgWalUsage.wal_fpi;
- /* Scan the page */
+ /*
+ * Scan the page, processing each tuple.
+ *
+ * heap_prune_chain() decides for each tuple, whether it can be pruned,
+ * redirected or frozen. It follows HOT chains, processing each HOT chain
+ * as a unit.
+ */
for (offnum = FirstOffsetNumber;
offnum <= maxoff;
offnum = OffsetNumberNext(offnum))
@@ -471,10 +511,44 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
continue;
/* Process this item or chain of items */
- presult->ndeleted += heap_prune_chain(buffer, offnum,
- &prstate, presult);
+ heap_prune_chain(buffer, offnum, &prstate);
}
+ /*
+ * If PruneReason is PRUNE_ON_ACCESS, there may have been in-progress
+ * deletes or inserts of HOT tuples which broke up the HOT chain and left
+ * unchanged tuples unprocessed. MFIXME: should we just skip the below
+ * since we don't care about most of it if ON_ACCESS? We would have to move
+ * the record_prunable() calls back out, so maybe it's not worth it...
+ */
+ for (offnum = FirstOffsetNumber;
+ offnum <= maxoff;
+ offnum = OffsetNumberNext(offnum))
+ {
+ ItemId itemid = PageGetItemId(page, offnum);
+
+ if (ItemIdIsUsed(itemid) && !prstate.marked[offnum])
+ heap_prune_record_unchanged(page, &prstate, offnum);
+ }
+
+ /* We should now have processed every tuple exactly once */
+#ifdef USE_ASSERT_CHECKING
+ for (offnum = FirstOffsetNumber;
+ offnum <= maxoff;
+ offnum = OffsetNumberNext(offnum))
+ {
+ ItemId itemid;
+
+ if (off_loc)
+ *off_loc = offnum;
+ itemid = PageGetItemId(page, offnum);
+ if (ItemIdIsUsed(itemid))
+ Assert(prstate.marked[offnum]);
+ else
+ Assert(!prstate.marked[offnum]);
+ }
+#endif
+
/* Clear the offset information once we have processed the given page. */
if (off_loc)
*off_loc = InvalidOffsetNumber;
@@ -483,9 +557,6 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.ndead > 0 ||
prstate.nunused > 0;
- /* Record number of newly-set-LP_DEAD items for caller */
- presult->nnewlpdead = prstate.ndead;
-
/*
* Even if we don't prune anything, if we found a new value for the
* pd_prune_xid field or the page was marked full, we will update the hint
@@ -509,8 +580,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (prstate.actions & PRUNE_DO_TRY_FREEZE)
{
/* Is the whole page freezable? And is there something to freeze? */
- bool whole_page_freezable = prstate.all_visible_except_removable &&
- presult->set_all_frozen;
+ bool whole_page_freezable = prstate.all_visible &&
+ prstate.all_frozen;
if (prstate.pagefrz.freeze_required)
do_freeze = true;
@@ -538,14 +609,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*/
if (do_freeze)
heap_pre_freeze_checks(buffer, prstate.frozen, prstate.nfrozen);
- else if (!presult->set_all_frozen || prstate.nfrozen > 0)
+ else if (!prstate.all_frozen || prstate.nfrozen > 0)
{
+ Assert(!prstate.pagefrz.freeze_required);
+
/*
* If we will neither freeze tuples on the page nor set the page all
* frozen in the visibility map, the page is not all-frozen and there
* will be no newly frozen tuples.
*/
- presult->set_all_frozen = false;
+ prstate.all_frozen = false;
prstate.nfrozen = 0; /* avoid miscounts in instrumentation */
}
@@ -618,7 +691,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*/
if (do_freeze)
{
- if (prstate.all_visible_except_removable && presult->set_all_frozen)
+ if (prstate.all_visible && prstate.all_frozen)
frz_conflict_horizon = prstate.visibility_cutoff_xid;
else
{
@@ -645,24 +718,61 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
END_CRIT_SECTION();
+ /* Copy data back to 'presult' */
+ presult->ndeleted = prstate.ndeleted;
+ presult->nnewlpdead = prstate.ndead;
+ presult->nfrozen = prstate.nfrozen;
+ presult->live_tuples = prstate.live_tuples;
+ presult->recently_dead_tuples = prstate.recently_dead_tuples;
+
+ /*
+ * It was convenient to ignore LP_DEAD items in all_visible earlier on to
+ * make the choice of whether or not to freeze the page unaffected by the
+ * short-term presence of LP_DEAD items. These LP_DEAD items were
+ * effectively assumed to be LP_UNUSED items in the making. It doesn't
+ * matter which heap pass (initial pass or final pass) ends up setting the
+ * page all-frozen, as long as the ongoing VACUUM does it.
+ *
+ * Now that freezing has been finalized, unset all_visible if there are
+ * any LP_DEAD items on the page. It needs to reflect the present state
+ * of things, as expected by our caller.
+ */
+ if (prstate.lpdead_items == 0)
+ {
+ presult->all_visible = prstate.all_visible;
+ presult->all_frozen = prstate.all_frozen;
+ }
+ else
+ {
+ presult->all_visible = false;
+ presult->all_frozen = false;
+ }
+ presult->hastup = prstate.hastup;
+
/*
* For callers planning to update the visibility map, the conflict horizon
- * for that record must be the newest xmin on the page. However, if the
+ * for that record must be the newest xmin on the page. However, if the
* page is completely frozen, there can be no conflict and the
- * vm_conflict_horizon should remain InvalidTransactionId.
+ * vm_conflict_horizon should remain InvalidTransactionId. This includes
+ * the case that we just froze all the tuples; the prune-freeze record
+ * included the conflict XID already so the caller doesn't need it.
*/
- if (!presult->set_all_frozen)
+ if (!presult->all_frozen)
presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
- presult->nfrozen = prstate.nfrozen;
+ else
+ presult->vm_conflict_horizon = InvalidTransactionId;
+
+ presult->lpdead_items = prstate.lpdead_items;
+ /* the presult->deadoffsets array was already filled in */
/*
* If we will freeze tuples on the page or, even if we don't freeze tuples
* on the page, if we will set the page all-frozen in the visibility map,
* we can advance relfrozenxid and relminmxid to the values in
* pagefrz->FreezePageRelfrozenXid and pagefrz->FreezePageRelminMxid.
- * MFIXME: which one should be pick if presult->nfrozen == 0 and
- * presult->all_frozen = True.
*/
+ Assert(presult->nfrozen > 0 || !prstate.pagefrz.freeze_required);
+
if (new_relfrozen_xid)
{
if (presult->nfrozen > 0)
@@ -670,7 +780,6 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
else
*new_relfrozen_xid = prstate.pagefrz.NoFreezePageRelfrozenXid;
}
-
if (new_relmin_mxid)
{
if (presult->nfrozen > 0)
@@ -739,25 +848,32 @@ htsv_get_valid_status(int status)
* prstate showing the changes to be made. Items to be redirected are added
* to the redirected[] array (two entries per redirection); items to be set to
* LP_DEAD state are added to nowdead[]; and items to be set to LP_UNUSED
- * state are added to nowunused[].
- *
- * Returns the number of tuples (to be) deleted from the page.
+ * state are added to nowunused[]. We perform bookkeeping of live tuples,
+ * visibility etc. based on what the page will look like after the changes
+ * are applied. All that bookkeeping is performed in the heap_prune_record_*()
+ * subroutines. The division of labor is that heap_prune_chain() decides the
+ * fate of each tuple, ie. whether it's going to be removed, redirected or
+ * left unchanged, and the heap_prune_record_*() subroutines update PruneState
+ * based on that outcome.
*/
-static int
+static void
heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
- PruneState *prstate, PruneFreezeResult *presult)
+ PruneState *prstate)
{
- int ndeleted = 0;
Page dp = (Page) BufferGetPage(buffer);
TransactionId priorXmax = InvalidTransactionId;
ItemId rootlp;
HeapTupleHeader htup;
- OffsetNumber latestdead = InvalidOffsetNumber,
- maxoff = PageGetMaxOffsetNumber(dp),
+ OffsetNumber maxoff = PageGetMaxOffsetNumber(dp),
offnum;
OffsetNumber chainitems[MaxHeapTuplesPerPage];
- int nchain = 0,
- i;
+
+ /*
+ * After traversing the HOT chain, survivor is the index in chainitems of
+ * the first live successor after the last dead item.
+ */
+ int survivor = 0,
+ nchain = 0;
rootlp = PageGetItemId(dp, rootoffnum);
@@ -788,49 +904,62 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
* Note that we might first arrive at a dead heap-only tuple
* either here or while following a chain below. Whichever path
* gets there first will mark the tuple unused.
+ *
+ * Whether we arrive at the dead HOT tuple first here or while
+ * following a chain below affects whether preceding RECENTLY_DEAD
+ * tuples in the chain can be removed or not. Imagine that you
+ * have a chain with two tuples: RECENTLY_DEAD -> DEAD. If we
+ * reach the RECENTLY_DEAD tuple first, the chain-following logic
+ * will find the DEAD tuple and conclude that both tuples are in
+ * fact dead and can be removed. But if we reach the DEAD tuple
+ * at the end of the chain first, when we reach the RECENTLY_DEAD
+ * tuple later, we will not follow the chain because the DEAD
+ * tuple is already 'marked', and will not remove the
+ * RECENTLY_DEAD tuple. This is not a correctness issue, and the
+ * RECENTLY_DEAD tuple will be removed by a later VACUUM.
*/
- if (!HeapTupleHeaderIsHotUpdated(htup))
+ if (prstate->htsv[rootoffnum] == HEAPTUPLE_DEAD &&
+ !HeapTupleHeaderIsHotUpdated(htup))
{
- if (prstate->htsv[rootoffnum] == HEAPTUPLE_DEAD)
- {
- heap_prune_record_unused(prstate, rootoffnum);
- HeapTupleHeaderAdvanceConflictHorizon(htup,
- &prstate->latest_xid_removed);
- ndeleted++;
- }
- else
- {
- Assert(!prstate->marked[rootoffnum]);
-
- /*
- * MFIXME: not sure if this is right -- maybe counting too
- * many
- */
-
- /*
- * Ensure that this tuple is counted. If it is later
- * redirected to, it would have been counted then, but we
- * won't double count because we check if it has already
- * been counted first.
- */
- heap_prune_record_live_or_recently_dead(dp, prstate, rootoffnum, presult);
- }
+ heap_prune_record_unused(prstate, rootoffnum, true);
+ HeapTupleHeaderAdvanceConflictHorizon(htup,
+ &prstate->latest_xid_removed);
}
- /* Nothing more to do */
- return ndeleted;
+ /*
+ * MFIXME: I think we could use an example like the one you
+ * suggested below here for on-access pruning. For example, given
+ * the following tuple versions (this is an UPDATE), we bail here
+ * and don't end up returning because there are no prunable tuples
+ * in this chain. This is one of the cases the record_unchanged()
+ * loop outside in heap_page_prune_and_freeze() is catching.
+ *
+ * REDIRECT -> DELETE IN PROGRESS -> INSERT IN PROGRESS
+ */
+ return;
}
}
/* Start from the root tuple */
- offnum = rootoffnum;
+
+ /*----
+ * MFIXME: make this into something...
+ * this helped me to visualize how different chains might look like here.
+ * It's not an exhaustive list, just some examples to help with thinking.
+ * Remove this comment from final version, or refine.
+ *
+ * REDIRECT -> LIVE (stop) -> ...
+ * REDIRECT -> RECENTLY_DEAD -> LIVE (stop) -> ...
+ * REDIRECT -> RECENTLY_DEAD -> RECENTLY_DEAD
+ * REDIRECT -> RECENTLY_DEAD -> DEAD
+ * REDIRECT -> RECENTLY_DEAD -> DEAD -> RECENTLY_DEAD -> DEAD
+ * RECENTLY_DEAD -> ...
+ */
/* while not end of the chain */
- for (;;)
+ for (offnum = rootoffnum;;)
{
ItemId lp;
- bool tupdead,
- recent_dead;
/* Sanity check (pure paranoia) */
if (offnum < FirstOffsetNumber)
@@ -876,19 +1005,12 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
{
/*
* If the caller set PRUNE_DO_MARK_UNUSED_NOW, we can set dead
- * line pointers LP_UNUSED now. We don't increment ndeleted here
- * since the LP was already marked dead. If it will not be marked
- * LP_UNUSED, it will remain LP_DEAD, making the page not
- * all_visible.
+ * line pointers LP_UNUSED now.
*/
if (unlikely(prstate->actions & PRUNE_DO_MARK_UNUSED_NOW))
- heap_prune_record_unused(prstate, offnum);
+ heap_prune_record_unused(prstate, offnum, false);
else
- {
- presult->all_visible = false;
- presult->deadoffsets[presult->lpdead_items++] = offnum;
- }
-
+ heap_prune_record_unchanged_lp_dead(prstate, offnum);
break;
}
@@ -910,67 +1032,37 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
/*
* Check tuple's visibility status.
*/
- tupdead = recent_dead = false;
-
switch (htsv_get_valid_status(prstate->htsv[offnum]))
{
case HEAPTUPLE_DEAD:
- tupdead = true;
- break;
-
- case HEAPTUPLE_RECENTLY_DEAD:
- recent_dead = true;
/*
- * This tuple may soon become DEAD. Update the hint field so
- * that the page is reconsidered for pruning in future.
+ * Remember the last DEAD tuple seen. We will advance past
+ * RECENTLY_DEAD tuples just in case there's a DEAD one after
+ * them; but we can't advance past anything else. We want to
+ * ensure that any line pointers for DEAD tuples are set
+ * LP_DEAD or LP_UNUSED. It is important that line pointers
+ * whose offsets are added to deadoffsets are in fact set
+ * LP_DEAD.
*/
- heap_prune_record_prunable(prstate,
- HeapTupleHeaderGetUpdateXid(htup));
+ survivor = nchain;
+ HeapTupleHeaderAdvanceConflictHorizon(htup,
+ &prstate->latest_xid_removed);
break;
- case HEAPTUPLE_DELETE_IN_PROGRESS:
-
- /*
- * This tuple may soon become DEAD. Update the hint field so
- * that the page is reconsidered for pruning in future.
- */
- heap_prune_record_prunable(prstate,
- HeapTupleHeaderGetUpdateXid(htup));
+ case HEAPTUPLE_RECENTLY_DEAD:
break;
+ case HEAPTUPLE_DELETE_IN_PROGRESS:
case HEAPTUPLE_LIVE:
case HEAPTUPLE_INSERT_IN_PROGRESS:
-
- /*
- * If we wanted to optimize for aborts, we might consider
- * marking the page prunable when we see INSERT_IN_PROGRESS.
- * But we don't. See related decisions about when to mark the
- * page prunable in heapam.c.
- */
- break;
+ goto process_chains;
default:
elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
break;
}
- /*
- * Remember the last DEAD tuple seen. We will advance past
- * RECENTLY_DEAD tuples just in case there's a DEAD one after them;
- * but we can't advance past anything else. We have to make sure that
- * we don't miss any DEAD tuples, since DEAD tuples that still have
- * tuple storage after pruning will confuse VACUUM.
- */
- if (tupdead)
- {
- latestdead = offnum;
- HeapTupleHeaderAdvanceConflictHorizon(htup,
- &prstate->latest_xid_removed);
- }
- else if (!recent_dead)
- break;
-
/*
* If the tuple is not HOT-updated, then we are at the end of this
* HOT-update chain.
@@ -990,65 +1082,67 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
priorXmax = HeapTupleHeaderGetUpdateXid(htup);
}
- /*
- * If we found a DEAD tuple in the chain, adjust the HOT chain so that all
- * the DEAD tuples at the start of the chain are removed and the root line
- * pointer is appropriately redirected.
- */
- if (OffsetNumberIsValid(latestdead))
+ if (ItemIdIsRedirected(rootlp) && nchain < 2)
{
/*
- * Mark as unused each intermediate item that we are able to remove
- * from the chain.
- *
- * When the previous item is the last dead tuple seen, we are at the
- * right candidate for redirection.
+ * We found a redirect item that doesn't point to a valid follow-on
+ * item. This can happen if the loop in heap_page_prune_and_freeze()
+ * caused us to visit the dead successor of a redirect item before
+ * visiting the redirect item. We can clean up by setting the
+ * redirect item to DEAD state or LP_UNUSED if the caller indicated.
*/
- for (i = 1; (i < nchain) && (chainitems[i - 1] != latestdead); i++)
- {
- heap_prune_record_unused(prstate, chainitems[i]);
- ndeleted++;
- }
+ heap_prune_record_dead_or_unused(prstate, rootoffnum, false);
+ return;
+ }
+
+process_chains:
+ if (!survivor)
+ {
+ int i;
/*
- * If the root entry had been a normal tuple, we are deleting it, so
- * count it in the result. But changing a redirect (even to DEAD
- * state) doesn't count.
+ * If no DEAD tuple was found, and the root is redirected, mark it as
+ * such.
*/
- if (ItemIdIsNormal(rootlp))
- ndeleted++;
+ if ((i = ItemIdIsRedirected(rootlp)))
+ heap_prune_record_unchanged_lp_redirect(prstate, rootoffnum);
+ /* the rest of tuples in the chain are normal, unchanged tuples */
+ for (; i < nchain; i++)
+ heap_prune_record_unchanged(dp, prstate, chainitems[i]);
+ }
+ else if (survivor == nchain)
+ {
/*
* If the DEAD tuple is at the end of the chain, the entire chain is
- * dead and the root line pointer can be marked dead. Otherwise just
- * redirect the root to the correct chain member.
+ * dead and the root line pointer can be marked dead.
*/
- if (i >= nchain)
- heap_prune_record_dead_or_unused(prstate, rootoffnum, presult);
- else
- heap_prune_record_redirect(dp, prstate, rootoffnum, chainitems[i], presult);
+ heap_prune_record_dead_or_unused(prstate, rootoffnum, ItemIdIsNormal(rootlp));
+
+ for (int i = 1; i < nchain; i++)
+ heap_prune_record_unused(prstate, chainitems[i], true);
}
- else if (nchain < 2 && ItemIdIsRedirected(rootlp))
+ else
{
/*
- * We found a redirect item that doesn't point to a valid follow-on
- * item. This can happen if the loop in heap_page_prune_and_freeze()
- * caused us to visit the dead successor of a redirect item before
- * visiting the redirect item. We can clean up by setting the
- * redirect item to DEAD state or LP_UNUSED if the caller indicated.
+ * If we found a DEAD tuple in the chain, adjust the HOT chain so that
+ * all the DEAD tuples at the start of the chain are removed and the
+ * root line pointer is appropriately redirected.
*/
- heap_prune_record_dead_or_unused(prstate, rootoffnum, presult);
- }
+ heap_prune_record_redirect(prstate, rootoffnum, chainitems[survivor],
+ ItemIdIsNormal(rootlp));
- /*
- * If not marked for pruning, consider if the tuple should be counted as
- * live or recently dead. Note that line pointers redirected to will
- * already have been counted.
- */
- if (ItemIdIsNormal(rootlp) && !prstate->marked[rootoffnum])
- heap_prune_record_live_or_recently_dead(dp, prstate, rootoffnum, presult);
+ /*
+ * Mark as unused each intermediate item that we are able to remove
+ * from the chain.
+ */
+ for (int i = 1; i < survivor; i++)
+ heap_prune_record_unused(prstate, chainitems[i], true);
- return ndeleted;
+ /* the rest of tuples in the chain are normal, unchanged tuples */
+ for (int i = survivor; i < nchain; i++)
+ heap_prune_record_unchanged(dp, prstate, chainitems[i]);
+ }
}
/* Record lowest soon-prunable XID */
@@ -1067,43 +1161,69 @@ heap_prune_record_prunable(PruneState *prstate, TransactionId xid)
/* Record line pointer to be redirected */
static void
-heap_prune_record_redirect(Page page, PruneState *prstate,
+heap_prune_record_redirect(PruneState *prstate,
OffsetNumber offnum, OffsetNumber rdoffnum,
- PruneFreezeResult *presult)
+ bool was_normal)
{
+ Assert(!prstate->marked[offnum]);
+ prstate->marked[offnum] = true;
+
+ /*
+ * Do not mark the redirect target here. It needs to be counted
+ * separately as an unchanged tuple.
+ */
+
Assert(prstate->nredirected < MaxHeapTuplesPerPage);
prstate->redirected[prstate->nredirected * 2] = offnum;
prstate->redirected[prstate->nredirected * 2 + 1] = rdoffnum;
- heap_prune_record_live_or_recently_dead(page, prstate, rdoffnum, presult);
prstate->nredirected++;
- Assert(!prstate->marked[offnum]);
- prstate->marked[offnum] = true;
- Assert(!prstate->marked[rdoffnum]);
- prstate->marked[rdoffnum] = true;
- presult->hastup = true;
+ /*
+ * If the root entry had been a normal tuple, we are deleting it, so count
+ * it in the result. But changing a redirect (even to DEAD state) doesn't
+ * count.
+ */
+ if (was_normal)
+ prstate->ndeleted++;
+
+ prstate->hastup = true;
}
/* Record line pointer to be marked dead */
static void
heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
- PruneFreezeResult *presult)
+ bool was_normal)
{
+ Assert(!prstate->marked[offnum]);
+ prstate->marked[offnum] = true;
+
Assert(prstate->ndead < MaxHeapTuplesPerPage);
prstate->nowdead[prstate->ndead] = offnum;
prstate->ndead++;
- Assert(!prstate->marked[offnum]);
- prstate->marked[offnum] = true;
/*
- * Setting the line pointer LP_DEAD means the page will definitely not be
- * all_visible.
+ * Deliberately delay unsetting all_visible until later during pruning.
+ * Removable dead tuples shouldn't preclude freezing the page. After
+ * finishing this first pass of tuple visibility checks, initialize
+ * all_visible_except_removable with the current value of all_visible to
+ * indicate whether or not the page is all visible except for dead tuples.
+ * This will allow us to attempt to freeze the page after pruning. Later
+ * during pruning, if we encounter an LP_DEAD item or are setting an item
+ * LP_DEAD, we will unset all_visible. As long as we unset it before
+ * updating the visibility map, this will be correct.
*/
- presult->all_visible = false;
/* Record the dead offset for vacuum */
- presult->deadoffsets[presult->lpdead_items++] = offnum;
+ prstate->deadoffsets[prstate->lpdead_items++] = offnum;
+
+ /*
+ * If the root entry had been a normal tuple, we are deleting it, so count
+ * it in the result. But changing a redirect (even to DEAD state) doesn't
+ * count.
+ */
+ if (was_normal)
+ prstate->ndeleted++;
}
/*
@@ -1114,7 +1234,7 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
*/
static void
heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
- PruneFreezeResult *presult)
+ bool was_normal)
{
/*
* If the caller set PRUNE_DO_MARK_UNUSED_NOW, we can remove dead tuples
@@ -1123,57 +1243,45 @@ heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
* likely.
*/
if (unlikely(prstate->actions & PRUNE_DO_MARK_UNUSED_NOW))
- heap_prune_record_unused(prstate, offnum);
+ heap_prune_record_unused(prstate, offnum, was_normal);
else
- heap_prune_record_dead(prstate, offnum, presult);
+ heap_prune_record_dead(prstate, offnum, was_normal);
}
/* Record line pointer to be marked unused */
static void
-heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum)
+heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum, bool was_normal)
{
+ Assert(!prstate->marked[offnum]);
+ prstate->marked[offnum] = true;
+
Assert(prstate->nunused < MaxHeapTuplesPerPage);
prstate->nowunused[prstate->nunused] = offnum;
prstate->nunused++;
- Assert(!prstate->marked[offnum]);
- prstate->marked[offnum] = true;
+
+ /*
+ * If the root entry had been a normal tuple, we are deleting it, so count
+ * it in the result. But changing a redirect (even to DEAD state) doesn't
+ * count.
+ */
+ if (was_normal)
+ prstate->ndeleted++;
}
+
+/*
+ * Record line pointer that is left unchanged. We consider freezing it, and
+ * update bookkeeping of tuple counts and page visibility.
+ */
static void
-heap_prune_record_live_or_recently_dead(Page page, PruneState *prstate, OffsetNumber offnum,
- PruneFreezeResult *presult)
+heap_prune_record_unchanged(Page page, PruneState *prstate, OffsetNumber offnum)
{
- HTSV_Result status;
HeapTupleHeader htup;
- bool totally_frozen;
-
- /* This could happen for items which are redirected to. */
- if (prstate->counted[offnum])
- return;
- prstate->counted[offnum] = true;
-
- /*
- * If we don't want to do any of the special defined actions, we don't
- * need to continue.
- */
- if (prstate->actions == 0)
- return;
-
- status = htsv_get_valid_status(prstate->htsv[offnum]);
-
- Assert(status != HEAPTUPLE_DEAD);
+ Assert(!prstate->marked[offnum]);
+ prstate->marked[offnum] = true;
- /*
- * Deliberately don't set hastup for LP_DEAD items. We make the soft
- * assumption that any LP_DEAD items encountered here will become
- * LP_UNUSED later on, before count_nondeletable_pages is reached. If we
- * don't make this assumption then rel truncation will only happen every
- * other VACUUM, at most. Besides, VACUUM must treat
- * hastup/nonempty_pages as provisional no matter how LP_DEAD items are
- * handled (handled here, or handled later on).
- */
- presult->hastup = true;
+ prstate->hastup = true; /* the page is not empty */
/*
* The criteria for counting a tuple as live in this block need to match
@@ -1185,28 +1293,19 @@ heap_prune_record_live_or_recently_dead(Page page, PruneState *prstate, OffsetNu
* can't run inside a transaction block, which makes some cases impossible
* (e.g. in-progress insert from the same transaction).
*
- * We treat LP_DEAD items (which are the closest thing to DEAD tuples that
- * might be seen here) differently, too: we assume that they'll become
- * LP_UNUSED before VACUUM finishes. This difference is only superficial.
- * VACUUM effectively agrees with ANALYZE about DEAD items, in the end.
- * VACUUM won't remember LP_DEAD items, but only because they're not
- * supposed to be left behind when it is done. (Cases where we bypass
- * index vacuuming will violate this optimistic assumption, but the
- * overall impact of that should be negligible.)
- *
- * HEAPTUPLE_LIVE tuples are naturally counted as live. This is also what
- * acquire_sample_rows() does.
- *
- * HEAPTUPLE_DELETE_IN_PROGRESS tuples are expected during concurrent
- * vacuum. We expect the deleting transaction to update the counters at
- * commit after we report our results, so count these tuples as live to
- * ensure the math works out. The assumption that the transaction will
- * commit and update the counters after we report is a bit shaky; but it
- * is what acquire_sample_rows() does, so we do the same to be consistent.
+ * HEAPTUPLE_DEAD are handled by the other heap_prune_record_*()
+ * subroutines. They don't count dead items like acquire_sample_rows()
+ * does, because we assume that all dead items will become LP_UNUSED
+ * before VACUUM finishes. This difference is only superficial. VACUUM
+ * effectively agrees with ANALYZE about DEAD items, in the end. VACUUM
+ * won't remember LP_DEAD items, but only because they're not supposed to
+ * be left behind when it is done. (Cases where we bypass index vacuuming
+ * will violate this optimistic assumption, but the overall impact of that
+ * should be negligible.)
*/
htup = (HeapTupleHeader) PageGetItem(page, PageGetItemId(page, offnum));
- switch (status)
+ switch (prstate->htsv[offnum])
{
case HEAPTUPLE_LIVE:
@@ -1214,7 +1313,7 @@ heap_prune_record_live_or_recently_dead(Page page, PruneState *prstate, OffsetNu
* Count it as live. Not only is this natural, but it's also what
* acquire_sample_rows() does.
*/
- presult->live_tuples++;
+ prstate->live_tuples++;
/*
* Is the tuple definitely visible to all transactions?
@@ -1224,14 +1323,13 @@ heap_prune_record_live_or_recently_dead(Page page, PruneState *prstate, OffsetNu
* See SetHintBits for more info. Check that the tuple is hinted
* xmin-committed because of that.
*/
- if (prstate->all_visible_except_removable)
+ if (prstate->all_visible)
{
TransactionId xmin;
if (!HeapTupleHeaderXminCommitted(htup))
{
- prstate->all_visible_except_removable = false;
- presult->all_visible = false;
+ prstate->all_visible = false;
break;
}
@@ -1248,8 +1346,7 @@ heap_prune_record_live_or_recently_dead(Page page, PruneState *prstate, OffsetNu
Assert(prstate->pagefrz.cutoffs);
if (!TransactionIdPrecedes(xmin, prstate->pagefrz.cutoffs->OldestXmin))
{
- prstate->all_visible_except_removable = false;
- presult->all_visible = false;
+ prstate->all_visible = false;
break;
}
@@ -1259,6 +1356,7 @@ heap_prune_record_live_or_recently_dead(Page page, PruneState *prstate, OffsetNu
prstate->visibility_cutoff_xid = xmin;
}
break;
+
case HEAPTUPLE_RECENTLY_DEAD:
/*
@@ -1266,10 +1364,35 @@ heap_prune_record_live_or_recently_dead(Page page, PruneState *prstate, OffsetNu
* relation. (We only remove items that are LP_DEAD from
* pruning.)
*/
- presult->recently_dead_tuples++;
- prstate->all_visible_except_removable = false;
- presult->all_visible = false;
+ prstate->recently_dead_tuples++;
+ prstate->all_visible = false;
+
+ /*
+ * This tuple may soon become DEAD. Update the hint field so that
+ * the page is reconsidered for pruning in future.
+ */
+ heap_prune_record_prunable(prstate,
+ HeapTupleHeaderGetUpdateXid(htup));
break;
+
+ case HEAPTUPLE_DELETE_IN_PROGRESS:
+
+ /*
+ * This is an expected case during concurrent vacuum. Count such rows
+ * as live. As above, we assume the deleting transaction will
+ * commit and update the counters after we report.
+ */
+ prstate->live_tuples++;
+ prstate->all_visible = false;
+
+ /*
+ * This tuple may soon become DEAD. Update the hint field so that
+ * the page is reconsidered for pruning in future.
+ */
+ heap_prune_record_prunable(prstate,
+ HeapTupleHeaderGetUpdateXid(htup));
+ break;
+
case HEAPTUPLE_INSERT_IN_PROGRESS:
/*
@@ -1279,22 +1402,24 @@ heap_prune_record_live_or_recently_dead(Page page, PruneState *prstate, OffsetNu
* assumption is a bit shaky, but it is what acquire_sample_rows()
* does, so be consistent.
*/
- prstate->all_visible_except_removable = false;
- presult->all_visible = false;
- break;
- case HEAPTUPLE_DELETE_IN_PROGRESS:
+ prstate->all_visible = false;
/*
- * This an expected case during concurrent vacuum. Count such rows
- * as live. As above, we assume the deleting transaction will
- * commit and update the counters after we report.
+ * If we wanted to optimize for aborts, we might consider marking
+ * the page prunable when we see INSERT_IN_PROGRESS. But we
+ * don't. See related decisions about when to mark the page
+ * prunable in heapam.c.
*/
- presult->live_tuples++;
- prstate->all_visible_except_removable = false;
- presult->all_visible = false;
break;
+
default:
- elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
+
+ /*
+ * DEAD tuples should've been passed to heap_prune_record_dead()
+ * or heap_prune_record_unused() instead.
+ */
+ elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result %d",
+ prstate->htsv[offnum]);
break;
}
@@ -1302,12 +1427,14 @@ heap_prune_record_live_or_recently_dead(Page page, PruneState *prstate, OffsetNu
if (prstate->actions & PRUNE_DO_TRY_FREEZE)
{
/* Tuple with storage -- consider need to freeze */
+ bool totally_frozen;
+
if ((heap_prepare_freeze_tuple(htup, &prstate->pagefrz,
- &prstate->frozen[presult->nfrozen],
+ &prstate->frozen[prstate->nfrozen],
&totally_frozen)))
{
/* Save prepared freeze plan for later */
- prstate->frozen[presult->nfrozen++].offset = offnum;
+ prstate->frozen[prstate->nfrozen++].offset = offnum;
}
/*
@@ -1316,9 +1443,50 @@ heap_prune_record_live_or_recently_dead(Page page, PruneState *prstate, OffsetNu
* definitely cannot be set all-frozen in the visibility map later on
*/
if (!totally_frozen)
- presult->set_all_frozen = false;
+ prstate->all_frozen = false;
}
+}
+/*
+ * Record line pointer that was already LP_DEAD and is left unchanged.
+ */
+static void
+heap_prune_record_unchanged_lp_dead(PruneState *prstate, OffsetNumber offnum)
+{
+ /*
+ * Deliberately don't set hastup for LP_DEAD items. We make the soft
+ * assumption that any LP_DEAD items encountered here will become
+ * LP_UNUSED later on, before count_nondeletable_pages is reached. If we
+ * don't make this assumption then rel truncation will only happen every
+ * other VACUUM, at most. Besides, VACUUM must treat
+ * hastup/nonempty_pages as provisional no matter how LP_DEAD items are
+ * handled (handled here, or handled later on).
+ *
+ * Similarly, don't unset all_visible until later, at the end of
+ * heap_page_prune_and_freeze(). This will allow us to attempt to freeze
+ * the page after pruning. As long as we unset it before updating the
+ * visibility map, this will be correct.
+ */
+
+ Assert(!prstate->marked[offnum]);
+ prstate->marked[offnum] = true;
+
+ prstate->deadoffsets[prstate->lpdead_items++] = offnum;
+}
+
+static void
+heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetNumber offnum)
+{
+ /*
+ * A redirect line pointer doesn't count as a live tuple.
+ *
+ * If we leave a redirect line pointer in place, there will be another
+ * tuple on the page that it points to. We will do the bookkeeping for
+ * that separately. So we have nothing to do here, except remember that we
+ * processed this item.
+ */
+ Assert(!prstate->marked[offnum]);
+ prstate->marked[offnum] = true;
}
/*
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 04e86347a0b..92e02863e2d 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1462,7 +1462,7 @@ lazy_scan_prune(LVRelState *vacrel,
&debug_cutoff, &debug_all_frozen))
Assert(false);
- Assert(presult.set_all_frozen == debug_all_frozen);
+ Assert(presult.all_frozen == debug_all_frozen);
Assert(!TransactionIdIsValid(debug_cutoff) ||
debug_cutoff == presult.vm_conflict_horizon);
@@ -1517,7 +1517,7 @@ lazy_scan_prune(LVRelState *vacrel,
{
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
- if (presult.set_all_frozen)
+ if (presult.all_frozen)
{
Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
flags |= VISIBILITYMAP_ALL_FROZEN;
@@ -1588,7 +1588,7 @@ lazy_scan_prune(LVRelState *vacrel,
* true, so we must check both all_visible and all_frozen.
*/
else if (all_visible_according_to_vm && presult.all_visible &&
- presult.set_all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
+ presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
{
/*
* Avoid relying on all_visible_according_to_vm as a proxy for the
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index ef61e0277ee..9a20fef3a79 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -224,37 +224,37 @@ typedef struct PruneFreezeResult
int nnewlpdead; /* Number of newly LP_DEAD items */
int nfrozen; /* Number of tuples we froze */
- /*
- * The rest of the fields in PruneFreezeResult are only guaranteed to be
- * initialized if heap_page_prune_and_freeze() is passed
- * PRUNE_DO_TRY_FREEZE.
- */
- /* Whether or not the page should be set all-frozen in the VM */
- bool set_all_frozen;
-
- /* Number of live and recently dead tuples */
+ /* Number of live and recently dead tuples on the page, after pruning */
int live_tuples;
int recently_dead_tuples;
/*
- * Whether or not the page is truly all-visible after pruning. If there
- * are LP_DEAD items on the page which cannot be removed until vacuum's
- * second pass, this will be false.
+ * Whether or not the page makes rel truncation unsafe
+ *
+ * This is set to 'true', even if the page contains LP_DEAD items. VACUUM
+ * will remove them before attempting to truncate.
*/
- bool all_visible;
-
-
- /* Whether or not the page makes rel truncation unsafe */
bool hastup;
/*
- * If the page is all-visible and not all-frozen this is the oldest xid
- * that can see the page as all-visible. It is to be used as the snapshot
- * conflict horizon when emitting a XLOG_HEAP2_VISIBLE record.
+ * all_visible and all_frozen indicate if the all-visible and all-frozen
+ * bits in the visibility map can be set for this page, after pruning.
+ *
+ * vm_conflict_horizon is the newest xmin of live tuples on the page. The
+ * caller can use it as the conflict horizon, when setting the VM bits. It
+ * is only valid if we froze some tuples, and all_frozen is true.
+ *
+ * These are only set if the PRUNE_DO_TRY_FREEZE action flag is set.
*/
+ bool all_visible;
+ bool all_frozen;
TransactionId vm_conflict_horizon;
- int lpdead_items; /* includes existing LP_DEAD items */
+ /*
+ * LP_DEAD items on the page after pruning. Includes existing LP_DEAD
+ * items
+ */
+ int lpdead_items;
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
} PruneFreezeResult;
--
2.40.1
On 29/03/2024 07:04, Melanie Plageman wrote:
On Thu, Mar 28, 2024 at 11:07:10AM -0400, Melanie Plageman wrote:
These comments could use another pass. I had added some extra
(probably redundant) content when I thought I was refactoring it a
certain way and then changed my mind.
Attached is a diff with some ideas I had for a bit of code simplification.
Are you working on cleaning this patch up or should I pick it up?
Attached v9 is rebased over master. But, more importantly, I took
another pass at heap_prune_chain() and am pretty happy with what I came
up with. See 0021. I simplified the traversal logic and then grouped the
chain processing into three branches at the end. I find it much easier
to understand what we are doing for different types of HOT chains.
Ah yes, agreed, that's nicer.
The 'survivor' variable is a little confusing, especially here:
if (!survivor)
{
int i;
/*
* If no DEAD tuple was found, and the root is redirected, mark it as
* such.
*/
if ((i = ItemIdIsRedirected(rootlp)))
heap_prune_record_unchanged_lp_redirect(prstate, rootoffnum);
/* the rest of tuples in the chain are normal, unchanged tuples */
for (; i < nchain; i++)
heap_prune_record_unchanged(dp, prstate, chainitems[i]);
}
You would think that "if(!survivor)" means that there are no live
tuples on the chain, i.e. no survivors. But in fact it's the opposite;
it means that they are all live. Maybe call it 'ndeadchain' instead,
meaning the number of dead items in the chain.
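To spell out the suggestion, here is a throwaway standalone toy of what
the three branches would do under that name (plain ints stand in for
offsets and printfs for the heap_prune_record_*() calls; this is not the
patch code):

#include <stdio.h>

int
main(void)
{
    int     chainitems[] = {10, 11, 12, 13};    /* root + heap-only members */
    int     nchain = 4;
    int     ndeadchain = 2;     /* number of leading DEAD members in the chain */

    if (ndeadchain == 0)
    {
        /* no DEAD members: every item is recorded as unchanged */
        for (int i = 0; i < nchain; i++)
            printf("item %d recorded unchanged\n", chainitems[i]);
    }
    else if (ndeadchain == nchain)
    {
        /* whole chain dead: root becomes LP_DEAD (or LP_UNUSED), rest LP_UNUSED */
        printf("root %d -> LP_DEAD/LP_UNUSED\n", chainitems[0]);
        for (int i = 1; i < nchain; i++)
            printf("item %d -> LP_UNUSED\n", chainitems[i]);
    }
    else
    {
        /* dead prefix: redirect the root past it, free the prefix, keep the rest */
        printf("redirect root %d -> %d\n", chainitems[0], chainitems[ndeadchain]);
        for (int i = 1; i < ndeadchain; i++)
            printf("item %d -> LP_UNUSED\n", chainitems[i]);
        for (int i = ndeadchain; i < nchain; i++)
            printf("item %d recorded unchanged\n", chainitems[i]);
    }
    return 0;
}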
I got rid of revisited. We can put it back, but I was thinking: we stash
all HOT tuples and then loop over them later, calling record_unchanged()
on the ones that aren't marked. But, if we have a lot of HOT tuples, is
this really that much better than just looping through all the offsets
and calling record_unchanged() on just the ones that aren't marked?
Well, it requires looping through all the offsets one more time, and
unless you have a lot of HOT tuples, most items would be marked already.
But maybe the overhead is negligible anyway.
I've done that in my version. While testing this, I found that only
on-access pruning needed this final loop calling record_unchanged() on
items not yet marked. I know we can't skip this final loop entirely in
the ON ACCESS case because it calls record_prunable(), but we could
consider moving that back out into heap_prune_chain()? Or what do you
think?
Hmm, why is that different with on-access pruning?
Here's another idea: In the first loop through the offsets, where we
gather the HTSV status of each item, also collect the offsets of all HOT
and non-HOT items to two separate arrays. Call heap_prune_chain() for
all the non-HOT items first, and then process any remaining HOT tuples
that haven't been marked yet.
I haven't finished updating all the comments, but I am really interested
to know what you think about heap_prune_chain() now.
Looks much better now, thanks!
--
Heikki Linnakangas
Neon (https://neon.tech)
On Fri, Mar 29, 2024 at 12:00 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
On 29/03/2024 07:04, Melanie Plageman wrote:
On Thu, Mar 28, 2024 at 11:07:10AM -0400, Melanie Plageman wrote:
These comments could use another pass. I had added some extra
(probably redundant) content when I thought I was refactoring it a
certain way and then changed my mind.
Attached is a diff with some ideas I had for a bit of code simplification.
Are you working on cleaning this patch up or should I pick it up?
Attached v9 is rebased over master. But, more importantly, I took
another pass at heap_prune_chain() and am pretty happy with what I came
up with. See 0021. I simplified the traversal logic and then grouped the
chain processing into three branches at the end. I find it much easier
to understand what we are doing for different types of HOT chains.
Ah yes, agreed, that's nicer.
The 'survivor' variable is a little confusing, especially here:
if (!survivor)
{
int i;
/*
* If no DEAD tuple was found, and the root is redirected, mark it as
* such.
*/
if ((i = ItemIdIsRedirected(rootlp)))
heap_prune_record_unchanged_lp_redirect(prstate, rootoffnum);
/* the rest of tuples in the chain are normal, unchanged tuples */
for (; i < nchain; i++)
heap_prune_record_unchanged(dp, prstate, chainitems[i]);
}
You would think that "if(!survivor)" means that there are no live
tuples on the chain, i.e. no survivors. But in fact it's the opposite;
it means that they are all live. Maybe call it 'ndeadchain' instead,
meaning the number of dead items in the chain.
Makes sense.
I've done that in my version. While testing this, I found that only
on-access pruning needed this final loop calling record_unchanged() on
items not yet marked. I know we can't skip this final loop entirely in
the ON ACCESS case because it calls record_prunable(), but we could
consider moving that back out into heap_prune_chain()? Or what do you
think?
Hmm, why is that different with on-access pruning?
Well, it is highly possible we just don't hit cases like this with
vacuum in our test suite (not that it is unreachable by vacuum).
It's just very easy to get in this situation with on-access pruning.
Imagine an UPDATE which caused the following chain:
RECENTLY_DEAD -> DELETE_IN_PROGRESS -> INSERT_IN_PROGRESS
It invokes heap_page_prune_and_freeze() (assume the page meets the
criteria for on-access pruning) and eventually enters
heap_prune_chain() with the first offset in this chain.
The first item is LP_NORMAL and the tuple is RECENTLY_DEAD, so the
survivor variable stays 0 and we record_unchanged() for that tuple and
return. The next two items are LP_NORMAL and the tuples are HOT
tuples, so we just return from the "fast path" at the top of
heap_prune_chain(). After invoking heap_prune_chain() for all of them,
the first offset is marked but the other two are not. Thus, we end up
having to record_unchanged() later. This kind of thing is a super
common case that we see all the time in queries in the regression test
suite.
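If it helps to see it laid out, here is a throwaway standalone toy of
that flow (the heap_only[]/marked[] arrays and the hard-coded statuses
are simplifications I made up for illustration, not the real functions):

#include <stdbool.h>
#include <stdio.h>

int
main(void)
{
    /* Offsets 1..3 hold RECENTLY_DEAD -> DELETE_IN_PROGRESS -> INSERT_IN_PROGRESS */
    bool    heap_only[4] = {false, false, true, true};
    bool    marked[4] = {false, false, false, false};

    /* chain pass: one call per offset, as in heap_page_prune_and_freeze() */
    for (int off = 1; off <= 3; off++)
    {
        if (heap_only[off])
            continue;           /* fast path at the top of heap_prune_chain() */
        marked[off] = true;     /* root is RECENTLY_DEAD: recorded unchanged */
    }

    /* cleanup pass: anything still unmarked is recorded unchanged here */
    for (int off = 1; off <= 3; off++)
        if (!marked[off])
            printf("offset %d caught by the final record_unchanged() loop\n", off);

    return 0;
}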
Here's another idea: In the first loop through the offsets, where we
gather the HTSV status of each item, also collect the offsets of all HOT
and non-HOT items to two separate arrays. Call heap_prune_chain() for
all the non-HOT items first, and then process any remaining HOT tuples
that haven't been marked yet.
That's an interesting idea. I'll try it out and see how it works.
I haven't finished updating all the comments, but I am really interested
to know what you think about heap_prune_chain() now.
Looks much better now, thanks!
I am currently doing chain traversal refactoring in heap_prune_chain()
on top of master as the first patch in the set.
- Melanie
On Fri, Mar 29, 2024 at 12:32:21PM -0400, Melanie Plageman wrote:
On Fri, Mar 29, 2024 at 12:00 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
On 29/03/2024 07:04, Melanie Plageman wrote:
On Thu, Mar 28, 2024 at 11:07:10AM -0400, Melanie Plageman wrote:
These comments could use another pass. I had added some extra
(probably redundant) content when I thought I was refactoring it a
certain way and then changed my mind.
Attached is a diff with some ideas I had for a bit of code simplification.
Are you working on cleaning this patch up or should I pick it up?
Attached v9 is rebased over master. But, more importantly, I took
another pass at heap_prune_chain() and am pretty happy with what I came
up with. See 0021. I simplified the traversal logic and then grouped the
chain processing into three branches at the end. I find it much easier
to understand what we are doing for different types of HOT chains.
Ah yes, agreed, that's nicer.
The 'survivor' variable is a little confusing, especially here:
if (!survivor)
{
int i;
/*
* If no DEAD tuple was found, and the root is redirected, mark it as
* such.
*/
if ((i = ItemIdIsRedirected(rootlp)))
heap_prune_record_unchanged_lp_redirect(prstate, rootoffnum);
/* the rest of tuples in the chain are normal, unchanged tuples */
for (; i < nchain; i++)
heap_prune_record_unchanged(dp, prstate, chainitems[i]);
}
You would think that "if(!survivor)" means that there are no live
tuples on the chain, i.e. no survivors. But in fact it's the opposite;
it means that they are all live. Maybe call it 'ndeadchain' instead,
meaning the number of dead items in the chain.
Makes sense.
I've done this in attached v10.
Here's another idea: In the first loop through the offsets, where we
gather the HTSV status of each item, also collect the offsets of all HOT
and non-HOT items to two separate arrays. Call heap_prune_chain() for
all the non-HOT items first, and then process any remaining HOT tuples
that haven't been marked yet.
That's an interesting idea. I'll try it out and see how it works.
Attached v10 implements this method of dividing tuples into HOT and
non-HOT and processing the potential HOT chains first then processing
tuples not marked by calling heap_prune_chain().
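As a rough standalone sketch of the shape of that approach (the arrays
and the toy chain layout are invented for illustration; in the patch the
partitioning happens in the same loop that collects the HTSV statuses,
and heap_prune_chain() does the real chain walking):

#include <stdbool.h>
#include <stdio.h>

#define NITEMS 6

int
main(void)
{
    /* Toy page: true = heap-only tuple, false = root / non-HOT item. */
    bool    heap_only[NITEMS] = {false, true, true, false, true, false};
    /* next[off] = successor in the HOT chain, or -1 at the end of the chain */
    int     next[NITEMS] = {1, 2, -1, -1, -1, -1};
    int     roots[NITEMS],
            hot[NITEMS];
    bool    marked[NITEMS] = {false};
    int     nroots = 0,
            nhot = 0;

    /* First loop (where HTSV statuses are gathered): partition the offsets. */
    for (int off = 0; off < NITEMS; off++)
    {
        if (heap_only[off])
            hot[nhot++] = off;
        else
            roots[nroots++] = off;
    }

    /* Process chains rooted at the non-HOT items first, marking every member. */
    for (int i = 0; i < nroots; i++)
        for (int off = roots[i]; off != -1; off = next[off])
            marked[off] = true;

    /* Then deal only with the heap-only tuples that no chain reached. */
    for (int i = 0; i < nhot; i++)
        if (!marked[hot[i]])
        {
            marked[hot[i]] = true;      /* e.g. record it unchanged */
            printf("disconnected heap-only tuple at offset %d\n", hot[i]);
        }

    return 0;
}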
I have applied the refactoring of heap_prune_chain() to master and then
built the other patches on top of that.
I discovered while writing this that LP_DEAD item offsets must be in
order in the deadoffsets array (the one that is used to populate
LVRelState->dead_items).
When I changed heap_page_prune_and_freeze() to partition the offsets
into HOT and non-HOT during the first loop through the item pointers
array (where we get tuple visibility information), we add dead item
offsets as they are encountered. So, they are no longer in order. I've
added a quicksort of the deadoffsets array to satisfy vacuum.
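The sort itself is trivial; something along these lines (a standalone
sketch with a plain uint16 standing in for OffsetNumber, not the actual
patch code):

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

/* ascending comparator: vacuum expects dead item offsets in page order */
static int
cmp_offset(const void *a, const void *b)
{
    uint16_t    oa = *(const uint16_t *) a;
    uint16_t    ob = *(const uint16_t *) b;

    if (oa < ob)
        return -1;
    if (oa > ob)
        return 1;
    return 0;
}

int
main(void)
{
    /* offsets collected in visit order rather than page order */
    uint16_t    deadoffsets[] = {12, 3, 7, 25, 4};
    int         lpdead_items = 5;

    qsort(deadoffsets, lpdead_items, sizeof(uint16_t), cmp_offset);

    for (int i = 0; i < lpdead_items; i++)
        printf("%d\n", (int) deadoffsets[i]);
    return 0;
}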
I think that we are actually successfully removing more RECENTLY_DEAD
HOT tuples than in master with heap_page_prune()'s new approach, and I
think it is correct; but let me know if I am missing something.
The early patches in the set include some additional comment cleanup as
well. 0001 is fairly polished. 0004 could use some variable renaming
(this patch partitions the tuples into HOT and not HOT and then
processes them). I was struggling with some of the names here
(chainmembers and chaincandidates are confusing).
The bulk of the combining of pruning and freezing is lumped into 0010.
I had planned to separate 0010 into 4 separate patches: 1 to execute
freezing in heap_prune_chain(), 1 for the freeze heuristic approximating
what is on master, and 1 for emitting a single record containing both
the pruning and freezing page modifications.
I ended up not doing this because I felt like the grouping of changes in
0007-0009 is off. As long as I still execute freezing in
lazy_scan_prune(), I have to share lots of state between
lazy_scan_prune() and heap_page_prune(). This meant I added a lot of
parameters to heap_page_prune() that later commits removed -- making the
later patches noisy and not so easy to understand.
I'm actually not sure what should go in what commit (either for review
clarity or for the actual final version).
But, I think we should probably focus on review of the code and not as
much how it is split up yet.
The final state of the code could definitely use more cleanup. I've been
staring at it for a while, so I could use some thoughts/ideas about what
part to focus on improving.
- Melanie
Attachments:
v10-0001-Refactor-heap_prune_chain.patch
From f0b9f507fcd8fb9e36856b359794eafdf91c9cd2 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Thu, 28 Mar 2024 18:26:06 -0400
Subject: [PATCH v10 01/10] Refactor heap_prune_chain()
Keep track of the number of deleted tuples in PruneState and record this
information when recording a tuple dead, unused or redirected. This
removes a special case from the traversal and chain processing logic as
well as setting a precedent of recording the impact of prune actions in
the record functions themselves. This paradigm will be used in future
commits which move tracking of additional statistics on pruning actions
from lazy_scan_prune() to heap_prune_chain().
Simplify heap_prune_chain()'s chain traversal logic by handling each
case explicitly. That is, do not attempt to share code when processing
different types of chains. For each category of chain, process it
specifically and procedurally: first handling the root, then any
intervening tuples, and, finally, the end of the chain.
While we are at it, add a few new comments to heap_prune_chain()
clarifying some special cases involving RECENTLY_DEAD tuples.
ci-os-only:
---
src/backend/access/heap/pruneheap.c | 219 +++++++++++++++++-----------
1 file changed, 130 insertions(+), 89 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index ef816c2fa9..b3047536f5 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -51,22 +51,24 @@ typedef struct
* 1. Otherwise every access would need to subtract 1.
*/
bool marked[MaxHeapTuplesPerPage + 1];
+
+ int ndeleted; /* Number of tuples deleted from the page */
} PruneState;
/* Local functions */
static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
HeapTuple tup,
Buffer buffer);
-static int heap_prune_chain(Buffer buffer,
+static void heap_prune_chain(Buffer buffer,
OffsetNumber rootoffnum,
int8 *htsv,
PruneState *prstate);
static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
static void heap_prune_record_redirect(PruneState *prstate,
- OffsetNumber offnum, OffsetNumber rdoffnum);
-static void heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum);
-static void heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum);
-static void heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum);
+ OffsetNumber offnum, OffsetNumber rdoffnum, bool was_normal);
+static void heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum, bool was_normal);
+static void heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum, bool was_normal);
+static void heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum, bool was_normal);
static void page_verify_redirects(Page page);
@@ -242,6 +244,7 @@ heap_page_prune(Relation relation, Buffer buffer,
prstate.snapshotConflictHorizon = InvalidTransactionId;
prstate.nredirected = prstate.ndead = prstate.nunused = 0;
memset(prstate.marked, 0, sizeof(prstate.marked));
+ prstate.ndeleted = 0;
/*
* presult->htsv is not initialized here because all ntuple spots in the
@@ -324,8 +327,7 @@ heap_page_prune(Relation relation, Buffer buffer,
continue;
/* Process this item or chain of items */
- presult->ndeleted += heap_prune_chain(buffer, offnum,
- presult->htsv, &prstate);
+ heap_prune_chain(buffer, offnum, presult->htsv, &prstate);
}
/* Clear the offset information once we have processed the given page. */
@@ -398,8 +400,9 @@ heap_page_prune(Relation relation, Buffer buffer,
END_CRIT_SECTION();
- /* Record number of newly-set-LP_DEAD items for caller */
+ /* Copy data back to 'presult' */
presult->nnewlpdead = prstate.ndead;
+ presult->ndeleted = prstate.ndeleted;
}
@@ -448,24 +451,25 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
* to the redirected[] array (two entries per redirection); items to be set to
* LP_DEAD state are added to nowdead[]; and items to be set to LP_UNUSED
* state are added to nowunused[].
- *
- * Returns the number of tuples (to be) deleted from the page.
*/
-static int
+static void
heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
int8 *htsv, PruneState *prstate)
{
- int ndeleted = 0;
Page dp = (Page) BufferGetPage(buffer);
TransactionId priorXmax = InvalidTransactionId;
ItemId rootlp;
HeapTupleHeader htup;
- OffsetNumber latestdead = InvalidOffsetNumber,
- maxoff = PageGetMaxOffsetNumber(dp),
+ OffsetNumber maxoff = PageGetMaxOffsetNumber(dp),
offnum;
OffsetNumber chainitems[MaxHeapTuplesPerPage];
- int nchain = 0,
- i;
+
+ /*
+ * After traversing the HOT chain, ndeadchain is the index in chainitems
+ * of the first live successor after the last dead item.
+ */
+ int ndeadchain = 0,
+ nchain = 0;
rootlp = PageGetItemId(dp, rootoffnum);
@@ -496,18 +500,29 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
* Note that we might first arrive at a dead heap-only tuple
* either here or while following a chain below. Whichever path
* gets there first will mark the tuple unused.
+ *
+ * Whether we arrive at the dead HOT tuple first here or while
+ * following a chain below affects whether preceding RECENTLY_DEAD
+ * tuples in the chain can be removed or not. Imagine that you
+ * have a chain with two tuples: RECENTLY_DEAD -> DEAD. If we
+ * reach the RECENTLY_DEAD tuple first, the chain-following logic
+ * will find the DEAD tuple and conclude that both tuples are in
+ * fact dead and can be removed. But if we reach the DEAD tuple
+ * at the end of the chain first, when we reach the RECENTLY_DEAD
+ * tuple later, we will not follow the chain because the DEAD
+ * tuple is already 'marked', and will not remove the
+ * RECENTLY_DEAD tuple. This is not a correctness issue, and the
+ * RECENTLY_DEAD tuple will be removed by a later VACUUM.
*/
if (htsv[rootoffnum] == HEAPTUPLE_DEAD &&
!HeapTupleHeaderIsHotUpdated(htup))
{
- heap_prune_record_unused(prstate, rootoffnum);
+ heap_prune_record_unused(prstate, rootoffnum, true);
HeapTupleHeaderAdvanceConflictHorizon(htup,
&prstate->snapshotConflictHorizon);
- ndeleted++;
}
- /* Nothing more to do */
- return ndeleted;
+ return;
}
}
@@ -518,8 +533,6 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
for (;;)
{
ItemId lp;
- bool tupdead,
- recent_dead;
/* Sanity check (pure paranoia) */
if (offnum < FirstOffsetNumber)
@@ -569,7 +582,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
* the LP was already marked dead.
*/
if (unlikely(prstate->mark_unused_now))
- heap_prune_record_unused(prstate, offnum);
+ heap_prune_record_unused(prstate, offnum, false);
break;
}
@@ -592,20 +605,37 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
/*
* Check tuple's visibility status.
*/
- tupdead = recent_dead = false;
-
switch (htsv_get_valid_status(htsv[offnum]))
{
case HEAPTUPLE_DEAD:
- tupdead = true;
+
+ /*
+ * Remember the last DEAD tuple seen. We will advance past
+ * RECENTLY_DEAD tuples just in case there's a DEAD one after
+ * them; but we can't advance past anything else. We want to
+ * ensure that any line pointers for DEAD tuples are set
+ * LP_DEAD or LP_UNUSED. It is important that line pointers
+ * whose offsets are added to deadoffsets are in fact set
+ * LP_DEAD.
+ */
+ ndeadchain = nchain;
+ HeapTupleHeaderAdvanceConflictHorizon(htup,
+ &prstate->snapshotConflictHorizon);
break;
case HEAPTUPLE_RECENTLY_DEAD:
- recent_dead = true;
/*
* This tuple may soon become DEAD. Update the hint field so
* that the page is reconsidered for pruning in future.
+ *
+ * We don't need to advance the conflict horizon for
+ * RECENTLY_DEAD tuples, even if we are removing them. This is
+ * because we only remove RECENTLY_DEAD tuples if they precede
+ * a DEAD tuple, and the DEAD tuple must have been inserted by
+ * a newer transaction than the RECENTLY_DEAD tuple by virtue
+ * of being later in the chain. We will have advanced the
+ * conflict horizon for the DEAD tuple.
*/
heap_prune_record_prunable(prstate,
HeapTupleHeaderGetUpdateXid(htup));
@@ -619,7 +649,6 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
*/
heap_prune_record_prunable(prstate,
HeapTupleHeaderGetUpdateXid(htup));
- break;
case HEAPTUPLE_LIVE:
case HEAPTUPLE_INSERT_IN_PROGRESS:
@@ -630,28 +659,12 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
* But we don't. See related decisions about when to mark the
* page prunable in heapam.c.
*/
- break;
+ goto process_chains;
default:
elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
- break;
- }
-
- /*
- * Remember the last DEAD tuple seen. We will advance past
- * RECENTLY_DEAD tuples just in case there's a DEAD one after them;
- * but we can't advance past anything else. We have to make sure that
- * we don't miss any DEAD tuples, since DEAD tuples that still have
- * tuple storage after pruning will confuse VACUUM.
- */
- if (tupdead)
- {
- latestdead = offnum;
- HeapTupleHeaderAdvanceConflictHorizon(htup,
- &prstate->snapshotConflictHorizon);
+ goto process_chains;
}
- else if (!recent_dead)
- break;
/*
* If the tuple is not HOT-updated, then we are at the end of this
@@ -672,57 +685,58 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
priorXmax = HeapTupleHeaderGetUpdateXid(htup);
}
- /*
- * If we found a DEAD tuple in the chain, adjust the HOT chain so that all
- * the DEAD tuples at the start of the chain are removed and the root line
- * pointer is appropriately redirected.
- */
- if (OffsetNumberIsValid(latestdead))
+ if (ItemIdIsRedirected(rootlp) && nchain < 2)
{
/*
- * Mark as unused each intermediate item that we are able to remove
- * from the chain.
- *
- * When the previous item is the last dead tuple seen, we are at the
- * right candidate for redirection.
+ * We found a redirect item that doesn't point to a valid follow-on
+ * item. This can happen if the loop in heap_page_prune caused us to
+ * visit the dead successor of a redirect item before visiting the
+ * redirect item. We can clean up by setting the redirect item to
+ * LP_DEAD state or LP_UNUSED if the caller indicated.
*/
- for (i = 1; (i < nchain) && (chainitems[i - 1] != latestdead); i++)
- {
- heap_prune_record_unused(prstate, chainitems[i]);
- ndeleted++;
- }
+ heap_prune_record_dead_or_unused(prstate, rootoffnum, false);
+ return;
+ }
+process_chains:
+
+ if (ndeadchain == 0)
+ {
/*
- * If the root entry had been a normal tuple, we are deleting it, so
- * count it in the result. But changing a redirect (even to DEAD
- * state) doesn't count.
+ * If no DEAD tuple was found, the chain is entirely composed of
+ * normal, unchanged tuples; leave it alone.
*/
- if (ItemIdIsNormal(rootlp))
- ndeleted++;
-
+ }
+ else if (ndeadchain == nchain)
+ {
/*
* If the DEAD tuple is at the end of the chain, the entire chain is
- * dead and the root line pointer can be marked dead. Otherwise just
- * redirect the root to the correct chain member.
+ * dead and the root line pointer can be marked dead.
*/
- if (i >= nchain)
- heap_prune_record_dead_or_unused(prstate, rootoffnum);
- else
- heap_prune_record_redirect(prstate, rootoffnum, chainitems[i]);
+ heap_prune_record_dead_or_unused(prstate, rootoffnum, ItemIdIsNormal(rootlp));
+
+ for (int i = 1; i < nchain; i++)
+ heap_prune_record_unused(prstate, chainitems[i], true);
}
- else if (nchain < 2 && ItemIdIsRedirected(rootlp))
+ else
{
/*
- * We found a redirect item that doesn't point to a valid follow-on
- * item. This can happen if the loop in heap_page_prune caused us to
- * visit the dead successor of a redirect item before visiting the
- * redirect item. We can clean up by setting the redirect item to
- * DEAD state or LP_UNUSED if the caller indicated.
+ * If we found a DEAD tuple in the chain, adjust the HOT chain so that
+ * all the DEAD tuples at the start of the chain are removed and the
+ * root line pointer is appropriately redirected.
*/
- heap_prune_record_dead_or_unused(prstate, rootoffnum);
- }
+ heap_prune_record_redirect(prstate, rootoffnum, chainitems[ndeadchain],
+ ItemIdIsNormal(rootlp));
+
+ /*
+ * Mark as unused each intermediate item that we are able to remove
+ * from the chain.
+ */
+ for (int i = 1; i < ndeadchain; i++)
+ heap_prune_record_unused(prstate, chainitems[i], true);
- return ndeleted;
+ /* the rest of tuples in the chain are normal, unchanged tuples */
+ }
}
/* Record lowest soon-prunable XID */
@@ -742,7 +756,8 @@ heap_prune_record_prunable(PruneState *prstate, TransactionId xid)
/* Record line pointer to be redirected */
static void
heap_prune_record_redirect(PruneState *prstate,
- OffsetNumber offnum, OffsetNumber rdoffnum)
+ OffsetNumber offnum, OffsetNumber rdoffnum,
+ bool was_normal)
{
Assert(prstate->nredirected < MaxHeapTuplesPerPage);
prstate->redirected[prstate->nredirected * 2] = offnum;
@@ -752,17 +767,34 @@ heap_prune_record_redirect(PruneState *prstate,
prstate->marked[offnum] = true;
Assert(!prstate->marked[rdoffnum]);
prstate->marked[rdoffnum] = true;
+
+ /*
+ * If the root entry had been a normal tuple, we are deleting it, so count
+ * it in the result. But changing a redirect (even to DEAD state) doesn't
+ * count.
+ */
+ if (was_normal)
+ prstate->ndeleted++;
}
/* Record line pointer to be marked dead */
static void
-heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum)
+heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
+ bool was_normal)
{
Assert(prstate->ndead < MaxHeapTuplesPerPage);
prstate->nowdead[prstate->ndead] = offnum;
prstate->ndead++;
Assert(!prstate->marked[offnum]);
prstate->marked[offnum] = true;
+
+ /*
+ * If the root entry had been a normal tuple, we are deleting it, so count
+ * it in the result. But changing a redirect (even to DEAD state) doesn't
+ * count.
+ */
+ if (was_normal)
+ prstate->ndeleted++;
}
/*
@@ -772,7 +804,8 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum)
* pointers LP_DEAD if mark_unused_now is true.
*/
static void
-heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum)
+heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
+ bool was_normal)
{
/*
* If the caller set mark_unused_now to true, we can remove dead tuples
@@ -781,20 +814,28 @@ heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum)
* likely.
*/
if (unlikely(prstate->mark_unused_now))
- heap_prune_record_unused(prstate, offnum);
+ heap_prune_record_unused(prstate, offnum, was_normal);
else
- heap_prune_record_dead(prstate, offnum);
+ heap_prune_record_dead(prstate, offnum, was_normal);
}
/* Record line pointer to be marked unused */
static void
-heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum)
+heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum, bool was_normal)
{
Assert(prstate->nunused < MaxHeapTuplesPerPage);
prstate->nowunused[prstate->nunused] = offnum;
prstate->nunused++;
Assert(!prstate->marked[offnum]);
prstate->marked[offnum] = true;
+
+ /*
+ * If the root entry had been a normal tuple, we are deleting it, so count
+ * it in the result. But changing a redirect (even to DEAD state) doesn't
+ * count.
+ */
+ if (was_normal)
+ prstate->ndeleted++;
}
--
2.40.1
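
As a reading aid (this is a condensed restatement of 0001, not additional
patch content), the tail of heap_prune_chain() after the chain walk boils
down to a three-way decision on ndeadchain (the length of the DEAD prefix of
the chain) versus nchain (the whole chain):

if (ndeadchain == 0)
{
    /* no DEAD tuples: leave the chain alone */
}
else if (ndeadchain == nchain)
{
    /* whole chain is dead: root becomes LP_DEAD (or LP_UNUSED), rest LP_UNUSED */
    heap_prune_record_dead_or_unused(prstate, rootoffnum, ItemIdIsNormal(rootlp));
    for (int i = 1; i < nchain; i++)
        heap_prune_record_unused(prstate, chainitems[i], true);
}
else
{
    /* dead prefix only: redirect the root past it, mark the prefix LP_UNUSED */
    heap_prune_record_redirect(prstate, rootoffnum, chainitems[ndeadchain],
                               ItemIdIsNormal(rootlp));
    for (int i = 1; i < ndeadchain; i++)
        heap_prune_record_unused(prstate, chainitems[i], true);
}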
Attachment: v10-0002-heap_prune_chain-rename-dp-page.patch (text/x-diff)
From fe060951d0122f2e05424836c18b4849fb0e67aa Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Fri, 29 Mar 2024 19:01:51 -0400
Subject: [PATCH v10 02/10] heap_prune_chain() rename dp->page
---
src/backend/access/heap/pruneheap.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index b3047536f5..c89b0f613d 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -456,11 +456,11 @@ static void
heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
int8 *htsv, PruneState *prstate)
{
- Page dp = (Page) BufferGetPage(buffer);
+ Page page = (Page) BufferGetPage(buffer);
TransactionId priorXmax = InvalidTransactionId;
ItemId rootlp;
HeapTupleHeader htup;
- OffsetNumber maxoff = PageGetMaxOffsetNumber(dp),
+ OffsetNumber maxoff = PageGetMaxOffsetNumber(page),
offnum;
OffsetNumber chainitems[MaxHeapTuplesPerPage];
@@ -471,7 +471,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
int ndeadchain = 0,
nchain = 0;
- rootlp = PageGetItemId(dp, rootoffnum);
+ rootlp = PageGetItemId(page, rootoffnum);
/*
* If it's a heap-only tuple, then it is not the start of a HOT chain.
@@ -479,7 +479,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
if (ItemIdIsNormal(rootlp))
{
Assert(htsv[rootoffnum] != -1);
- htup = (HeapTupleHeader) PageGetItem(dp, rootlp);
+ htup = (HeapTupleHeader) PageGetItem(page, rootlp);
if (HeapTupleHeaderIsHeapOnly(htup))
{
@@ -549,7 +549,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
if (prstate->marked[offnum])
break;
- lp = PageGetItemId(dp, offnum);
+ lp = PageGetItemId(page, offnum);
/* Unused item obviously isn't part of the chain */
if (!ItemIdIsUsed(lp))
@@ -588,7 +588,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
}
Assert(ItemIdIsNormal(lp));
- htup = (HeapTupleHeader) PageGetItem(dp, lp);
+ htup = (HeapTupleHeader) PageGetItem(page, lp);
/*
* Check the tuple XMIN against prior XMAX, if any
--
2.40.1
Attachment: v10-0003-Mark-all-line-pointers-during-pruning.patch (text/x-diff)
From 2c8c6c737259d3fff0026187f029987a57517b76 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Fri, 29 Mar 2024 15:22:18 -0400
Subject: [PATCH v10 03/10] Mark all line pointers during pruning
In anticipation of preparing to freeze and counting tuples which are not
candidates for pruning, this commit introduces heap_prune_record*()
functions for marking a line pointer which will not change. It also
introduces an assert to check that every offset was marked once.
---
src/backend/access/heap/pruneheap.c | 108 +++++++++++++++++++++++++---
1 file changed, 98 insertions(+), 10 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index c89b0f613d..6188c5b2f3 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -69,6 +69,11 @@ static void heap_prune_record_redirect(PruneState *prstate,
static void heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum, bool was_normal);
static void heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum, bool was_normal);
static void heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum, bool was_normal);
+
+static void heap_prune_record_unchanged(Page page, PruneState *prstate, OffsetNumber offnum);
+static void heap_prune_record_unchanged_lp_dead(PruneState *prstate, OffsetNumber offnum);
+static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetNumber offnum);
+
static void page_verify_redirects(Page page);
@@ -330,6 +335,34 @@ heap_page_prune(Relation relation, Buffer buffer,
heap_prune_chain(buffer, offnum, presult->htsv, &prstate);
}
+ for (offnum = FirstOffsetNumber;
+ offnum <= maxoff;
+ offnum = OffsetNumberNext(offnum))
+ {
+ ItemId itemid = PageGetItemId(page, offnum);
+
+ if (ItemIdIsUsed(itemid) && !prstate.marked[offnum])
+ heap_prune_record_unchanged(page, &prstate, offnum);
+ }
+
+/* We should now have processed every tuple exactly once */
+#ifdef USE_ASSERT_CHECKING
+ for (offnum = FirstOffsetNumber;
+ offnum <= maxoff;
+ offnum = OffsetNumberNext(offnum))
+ {
+ ItemId itemid;
+
+ if (off_loc)
+ *off_loc = offnum;
+ itemid = PageGetItemId(page, offnum);
+ if (ItemIdIsUsed(itemid))
+ Assert(prstate.marked[offnum]);
+ else
+ Assert(!prstate.marked[offnum]);
+ }
+#endif
+
/* Clear the offset information once we have processed the given page. */
if (off_loc)
*off_loc = InvalidOffsetNumber;
@@ -583,6 +616,8 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
*/
if (unlikely(prstate->mark_unused_now))
heap_prune_record_unused(prstate, offnum, false);
+ else
+ heap_prune_record_unchanged_lp_dead(prstate, offnum);
break;
}
@@ -702,10 +737,18 @@ process_chains:
if (ndeadchain == 0)
{
+ int i;
+
/*
- * If no DEAD tuple was found, the chain is entirely composed of
- * normal, unchanged tuples, leave it alone.
+ * If no DEAD tuple was found, and the root is redirected, mark it as
+ * such.
*/
+ if ((i = ItemIdIsRedirected(rootlp)))
+ heap_prune_record_unchanged_lp_redirect(prstate, rootoffnum);
+
+ /* the rest of tuples in the chain are normal, unchanged tuples */
+ for (; i < nchain; i++)
+ heap_prune_record_unchanged(dp, prstate, chainitems[i]);
}
else if (ndeadchain == nchain)
{
@@ -736,6 +779,8 @@ process_chains:
heap_prune_record_unused(prstate, chainitems[i], true);
/* the rest of tuples in the chain are normal, unchanged tuples */
+ for (int i = ndeadchain; i < nchain; i++)
+ heap_prune_record_unchanged(dp, prstate, chainitems[i]);
}
}
@@ -759,14 +804,19 @@ heap_prune_record_redirect(PruneState *prstate,
OffsetNumber offnum, OffsetNumber rdoffnum,
bool was_normal)
{
+ Assert(!prstate->marked[offnum]);
+ prstate->marked[offnum] = true;
+
+ /*
+ * Do not mark the redirect target here. It needs to be counted
+ * separately as an unchanged tuple.
+ */
+
Assert(prstate->nredirected < MaxHeapTuplesPerPage);
prstate->redirected[prstate->nredirected * 2] = offnum;
prstate->redirected[prstate->nredirected * 2 + 1] = rdoffnum;
+
prstate->nredirected++;
- Assert(!prstate->marked[offnum]);
- prstate->marked[offnum] = true;
- Assert(!prstate->marked[rdoffnum]);
- prstate->marked[rdoffnum] = true;
/*
* If the root entry had been a normal tuple, we are deleting it, so count
@@ -782,11 +832,12 @@ static void
heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
bool was_normal)
{
+ Assert(!prstate->marked[offnum]);
+ prstate->marked[offnum] = true;
+
Assert(prstate->ndead < MaxHeapTuplesPerPage);
prstate->nowdead[prstate->ndead] = offnum;
prstate->ndead++;
- Assert(!prstate->marked[offnum]);
- prstate->marked[offnum] = true;
/*
* If the root entry had been a normal tuple, we are deleting it, so count
@@ -823,11 +874,12 @@ heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
static void
heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum, bool was_normal)
{
+ Assert(!prstate->marked[offnum]);
+ prstate->marked[offnum] = true;
+
Assert(prstate->nunused < MaxHeapTuplesPerPage);
prstate->nowunused[prstate->nunused] = offnum;
prstate->nunused++;
- Assert(!prstate->marked[offnum]);
- prstate->marked[offnum] = true;
/*
* If the root entry had been a normal tuple, we are deleting it, so count
@@ -838,6 +890,42 @@ heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum, bool was_norm
prstate->ndeleted++;
}
+/*
+ * Record LP_NORMAL line pointer that is left unchanged.
+ */
+static void
+heap_prune_record_unchanged(Page page, PruneState *prstate, OffsetNumber offnum)
+{
+ Assert(!prstate->marked[offnum]);
+ prstate->marked[offnum] = true;
+}
+
+
+/*
+ * Record line pointer that was already LP_DEAD and is left unchanged.
+ */
+static void
+heap_prune_record_unchanged_lp_dead(PruneState *prstate, OffsetNumber offnum)
+{
+ Assert(!prstate->marked[offnum]);
+ prstate->marked[offnum] = true;
+}
+
+
+static void
+heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetNumber offnum)
+{
+ /*
+ * A redirect line pointer doesn't count as a live tuple.
+ *
+ * If we leave a redirect line pointer in place, there will be another
+ * tuple on the page that it points to. We will do the bookkeeping for
+ * that separately. So we have nothing to do here, except remember that we
+ * processed this item.
+ */
+ Assert(!prstate->marked[offnum]);
+ prstate->marked[offnum] = true;
+}
/*
* Perform the actual page changes needed by heap_page_prune.
--
2.40.1
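
To make the new invariant in 0003 easier to see through the split hunks, here
is roughly (a restatement, not the patch) what heap_page_prune() now does
after the chain pass: every used item that pruning did not touch is recorded
as unchanged, and assert-enabled builds then verify that exactly the used
items were marked.

for (offnum = FirstOffsetNumber; offnum <= maxoff;
     offnum = OffsetNumberNext(offnum))
{
    ItemId itemid = PageGetItemId(page, offnum);

    if (ItemIdIsUsed(itemid) && !prstate.marked[offnum])
        heap_prune_record_unchanged(page, &prstate, offnum);
}

#ifdef USE_ASSERT_CHECKING
for (offnum = FirstOffsetNumber; offnum <= maxoff;
     offnum = OffsetNumberNext(offnum))
{
    /* used offsets must be marked; unused offsets must not be */
    if (ItemIdIsUsed(PageGetItemId(page, offnum)))
        Assert(prstate.marked[offnum]);
    else
        Assert(!prstate.marked[offnum]);
}
#endif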
Attachment: v10-0004-Handle-non-chain-tuples-outside-of-heap_prune_ch.patch (text/x-diff)
From 1564aebed44f891b86d93b0edacb08deaf8937c6 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Fri, 29 Mar 2024 16:05:41 -0400
Subject: [PATCH v10 04/10] Handle non-chain tuples outside of
heap_prune_chain()
Dead branches of aborted HOT chains or leftover LP_DEAD and LP_REDIRECT
line pointers can be handled outside of heap_prune_chain(). This
simplifies the logic in heap_prune_chain() and allows us to clean up
more RECENTLY_DEAD -> DEAD chains.
To accomplish this efficiently, partition tuples into HOT and non-HOT
while first collecting visibility information for each tuple in
heap_page_prune(), then call heap_prune_chain() only on potential chain
members, and finally mop up the leftover HOT tuples.
---
src/backend/access/heap/pruneheap.c | 270 ++++++++++++++++------------
1 file changed, 154 insertions(+), 116 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 6188c5b2f3..9e6cfbf9f9 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -44,6 +44,18 @@ typedef struct
OffsetNumber nowdead[MaxHeapTuplesPerPage];
OffsetNumber nowunused[MaxHeapTuplesPerPage];
+ /*
+ * chain_candidates contains the offsets of all LP_NORMAL and LP_REDIRECT
+ * items. The first partition holds the offsets of the LP_NORMAL and
+ * LP_REDIRECT items we know to be part of a chain. The second partition
+ * holds the offsets of HOT tuples that may or may not be part of a HOT
+ * chain. Those that are part of a HOT chain will be visited and marked
+ * by heap_prune_chain(); the others will be processed afterward.
+ */
+ int nchain_members;
+ int nchain_candidates;
+ OffsetNumber chain_candidates[MaxHeapTuplesPerPage];
+
/*
* marked[i] is true if item i is entered in one of the above arrays.
*
@@ -250,6 +262,8 @@ heap_page_prune(Relation relation, Buffer buffer,
prstate.nredirected = prstate.ndead = prstate.nunused = 0;
memset(prstate.marked, 0, sizeof(prstate.marked));
prstate.ndeleted = 0;
+ prstate.nchain_members = 0;
+ prstate.nchain_candidates = 0;
/*
* presult->htsv is not initialized here because all ntuple spots in the
@@ -288,17 +302,11 @@ heap_page_prune(Relation relation, Buffer buffer,
ItemId itemid = PageGetItemId(page, offnum);
HeapTupleHeader htup;
+ presult->htsv[offnum] = -1;
+
/* Nothing to do if slot doesn't contain a tuple */
- if (!ItemIdIsNormal(itemid))
- {
- presult->htsv[offnum] = -1;
+ if (!ItemIdIsUsed(itemid))
continue;
- }
-
- htup = (HeapTupleHeader) PageGetItem(page, itemid);
- tup.t_data = htup;
- tup.t_len = ItemIdGetLength(itemid);
- ItemPointerSet(&(tup.t_self), blockno, offnum);
/*
* Set the offset number so that we can display it along with any
@@ -307,18 +315,66 @@ heap_page_prune(Relation relation, Buffer buffer,
if (off_loc)
*off_loc = offnum;
+ if (ItemIdIsDead(itemid))
+ {
+ /*
+ * If the caller set mark_unused_now true, we can set dead line
+ * pointers LP_UNUSED now.
+ */
+ if (unlikely(prstate.mark_unused_now))
+ heap_prune_record_unused(&prstate, offnum, false);
+ else
+ heap_prune_record_unchanged_lp_dead(&prstate, offnum);
+
+ continue;
+ }
+
+ if (ItemIdIsRedirected(itemid))
+ {
+ /* This is a chain member, so it goes in partition 1 */
+ OffsetNumber swap = prstate.chain_candidates[prstate.nchain_members];
+
+ prstate.chain_candidates[prstate.nchain_candidates++] = swap;
+ prstate.chain_candidates[prstate.nchain_members++] = offnum;
+
+ continue;
+ }
+
+ Assert(ItemIdIsNormal(itemid));
+
+ /*
+ * Given that we have an LP_NORMAL item, let's get its visibility
+ * status and then examine the tuple to decide whether to put it in
+ * partition 1 or 2 of chain_candidates.
+ */
+ htup = (HeapTupleHeader) PageGetItem(page, itemid);
+ tup.t_data = htup;
+ tup.t_len = ItemIdGetLength(itemid);
+ ItemPointerSet(&tup.t_self, blockno, offnum);
+
presult->htsv[offnum] = heap_prune_satisfies_vacuum(&prstate, &tup,
buffer);
+
+ if (!HeapTupleHeaderIsHeapOnly(htup))
+ {
+ /* All non-HOT tuples go in partition 1 */
+ OffsetNumber swap = prstate.chain_candidates[prstate.nchain_members];
+
+ prstate.chain_candidates[prstate.nchain_candidates++] = swap;
+ prstate.chain_candidates[prstate.nchain_members++] = offnum;
+
+ continue;
+ }
+
+ /* All LP_NORMAL HOT tuples go in partition 2 */
+ prstate.chain_candidates[prstate.nchain_candidates++] = offnum;
}
- /* Scan the page */
- for (offnum = FirstOffsetNumber;
- offnum <= maxoff;
- offnum = OffsetNumberNext(offnum))
+ /* Process HOT chains */
+ for (int i = 0; i < prstate.nchain_members; i++)
{
- ItemId itemid;
+ offnum = prstate.chain_candidates[i];
- /* Ignore items already processed as part of an earlier chain */
if (prstate.marked[offnum])
continue;
@@ -326,23 +382,78 @@ heap_page_prune(Relation relation, Buffer buffer,
if (off_loc)
*off_loc = offnum;
- /* Nothing to do if slot is empty */
- itemid = PageGetItemId(page, offnum);
- if (!ItemIdIsUsed(itemid))
- continue;
-
/* Process this item or chain of items */
heap_prune_chain(buffer, offnum, presult->htsv, &prstate);
}
- for (offnum = FirstOffsetNumber;
- offnum <= maxoff;
- offnum = OffsetNumberNext(offnum))
+ /*
+ * Check all HOT tuples to see if they have already been marked by
+ * heap_prune_chain() or if they still need to be processed. They will
+ * either be marked for removal or marked as unchanged.
+ */
+ for (int i = prstate.nchain_members; i < prstate.nchain_candidates; i++)
{
- ItemId itemid = PageGetItemId(page, offnum);
+ offnum = prstate.chain_candidates[i];
- if (ItemIdIsUsed(itemid) && !prstate.marked[offnum])
- heap_prune_record_unchanged(page, &prstate, offnum);
+ if (prstate.marked[offnum])
+ continue;
+
+ /* see preceding loop */
+ if (off_loc)
+ *off_loc = offnum;
+
+ if (presult->htsv[offnum] == HEAPTUPLE_DEAD)
+ {
+ ItemId itemid = PageGetItemId(page, offnum);
+ HeapTupleHeader htup = (HeapTupleHeader) PageGetItem(page, itemid);
+
+ /*
+ * If the tuple is DEAD and doesn't chain to anything else, mark
+ * it unused immediately. (If it does chain, we can only remove
+ * it as part of pruning its chain.)
+ *
+ * We need this primarily to handle aborted HOT updates, that is,
+ * XMIN_INVALID heap-only tuples. Those might not be linked to by
+ * any chain, since the parent tuple might be re-updated before
+ * any pruning occurs. So we have to be able to reap them
+ * separately from chain-pruning. (Note that
+ * HeapTupleHeaderIsHotUpdated will never return true for an
+ * XMIN_INVALID tuple, so this code will work even when there were
+ * sequential updates within the aborted transaction.)
+ *
+ * Note that we might first arrive at a dead heap-only tuple
+ * either above while following a chain or here. Whichever path
+ * gets there first will mark the tuple unused.
+ *
+ * Whether we arrive at the dead HOT tuple first here or while
+ * following a chain above affects whether preceding RECENTLY_DEAD
+ * tuples in the chain can be removed or not. Imagine that you
+ * have a chain with two tuples: RECENTLY_DEAD -> DEAD. If we
+ * reach the RECENTLY_DEAD tuple first, the chain-following logic
+ * will find the DEAD tuple and conclude that both tuples are in
+ * fact dead and can be removed. But if we reach the DEAD tuple
+ * at the end of the chain first, when we reach the RECENTLY_DEAD
+ * tuple later, we will not follow the chain because the DEAD
+ * TUPLE is already 'marked', and will not remove the
+ * RECENTLY_DEAD tuple. This is not a correctness issue, and the
+ * RECENTLY_DEAD tuple will be removed by a later VACUUM.
+ */
+ if (!HeapTupleHeaderIsHotUpdated(htup))
+ {
+ HeapTupleHeaderAdvanceConflictHorizon(htup,
+ &prstate.snapshotConflictHorizon);
+ heap_prune_record_unused(&prstate, offnum, true);
+ continue;
+ }
+ }
+
+ /*
+ * HOT tuple is not DEAD or has been HOT-updated. If it is a DEAD,
+ * HOT-updated member of a chain, it should already have been marked
+ * by heap_prune_chain() and heap_prune_record_unchanged() will
+ * return immediately.
+ */
+ heap_prune_record_unchanged(page, &prstate, offnum);
}
/* We should now have processed every tuple exactly once */
@@ -351,12 +462,10 @@ heap_page_prune(Relation relation, Buffer buffer,
offnum <= maxoff;
offnum = OffsetNumberNext(offnum))
{
- ItemId itemid;
-
if (off_loc)
*off_loc = offnum;
- itemid = PageGetItemId(page, offnum);
- if (ItemIdIsUsed(itemid))
+
+ if (ItemIdIsUsed(PageGetItemId(page, offnum)))
Assert(prstate.marked[offnum]);
else
Assert(!prstate.marked[offnum]);
@@ -490,11 +599,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
int8 *htsv, PruneState *prstate)
{
Page page = (Page) BufferGetPage(buffer);
- TransactionId priorXmax = InvalidTransactionId;
- ItemId rootlp;
- HeapTupleHeader htup;
- OffsetNumber maxoff = PageGetMaxOffsetNumber(page),
- offnum;
+ ItemId rootlp = PageGetItemId(page, rootoffnum);
OffsetNumber chainitems[MaxHeapTuplesPerPage];
/*
@@ -504,67 +609,14 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
int ndeadchain = 0,
nchain = 0;
- rootlp = PageGetItemId(page, rootoffnum);
-
- /*
- * If it's a heap-only tuple, then it is not the start of a HOT chain.
- */
- if (ItemIdIsNormal(rootlp))
- {
- Assert(htsv[rootoffnum] != -1);
- htup = (HeapTupleHeader) PageGetItem(page, rootlp);
-
- if (HeapTupleHeaderIsHeapOnly(htup))
- {
- /*
- * If the tuple is DEAD and doesn't chain to anything else, mark
- * it unused immediately. (If it does chain, we can only remove
- * it as part of pruning its chain.)
- *
- * We need this primarily to handle aborted HOT updates, that is,
- * XMIN_INVALID heap-only tuples. Those might not be linked to by
- * any chain, since the parent tuple might be re-updated before
- * any pruning occurs. So we have to be able to reap them
- * separately from chain-pruning. (Note that
- * HeapTupleHeaderIsHotUpdated will never return true for an
- * XMIN_INVALID tuple, so this code will work even when there were
- * sequential updates within the aborted transaction.)
- *
- * Note that we might first arrive at a dead heap-only tuple
- * either here or while following a chain below. Whichever path
- * gets there first will mark the tuple unused.
- *
- * Whether we arrive at the dead HOT tuple first here or while
- * following a chain below affects whether preceding RECENTLY_DEAD
- * tuples in the chain can be removed or not. Imagine that you
- * have a chain with two tuples: RECENTLY_DEAD -> DEAD. If we
- * reach the RECENTLY_DEAD tuple first, the chain-following logic
- * will find the DEAD tuple and conclude that both tuples are in
- * fact dead and can be removed. But if we reach the DEAD tuple
- * at the end of the chain first, when we reach the RECENTLY_DEAD
- * tuple later, we will not follow the chain because the DEAD
- * TUPLE is already 'marked', and will not remove the
- * RECENTLY_DEAD tuple. This is not a correctness issue, and the
- * RECENTLY_DEAD tuple will be removed by a later VACUUM.
- */
- if (htsv[rootoffnum] == HEAPTUPLE_DEAD &&
- !HeapTupleHeaderIsHotUpdated(htup))
- {
- heap_prune_record_unused(prstate, rootoffnum, true);
- HeapTupleHeaderAdvanceConflictHorizon(htup,
- &prstate->snapshotConflictHorizon);
- }
-
- return;
- }
- }
+ OffsetNumber maxoff = PageGetMaxOffsetNumber(page);
+ TransactionId priorXmax = InvalidTransactionId;
/* Start from the root tuple */
- offnum = rootoffnum;
-
/* while not end of the chain */
- for (;;)
+ for (OffsetNumber offnum = rootoffnum;;)
{
+ HeapTupleHeader htup;
ItemId lp;
/* Sanity check (pure paranoia) */
@@ -584,8 +636,13 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
lp = PageGetItemId(page, offnum);
- /* Unused item obviously isn't part of the chain */
- if (!ItemIdIsUsed(lp))
+ /*
+ * Unused item obviously isn't part of the chain. Likewise, a dead
+ * line pointer can't be part of the chain. (We already eliminated the
+ * case of a dead root tuple outside this function.) MFIXME: should the
+ * dead item check be an assert?
+ */
+ if (!ItemIdIsUsed(lp) || ItemIdIsDead(lp))
break;
/*
@@ -602,27 +659,8 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
continue;
}
- /*
- * Likewise, a dead line pointer can't be part of the chain. (We
- * already eliminated the case of dead root tuple outside this
- * function.)
- */
- if (ItemIdIsDead(lp))
- {
- /*
- * If the caller set mark_unused_now true, we can set dead line
- * pointers LP_UNUSED now. We don't increment ndeleted here since
- * the LP was already marked dead.
- */
- if (unlikely(prstate->mark_unused_now))
- heap_prune_record_unused(prstate, offnum, false);
- else
- heap_prune_record_unchanged_lp_dead(prstate, offnum);
-
- break;
- }
-
Assert(ItemIdIsNormal(lp));
+
htup = (HeapTupleHeader) PageGetItem(page, lp);
/*
@@ -748,7 +786,7 @@ process_chains:
/* the rest of tuples in the chain are normal, unchanged tuples */
for (; i < nchain; i++)
- heap_prune_record_unchanged(dp, prstate, chainitems[i]);
+ heap_prune_record_unchanged(page, prstate, chainitems[i]);
}
else if (ndeadchain == nchain)
{
@@ -780,7 +818,7 @@ process_chains:
/* the rest of tuples in the chain are normal, unchanged tuples */
for (int i = ndeadchain; i < nchain; i++)
- heap_prune_record_unchanged(dp, prstate, chainitems[i]);
+ heap_prune_record_unchanged(page, prstate, chainitems[i]);
}
}
--
2.40.1
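
The in-place partitioning in 0004's first loop over the page is a little
terse in the diff, so here is the same idea written out as two hypothetical
helpers (add_chain_member() and add_chain_candidate() are not in the patch;
the array and counters are the PruneState fields 0004 adds). The front of
chain_candidates holds known chain members (redirects and non-HOT LP_NORMAL
tuples); the back holds HOT tuples whose chain membership is not yet known.

static inline void
add_chain_member(PruneState *prstate, OffsetNumber offnum)
{
    /*
     * Displace the first candidate (if any) to the end of the array and put
     * the new member in its slot. When there are no candidates yet, this
     * copies an uninitialized slot onto itself, which is harmless.
     */
    OffsetNumber displaced = prstate->chain_candidates[prstate->nchain_members];

    prstate->chain_candidates[prstate->nchain_candidates++] = displaced;
    prstate->chain_candidates[prstate->nchain_members++] = offnum;
}

static inline void
add_chain_candidate(PruneState *prstate, OffsetNumber offnum)
{
    prstate->chain_candidates[prstate->nchain_candidates++] = offnum;
}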
Attachment: v10-0005-Invoke-heap_prune_record_prunable-during-record-.patch (text/x-diff)
From 1c3b831bf7eaf1c76fe8b7d6cc763721655e6f0d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Fri, 29 Mar 2024 19:29:46 -0400
Subject: [PATCH v10 05/10] Invoke heap_prune_record_prunable() during record
unchanged
Recording the lowest soon-to-be prunable xid is one of the actions we
take for item pointers we will not be changing during pruning. Move this
to the recently introduced heap_prune_record_unchanged() function so
that we group all actions we take for unchanged LP_NORMAL line pointers
together.
---
src/backend/access/heap/pruneheap.c | 72 ++++++++++++++++++-----------
1 file changed, 44 insertions(+), 28 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 9e6cfbf9f9..159f847689 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -82,7 +82,7 @@ static void heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum, boo
static void heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum, bool was_normal);
static void heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum, bool was_normal);
-static void heap_prune_record_unchanged(Page page, PruneState *prstate, OffsetNumber offnum);
+static void heap_prune_record_unchanged(Page page, int8 *htsv, PruneState *prstate, OffsetNumber offnum);
static void heap_prune_record_unchanged_lp_dead(PruneState *prstate, OffsetNumber offnum);
static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetNumber offnum);
@@ -453,7 +453,7 @@ heap_page_prune(Relation relation, Buffer buffer,
* marked by heap_prune_chain() and heap_prune_record_unchanged() will
* return immediately.
*/
- heap_prune_record_unchanged(page, &prstate, offnum);
+ heap_prune_record_unchanged(page, presult->htsv, &prstate, offnum);
}
/* We should now have processed every tuple exactly once */
@@ -675,9 +675,6 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
*/
chainitems[nchain++] = offnum;
- /*
- * Check tuple's visibility status.
- */
switch (htsv_get_valid_status(htsv[offnum]))
{
case HEAPTUPLE_DEAD:
@@ -699,9 +696,6 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
case HEAPTUPLE_RECENTLY_DEAD:
/*
- * This tuple may soon become DEAD. Update the hint field so
- * that the page is reconsidered for pruning in future.
- *
* We don't need to advance the conflict horizon for
* RECENTLY_DEAD tuples, even if we are removing them. This is
* because we only remove RECENTLY_DEAD tuples if they precede
@@ -710,28 +704,11 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
* of being later in the chain. We will have advanced the
* conflict horizon for the DEAD tuple.
*/
- heap_prune_record_prunable(prstate,
- HeapTupleHeaderGetUpdateXid(htup));
break;
case HEAPTUPLE_DELETE_IN_PROGRESS:
-
- /*
- * This tuple may soon become DEAD. Update the hint field so
- * that the page is reconsidered for pruning in future.
- */
- heap_prune_record_prunable(prstate,
- HeapTupleHeaderGetUpdateXid(htup));
-
case HEAPTUPLE_LIVE:
case HEAPTUPLE_INSERT_IN_PROGRESS:
-
- /*
- * If we wanted to optimize for aborts, we might consider
- * marking the page prunable when we see INSERT_IN_PROGRESS.
- * But we don't. See related decisions about when to mark the
- * page prunable in heapam.c.
- */
goto process_chains;
default:
@@ -786,7 +763,7 @@ process_chains:
/* the rest of tuples in the chain are normal, unchanged tuples */
for (; i < nchain; i++)
- heap_prune_record_unchanged(page, prstate, chainitems[i]);
+ heap_prune_record_unchanged(page, htsv, prstate, chainitems[i]);
}
else if (ndeadchain == nchain)
{
@@ -818,7 +795,7 @@ process_chains:
/* the rest of tuples in the chain are normal, unchanged tuples */
for (int i = ndeadchain; i < nchain; i++)
- heap_prune_record_unchanged(page, prstate, chainitems[i]);
+ heap_prune_record_unchanged(page, htsv, prstate, chainitems[i]);
}
}
@@ -932,10 +909,49 @@ heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum, bool was_norm
* Record LP_NORMAL line pointer that is left unchanged.
*/
static void
-heap_prune_record_unchanged(Page page, PruneState *prstate, OffsetNumber offnum)
+heap_prune_record_unchanged(Page page, int8 *htsv, PruneState *prstate, OffsetNumber offnum)
{
+ HeapTupleHeader htup;
+
Assert(!prstate->marked[offnum]);
prstate->marked[offnum] = true;
+
+ switch (htsv[offnum])
+ {
+ case HEAPTUPLE_LIVE:
+ case HEAPTUPLE_INSERT_IN_PROGRESS:
+
+ /*
+ * If we wanted to optimize for aborts, we might consider marking
+ * the page prunable when we see INSERT_IN_PROGRESS. But we
+ * don't. See related decisions about when to mark the page
+ * prunable in heapam.c.
+ */
+ break;
+
+ case HEAPTUPLE_RECENTLY_DEAD:
+ case HEAPTUPLE_DELETE_IN_PROGRESS:
+
+ htup = (HeapTupleHeader) PageGetItem(page, PageGetItemId(page, offnum));
+
+ /*
+ * This tuple may soon become DEAD. Update the hint field so that
+ * the page is reconsidered for pruning in future.
+ */
+ heap_prune_record_prunable(prstate,
+ HeapTupleHeaderGetUpdateXid(htup));
+ break;
+
+
+ default:
+
+ /*
+ * DEAD tuples should've been passed to heap_prune_record_dead()
+ * or heap_prune_record_unused() instead.
+ */
+ elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result %d", htsv[offnum]);
+ break;
+ }
}
--
2.40.1
Attachment: v10-0006-Introduce-PRUNE_DO_-actions.patch (text/x-diff)
From 81b0875be30bd83452aa78756355fa4d3b481d7b Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Fri, 29 Mar 2024 19:43:09 -0400
Subject: [PATCH v10 06/10] Introduce PRUNE_DO_* actions
We will eventually take additional actions in heap_page_prune() at the
discretion of the caller. For now, introduce these PRUNE_DO_* macros and
turn mark_unused_now, a parameter to heap_page_prune(), into a PRUNE_DO_*
action.
---
src/backend/access/heap/pruneheap.c | 51 ++++++++++++++--------------
src/backend/access/heap/vacuumlazy.c | 11 ++++--
src/include/access/heapam.h | 13 ++++++-
3 files changed, 46 insertions(+), 29 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 159f847689..20d5ad7b80 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -29,10 +29,11 @@
/* Working data for heap_page_prune and subroutines */
typedef struct
{
+ /* PRUNE_DO_* arguments */
+ uint8 actions;
+
/* tuple visibility test, initialized for the relation */
GlobalVisState *vistest;
- /* whether or not dead items can be set LP_UNUSED during pruning */
- bool mark_unused_now;
TransactionId new_prune_xid; /* new prune hint value for page */
TransactionId snapshotConflictHorizon; /* latest xid removed */
@@ -167,11 +168,12 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
PruneResult presult;
/*
- * For now, pass mark_unused_now as false regardless of whether or
- * not the relation has indexes, since we cannot safely determine
- * that during on-access pruning with the current implementation.
+ * For now, do not set PRUNE_DO_MARK_UNUSED_NOW regardless of
+ * whether or not the relation has indexes, since we cannot safely
+ * determine that during on-access pruning with the current
+ * implementation.
*/
- heap_page_prune(relation, buffer, vistest, false,
+ heap_page_prune(relation, buffer, vistest, 0,
&presult, PRUNE_ON_ACCESS, NULL);
/*
@@ -216,8 +218,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
* (see heap_prune_satisfies_vacuum).
*
- * mark_unused_now indicates whether or not dead items can be set LP_UNUSED
- * during pruning.
+ * actions are the pruning actions that heap_page_prune() should take.
*
* presult contains output parameters needed by callers such as the number of
* tuples removed and the number of line pointers newly marked LP_DEAD.
@@ -232,7 +233,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
void
heap_page_prune(Relation relation, Buffer buffer,
GlobalVisState *vistest,
- bool mark_unused_now,
+ uint8 actions,
PruneResult *presult,
PruneReason reason,
OffsetNumber *off_loc)
@@ -257,7 +258,7 @@ heap_page_prune(Relation relation, Buffer buffer,
*/
prstate.new_prune_xid = InvalidTransactionId;
prstate.vistest = vistest;
- prstate.mark_unused_now = mark_unused_now;
+ prstate.actions = actions;
prstate.snapshotConflictHorizon = InvalidTransactionId;
prstate.nredirected = prstate.ndead = prstate.nunused = 0;
memset(prstate.marked, 0, sizeof(prstate.marked));
@@ -318,10 +319,10 @@ heap_page_prune(Relation relation, Buffer buffer,
if (ItemIdIsDead(itemid))
{
/*
- * If the caller set mark_unused_now true, we can set dead line
- * pointers LP_UNUSED now.
+ * If the caller set PRUNE_DO_MARK_UNUSED_NOW, we can set dead
+ * line pointers LP_UNUSED now.
*/
- if (unlikely(prstate.mark_unused_now))
+ if (unlikely(prstate.actions & PRUNE_DO_MARK_UNUSED_NOW))
heap_prune_record_unused(&prstate, offnum, false);
else
heap_prune_record_unchanged_lp_dead(&prstate, offnum);
@@ -864,22 +865,22 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
}
/*
- * Depending on whether or not the caller set mark_unused_now to true, record that a
- * line pointer should be marked LP_DEAD or LP_UNUSED. There are other cases in
- * which we will mark line pointers LP_UNUSED, but we will not mark line
- * pointers LP_DEAD if mark_unused_now is true.
+ * Depending on whether or not the caller set PRUNE_DO_MARK_UNUSED_NOW, record
+ * that a line pointer should be marked LP_DEAD or LP_UNUSED. There are other
+ * cases in which we will mark line pointers LP_UNUSED, but we will not mark
+ * line pointers LP_DEAD if PRUNE_DO_MARK_UNUSED_NOW is set.
*/
static void
heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
bool was_normal)
{
/*
- * If the caller set mark_unused_now to true, we can remove dead tuples
+ * If the caller set PRUNE_DO_MARK_UNUSED_NOW, we can remove dead tuples
* during pruning instead of marking their line pointers dead. Set this
* tuple's line pointer LP_UNUSED. We hint that this option is less
* likely.
*/
- if (unlikely(prstate->mark_unused_now))
+ if (unlikely(prstate->actions & PRUNE_DO_MARK_UNUSED_NOW))
heap_prune_record_unused(prstate, offnum, was_normal);
else
heap_prune_record_dead(prstate, offnum, was_normal);
@@ -1113,12 +1114,12 @@ heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
else
{
/*
- * When heap_page_prune() was called, mark_unused_now may have
- * been passed as true, which allows would-be LP_DEAD items to be
- * made LP_UNUSED instead. This is only possible if the relation
- * has no indexes. If there are any dead items, then
- * mark_unused_now was not true and every item being marked
- * LP_UNUSED must refer to a heap-only tuple.
+ * When heap_page_prune() was called, PRUNE_DO_MARK_UNUSED_NOW may
+ * have been set, which allows would-be LP_DEAD items to be made
+ * LP_UNUSED instead. This is only possible if the relation has
+ * no indexes. If there are any dead items, then
+ * PRUNE_DO_MARK_UNUSED_NOW was not set and every item being
+ * marked LP_UNUSED must refer to a heap-only tuple.
*/
if (ndead > 0)
{
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index ba5b7083a3..880a218cb4 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1425,6 +1425,7 @@ lazy_scan_prune(LVRelState *vacrel,
bool all_visible,
all_frozen;
TransactionId visibility_cutoff_xid;
+ uint8 actions = 0;
int64 fpi_before = pgWalUsage.wal_fpi;
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
@@ -1458,10 +1459,14 @@ lazy_scan_prune(LVRelState *vacrel,
* that were deleted from indexes.
*
* If the relation has no indexes, we can immediately mark would-be dead
- * items LP_UNUSED, so mark_unused_now should be true if no indexes and
- * false otherwise.
+ * items LP_UNUSED, so PRUNE_DO_MARK_UNUSED_NOW should be set if no
+ * indexes and unset otherwise.
*/
- heap_page_prune(rel, buf, vacrel->vistest, vacrel->nindexes == 0,
+
+ if (vacrel->nindexes == 0)
+ actions |= PRUNE_DO_MARK_UNUSED_NOW;
+
+ heap_page_prune(rel, buf, vacrel->vistest, actions,
&presult, PRUNE_VACUUM_SCAN, &vacrel->offnum);
/*
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index f112245373..b5c711e790 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -191,6 +191,17 @@ typedef struct HeapPageFreeze
} HeapPageFreeze;
+/*
+ * Actions that can be taken during pruning and freezing. By default, we will
+ * at least attempt regular pruning.
+ */
+
+/*
+ * PRUNE_DO_MARK_UNUSED_NOW indicates whether or not dead items can be set
+ * LP_UNUSED during pruning.
+ */
+#define PRUNE_DO_MARK_UNUSED_NOW (1 << 1)
+
/*
* Per-page state returned from pruning
*/
@@ -331,7 +342,7 @@ struct GlobalVisState;
extern void heap_page_prune_opt(Relation relation, Buffer buffer);
extern void heap_page_prune(Relation relation, Buffer buffer,
struct GlobalVisState *vistest,
- bool mark_unused_now,
+ uint8 actions,
PruneResult *presult,
PruneReason reason,
OffsetNumber *off_loc);
--
2.40.1
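
Caller-side, 0006 amounts to the following (condensed from the
lazy_scan_prune() hunk above, not a new interface):

uint8 actions = 0;

if (vacrel->nindexes == 0)
    actions |= PRUNE_DO_MARK_UNUSED_NOW;  /* no indexes: LP_DEAD -> LP_UNUSED right away */

heap_page_prune(rel, buf, vacrel->vistest, actions,
                &presult, PRUNE_VACUUM_SCAN, &vacrel->offnum);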
Attachment: v10-0007-Prepare-freeze-tuples-in-heap_page_prune.patch (text/x-diff)
From caf2b10b524b79e71c9bf96ea626df7da8daaf0b Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Fri, 29 Mar 2024 21:22:14 -0400
Subject: [PATCH v10 07/10] Prepare freeze tuples in heap_page_prune()
In order to combine the freeze and prune records, we must determine
which tuples are freezable before actually executing pruning. All of the
page modifications should be made in the same critical section along
with emitting the combined WAL record. While pruning, determine whether
tuples should or must be frozen and whether the page will be all-frozen
as a consequence.
---
src/backend/access/heap/heapam.c | 6 +--
src/backend/access/heap/pruneheap.c | 64 +++++++++++++++++++++-----
src/backend/access/heap/vacuumlazy.c | 67 ++++++++++------------------
src/include/access/heapam.h | 25 ++++++++++-
4 files changed, 103 insertions(+), 59 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 2f6527df0d..f8fddce03b 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6475,10 +6475,10 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
*/
bool
heap_prepare_freeze_tuple(HeapTupleHeader tuple,
- const struct VacuumCutoffs *cutoffs,
HeapPageFreeze *pagefrz,
HeapTupleFreeze *frz, bool *totally_frozen)
{
+ const struct VacuumCutoffs *cutoffs = pagefrz->cutoffs;
bool xmin_already_frozen = false,
xmax_already_frozen = false;
bool freeze_xmin = false,
@@ -6889,9 +6889,9 @@ heap_freeze_tuple(HeapTupleHeader tuple,
pagefrz.FreezePageRelminMxid = MultiXactCutoff;
pagefrz.NoFreezePageRelfrozenXid = FreezeLimit;
pagefrz.NoFreezePageRelminMxid = MultiXactCutoff;
+ pagefrz.cutoffs = &cutoffs;
- do_freeze = heap_prepare_freeze_tuple(tuple, &cutoffs,
- &pagefrz, &frz, &totally_frozen);
+ do_freeze = heap_prepare_freeze_tuple(tuple, &pagefrz, &frz, &totally_frozen);
/*
* Note that because this is not a WAL-logged operation, we don't need to
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 20d5ad7b80..be06699523 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -17,6 +17,7 @@
#include "access/heapam.h"
#include "access/heapam_xlog.h"
#include "access/htup_details.h"
+#include "access/multixact.h"
#include "access/transam.h"
#include "access/xlog.h"
#include "access/xloginsert.h"
@@ -75,7 +76,7 @@ static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
static void heap_prune_chain(Buffer buffer,
OffsetNumber rootoffnum,
int8 *htsv,
- PruneState *prstate);
+ PruneState *prstate, PruneResult *presult);
static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
static void heap_prune_record_redirect(PruneState *prstate,
OffsetNumber offnum, OffsetNumber rdoffnum, bool was_normal);
@@ -83,7 +84,7 @@ static void heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum, boo
static void heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum, bool was_normal);
static void heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum, bool was_normal);
-static void heap_prune_record_unchanged(Page page, int8 *htsv, PruneState *prstate, OffsetNumber offnum);
+static void heap_prune_record_unchanged(Page page, int8 *htsv, PruneState *prstate, PruneResult *presult, OffsetNumber offnum);
static void heap_prune_record_unchanged_lp_dead(PruneState *prstate, OffsetNumber offnum);
static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetNumber offnum);
@@ -167,6 +168,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
{
PruneResult presult;
+ presult.pagefrz.freeze_required = false;
+ presult.pagefrz.FreezePageRelfrozenXid = InvalidTransactionId;
+ presult.pagefrz.FreezePageRelminMxid = InvalidMultiXactId;
+ presult.pagefrz.NoFreezePageRelfrozenXid = InvalidTransactionId;
+ presult.pagefrz.NoFreezePageRelminMxid = InvalidMultiXactId;
+ presult.pagefrz.cutoffs = NULL;
+
/*
* For now, do not set PRUNE_DO_MARK_UNUSED_NOW regardless of
* whether or not the relation has indexes, since we cannot safely
@@ -266,6 +274,16 @@ heap_page_prune(Relation relation, Buffer buffer,
prstate.nchain_members = 0;
prstate.nchain_candidates = 0;
+ /*
+ * If we will prepare to freeze tuples, consider that it might be possible
+ * to set the page all-frozen in the visibility map.
+ */
+ if (prstate.actions & PRUNE_DO_TRY_FREEZE)
+ presult->all_frozen = true;
+ else
+ presult->all_frozen = false;
+
+
/*
* presult->htsv is not initialized here because all ntuple spots in the
* array will be set either to a valid HTSV_Result value or -1.
@@ -273,6 +291,8 @@ heap_page_prune(Relation relation, Buffer buffer,
presult->ndeleted = 0;
presult->nnewlpdead = 0;
+ presult->nfrozen = 0;
+
maxoff = PageGetMaxOffsetNumber(page);
tup.t_tableOid = RelationGetRelid(relation);
@@ -384,7 +404,7 @@ heap_page_prune(Relation relation, Buffer buffer,
*off_loc = offnum;
/* Process this item or chain of items */
- heap_prune_chain(buffer, offnum, presult->htsv, &prstate);
+ heap_prune_chain(buffer, offnum, presult->htsv, &prstate, presult);
}
/*
@@ -454,7 +474,7 @@ heap_page_prune(Relation relation, Buffer buffer,
* marked by heap_prune_chain() and heap_prune_record_unchanged() will
* return immediately.
*/
- heap_prune_record_unchanged(page, presult->htsv, &prstate, offnum);
+ heap_prune_record_unchanged(page, presult->htsv, &prstate, presult, offnum);
}
/* We should now have processed every tuple exactly once */
@@ -597,7 +617,7 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
*/
static void
heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
- int8 *htsv, PruneState *prstate)
+ int8 *htsv, PruneState *prstate, PruneResult *presult)
{
Page page = (Page) BufferGetPage(buffer);
ItemId rootlp = PageGetItemId(page, rootoffnum);
@@ -764,7 +784,7 @@ process_chains:
/* the rest of tuples in the chain are normal, unchanged tuples */
for (; i < nchain; i++)
- heap_prune_record_unchanged(page, htsv, prstate, chainitems[i]);
+ heap_prune_record_unchanged(page, htsv, prstate, presult, chainitems[i]);
}
else if (ndeadchain == nchain)
{
@@ -796,7 +816,7 @@ process_chains:
/* the rest of tuples in the chain are normal, unchanged tuples */
for (int i = ndeadchain; i < nchain; i++)
- heap_prune_record_unchanged(page, htsv, prstate, chainitems[i]);
+ heap_prune_record_unchanged(page, htsv, prstate, presult, chainitems[i]);
}
}
@@ -910,9 +930,10 @@ heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum, bool was_norm
* Record LP_NORMAL line pointer that is left unchanged.
*/
static void
-heap_prune_record_unchanged(Page page, int8 *htsv, PruneState *prstate, OffsetNumber offnum)
+heap_prune_record_unchanged(Page page, int8 *htsv, PruneState *prstate,
+ PruneResult *presult, OffsetNumber offnum)
{
- HeapTupleHeader htup;
+ HeapTupleHeader htup = (HeapTupleHeader) PageGetItem(page, PageGetItemId(page, offnum));
Assert(!prstate->marked[offnum]);
prstate->marked[offnum] = true;
@@ -933,8 +954,6 @@ heap_prune_record_unchanged(Page page, int8 *htsv, PruneState *prstate, OffsetNu
case HEAPTUPLE_RECENTLY_DEAD:
case HEAPTUPLE_DELETE_IN_PROGRESS:
- htup = (HeapTupleHeader) PageGetItem(page, PageGetItemId(page, offnum));
-
/*
* This tuple may soon become DEAD. Update the hint field so that
* the page is reconsidered for pruning in future.
@@ -953,6 +972,29 @@ heap_prune_record_unchanged(Page page, int8 *htsv, PruneState *prstate, OffsetNu
elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result %d", htsv[offnum]);
break;
}
+
+ /* Consider freezing any normal tuples which will not be removed */
+ if (prstate->actions & PRUNE_DO_TRY_FREEZE)
+ {
+ /* Tuple with storage -- consider need to freeze */
+ bool totally_frozen;
+
+ if ((heap_prepare_freeze_tuple(htup, &presult->pagefrz,
+ &presult->frozen[presult->nfrozen],
+ &totally_frozen)))
+ {
+ /* Save prepared freeze plan for later */
+ presult->frozen[presult->nfrozen++].offset = offnum;
+ }
+
+ /*
+ * If any tuple isn't either totally frozen already or eligible to
+ * become totally frozen (according to its freeze plan), then the page
+ * definitely cannot be set all-frozen in the visibility map later on
+ */
+ if (!totally_frozen)
+ presult->all_frozen = false;
+ }
}
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 880a218cb4..679c6a866e 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1416,19 +1416,15 @@ lazy_scan_prune(LVRelState *vacrel,
maxoff;
ItemId itemid;
PruneResult presult;
- int tuples_frozen,
- lpdead_items,
+ int lpdead_items,
live_tuples,
recently_dead_tuples;
- HeapPageFreeze pagefrz;
bool hastup = false;
- bool all_visible,
- all_frozen;
+ bool all_visible;
TransactionId visibility_cutoff_xid;
uint8 actions = 0;
int64 fpi_before = pgWalUsage.wal_fpi;
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
- HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
Assert(BufferGetBlockNumber(buf) == blkno);
@@ -1440,12 +1436,12 @@ lazy_scan_prune(LVRelState *vacrel,
maxoff = PageGetMaxOffsetNumber(page);
/* Initialize (or reset) page-level state */
- pagefrz.freeze_required = false;
- pagefrz.FreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
- pagefrz.FreezePageRelminMxid = vacrel->NewRelminMxid;
- pagefrz.NoFreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
- pagefrz.NoFreezePageRelminMxid = vacrel->NewRelminMxid;
- tuples_frozen = 0;
+ presult.pagefrz.freeze_required = false;
+ presult.pagefrz.FreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
+ presult.pagefrz.FreezePageRelminMxid = vacrel->NewRelminMxid;
+ presult.pagefrz.NoFreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
+ presult.pagefrz.NoFreezePageRelminMxid = vacrel->NewRelminMxid;
+ presult.pagefrz.cutoffs = &vacrel->cutoffs;
lpdead_items = 0;
live_tuples = 0;
recently_dead_tuples = 0;
@@ -1462,6 +1458,7 @@ lazy_scan_prune(LVRelState *vacrel,
* items LP_UNUSED, so PRUNE_DO_MARK_UNUSED_NOW should be set if no
* indexes and unset otherwise.
*/
+ actions |= PRUNE_DO_TRY_FREEZE;
if (vacrel->nindexes == 0)
actions |= PRUNE_DO_MARK_UNUSED_NOW;
@@ -1479,7 +1476,6 @@ lazy_scan_prune(LVRelState *vacrel,
* Also keep track of the visibility cutoff xid for recovery conflicts.
*/
all_visible = true;
- all_frozen = true;
visibility_cutoff_xid = InvalidTransactionId;
/*
@@ -1491,7 +1487,6 @@ lazy_scan_prune(LVRelState *vacrel,
offnum = OffsetNumberNext(offnum))
{
HeapTupleHeader htup;
- bool totally_frozen;
/*
* Set the offset number so that we can display it along with any
@@ -1638,22 +1633,6 @@ lazy_scan_prune(LVRelState *vacrel,
}
hastup = true; /* page makes rel truncation unsafe */
-
- /* Tuple with storage -- consider need to freeze */
- if (heap_prepare_freeze_tuple(htup, &vacrel->cutoffs, &pagefrz,
- &frozen[tuples_frozen], &totally_frozen))
- {
- /* Save prepared freeze plan for later */
- frozen[tuples_frozen++].offset = offnum;
- }
-
- /*
- * If any tuple isn't either totally frozen already or eligible to
- * become totally frozen (according to its freeze plan), then the page
- * definitely cannot be set all-frozen in the visibility map later on
- */
- if (!totally_frozen)
- all_frozen = false;
}
/*
@@ -1670,18 +1649,18 @@ lazy_scan_prune(LVRelState *vacrel,
* freeze when pruning generated an FPI, if doing so means that we set the
* page all-frozen afterwards (might not happen until final heap pass).
*/
- if (pagefrz.freeze_required || tuples_frozen == 0 ||
- (all_visible && all_frozen &&
+ if (presult.pagefrz.freeze_required || presult.nfrozen == 0 ||
+ (all_visible && presult.all_frozen &&
fpi_before != pgWalUsage.wal_fpi))
{
/*
* We're freezing the page. Our final NewRelfrozenXid doesn't need to
* be affected by the XIDs that are just about to be frozen anyway.
*/
- vacrel->NewRelfrozenXid = pagefrz.FreezePageRelfrozenXid;
- vacrel->NewRelminMxid = pagefrz.FreezePageRelminMxid;
+ vacrel->NewRelfrozenXid = presult.pagefrz.FreezePageRelfrozenXid;
+ vacrel->NewRelminMxid = presult.pagefrz.FreezePageRelminMxid;
- if (tuples_frozen == 0)
+ if (presult.nfrozen == 0)
{
/*
* We have no freeze plans to execute, so there's no added cost
@@ -1709,7 +1688,7 @@ lazy_scan_prune(LVRelState *vacrel,
* once we're done with it. Otherwise we generate a conservative
* cutoff by stepping back from OldestXmin.
*/
- if (all_visible && all_frozen)
+ if (all_visible && presult.all_frozen)
{
/* Using same cutoff when setting VM is now unnecessary */
snapshotConflictHorizon = visibility_cutoff_xid;
@@ -1725,7 +1704,7 @@ lazy_scan_prune(LVRelState *vacrel,
/* Execute all freeze plans for page as a single atomic action */
heap_freeze_execute_prepared(vacrel->rel, buf,
snapshotConflictHorizon,
- frozen, tuples_frozen);
+ presult.frozen, presult.nfrozen);
}
}
else
@@ -1734,10 +1713,10 @@ lazy_scan_prune(LVRelState *vacrel,
* Page requires "no freeze" processing. It might be set all-visible
* in the visibility map, but it can never be set all-frozen.
*/
- vacrel->NewRelfrozenXid = pagefrz.NoFreezePageRelfrozenXid;
- vacrel->NewRelminMxid = pagefrz.NoFreezePageRelminMxid;
- all_frozen = false;
- tuples_frozen = 0; /* avoid miscounts in instrumentation */
+ vacrel->NewRelfrozenXid = presult.pagefrz.NoFreezePageRelfrozenXid;
+ vacrel->NewRelminMxid = presult.pagefrz.NoFreezePageRelminMxid;
+ presult.all_frozen = false;
+ presult.nfrozen = 0; /* avoid miscounts in instrumentation */
}
/*
@@ -1801,7 +1780,7 @@ lazy_scan_prune(LVRelState *vacrel,
/* Finally, add page-local counts to whole-VACUUM counts */
vacrel->tuples_deleted += presult.ndeleted;
- vacrel->tuples_frozen += tuples_frozen;
+ vacrel->tuples_frozen += presult.nfrozen;
vacrel->lpdead_items += lpdead_items;
vacrel->live_tuples += live_tuples;
vacrel->recently_dead_tuples += recently_dead_tuples;
@@ -1824,7 +1803,7 @@ lazy_scan_prune(LVRelState *vacrel,
{
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
- if (all_frozen)
+ if (presult.all_frozen)
{
Assert(!TransactionIdIsValid(visibility_cutoff_xid));
flags |= VISIBILITYMAP_ALL_FROZEN;
@@ -1895,7 +1874,7 @@ lazy_scan_prune(LVRelState *vacrel,
* true, so we must check both all_visible and all_frozen.
*/
else if (all_visible_according_to_vm && all_visible &&
- all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
+ presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
{
/*
* Avoid relying on all_visible_according_to_vm as a proxy for the
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index b5c711e790..a7f5f19916 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -189,6 +189,7 @@ typedef struct HeapPageFreeze
TransactionId NoFreezePageRelfrozenXid;
MultiXactId NoFreezePageRelminMxid;
+ struct VacuumCutoffs *cutoffs;
} HeapPageFreeze;
/*
@@ -202,6 +203,15 @@ typedef struct HeapPageFreeze
*/
#define PRUNE_DO_MARK_UNUSED_NOW (1 << 1)
+/*
+ * Prepare to freeze if advantageous or required and try to advance
+ * relfrozenxid and relminmxid. To attempt freezing, we will need to determine
+ * if the page is all frozen. So, if this action is set, we will also inform
+ * the caller if the page is all-visible and/or all-frozen and calculate a
+ * snapshot conflict horizon for updating the visibility map.
+ */
+#define PRUNE_DO_TRY_FREEZE (1 << 2)
+
/*
* Per-page state returned from pruning
*/
@@ -220,6 +230,20 @@ typedef struct PruneResult
* 1. Otherwise every access would need to subtract 1.
*/
int8 htsv[MaxHeapTuplesPerPage + 1];
+
+ /*
+ * Prepare to freeze in heap_page_prune(). lazy_scan_prune() will use the
+ * returned freeze plans to execute freezing.
+ */
+ HeapPageFreeze pagefrz;
+
+ /*
+ * Whether or not the page can be set all-frozen in the visibility map.
+ * This is only set if the PRUNE_DO_TRY_FREEZE action flag is set.
+ */
+ bool all_frozen;
+ int nfrozen;
+ HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
} PruneResult;
/* 'reason' codes for heap_page_prune() */
@@ -314,7 +338,6 @@ extern TM_Result heap_lock_tuple(Relation relation, ItemPointer tid,
extern void heap_inplace_update(Relation relation, HeapTuple tuple);
extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
- const struct VacuumCutoffs *cutoffs,
HeapPageFreeze *pagefrz,
HeapTupleFreeze *frz, bool *totally_frozen);
extern void heap_freeze_execute_prepared(Relation rel, Buffer buffer,
--
2.40.1
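
Putting 0006 and 0007 together, the vacuum-side flow is roughly the
following (a condensed sketch of the lazy_scan_prune() changes above;
initialization of presult.pagefrz and the conflict horizon computation are
omitted):

actions |= PRUNE_DO_TRY_FREEZE;
heap_page_prune(rel, buf, vacrel->vistest, actions,
                &presult, PRUNE_VACUUM_SCAN, &vacrel->offnum);

if (presult.pagefrz.freeze_required || presult.nfrozen == 0 ||
    (all_visible && presult.all_frozen && fpi_before != pgWalUsage.wal_fpi))
{
    /* freezing the page: adopt the "freeze" relfrozenxid/relminmxid values */
    vacrel->NewRelfrozenXid = presult.pagefrz.FreezePageRelfrozenXid;
    vacrel->NewRelminMxid = presult.pagefrz.FreezePageRelminMxid;

    if (presult.nfrozen > 0)
        heap_freeze_execute_prepared(vacrel->rel, buf,
                                     snapshotConflictHorizon,
                                     presult.frozen, presult.nfrozen);
}
else
{
    /* "no freeze" processing: the page can never be set all-frozen */
    vacrel->NewRelfrozenXid = presult.pagefrz.NoFreezePageRelfrozenXid;
    vacrel->NewRelminMxid = presult.pagefrz.NoFreezePageRelminMxid;
    presult.all_frozen = false;
    presult.nfrozen = 0;
}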
Attachment: v10-0008-Set-hastup-in-heap_page_prune.patch (text/x-diff)
From b2bd2dcb71fe017dd89d00293a19cd06275ccb26 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Fri, 29 Mar 2024 21:36:37 -0400
Subject: [PATCH v10 08/10] Set hastup in heap_page_prune
lazy_scan_prune() loops through the line pointers and tuple visibility
information for each tuple on a page, setting hastup to true if there
are any LP_REDIRECT line pointers or tuples with storage which will not
be removed. We want to remove this extra loop from lazy_scan_prune(),
and we know about non-removable tuples during heap_page_prune() anyway.
Set hastup when recording LP_REDIRECT line pointers in
heap_prune_chain() and when LP_NORMAL line pointers refer to tuples
whose visibility status is not HEAPTUPLE_DEAD.
---
src/backend/access/heap/pruneheap.c | 24 ++++++++++++++++++++++--
src/backend/access/heap/vacuumlazy.c | 17 +----------------
src/include/access/heapam.h | 8 ++++++++
3 files changed, 31 insertions(+), 18 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index be06699523..73fcc0081c 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -67,6 +67,9 @@ typedef struct
bool marked[MaxHeapTuplesPerPage + 1];
int ndeleted; /* Number of tuples deleted from the page */
+
+ /* Whether or not the page makes rel truncation unsafe */
+ bool hastup;
} PruneState;
/* Local functions */
@@ -273,6 +276,7 @@ heap_page_prune(Relation relation, Buffer buffer,
prstate.ndeleted = 0;
prstate.nchain_members = 0;
prstate.nchain_candidates = 0;
+ prstate.hastup = false;
/*
* If we will prepare to freeze tuples, consider that it might be possible
@@ -282,7 +286,7 @@ heap_page_prune(Relation relation, Buffer buffer,
presult->all_frozen = true;
else
presult->all_frozen = false;
-
+ presult->hastup = prstate.hastup;
/*
* presult->htsv is not initialized here because all ntuple spots in the
@@ -861,6 +865,8 @@ heap_prune_record_redirect(PruneState *prstate,
*/
if (was_normal)
prstate->ndeleted++;
+
+ prstate->hastup = true;
}
/* Record line pointer to be marked dead */
@@ -933,11 +939,15 @@ static void
heap_prune_record_unchanged(Page page, int8 *htsv, PruneState *prstate,
PruneResult *presult, OffsetNumber offnum)
{
- HeapTupleHeader htup = (HeapTupleHeader) PageGetItem(page, PageGetItemId(page, offnum));
+ HeapTupleHeader htup;
Assert(!prstate->marked[offnum]);
prstate->marked[offnum] = true;
+ presult->hastup = true; /* the page is not empty */
+
+ htup = (HeapTupleHeader) PageGetItem(page, PageGetItemId(page, offnum));
+
switch (htsv[offnum])
{
case HEAPTUPLE_LIVE:
@@ -1006,6 +1016,16 @@ heap_prune_record_unchanged_lp_dead(PruneState *prstate, OffsetNumber offnum)
{
Assert(!prstate->marked[offnum]);
prstate->marked[offnum] = true;
+
+ /*
+ * Deliberately don't set hastup for LP_DEAD items. We make the soft
+ * assumption that any LP_DEAD items encountered here will become
+ * LP_UNUSED later on, before count_nondeletable_pages is reached. If we
+ * don't make this assumption then rel truncation will only happen every
+ * other VACUUM, at most. Besides, VACUUM must treat
+ * hastup/nonempty_pages as provisional no matter how LP_DEAD items are
+ * handled (handled here, or handled later on).
+ */
}
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 679c6a866e..212d76045e 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1419,7 +1419,6 @@ lazy_scan_prune(LVRelState *vacrel,
int lpdead_items,
live_tuples,
recently_dead_tuples;
- bool hastup = false;
bool all_visible;
TransactionId visibility_cutoff_xid;
uint8 actions = 0;
@@ -1500,23 +1499,11 @@ lazy_scan_prune(LVRelState *vacrel,
/* Redirect items mustn't be touched */
if (ItemIdIsRedirected(itemid))
- {
- /* page makes rel truncation unsafe */
- hastup = true;
continue;
- }
if (ItemIdIsDead(itemid))
{
/*
- * Deliberately don't set hastup for LP_DEAD items. We make the
- * soft assumption that any LP_DEAD items encountered here will
- * become LP_UNUSED later on, before count_nondeletable_pages is
- * reached. If we don't make this assumption then rel truncation
- * will only happen every other VACUUM, at most. Besides, VACUUM
- * must treat hastup/nonempty_pages as provisional no matter how
- * LP_DEAD items are handled (handled here, or handled later on).
- *
* Also deliberately delay unsetting all_visible until just before
* we return to lazy_scan_heap caller, as explained in full below.
* (This is another case where it's useful to anticipate that any
@@ -1631,8 +1618,6 @@ lazy_scan_prune(LVRelState *vacrel,
elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
break;
}
-
- hastup = true; /* page makes rel truncation unsafe */
}
/*
@@ -1786,7 +1771,7 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->recently_dead_tuples += recently_dead_tuples;
/* Can't truncate this page */
- if (hastup)
+ if (presult.hastup)
vacrel->nonempty_pages = blkno + 1;
/* Did we find LP_DEAD items? */
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index a7f5f19916..2311d01998 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -231,6 +231,14 @@ typedef struct PruneResult
*/
int8 htsv[MaxHeapTuplesPerPage + 1];
+ /*
+ * Whether or not the page makes rel truncation unsafe
+ *
+ * This is set to 'true', even if the page contains LP_DEAD items. VACUUM
+ * will remove them before attempting to truncate.
+ */
+ bool hastup;
+
/*
* Prepare to freeze in heap_page_prune(). lazy_scan_prune() will use the
* returned freeze plans to execute freezing.
--
2.40.1
v10-0009-Save-dead-tuple-offsets-during-heap_page_prune.patch (text/x-diff; charset=us-ascii)
From 957cebd15f7f2c6d42c9b5dcdf28fd75f1001a75 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 30 Mar 2024 01:27:08 -0400
Subject: [PATCH v10 09/10] Save dead tuple offsets during heap_page_prune
After heap_page_prune() returned, lazy_scan_prune() looped through all
of the offsets of LP_DEAD items which it later added to
LVRelState->dead_items. Instead take care of this when marking a line
pointer LP_DEAD or when an existing non-removable LP_DEAD item is
encountered in heap_prune_chain().
Because deadoffsets are expected to be in order in
LVRelState->dead_items, sort the deadoffsets before saving them there.
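(Illustration only, not part of the patch: a minimal standalone sketch of that sort step, with uint16_t standing in for OffsetNumber and made-up offsets.)

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

static int
offset_cmp(const void *a, const void *b)
{
    uint16_t na = *(const uint16_t *) a;
    uint16_t nb = *(const uint16_t *) b;

    /* returns -1, 0, or 1, like the patch's OffsetNumber_cmp() */
    return na < nb ? -1 : na > nb;
}

int
main(void)
{
    /* offsets as pruning might record them, i.e. not in page order */
    uint16_t deadoffsets[] = {7, 3, 12, 5};
    int      lpdead_items = 4;

    qsort(deadoffsets, lpdead_items, sizeof(uint16_t), offset_cmp);

    for (int i = 0; i < lpdead_items; i++)
        printf("%d ", deadoffsets[i]);  /* prints: 3 5 7 12 */
    printf("\n");
    return 0;
}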
---
src/backend/access/heap/pruneheap.c | 17 +++++++++++++
src/backend/access/heap/vacuumlazy.c | 38 +++++++++++++++++++---------
src/include/access/heapam.h | 7 +++++
3 files changed, 50 insertions(+), 12 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 73fcc0081c..7f55e9c839 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -70,6 +70,13 @@ typedef struct
/* Whether or not the page makes rel truncation unsafe */
bool hastup;
+
+ /*
+ * LP_DEAD items on the page after pruning. Includes existing LP_DEAD
+ * items
+ */
+ int lpdead_items; /* includes existing LP_DEAD items */
+ OffsetNumber *deadoffsets; /* points directly to PruneResult->deadoffsets */
} PruneState;
/* Local functions */
@@ -277,6 +284,8 @@ heap_page_prune(Relation relation, Buffer buffer,
prstate.nchain_members = 0;
prstate.nchain_candidates = 0;
prstate.hastup = false;
+ prstate.lpdead_items = 0;
+ prstate.deadoffsets = presult->deadoffsets;
/*
* If we will prepare to freeze tuples, consider that it might be possible
@@ -570,6 +579,8 @@ heap_page_prune(Relation relation, Buffer buffer,
/* Copy data back to 'presult' */
presult->nnewlpdead = prstate.ndead;
presult->ndeleted = prstate.ndeleted;
+ presult->lpdead_items = prstate.lpdead_items;
+ /* the presult->deadoffsets array was already filled in */
}
@@ -881,6 +892,9 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
prstate->nowdead[prstate->ndead] = offnum;
prstate->ndead++;
+ /* Record the dead offset for vacuum */
+ prstate->deadoffsets[prstate->lpdead_items++] = offnum;
+
/*
* If the root entry had been a normal tuple, we are deleting it, so count
* it in the result. But changing a redirect (even to DEAD state) doesn't
@@ -1026,6 +1040,9 @@ heap_prune_record_unchanged_lp_dead(PruneState *prstate, OffsetNumber offnum)
* hastup/nonempty_pages as provisional no matter how LP_DEAD items are
* handled (handled here, or handled later on).
*/
+
+ /* Record the dead offset for vacuum */
+ prstate->deadoffsets[prstate->lpdead_items++] = offnum;
}
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 212d76045e..7f1e4db55c 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1373,6 +1373,15 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
return false;
}
+static int
+OffsetNumber_cmp(const void *a, const void *b)
+{
+ OffsetNumber na = *(const OffsetNumber *) a,
+ nb = *(const OffsetNumber *) b;
+
+ return na < nb ? -1 : na > nb;
+}
+
/*
* lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
*
@@ -1416,14 +1425,12 @@ lazy_scan_prune(LVRelState *vacrel,
maxoff;
ItemId itemid;
PruneResult presult;
- int lpdead_items,
- live_tuples,
+ int live_tuples,
recently_dead_tuples;
bool all_visible;
TransactionId visibility_cutoff_xid;
uint8 actions = 0;
int64 fpi_before = pgWalUsage.wal_fpi;
- OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
Assert(BufferGetBlockNumber(buf) == blkno);
@@ -1441,7 +1448,6 @@ lazy_scan_prune(LVRelState *vacrel,
presult.pagefrz.NoFreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
presult.pagefrz.NoFreezePageRelminMxid = vacrel->NewRelminMxid;
presult.pagefrz.cutoffs = &vacrel->cutoffs;
- lpdead_items = 0;
live_tuples = 0;
recently_dead_tuples = 0;
@@ -1509,7 +1515,6 @@ lazy_scan_prune(LVRelState *vacrel,
* (This is another case where it's useful to anticipate that any
* LP_DEAD items will become LP_UNUSED during the ongoing VACUUM.)
*/
- deadoffsets[lpdead_items++] = offnum;
continue;
}
@@ -1713,7 +1718,7 @@ lazy_scan_prune(LVRelState *vacrel,
*/
#ifdef USE_ASSERT_CHECKING
/* Note that all_frozen value does not matter when !all_visible */
- if (all_visible && lpdead_items == 0)
+ if (all_visible && presult.lpdead_items == 0)
{
TransactionId debug_cutoff;
bool debug_all_frozen;
@@ -1730,7 +1735,7 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* Now save details of the LP_DEAD items from the page in vacrel
*/
- if (lpdead_items > 0)
+ if (presult.lpdead_items > 0)
{
VacDeadItems *dead_items = vacrel->dead_items;
ItemPointerData tmp;
@@ -1739,9 +1744,18 @@ lazy_scan_prune(LVRelState *vacrel,
ItemPointerSetBlockNumber(&tmp, blkno);
- for (int i = 0; i < lpdead_items; i++)
+ /*
+ * dead_items are expected to be in order. However, deadoffsets are
+ * collected incrementally in heap_page_prune_and_freeze() as each
+ * dead line pointer is recorded, with an indeterminate order. As
+ * such, sort the deadoffsets before saving them in LVRelState.
+ */
+ qsort(presult.deadoffsets, presult.lpdead_items, sizeof(OffsetNumber),
+ OffsetNumber_cmp);
+
+ for (int i = 0; i < presult.lpdead_items; i++)
{
- ItemPointerSetOffsetNumber(&tmp, deadoffsets[i]);
+ ItemPointerSetOffsetNumber(&tmp, presult.deadoffsets[i]);
dead_items->items[dead_items->num_items++] = tmp;
}
@@ -1766,7 +1780,7 @@ lazy_scan_prune(LVRelState *vacrel,
/* Finally, add page-local counts to whole-VACUUM counts */
vacrel->tuples_deleted += presult.ndeleted;
vacrel->tuples_frozen += presult.nfrozen;
- vacrel->lpdead_items += lpdead_items;
+ vacrel->lpdead_items += presult.lpdead_items;
vacrel->live_tuples += live_tuples;
vacrel->recently_dead_tuples += recently_dead_tuples;
@@ -1775,7 +1789,7 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->nonempty_pages = blkno + 1;
/* Did we find LP_DEAD items? */
- *has_lpdead_items = (lpdead_items > 0);
+ *has_lpdead_items = (presult.lpdead_items > 0);
Assert(!all_visible || !(*has_lpdead_items));
@@ -1843,7 +1857,7 @@ lazy_scan_prune(LVRelState *vacrel,
* There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
* however.
*/
- else if (lpdead_items > 0 && PageIsAllVisible(page))
+ else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
{
elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
vacrel->relname, blkno);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 2311d01998..e346312471 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -252,6 +252,13 @@ typedef struct PruneResult
bool all_frozen;
int nfrozen;
HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
+
+ /*
+ * LP_DEAD items on the page after pruning. Includes existing LP_DEAD
+ * items
+ */
+ int lpdead_items;
+ OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
} PruneResult;
/* 'reason' codes for heap_page_prune() */
--
2.40.1
v10-0010-Combine-freezing-and-pruning.patch (text/x-diff; charset=us-ascii)
From ed6a28c6832c29b7a7831dc5a30366d2fb67f052 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 30 Mar 2024 01:38:01 -0400
Subject: [PATCH v10 10/10] Combine freezing and pruning
Execute both freezing and pruning of tuples and emit a single WAL record
containing all changes.
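(Illustration only: schematically, the new flow is one planning pass over the page, then a single critical section that applies pruning and freezing and emits at most one combined WAL record. The sketch below is a compilable stand-in -- none of these functions are PostgreSQL APIs, they are empty placeholders.)

#include <stdbool.h>
#include <stdio.h>

static void start_crit_section(void) {}
static void end_crit_section(void) {}
static void apply_prune(void) { printf("apply planned item changes\n"); }
static void apply_freeze(void) { printf("execute prepared freeze plans\n"); }
static void mark_buffer_dirty(void) {}
static void emit_prune_freeze_record(void) { printf("emit one combined WAL record\n"); }

int
main(void)
{
    /* hypothetical outcome of the planning pass over the page */
    bool do_prune = true;       /* redirects/dead/unused items planned */
    bool do_freeze = true;      /* freeze plans prepared and worth using */
    bool needs_wal = true;

    start_crit_section();
    if (do_prune || do_freeze)
    {
        if (do_prune)
            apply_prune();
        if (do_freeze)
            apply_freeze();
        mark_buffer_dirty();
        if (needs_wal)
            emit_prune_freeze_record();     /* prune + freeze in one record */
    }
    end_crit_section();
    return 0;
}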
---
src/backend/access/heap/heapam.c | 76 +--
src/backend/access/heap/heapam_handler.c | 2 +-
src/backend/access/heap/pruneheap.c | 712 ++++++++++++++++++-----
src/backend/access/heap/vacuumlazy.c | 352 ++---------
src/include/access/heapam.h | 75 +--
src/tools/pgindent/typedefs.list | 2 +-
6 files changed, 680 insertions(+), 539 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index f8fddce03b..e07c959abe 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6125,9 +6125,9 @@ heap_inplace_update(Relation relation, HeapTuple tuple)
*/
static TransactionId
FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
- const struct VacuumCutoffs *cutoffs, uint16 *flags,
- HeapPageFreeze *pagefrz)
+ uint16 *flags, HeapPageFreeze *pagefrz)
{
+ const struct VacuumCutoffs *cutoffs = pagefrz->cutoffs;
TransactionId newxmax;
MultiXactMember *members;
int nmembers;
@@ -6445,9 +6445,9 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
* XIDs or MultiXactIds that will need to be processed by a future VACUUM.
*
* VACUUM caller must assemble HeapTupleFreeze freeze plan entries for every
- * tuple that we returned true for, and call heap_freeze_execute_prepared to
- * execute freezing. Caller must initialize pagefrz fields for page as a
- * whole before first call here for each heap page.
+ * tuple that we returned true for, and then execute freezing. Caller must
+ * initialize pagefrz fields for page as a whole before first call here for
+ * each heap page.
*
* VACUUM caller decides on whether or not to freeze the page as a whole.
* We'll often prepare freeze plans for a page that caller just discards.
@@ -6550,8 +6550,7 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
* perform no-op xmax processing. The only constraint is that the
* FreezeLimit/MultiXactCutoff postcondition must never be violated.
*/
- newxmax = FreezeMultiXactId(xid, tuple->t_infomask, cutoffs,
- &flags, pagefrz);
+ newxmax = FreezeMultiXactId(xid, tuple->t_infomask, &flags, pagefrz);
if (flags & FRM_NOOP)
{
@@ -6729,7 +6728,7 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
* Does this tuple force caller to freeze the entire page?
*/
pagefrz->freeze_required =
- heap_tuple_should_freeze(tuple, cutoffs,
+ heap_tuple_should_freeze(tuple, pagefrz->cutoffs,
&pagefrz->NoFreezePageRelfrozenXid,
&pagefrz->NoFreezePageRelminMxid);
}
@@ -6763,35 +6762,19 @@ heap_execute_freeze_tuple(HeapTupleHeader tuple, HeapTupleFreeze *frz)
}
/*
- * heap_freeze_execute_prepared
- *
- * Executes freezing of one or more heap tuples on a page on behalf of caller.
- * Caller passes an array of tuple plans from heap_prepare_freeze_tuple.
- * Caller must set 'offset' in each plan for us. Note that we destructively
- * sort caller's tuples array in-place, so caller had better be done with it.
- *
- * WAL-logs the changes so that VACUUM can advance the rel's relfrozenxid
- * later on without any risk of unsafe pg_xact lookups, even following a hard
- * crash (or when querying from a standby). We represent freezing by setting
- * infomask bits in tuple headers, but this shouldn't be thought of as a hint.
- * See section on buffer access rules in src/backend/storage/buffer/README.
+ * Perform xmin/xmax XID status sanity checks before actually executing freeze
+ * plans.
+ *
+ * heap_prepare_freeze_tuple doesn't perform these checks directly because
+ * pg_xact lookups are relatively expensive. They shouldn't be repeated
+ * by successive VACUUMs that each decide against freezing the same page.
*/
void
-heap_freeze_execute_prepared(Relation rel, Buffer buffer,
- TransactionId snapshotConflictHorizon,
- HeapTupleFreeze *tuples, int ntuples)
+heap_pre_freeze_checks(Buffer buffer,
+ HeapTupleFreeze *tuples, int ntuples)
{
Page page = BufferGetPage(buffer);
- Assert(ntuples > 0);
-
- /*
- * Perform xmin/xmax XID status sanity checks before critical section.
- *
- * heap_prepare_freeze_tuple doesn't perform these checks directly because
- * pg_xact lookups are relatively expensive. They shouldn't be repeated
- * by successive VACUUMs that each decide against freezing the same page.
- */
for (int i = 0; i < ntuples; i++)
{
HeapTupleFreeze *frz = tuples + i;
@@ -6830,8 +6813,19 @@ heap_freeze_execute_prepared(Relation rel, Buffer buffer,
xmax)));
}
}
+}
- START_CRIT_SECTION();
+/*
+ * Helper which executes freezing of one or more heap tuples on a page on
+ * behalf of caller. Caller passes an array of tuple plans from
+ * heap_prepare_freeze_tuple. Caller must set 'offset' in each plan for us.
+ * Must be called in a critical section that also marks the buffer dirty and,
+ * if needed, emits WAL.
+ */
+void
+heap_freeze_prepared_tuples(Buffer buffer, HeapTupleFreeze *tuples, int ntuples)
+{
+ Page page = BufferGetPage(buffer);
for (int i = 0; i < ntuples; i++)
{
@@ -6842,22 +6836,6 @@ heap_freeze_execute_prepared(Relation rel, Buffer buffer,
htup = (HeapTupleHeader) PageGetItem(page, itemid);
heap_execute_freeze_tuple(htup, frz);
}
-
- MarkBufferDirty(buffer);
-
- /* Now WAL-log freezing if necessary */
- if (RelationNeedsWAL(rel))
- {
- log_heap_prune_and_freeze(rel, buffer, snapshotConflictHorizon,
- false, /* no cleanup lock required */
- PRUNE_VACUUM_SCAN,
- tuples, ntuples,
- NULL, 0, /* redirected */
- NULL, 0, /* dead */
- NULL, 0); /* unused */
- }
-
- END_CRIT_SECTION();
}
/*
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 6abfe36dec..a793c0f56e 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -1106,7 +1106,7 @@ heapam_scan_analyze_next_tuple(TableScanDesc scan, TransactionId OldestXmin,
* We ignore unused and redirect line pointers. DEAD line pointers
* should be counted as dead, because we need vacuum to run to get rid
* of them. Note that this rule agrees with the way that
- * heap_page_prune() counts things.
+ * heap_page_prune_and_freeze() counts things.
*/
if (!ItemIdIsNormal(itemid))
{
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 7f55e9c839..4059e6d0c2 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -21,13 +21,15 @@
#include "access/transam.h"
#include "access/xlog.h"
#include "access/xloginsert.h"
+#include "commands/vacuum.h"
+#include "executor/instrument.h"
#include "miscadmin.h"
#include "pgstat.h"
#include "storage/bufmgr.h"
#include "utils/rel.h"
#include "utils/snapmgr.h"
-/* Working data for heap_page_prune and subroutines */
+/* Working data for heap_page_prune_and_freeze() and subroutines */
typedef struct
{
/* PRUNE_DO_* arguments */
@@ -36,38 +38,56 @@ typedef struct
/* tuple visibility test, initialized for the relation */
GlobalVisState *vistest;
- TransactionId new_prune_xid; /* new prune hint value for page */
- TransactionId snapshotConflictHorizon; /* latest xid removed */
+ /*
+ * Fields describing what to do to the page
+ */
+ TransactionId new_prune_xid; /* new prune hint value */
+ TransactionId latest_xid_removed;
int nredirected; /* numbers of entries in arrays below */
int ndead;
int nunused;
+ int nfrozen;
/* arrays that accumulate indexes of items to be changed */
OffsetNumber redirected[MaxHeapTuplesPerPage * 2];
OffsetNumber nowdead[MaxHeapTuplesPerPage];
OffsetNumber nowunused[MaxHeapTuplesPerPage];
+ HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
+
+ HeapPageFreeze pagefrz;
/*
- * Chain candidates contains indexes of all LP_NORMAL and LP_REDIRECT
- * items. The first partition are the indexes of the LP_NORMAL and
- * LP_REDIRECT items we know to be part of a chain. The second partition
- * are the indexes of HOT tuples that may or may not be part of a HOT
- * chain. Those which are part of a HOT chain will be visited and marked
- * by heap_prune_chain() and the others will be processed afterward.
+ * marked[i] is true when heap_prune_chain() has already processed item i.
+ *
+ * This needs to be MaxHeapTuplesPerPage + 1 long as FirstOffsetNumber is
+ * 1. Otherwise every access would need to subtract 1.
*/
- int nchain_members;
- int nchain_candidates;
- OffsetNumber chain_candidates[MaxHeapTuplesPerPage];
+ bool marked[MaxHeapTuplesPerPage + 1];
/*
- * marked[i] is true if item i is entered in one of the above arrays.
+ * Tuple visibility is only computed once for each tuple, for correctness
+ * and efficiency reasons; see comment in heap_page_prune_and_freeze() for
+ * details. This is of type int8[], instead of HTSV_Result[], so we can
+ * use -1 to indicate no visibility has been computed, e.g. for LP_DEAD
+ * items.
*
* This needs to be MaxHeapTuplesPerPage + 1 long as FirstOffsetNumber is
* 1. Otherwise every access would need to subtract 1.
*/
- bool marked[MaxHeapTuplesPerPage + 1];
+ int8 htsv[MaxHeapTuplesPerPage + 1];
+
+ /*
+ * The rest of the fields are not used by pruning itself, but are used to
+ * collect information about what was pruned and what state the page is in
+ * after pruning, for the benefit of the caller. They are copied to
+ * PruneFreezeResult at the end.
+ */
int ndeleted; /* Number of tuples deleted from the page */
+ /* Number of live and recently dead tuples, after pruning */
+ int live_tuples;
+ int recently_dead_tuples;
+
/* Whether or not the page makes rel truncation unsafe */
bool hastup;
@@ -77,24 +97,59 @@ typedef struct
*/
int lpdead_items; /* includes existing LP_DEAD items */
OffsetNumber *deadoffsets; /* points directly to PruneResult->deadoffsets */
+
+ /*
+ * all_visible and all_frozen indicate if the all-visible and all-frozen
+ * bits in the visibility map can be set for this page, after pruning.
+ *
+ * visibility_cutoff_xid is the newest xmin of live tuples on the page.
+ * The caller can use it as the conflict horizon, when setting the VM
+ * bits. It is only valid if we froze some tuples, and all_frozen is
+ * true.
+ *
+ * These are only set if the PRUNE_DO_TRY_FREEZE action flag is set.
+ *
+ * NOTE: This 'all_visible' doesn't include LP_DEAD items. That's
+ * convenient for heap_page_prune_and_freeze(), to use this to decide
+ * whether to freeze the page or not. The 'all_visible' value returned to
+ * the caller is adjusted to include LP_DEAD items at the end.
+ */
+ bool all_visible;
+ bool all_frozen;
+ TransactionId visibility_cutoff_xid;
+
+ /*
+ * Chain candidates contains indexes of all LP_NORMAL and LP_REDIRECT
+ * items. The first partition are the indexes of the LP_NORMAL and
+ * LP_REDIRECT items we know to be part of a chain. The second partition
+ * are the indexes of HOT tuples that may or may not be part of a HOT
+ * chain. Those which are part of a HOT chain will be visited and marked
+ * by heap_prune_chain() and the others will be processed afterward.
+ */
+ int nchain_members;
+ int nchain_candidates;
+ OffsetNumber chain_candidates[MaxHeapTuplesPerPage];
} PruneState;
/* Local functions */
static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
HeapTuple tup,
Buffer buffer);
-static void heap_prune_chain(Buffer buffer,
- OffsetNumber rootoffnum,
- int8 *htsv,
- PruneState *prstate, PruneResult *presult);
+static inline HTSV_Result htsv_get_valid_status(int status);
+static void heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
+ PruneState *prstate);
+
static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
static void heap_prune_record_redirect(PruneState *prstate,
- OffsetNumber offnum, OffsetNumber rdoffnum, bool was_normal);
-static void heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum, bool was_normal);
-static void heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum, bool was_normal);
+ OffsetNumber offnum, OffsetNumber rdoffnum,
+ bool was_normal);
+static void heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
+ bool was_normal);
+static void heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
+ bool was_normal);
static void heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum, bool was_normal);
-static void heap_prune_record_unchanged(Page page, int8 *htsv, PruneState *prstate, PruneResult *presult, OffsetNumber offnum);
+static void heap_prune_record_unchanged(Page page, PruneState *prstate, OffsetNumber offnum);
static void heap_prune_record_unchanged_lp_dead(PruneState *prstate, OffsetNumber offnum);
static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetNumber offnum);
@@ -176,14 +231,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
*/
if (PageIsFull(page) || PageGetHeapFreeSpace(page) < minfree)
{
- PruneResult presult;
-
- presult.pagefrz.freeze_required = false;
- presult.pagefrz.FreezePageRelfrozenXid = InvalidTransactionId;
- presult.pagefrz.FreezePageRelminMxid = InvalidMultiXactId;
- presult.pagefrz.NoFreezePageRelfrozenXid = InvalidTransactionId;
- presult.pagefrz.NoFreezePageRelminMxid = InvalidMultiXactId;
- presult.pagefrz.cutoffs = NULL;
+ PruneFreezeResult presult;
/*
* For now, do not set PRUNE_DO_MARK_UNUSED_NOW regardless of
@@ -191,8 +239,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* determine that during on-access pruning with the current
* implementation.
*/
- heap_page_prune(relation, buffer, vistest, 0,
- &presult, PRUNE_ON_ACCESS, NULL);
+ heap_page_prune_and_freeze(relation, buffer, 0, vistest,
+ NULL, &presult, PRUNE_ON_ACCESS, NULL, NULL, NULL);
/*
* Report the number of tuples reclaimed to pgstats. This is
@@ -226,35 +274,52 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
/*
- * Prune and repair fragmentation in the specified page.
+ * Prune and repair fragmentation and potentially freeze tuples on the
+ * specified page.
+ *
+ * If the page can be marked all-frozen in the visibility map, we may
+ * opportunistically freeze tuples on the page if either its tuples are old
+ * enough or freezing will be cheap enough.
*
* Caller must have pin and buffer cleanup lock on the page. Note that we
* don't update the FSM information for page on caller's behalf. Caller might
* also need to account for a reduction in the length of the line pointer
* array following array truncation by us.
*
+ * actions are the pruning actions that heap_page_prune_and_freeze() should
+ * take.
+ *
* vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
* (see heap_prune_satisfies_vacuum).
*
- * actions are the pruning actions that heap_page_prune() should take.
+ * cutoffs contains the information on visibility for the whole relation
+ * collected by vacuum at the beginning of vacuuming the relation. It will be
+ * NULL for callers other than vacuum.
*
* presult contains output parameters needed by callers such as the number of
* tuples removed and the number of line pointers newly marked LP_DEAD.
- * heap_page_prune() is responsible for initializing it.
+ * heap_page_prune_and_freeze() is responsible for initializing it.
*
* reason indicates why the pruning is performed. It is included in the WAL
* record for debugging and analysis purposes, but otherwise has no effect.
*
* off_loc is the offset location required by the caller to use in error
* callback.
+ *
+ * new_relfrozen_xid and new_relmin_mxid are provided by the caller if they
+ * would like the current values of those updated as part of advancing
+ * relfrozenxid/relminmxid.
*/
void
-heap_page_prune(Relation relation, Buffer buffer,
- GlobalVisState *vistest,
- uint8 actions,
- PruneResult *presult,
- PruneReason reason,
- OffsetNumber *off_loc)
+heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ uint8 actions,
+ GlobalVisState *vistest,
+ struct VacuumCutoffs *cutoffs,
+ PruneFreezeResult *presult,
+ PruneReason reason,
+ OffsetNumber *off_loc,
+ TransactionId *new_relfrozen_xid,
+ MultiXactId *new_relmin_mxid)
{
Page page = BufferGetPage(buffer);
BlockNumber blockno = BufferGetBlockNumber(buffer);
@@ -262,6 +327,41 @@ heap_page_prune(Relation relation, Buffer buffer,
maxoff;
PruneState prstate;
HeapTupleData tup;
+ bool do_freeze;
+ bool do_prune;
+ bool do_hint;
+ bool hint_bit_fpi;
+ int64 fpi_before = pgWalUsage.wal_fpi;
+
+ /*
+ * pagefrz contains visibility cutoff information and the current
+ * relfrozenxid and relminmxids used if the caller is interested in
+ * freezing tuples on the page.
+ */
+ prstate.pagefrz.cutoffs = cutoffs;
+ prstate.pagefrz.freeze_required = false;
+
+ if (new_relmin_mxid)
+ {
+ prstate.pagefrz.FreezePageRelminMxid = *new_relmin_mxid;
+ prstate.pagefrz.NoFreezePageRelminMxid = *new_relmin_mxid;
+ }
+ else
+ {
+ prstate.pagefrz.FreezePageRelminMxid = InvalidMultiXactId;
+ prstate.pagefrz.NoFreezePageRelminMxid = InvalidMultiXactId;
+ }
+
+ if (new_relfrozen_xid)
+ {
+ prstate.pagefrz.FreezePageRelfrozenXid = *new_relfrozen_xid;
+ prstate.pagefrz.NoFreezePageRelfrozenXid = *new_relfrozen_xid;
+ }
+ else
+ {
+ prstate.pagefrz.FreezePageRelfrozenXid = InvalidTransactionId;
+ prstate.pagefrz.NoFreezePageRelfrozenXid = InvalidTransactionId;
+ }
/*
* Our strategy is to scan the page and make lists of items to change,
@@ -277,38 +377,73 @@ heap_page_prune(Relation relation, Buffer buffer,
prstate.new_prune_xid = InvalidTransactionId;
prstate.vistest = vistest;
prstate.actions = actions;
- prstate.snapshotConflictHorizon = InvalidTransactionId;
- prstate.nredirected = prstate.ndead = prstate.nunused = 0;
+ prstate.latest_xid_removed = InvalidTransactionId;
+ prstate.nredirected = prstate.ndead = prstate.nunused = prstate.nfrozen = 0;
memset(prstate.marked, 0, sizeof(prstate.marked));
+
+ /*
+ * prstate.htsv is not initialized here because all ntuple spots in the
+ * array will be set either to a valid HTSV_Result value or -1.
+ */
+
prstate.ndeleted = 0;
- prstate.nchain_members = 0;
- prstate.nchain_candidates = 0;
prstate.hastup = false;
+ prstate.live_tuples = 0;
+ prstate.recently_dead_tuples = 0;
prstate.lpdead_items = 0;
prstate.deadoffsets = presult->deadoffsets;
/*
- * If we will prepare to freeze tuples, consider that it might be possible
- * to set the page all-frozen in the visibility map.
+ * Caller may update the VM after we're done. We keep track of whether
+ * the page will be all_visible and all_frozen, once we're done with the
+ * pruning and freezing, to help the caller to do that.
+ *
+ * Currently, only VACUUM sets the VM bits. To save the effort, only do
+ * the bookkeeping if the caller needs it. Currently, that's tied to
+ * PRUNE_DO_TRY_FREEZE, but it could be a separate flag, if you wanted to
+ * update the VM bits without also freezing, or freezing without setting
+ * the VM bits.
+ *
+ * In addition to telling the caller whether it can set the VM bit, we
+ * also use 'all_visible' and 'all_frozen' for our own decision-making. If
+ * the whole page will become frozen, we consider opportunistically
+ * freezing tuples. We will not be able to freeze the whole page if there
+ * are tuples present which are not visible to everyone or if there are
+ * dead tuples which are not yet removable. However, dead tuples which
+ * will be removed by the end of vacuuming should not preclude us from
+ * opportunistically freezing. Because of that, we do not clear
+ * all_visible when we see LP_DEAD items. We fix that at the end of the
+ * function, when we return the value to the caller, so that the caller
+ * doesn't set the VM bit incorrectly.
*/
if (prstate.actions & PRUNE_DO_TRY_FREEZE)
- presult->all_frozen = true;
+ {
+ prstate.all_visible = true;
+ prstate.all_frozen = true;
+ }
else
- presult->all_frozen = false;
- presult->hastup = prstate.hastup;
+ {
+ prstate.all_visible = false;
+ prstate.all_frozen = false;
+ }
/*
- * presult->htsv is not initialized here because all ntuple spots in the
- * array will be set either to a valid HTSV_Result value or -1.
+ * The visibility cutoff xid is the newest xmin of live tuples on the
+ * page. In the common case, this will be set as the conflict horizon the
+ * caller can use for updating the VM. If, at the end of freezing and
+ * pruning, the page is all-frozen, there is no possibility that any
+ * running transaction on the standby does not see tuples on the page as
+ * all-visible, so the conflict horizon remains InvalidTransactionId.
*/
- presult->ndeleted = 0;
- presult->nnewlpdead = 0;
+ prstate.visibility_cutoff_xid = InvalidTransactionId;
- presult->nfrozen = 0;
+ prstate.nchain_members = 0;
+ prstate.nchain_candidates = 0;
maxoff = PageGetMaxOffsetNumber(page);
tup.t_tableOid = RelationGetRelid(relation);
+
/*
* Determine HTSV for all tuples.
*
@@ -336,7 +471,7 @@ heap_page_prune(Relation relation, Buffer buffer,
ItemId itemid = PageGetItemId(page, offnum);
HeapTupleHeader htup;
- presult->htsv[offnum] = -1;
+ prstate.htsv[offnum] = -1;
/* Nothing to do if slot doesn't contain a tuple */
if (!ItemIdIsUsed(itemid))
@@ -386,8 +521,8 @@ heap_page_prune(Relation relation, Buffer buffer,
tup.t_len = ItemIdGetLength(itemid);
ItemPointerSet(&tup.t_self, blockno, offnum);
- presult->htsv[offnum] = heap_prune_satisfies_vacuum(&prstate, &tup,
- buffer);
+ prstate.htsv[offnum] = heap_prune_satisfies_vacuum(&prstate, &tup,
+ buffer);
if (!HeapTupleHeaderIsHeapOnly(htup))
{
@@ -404,6 +539,12 @@ heap_page_prune(Relation relation, Buffer buffer,
prstate.chain_candidates[prstate.nchain_candidates++] = offnum;
}
+ /*
+ * If checksums are enabled, heap_prune_satisfies_vacuum() may have caused
+ * an FPI to be emitted.
+ */
+ hint_bit_fpi = fpi_before != pgWalUsage.wal_fpi;
+
/* Process HOT chains */
for (int i = 0; i < prstate.nchain_members; i++)
{
@@ -417,7 +558,7 @@ heap_page_prune(Relation relation, Buffer buffer,
*off_loc = offnum;
/* Process this item or chain of items */
- heap_prune_chain(buffer, offnum, presult->htsv, &prstate, presult);
+ heap_prune_chain(buffer, offnum, &prstate);
}
/*
@@ -436,7 +577,7 @@ heap_page_prune(Relation relation, Buffer buffer,
if (off_loc)
*off_loc = offnum;
- if (presult->htsv[offnum] == HEAPTUPLE_DEAD)
+ if (prstate.htsv[offnum] == HEAPTUPLE_DEAD)
{
ItemId itemid = PageGetItemId(page, offnum);
HeapTupleHeader htup = (HeapTupleHeader) PageGetItem(page, itemid);
@@ -475,7 +616,7 @@ heap_page_prune(Relation relation, Buffer buffer,
if (!HeapTupleHeaderIsHotUpdated(htup))
{
HeapTupleHeaderAdvanceConflictHorizon(htup,
- &prstate.snapshotConflictHorizon);
+ &prstate.latest_xid_removed);
heap_prune_record_unused(&prstate, offnum, true);
continue;
}
@@ -487,7 +628,7 @@ heap_page_prune(Relation relation, Buffer buffer,
* marked by heap_prune_chain() and heap_prune_record_unchanged() will
* return immediately.
*/
- heap_prune_record_unchanged(page, presult->htsv, &prstate, presult, offnum);
+ heap_prune_record_unchanged(page, &prstate, offnum);
}
/* We should now have processed every tuple exactly once */
@@ -510,21 +651,80 @@ heap_page_prune(Relation relation, Buffer buffer,
if (off_loc)
*off_loc = InvalidOffsetNumber;
- /* Any error while applying the changes is critical */
- START_CRIT_SECTION();
+ do_prune = prstate.nredirected > 0 ||
+ prstate.ndead > 0 ||
+ prstate.nunused > 0;
+
+ /*
+ * Even if we don't prune anything, if we found a new value for the
+ * pd_prune_xid field or the page was marked full, we will update the hint
+ * bit.
+ */
+ do_hint = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
+ PageIsFull(page);
+
+ /*
+ * Freeze the page when heap_prepare_freeze_tuple indicates that at least
+ * one XID/MXID from before FreezeLimit/MultiXactCutoff is present. Also
+ * freeze when pruning generated an FPI, if doing so means that we set the
+ * page all-frozen afterwards (might not happen until final heap pass).
+ * XXX: Previously, we knew if pruning emitted an FPI by checking
+ * pgWalUsage.wal_fpi before and after pruning. Once the freeze and prune
+ * records are combined, this heuristic couldn't be used anymore. The
+ * opportunistic freeze heuristic must be improved; however, for now, try
+ * to approximate it.
+ */
+ do_freeze = false;
+ if (prstate.actions & PRUNE_DO_TRY_FREEZE)
+ {
+ /* Is the whole page freezable? And is there something to freeze? */
+ bool whole_page_freezable = prstate.all_visible &&
+ prstate.all_frozen;
+
+ if (prstate.pagefrz.freeze_required)
+ do_freeze = true;
+ else if (whole_page_freezable && prstate.nfrozen > 0)
+ {
+ /*
+ * Freezing would make the page all-frozen. In this case, we will
+ * freeze if we have already emitted an FPI or will do so anyway.
+ * Be sure only to incur the overhead of checking if we will do an
+ * FPI if we may use that information.
+ */
+ if (hint_bit_fpi ||
+ ((do_prune || do_hint) && XLogCheckBufferNeedsBackup(buffer)))
+ {
+ do_freeze = true;
+ }
+ }
+ }
- /* Have we found any prunable items? */
- if (prstate.nredirected > 0 || prstate.ndead > 0 || prstate.nunused > 0)
+ /*
+ * Validate the tuples we are considering freezing. We do this even if
+ * pruning and hint bit setting have not emitted an FPI so far because we
+ * still may emit an FPI while setting the page hint bit later. But we
+ * want to avoid doing the pre-freeze checks in a critical section.
+ */
+ if (do_freeze)
+ heap_pre_freeze_checks(buffer, prstate.frozen, prstate.nfrozen);
+ else if (!prstate.all_frozen || prstate.nfrozen > 0)
{
+ Assert(!prstate.pagefrz.freeze_required);
+
/*
- * Apply the planned item changes, then repair page fragmentation, and
- * update the page's hint bit about whether it has free line pointers.
+ * If we will neither freeze tuples on the page nor set the page all
+ * frozen in the visibility map, the page is not all-frozen and there
+ * will be no newly frozen tuples.
*/
- heap_page_prune_execute(buffer, false,
- prstate.redirected, prstate.nredirected,
- prstate.nowdead, prstate.ndead,
- prstate.nowunused, prstate.nunused);
+ prstate.all_frozen = false;
+ prstate.nfrozen = 0; /* avoid miscounts in instrumentation */
+ }
+
+ /* Any error while applying the changes is critical */
+ START_CRIT_SECTION();
+ if (do_hint)
+ {
/*
* Update the page's pd_prune_xid field to either zero, or the lowest
* XID of any soon-prunable tuple.
@@ -532,12 +732,35 @@ heap_page_prune(Relation relation, Buffer buffer,
((PageHeader) page)->pd_prune_xid = prstate.new_prune_xid;
/*
- * Also clear the "page is full" flag, since there's no point in
- * repeating the prune/defrag process until something else happens to
- * the page.
+ * Clear the "page is full" flag if it is set since there's no point
+ * in repeating the prune/defrag process until something else happens
+ * to the page.
*/
PageClearFull(page);
+ /*
+ * We only needed to update pd_prune_xid and clear the page-is-full
+ * hint bit, this is a non-WAL-logged hint. If we will also freeze or
+ * prune the page, we will mark the buffer dirty below.
+ */
+ if (!do_freeze && !do_prune)
+ MarkBufferDirtyHint(buffer, true);
+ }
+
+ if (do_prune || do_freeze)
+ {
+ /* Apply the planned item changes, then repair page fragmentation. */
+ if (do_prune)
+ {
+ heap_page_prune_execute(buffer, false,
+ prstate.redirected, prstate.nredirected,
+ prstate.nowdead, prstate.ndead,
+ prstate.nowunused, prstate.nunused);
+ }
+
+ if (do_freeze)
+ heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
+
MarkBufferDirty(buffer);
/*
@@ -545,42 +768,123 @@ heap_page_prune(Relation relation, Buffer buffer,
*/
if (RelationNeedsWAL(relation))
{
+ /*
+ * The snapshotConflictHorizon for the whole record should be the
+ * most conservative of all the horizons calculated for any of the
+ * possible modifications. If this record will prune tuples, any
+ * transactions on the standby older than the youngest xmax of the
+ * most recently removed tuple this record will prune will
+ * conflict. If this record will freeze tuples, any transactions
+ * on the standby with xids older than the youngest tuple this
+ * record will freeze will conflict.
+ */
+ TransactionId frz_conflict_horizon = InvalidTransactionId;
+ TransactionId conflict_xid;
+
+ /*
+ * We can use the visibility_cutoff_xid as our cutoff for
+ * conflicts when the whole page is eligible to become all-frozen
+ * in the VM once we're done with it. Otherwise we generate a
+ * conservative cutoff by stepping back from OldestXmin.
+ */
+ if (do_freeze)
+ {
+ if (prstate.all_visible && prstate.all_frozen)
+ frz_conflict_horizon = prstate.visibility_cutoff_xid;
+ else
+ {
+ /* Avoids false conflicts when hot_standby_feedback in use */
+ frz_conflict_horizon = prstate.pagefrz.cutoffs->OldestXmin;
+ TransactionIdRetreat(frz_conflict_horizon);
+ }
+ }
+
+ if (TransactionIdFollows(frz_conflict_horizon, prstate.latest_xid_removed))
+ conflict_xid = frz_conflict_horizon;
+ else
+ conflict_xid = prstate.latest_xid_removed;
+
log_heap_prune_and_freeze(relation, buffer,
- prstate.snapshotConflictHorizon,
+ conflict_xid,
true, reason,
- NULL, 0,
+ prstate.frozen, prstate.nfrozen,
prstate.redirected, prstate.nredirected,
prstate.nowdead, prstate.ndead,
prstate.nowunused, prstate.nunused);
}
}
- else
- {
- /*
- * If we didn't prune anything, but have found a new value for the
- * pd_prune_xid field, update it and mark the buffer dirty. This is
- * treated as a non-WAL-logged hint.
- *
- * Also clear the "page is full" flag if it is set, since there's no
- * point in repeating the prune/defrag process until something else
- * happens to the page.
- */
- if (((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
- PageIsFull(page))
- {
- ((PageHeader) page)->pd_prune_xid = prstate.new_prune_xid;
- PageClearFull(page);
- MarkBufferDirtyHint(buffer, true);
- }
- }
END_CRIT_SECTION();
/* Copy data back to 'presult' */
- presult->nnewlpdead = prstate.ndead;
presult->ndeleted = prstate.ndeleted;
+ presult->nnewlpdead = prstate.ndead;
+ presult->nfrozen = prstate.nfrozen;
+ presult->live_tuples = prstate.live_tuples;
+ presult->recently_dead_tuples = prstate.recently_dead_tuples;
+
+ /*
+ * It was convenient to ignore LP_DEAD items in all_visible earlier on to
+ * make the choice of whether or not to freeze the page unaffected by the
+ * short-term presence of LP_DEAD items. These LP_DEAD items were
+ * effectively assumed to be LP_UNUSED items in the making. It doesn't
+ * matter which heap pass (initial pass or final pass) ends up setting the
+ * page all-frozen, as long as the ongoing VACUUM does it.
+ *
+ * Now that freezing has been finalized, unset all_visible if there are
+ * any LP_DEAD items on the page. It needs to reflect the present state
+ * of things, as expected by our caller.
+ */
+ if (prstate.lpdead_items == 0)
+ {
+ presult->all_visible = prstate.all_visible;
+ presult->all_frozen = prstate.all_frozen;
+ }
+ else
+ {
+ presult->all_visible = false;
+ presult->all_frozen = false;
+ }
+ presult->hastup = prstate.hastup;
+
+ /*
+ * For callers planning to update the visibility map, the conflict horizon
+ * for that record must be the newest xmin on the page. However, if the
+ * page is completely frozen, there can be no conflict and the
+ * vm_conflict_horizon should remain InvalidTransactionId. This includes
+ * the case that we just froze all the tuples; the prune-freeze record
+ * included the conflict XID already so the caller doesn't need it.
+ */
+ if (!presult->all_frozen)
+ presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
+ else
+ presult->vm_conflict_horizon = InvalidTransactionId;
+
presult->lpdead_items = prstate.lpdead_items;
/* the presult->deadoffsets array was already filled in */
+
+ /*
+ * If we will freeze tuples on the page or, even if we don't freeze tuples
+ * on the page, if we will set the page all-frozen in the visibility map,
+ * we can advance relfrozenxid and relminmxid to the values in
+ * pagefrz->FreezePageRelfrozenXid and pagefrz->FreezePageRelminMxid.
+ */
+ Assert(presult->nfrozen > 0 || !prstate.pagefrz.freeze_required);
+
+ if (new_relfrozen_xid)
+ {
+ if (presult->nfrozen > 0)
+ *new_relfrozen_xid = prstate.pagefrz.FreezePageRelfrozenXid;
+ else
+ *new_relfrozen_xid = prstate.pagefrz.NoFreezePageRelfrozenXid;
+ }
+ if (new_relmin_mxid)
+ {
+ if (presult->nfrozen > 0)
+ *new_relmin_mxid = prstate.pagefrz.FreezePageRelminMxid;
+ else
+ *new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
+ }
}
@@ -605,10 +909,24 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
}
+/*
+ * Pruning calculates tuple visibility once and saves the results in an array
+ * of int8. See PruneState.htsv for details. This helper function is meant to
+ * guard against examining visibility status array members which have not yet
+ * been computed.
+ */
+static inline HTSV_Result
+htsv_get_valid_status(int status)
+{
+ Assert(status >= HEAPTUPLE_DEAD &&
+ status <= HEAPTUPLE_DELETE_IN_PROGRESS);
+ return (HTSV_Result) status;
+}
+
/*
* Prune specified line pointer or a HOT chain originating at line pointer.
*
- * Tuple visibility information is provided in htsv.
+ * Tuple visibility information is provided in prstate->htsv.
*
* If the item is an index-referenced tuple (i.e. not a heap-only tuple),
* the HOT chain is pruned by removing all DEAD tuples at the start of the HOT
@@ -628,11 +946,17 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
* prstate showing the changes to be made. Items to be redirected are added
* to the redirected[] array (two entries per redirection); items to be set to
* LP_DEAD state are added to nowdead[]; and items to be set to LP_UNUSED
- * state are added to nowunused[].
+ * state are added to nowunused[]. We perform bookkeeping of live tuples,
+ * visibility etc. based on what the page will look like after the changes
+ * applied. All that bookkeeping is performed in the heap_prune_record_*()
+ * subroutines. The division of labor is that heap_prune_chain() decides the
+ * fate of each tuple, ie. whether it's going to be removed, redirected or
+ * left unchanged, and the heap_prune_record_*() subroutines update PruneState
+ * based on that outcome.
*/
static void
heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
- int8 *htsv, PruneState *prstate, PruneResult *presult)
+ PruneState *prstate)
{
Page page = (Page) BufferGetPage(buffer);
ItemId rootlp = PageGetItemId(page, rootoffnum);
@@ -711,7 +1035,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
*/
chainitems[nchain++] = offnum;
- switch (htsv_get_valid_status(htsv[offnum]))
+ switch (htsv_get_valid_status(prstate->htsv[offnum]))
{
case HEAPTUPLE_DEAD:
@@ -726,7 +1050,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
*/
ndeadchain = nchain;
HeapTupleHeaderAdvanceConflictHorizon(htup,
- &prstate->snapshotConflictHorizon);
+ &prstate->latest_xid_removed);
break;
case HEAPTUPLE_RECENTLY_DEAD:
@@ -775,10 +1099,11 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
{
/*
* We found a redirect item that doesn't point to a valid follow-on
- * item. This can happen if the loop in heap_page_prune caused us to
- * visit the dead successor of a redirect item before visiting the
- * redirect item. We can clean up by setting the redirect item to
- * LP_DEAD state or LP_UNUSED if the caller indicated.
+ * item. This can happen if the loop in heap_page_prune_and_freeze()
+ * caused us to visit the dead successor of a redirect item before
+ * visiting the redirect item. We can clean up by setting the
+ * redirect item to LP_DEAD state or LP_UNUSED if the caller
+ * indicated.
*/
heap_prune_record_dead_or_unused(prstate, rootoffnum, false);
return;
@@ -799,7 +1124,7 @@ process_chains:
/* the rest of tuples in the chain are normal, unchanged tuples */
for (; i < nchain; i++)
- heap_prune_record_unchanged(page, htsv, prstate, presult, chainitems[i]);
+ heap_prune_record_unchanged(page, prstate, chainitems[i]);
}
else if (ndeadchain == nchain)
{
@@ -831,7 +1156,7 @@ process_chains:
/* the rest of tuples in the chain are normal, unchanged tuples */
for (int i = ndeadchain; i < nchain; i++)
- heap_prune_record_unchanged(page, htsv, prstate, presult, chainitems[i]);
+ heap_prune_record_unchanged(page, prstate, chainitems[i]);
}
}
@@ -892,6 +1217,18 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
prstate->nowdead[prstate->ndead] = offnum;
prstate->ndead++;
+ /*
+ * Deliberately delay unsetting all_visible until later during pruning.
+ * Removable dead tuples shouldn't preclude freezing the page. After
+ * finishing this first pass of tuple visibility checks, initialize
+ * all_visible_except_removable with the current value of all_visible to
+ * indicate whether or not the page is all visible except for dead tuples.
+ * This will allow us to attempt to freeze the page after pruning. Later
+ * during pruning, if we encounter an LP_DEAD item or are setting an item
+ * LP_DEAD, we will unset all_visible. As long as we unset it before
+ * updating the visibility map, this will be correct.
+ */
+
/* Record the dead offset for vacuum */
prstate->deadoffsets[prstate->lpdead_items++] = offnum;
@@ -947,37 +1284,121 @@ heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum, bool was_norm
}
/*
- * Record LP_NORMAL line pointer that is left unchanged.
+ * Record line pointer that is left unchanged. We consider freezing it, and
+ * update bookkeeping of tuple counts and page visibility.
*/
static void
-heap_prune_record_unchanged(Page page, int8 *htsv, PruneState *prstate,
- PruneResult *presult, OffsetNumber offnum)
+heap_prune_record_unchanged(Page page, PruneState *prstate, OffsetNumber offnum)
{
HeapTupleHeader htup;
Assert(!prstate->marked[offnum]);
prstate->marked[offnum] = true;
- presult->hastup = true; /* the page is not empty */
+ prstate->hastup = true; /* the page is not empty */
+ /*
+ * The criteria for counting a tuple as live in this block need to match
+ * what analyze.c's acquire_sample_rows() does, otherwise VACUUM and
+ * ANALYZE may produce wildly different reltuples values, e.g. when there
+ * are many recently-dead tuples.
+ *
+ * The logic here is a bit simpler than acquire_sample_rows(), as VACUUM
+ * can't run inside a transaction block, which makes some cases impossible
+ * (e.g. in-progress insert from the same transaction).
+ *
+ * HEAPTUPLE_DEAD are handled by the other heap_prune_record_*()
+ * subroutines. They don't count dead items like acquire_sample_rows()
+ * does, because we assume that all dead items will become LP_UNUSED
+ * before VACUUM finishes. This difference is only superficial. VACUUM
+ * effectively agrees with ANALYZE about DEAD items, in the end. VACUUM
+ * won't remember LP_DEAD items, but only because they're not supposed to
+ * be left behind when it is done. (Cases where we bypass index vacuuming
+ * will violate this optimistic assumption, but the overall impact of that
+ * should be negligible.)
+ */
htup = (HeapTupleHeader) PageGetItem(page, PageGetItemId(page, offnum));
- switch (htsv[offnum])
+ switch (prstate->htsv[offnum])
{
case HEAPTUPLE_LIVE:
- case HEAPTUPLE_INSERT_IN_PROGRESS:
/*
- * If we wanted to optimize for aborts, we might consider marking
- * the page prunable when we see INSERT_IN_PROGRESS. But we
- * don't. See related decisions about when to mark the page
- * prunable in heapam.c.
+ * Count it as live. Not only is this natural, but it's also what
+ * acquire_sample_rows() does.
+ */
+ prstate->live_tuples++;
+
+ /*
+ * Is the tuple definitely visible to all transactions?
+ *
+ * NB: Like with per-tuple hint bits, we can't set the
+ * PD_ALL_VISIBLE flag if the inserter committed asynchronously.
+ * See SetHintBits for more info. Check that the tuple is hinted
+ * xmin-committed because of that.
*/
+ if (prstate->all_visible)
+ {
+ TransactionId xmin;
+
+ if (!HeapTupleHeaderXminCommitted(htup))
+ {
+ prstate->all_visible = false;
+ break;
+ }
+
+ /*
+ * The inserter definitely committed. But is it old enough
+ * that everyone sees it as committed? A FrozenTransactionId
+ * is seen as committed to everyone. Otherwise, we check if
+ * there is a snapshot that considers this xid to still be
+ * running, and if so, we don't consider the page all-visible.
+ */
+ xmin = HeapTupleHeaderGetXmin(htup);
+
+ /* For now always use pagefrz->cutoffs */
+ Assert(prstate->pagefrz.cutoffs);
+ if (!TransactionIdPrecedes(xmin, prstate->pagefrz.cutoffs->OldestXmin))
+ {
+ prstate->all_visible = false;
+ break;
+ }
+
+ /* Track newest xmin on page. */
+ if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
+ TransactionIdIsNormal(xmin))
+ prstate->visibility_cutoff_xid = xmin;
+ }
break;
case HEAPTUPLE_RECENTLY_DEAD:
+
+ /*
+ * If tuple is recently dead then we must not remove it from the
+ * relation. (We only remove items that are LP_DEAD from
+ * pruning.)
+ */
+ prstate->recently_dead_tuples++;
+ prstate->all_visible = false;
+
+ /*
+ * This tuple may soon become DEAD. Update the hint field so that
+ * the page is reconsidered for pruning in future.
+ */
+ heap_prune_record_prunable(prstate,
+ HeapTupleHeaderGetUpdateXid(htup));
+ break;
+
case HEAPTUPLE_DELETE_IN_PROGRESS:
+ /*
+ * This is an expected case during concurrent vacuum. Count such rows
+ * as live. As above, we assume the deleting transaction will
+ * commit and update the counters after we report.
+ */
+ prstate->live_tuples++;
+ prstate->all_visible = false;
+
/*
* This tuple may soon become DEAD. Update the hint field so that
* the page is reconsidered for pruning in future.
@@ -986,6 +1407,24 @@ heap_prune_record_unchanged(Page page, int8 *htsv, PruneState *prstate,
HeapTupleHeaderGetUpdateXid(htup));
break;
+ case HEAPTUPLE_INSERT_IN_PROGRESS:
+
+ /*
+ * We do not count these rows as live, because we expect the
+ * inserting transaction to update the counters at commit, and we
+ * assume that will happen only after we report our results. This
+ * assumption is a bit shaky, but it is what acquire_sample_rows()
+ * does, so be consistent.
+ */
+ prstate->all_visible = false;
+
+ /*
+ * If we wanted to optimize for aborts, we might consider marking
+ * the page prunable when we see INSERT_IN_PROGRESS. But we
+ * don't. See related decisions about when to mark the page
+ * prunable in heapam.c.
+ */
+ break;
default:
@@ -993,7 +1432,8 @@ heap_prune_record_unchanged(Page page, int8 *htsv, PruneState *prstate,
* DEAD tuples should've been passed to heap_prune_record_dead()
* or heap_prune_record_unused() instead.
*/
- elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result %d", htsv[offnum]);
+ elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result %d",
+ prstate->htsv[offnum]);
break;
}
@@ -1003,12 +1443,12 @@ heap_prune_record_unchanged(Page page, int8 *htsv, PruneState *prstate,
/* Tuple with storage -- consider need to freeze */
bool totally_frozen;
- if ((heap_prepare_freeze_tuple(htup, &presult->pagefrz,
- &presult->frozen[presult->nfrozen],
+ if ((heap_prepare_freeze_tuple(htup, &prstate->pagefrz,
+ &prstate->frozen[prstate->nfrozen],
&totally_frozen)))
{
/* Save prepared freeze plan for later */
- presult->frozen[presult->nfrozen++].offset = offnum;
+ prstate->frozen[prstate->nfrozen++].offset = offnum;
}
/*
@@ -1017,7 +1457,7 @@ heap_prune_record_unchanged(Page page, int8 *htsv, PruneState *prstate,
* definitely cannot be set all-frozen in the visibility map later on
*/
if (!totally_frozen)
- presult->all_frozen = false;
+ prstate->all_frozen = false;
}
}
@@ -1028,9 +1468,6 @@ heap_prune_record_unchanged(Page page, int8 *htsv, PruneState *prstate,
static void
heap_prune_record_unchanged_lp_dead(PruneState *prstate, OffsetNumber offnum)
{
- Assert(!prstate->marked[offnum]);
- prstate->marked[offnum] = true;
-
/*
* Deliberately don't set hastup for LP_DEAD items. We make the soft
* assumption that any LP_DEAD items encountered here will become
@@ -1039,12 +1476,19 @@ heap_prune_record_unchanged_lp_dead(PruneState *prstate, OffsetNumber offnum)
* other VACUUM, at most. Besides, VACUUM must treat
* hastup/nonempty_pages as provisional no matter how LP_DEAD items are
* handled (handled here, or handled later on).
+ *
+ * Similarly, don't unset all_visible until later, at the end of
+ * heap_page_prune_and_freeze(). This will allow us to attempt to freeze
+ * the page after pruning. As long as we unset it before updating the
+ * visibility map, this will be correct.
*/
/* Record the dead offset for vacuum */
prstate->deadoffsets[prstate->lpdead_items++] = offnum;
-}
+ Assert(!prstate->marked[offnum]);
+ prstate->marked[offnum] = true;
+}
static void
heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetNumber offnum)
@@ -1062,7 +1506,7 @@ heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetNumber offnum
}
/*
- * Perform the actual page changes needed by heap_page_prune.
+ * Perform the actual page changes needed by heap_page_prune_and_freeze().
*
* If 'lp_truncate_only' is set, we are merely marking LP_DEAD line pointers
* as unused, not redirecting or removing anything else. The
@@ -1193,12 +1637,12 @@ heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
else
{
/*
- * When heap_page_prune() was called, PRUNE_DO_MARK_UNUSED_NOW may
- * have been set, which allows would-be LP_DEAD items to be made
- * LP_UNUSED instead. This is only possible if the relation has
- * no indexes. If there are any dead items, then
- * PRUNE_DO_MARK_UNUSED_NOW was not set and every item being
- * marked LP_UNUSED must refer to a heap-only tuple.
+ * When heap_page_prune_and_freeze() was called,
+ * PRUNE_DO_MARK_UNUSED_NOW may have been set, which allows
+ * would-be LP_DEAD items to be made LP_UNUSED instead. This is
+ * only possible if the relation has no indexes. If there are any
+ * dead items, then PRUNE_DO_MARK_UNUSED_NOW was not set and every
+ * item being marked LP_UNUSED must refer to a heap-only tuple.
*/
if (ndead > 0)
{
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 7f1e4db55c..3913da7e16 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -430,12 +430,13 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
* as an upper bound on the XIDs stored in the pages we'll actually scan
* (NewRelfrozenXid tracking must never be allowed to miss unfrozen XIDs).
*
- * Next acquire vistest, a related cutoff that's used in heap_page_prune.
- * We expect vistest will always make heap_page_prune remove any deleted
- * tuple whose xmax is < OldestXmin. lazy_scan_prune must never become
- * confused about whether a tuple should be frozen or removed. (In the
- * future we might want to teach lazy_scan_prune to recompute vistest from
- * time to time, to increase the number of dead tuples it can prune away.)
+ * Next acquire vistest, a related cutoff that's used in
+ * heap_page_prune_and_freeze(). We expect vistest will always make
+ * heap_page_prune_and_freeze() remove any deleted tuple whose xmax is <
+ * OldestXmin. lazy_scan_prune must never become confused about whether a
+ * tuple should be frozen or removed. (In the future we might want to
+ * teach lazy_scan_prune to recompute vistest from time to time, to
+ * increase the number of dead tuples it can prune away.)
*/
vacrel->aggressive = vacuum_get_cutoffs(rel, params, &vacrel->cutoffs);
vacrel->rel_pages = orig_rel_pages = RelationGetNumberOfBlocks(rel);
@@ -1387,22 +1388,6 @@ OffsetNumber_cmp(const void *a, const void *b)
*
* Caller must hold pin and buffer cleanup lock on the buffer.
*
- * Prior to PostgreSQL 14 there were very rare cases where heap_page_prune()
- * was allowed to disagree with our HeapTupleSatisfiesVacuum() call about
- * whether or not a tuple should be considered DEAD. This happened when an
- * inserting transaction concurrently aborted (after our heap_page_prune()
- * call, before our HeapTupleSatisfiesVacuum() call). There was rather a lot
- * of complexity just so we could deal with tuples that were DEAD to VACUUM,
- * but nevertheless were left with storage after pruning.
- *
- * As of Postgres 17, we circumvent this problem altogether by reusing the
- * result of heap_page_prune()'s visibility check. Without the second call to
- * HeapTupleSatisfiesVacuum(), there is no new HTSV_Result and there can be no
- * disagreement. We'll just handle such tuples as if they had become fully dead
- * right after this operation completes instead of in the middle of it. Note that
- * any tuple that becomes dead after the call to heap_page_prune() can't need to
- * be frozen, because it was visible to another session when vacuum started.
- *
* vmbuffer is the buffer containing the VM block with visibility information
* for the heap block, blkno. all_visible_according_to_vm is the saved
* visibility status of the heap block looked up earlier by the caller. We
@@ -1421,292 +1406,50 @@ lazy_scan_prune(LVRelState *vacrel,
bool *has_lpdead_items)
{
Relation rel = vacrel->rel;
- OffsetNumber offnum,
- maxoff;
- ItemId itemid;
- PruneResult presult;
- int live_tuples,
- recently_dead_tuples;
- bool all_visible;
- TransactionId visibility_cutoff_xid;
+ PruneFreezeResult presult;
uint8 actions = 0;
- int64 fpi_before = pgWalUsage.wal_fpi;
Assert(BufferGetBlockNumber(buf) == blkno);
/*
- * maxoff might be reduced following line pointer array truncation in
- * heap_page_prune. That's safe for us to ignore, since the reclaimed
- * space will continue to look like LP_UNUSED items below.
- */
- maxoff = PageGetMaxOffsetNumber(page);
-
- /* Initialize (or reset) page-level state */
- presult.pagefrz.freeze_required = false;
- presult.pagefrz.FreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
- presult.pagefrz.FreezePageRelminMxid = vacrel->NewRelminMxid;
- presult.pagefrz.NoFreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
- presult.pagefrz.NoFreezePageRelminMxid = vacrel->NewRelminMxid;
- presult.pagefrz.cutoffs = &vacrel->cutoffs;
- live_tuples = 0;
- recently_dead_tuples = 0;
-
- /*
- * Prune all HOT-update chains in this page.
+ * Prune all HOT-update chains and potentially freeze tuples on this page.
*
* We count the number of tuples removed from the page by the pruning step
- * in presult.ndeleted. It should not be confused with lpdead_items;
- * lpdead_items's final value can be thought of as the number of tuples
- * that were deleted from indexes.
+ * in presult.ndeleted. It should not be confused with
+ * presult.lpdead_items; presult.lpdead_items's final value can be thought
+ * of as the number of tuples that were deleted from indexes.
*
* If the relation has no indexes, we can immediately mark would-be dead
* items LP_UNUSED, so PRUNE_DO_MARK_UNUSED_NOW should be set if no
* indexes and unset otherwise.
+ *
+ * We will update the VM after collecting LP_DEAD items and freezing
+ * tuples. Pruning will have determined whether or not the page is
+ * all-visible.
*/
actions |= PRUNE_DO_TRY_FREEZE;
if (vacrel->nindexes == 0)
actions |= PRUNE_DO_MARK_UNUSED_NOW;
- heap_page_prune(rel, buf, vacrel->vistest, actions,
- &presult, PRUNE_VACUUM_SCAN, &vacrel->offnum);
-
- /*
- * We will update the VM after collecting LP_DEAD items and freezing
- * tuples. Keep track of whether or not the page is all_visible and
- * all_frozen and use this information to update the VM. all_visible
- * implies 0 lpdead_items, but don't trust all_frozen result unless
- * all_visible is also set to true.
- *
- * Also keep track of the visibility cutoff xid for recovery conflicts.
- */
- all_visible = true;
- visibility_cutoff_xid = InvalidTransactionId;
-
- /*
- * Now scan the page to collect LP_DEAD items and update the variables set
- * just above.
- */
- for (offnum = FirstOffsetNumber;
- offnum <= maxoff;
- offnum = OffsetNumberNext(offnum))
- {
- HeapTupleHeader htup;
-
- /*
- * Set the offset number so that we can display it along with any
- * error that occurred while processing this tuple.
- */
- vacrel->offnum = offnum;
- itemid = PageGetItemId(page, offnum);
-
- if (!ItemIdIsUsed(itemid))
- continue;
-
- /* Redirect items mustn't be touched */
- if (ItemIdIsRedirected(itemid))
- continue;
-
- if (ItemIdIsDead(itemid))
- {
- /*
- * Also deliberately delay unsetting all_visible until just before
- * we return to lazy_scan_heap caller, as explained in full below.
- * (This is another case where it's useful to anticipate that any
- * LP_DEAD items will become LP_UNUSED during the ongoing VACUUM.)
- */
- continue;
- }
-
- Assert(ItemIdIsNormal(itemid));
-
- htup = (HeapTupleHeader) PageGetItem(page, itemid);
-
- /*
- * The criteria for counting a tuple as live in this block need to
- * match what analyze.c's acquire_sample_rows() does, otherwise VACUUM
- * and ANALYZE may produce wildly different reltuples values, e.g.
- * when there are many recently-dead tuples.
- *
- * The logic here is a bit simpler than acquire_sample_rows(), as
- * VACUUM can't run inside a transaction block, which makes some cases
- * impossible (e.g. in-progress insert from the same transaction).
- *
- * We treat LP_DEAD items (which are the closest thing to DEAD tuples
- * that might be seen here) differently, too: we assume that they'll
- * become LP_UNUSED before VACUUM finishes. This difference is only
- * superficial. VACUUM effectively agrees with ANALYZE about DEAD
- * items, in the end. VACUUM won't remember LP_DEAD items, but only
- * because they're not supposed to be left behind when it is done.
- * (Cases where we bypass index vacuuming will violate this optimistic
- * assumption, but the overall impact of that should be negligible.)
- */
- switch (htsv_get_valid_status(presult.htsv[offnum]))
- {
- case HEAPTUPLE_LIVE:
-
- /*
- * Count it as live. Not only is this natural, but it's also
- * what acquire_sample_rows() does.
- */
- live_tuples++;
-
- /*
- * Is the tuple definitely visible to all transactions?
- *
- * NB: Like with per-tuple hint bits, we can't set the
- * PD_ALL_VISIBLE flag if the inserter committed
- * asynchronously. See SetHintBits for more info. Check that
- * the tuple is hinted xmin-committed because of that.
- */
- if (all_visible)
- {
- TransactionId xmin;
-
- if (!HeapTupleHeaderXminCommitted(htup))
- {
- all_visible = false;
- break;
- }
-
- /*
- * The inserter definitely committed. But is it old enough
- * that everyone sees it as committed?
- */
- xmin = HeapTupleHeaderGetXmin(htup);
- if (!TransactionIdPrecedes(xmin,
- vacrel->cutoffs.OldestXmin))
- {
- all_visible = false;
- break;
- }
-
- /* Track newest xmin on page. */
- if (TransactionIdFollows(xmin, visibility_cutoff_xid) &&
- TransactionIdIsNormal(xmin))
- visibility_cutoff_xid = xmin;
- }
- break;
- case HEAPTUPLE_RECENTLY_DEAD:
-
- /*
- * If tuple is recently dead then we must not remove it from
- * the relation. (We only remove items that are LP_DEAD from
- * pruning.)
- */
- recently_dead_tuples++;
- all_visible = false;
- break;
- case HEAPTUPLE_INSERT_IN_PROGRESS:
-
- /*
- * We do not count these rows as live, because we expect the
- * inserting transaction to update the counters at commit, and
- * we assume that will happen only after we report our
- * results. This assumption is a bit shaky, but it is what
- * acquire_sample_rows() does, so be consistent.
- */
- all_visible = false;
- break;
- case HEAPTUPLE_DELETE_IN_PROGRESS:
- /* This is an expected case during concurrent vacuum */
- all_visible = false;
+ heap_page_prune_and_freeze(rel, buf, actions, vacrel->vistest,
+ &vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN, &vacrel->offnum,
+ &vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
- /*
- * Count such rows as live. As above, we assume the deleting
- * transaction will commit and update the counters after we
- * report.
- */
- live_tuples++;
- break;
- default:
- elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
- break;
- }
- }
+ Assert(MultiXactIdIsValid(vacrel->NewRelminMxid));
+ Assert(TransactionIdIsValid(vacrel->NewRelfrozenXid));
- /*
- * We have now divided every item on the page into either an LP_DEAD item
- * that will need to be vacuumed in indexes later, or a LP_NORMAL tuple
- * that remains and needs to be considered for freezing now (LP_UNUSED and
- * LP_REDIRECT items also remain, but are of no further interest to us).
- */
vacrel->offnum = InvalidOffsetNumber;
- /*
- * Freeze the page when heap_prepare_freeze_tuple indicates that at least
- * one XID/MXID from before FreezeLimit/MultiXactCutoff is present. Also
- * freeze when pruning generated an FPI, if doing so means that we set the
- * page all-frozen afterwards (might not happen until final heap pass).
- */
- if (presult.pagefrz.freeze_required || presult.nfrozen == 0 ||
- (all_visible && presult.all_frozen &&
- fpi_before != pgWalUsage.wal_fpi))
+ if (presult.nfrozen > 0)
{
/*
- * We're freezing the page. Our final NewRelfrozenXid doesn't need to
- * be affected by the XIDs that are just about to be frozen anyway.
+ * We never increment the frozen_pages instrumentation counter when
+ * nfrozen == 0, since it only counts pages with newly frozen tuples
+ * (don't confuse that with pages newly set all-frozen in VM).
*/
- vacrel->NewRelfrozenXid = presult.pagefrz.FreezePageRelfrozenXid;
- vacrel->NewRelminMxid = presult.pagefrz.FreezePageRelminMxid;
-
- if (presult.nfrozen == 0)
- {
- /*
- * We have no freeze plans to execute, so there's no added cost
- * from following the freeze path. That's why it was chosen. This
- * is important in the case where the page only contains totally
- * frozen tuples at this point (perhaps only following pruning).
- * Such pages can be marked all-frozen in the VM by our caller,
- * even though none of its tuples were newly frozen here (note
- * that the "no freeze" path never sets pages all-frozen).
- *
- * We never increment the frozen_pages instrumentation counter
- * here, since it only counts pages with newly frozen tuples
- * (don't confuse that with pages newly set all-frozen in VM).
- */
- }
- else
- {
- TransactionId snapshotConflictHorizon;
-
- vacrel->frozen_pages++;
-
- /*
- * We can use visibility_cutoff_xid as our cutoff for conflicts
- * when the whole page is eligible to become all-frozen in the VM
- * once we're done with it. Otherwise we generate a conservative
- * cutoff by stepping back from OldestXmin.
- */
- if (all_visible && presult.all_frozen)
- {
- /* Using same cutoff when setting VM is now unnecessary */
- snapshotConflictHorizon = visibility_cutoff_xid;
- visibility_cutoff_xid = InvalidTransactionId;
- }
- else
- {
- /* Avoids false conflicts when hot_standby_feedback in use */
- snapshotConflictHorizon = vacrel->cutoffs.OldestXmin;
- TransactionIdRetreat(snapshotConflictHorizon);
- }
+ vacrel->frozen_pages++;
- /* Execute all freeze plans for page as a single atomic action */
- heap_freeze_execute_prepared(vacrel->rel, buf,
- snapshotConflictHorizon,
- presult.frozen, presult.nfrozen);
- }
- }
- else
- {
- /*
- * Page requires "no freeze" processing. It might be set all-visible
- * in the visibility map, but it can never be set all-frozen.
- */
- vacrel->NewRelfrozenXid = presult.pagefrz.NoFreezePageRelfrozenXid;
- vacrel->NewRelminMxid = presult.pagefrz.NoFreezePageRelminMxid;
- presult.all_frozen = false;
- presult.nfrozen = 0; /* avoid miscounts in instrumentation */
}
/*
@@ -1718,17 +1461,21 @@ lazy_scan_prune(LVRelState *vacrel,
*/
#ifdef USE_ASSERT_CHECKING
/* Note that all_frozen value does not matter when !all_visible */
- if (all_visible && presult.lpdead_items == 0)
+ if (presult.all_visible)
{
TransactionId debug_cutoff;
bool debug_all_frozen;
+ Assert(presult.lpdead_items == 0);
+
if (!heap_page_is_all_visible(vacrel, buf,
&debug_cutoff, &debug_all_frozen))
Assert(false);
+ Assert(presult.all_frozen == debug_all_frozen);
+
Assert(!TransactionIdIsValid(debug_cutoff) ||
- debug_cutoff == visibility_cutoff_xid);
+ debug_cutoff == presult.vm_conflict_horizon);
}
#endif
@@ -1762,27 +1509,14 @@ lazy_scan_prune(LVRelState *vacrel,
Assert(dead_items->num_items <= dead_items->max_items);
pgstat_progress_update_param(PROGRESS_VACUUM_NUM_DEAD_TUPLES,
dead_items->num_items);
-
- /*
- * It was convenient to ignore LP_DEAD items in all_visible earlier on
- * to make the choice of whether or not to freeze the page unaffected
- * by the short-term presence of LP_DEAD items. These LP_DEAD items
- * were effectively assumed to be LP_UNUSED items in the making. It
- * doesn't matter which heap pass (initial pass or final pass) ends up
- * setting the page all-frozen, as long as the ongoing VACUUM does it.
- *
- * Now that freezing has been finalized, unset all_visible. It needs
- * to reflect the present state of things, as expected by our caller.
- */
- all_visible = false;
}
/* Finally, add page-local counts to whole-VACUUM counts */
vacrel->tuples_deleted += presult.ndeleted;
vacrel->tuples_frozen += presult.nfrozen;
vacrel->lpdead_items += presult.lpdead_items;
- vacrel->live_tuples += live_tuples;
- vacrel->recently_dead_tuples += recently_dead_tuples;
+ vacrel->live_tuples += presult.live_tuples;
+ vacrel->recently_dead_tuples += presult.recently_dead_tuples;
/* Can't truncate this page */
if (presult.hastup)
@@ -1791,20 +1525,20 @@ lazy_scan_prune(LVRelState *vacrel,
/* Did we find LP_DEAD items? */
*has_lpdead_items = (presult.lpdead_items > 0);
- Assert(!all_visible || !(*has_lpdead_items));
+ Assert(!presult.all_visible || !(*has_lpdead_items));
/*
* Handle setting visibility map bit based on information from the VM (as
* of last heap_vac_scan_next_block() call), and from all_visible and
* all_frozen variables
*/
- if (!all_visible_according_to_vm && all_visible)
+ if (!all_visible_according_to_vm && presult.all_visible)
{
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
if (presult.all_frozen)
{
- Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+ Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
flags |= VISIBILITYMAP_ALL_FROZEN;
}
@@ -1824,7 +1558,7 @@ lazy_scan_prune(LVRelState *vacrel,
PageSetAllVisible(page);
MarkBufferDirty(buf);
visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
- vmbuffer, visibility_cutoff_xid,
+ vmbuffer, presult.vm_conflict_horizon,
flags);
}
@@ -1872,7 +1606,7 @@ lazy_scan_prune(LVRelState *vacrel,
* it as all-frozen. Note that all_frozen is only valid if all_visible is
* true, so we must check both all_visible and all_frozen.
*/
- else if (all_visible_according_to_vm && all_visible &&
+ else if (all_visible_according_to_vm && presult.all_visible &&
presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
{
/*
@@ -1889,11 +1623,11 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* Set the page all-frozen (and all-visible) in the VM.
*
- * We can pass InvalidTransactionId as our visibility_cutoff_xid,
- * since a snapshotConflictHorizon sufficient to make everything safe
- * for REDO was logged when the page's tuples were frozen.
+ * We can pass InvalidTransactionId as our vm_conflict_horizon, since
+ * a snapshotConflictHorizon sufficient to make everything safe for
+ * REDO was logged when the page's tuples were frozen.
*/
- Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+ Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
vmbuffer, InvalidTransactionId,
VISIBILITYMAP_ALL_VISIBLE |
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index e346312471..dfb36ea404 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -215,21 +215,15 @@ typedef struct HeapPageFreeze
/*
* Per-page state returned from pruning
*/
-typedef struct PruneResult
+typedef struct PruneFreezeResult
{
int ndeleted; /* Number of tuples deleted from the page */
int nnewlpdead; /* Number of newly LP_DEAD items */
+ int nfrozen; /* Number of tuples we froze */
- /*
- * Tuple visibility is only computed once for each tuple, for correctness
- * and efficiency reasons; see comment in heap_page_prune() for details.
- * This is of type int8[], instead of HTSV_Result[], so we can use -1 to
- * indicate no visibility has been computed, e.g. for LP_DEAD items.
- *
- * This needs to be MaxHeapTuplesPerPage + 1 long as FirstOffsetNumber is
- * 1. Otherwise every access would need to subtract 1.
- */
- int8 htsv[MaxHeapTuplesPerPage + 1];
+ /* Number of live and recently dead tuples on the page, after pruning */
+ int live_tuples;
+ int recently_dead_tuples;
/*
* Whether or not the page makes rel truncation unsafe
@@ -240,18 +234,18 @@ typedef struct PruneResult
bool hastup;
/*
- * Prepare to freeze in heap_page_prune(). lazy_scan_prune() will use the
- * returned freeze plans to execute freezing.
- */
- HeapPageFreeze pagefrz;
-
- /*
- * Whether or not the page can be set all-frozen in the visibility map.
- * This is only set if the PRUNE_DO_TRY_FREEZE action flag is set.
+ * all_visible and all_frozen indicate if the all-visible and all-frozen
+ * bits in the visibility map can be set for this page, after pruning.
+ *
+ * vm_conflict_horizon is the newest xmin of live tuples on the page. The
+ * caller can use it as the conflict horizon, when setting the VM bits. It
+ * is only valid if we froze some tuples, and all_frozen is true.
+ *
+ * These are only set if the PRUNE_DO_TRY_FREEZE action flag is set.
*/
+ bool all_visible;
bool all_frozen;
- int nfrozen;
- HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
+ TransactionId vm_conflict_horizon;
/*
* LP_DEAD items on the page after pruning. Includes existing LP_DEAD
@@ -259,7 +253,7 @@ typedef struct PruneResult
*/
int lpdead_items;
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
-} PruneResult;
+} PruneFreezeResult;
/* 'reason' codes for heap_page_prune() */
typedef enum
@@ -269,20 +263,6 @@ typedef enum
PRUNE_VACUUM_CLEANUP, /* VACUUM 2nd heap pass */
} PruneReason;
-/*
- * Pruning calculates tuple visibility once and saves the results in an array
- * of int8. See PruneResult.htsv for details. This helper function is meant to
- * guard against examining visibility status array members which have not yet
- * been computed.
- */
-static inline HTSV_Result
-htsv_get_valid_status(int status)
-{
- Assert(status >= HEAPTUPLE_DEAD &&
- status <= HEAPTUPLE_DELETE_IN_PROGRESS);
- return (HTSV_Result) status;
-}
-
/* ----------------
* function prototypes for heap access method
*
@@ -355,9 +335,11 @@ extern void heap_inplace_update(Relation relation, HeapTuple tuple);
extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
HeapPageFreeze *pagefrz,
HeapTupleFreeze *frz, bool *totally_frozen);
-extern void heap_freeze_execute_prepared(Relation rel, Buffer buffer,
- TransactionId snapshotConflictHorizon,
- HeapTupleFreeze *tuples, int ntuples);
+
+extern void heap_pre_freeze_checks(Buffer buffer,
+ HeapTupleFreeze *tuples, int ntuples);
+extern void heap_freeze_prepared_tuples(Buffer buffer,
+ HeapTupleFreeze *tuples, int ntuples);
extern bool heap_freeze_tuple(HeapTupleHeader tuple,
TransactionId relfrozenxid, TransactionId relminmxid,
TransactionId FreezeLimit, TransactionId MultiXactCutoff);
@@ -378,12 +360,15 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
/* in heap/pruneheap.c */
struct GlobalVisState;
extern void heap_page_prune_opt(Relation relation, Buffer buffer);
-extern void heap_page_prune(Relation relation, Buffer buffer,
- struct GlobalVisState *vistest,
- uint8 actions,
- PruneResult *presult,
- PruneReason reason,
- OffsetNumber *off_loc);
+extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ uint8 actions,
+ struct GlobalVisState *vistest,
+ struct VacuumCutoffs *cutoffs,
+ PruneFreezeResult *presult,
+ PruneReason reason,
+ OffsetNumber *off_loc,
+ TransactionId *new_relfrozen_xid,
+ MultiXactId *new_relmin_mxid);
extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
OffsetNumber *redirected, int nredirected,
OffsetNumber *nowdead, int ndead,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index cfa9d5aaea..cbb9707b6a 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2192,7 +2192,7 @@ PromptInterruptContext
ProtocolVersion
PrsStorage
PruneReason
-PruneResult
+PruneFreezeResult
PruneState
PruneStepResult
PsqlScanCallbacks
--
2.40.1
On Sat, Mar 30, 2024 at 1:57 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:
I think that we are actually successfully removing more RECENTLY_DEAD
HOT tuples than in master with heap_page_prune()'s new approach, and I
think it is correct; but let me know if I am missing something.
/me blinks.
Isn't zero the only correct number of RECENTLY_DEAD tuples to remove?
--
Robert Haas
EDB: http://www.enterprisedb.com
On Sat, Mar 30, 2024 at 8:00 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Sat, Mar 30, 2024 at 1:57 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:
I think that we are actually successfully removing more RECENTLY_DEAD
HOT tuples than in master with heap_page_prune()'s new approach, and I
think it is correct; but let me know if I am missing something.
/me blinks.
Isn't zero the only correct number of RECENTLY_DEAD tuples to remove?
At the top of the comment for heap_prune_chain() in master, it says
* If the item is an index-referenced tuple (i.e. not a heap-only tuple),
* the HOT chain is pruned by removing all DEAD tuples at the start of the HOT
* chain. We also prune any RECENTLY_DEAD tuples preceding a DEAD tuple.
* This is OK because a RECENTLY_DEAD tuple preceding a DEAD tuple is really
* DEAD, our visibility test is just too coarse to detect it.
Heikki had added a comment in one of his patches to the fast path for
HOT tuples at the top of heap_prune_chain():
* Note that we might first arrive at a dead heap-only tuple
* either while following a chain or here (in the fast path). Whichever
* path gets there first will mark the tuple unused.
*
* Whether we arrive at the dead HOT tuple first here or while
* following a chain above affects whether preceding RECENTLY_DEAD
* tuples in the chain can be removed or not. Imagine that you
* have a chain with two tuples: RECENTLY_DEAD -> DEAD. If we
* reach the RECENTLY_DEAD tuple first, the chain-following logic
* will find the DEAD tuple and conclude that both tuples are in
* fact dead and can be removed. But if we reach the DEAD tuple
* at the end of the chain first, when we reach the RECENTLY_DEAD
* tuple later, we will not follow the chain because the DEAD
* TUPLE is already 'marked', and will not remove the
* RECENTLY_DEAD tuple. This is not a correctness issue, and the
* RECENTLY_DEAD tuple will be removed by a later VACUUM.
My patch splits the tuples into HOT and non-HOT while gathering their
visibility information, first calls heap_prune_chain() on the non-HOT
tuples, and then processes the still-unmarked HOT tuples in a separate
loop afterward. This follows every chain and processes it completely,
and it also processes any HOT tuples that may not be reachable from a
valid chain. The fast path contains a special check to ensure that line
pointers for DEAD, not-HOT-updated HOT tuples (dead orphaned tuples
from aborted HOT updates) are still marked LP_UNUSED even though they
are not reachable from a valid HOT chain. By doing this later, we don't
preclude ourselves from following all chains.
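Roughly, the shape is as sketched below (a simplified sketch, not the
exact code; names follow the v11 patches attached later in the thread,
and the handling of LP_UNUSED/LP_DEAD items as well as the iteration
order are glossed over):

    /* First loop: compute HTSV once per tuple and partition the offsets */
    for (offnum = FirstOffsetNumber; offnum <= maxoff;
         offnum = OffsetNumberNext(offnum))
    {
        ItemId      itemid = PageGetItemId(page, offnum);
        HeapTupleHeader htup;

        if (ItemIdIsRedirected(itemid))
        {
            /* start of a HOT chain */
            prstate.root_items[prstate.nroot_items++] = offnum;
            continue;
        }
        if (!ItemIdIsNormal(itemid))
            continue;           /* LP_UNUSED / LP_DEAD handled separately */

        htup = (HeapTupleHeader) PageGetItem(page, itemid);
        tup.t_data = htup;
        tup.t_len = ItemIdGetLength(itemid);
        ItemPointerSet(&tup.t_self, blockno, offnum);
        presult->htsv[offnum] = heap_prune_satisfies_vacuum(&prstate, &tup,
                                                            buffer);

        if (!HeapTupleHeaderIsHeapOnly(htup))
            prstate.root_items[prstate.nroot_items++] = offnum;
        else
            prstate.heaponly_items[prstate.nheaponly_items++] = offnum;
    }

    /* heap_prune_chain() is then called for each entry in root_items ... */

    /* ... and finally, heap-only tuples that no chain reached */
    for (int i = 0; i < prstate.nheaponly_items; i++)
    {
        offnum = prstate.heaponly_items[i];
        if (prstate.processed[offnum])
            continue;

        if (presult->htsv[offnum] == HEAPTUPLE_DEAD)
        {
            ItemId      itemid = PageGetItemId(page, offnum);
            HeapTupleHeader htup = (HeapTupleHeader) PageGetItem(page, itemid);

            /* dead orphan from an aborted HOT update: reap it directly */
            if (!HeapTupleHeaderIsHotUpdated(htup))
            {
                HeapTupleHeaderAdvanceConflictHorizon(htup,
                                                      &prstate.snapshotConflictHorizon);
                heap_prune_record_unused(&prstate, offnum, true);
                continue;
            }
        }
        heap_prune_record_unchanged(&prstate, offnum);
    }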
- Melanie
On 30/03/2024 07:57, Melanie Plageman wrote:
On Fri, Mar 29, 2024 at 12:32:21PM -0400, Melanie Plageman wrote:
On Fri, Mar 29, 2024 at 12:00 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
Here's another idea: In the first loop through the offsets, where we
gather the HTSV status of each item, also collect the offsets of all HOT
and non-HOT items to two separate arrays. Call heap_prune_chain() for
all the non-HOT items first, and then process any remaining HOT tuples
that haven't been marked yet.
That's an interesting idea. I'll try it out and see how it works.
Attached v10 implements this method of dividing tuples into HOT and
non-HOT and processing the potential HOT chains first then processing
tuples not marked by calling heap_prune_chain().
I have applied the refactoring of heap_prune_chain() to master and then
built the other patches on top of that.
Committed some of the changes. Continuing to review the rest.
I discovered while writing this that LP_DEAD item offsets must be in
order in the deadoffsets array (the one that is used to populate
LVRelState->dead_items).
When I changed heap_page_prune_and_freeze() to partition the offsets
into HOT and non-HOT during the first loop through the item pointers
array (where we get tuple visibility information), we add dead item
offsets as they are encountered. So, they are no longer in order. I've
added a quicksort of the deadoffsets array to satisfy vacuum.
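A minimal sketch of that fix (assuming a simple OffsetNumber comparator,
like the OffsetNumber_cmp that appears in the vacuumlazy.c hunks earlier
in the thread; the exact placement in the patch may differ):

    static int
    OffsetNumber_cmp(const void *a, const void *b)
    {
        OffsetNumber lhs = *(const OffsetNumber *) a;
        OffsetNumber rhs = *(const OffsetNumber *) b;

        return lhs < rhs ? -1 : (lhs > rhs ? 1 : 0);
    }

    /* after pruning: restore ascending order before vacuum consumes them */
    qsort(presult->deadoffsets, presult->lpdead_items,
          sizeof(OffsetNumber), OffsetNumber_cmp);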
Good catch.
I think that we are actually successfully removing more RECENTLY_DEAD
HOT tuples than in master with heap_page_prune()'s new approach, and I
think it is correct; but let me know if I am missing something.
Yep. In the case of a two-item chain, RECENTLY_DEAD -> DEAD, the new
code can always remove both items. On 'master', it depends on which item
it happens to process first. If it processes the RECENTLY_DEAD item
first, then it follows the chain and removes both. But if it processes
the DEAD item first, the RECENTLY_DEAD item is left behind. It will be
removed by the next VACUUM, so it's not a correctness issue, and
probably doesn't make any practical performance difference either as
it's a rare corner case, but I feel better that it's more deterministic now.
The early patches in the set include some additional comment cleanup as
well. 0001 is fairly polished. 0004 could use some variable renaming
(this patch partitions the tuples into HOT and not HOT and then
processes them). I was struggling with some of the names here
(chainmembers and chaincandidates are confusing).
I didn't understand why you wanted to juggle both partitions in the same
array. So I separated them into two arrays, and called them 'root_items'
and 'heaponly_items'.
In some micro-benchmarks, the order that the items were processed made a
measurable difference. So I'm processing the items in the reverse order.
That roughly matches the order the items are processed in master, as it
iterates the offsets from high-to-low in the first loop, and low-to-high
in the second loop.
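In the attached v11-0001 that looks like this (lightly condensed from
the patch):

    /* Process HOT chains */
    for (int i = prstate.nroot_items - 1; i >= 0; i--)
    {
        offnum = prstate.root_items[i];

        /* Ignore items already processed as part of an earlier chain */
        if (prstate.processed[offnum])
            continue;

        heap_prune_chain(page, blockno, maxoff,
                         offnum, presult->htsv, &prstate);
    }

    /* ... followed by the same backwards walk over prstate.heaponly_items */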
The bulk of the combining of pruning and freezing is lumped into 0010.
I had planned to separate 0010 into 4 separate patches: 1 to execute
freezing in heap_prune_chain(), 1 for the freeze heuristic approximating
what is on master, and 1 for emitting a single record containing both
the pruning and freezing page modifications.
I ended up not doing this because I felt like the grouping of changes in
0007-0009 is off. As long as I still execute freezing in
lazy_scan_prune(), I have to share lots of state between
lazy_scan_prune() and heap_page_prune(). This meant I added a lot of
parameters to heap_page_prune() that later commits removed -- making the
later patches noisy and not so easy to understand.
I'm actually not sure what should go in what commit (either for review
clarity or for the actual final version).
But, I think we should probably focus on review of the code and not as
much how it is split up yet.
Yeah, that's fine, 0010 is manageable-sized now.
The final state of the code could definitely use more cleanup. I've been
staring at it for a while, so I could use some thoughts/ideas about what
part to focus on improving.
Committed some of the changes. I plan to commit at least the first of
these remaining patches later today. I'm happy with it now, but I'll
give it a final glance over after dinner.
I'll continue to review the rest after that, but attached is what I have
now.
--
Heikki Linnakangas
Neon (https://neon.tech)
Attachments:
v11-0001-Handle-non-chain-tuples-outside-of-heap_prune_ch.patch (text/x-patch)
From a6ab891779876e7cc1b4fb6fddb09f52f0094646 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Mon, 1 Apr 2024 16:59:38 +0300
Subject: [PATCH v11 1/7] Handle non-chain tuples outside of heap_prune_chain()
Dead branches of aborted HOT chains or leftover LP_DEAD and LP_REDIRECT
line pointers can be handled outside of heap_prune_chain(). This
simplifies the logic in heap_prune_chain(), as well as allowing us to
clean up more RECENTLY_DEAD -> DEAD chains.
To accomplish this efficiently, partition tuples into HOT and non-HOT
while first collecting visibility information for each tuple in
heap_page_prune(). Then call heap_prune_chain() only on potential chain
members. Then mop up the leftover HOT tuples afterwards.
As part of this, keep track of which items on page have already been
processed, in 'processed' array. This replaces the 'marked' array
which was only set for tuples marked for removal or redirection. The
'processed' array is updated also for items that are left unchanged,
when we conclude that an item can be left unchanged. At the end of
pruning, every item on the page should be marked as processed in the
array; an assertion is added for that.
Author: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://www.postgresql.org/message-id/20240330055710.kqg6ii2cdojsxgje@liskov
---
src/backend/access/heap/pruneheap.c | 264 +++++++++++++++++-----------
1 file changed, 166 insertions(+), 98 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 35a7c6147e9..164ae86a60f 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -45,12 +45,23 @@ typedef struct
OffsetNumber nowunused[MaxHeapTuplesPerPage];
/*
- * marked[i] is true if item i is entered in one of the above arrays.
+ * 'root_items' contains offsets of all LP_REDIRECT line pointers and
+ * normal non-HOT tuples. They can be stand-alone items or the first item
+ * in a HOT chain. 'heaponly_items' contains heap-only tuples which can
+ * only be removed as part of a HOT chain.
+ */
+ int nroot_items;
+ OffsetNumber root_items[MaxHeapTuplesPerPage];
+ int nheaponly_items;
+ OffsetNumber heaponly_items[MaxHeapTuplesPerPage];
+
+ /*
+ * processed[offnum] is true if item at offnum has been processed.
*
* This needs to be MaxHeapTuplesPerPage + 1 long as FirstOffsetNumber is
* 1. Otherwise every access would need to subtract 1.
*/
- bool marked[MaxHeapTuplesPerPage + 1];
+ bool processed[MaxHeapTuplesPerPage + 1];
int ndeleted; /* Number of tuples deleted from the page */
} PruneState;
@@ -67,6 +78,8 @@ static void heap_prune_record_redirect(PruneState *prstate,
static void heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum, bool was_normal);
static void heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum, bool was_normal);
static void heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum, bool was_normal);
+static void heap_prune_record_unchanged(PruneState *prstate, OffsetNumber offnum);
+
static void page_verify_redirects(Page page);
@@ -242,8 +255,9 @@ heap_page_prune(Relation relation, Buffer buffer,
prstate.mark_unused_now = mark_unused_now;
prstate.snapshotConflictHorizon = InvalidTransactionId;
prstate.nredirected = prstate.ndead = prstate.nunused = 0;
- memset(prstate.marked, 0, sizeof(prstate.marked));
prstate.ndeleted = 0;
+ prstate.nroot_items = 0;
+ prstate.nheaponly_items = 0;
/*
* presult->htsv is not initialized here because all ntuple spots in the
@@ -256,15 +270,16 @@ heap_page_prune(Relation relation, Buffer buffer,
tup.t_tableOid = RelationGetRelid(relation);
/*
- * Determine HTSV for all tuples.
+ * Determine HTSV for all tuples, and queue them up for processing as HOT
+ * chain roots or as a heap-only items.
*
* This is required for correctness to deal with cases where running HTSV
* twice could result in different results (e.g. RECENTLY_DEAD can turn to
* DEAD if another checked item causes GlobalVisTestIsRemovableFullXid()
* to update the horizon, INSERT_IN_PROGRESS can change to DEAD if the
- * inserting transaction aborts, ...). That in turn could cause
- * heap_prune_chain() to behave incorrectly if a tuple is reached twice,
- * once directly via a heap_prune_chain() and once following a HOT chain.
+ * inserting transaction aborts, ...). VACUUM assumes that there are no
+ * normal DEAD tuples left on the page after pruning, so it needs to have
+ * the same understanding of what is DEAD and what is not.
*
* It's also good for performance. Most commonly tuples within a page are
* stored at decreasing offsets (while the items are stored at increasing
@@ -282,52 +297,140 @@ heap_page_prune(Relation relation, Buffer buffer,
ItemId itemid = PageGetItemId(page, offnum);
HeapTupleHeader htup;
+ /*
+ * Set the offset number so that we can display it along with any
+ * error that occurred while processing this tuple.
+ */
+ *off_loc = offnum;
+
+ prstate.processed[offnum] = false;
+ presult->htsv[offnum] = -1;
+
/* Nothing to do if slot doesn't contain a tuple */
- if (!ItemIdIsNormal(itemid))
+ if (!ItemIdIsUsed(itemid))
{
- presult->htsv[offnum] = -1;
+ heap_prune_record_unchanged(&prstate, offnum);
continue;
}
- htup = (HeapTupleHeader) PageGetItem(page, itemid);
- tup.t_data = htup;
- tup.t_len = ItemIdGetLength(itemid);
- ItemPointerSet(&(tup.t_self), blockno, offnum);
+ if (ItemIdIsDead(itemid))
+ {
+ /*
+ * If the caller set mark_unused_now true, we can set dead line
+ * pointers LP_UNUSED now.
+ */
+ if (unlikely(prstate.mark_unused_now))
+ heap_prune_record_unused(&prstate, offnum, false);
+ else
+ heap_prune_record_unchanged(&prstate, offnum);
+ continue;
+ }
+
+ if (ItemIdIsRedirected(itemid))
+ {
+ /* This is the start of a HOT chain */
+ prstate.root_items[prstate.nroot_items++] = offnum;
+ continue;
+ }
+
+ Assert(ItemIdIsNormal(itemid));
/*
- * Set the offset number so that we can display it along with any
- * error that occurred while processing this tuple.
+ * Get the tuple's visibility status and queue it up for processing.
*/
- *off_loc = offnum;
+ htup = (HeapTupleHeader) PageGetItem(page, itemid);
+ tup.t_data = htup;
+ tup.t_len = ItemIdGetLength(itemid);
+ ItemPointerSet(&tup.t_self, blockno, offnum);
presult->htsv[offnum] = heap_prune_satisfies_vacuum(&prstate, &tup,
buffer);
+
+ if (!HeapTupleHeaderIsHeapOnly(htup))
+ prstate.root_items[prstate.nroot_items++] = offnum;
+ else
+ prstate.heaponly_items[prstate.nheaponly_items++] = offnum;
}
- /* Scan the page */
- for (offnum = FirstOffsetNumber;
- offnum <= maxoff;
- offnum = OffsetNumberNext(offnum))
+ /* Process HOT chains */
+ for (int i = prstate.nroot_items - 1; i >= 0; i--)
{
- ItemId itemid;
+ offnum = prstate.root_items[i];
/* Ignore items already processed as part of an earlier chain */
- if (prstate.marked[offnum])
+ if (prstate.processed[offnum])
continue;
/* see preceding loop */
*off_loc = offnum;
- /* Nothing to do if slot is empty */
- itemid = PageGetItemId(page, offnum);
- if (!ItemIdIsUsed(itemid))
- continue;
-
/* Process this item or chain of items */
heap_prune_chain(page, blockno, maxoff,
offnum, presult->htsv, &prstate);
}
+ /*
+ * Process any heap-only tuples that were not already processed as part of
+ * a HOT chain.
+ */
+ for (int i = prstate.nheaponly_items - 1; i >= 0; i--)
+ {
+ offnum = prstate.heaponly_items[i];
+
+ if (prstate.processed[offnum])
+ continue;
+
+ /* see preceding loop */
+ *off_loc = offnum;
+
+ if (presult->htsv[offnum] == HEAPTUPLE_DEAD)
+ {
+ ItemId itemid = PageGetItemId(page, offnum);
+ HeapTupleHeader htup = (HeapTupleHeader) PageGetItem(page, itemid);
+
+ /*
+ * If the tuple is DEAD and doesn't chain to anything else, mark
+ * it unused immediately. (If it does chain, we can only remove
+ * it as part of pruning its chain.)
+ *
+ * We need this primarily to handle aborted HOT updates, that is,
+ * XMIN_INVALID heap-only tuples. Those might not be linked to by
+ * any chain, since the parent tuple might be re-updated before
+ * any pruning occurs. So we have to be able to reap them
+ * separately from chain-pruning. (Note that
+ * HeapTupleHeaderIsHotUpdated will never return true for an
+ * XMIN_INVALID tuple, so this code will work even when there were
+ * sequential updates within the aborted transaction.)
+ */
+ if (!HeapTupleHeaderIsHotUpdated(htup))
+ {
+ HeapTupleHeaderAdvanceConflictHorizon(htup,
+ &prstate.snapshotConflictHorizon);
+ heap_prune_record_unused(&prstate, offnum, true);
+ continue;
+ }
+ }
+
+ /*
+ * HOT tuple is not DEAD or has been HOT-updated. If it is a DEAD,
+ * HOT-updated member of a chain, it should have already been
+ * processed by heap_prune_chain().
+ */
+ heap_prune_record_unchanged(&prstate, offnum);
+ }
+
+ /* We should now have processed every tuple exactly once */
+#ifdef USE_ASSERT_CHECKING
+ for (offnum = FirstOffsetNumber;
+ offnum <= maxoff;
+ offnum = OffsetNumberNext(offnum))
+ {
+ *off_loc = offnum;
+
+ Assert(prstate.processed[offnum]);
+ }
+#endif
+
/* Clear the offset information once we have processed the given page. */
*off_loc = InvalidOffsetNumber;
@@ -455,7 +558,6 @@ heap_prune_chain(Page page, BlockNumber blockno, OffsetNumber maxoff,
{
TransactionId priorXmax = InvalidTransactionId;
ItemId rootlp;
- HeapTupleHeader htup;
OffsetNumber offnum;
OffsetNumber chainitems[MaxHeapTuplesPerPage];
@@ -468,52 +570,13 @@ heap_prune_chain(Page page, BlockNumber blockno, OffsetNumber maxoff,
rootlp = PageGetItemId(page, rootoffnum);
- /*
- * If it's a heap-only tuple, then it is not the start of a HOT chain.
- */
- if (ItemIdIsNormal(rootlp))
- {
- Assert(htsv[rootoffnum] != -1);
- htup = (HeapTupleHeader) PageGetItem(page, rootlp);
-
- if (HeapTupleHeaderIsHeapOnly(htup))
- {
- /*
- * If the tuple is DEAD and doesn't chain to anything else, mark
- * it unused immediately. (If it does chain, we can only remove
- * it as part of pruning its chain.)
- *
- * We need this primarily to handle aborted HOT updates, that is,
- * XMIN_INVALID heap-only tuples. Those might not be linked to by
- * any chain, since the parent tuple might be re-updated before
- * any pruning occurs. So we have to be able to reap them
- * separately from chain-pruning. (Note that
- * HeapTupleHeaderIsHotUpdated will never return true for an
- * XMIN_INVALID tuple, so this code will work even when there were
- * sequential updates within the aborted transaction.)
- *
- * Note that we might first arrive at a dead heap-only tuple
- * either here or while following a chain below. Whichever path
- * gets there first will mark the tuple unused.
- */
- if (htsv[rootoffnum] == HEAPTUPLE_DEAD &&
- !HeapTupleHeaderIsHotUpdated(htup))
- {
- heap_prune_record_unused(prstate, rootoffnum, true);
- HeapTupleHeaderAdvanceConflictHorizon(htup,
- &prstate->snapshotConflictHorizon);
- }
-
- return;
- }
- }
-
/* Start from the root tuple */
offnum = rootoffnum;
/* while not end of the chain */
for (;;)
{
+ HeapTupleHeader htup;
ItemId lp;
/* Sanity check (pure paranoia) */
@@ -528,14 +591,18 @@ heap_prune_chain(Page page, BlockNumber blockno, OffsetNumber maxoff,
break;
/* If item is already processed, stop --- it must not be same chain */
- if (prstate->marked[offnum])
+ if (prstate->processed[offnum])
break;
lp = PageGetItemId(page, offnum);
- /* Unused item obviously isn't part of the chain */
- if (!ItemIdIsUsed(lp))
- break;
+ /*
+ * Unused item obviously isn't part of the chain. Likewise, a dead
+ * line pointer can't be part of the chain. Both of those cases were
+ * already marked as processed.
+ */
+ Assert(ItemIdIsUsed(lp));
+ Assert(!ItemIdIsDead(lp));
/*
* If we are looking at the redirected root line pointer, jump to the
@@ -551,25 +618,8 @@ heap_prune_chain(Page page, BlockNumber blockno, OffsetNumber maxoff,
continue;
}
- /*
- * Likewise, a dead line pointer can't be part of the chain. (We
- * already eliminated the case of dead root tuple outside this
- * function.)
- */
- if (ItemIdIsDead(lp))
- {
- /*
- * If the caller set mark_unused_now true, we can set dead line
- * pointers LP_UNUSED now. We don't increment ndeleted here since
- * the LP was already marked dead.
- */
- if (unlikely(prstate->mark_unused_now))
- heap_prune_record_unused(prstate, offnum, false);
-
- break;
- }
-
Assert(ItemIdIsNormal(lp));
+
htup = (HeapTupleHeader) PageGetItem(page, lp);
/*
@@ -689,6 +739,8 @@ process_chain:
* No DEAD tuple was found, so the chain is entirely composed of
* normal, unchanged tuples. Leave it alone.
*/
+ for (int i = 0; i < nchain; i++)
+ heap_prune_record_unchanged(prstate, chainitems[i]);
}
else if (ndeadchain == nchain)
{
@@ -713,6 +765,8 @@ process_chain:
heap_prune_record_unused(prstate, chainitems[i], true);
/* the rest of tuples in the chain are normal, unchanged tuples */
+ for (int i = ndeadchain; i < nchain; i++)
+ heap_prune_record_unchanged(prstate, chainitems[i]);
}
}
@@ -736,14 +790,19 @@ heap_prune_record_redirect(PruneState *prstate,
OffsetNumber offnum, OffsetNumber rdoffnum,
bool was_normal)
{
+ Assert(!prstate->processed[offnum]);
+ prstate->processed[offnum] = true;
+
+ /*
+ * Do not mark the redirect target here. It needs to be counted
+ * separately as an unchanged tuple.
+ */
+
Assert(prstate->nredirected < MaxHeapTuplesPerPage);
prstate->redirected[prstate->nredirected * 2] = offnum;
prstate->redirected[prstate->nredirected * 2 + 1] = rdoffnum;
+
prstate->nredirected++;
- Assert(!prstate->marked[offnum]);
- prstate->marked[offnum] = true;
- Assert(!prstate->marked[rdoffnum]);
- prstate->marked[rdoffnum] = true;
/*
* If the root entry had been a normal tuple, we are deleting it, so count
@@ -759,11 +818,12 @@ static void
heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
bool was_normal)
{
+ Assert(!prstate->processed[offnum]);
+ prstate->processed[offnum] = true;
+
Assert(prstate->ndead < MaxHeapTuplesPerPage);
prstate->nowdead[prstate->ndead] = offnum;
prstate->ndead++;
- Assert(!prstate->marked[offnum]);
- prstate->marked[offnum] = true;
/*
* If the root entry had been a normal tuple, we are deleting it, so count
@@ -800,11 +860,12 @@ heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
static void
heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum, bool was_normal)
{
+ Assert(!prstate->processed[offnum]);
+ prstate->processed[offnum] = true;
+
Assert(prstate->nunused < MaxHeapTuplesPerPage);
prstate->nowunused[prstate->nunused] = offnum;
prstate->nunused++;
- Assert(!prstate->marked[offnum]);
- prstate->marked[offnum] = true;
/*
* If the root entry had been a normal tuple, we are deleting it, so count
@@ -815,6 +876,13 @@ heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum, bool was_norm
prstate->ndeleted++;
}
+/* Record a line pointer that is left unchanged */
+static void
+heap_prune_record_unchanged(PruneState *prstate, OffsetNumber offnum)
+{
+ Assert(!prstate->processed[offnum]);
+ prstate->processed[offnum] = true;
+}
/*
* Perform the actual page changes needed by heap_page_prune.
--
2.39.2
v11-0002-Invoke-heap_prune_record_prunable-during-record-.patch (text/x-patch)
From 71a84c40d19716071b23f45c2a40c03fe7f69b59 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Mon, 1 Apr 2024 15:37:19 +0300
Subject: [PATCH v11 2/7] Invoke heap_prune_record_prunable() during record
unchanged
In anticipation of preparing to freeze and counting tuples which are not
candidates for pruning, this commit introduces heap_prune_record*()
functions for marking a line pointer which will not change.
Recording the lowest soon-to-be prunable xid is one of the actions we
take for item pointers we will not be changing during pruning. Move this
to the recently introduced heap_prune_record_unchanged() function so
that we group all actions we take for unchanged LP_NORMAL line pointers
together.
Author: Melanie Plageman <melanieplageman@gmail.com>
---
src/backend/access/heap/pruneheap.c | 130 +++++++++++++++++++++-------
1 file changed, 97 insertions(+), 33 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 164ae86a60f..fb0ad834f1b 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -78,7 +78,11 @@ static void heap_prune_record_redirect(PruneState *prstate,
static void heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum, bool was_normal);
static void heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum, bool was_normal);
static void heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum, bool was_normal);
-static void heap_prune_record_unchanged(PruneState *prstate, OffsetNumber offnum);
+
+static void heap_prune_record_unchanged_lp_unused(Page page, PruneState *prstate, OffsetNumber offnum);
+static void heap_prune_record_unchanged_lp_normal(Page page, int8 *htsv, PruneState *prstate, OffsetNumber offnum);
+static void heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber offnum);
+static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetNumber offnum);
static void page_verify_redirects(Page page);
@@ -309,7 +313,7 @@ heap_page_prune(Relation relation, Buffer buffer,
/* Nothing to do if slot doesn't contain a tuple */
if (!ItemIdIsUsed(itemid))
{
- heap_prune_record_unchanged(&prstate, offnum);
+ heap_prune_record_unchanged_lp_unused(page, &prstate, offnum);
continue;
}
@@ -322,7 +326,7 @@ heap_page_prune(Relation relation, Buffer buffer,
if (unlikely(prstate.mark_unused_now))
heap_prune_record_unused(&prstate, offnum, false);
else
- heap_prune_record_unchanged(&prstate, offnum);
+ heap_prune_record_unchanged_lp_dead(page, &prstate, offnum);
continue;
}
@@ -416,7 +420,7 @@ heap_page_prune(Relation relation, Buffer buffer,
* HOT-updated member of a chain, it should have already been
* processed by heap_prune_chain().
*/
- heap_prune_record_unchanged(&prstate, offnum);
+ heap_prune_record_unchanged_lp_normal(page, presult->htsv, &prstate, offnum);
}
/* We should now have processed every tuple exactly once */
@@ -634,9 +638,6 @@ heap_prune_chain(Page page, BlockNumber blockno, OffsetNumber maxoff,
*/
chainitems[nchain++] = offnum;
- /*
- * Check tuple's visibility status.
- */
switch (htsv_get_valid_status(htsv[offnum]))
{
case HEAPTUPLE_DEAD:
@@ -652,9 +653,6 @@ heap_prune_chain(Page page, BlockNumber blockno, OffsetNumber maxoff,
case HEAPTUPLE_RECENTLY_DEAD:
/*
- * This tuple may soon become DEAD. Update the hint field so
- * that the page is reconsidered for pruning in future.
- *
* We don't need to advance the conflict horizon for
* RECENTLY_DEAD tuples, even if we are removing them. This
* is because we only remove RECENTLY_DEAD tuples if they
@@ -663,8 +661,6 @@ heap_prune_chain(Page page, BlockNumber blockno, OffsetNumber maxoff,
* tuple by virtue of being later in the chain. We will have
* advanced the conflict horizon for the DEAD tuple.
*/
- heap_prune_record_prunable(prstate,
- HeapTupleHeaderGetUpdateXid(htup));
/*
* Advance past RECENTLY_DEAD tuples just in case there's a
@@ -675,24 +671,8 @@ heap_prune_chain(Page page, BlockNumber blockno, OffsetNumber maxoff,
break;
case HEAPTUPLE_DELETE_IN_PROGRESS:
-
- /*
- * This tuple may soon become DEAD. Update the hint field so
- * that the page is reconsidered for pruning in future.
- */
- heap_prune_record_prunable(prstate,
- HeapTupleHeaderGetUpdateXid(htup));
- goto process_chain;
-
case HEAPTUPLE_LIVE:
case HEAPTUPLE_INSERT_IN_PROGRESS:
-
- /*
- * If we wanted to optimize for aborts, we might consider
- * marking the page prunable when we see INSERT_IN_PROGRESS.
- * But we don't. See related decisions about when to mark the
- * page prunable in heapam.c.
- */
goto process_chain;
default:
@@ -739,8 +719,15 @@ process_chain:
* No DEAD tuple was found, so the chain is entirely composed of
* normal, unchanged tuples. Leave it alone.
*/
- for (int i = 0; i < nchain; i++)
- heap_prune_record_unchanged(prstate, chainitems[i]);
+ int i = 0;
+
+ if (ItemIdIsRedirected(rootlp))
+ {
+ heap_prune_record_unchanged_lp_redirect(prstate, rootoffnum);
+ i++;
+ }
+ for (; i < nchain; i++)
+ heap_prune_record_unchanged_lp_normal(page, htsv, prstate, chainitems[i]);
}
else if (ndeadchain == nchain)
{
@@ -766,7 +753,7 @@ process_chain:
/* the rest of tuples in the chain are normal, unchanged tuples */
for (int i = ndeadchain; i < nchain; i++)
- heap_prune_record_unchanged(prstate, chainitems[i]);
+ heap_prune_record_unchanged_lp_normal(page, htsv, prstate, chainitems[i]);
}
}
@@ -876,10 +863,87 @@ heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum, bool was_norm
prstate->ndeleted++;
}
-/* Record a line pointer that is left unchanged */
+/*
+ * Record an unused line pointer that is left unchanged.
+ */
+static void
+heap_prune_record_unchanged_lp_unused(Page page, PruneState *prstate, OffsetNumber offnum)
+{
+ Assert(!prstate->processed[offnum]);
+ prstate->processed[offnum] = true;
+}
+
+/*
+ * Record LP_NORMAL line pointer that is left unchanged.
+ */
static void
-heap_prune_record_unchanged(PruneState *prstate, OffsetNumber offnum)
+heap_prune_record_unchanged_lp_normal(Page page, int8 *htsv, PruneState *prstate, OffsetNumber offnum)
{
+ HeapTupleHeader htup;
+
+ Assert(!prstate->processed[offnum]);
+ prstate->processed[offnum] = true;
+
+ switch (htsv[offnum])
+ {
+ case HEAPTUPLE_LIVE:
+ case HEAPTUPLE_INSERT_IN_PROGRESS:
+
+ /*
+ * If we wanted to optimize for aborts, we might consider marking
+ * the page prunable when we see INSERT_IN_PROGRESS. But we
+ * don't. See related decisions about when to mark the page
+ * prunable in heapam.c.
+ */
+ break;
+
+ case HEAPTUPLE_RECENTLY_DEAD:
+ case HEAPTUPLE_DELETE_IN_PROGRESS:
+
+ htup = (HeapTupleHeader) PageGetItem(page, PageGetItemId(page, offnum));
+
+ /*
+ * This tuple may soon become DEAD. Update the hint field so that
+ * the page is reconsidered for pruning in future.
+ */
+ heap_prune_record_prunable(prstate,
+ HeapTupleHeaderGetUpdateXid(htup));
+ break;
+
+
+ default:
+
+ /*
+ * DEAD tuples should've been passed to heap_prune_record_dead()
+ * or heap_prune_record_unused() instead.
+ */
+ elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result %d", htsv[offnum]);
+ break;
+ }
+}
+
+
+/*
+ * Record line pointer that was already LP_DEAD and is left unchanged.
+ */
+static void
+heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber offnum)
+{
+ Assert(!prstate->processed[offnum]);
+ prstate->processed[offnum] = true;
+}
+
+static void
+heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetNumber offnum)
+{
+ /*
+ * A redirect line pointer doesn't count as a live tuple.
+ *
+ * If we leave a redirect line pointer in place, there will be another
+ * tuple on the page that it points to. We will do the bookkeeping for
+ * that separately. So we have nothing to do here, except remember that we
+ * processed this item.
+ */
Assert(!prstate->processed[offnum]);
prstate->processed[offnum] = true;
}
--
2.39.2
v11-0003-Introduce-PRUNE_DO_-actions.patch (text/x-patch)
From 17e183835a968e81daf7b74a4164b243e2de35aa Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Fri, 29 Mar 2024 19:43:09 -0400
Subject: [PATCH v11 3/7] Introduce PRUNE_DO_* actions
We will eventually take additional actions in heap_page_prune() at the
discretion of the caller. For now, introduce these PRUNE_DO_* macros and
turn mark_unused_now, a parameter to heap_page_prune(), into a PRUNE_DO_
action.
---
src/backend/access/heap/pruneheap.c | 51 ++++++++++++++--------------
src/backend/access/heap/vacuumlazy.c | 11 ++++--
src/include/access/heapam.h | 13 ++++++-
3 files changed, 46 insertions(+), 29 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index fb0ad834f1b..30965c3c5a1 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -29,10 +29,11 @@
/* Working data for heap_page_prune and subroutines */
typedef struct
{
+ /* PRUNE_DO_* arguments */
+ uint8 actions;
+
/* tuple visibility test, initialized for the relation */
GlobalVisState *vistest;
- /* whether or not dead items can be set LP_UNUSED during pruning */
- bool mark_unused_now;
TransactionId new_prune_xid; /* new prune hint value for page */
TransactionId snapshotConflictHorizon; /* latest xid removed */
@@ -166,11 +167,12 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
PruneResult presult;
/*
- * For now, pass mark_unused_now as false regardless of whether or
- * not the relation has indexes, since we cannot safely determine
- * that during on-access pruning with the current implementation.
+ * For now, do not set PRUNE_DO_MARK_UNUSED_NOW regardless of
+ * whether or not the relation has indexes, since we cannot safely
+ * determine that during on-access pruning with the current
+ * implementation.
*/
- heap_page_prune(relation, buffer, vistest, false,
+ heap_page_prune(relation, buffer, vistest, 0,
&presult, PRUNE_ON_ACCESS, &dummy_off_loc);
/*
@@ -215,8 +217,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
* (see heap_prune_satisfies_vacuum).
*
- * mark_unused_now indicates whether or not dead items can be set LP_UNUSED
- * during pruning.
+ * actions are the pruning actions that heap_page_prune() should take.
*
* presult contains output parameters needed by callers such as the number of
* tuples removed and the number of line pointers newly marked LP_DEAD.
@@ -231,7 +232,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
void
heap_page_prune(Relation relation, Buffer buffer,
GlobalVisState *vistest,
- bool mark_unused_now,
+ uint8 actions,
PruneResult *presult,
PruneReason reason,
OffsetNumber *off_loc)
@@ -256,7 +257,7 @@ heap_page_prune(Relation relation, Buffer buffer,
*/
prstate.new_prune_xid = InvalidTransactionId;
prstate.vistest = vistest;
- prstate.mark_unused_now = mark_unused_now;
+ prstate.actions = actions;
prstate.snapshotConflictHorizon = InvalidTransactionId;
prstate.nredirected = prstate.ndead = prstate.nunused = 0;
prstate.ndeleted = 0;
@@ -320,10 +321,10 @@ heap_page_prune(Relation relation, Buffer buffer,
if (ItemIdIsDead(itemid))
{
/*
- * If the caller set mark_unused_now true, we can set dead line
- * pointers LP_UNUSED now.
+ * If the caller set PRUNE_DO_MARK_UNUSED_NOW, we can set dead
+ * line pointers LP_UNUSED now.
*/
- if (unlikely(prstate.mark_unused_now))
+ if (unlikely(prstate.actions & PRUNE_DO_MARK_UNUSED_NOW))
heap_prune_record_unused(&prstate, offnum, false);
else
heap_prune_record_unchanged_lp_dead(page, &prstate, offnum);
@@ -822,22 +823,22 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
}
/*
- * Depending on whether or not the caller set mark_unused_now to true, record that a
- * line pointer should be marked LP_DEAD or LP_UNUSED. There are other cases in
- * which we will mark line pointers LP_UNUSED, but we will not mark line
- * pointers LP_DEAD if mark_unused_now is true.
+ * Depending on whether or not the caller set PRUNE_DO_MARK_UNUSED_NOW, record
+ * that a line pointer should be marked LP_DEAD or LP_UNUSED. There are other
+ * cases in which we will mark line pointers LP_UNUSED, but we will not mark
+ * line pointers LP_DEAD if PRUNE_DO_MARK_UNUSED_NOW is set.
*/
static void
heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
bool was_normal)
{
/*
- * If the caller set mark_unused_now to true, we can remove dead tuples
+ * If the caller set PRUNE_DO_MARK_UNUSED_NOW, we can remove dead tuples
* during pruning instead of marking their line pointers dead. Set this
* tuple's line pointer LP_UNUSED. We hint that this option is less
* likely.
*/
- if (unlikely(prstate->mark_unused_now))
+ if (unlikely(prstate->actions & PRUNE_DO_MARK_UNUSED_NOW))
heap_prune_record_unused(prstate, offnum, was_normal);
else
heap_prune_record_dead(prstate, offnum, was_normal);
@@ -1080,12 +1081,12 @@ heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
else
{
/*
- * When heap_page_prune() was called, mark_unused_now may have
- * been passed as true, which allows would-be LP_DEAD items to be
- * made LP_UNUSED instead. This is only possible if the relation
- * has no indexes. If there are any dead items, then
- * mark_unused_now was not true and every item being marked
- * LP_UNUSED must refer to a heap-only tuple.
+ * When heap_page_prune() was called, PRUNE_DO_MARK_UNUSED_NOW may
+ * have been set, which allows would-be LP_DEAD items to be made
+ * LP_UNUSED instead. This is only possible if the relation has
+ * no indexes. If there are any dead items, then
+ * PRUNE_DO_MARK_UNUSED_NOW was not set and every item being
+ * marked LP_UNUSED must refer to a heap-only tuple.
*/
if (ndead > 0)
{
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index ba5b7083a3a..880a218cb4d 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1425,6 +1425,7 @@ lazy_scan_prune(LVRelState *vacrel,
bool all_visible,
all_frozen;
TransactionId visibility_cutoff_xid;
+ uint8 actions = 0;
int64 fpi_before = pgWalUsage.wal_fpi;
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
@@ -1458,10 +1459,14 @@ lazy_scan_prune(LVRelState *vacrel,
* that were deleted from indexes.
*
* If the relation has no indexes, we can immediately mark would-be dead
- * items LP_UNUSED, so mark_unused_now should be true if no indexes and
- * false otherwise.
+ * items LP_UNUSED, so PRUNE_DO_MARK_UNUSED_NOW should be set if no
+ * indexes and unset otherwise.
*/
- heap_page_prune(rel, buf, vacrel->vistest, vacrel->nindexes == 0,
+
+ if (vacrel->nindexes == 0)
+ actions |= PRUNE_DO_MARK_UNUSED_NOW;
+
+ heap_page_prune(rel, buf, vacrel->vistest, actions,
&presult, PRUNE_VACUUM_SCAN, &vacrel->offnum);
/*
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 32a3fbce961..35b8486c34a 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -191,6 +191,17 @@ typedef struct HeapPageFreeze
} HeapPageFreeze;
+/*
+ * Actions that can be taken during pruning and freezing. By default, we will
+ * at least attempt regular pruning.
+ */
+
+/*
+ * PRUNE_DO_MARK_UNUSED_NOW indicates whether or not dead items can be set
+ * LP_UNUSED during pruning.
+ */
+#define PRUNE_DO_MARK_UNUSED_NOW (1 << 1)
+
/*
* Per-page state returned from pruning
*/
@@ -331,7 +342,7 @@ struct GlobalVisState;
extern void heap_page_prune_opt(Relation relation, Buffer buffer);
extern void heap_page_prune(Relation relation, Buffer buffer,
struct GlobalVisState *vistest,
- bool mark_unused_now,
+ uint8 actions,
PruneResult *presult,
PruneReason reason,
OffsetNumber *off_loc);
--
2.39.2
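For illustration, here is a minimal caller sketch of the new bitmask
interface (names as defined in the patch above; this snippet is not part
of the patch itself, it just condenses the vacuumlazy.c hunk):

    uint8   actions = 0;

    /* With no indexes, vacuum may reap would-be LP_DEAD items immediately */
    if (vacrel->nindexes == 0)
        actions |= PRUNE_DO_MARK_UNUSED_NOW;

    heap_page_prune(rel, buf, vacrel->vistest, actions,
                    &presult, PRUNE_VACUUM_SCAN, &vacrel->offnum);

Starting from a zeroed bitmask keeps plain on-access pruning
(actions == 0) behaving exactly like the old mark_unused_now == false
case.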
Attachment: v11-0004-Prepare-freeze-tuples-in-heap_page_prune.patch (text/x-patch)
From 083690b946e19ab5e536a9f2689772e7b91d2a70 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Fri, 29 Mar 2024 21:22:14 -0400
Subject: [PATCH v11 4/7] Prepare freeze tuples in heap_page_prune()
In order to combine the freeze and prune records, we must determine
which tuples are freezable before actually executing pruning: all of the
page modifications must be made in the same critical section that emits
the combined WAL record. So, while pruning, determine whether each tuple
should or must be frozen and whether the page will be all-frozen as a
consequence.
---
src/backend/access/heap/heapam.c | 6 +--
src/backend/access/heap/pruneheap.c | 64 +++++++++++++++++++++-----
src/backend/access/heap/vacuumlazy.c | 67 ++++++++++------------------
src/include/access/heapam.h | 25 ++++++++++-
4 files changed, 103 insertions(+), 59 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index b661d9811eb..c5b52978380 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6477,10 +6477,10 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
*/
bool
heap_prepare_freeze_tuple(HeapTupleHeader tuple,
- const struct VacuumCutoffs *cutoffs,
HeapPageFreeze *pagefrz,
HeapTupleFreeze *frz, bool *totally_frozen)
{
+ const struct VacuumCutoffs *cutoffs = pagefrz->cutoffs;
bool xmin_already_frozen = false,
xmax_already_frozen = false;
bool freeze_xmin = false,
@@ -6891,9 +6891,9 @@ heap_freeze_tuple(HeapTupleHeader tuple,
pagefrz.FreezePageRelminMxid = MultiXactCutoff;
pagefrz.NoFreezePageRelfrozenXid = FreezeLimit;
pagefrz.NoFreezePageRelminMxid = MultiXactCutoff;
+ pagefrz.cutoffs = &cutoffs;
- do_freeze = heap_prepare_freeze_tuple(tuple, &cutoffs,
- &pagefrz, &frz, &totally_frozen);
+ do_freeze = heap_prepare_freeze_tuple(tuple, &pagefrz, &frz, &totally_frozen);
/*
* Note that because this is not a WAL-logged operation, we don't need to
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 30965c3c5a1..8bdd6389b25 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -17,6 +17,7 @@
#include "access/heapam.h"
#include "access/heapam_xlog.h"
#include "access/htup_details.h"
+#include "access/multixact.h"
#include "access/transam.h"
#include "access/xlog.h"
#include "access/xloginsert.h"
@@ -72,7 +73,7 @@ static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
HeapTuple tup,
Buffer buffer);
static void heap_prune_chain(Page page, BlockNumber blockno, OffsetNumber maxoff,
- OffsetNumber rootoffnum, int8 *htsv, PruneState *prstate);
+ OffsetNumber rootoffnum, int8 *htsv, PruneState *prstate, PruneResult *presult);
static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
static void heap_prune_record_redirect(PruneState *prstate,
OffsetNumber offnum, OffsetNumber rdoffnum, bool was_normal);
@@ -81,7 +82,7 @@ static void heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber o
static void heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum, bool was_normal);
static void heap_prune_record_unchanged_lp_unused(Page page, PruneState *prstate, OffsetNumber offnum);
-static void heap_prune_record_unchanged_lp_normal(Page page, int8 *htsv, PruneState *prstate, OffsetNumber offnum);
+static void heap_prune_record_unchanged_lp_normal(Page page, int8 *htsv, PruneState *prstate, PruneResult *presult, OffsetNumber offnum);
static void heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber offnum);
static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetNumber offnum);
@@ -166,6 +167,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
OffsetNumber dummy_off_loc;
PruneResult presult;
+ presult.pagefrz.freeze_required = false;
+ presult.pagefrz.FreezePageRelfrozenXid = InvalidTransactionId;
+ presult.pagefrz.FreezePageRelminMxid = InvalidMultiXactId;
+ presult.pagefrz.NoFreezePageRelfrozenXid = InvalidTransactionId;
+ presult.pagefrz.NoFreezePageRelminMxid = InvalidMultiXactId;
+ presult.pagefrz.cutoffs = NULL;
+
/*
* For now, do not set PRUNE_DO_MARK_UNUSED_NOW regardless of
* whether or not the relation has indexes, since we cannot safely
@@ -264,6 +272,16 @@ heap_page_prune(Relation relation, Buffer buffer,
prstate.nroot_items = 0;
prstate.nheaponly_items = 0;
+ /*
+ * If we will prepare to freeze tuples, consider that it might be possible
+ * to set the page all-frozen in the visibility map.
+ */
+ if (prstate.actions & PRUNE_DO_TRY_FREEZE)
+ presult->all_frozen = true;
+ else
+ presult->all_frozen = false;
+
+
/*
* presult->htsv is not initialized here because all ntuple spots in the
* array will be set either to a valid HTSV_Result value or -1.
@@ -271,6 +289,8 @@ heap_page_prune(Relation relation, Buffer buffer,
presult->ndeleted = 0;
presult->nnewlpdead = 0;
+ presult->nfrozen = 0;
+
maxoff = PageGetMaxOffsetNumber(page);
tup.t_tableOid = RelationGetRelid(relation);
@@ -371,7 +391,7 @@ heap_page_prune(Relation relation, Buffer buffer,
/* Process this item or chain of items */
heap_prune_chain(page, blockno, maxoff,
- offnum, presult->htsv, &prstate);
+ offnum, presult->htsv, &prstate, presult);
}
/*
@@ -421,7 +441,7 @@ heap_page_prune(Relation relation, Buffer buffer,
* HOT-updated member of a chain, it should have already been
* processed by heap_prune_chain().
*/
- heap_prune_record_unchanged_lp_normal(page, presult->htsv, &prstate, offnum);
+ heap_prune_record_unchanged_lp_normal(page, presult->htsv, &prstate, presult, offnum);
}
/* We should now have processed every tuple exactly once */
@@ -559,7 +579,7 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
*/
static void
heap_prune_chain(Page page, BlockNumber blockno, OffsetNumber maxoff,
- OffsetNumber rootoffnum, int8 *htsv, PruneState *prstate)
+ OffsetNumber rootoffnum, int8 *htsv, PruneState *prstate, PruneResult *presult)
{
TransactionId priorXmax = InvalidTransactionId;
ItemId rootlp;
@@ -728,7 +748,7 @@ process_chain:
i++;
}
for (; i < nchain; i++)
- heap_prune_record_unchanged_lp_normal(page, htsv, prstate, chainitems[i]);
+ heap_prune_record_unchanged_lp_normal(page, htsv, prstate, presult, chainitems[i]);
}
else if (ndeadchain == nchain)
{
@@ -754,7 +774,7 @@ process_chain:
/* the rest of tuples in the chain are normal, unchanged tuples */
for (int i = ndeadchain; i < nchain; i++)
- heap_prune_record_unchanged_lp_normal(page, htsv, prstate, chainitems[i]);
+ heap_prune_record_unchanged_lp_normal(page, htsv, prstate, presult, chainitems[i]);
}
}
@@ -878,9 +898,10 @@ heap_prune_record_unchanged_lp_unused(Page page, PruneState *prstate, OffsetNumb
* Record LP_NORMAL line pointer that is left unchanged.
*/
static void
-heap_prune_record_unchanged_lp_normal(Page page, int8 *htsv, PruneState *prstate, OffsetNumber offnum)
+heap_prune_record_unchanged_lp_normal(Page page, int8 *htsv, PruneState *prstate,
+ PruneResult *presult, OffsetNumber offnum)
{
- HeapTupleHeader htup;
+ HeapTupleHeader htup = (HeapTupleHeader) PageGetItem(page, PageGetItemId(page, offnum));
Assert(!prstate->processed[offnum]);
prstate->processed[offnum] = true;
@@ -901,8 +922,6 @@ heap_prune_record_unchanged_lp_normal(Page page, int8 *htsv, PruneState *prstate
case HEAPTUPLE_RECENTLY_DEAD:
case HEAPTUPLE_DELETE_IN_PROGRESS:
- htup = (HeapTupleHeader) PageGetItem(page, PageGetItemId(page, offnum));
-
/*
* This tuple may soon become DEAD. Update the hint field so that
* the page is reconsidered for pruning in future.
@@ -921,6 +940,29 @@ heap_prune_record_unchanged_lp_normal(Page page, int8 *htsv, PruneState *prstate
elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result %d", htsv[offnum]);
break;
}
+
+ /* Consider freezing any normal tuples which will not be removed */
+ if (prstate->actions & PRUNE_DO_TRY_FREEZE)
+ {
+ /* Tuple with storage -- consider need to freeze */
+ bool totally_frozen;
+
+ if ((heap_prepare_freeze_tuple(htup, &presult->pagefrz,
+ &presult->frozen[presult->nfrozen],
+ &totally_frozen)))
+ {
+ /* Save prepared freeze plan for later */
+ presult->frozen[presult->nfrozen++].offset = offnum;
+ }
+
+ /*
+ * If any tuple isn't either totally frozen already or eligible to
+ * become totally frozen (according to its freeze plan), then the page
+ * definitely cannot be set all-frozen in the visibility map later on
+ */
+ if (!totally_frozen)
+ presult->all_frozen = false;
+ }
}
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 880a218cb4d..679c6a866ea 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1416,19 +1416,15 @@ lazy_scan_prune(LVRelState *vacrel,
maxoff;
ItemId itemid;
PruneResult presult;
- int tuples_frozen,
- lpdead_items,
+ int lpdead_items,
live_tuples,
recently_dead_tuples;
- HeapPageFreeze pagefrz;
bool hastup = false;
- bool all_visible,
- all_frozen;
+ bool all_visible;
TransactionId visibility_cutoff_xid;
uint8 actions = 0;
int64 fpi_before = pgWalUsage.wal_fpi;
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
- HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
Assert(BufferGetBlockNumber(buf) == blkno);
@@ -1440,12 +1436,12 @@ lazy_scan_prune(LVRelState *vacrel,
maxoff = PageGetMaxOffsetNumber(page);
/* Initialize (or reset) page-level state */
- pagefrz.freeze_required = false;
- pagefrz.FreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
- pagefrz.FreezePageRelminMxid = vacrel->NewRelminMxid;
- pagefrz.NoFreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
- pagefrz.NoFreezePageRelminMxid = vacrel->NewRelminMxid;
- tuples_frozen = 0;
+ presult.pagefrz.freeze_required = false;
+ presult.pagefrz.FreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
+ presult.pagefrz.FreezePageRelminMxid = vacrel->NewRelminMxid;
+ presult.pagefrz.NoFreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
+ presult.pagefrz.NoFreezePageRelminMxid = vacrel->NewRelminMxid;
+ presult.pagefrz.cutoffs = &vacrel->cutoffs;
lpdead_items = 0;
live_tuples = 0;
recently_dead_tuples = 0;
@@ -1462,6 +1458,7 @@ lazy_scan_prune(LVRelState *vacrel,
* items LP_UNUSED, so PRUNE_DO_MARK_UNUSED_NOW should be set if no
* indexes and unset otherwise.
*/
+ actions |= PRUNE_DO_TRY_FREEZE;
if (vacrel->nindexes == 0)
actions |= PRUNE_DO_MARK_UNUSED_NOW;
@@ -1479,7 +1476,6 @@ lazy_scan_prune(LVRelState *vacrel,
* Also keep track of the visibility cutoff xid for recovery conflicts.
*/
all_visible = true;
- all_frozen = true;
visibility_cutoff_xid = InvalidTransactionId;
/*
@@ -1491,7 +1487,6 @@ lazy_scan_prune(LVRelState *vacrel,
offnum = OffsetNumberNext(offnum))
{
HeapTupleHeader htup;
- bool totally_frozen;
/*
* Set the offset number so that we can display it along with any
@@ -1638,22 +1633,6 @@ lazy_scan_prune(LVRelState *vacrel,
}
hastup = true; /* page makes rel truncation unsafe */
-
- /* Tuple with storage -- consider need to freeze */
- if (heap_prepare_freeze_tuple(htup, &vacrel->cutoffs, &pagefrz,
- &frozen[tuples_frozen], &totally_frozen))
- {
- /* Save prepared freeze plan for later */
- frozen[tuples_frozen++].offset = offnum;
- }
-
- /*
- * If any tuple isn't either totally frozen already or eligible to
- * become totally frozen (according to its freeze plan), then the page
- * definitely cannot be set all-frozen in the visibility map later on
- */
- if (!totally_frozen)
- all_frozen = false;
}
/*
@@ -1670,18 +1649,18 @@ lazy_scan_prune(LVRelState *vacrel,
* freeze when pruning generated an FPI, if doing so means that we set the
* page all-frozen afterwards (might not happen until final heap pass).
*/
- if (pagefrz.freeze_required || tuples_frozen == 0 ||
- (all_visible && all_frozen &&
+ if (presult.pagefrz.freeze_required || presult.nfrozen == 0 ||
+ (all_visible && presult.all_frozen &&
fpi_before != pgWalUsage.wal_fpi))
{
/*
* We're freezing the page. Our final NewRelfrozenXid doesn't need to
* be affected by the XIDs that are just about to be frozen anyway.
*/
- vacrel->NewRelfrozenXid = pagefrz.FreezePageRelfrozenXid;
- vacrel->NewRelminMxid = pagefrz.FreezePageRelminMxid;
+ vacrel->NewRelfrozenXid = presult.pagefrz.FreezePageRelfrozenXid;
+ vacrel->NewRelminMxid = presult.pagefrz.FreezePageRelminMxid;
- if (tuples_frozen == 0)
+ if (presult.nfrozen == 0)
{
/*
* We have no freeze plans to execute, so there's no added cost
@@ -1709,7 +1688,7 @@ lazy_scan_prune(LVRelState *vacrel,
* once we're done with it. Otherwise we generate a conservative
* cutoff by stepping back from OldestXmin.
*/
- if (all_visible && all_frozen)
+ if (all_visible && presult.all_frozen)
{
/* Using same cutoff when setting VM is now unnecessary */
snapshotConflictHorizon = visibility_cutoff_xid;
@@ -1725,7 +1704,7 @@ lazy_scan_prune(LVRelState *vacrel,
/* Execute all freeze plans for page as a single atomic action */
heap_freeze_execute_prepared(vacrel->rel, buf,
snapshotConflictHorizon,
- frozen, tuples_frozen);
+ presult.frozen, presult.nfrozen);
}
}
else
@@ -1734,10 +1713,10 @@ lazy_scan_prune(LVRelState *vacrel,
* Page requires "no freeze" processing. It might be set all-visible
* in the visibility map, but it can never be set all-frozen.
*/
- vacrel->NewRelfrozenXid = pagefrz.NoFreezePageRelfrozenXid;
- vacrel->NewRelminMxid = pagefrz.NoFreezePageRelminMxid;
- all_frozen = false;
- tuples_frozen = 0; /* avoid miscounts in instrumentation */
+ vacrel->NewRelfrozenXid = presult.pagefrz.NoFreezePageRelfrozenXid;
+ vacrel->NewRelminMxid = presult.pagefrz.NoFreezePageRelminMxid;
+ presult.all_frozen = false;
+ presult.nfrozen = 0; /* avoid miscounts in instrumentation */
}
/*
@@ -1801,7 +1780,7 @@ lazy_scan_prune(LVRelState *vacrel,
/* Finally, add page-local counts to whole-VACUUM counts */
vacrel->tuples_deleted += presult.ndeleted;
- vacrel->tuples_frozen += tuples_frozen;
+ vacrel->tuples_frozen += presult.nfrozen;
vacrel->lpdead_items += lpdead_items;
vacrel->live_tuples += live_tuples;
vacrel->recently_dead_tuples += recently_dead_tuples;
@@ -1824,7 +1803,7 @@ lazy_scan_prune(LVRelState *vacrel,
{
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
- if (all_frozen)
+ if (presult.all_frozen)
{
Assert(!TransactionIdIsValid(visibility_cutoff_xid));
flags |= VISIBILITYMAP_ALL_FROZEN;
@@ -1895,7 +1874,7 @@ lazy_scan_prune(LVRelState *vacrel,
* true, so we must check both all_visible and all_frozen.
*/
else if (all_visible_according_to_vm && all_visible &&
- all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
+ presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
{
/*
* Avoid relying on all_visible_according_to_vm as a proxy for the
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 35b8486c34a..ac129692c13 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -189,6 +189,7 @@ typedef struct HeapPageFreeze
TransactionId NoFreezePageRelfrozenXid;
MultiXactId NoFreezePageRelminMxid;
+ struct VacuumCutoffs *cutoffs;
} HeapPageFreeze;
/*
@@ -202,6 +203,15 @@ typedef struct HeapPageFreeze
*/
#define PRUNE_DO_MARK_UNUSED_NOW (1 << 1)
+/*
+ * Prepare to freeze if advantageous or required and try to advance
+ * relfrozenxid and relminmxid. To attempt freezing, we will need to determine
+ * if the page is all frozen. So, if this action is set, we will also inform
+ * the caller if the page is all-visible and/or all-frozen and calculate a
+ * snapshot conflict horizon for updating the visibility map.
+ */
+#define PRUNE_DO_TRY_FREEZE (1 << 2)
+
/*
* Per-page state returned from pruning
*/
@@ -220,6 +230,20 @@ typedef struct PruneResult
* 1. Otherwise every access would need to subtract 1.
*/
int8 htsv[MaxHeapTuplesPerPage + 1];
+
+ /*
+ * Prepare to freeze in heap_page_prune(). lazy_scan_prune() will use the
+ * returned freeze plans to execute freezing.
+ */
+ HeapPageFreeze pagefrz;
+
+ /*
+ * Whether or not the page can be set all-frozen in the visibility map.
+ * This is only set if the PRUNE_DO_TRY_FREEZE action flag is set.
+ */
+ bool all_frozen;
+ int nfrozen;
+ HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
} PruneResult;
/* 'reason' codes for heap_page_prune() */
@@ -314,7 +338,6 @@ extern TM_Result heap_lock_tuple(Relation relation, ItemPointer tid,
extern void heap_inplace_update(Relation relation, HeapTuple tuple);
extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
- const struct VacuumCutoffs *cutoffs,
HeapPageFreeze *pagefrz,
HeapTupleFreeze *frz, bool *totally_frozen);
extern void heap_freeze_execute_prepared(Relation rel, Buffer buffer,
--
2.39.2
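As a reading aid, the page-level freeze decision that lazy_scan_prune()
makes with the new PruneResult fields boils down to the following
(condensed from the vacuumlazy.c hunk above; no new logic):

    /*
     * Freeze the page if required, if it is free to do so, or if it
     * becomes all-frozen and pruning already emitted an FPI anyway.
     */
    if (presult.pagefrz.freeze_required ||
        presult.nfrozen == 0 ||
        (all_visible && presult.all_frozen &&
         fpi_before != pgWalUsage.wal_fpi))
    {
        vacrel->NewRelfrozenXid = presult.pagefrz.FreezePageRelfrozenXid;
        vacrel->NewRelminMxid = presult.pagefrz.FreezePageRelminMxid;
        /* ... execute presult.frozen[0 .. presult.nfrozen - 1] ... */
    }
    else
    {
        vacrel->NewRelfrozenXid = presult.pagefrz.NoFreezePageRelfrozenXid;
        vacrel->NewRelminMxid = presult.pagefrz.NoFreezePageRelminMxid;
        presult.all_frozen = false;
        presult.nfrozen = 0;    /* avoid miscounts in instrumentation */
    }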
Attachment: v11-0005-Set-hastup-in-heap_page_prune.patch (text/x-patch)
From ef8cb2c089ad9474a6da309593029c08f71b0bb9 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Fri, 29 Mar 2024 21:36:37 -0400
Subject: [PATCH v11 5/7] Set hastup in heap_page_prune
lazy_scan_prune() loops through the line pointers and tuple visibility
information for each tuple on a page, setting hastup to true if there
are any LP_REDIRECT line pointers or tuples with storage that will not
be removed. We want to remove this extra loop from lazy_scan_prune(),
and we already know about non-removable tuples during heap_page_prune().
Set hastup when recording LP_REDIRECT line pointers in
heap_prune_chain() and when LP_NORMAL line pointers refer to tuples
whose visibility status is not HEAPTUPLE_DEAD.
---
src/backend/access/heap/pruneheap.c | 24 ++++++++++++++++++++++--
src/backend/access/heap/vacuumlazy.c | 17 +----------------
src/include/access/heapam.h | 8 ++++++++
3 files changed, 31 insertions(+), 18 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 8bdd6389b25..65b0ed185ff 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -66,6 +66,9 @@ typedef struct
bool processed[MaxHeapTuplesPerPage + 1];
int ndeleted; /* Number of tuples deleted from the page */
+
+ /* Whether or not the page makes rel truncation unsafe */
+ bool hastup;
} PruneState;
/* Local functions */
@@ -271,6 +274,7 @@ heap_page_prune(Relation relation, Buffer buffer,
prstate.ndeleted = 0;
prstate.nroot_items = 0;
prstate.nheaponly_items = 0;
+ prstate.hastup = false;
/*
* If we will prepare to freeze tuples, consider that it might be possible
@@ -280,7 +284,7 @@ heap_page_prune(Relation relation, Buffer buffer,
presult->all_frozen = true;
else
presult->all_frozen = false;
-
+ presult->hastup = prstate.hastup;
/*
* presult->htsv is not initialized here because all ntuple spots in the
@@ -819,6 +823,8 @@ heap_prune_record_redirect(PruneState *prstate,
*/
if (was_normal)
prstate->ndeleted++;
+
+ prstate->hastup = true;
}
/* Record line pointer to be marked dead */
@@ -901,11 +907,15 @@ static void
heap_prune_record_unchanged_lp_normal(Page page, int8 *htsv, PruneState *prstate,
PruneResult *presult, OffsetNumber offnum)
{
- HeapTupleHeader htup = (HeapTupleHeader) PageGetItem(page, PageGetItemId(page, offnum));
+ HeapTupleHeader htup;
Assert(!prstate->processed[offnum]);
prstate->processed[offnum] = true;
+ presult->hastup = true; /* the page is not empty */
+
+ htup = (HeapTupleHeader) PageGetItem(page, PageGetItemId(page, offnum));
+
switch (htsv[offnum])
{
case HEAPTUPLE_LIVE:
@@ -974,6 +984,16 @@ heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber
{
Assert(!prstate->processed[offnum]);
prstate->processed[offnum] = true;
+
+ /*
+ * Deliberately don't set hastup for LP_DEAD items. We make the soft
+ * assumption that any LP_DEAD items encountered here will become
+ * LP_UNUSED later on, before count_nondeletable_pages is reached. If we
+ * don't make this assumption then rel truncation will only happen every
+ * other VACUUM, at most. Besides, VACUUM must treat
+ * hastup/nonempty_pages as provisional no matter how LP_DEAD items are
+ * handled (handled here, or handled later on).
+ */
}
static void
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 679c6a866ea..212d76045ef 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1419,7 +1419,6 @@ lazy_scan_prune(LVRelState *vacrel,
int lpdead_items,
live_tuples,
recently_dead_tuples;
- bool hastup = false;
bool all_visible;
TransactionId visibility_cutoff_xid;
uint8 actions = 0;
@@ -1500,23 +1499,11 @@ lazy_scan_prune(LVRelState *vacrel,
/* Redirect items mustn't be touched */
if (ItemIdIsRedirected(itemid))
- {
- /* page makes rel truncation unsafe */
- hastup = true;
continue;
- }
if (ItemIdIsDead(itemid))
{
/*
- * Deliberately don't set hastup for LP_DEAD items. We make the
- * soft assumption that any LP_DEAD items encountered here will
- * become LP_UNUSED later on, before count_nondeletable_pages is
- * reached. If we don't make this assumption then rel truncation
- * will only happen every other VACUUM, at most. Besides, VACUUM
- * must treat hastup/nonempty_pages as provisional no matter how
- * LP_DEAD items are handled (handled here, or handled later on).
- *
* Also deliberately delay unsetting all_visible until just before
* we return to lazy_scan_heap caller, as explained in full below.
* (This is another case where it's useful to anticipate that any
@@ -1631,8 +1618,6 @@ lazy_scan_prune(LVRelState *vacrel,
elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
break;
}
-
- hastup = true; /* page makes rel truncation unsafe */
}
/*
@@ -1786,7 +1771,7 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->recently_dead_tuples += recently_dead_tuples;
/* Can't truncate this page */
- if (hastup)
+ if (presult.hastup)
vacrel->nonempty_pages = blkno + 1;
/* Did we find LP_DEAD items? */
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index ac129692c13..58cfa544ac0 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -231,6 +231,14 @@ typedef struct PruneResult
*/
int8 htsv[MaxHeapTuplesPerPage + 1];
+ /*
+ * Whether or not the page makes rel truncation unsafe
+ *
+ * This is set to 'true', even if the page contains LP_DEAD items. VACUUM
+ * will remove them before attempting to truncate.
+ */
+ bool hastup;
+
/*
* Prepare to freeze in heap_page_prune(). lazy_scan_prune() will use the
* returned freeze plans to execute freezing.
--
2.39.2
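In other words, after this patch the truncation-safety bookkeeping that
lazy_scan_prune() used to do per tuple reduces to a single check of the
returned flag (sketch, names from the patch above):

    heap_page_prune(rel, buf, vacrel->vistest, actions,
                    &presult, PRUNE_VACUUM_SCAN, &vacrel->offnum);

    /* Page has a redirect or a surviving tuple: rel truncation unsafe */
    if (presult.hastup)
        vacrel->nonempty_pages = blkno + 1;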
Attachment: v11-0006-Save-dead-tuple-offsets-during-heap_page_prune.patch (text/x-patch)
From dffa90d0bd8e972cfe26da96860051dc8c2a8576 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 30 Mar 2024 01:27:08 -0400
Subject: [PATCH v11 6/7] Save dead tuple offsets during heap_page_prune
After heap_page_prune() returned, lazy_scan_prune() looped through all
of the offsets of LP_DEAD items and later added them to
LVRelState->dead_items. Instead, record each offset during pruning: when
a line pointer is newly marked LP_DEAD and when an existing
non-removable LP_DEAD item is encountered.
Because LVRelState->dead_items expects the offsets in order, sort the
collected deadoffsets before saving them there.
---
src/backend/access/heap/pruneheap.c | 17 +++++++++++++
src/backend/access/heap/vacuumlazy.c | 38 +++++++++++++++++++---------
src/include/access/heapam.h | 7 +++++
3 files changed, 50 insertions(+), 12 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 65b0ed185ff..0f0391b3165 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -69,6 +69,13 @@ typedef struct
/* Whether or not the page makes rel truncation unsafe */
bool hastup;
+
+ /*
+ * LP_DEAD items on the page after pruning. Includes existing LP_DEAD
+ * items
+ */
+ int lpdead_items; /* includes existing LP_DEAD items */
+ OffsetNumber *deadoffsets; /* points directly to PruneResult->deadoffsets */
} PruneState;
/* Local functions */
@@ -275,6 +282,8 @@ heap_page_prune(Relation relation, Buffer buffer,
prstate.nroot_items = 0;
prstate.nheaponly_items = 0;
prstate.hastup = false;
+ prstate.lpdead_items = 0;
+ prstate.deadoffsets = presult->deadoffsets;
/*
* If we will prepare to freeze tuples, consider that it might be possible
@@ -532,6 +541,8 @@ heap_page_prune(Relation relation, Buffer buffer,
/* Copy information back for caller */
presult->nnewlpdead = prstate.ndead;
presult->ndeleted = prstate.ndeleted;
+ presult->lpdead_items = prstate.lpdead_items;
+ /* the presult->deadoffsets array was already filled in */
}
@@ -839,6 +850,9 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
prstate->nowdead[prstate->ndead] = offnum;
prstate->ndead++;
+ /* Record the dead offset for vacuum */
+ prstate->deadoffsets[prstate->lpdead_items++] = offnum;
+
/*
* If the root entry had been a normal tuple, we are deleting it, so count
* it in the result. But changing a redirect (even to DEAD state) doesn't
@@ -994,6 +1008,9 @@ heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber
* hastup/nonempty_pages as provisional no matter how LP_DEAD items are
* handled (handled here, or handled later on).
*/
+
+ /* Record the dead offset for vacuum */
+ prstate->deadoffsets[prstate->lpdead_items++] = offnum;
}
static void
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 212d76045ef..7f1e4db55c0 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1373,6 +1373,15 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
return false;
}
+static int
+OffsetNumber_cmp(const void *a, const void *b)
+{
+ OffsetNumber na = *(const OffsetNumber *) a,
+ nb = *(const OffsetNumber *) b;
+
+ return na < nb ? -1 : na > nb;
+}
+
/*
* lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
*
@@ -1416,14 +1425,12 @@ lazy_scan_prune(LVRelState *vacrel,
maxoff;
ItemId itemid;
PruneResult presult;
- int lpdead_items,
- live_tuples,
+ int live_tuples,
recently_dead_tuples;
bool all_visible;
TransactionId visibility_cutoff_xid;
uint8 actions = 0;
int64 fpi_before = pgWalUsage.wal_fpi;
- OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
Assert(BufferGetBlockNumber(buf) == blkno);
@@ -1441,7 +1448,6 @@ lazy_scan_prune(LVRelState *vacrel,
presult.pagefrz.NoFreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
presult.pagefrz.NoFreezePageRelminMxid = vacrel->NewRelminMxid;
presult.pagefrz.cutoffs = &vacrel->cutoffs;
- lpdead_items = 0;
live_tuples = 0;
recently_dead_tuples = 0;
@@ -1509,7 +1515,6 @@ lazy_scan_prune(LVRelState *vacrel,
* (This is another case where it's useful to anticipate that any
* LP_DEAD items will become LP_UNUSED during the ongoing VACUUM.)
*/
- deadoffsets[lpdead_items++] = offnum;
continue;
}
@@ -1713,7 +1718,7 @@ lazy_scan_prune(LVRelState *vacrel,
*/
#ifdef USE_ASSERT_CHECKING
/* Note that all_frozen value does not matter when !all_visible */
- if (all_visible && lpdead_items == 0)
+ if (all_visible && presult.lpdead_items == 0)
{
TransactionId debug_cutoff;
bool debug_all_frozen;
@@ -1730,7 +1735,7 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* Now save details of the LP_DEAD items from the page in vacrel
*/
- if (lpdead_items > 0)
+ if (presult.lpdead_items > 0)
{
VacDeadItems *dead_items = vacrel->dead_items;
ItemPointerData tmp;
@@ -1739,9 +1744,18 @@ lazy_scan_prune(LVRelState *vacrel,
ItemPointerSetBlockNumber(&tmp, blkno);
- for (int i = 0; i < lpdead_items; i++)
+ /*
+ * dead_items are expected to be in order. However, deadoffsets are
+ * collected incrementally in heap_page_prune_and_freeze() as each
+ * dead line pointer is recorded, with an indeterminate order. As
+ * such, sort the deadoffsets before saving them in LVRelState.
+ */
+ qsort(presult.deadoffsets, presult.lpdead_items, sizeof(OffsetNumber),
+ OffsetNumber_cmp);
+
+ for (int i = 0; i < presult.lpdead_items; i++)
{
- ItemPointerSetOffsetNumber(&tmp, deadoffsets[i]);
+ ItemPointerSetOffsetNumber(&tmp, presult.deadoffsets[i]);
dead_items->items[dead_items->num_items++] = tmp;
}
@@ -1766,7 +1780,7 @@ lazy_scan_prune(LVRelState *vacrel,
/* Finally, add page-local counts to whole-VACUUM counts */
vacrel->tuples_deleted += presult.ndeleted;
vacrel->tuples_frozen += presult.nfrozen;
- vacrel->lpdead_items += lpdead_items;
+ vacrel->lpdead_items += presult.lpdead_items;
vacrel->live_tuples += live_tuples;
vacrel->recently_dead_tuples += recently_dead_tuples;
@@ -1775,7 +1789,7 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->nonempty_pages = blkno + 1;
/* Did we find LP_DEAD items? */
- *has_lpdead_items = (lpdead_items > 0);
+ *has_lpdead_items = (presult.lpdead_items > 0);
Assert(!all_visible || !(*has_lpdead_items));
@@ -1843,7 +1857,7 @@ lazy_scan_prune(LVRelState *vacrel,
* There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
* however.
*/
- else if (lpdead_items > 0 && PageIsAllVisible(page))
+ else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
{
elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
vacrel->relname, blkno);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 58cfa544ac0..a2a86ffa078 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -252,6 +252,13 @@ typedef struct PruneResult
bool all_frozen;
int nfrozen;
HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
+
+ /*
+ * LP_DEAD items on the page after pruning. Includes existing LP_DEAD
+ * items
+ */
+ int lpdead_items;
+ OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
} PruneResult;
/* 'reason' codes for heap_page_prune() */
--
2.39.2
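A condensed sketch of the consumer side (taken from the vacuumlazy.c
hunk above): the offsets arrive in recording order, so they are sorted
once per page before being appended to the dead-items TID array:

    if (presult.lpdead_items > 0)
    {
        ItemPointerData tmp;

        /* deadoffsets were collected in an indeterminate order */
        qsort(presult.deadoffsets, presult.lpdead_items,
              sizeof(OffsetNumber), OffsetNumber_cmp);

        ItemPointerSetBlockNumber(&tmp, blkno);
        for (int i = 0; i < presult.lpdead_items; i++)
        {
            ItemPointerSetOffsetNumber(&tmp, presult.deadoffsets[i]);
            dead_items->items[dead_items->num_items++] = tmp;
        }
    }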
Attachment: v11-0007-Combine-freezing-and-pruning.patch (text/x-patch)
From a2669427c9a3cc7062d5c7f715002a8bd03576f8 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 30 Mar 2024 01:38:01 -0400
Subject: [PATCH v11 7/7] Combine freezing and pruning
Execute both freezing and pruning of tuples and emit a single WAL record
containing all changes.
---
src/backend/access/heap/heapam.c | 76 +--
src/backend/access/heap/heapam_handler.c | 2 +-
src/backend/access/heap/pruneheap.c | 698 ++++++++++++++++++-----
src/backend/access/heap/vacuumlazy.c | 352 ++----------
src/include/access/heapam.h | 75 +--
src/tools/pgindent/typedefs.list | 2 +-
6 files changed, 673 insertions(+), 532 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index c5b52978380..cff0f080660 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6127,9 +6127,9 @@ heap_inplace_update(Relation relation, HeapTuple tuple)
*/
static TransactionId
FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
- const struct VacuumCutoffs *cutoffs, uint16 *flags,
- HeapPageFreeze *pagefrz)
+ uint16 *flags, HeapPageFreeze *pagefrz)
{
+ const struct VacuumCutoffs *cutoffs = pagefrz->cutoffs;
TransactionId newxmax;
MultiXactMember *members;
int nmembers;
@@ -6447,9 +6447,9 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
* XIDs or MultiXactIds that will need to be processed by a future VACUUM.
*
* VACUUM caller must assemble HeapTupleFreeze freeze plan entries for every
- * tuple that we returned true for, and call heap_freeze_execute_prepared to
- * execute freezing. Caller must initialize pagefrz fields for page as a
- * whole before first call here for each heap page.
+ * tuple that we returned true for, and then execute freezing. Caller must
+ * initialize pagefrz fields for page as a whole before first call here for
+ * each heap page.
*
* VACUUM caller decides on whether or not to freeze the page as a whole.
* We'll often prepare freeze plans for a page that caller just discards.
@@ -6552,8 +6552,7 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
* perform no-op xmax processing. The only constraint is that the
* FreezeLimit/MultiXactCutoff postcondition must never be violated.
*/
- newxmax = FreezeMultiXactId(xid, tuple->t_infomask, cutoffs,
- &flags, pagefrz);
+ newxmax = FreezeMultiXactId(xid, tuple->t_infomask, &flags, pagefrz);
if (flags & FRM_NOOP)
{
@@ -6731,7 +6730,7 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
* Does this tuple force caller to freeze the entire page?
*/
pagefrz->freeze_required =
- heap_tuple_should_freeze(tuple, cutoffs,
+ heap_tuple_should_freeze(tuple, pagefrz->cutoffs,
&pagefrz->NoFreezePageRelfrozenXid,
&pagefrz->NoFreezePageRelminMxid);
}
@@ -6765,35 +6764,19 @@ heap_execute_freeze_tuple(HeapTupleHeader tuple, HeapTupleFreeze *frz)
}
/*
- * heap_freeze_execute_prepared
- *
- * Executes freezing of one or more heap tuples on a page on behalf of caller.
- * Caller passes an array of tuple plans from heap_prepare_freeze_tuple.
- * Caller must set 'offset' in each plan for us. Note that we destructively
- * sort caller's tuples array in-place, so caller had better be done with it.
- *
- * WAL-logs the changes so that VACUUM can advance the rel's relfrozenxid
- * later on without any risk of unsafe pg_xact lookups, even following a hard
- * crash (or when querying from a standby). We represent freezing by setting
- * infomask bits in tuple headers, but this shouldn't be thought of as a hint.
- * See section on buffer access rules in src/backend/storage/buffer/README.
+ * Perform xmin/xmax XID status sanity checks before actually executing freeze
+ * plans.
+ *
+ * heap_prepare_freeze_tuple doesn't perform these checks directly because
+ * pg_xact lookups are relatively expensive. They shouldn't be repeated
+ * by successive VACUUMs that each decide against freezing the same page.
*/
void
-heap_freeze_execute_prepared(Relation rel, Buffer buffer,
- TransactionId snapshotConflictHorizon,
- HeapTupleFreeze *tuples, int ntuples)
+heap_pre_freeze_checks(Buffer buffer,
+ HeapTupleFreeze *tuples, int ntuples)
{
Page page = BufferGetPage(buffer);
- Assert(ntuples > 0);
-
- /*
- * Perform xmin/xmax XID status sanity checks before critical section.
- *
- * heap_prepare_freeze_tuple doesn't perform these checks directly because
- * pg_xact lookups are relatively expensive. They shouldn't be repeated
- * by successive VACUUMs that each decide against freezing the same page.
- */
for (int i = 0; i < ntuples; i++)
{
HeapTupleFreeze *frz = tuples + i;
@@ -6832,8 +6815,19 @@ heap_freeze_execute_prepared(Relation rel, Buffer buffer,
xmax)));
}
}
+}
- START_CRIT_SECTION();
+/*
+ * Helper which executes freezing of one or more heap tuples on a page on
+ * behalf of caller. Caller passes an array of tuple plans from
+ * heap_prepare_freeze_tuple. Caller must set 'offset' in each plan for us.
+ * Must be called in a critical section that also marks the buffer dirty and,
+ * if needed, emits WAL.
+ */
+void
+heap_freeze_prepared_tuples(Buffer buffer, HeapTupleFreeze *tuples, int ntuples)
+{
+ Page page = BufferGetPage(buffer);
for (int i = 0; i < ntuples; i++)
{
@@ -6844,22 +6838,6 @@ heap_freeze_execute_prepared(Relation rel, Buffer buffer,
htup = (HeapTupleHeader) PageGetItem(page, itemid);
heap_execute_freeze_tuple(htup, frz);
}
-
- MarkBufferDirty(buffer);
-
- /* Now WAL-log freezing if necessary */
- if (RelationNeedsWAL(rel))
- {
- log_heap_prune_and_freeze(rel, buffer, snapshotConflictHorizon,
- false, /* no cleanup lock required */
- PRUNE_VACUUM_SCAN,
- tuples, ntuples,
- NULL, 0, /* redirected */
- NULL, 0, /* dead */
- NULL, 0); /* unused */
- }
-
- END_CRIT_SECTION();
}
/*
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 41a4bb0981d..e879ea70b3c 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -1123,7 +1123,7 @@ heapam_scan_analyze_next_tuple(TableScanDesc scan, TransactionId OldestXmin,
* We ignore unused and redirect line pointers. DEAD line pointers
* should be counted as dead, because we need vacuum to run to get rid
* of them. Note that this rule agrees with the way that
- * heap_page_prune() counts things.
+ * heap_page_prune_and_freeze() counts things.
*/
if (!ItemIdIsNormal(itemid))
{
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 0f0391b3165..e03ab43c497 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -21,13 +21,15 @@
#include "access/transam.h"
#include "access/xlog.h"
#include "access/xloginsert.h"
+#include "commands/vacuum.h"
+#include "executor/instrument.h"
#include "miscadmin.h"
#include "pgstat.h"
#include "storage/bufmgr.h"
#include "utils/rel.h"
#include "utils/snapmgr.h"
-/* Working data for heap_page_prune and subroutines */
+/* Working data for heap_page_prune_and_freeze() and subroutines */
typedef struct
{
/* PRUNE_DO_* arguments */
@@ -36,26 +38,22 @@ typedef struct
/* tuple visibility test, initialized for the relation */
GlobalVisState *vistest;
- TransactionId new_prune_xid; /* new prune hint value for page */
- TransactionId snapshotConflictHorizon; /* latest xid removed */
+ /*
+ * Fields describing what to do to the page
+ */
+ TransactionId new_prune_xid; /* new prune hint value */
+ TransactionId latest_xid_removed;
int nredirected; /* numbers of entries in arrays below */
int ndead;
int nunused;
+ int nfrozen;
/* arrays that accumulate indexes of items to be changed */
OffsetNumber redirected[MaxHeapTuplesPerPage * 2];
OffsetNumber nowdead[MaxHeapTuplesPerPage];
OffsetNumber nowunused[MaxHeapTuplesPerPage];
+ HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
- /*
- * 'root_items' contains offsets of all LP_REDIRECT line pointers and
- * normal non-HOT tuples. They can be stand-alone items or the first item
- * in a HOT chain. 'heaponly_items' contains heap-only tuples which can
- * only be removed as part of a HOT chain.
- */
- int nroot_items;
- OffsetNumber root_items[MaxHeapTuplesPerPage];
- int nheaponly_items;
- OffsetNumber heaponly_items[MaxHeapTuplesPerPage];
+ HeapPageFreeze pagefrz;
/*
* processed[offnum] is true if item at offnum has been processed.
@@ -65,8 +63,31 @@ typedef struct
*/
bool processed[MaxHeapTuplesPerPage + 1];
+ /*
+ * Tuple visibility is only computed once for each tuple, for correctness
+ * and efficiency reasons; see comment in heap_page_prune_and_freeze() for
+ * details. This is of type int8[], instead of HTSV_Result[], so we can
+ * use -1 to indicate no visibility has been computed, e.g. for LP_DEAD
+ * items.
+ *
+ * This needs to be MaxHeapTuplesPerPage + 1 long as FirstOffsetNumber is
+ * 1. Otherwise every access would need to subtract 1.
+ */
+ int8 htsv[MaxHeapTuplesPerPage + 1];
+
+ /*
+ * The rest of the fields are not used by pruning itself, but are used to
+ * collect information about what was pruned and what state the page is in
+ * after pruning, for the benefit of the caller. They are copied to
+ * PruneFreezeResult at the end.
+ */
+
int ndeleted; /* Number of tuples deleted from the page */
+ /* Number of live and recently dead tuples, after pruning */
+ int live_tuples;
+ int recently_dead_tuples;
+
/* Whether or not the page makes rel truncation unsafe */
bool hastup;
@@ -76,23 +97,58 @@ typedef struct
*/
int lpdead_items; /* includes existing LP_DEAD items */
OffsetNumber *deadoffsets; /* points directly to PruneResult->deadoffsets */
+
+ /*
+ * all_visible and all_frozen indicate if the all-visible and all-frozen
+ * bits in the visibility map can be set for this page, after pruning.
+ *
+ * visibility_cutoff_xid is the newest xmin of live tuples on the page.
+ * The caller can use it as the conflict horizon, when setting the VM
+ * bits. It is only valid if we froze some tuples, and all_frozen is
+ * true.
+ *
+ * These are only set if the PRUNE_DO_TRY_FREEZE action flag is set.
+ *
+ * NOTE: This 'all_visible' doesn't include LP_DEAD items. That's
+ * convenient for heap_page_prune_and_freeze(), to use this to decide
+ * whether to freeze the page or not. The 'all_visible' value returned to
+ * the caller is adjusted to include LP_DEAD items at the end.
+ */
+ bool all_visible;
+ bool all_frozen;
+ TransactionId visibility_cutoff_xid;
+
+ /*
+ * 'root_items' contains offsets of all LP_REDIRECT line pointers and
+ * normal non-HOT tuples. They can be stand-alone items or the first item
+ * in a HOT chain. 'heaponly_items' contains heap-only tuples which can
+ * only be removed as part of a HOT chain.
+ */
+ int nroot_items;
+ OffsetNumber root_items[MaxHeapTuplesPerPage];
+ int nheaponly_items;
+ OffsetNumber heaponly_items[MaxHeapTuplesPerPage];
} PruneState;
/* Local functions */
static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
HeapTuple tup,
Buffer buffer);
+static inline HTSV_Result htsv_get_valid_status(int status);
static void heap_prune_chain(Page page, BlockNumber blockno, OffsetNumber maxoff,
- OffsetNumber rootoffnum, int8 *htsv, PruneState *prstate, PruneResult *presult);
+ OffsetNumber rootoffnum, PruneState *prstate);
static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
static void heap_prune_record_redirect(PruneState *prstate,
- OffsetNumber offnum, OffsetNumber rdoffnum, bool was_normal);
-static void heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum, bool was_normal);
-static void heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum, bool was_normal);
+ OffsetNumber offnum, OffsetNumber rdoffnum,
+ bool was_normal);
+static void heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
+ bool was_normal);
+static void heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
+ bool was_normal);
static void heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum, bool was_normal);
static void heap_prune_record_unchanged_lp_unused(Page page, PruneState *prstate, OffsetNumber offnum);
-static void heap_prune_record_unchanged_lp_normal(Page page, int8 *htsv, PruneState *prstate, PruneResult *presult, OffsetNumber offnum);
+static void heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumber offnum);
static void heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber offnum);
static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetNumber offnum);
@@ -175,14 +231,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
if (PageIsFull(page) || PageGetHeapFreeSpace(page) < minfree)
{
OffsetNumber dummy_off_loc;
- PruneResult presult;
-
- presult.pagefrz.freeze_required = false;
- presult.pagefrz.FreezePageRelfrozenXid = InvalidTransactionId;
- presult.pagefrz.FreezePageRelminMxid = InvalidMultiXactId;
- presult.pagefrz.NoFreezePageRelfrozenXid = InvalidTransactionId;
- presult.pagefrz.NoFreezePageRelminMxid = InvalidMultiXactId;
- presult.pagefrz.cutoffs = NULL;
+ PruneFreezeResult presult;
/*
* For now, do not set PRUNE_DO_MARK_UNUSED_NOW regardless of
@@ -190,8 +239,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* determine that during on-access pruning with the current
* implementation.
*/
- heap_page_prune(relation, buffer, vistest, 0,
- &presult, PRUNE_ON_ACCESS, &dummy_off_loc);
+ heap_page_prune_and_freeze(relation, buffer, 0, vistest,
+ NULL, &presult, PRUNE_ON_ACCESS, &dummy_off_loc, NULL, NULL);
/*
* Report the number of tuples reclaimed to pgstats. This is
@@ -225,35 +274,52 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
/*
- * Prune and repair fragmentation in the specified page.
+ * Prune and repair fragmentation and potentially freeze tuples on the
+ * specified page.
+ *
+ * If the page can be marked all-frozen in the visibility map, we may
+ * opportunistically freeze tuples on the page if either its tuples are old
+ * enough or freezing will be cheap enough.
*
* Caller must have pin and buffer cleanup lock on the page. Note that we
* don't update the FSM information for page on caller's behalf. Caller might
* also need to account for a reduction in the length of the line pointer
* array following array truncation by us.
*
+ * actions are the pruning actions that heap_page_prune_and_freeze() should
+ * take.
+ *
* vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
* (see heap_prune_satisfies_vacuum).
*
- * actions are the pruning actions that heap_page_prune() should take.
+ * cutoffs contains the information on visibility for the whole relation
+ * collected by vacuum at the beginning of vacuuming the relation. It will be
+ * NULL for callers other than vacuum.
*
* presult contains output parameters needed by callers such as the number of
* tuples removed and the number of line pointers newly marked LP_DEAD.
- * heap_page_prune() is responsible for initializing it.
+ * heap_page_prune_and_freeze() is responsible for initializing it.
*
* reason indicates why the pruning is performed. It is included in the WAL
* record for debugging and analysis purposes, but otherwise has no effect.
*
* off_loc is the offset location required by the caller to use in error
* callback.
+ *
+ * new_relfrozen_xid and new_relmin_xid are provided by the caller if they
+ * would like the current values of those updated as part of advancing
+ * relfrozenxid/relminmxid.
*/
void
-heap_page_prune(Relation relation, Buffer buffer,
- GlobalVisState *vistest,
- uint8 actions,
- PruneResult *presult,
- PruneReason reason,
- OffsetNumber *off_loc)
+heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ uint8 actions,
+ GlobalVisState *vistest,
+ struct VacuumCutoffs *cutoffs,
+ PruneFreezeResult *presult,
+ PruneReason reason,
+ OffsetNumber *off_loc,
+ TransactionId *new_relfrozen_xid,
+ MultiXactId *new_relmin_mxid)
{
Page page = BufferGetPage(buffer);
BlockNumber blockno = BufferGetBlockNumber(buffer);
@@ -261,6 +327,41 @@ heap_page_prune(Relation relation, Buffer buffer,
maxoff;
PruneState prstate;
HeapTupleData tup;
+ bool do_freeze;
+ bool do_prune;
+ bool do_hint;
+ bool hint_bit_fpi;
+ int64 fpi_before = pgWalUsage.wal_fpi;
+
+ /*
+ * pagefrz contains visibility cutoff information and the current
+ * relfrozenxid and relminmxids used if the caller is interested in
+ * freezing tuples on the page.
+ */
+ prstate.pagefrz.cutoffs = cutoffs;
+ prstate.pagefrz.freeze_required = false;
+
+ if (new_relmin_mxid)
+ {
+ prstate.pagefrz.FreezePageRelminMxid = *new_relmin_mxid;
+ prstate.pagefrz.NoFreezePageRelminMxid = *new_relmin_mxid;
+ }
+ else
+ {
+ prstate.pagefrz.FreezePageRelminMxid = InvalidMultiXactId;
+ prstate.pagefrz.NoFreezePageRelminMxid = InvalidMultiXactId;
+ }
+
+ if (new_relfrozen_xid)
+ {
+ prstate.pagefrz.FreezePageRelfrozenXid = *new_relfrozen_xid;
+ prstate.pagefrz.NoFreezePageRelfrozenXid = *new_relfrozen_xid;
+ }
+ else
+ {
+ prstate.pagefrz.FreezePageRelfrozenXid = InvalidTransactionId;
+ prstate.pagefrz.NoFreezePageRelfrozenXid = InvalidTransactionId;
+ }
/*
* Our strategy is to scan the page and make lists of items to change,
@@ -276,37 +377,71 @@ heap_page_prune(Relation relation, Buffer buffer,
prstate.new_prune_xid = InvalidTransactionId;
prstate.vistest = vistest;
prstate.actions = actions;
- prstate.snapshotConflictHorizon = InvalidTransactionId;
- prstate.nredirected = prstate.ndead = prstate.nunused = 0;
+ prstate.latest_xid_removed = InvalidTransactionId;
+ prstate.nredirected = prstate.ndead = prstate.nunused = prstate.nfrozen = 0;
+
+ /*
+ * prstate.htsv is not initialized here because all ntuple spots in the
+ * array will be set either to a valid HTSV_Result value or -1.
+ */
+
prstate.ndeleted = 0;
prstate.nroot_items = 0;
prstate.nheaponly_items = 0;
prstate.hastup = false;
+ prstate.live_tuples = 0;
+ prstate.recently_dead_tuples = 0;
prstate.lpdead_items = 0;
prstate.deadoffsets = presult->deadoffsets;
/*
- * If we will prepare to freeze tuples, consider that it might be possible
- * to set the page all-frozen in the visibility map.
+ * Caller may update the VM after we're done. We keep track of whether
+ * the page will be all_visible and all_frozen, once we're done with the
+ * pruning and freezing, to help the caller to do that.
+ *
+ * Currently, only VACUUM sets the VM bits. To save the effort, do the
+ * bookkeeping only if the caller needs it. Currently, that's tied to
+ * PRUNE_DO_TRY_FREEZE, but it could be a separate flag, if you wanted to
+ * update the VM bits without also freezing, or freezing without setting
+ * the VM bits.
+ *
+ * In addition to telling the caller whether it can set the VM bit, we
+ * also use 'all_visible' and 'all_frozen' for our own decision-making. If
+ * the whole page will become frozen, we consider opportunistically
+ * freezing tuples. We will not be able to freeze the whole page if there
+ * are tuples present which are not visible to everyone or if there are
+ * dead tuples which are not yet removable. However, dead tuples which
+ * will be removed by the end of vacuuming should not preclude us from
+ * opportunistically freezing. Because of that, we do not clear
+ * all_visible when we see LP_DEAD items. We fix that at the end of the
+ * function, when we return the value to the caller, so that the caller
+ * doesn't set the VM bit incorrectly.
*/
if (prstate.actions & PRUNE_DO_TRY_FREEZE)
- presult->all_frozen = true;
+ {
+ prstate.all_visible = true;
+ prstate.all_frozen = true;
+ }
else
- presult->all_frozen = false;
- presult->hastup = prstate.hastup;
+ {
+ prstate.all_visible = false;
+ prstate.all_frozen = false;
+ }
/*
- * presult->htsv is not initialized here because all ntuple spots in the
- * array will be set either to a valid HTSV_Result value or -1.
+ * The visibility cutoff xid is the newest xmin of live tuples on the
+ * page. In the common case, this will be set as the conflict horizon the
+ * caller can use for updating the VM. If, at the end of freezing and
+ * pruning, the page is all-frozen, there is no possibility that any
+ * running transaction on the standby does not see tuples on the page as
+ * all-visible, so the conflict horizon remains InvalidTransactionId.
*/
- presult->ndeleted = 0;
- presult->nnewlpdead = 0;
-
- presult->nfrozen = 0;
+ prstate.visibility_cutoff_xid = InvalidTransactionId;
maxoff = PageGetMaxOffsetNumber(page);
tup.t_tableOid = RelationGetRelid(relation);
+
/*
* Determine HTSV for all tuples, and queue them up for processing as HOT
* chain roots or as a heap-only items.
@@ -342,7 +477,7 @@ heap_page_prune(Relation relation, Buffer buffer,
*off_loc = offnum;
prstate.processed[offnum] = false;
- presult->htsv[offnum] = -1;
+ prstate.htsv[offnum] = -1;
/* Nothing to do if slot doesn't contain a tuple */
if (!ItemIdIsUsed(itemid))
@@ -381,8 +516,8 @@ heap_page_prune(Relation relation, Buffer buffer,
tup.t_len = ItemIdGetLength(itemid);
ItemPointerSet(&tup.t_self, blockno, offnum);
- presult->htsv[offnum] = heap_prune_satisfies_vacuum(&prstate, &tup,
- buffer);
+ prstate.htsv[offnum] = heap_prune_satisfies_vacuum(&prstate, &tup,
+ buffer);
if (!HeapTupleHeaderIsHeapOnly(htup))
prstate.root_items[prstate.nroot_items++] = offnum;
@@ -390,6 +525,12 @@ heap_page_prune(Relation relation, Buffer buffer,
prstate.heaponly_items[prstate.nheaponly_items++] = offnum;
}
+ /*
+ * If checksums are enabled, heap_prune_satisfies_vacuum() may have caused
+ * an FPI to be emitted.
+ */
+ hint_bit_fpi = fpi_before != pgWalUsage.wal_fpi;
+
/* Process HOT chains */
for (int i = prstate.nroot_items - 1; i >= 0; i--)
{
@@ -403,8 +544,7 @@ heap_page_prune(Relation relation, Buffer buffer,
*off_loc = offnum;
/* Process this item or chain of items */
- heap_prune_chain(page, blockno, maxoff,
- offnum, presult->htsv, &prstate, presult);
+ heap_prune_chain(page, blockno, maxoff, offnum, &prstate);
}
/*
@@ -421,7 +561,7 @@ heap_page_prune(Relation relation, Buffer buffer,
/* see preceding loop */
*off_loc = offnum;
- if (presult->htsv[offnum] == HEAPTUPLE_DEAD)
+ if (prstate.htsv[offnum] == HEAPTUPLE_DEAD)
{
ItemId itemid = PageGetItemId(page, offnum);
HeapTupleHeader htup = (HeapTupleHeader) PageGetItem(page, itemid);
@@ -443,7 +583,7 @@ heap_page_prune(Relation relation, Buffer buffer,
if (!HeapTupleHeaderIsHotUpdated(htup))
{
HeapTupleHeaderAdvanceConflictHorizon(htup,
- &prstate.snapshotConflictHorizon);
+ &prstate.latest_xid_removed);
heap_prune_record_unused(&prstate, offnum, true);
continue;
}
@@ -454,7 +594,7 @@ heap_page_prune(Relation relation, Buffer buffer,
* HOT-updated member of a chain, it should have already been
* processed by heap_prune_chain().
*/
- heap_prune_record_unchanged_lp_normal(page, presult->htsv, &prstate, presult, offnum);
+ heap_prune_record_unchanged_lp_normal(page, &prstate, offnum);
}
/* We should now have processed every tuple exactly once */
@@ -472,21 +612,80 @@ heap_page_prune(Relation relation, Buffer buffer,
/* Clear the offset information once we have processed the given page. */
*off_loc = InvalidOffsetNumber;
- /* Any error while applying the changes is critical */
- START_CRIT_SECTION();
+ do_prune = prstate.nredirected > 0 ||
+ prstate.ndead > 0 ||
+ prstate.nunused > 0;
+
+ /*
+ * Even if we don't prune anything, if we found a new value for the
+ * pd_prune_xid field or the page was marked full, we will update the hint
+ * bit.
+ */
+ do_hint = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
+ PageIsFull(page);
- /* Have we found any prunable items? */
- if (prstate.nredirected > 0 || prstate.ndead > 0 || prstate.nunused > 0)
+ /*
+ * Freeze the page when heap_prepare_freeze_tuple indicates that at least
+ * one XID/MXID from before FreezeLimit/MultiXactCutoff is present. Also
+ * freeze when pruning generated an FPI, if doing so means that we set the
+ * page all-frozen afterwards (might not happen until final heap pass).
+ * XXX: Previously, we knew if pruning emitted an FPI by checking
+ * pgWalUsage.wal_fpi before and after pruning. Once the freeze and prune
+ * records are combined, this heuristic can no longer be used. The
+ * opportunistic freeze heuristic must be improved; however, for now, try
+ * to approximate it.
+ */
+ do_freeze = false;
+ if (prstate.actions & PRUNE_DO_TRY_FREEZE)
{
+ /* Is the whole page freezable? And is there something to freeze? */
+ bool whole_page_freezable = prstate.all_visible &&
+ prstate.all_frozen;
+
+ if (prstate.pagefrz.freeze_required)
+ do_freeze = true;
+ else if (whole_page_freezable && prstate.nfrozen > 0)
+ {
+ /*
+ * Freezing would make the page all-frozen. In this case, we will
+ * freeze if we have already emitted an FPI or will do so anyway.
+ * Be sure only to incur the overhead of checking if we will do an
+ * FPI if we may use that information.
+ */
+ if (hint_bit_fpi ||
+ ((do_prune || do_hint) && XLogCheckBufferNeedsBackup(buffer)))
+ {
+ do_freeze = true;
+ }
+ }
+ }
+
+ /*
+ * Validate the tuples we are considering freezing. We do this even if
+ * pruning and hint bit setting have not emitted an FPI so far because we
+ * still may emit an FPI while setting the page hint bit later. But we
+ * want to avoid doing the pre-freeze checks in a critical section.
+ */
+ if (do_freeze)
+ heap_pre_freeze_checks(buffer, prstate.frozen, prstate.nfrozen);
+ else if (!prstate.all_frozen || prstate.nfrozen > 0)
+ {
+ Assert(!prstate.pagefrz.freeze_required);
+
/*
- * Apply the planned item changes, then repair page fragmentation, and
- * update the page's hint bit about whether it has free line pointers.
+ * If we will neither freeze tuples on the page nor set the page all
+ * frozen in the visibility map, the page is not all-frozen and there
+ * will be no newly frozen tuples.
*/
- heap_page_prune_execute(buffer, false,
- prstate.redirected, prstate.nredirected,
- prstate.nowdead, prstate.ndead,
- prstate.nowunused, prstate.nunused);
+ prstate.all_frozen = false;
+ prstate.nfrozen = 0; /* avoid miscounts in instrumentation */
+ }
+
+ /* Any error while applying the changes is critical */
+ START_CRIT_SECTION();
+ if (do_hint)
+ {
/*
* Update the page's pd_prune_xid field to either zero, or the lowest
* XID of any soon-prunable tuple.
@@ -494,12 +693,35 @@ heap_page_prune(Relation relation, Buffer buffer,
((PageHeader) page)->pd_prune_xid = prstate.new_prune_xid;
/*
- * Also clear the "page is full" flag, since there's no point in
- * repeating the prune/defrag process until something else happens to
- * the page.
+ * Clear the "page is full" flag if it is set since there's no point
+ * in repeating the prune/defrag process until something else happens
+ * to the page.
*/
PageClearFull(page);
+ /*
+ * We only needed to update pd_prune_xid and clear the page-is-full
+ * hint bit; this is a non-WAL-logged hint. If we will also freeze or
+ * prune the page, we will mark the buffer dirty below.
+ */
+ if (!do_freeze && !do_prune)
+ MarkBufferDirtyHint(buffer, true);
+ }
+
+ if (do_prune || do_freeze)
+ {
+ /* Apply the planned item changes, then repair page fragmentation. */
+ if (do_prune)
+ {
+ heap_page_prune_execute(buffer, false,
+ prstate.redirected, prstate.nredirected,
+ prstate.nowdead, prstate.ndead,
+ prstate.nowunused, prstate.nunused);
+ }
+
+ if (do_freeze)
+ heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
+
MarkBufferDirty(buffer);
/*
@@ -507,42 +729,123 @@ heap_page_prune(Relation relation, Buffer buffer,
*/
if (RelationNeedsWAL(relation))
{
+ /*
+ * The snapshotConflictHorizon for the whole record should be the
+ * most conservative of all the horizons calculated for any of the
+ * possible modifications. If this record will prune tuples, any
+ * transactions on the standby older than the youngest xmax of the
+ * most recently removed tuple this record will prune will
+ * conflict. If this record will freeze tuples, any transactions
+ * on the standby with xids older than the youngest tuple this
+ * record will freeze will conflict.
+ */
+ TransactionId frz_conflict_horizon = InvalidTransactionId;
+ TransactionId conflict_xid;
+
+ /*
+ * We can use the visibility_cutoff_xid as our cutoff for
+ * conflicts when the whole page is eligible to become all-frozen
+ * in the VM once we're done with it. Otherwise we generate a
+ * conservative cutoff by stepping back from OldestXmin.
+ */
+ if (do_freeze)
+ {
+ if (prstate.all_visible && prstate.all_frozen)
+ frz_conflict_horizon = prstate.visibility_cutoff_xid;
+ else
+ {
+ /* Avoids false conflicts when hot_standby_feedback in use */
+ frz_conflict_horizon = prstate.pagefrz.cutoffs->OldestXmin;
+ TransactionIdRetreat(frz_conflict_horizon);
+ }
+ }
+
+ if (TransactionIdFollows(frz_conflict_horizon, prstate.latest_xid_removed))
+ conflict_xid = frz_conflict_horizon;
+ else
+ conflict_xid = prstate.latest_xid_removed;
+
log_heap_prune_and_freeze(relation, buffer,
- prstate.snapshotConflictHorizon,
+ conflict_xid,
true, reason,
- NULL, 0,
+ prstate.frozen, prstate.nfrozen,
prstate.redirected, prstate.nredirected,
prstate.nowdead, prstate.ndead,
prstate.nowunused, prstate.nunused);
}
}
- else
- {
- /*
- * If we didn't prune anything, but have found a new value for the
- * pd_prune_xid field, update it and mark the buffer dirty. This is
- * treated as a non-WAL-logged hint.
- *
- * Also clear the "page is full" flag if it is set, since there's no
- * point in repeating the prune/defrag process until something else
- * happens to the page.
- */
- if (((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
- PageIsFull(page))
- {
- ((PageHeader) page)->pd_prune_xid = prstate.new_prune_xid;
- PageClearFull(page);
- MarkBufferDirtyHint(buffer, true);
- }
- }
END_CRIT_SECTION();
/* Copy information back for caller */
- presult->nnewlpdead = prstate.ndead;
presult->ndeleted = prstate.ndeleted;
+ presult->nnewlpdead = prstate.ndead;
+ presult->nfrozen = prstate.nfrozen;
+ presult->live_tuples = prstate.live_tuples;
+ presult->recently_dead_tuples = prstate.recently_dead_tuples;
+
+ /*
+ * It was convenient to ignore LP_DEAD items in all_visible earlier on to
+ * make the choice of whether or not to freeze the page unaffected by the
+ * short-term presence of LP_DEAD items. These LP_DEAD items were
+ * effectively assumed to be LP_UNUSED items in the making. It doesn't
+ * matter which heap pass (initial pass or final pass) ends up setting the
+ * page all-frozen, as long as the ongoing VACUUM does it.
+ *
+ * Now that freezing has been finalized, unset all_visible if there are
+ * any LP_DEAD items on the page. It needs to reflect the present state
+ * of things, as expected by our caller.
+ */
+ if (prstate.lpdead_items == 0)
+ {
+ presult->all_visible = prstate.all_visible;
+ presult->all_frozen = prstate.all_frozen;
+ }
+ else
+ {
+ presult->all_visible = false;
+ presult->all_frozen = false;
+ }
+ presult->hastup = prstate.hastup;
+
+ /*
+ * For callers planning to update the visibility map, the conflict horizon
+ * for that record must be the newest xmin on the page. However, if the
+ * page is completely frozen, there can be no conflict and the
+ * vm_conflict_horizon should remain InvalidTransactionId. This includes
+ * the case that we just froze all the tuples; the prune-freeze record
+ * included the conflict XID already so the caller doesn't need it.
+ */
+ if (!presult->all_frozen)
+ presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
+ else
+ presult->vm_conflict_horizon = InvalidTransactionId;
+
presult->lpdead_items = prstate.lpdead_items;
/* the presult->deadoffsets array was already filled in */
+
+ /*
+ * If we will freeze tuples on the page or, even if we don't freeze tuples
+ * on the page, if we will set the page all-frozen in the visibility map,
+ * we can advance relfrozenxid and relminmxid to the values in
+ * pagefrz->FreezePageRelfrozenXid and pagefrz->FreezePageRelminMxid.
+ */
+ Assert(presult->nfrozen > 0 || !prstate.pagefrz.freeze_required);
+
+ if (new_relfrozen_xid)
+ {
+ if (presult->nfrozen > 0)
+ *new_relfrozen_xid = prstate.pagefrz.FreezePageRelfrozenXid;
+ else
+ *new_relfrozen_xid = prstate.pagefrz.NoFreezePageRelfrozenXid;
+ }
+ if (new_relmin_mxid)
+ {
+ if (presult->nfrozen > 0)
+ *new_relmin_mxid = prstate.pagefrz.FreezePageRelminMxid;
+ else
+ *new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
+ }
}
@@ -567,10 +870,24 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
}
+/*
+ * Pruning calculates tuple visibility once and saves the results in an array
+ * of int8. See PruneState.htsv for details. This helper function is meant to
+ * guard against examining visibility status array members which have not yet
+ * been computed.
+ */
+static inline HTSV_Result
+htsv_get_valid_status(int status)
+{
+ Assert(status >= HEAPTUPLE_DEAD &&
+ status <= HEAPTUPLE_DELETE_IN_PROGRESS);
+ return (HTSV_Result) status;
+}
+
/*
* Prune specified line pointer or a HOT chain originating at line pointer.
*
- * Tuple visibility information is provided in htsv.
+ * Tuple visibility information is provided in prstate->htsv.
*
* If the item is an index-referenced tuple (i.e. not a heap-only tuple),
* the HOT chain is pruned by removing all DEAD tuples at the start of the HOT
@@ -590,11 +907,17 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
* prstate showing the changes to be made. Items to be redirected are added
* to the redirected[] array (two entries per redirection); items to be set to
* LP_DEAD state are added to nowdead[]; and items to be set to LP_UNUSED
- * state are added to nowunused[].
+ * state are added to nowunused[]. We perform bookkeeping of live tuples,
+ * visibility etc. based on what the page will look like after the changes
+ * are applied. All that bookkeeping is performed in the heap_prune_record_*()
+ * subroutines. The division of labor is that heap_prune_chain() decides the
+ * fate of each tuple, ie. whether it's going to be removed, redirected or
+ * left unchanged, and the heap_prune_record_*() subroutines update PruneState
+ * based on that outcome.
*/
static void
heap_prune_chain(Page page, BlockNumber blockno, OffsetNumber maxoff,
- OffsetNumber rootoffnum, int8 *htsv, PruneState *prstate, PruneResult *presult)
+ OffsetNumber rootoffnum, PruneState *prstate)
{
TransactionId priorXmax = InvalidTransactionId;
ItemId rootlp;
@@ -674,15 +997,14 @@ heap_prune_chain(Page page, BlockNumber blockno, OffsetNumber maxoff,
*/
chainitems[nchain++] = offnum;
- switch (htsv_get_valid_status(htsv[offnum]))
+ switch (htsv_get_valid_status(prstate->htsv[offnum]))
{
case HEAPTUPLE_DEAD:
/* Remember the last DEAD tuple seen */
ndeadchain = nchain;
HeapTupleHeaderAdvanceConflictHorizon(htup,
- &prstate->snapshotConflictHorizon);
-
+ &prstate->latest_xid_removed);
/* Advance to next chain member */
break;
@@ -738,10 +1060,11 @@ heap_prune_chain(Page page, BlockNumber blockno, OffsetNumber maxoff,
{
/*
* We found a redirect item that doesn't point to a valid follow-on
- * item. This can happen if the loop in heap_page_prune caused us to
- * visit the dead successor of a redirect item before visiting the
- * redirect item. We can clean up by setting the redirect item to
- * LP_DEAD state or LP_UNUSED if the caller indicated.
+ * item. This can happen if the loop in heap_page_prune_and_freeze()
+ * caused us to visit the dead successor of a redirect item before
+ * visiting the redirect item. We can clean up by setting the
+ * redirect item to LP_DEAD state or LP_UNUSED if the caller
+ * indicated.
*/
heap_prune_record_dead_or_unused(prstate, rootoffnum, false);
return;
@@ -763,7 +1086,7 @@ process_chain:
i++;
}
for (; i < nchain; i++)
- heap_prune_record_unchanged_lp_normal(page, htsv, prstate, presult, chainitems[i]);
+ heap_prune_record_unchanged_lp_normal(page, prstate, chainitems[i]);
}
else if (ndeadchain == nchain)
{
@@ -789,7 +1112,7 @@ process_chain:
/* the rest of tuples in the chain are normal, unchanged tuples */
for (int i = ndeadchain; i < nchain; i++)
- heap_prune_record_unchanged_lp_normal(page, htsv, prstate, presult, chainitems[i]);
+ heap_prune_record_unchanged_lp_normal(page, prstate, chainitems[i]);
}
}
@@ -850,6 +1173,18 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
prstate->nowdead[prstate->ndead] = offnum;
prstate->ndead++;
+ /*
+ * Deliberately delay unsetting all_visible until later during pruning.
+ * Removable dead tuples shouldn't preclude freezing the page. After
+ * finishing this first pass of tuple visibility checks, initialize
+ * all_visible_except_removable with the current value of all_visible to
+ * indicate whether or not the page is all visible except for dead tuples.
+ * This will allow us to attempt to freeze the page after pruning. Later
+ * during pruning, if we encounter an LP_DEAD item or are setting an item
+ * LP_DEAD, we will unset all_visible. As long as we unset it before
+ * updating the visibility map, this will be correct.
+ */
+
/* Record the dead offset for vacuum */
prstate->deadoffsets[prstate->lpdead_items++] = offnum;
@@ -915,37 +1250,122 @@ heap_prune_record_unchanged_lp_unused(Page page, PruneState *prstate, OffsetNumb
}
/*
- * Record LP_NORMAL line pointer that is left unchanged.
+ * Record line pointer that is left unchanged. We consider freezing it, and
+ * update bookkeeping of tuple counts and page visibility.
*/
static void
-heap_prune_record_unchanged_lp_normal(Page page, int8 *htsv, PruneState *prstate,
- PruneResult *presult, OffsetNumber offnum)
+heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate,
+ OffsetNumber offnum)
{
HeapTupleHeader htup;
Assert(!prstate->processed[offnum]);
prstate->processed[offnum] = true;
- presult->hastup = true; /* the page is not empty */
+ prstate->hastup = true; /* the page is not empty */
+ /*
+ * The criteria for counting a tuple as live in this block need to match
+ * what analyze.c's acquire_sample_rows() does, otherwise VACUUM and
+ * ANALYZE may produce wildly different reltuples values, e.g. when there
+ * are many recently-dead tuples.
+ *
+ * The logic here is a bit simpler than acquire_sample_rows(), as VACUUM
+ * can't run inside a transaction block, which makes some cases impossible
+ * (e.g. in-progress insert from the same transaction).
+ *
+ * HEAPTUPLE_DEAD are handled by the other heap_prune_record_*()
+ * subroutines. They don't count dead items like acquire_sample_rows()
+ * does, because we assume that all dead items will become LP_UNUSED
+ * before VACUUM finishes. This difference is only superficial. VACUUM
+ * effectively agrees with ANALYZE about DEAD items, in the end. VACUUM
+ * won't remember LP_DEAD items, but only because they're not supposed to
+ * be left behind when it is done. (Cases where we bypass index vacuuming
+ * will violate this optimistic assumption, but the overall impact of that
+ * should be negligible.)
+ */
htup = (HeapTupleHeader) PageGetItem(page, PageGetItemId(page, offnum));
- switch (htsv[offnum])
+ switch (prstate->htsv[offnum])
{
case HEAPTUPLE_LIVE:
- case HEAPTUPLE_INSERT_IN_PROGRESS:
/*
- * If we wanted to optimize for aborts, we might consider marking
- * the page prunable when we see INSERT_IN_PROGRESS. But we
- * don't. See related decisions about when to mark the page
- * prunable in heapam.c.
+ * Count it as live. Not only is this natural, but it's also what
+ * acquire_sample_rows() does.
*/
+ prstate->live_tuples++;
+
+ /*
+ * Is the tuple definitely visible to all transactions?
+ *
+ * NB: Like with per-tuple hint bits, we can't set the
+ * PD_ALL_VISIBLE flag if the inserter committed asynchronously.
+ * See SetHintBits for more info. Check that the tuple is hinted
+ * xmin-committed because of that.
+ */
+ if (prstate->all_visible)
+ {
+ TransactionId xmin;
+
+ if (!HeapTupleHeaderXminCommitted(htup))
+ {
+ prstate->all_visible = false;
+ break;
+ }
+
+ /*
+ * The inserter definitely committed. But is it old enough
+ * that everyone sees it as committed? A FrozenTransactionId
+ * is seen as committed to everyone. Otherwise, we check if
+ * there is a snapshot that considers this xid to still be
+ * running, and if so, we don't consider the page all-visible.
+ */
+ xmin = HeapTupleHeaderGetXmin(htup);
+
+ /* For now always use pagefrz->cutoffs */
+ Assert(prstate->pagefrz.cutoffs);
+ if (!TransactionIdPrecedes(xmin, prstate->pagefrz.cutoffs->OldestXmin))
+ {
+ prstate->all_visible = false;
+ break;
+ }
+
+ /* Track newest xmin on page. */
+ if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
+ TransactionIdIsNormal(xmin))
+ prstate->visibility_cutoff_xid = xmin;
+ }
break;
case HEAPTUPLE_RECENTLY_DEAD:
+
+ /*
+ * If tuple is recently dead then we must not remove it from the
+ * relation. (We only remove items that are LP_DEAD from
+ * pruning.)
+ */
+ prstate->recently_dead_tuples++;
+ prstate->all_visible = false;
+
+ /*
+ * This tuple may soon become DEAD. Update the hint field so that
+ * the page is reconsidered for pruning in future.
+ */
+ heap_prune_record_prunable(prstate,
+ HeapTupleHeaderGetUpdateXid(htup));
+ break;
+
case HEAPTUPLE_DELETE_IN_PROGRESS:
+ /*
+ * This is an expected case during concurrent vacuum. Count such rows
+ * as live. As above, we assume the deleting transaction will
+ * commit and update the counters after we report.
+ */
+ prstate->live_tuples++;
+ prstate->all_visible = false;
+
/*
* This tuple may soon become DEAD. Update the hint field so that
* the page is reconsidered for pruning in future.
@@ -954,6 +1374,24 @@ heap_prune_record_unchanged_lp_normal(Page page, int8 *htsv, PruneState *prstate
HeapTupleHeaderGetUpdateXid(htup));
break;
+ case HEAPTUPLE_INSERT_IN_PROGRESS:
+
+ /*
+ * We do not count these rows as live, because we expect the
+ * inserting transaction to update the counters at commit, and we
+ * assume that will happen only after we report our results. This
+ * assumption is a bit shaky, but it is what acquire_sample_rows()
+ * does, so be consistent.
+ */
+ prstate->all_visible = false;
+
+ /*
+ * If we wanted to optimize for aborts, we might consider marking
+ * the page prunable when we see INSERT_IN_PROGRESS. But we
+ * don't. See related decisions about when to mark the page
+ * prunable in heapam.c.
+ */
+ break;
default:
@@ -961,7 +1399,8 @@ heap_prune_record_unchanged_lp_normal(Page page, int8 *htsv, PruneState *prstate
* DEAD tuples should've been passed to heap_prune_record_dead()
* or heap_prune_record_unused() instead.
*/
- elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result %d", htsv[offnum]);
+ elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result %d",
+ prstate->htsv[offnum]);
break;
}
@@ -971,12 +1410,12 @@ heap_prune_record_unchanged_lp_normal(Page page, int8 *htsv, PruneState *prstate
/* Tuple with storage -- consider need to freeze */
bool totally_frozen;
- if ((heap_prepare_freeze_tuple(htup, &presult->pagefrz,
- &presult->frozen[presult->nfrozen],
+ if ((heap_prepare_freeze_tuple(htup, &prstate->pagefrz,
+ &prstate->frozen[prstate->nfrozen],
&totally_frozen)))
{
/* Save prepared freeze plan for later */
- presult->frozen[presult->nfrozen++].offset = offnum;
+ prstate->frozen[prstate->nfrozen++].offset = offnum;
}
/*
@@ -985,7 +1424,7 @@ heap_prune_record_unchanged_lp_normal(Page page, int8 *htsv, PruneState *prstate
* definitely cannot be set all-frozen in the visibility map later on
*/
if (!totally_frozen)
- presult->all_frozen = false;
+ prstate->all_frozen = false;
}
}
@@ -1007,6 +1446,11 @@ heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber
* other VACUUM, at most. Besides, VACUUM must treat
* hastup/nonempty_pages as provisional no matter how LP_DEAD items are
* handled (handled here, or handled later on).
+ *
+ * Similarly, don't unset all_visible until later, at the end of
+ * heap_page_prune_and_freeze(). This will allow us to attempt to freeze
+ * the page after pruning. As long as we unset it before updating the
+ * visibility map, this will be correct.
*/
/* Record the dead offset for vacuum */
@@ -1029,7 +1473,7 @@ heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetNumber offnum
}
/*
- * Perform the actual page changes needed by heap_page_prune.
+ * Perform the actual page changes needed by heap_page_prune_and_freeze().
*
* If 'lp_truncate_only' is set, we are merely marking LP_DEAD line pointers
* as unused, not redirecting or removing anything else. The
@@ -1160,12 +1604,12 @@ heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
else
{
/*
- * When heap_page_prune() was called, PRUNE_DO_MARK_UNUSED_NOW may
- * have been set, which allows would-be LP_DEAD items to be made
- * LP_UNUSED instead. This is only possible if the relation has
- * no indexes. If there are any dead items, then
- * PRUNE_DO_MARK_UNUSED_NOW was not set and every item being
- * marked LP_UNUSED must refer to a heap-only tuple.
+ * When heap_page_prune_and_freeze() was called,
+ * PRUNE_DO_MARK_UNUSED_NOW may have been set, which allows
+ * would-be LP_DEAD items to be made LP_UNUSED instead. This is
+ * only possible if the relation has no indexes. If there are any
+ * dead items, then PRUNE_DO_MARK_UNUSED_NOW was not set and every
+ * item being marked LP_UNUSED must refer to a heap-only tuple.
*/
if (ndead > 0)
{
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 7f1e4db55c0..3913da7e161 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -430,12 +430,13 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
* as an upper bound on the XIDs stored in the pages we'll actually scan
* (NewRelfrozenXid tracking must never be allowed to miss unfrozen XIDs).
*
- * Next acquire vistest, a related cutoff that's used in heap_page_prune.
- * We expect vistest will always make heap_page_prune remove any deleted
- * tuple whose xmax is < OldestXmin. lazy_scan_prune must never become
- * confused about whether a tuple should be frozen or removed. (In the
- * future we might want to teach lazy_scan_prune to recompute vistest from
- * time to time, to increase the number of dead tuples it can prune away.)
+ * Next acquire vistest, a related cutoff that's used in
+ * heap_page_prune_and_freeze(). We expect vistest will always make
+ * heap_page_prune_and_freeze() remove any deleted tuple whose xmax is <
+ * OldestXmin. lazy_scan_prune must never become confused about whether a
+ * tuple should be frozen or removed. (In the future we might want to
+ * teach lazy_scan_prune to recompute vistest from time to time, to
+ * increase the number of dead tuples it can prune away.)
*/
vacrel->aggressive = vacuum_get_cutoffs(rel, params, &vacrel->cutoffs);
vacrel->rel_pages = orig_rel_pages = RelationGetNumberOfBlocks(rel);
@@ -1387,22 +1388,6 @@ OffsetNumber_cmp(const void *a, const void *b)
*
* Caller must hold pin and buffer cleanup lock on the buffer.
*
- * Prior to PostgreSQL 14 there were very rare cases where heap_page_prune()
- * was allowed to disagree with our HeapTupleSatisfiesVacuum() call about
- * whether or not a tuple should be considered DEAD. This happened when an
- * inserting transaction concurrently aborted (after our heap_page_prune()
- * call, before our HeapTupleSatisfiesVacuum() call). There was rather a lot
- * of complexity just so we could deal with tuples that were DEAD to VACUUM,
- * but nevertheless were left with storage after pruning.
- *
- * As of Postgres 17, we circumvent this problem altogether by reusing the
- * result of heap_page_prune()'s visibility check. Without the second call to
- * HeapTupleSatisfiesVacuum(), there is no new HTSV_Result and there can be no
- * disagreement. We'll just handle such tuples as if they had become fully dead
- * right after this operation completes instead of in the middle of it. Note that
- * any tuple that becomes dead after the call to heap_page_prune() can't need to
- * be frozen, because it was visible to another session when vacuum started.
- *
* vmbuffer is the buffer containing the VM block with visibility information
* for the heap block, blkno. all_visible_according_to_vm is the saved
* visibility status of the heap block looked up earlier by the caller. We
@@ -1421,292 +1406,50 @@ lazy_scan_prune(LVRelState *vacrel,
bool *has_lpdead_items)
{
Relation rel = vacrel->rel;
- OffsetNumber offnum,
- maxoff;
- ItemId itemid;
- PruneResult presult;
- int live_tuples,
- recently_dead_tuples;
- bool all_visible;
- TransactionId visibility_cutoff_xid;
+ PruneFreezeResult presult;
uint8 actions = 0;
- int64 fpi_before = pgWalUsage.wal_fpi;
Assert(BufferGetBlockNumber(buf) == blkno);
/*
- * maxoff might be reduced following line pointer array truncation in
- * heap_page_prune. That's safe for us to ignore, since the reclaimed
- * space will continue to look like LP_UNUSED items below.
- */
- maxoff = PageGetMaxOffsetNumber(page);
-
- /* Initialize (or reset) page-level state */
- presult.pagefrz.freeze_required = false;
- presult.pagefrz.FreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
- presult.pagefrz.FreezePageRelminMxid = vacrel->NewRelminMxid;
- presult.pagefrz.NoFreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
- presult.pagefrz.NoFreezePageRelminMxid = vacrel->NewRelminMxid;
- presult.pagefrz.cutoffs = &vacrel->cutoffs;
- live_tuples = 0;
- recently_dead_tuples = 0;
-
- /*
- * Prune all HOT-update chains in this page.
+ * Prune all HOT-update chains and potentially freeze tuples on this page.
*
* We count the number of tuples removed from the page by the pruning step
- * in presult.ndeleted. It should not be confused with lpdead_items;
- * lpdead_items's final value can be thought of as the number of tuples
- * that were deleted from indexes.
+ * in presult.ndeleted. It should not be confused with
+ * presult.lpdead_items; presult.lpdead_items's final value can be thought
+ * of as the number of tuples that were deleted from indexes.
*
* If the relation has no indexes, we can immediately mark would-be dead
* items LP_UNUSED, so PRUNE_DO_MARK_UNUSED_NOW should be set if no
* indexes and unset otherwise.
+ *
+ * We will update the VM after collecting LP_DEAD items and freezing
+ * tuples. Pruning will have determined whether or not the page is
+ * all-visible.
*/
actions |= PRUNE_DO_TRY_FREEZE;
if (vacrel->nindexes == 0)
actions |= PRUNE_DO_MARK_UNUSED_NOW;
- heap_page_prune(rel, buf, vacrel->vistest, actions,
- &presult, PRUNE_VACUUM_SCAN, &vacrel->offnum);
-
- /*
- * We will update the VM after collecting LP_DEAD items and freezing
- * tuples. Keep track of whether or not the page is all_visible and
- * all_frozen and use this information to update the VM. all_visible
- * implies 0 lpdead_items, but don't trust all_frozen result unless
- * all_visible is also set to true.
- *
- * Also keep track of the visibility cutoff xid for recovery conflicts.
- */
- all_visible = true;
- visibility_cutoff_xid = InvalidTransactionId;
-
- /*
- * Now scan the page to collect LP_DEAD items and update the variables set
- * just above.
- */
- for (offnum = FirstOffsetNumber;
- offnum <= maxoff;
- offnum = OffsetNumberNext(offnum))
- {
- HeapTupleHeader htup;
-
- /*
- * Set the offset number so that we can display it along with any
- * error that occurred while processing this tuple.
- */
- vacrel->offnum = offnum;
- itemid = PageGetItemId(page, offnum);
-
- if (!ItemIdIsUsed(itemid))
- continue;
-
- /* Redirect items mustn't be touched */
- if (ItemIdIsRedirected(itemid))
- continue;
-
- if (ItemIdIsDead(itemid))
- {
- /*
- * Also deliberately delay unsetting all_visible until just before
- * we return to lazy_scan_heap caller, as explained in full below.
- * (This is another case where it's useful to anticipate that any
- * LP_DEAD items will become LP_UNUSED during the ongoing VACUUM.)
- */
- continue;
- }
-
- Assert(ItemIdIsNormal(itemid));
-
- htup = (HeapTupleHeader) PageGetItem(page, itemid);
-
- /*
- * The criteria for counting a tuple as live in this block need to
- * match what analyze.c's acquire_sample_rows() does, otherwise VACUUM
- * and ANALYZE may produce wildly different reltuples values, e.g.
- * when there are many recently-dead tuples.
- *
- * The logic here is a bit simpler than acquire_sample_rows(), as
- * VACUUM can't run inside a transaction block, which makes some cases
- * impossible (e.g. in-progress insert from the same transaction).
- *
- * We treat LP_DEAD items (which are the closest thing to DEAD tuples
- * that might be seen here) differently, too: we assume that they'll
- * become LP_UNUSED before VACUUM finishes. This difference is only
- * superficial. VACUUM effectively agrees with ANALYZE about DEAD
- * items, in the end. VACUUM won't remember LP_DEAD items, but only
- * because they're not supposed to be left behind when it is done.
- * (Cases where we bypass index vacuuming will violate this optimistic
- * assumption, but the overall impact of that should be negligible.)
- */
- switch (htsv_get_valid_status(presult.htsv[offnum]))
- {
- case HEAPTUPLE_LIVE:
-
- /*
- * Count it as live. Not only is this natural, but it's also
- * what acquire_sample_rows() does.
- */
- live_tuples++;
-
- /*
- * Is the tuple definitely visible to all transactions?
- *
- * NB: Like with per-tuple hint bits, we can't set the
- * PD_ALL_VISIBLE flag if the inserter committed
- * asynchronously. See SetHintBits for more info. Check that
- * the tuple is hinted xmin-committed because of that.
- */
- if (all_visible)
- {
- TransactionId xmin;
-
- if (!HeapTupleHeaderXminCommitted(htup))
- {
- all_visible = false;
- break;
- }
-
- /*
- * The inserter definitely committed. But is it old enough
- * that everyone sees it as committed?
- */
- xmin = HeapTupleHeaderGetXmin(htup);
- if (!TransactionIdPrecedes(xmin,
- vacrel->cutoffs.OldestXmin))
- {
- all_visible = false;
- break;
- }
-
- /* Track newest xmin on page. */
- if (TransactionIdFollows(xmin, visibility_cutoff_xid) &&
- TransactionIdIsNormal(xmin))
- visibility_cutoff_xid = xmin;
- }
- break;
- case HEAPTUPLE_RECENTLY_DEAD:
-
- /*
- * If tuple is recently dead then we must not remove it from
- * the relation. (We only remove items that are LP_DEAD from
- * pruning.)
- */
- recently_dead_tuples++;
- all_visible = false;
- break;
- case HEAPTUPLE_INSERT_IN_PROGRESS:
-
- /*
- * We do not count these rows as live, because we expect the
- * inserting transaction to update the counters at commit, and
- * we assume that will happen only after we report our
- * results. This assumption is a bit shaky, but it is what
- * acquire_sample_rows() does, so be consistent.
- */
- all_visible = false;
- break;
- case HEAPTUPLE_DELETE_IN_PROGRESS:
- /* This is an expected case during concurrent vacuum */
- all_visible = false;
+ heap_page_prune_and_freeze(rel, buf, actions, vacrel->vistest,
+ &vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN, &vacrel->offnum,
+ &vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
- /*
- * Count such rows as live. As above, we assume the deleting
- * transaction will commit and update the counters after we
- * report.
- */
- live_tuples++;
- break;
- default:
- elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
- break;
- }
- }
+ Assert(MultiXactIdIsValid(vacrel->NewRelminMxid));
+ Assert(TransactionIdIsValid(vacrel->NewRelfrozenXid));
- /*
- * We have now divided every item on the page into either an LP_DEAD item
- * that will need to be vacuumed in indexes later, or a LP_NORMAL tuple
- * that remains and needs to be considered for freezing now (LP_UNUSED and
- * LP_REDIRECT items also remain, but are of no further interest to us).
- */
vacrel->offnum = InvalidOffsetNumber;
- /*
- * Freeze the page when heap_prepare_freeze_tuple indicates that at least
- * one XID/MXID from before FreezeLimit/MultiXactCutoff is present. Also
- * freeze when pruning generated an FPI, if doing so means that we set the
- * page all-frozen afterwards (might not happen until final heap pass).
- */
- if (presult.pagefrz.freeze_required || presult.nfrozen == 0 ||
- (all_visible && presult.all_frozen &&
- fpi_before != pgWalUsage.wal_fpi))
+ if (presult.nfrozen > 0)
{
/*
- * We're freezing the page. Our final NewRelfrozenXid doesn't need to
- * be affected by the XIDs that are just about to be frozen anyway.
+ * We never increment the frozen_pages instrumentation counter when
+ * nfrozen == 0, since it only counts pages with newly frozen tuples
+ * (don't confuse that with pages newly set all-frozen in VM).
*/
- vacrel->NewRelfrozenXid = presult.pagefrz.FreezePageRelfrozenXid;
- vacrel->NewRelminMxid = presult.pagefrz.FreezePageRelminMxid;
-
- if (presult.nfrozen == 0)
- {
- /*
- * We have no freeze plans to execute, so there's no added cost
- * from following the freeze path. That's why it was chosen. This
- * is important in the case where the page only contains totally
- * frozen tuples at this point (perhaps only following pruning).
- * Such pages can be marked all-frozen in the VM by our caller,
- * even though none of its tuples were newly frozen here (note
- * that the "no freeze" path never sets pages all-frozen).
- *
- * We never increment the frozen_pages instrumentation counter
- * here, since it only counts pages with newly frozen tuples
- * (don't confuse that with pages newly set all-frozen in VM).
- */
- }
- else
- {
- TransactionId snapshotConflictHorizon;
-
- vacrel->frozen_pages++;
-
- /*
- * We can use visibility_cutoff_xid as our cutoff for conflicts
- * when the whole page is eligible to become all-frozen in the VM
- * once we're done with it. Otherwise we generate a conservative
- * cutoff by stepping back from OldestXmin.
- */
- if (all_visible && presult.all_frozen)
- {
- /* Using same cutoff when setting VM is now unnecessary */
- snapshotConflictHorizon = visibility_cutoff_xid;
- visibility_cutoff_xid = InvalidTransactionId;
- }
- else
- {
- /* Avoids false conflicts when hot_standby_feedback in use */
- snapshotConflictHorizon = vacrel->cutoffs.OldestXmin;
- TransactionIdRetreat(snapshotConflictHorizon);
- }
+ vacrel->frozen_pages++;
- /* Execute all freeze plans for page as a single atomic action */
- heap_freeze_execute_prepared(vacrel->rel, buf,
- snapshotConflictHorizon,
- presult.frozen, presult.nfrozen);
- }
- }
- else
- {
- /*
- * Page requires "no freeze" processing. It might be set all-visible
- * in the visibility map, but it can never be set all-frozen.
- */
- vacrel->NewRelfrozenXid = presult.pagefrz.NoFreezePageRelfrozenXid;
- vacrel->NewRelminMxid = presult.pagefrz.NoFreezePageRelminMxid;
- presult.all_frozen = false;
- presult.nfrozen = 0; /* avoid miscounts in instrumentation */
}
/*
@@ -1718,17 +1461,21 @@ lazy_scan_prune(LVRelState *vacrel,
*/
#ifdef USE_ASSERT_CHECKING
/* Note that all_frozen value does not matter when !all_visible */
- if (all_visible && presult.lpdead_items == 0)
+ if (presult.all_visible)
{
TransactionId debug_cutoff;
bool debug_all_frozen;
+ Assert(presult.lpdead_items == 0);
+
if (!heap_page_is_all_visible(vacrel, buf,
&debug_cutoff, &debug_all_frozen))
Assert(false);
+ Assert(presult.all_frozen == debug_all_frozen);
+
Assert(!TransactionIdIsValid(debug_cutoff) ||
- debug_cutoff == visibility_cutoff_xid);
+ debug_cutoff == presult.vm_conflict_horizon);
}
#endif
@@ -1762,27 +1509,14 @@ lazy_scan_prune(LVRelState *vacrel,
Assert(dead_items->num_items <= dead_items->max_items);
pgstat_progress_update_param(PROGRESS_VACUUM_NUM_DEAD_TUPLES,
dead_items->num_items);
-
- /*
- * It was convenient to ignore LP_DEAD items in all_visible earlier on
- * to make the choice of whether or not to freeze the page unaffected
- * by the short-term presence of LP_DEAD items. These LP_DEAD items
- * were effectively assumed to be LP_UNUSED items in the making. It
- * doesn't matter which heap pass (initial pass or final pass) ends up
- * setting the page all-frozen, as long as the ongoing VACUUM does it.
- *
- * Now that freezing has been finalized, unset all_visible. It needs
- * to reflect the present state of things, as expected by our caller.
- */
- all_visible = false;
}
/* Finally, add page-local counts to whole-VACUUM counts */
vacrel->tuples_deleted += presult.ndeleted;
vacrel->tuples_frozen += presult.nfrozen;
vacrel->lpdead_items += presult.lpdead_items;
- vacrel->live_tuples += live_tuples;
- vacrel->recently_dead_tuples += recently_dead_tuples;
+ vacrel->live_tuples += presult.live_tuples;
+ vacrel->recently_dead_tuples += presult.recently_dead_tuples;
/* Can't truncate this page */
if (presult.hastup)
@@ -1791,20 +1525,20 @@ lazy_scan_prune(LVRelState *vacrel,
/* Did we find LP_DEAD items? */
*has_lpdead_items = (presult.lpdead_items > 0);
- Assert(!all_visible || !(*has_lpdead_items));
+ Assert(!presult.all_visible || !(*has_lpdead_items));
/*
* Handle setting visibility map bit based on information from the VM (as
* of last heap_vac_scan_next_block() call), and from all_visible and
* all_frozen variables
*/
- if (!all_visible_according_to_vm && all_visible)
+ if (!all_visible_according_to_vm && presult.all_visible)
{
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
if (presult.all_frozen)
{
- Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+ Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
flags |= VISIBILITYMAP_ALL_FROZEN;
}
@@ -1824,7 +1558,7 @@ lazy_scan_prune(LVRelState *vacrel,
PageSetAllVisible(page);
MarkBufferDirty(buf);
visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
- vmbuffer, visibility_cutoff_xid,
+ vmbuffer, presult.vm_conflict_horizon,
flags);
}
@@ -1872,7 +1606,7 @@ lazy_scan_prune(LVRelState *vacrel,
* it as all-frozen. Note that all_frozen is only valid if all_visible is
* true, so we must check both all_visible and all_frozen.
*/
- else if (all_visible_according_to_vm && all_visible &&
+ else if (all_visible_according_to_vm && presult.all_visible &&
presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
{
/*
@@ -1889,11 +1623,11 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* Set the page all-frozen (and all-visible) in the VM.
*
- * We can pass InvalidTransactionId as our visibility_cutoff_xid,
- * since a snapshotConflictHorizon sufficient to make everything safe
- * for REDO was logged when the page's tuples were frozen.
+ * We can pass InvalidTransactionId as our vm_conflict_horizon, since
+ * a snapshotConflictHorizon sufficient to make everything safe for
+ * REDO was logged when the page's tuples were frozen.
*/
- Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+ Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
vmbuffer, InvalidTransactionId,
VISIBILITYMAP_ALL_VISIBLE |
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index a2a86ffa078..98e956ea955 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -215,21 +215,15 @@ typedef struct HeapPageFreeze
/*
* Per-page state returned from pruning
*/
-typedef struct PruneResult
+typedef struct PruneFreezeResult
{
int ndeleted; /* Number of tuples deleted from the page */
int nnewlpdead; /* Number of newly LP_DEAD items */
+ int nfrozen; /* Number of tuples we froze */
- /*
- * Tuple visibility is only computed once for each tuple, for correctness
- * and efficiency reasons; see comment in heap_page_prune() for details.
- * This is of type int8[], instead of HTSV_Result[], so we can use -1 to
- * indicate no visibility has been computed, e.g. for LP_DEAD items.
- *
- * This needs to be MaxHeapTuplesPerPage + 1 long as FirstOffsetNumber is
- * 1. Otherwise every access would need to subtract 1.
- */
- int8 htsv[MaxHeapTuplesPerPage + 1];
+ /* Number of live and recently dead tuples on the page, after pruning */
+ int live_tuples;
+ int recently_dead_tuples;
/*
* Whether or not the page makes rel truncation unsafe
@@ -240,18 +234,18 @@ typedef struct PruneResult
bool hastup;
/*
- * Prepare to freeze in heap_page_prune(). lazy_scan_prune() will use the
- * returned freeze plans to execute freezing.
- */
- HeapPageFreeze pagefrz;
-
- /*
- * Whether or not the page can be set all-frozen in the visibility map.
- * This is only set if the PRUNE_DO_TRY_FREEZE action flag is set.
+ * all_visible and all_frozen indicate if the all-visible and all-frozen
+ * bits in the visibility map can be set for this page, after pruning.
+ *
+ * vm_conflict_horizon is the newest xmin of live tuples on the page. The
+ * caller can use it as the conflict horizon, when setting the VM bits. It
+ * is only valid if we froze some tuples, and all_frozen is true.
+ *
+ * These are only set if the PRUNE_DO_TRY_FREEZE action flag is set.
*/
+ bool all_visible;
bool all_frozen;
- int nfrozen;
- HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
+ TransactionId vm_conflict_horizon;
/*
* LP_DEAD items on the page after pruning. Includes existing LP_DEAD
@@ -259,7 +253,7 @@ typedef struct PruneResult
*/
int lpdead_items;
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
-} PruneResult;
+} PruneFreezeResult;
/* 'reason' codes for heap_page_prune() */
typedef enum
@@ -269,20 +263,6 @@ typedef enum
PRUNE_VACUUM_CLEANUP, /* VACUUM 2nd heap pass */
} PruneReason;
-/*
- * Pruning calculates tuple visibility once and saves the results in an array
- * of int8. See PruneResult.htsv for details. This helper function is meant to
- * guard against examining visibility status array members which have not yet
- * been computed.
- */
-static inline HTSV_Result
-htsv_get_valid_status(int status)
-{
- Assert(status >= HEAPTUPLE_DEAD &&
- status <= HEAPTUPLE_DELETE_IN_PROGRESS);
- return (HTSV_Result) status;
-}
-
/* ----------------
* function prototypes for heap access method
*
@@ -355,9 +335,11 @@ extern void heap_inplace_update(Relation relation, HeapTuple tuple);
extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
HeapPageFreeze *pagefrz,
HeapTupleFreeze *frz, bool *totally_frozen);
-extern void heap_freeze_execute_prepared(Relation rel, Buffer buffer,
- TransactionId snapshotConflictHorizon,
- HeapTupleFreeze *tuples, int ntuples);
+
+extern void heap_pre_freeze_checks(Buffer buffer,
+ HeapTupleFreeze *tuples, int ntuples);
+extern void heap_freeze_prepared_tuples(Buffer buffer,
+ HeapTupleFreeze *tuples, int ntuples);
extern bool heap_freeze_tuple(HeapTupleHeader tuple,
TransactionId relfrozenxid, TransactionId relminmxid,
TransactionId FreezeLimit, TransactionId MultiXactCutoff);
@@ -378,12 +360,15 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
/* in heap/pruneheap.c */
struct GlobalVisState;
extern void heap_page_prune_opt(Relation relation, Buffer buffer);
-extern void heap_page_prune(Relation relation, Buffer buffer,
- struct GlobalVisState *vistest,
- uint8 actions,
- PruneResult *presult,
- PruneReason reason,
- OffsetNumber *off_loc);
+extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ uint8 actions,
+ struct GlobalVisState *vistest,
+ struct VacuumCutoffs *cutoffs,
+ PruneFreezeResult *presult,
+ PruneReason reason,
+ OffsetNumber *off_loc,
+ TransactionId *new_relfrozen_xid,
+ MultiXactId *new_relmin_mxid);
extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
OffsetNumber *redirected, int nredirected,
OffsetNumber *nowdead, int ndead,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 9add48f9924..4f0f76cf7b9 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2194,7 +2194,7 @@ PromptInterruptContext
ProtocolVersion
PrsStorage
PruneReason
-PruneResult
+PruneFreezeResult
PruneState
PruneStepResult
PsqlScanCallbacks
--
2.39.2
On Mon, Apr 01, 2024 at 05:17:51PM +0300, Heikki Linnakangas wrote:
On 30/03/2024 07:57, Melanie Plageman wrote:
On Fri, Mar 29, 2024 at 12:32:21PM -0400, Melanie Plageman wrote:
On Fri, Mar 29, 2024 at 12:00 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
Here's another idea: In the first loop through the offsets, where we
gather the HTSV status of each item, also collect the offsets of all HOT
and non-HOT items to two separate arrays. Call heap_prune_chain() for
all the non-HOT items first, and then process any remaining HOT tuples
that haven't been marked yet.

That's an interesting idea. I'll try it out and see how it works.
Attached v10 implements this method of dividing tuples into HOT and
non-HOT and processing the potential HOT chains first then processing
tuples not marked by calling heap_prune_chain().

I have applied the refactoring of heap_prune_chain() to master and then
built the other patches on top of that.

Committed some of the changes. Continuing to review the rest.
Thanks!
The early patches in the set include some additional comment cleanup as
well. 0001 is fairly polished. 0004 could use some variable renaming
(this patch partitions the tuples into HOT and not HOT and then
processes them). I was struggling with some of the names here
(chainmembers and chaincandidates is confusing).

I didn't understand why you wanted to juggle both partitions in the same
array. So I separated them into two arrays, and called them 'root_items' and
'heaponly_items'.
I thought it was worth it to save the space. And the algorithm for doing
it seemed pretty straightforward. But looking at your patch, it is a lot
easier to understand with two arrays (since, for example, they can each
have a name).
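For illustration only (this is neither version's actual code): juggling both partitions in one scratch array would mean growing root items from the front and heap-only items from the back, roughly

    OffsetNumber scratch[MaxHeapTuplesPerPage];
    int nroot = 0, nheaponly = 0;

    if (!HeapTupleHeaderIsHeapOnly(htup))
        scratch[nroot++] = offnum;
    else
        scratch[MaxHeapTuplesPerPage - (++nheaponly)] = offnum;

whereas the two-array approach in the patch simply does

    if (!HeapTupleHeaderIsHeapOnly(htup))
        prstate.root_items[prstate.nroot_items++] = offnum;
    else
        prstate.heaponly_items[prstate.nheaponly_items++] = offnum;

(the scratch/nroot/nheaponly names above are made up for the example).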
In some micro-benchmarks, the order that the items were processed made a
measurable difference. So I'm processing the items in the reverse order.
That roughly matches the order the items are processed in master, as it
iterates the offsets from high-to-low in the first loop, and low-to-high in
the second loop.
This makes sense. I noticed there isn't a comment about this
above the loop. It might be worth it to mention it.
Below is a review of only 0001. I'll look at the others shortly.
From a6ab891779876e7cc1b4fb6fddb09f52f0094646 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Mon, 1 Apr 2024 16:59:38 +0300
Subject: [PATCH v11 1/7] Handle non-chain tuples outside of heap_prune_chain()
---
src/backend/access/heap/pruneheap.c | 264 +++++++++++++++++-----------
1 file changed, 166 insertions(+), 98 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
@@ -256,15 +270,16 @@ heap_page_prune(Relation relation, Buffer buffer,
tup.t_tableOid = RelationGetRelid(relation);

/*
- * Determine HTSV for all tuples.
+ * Determine HTSV for all tuples, and queue them up for processing as HOT
+ * chain roots or as a heap-only items.
Reading this comment now as a whole, I would add something like
"Determining HTSV for all tuples once is required for correctness" to
the start of the second paragraph. The new conjunction on the first
paragraph sentence followed by the next paragraph is a bit confusing
because it sounds like queuing them up for processing is required for
correctness (which, I suppose it is on some level). Basically, I'm just
saying that it is now less clear what is required for correctness.
* This is required for correctness to deal with cases where running HTSV
* twice could result in different results (e.g. RECENTLY_DEAD can turn to
* DEAD if another checked item causes GlobalVisTestIsRemovableFullXid()
* to update the horizon, INSERT_IN_PROGRESS can change to DEAD if the
- * inserting transaction aborts, ...). That in turn could cause
- * heap_prune_chain() to behave incorrectly if a tuple is reached twice,
- * once directly via a heap_prune_chain() and once following a HOT chain.
+ * inserting transaction aborts, ...). VACUUM assumes that there are no
+ * normal DEAD tuples left on the page after pruning, so it needs to have
+ * the same understanding of what is DEAD and what is not.
*
* It's also good for performance. Most commonly tuples within a page are
* stored at decreasing offsets (while the items are stored at increasing
@@ -282,52 +297,140 @@ heap_page_prune(Relation relation, Buffer buffer,
+ /*
+ * Process any heap-only tuples that were not already processed as part of
+ * a HOT chain.
+ */
While I recognize this is a matter of style and not important, I
personally prefer this for reverse looping:
for (int i = prstate.nheaponly_items; i --> 0;)
I do think a comment about the reverse order would be nice. I know it
says something above the first loop to this effect:
* Processing the items in reverse order (and thus the tuples in
* increasing order) increases prefetching efficiency significantly /
* decreases the number of cache misses.
So perhaps we could just say "as above, process the items in reverse
order"
- Melanie
On Mon, Apr 01, 2024 at 05:17:51PM +0300, Heikki Linnakangas wrote:
On 30/03/2024 07:57, Melanie Plageman wrote:
The final state of the code could definitely use more cleanup. I've been
staring at it for awhile, so I could use some thoughts/ideas about what
part to focus on improving.

Committed some of the changes. I plan to commit at least the first of these
remaining patches later today. I'm happy with it now, but I'll give it a
final glance over after dinner.

I'll continue to review the rest after that, but attached is what I have
now.
Review for 0003-0006 (I didn't have any new thoughts on 0002). I know
you didn't modify them much/at all, but I noticed some things in my code
that could be better.
From 17e183835a968e81daf7b74a4164b243e2de35aa Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Fri, 29 Mar 2024 19:43:09 -0400
Subject: [PATCH v11 3/7] Introduce PRUNE_DO_* actions

We will eventually take additional actions in heap_page_prune() at the
discretion of the caller. For now, introduce these PRUNE_DO_* macros and
turn mark_unused_now, a paramter to heap_page_prune(), into a PRUNE_DO_
action.

paramter -> parameter
---
src/backend/access/heap/pruneheap.c | 51 ++++++++++++++--------------
src/backend/access/heap/vacuumlazy.c | 11 ++++--
src/include/access/heapam.h | 13 ++++++-
3 files changed, 46 insertions(+), 29 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index fb0ad834f1b..30965c3c5a1 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -29,10 +29,11 @@
/* Working data for heap_page_prune and subroutines */
typedef struct
{
+ /* PRUNE_DO_* arguments */
+ uint8 actions;
I wasn't sure if actions is a good name. What do you think?
+ /* tuple visibility test, initialized for the relation */
GlobalVisState *vistest;
- /* whether or not dead items can be set LP_UNUSED during pruning */
- bool mark_unused_now;

TransactionId new_prune_xid; /* new prune hint value for page */
TransactionId snapshotConflictHorizon; /* latest xid removed */
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 32a3fbce961..35b8486c34a 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -191,6 +191,17 @@ typedef struct HeapPageFreeze
} HeapPageFreeze;

+/*
+ * Actions that can be taken during pruning and freezing. By default, we will
+ * at least attempt regular pruning.
+ */
+
+/*
+ * PRUNE_DO_MARK_UNUSED_NOW indicates whether or not dead items can be set
+ * LP_UNUSED during pruning.
+ */
+#define PRUNE_DO_MARK_UNUSED_NOW (1 << 1)
No reason for me to waste the zeroth bit here. I just realized that I
did this with XLHP_IS_CATALOG_REL too.
#define XLHP_IS_CATALOG_REL (1 << 1)
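For illustration, not wasting the zeroth bit would just mean numbering the flags from (1 << 0), e.g.

    #define PRUNE_DO_MARK_UNUSED_NOW (1 << 0)
    #define PRUNE_DO_TRY_FREEZE      (1 << 1)

with the final values of course up to the patch.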
From 083690b946e19ab5e536a9f2689772e7b91d2a70 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Fri, 29 Mar 2024 21:22:14 -0400
Subject: [PATCH v11 4/7] Prepare freeze tuples in heap_page_prune()

diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 35b8486c34a..ac129692c13 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h

+/*
+ * Prepare to freeze if advantageous or required and try to advance
+ * relfrozenxid and relminmxid. To attempt freezing, we will need to determine
+ * if the page is all frozen. So, if this action is set, we will also inform
+ * the caller if the page is all-visible and/or all-frozen and calculate a
I guess we don't inform the caller if the page is all-visible, so this
is not quite right.
+ * snapshot conflict horizon for updating the visibility map.
+ */
+#define PRUNE_DO_TRY_FREEZE (1 << 2)
From ef8cb2c089ad9474a6da309593029c08f71b0bb9 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Fri, 29 Mar 2024 21:36:37 -0400
Subject: [PATCH v11 5/7] Set hastup in heap_page_prune

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 8bdd6389b25..65b0ed185ff 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -66,6 +66,9 @@ typedef struct
bool processed[MaxHeapTuplesPerPage + 1];

int ndeleted; /* Number of tuples deleted from the page */
+
+ /* Whether or not the page makes rel truncation unsafe */
+ bool hastup;
} PruneState;

/* Local functions */
@@ -271,6 +274,7 @@ heap_page_prune(Relation relation, Buffer buffer,
prstate.ndeleted = 0;
prstate.nroot_items = 0;
prstate.nheaponly_items = 0;
+ prstate.hastup = false;

/*
* If we will prepare to freeze tuples, consider that it might be possible
@@ -280,7 +284,7 @@ heap_page_prune(Relation relation, Buffer buffer,
presult->all_frozen = true;
else
presult->all_frozen = false;
-
+ presult->hastup = prstate.hastup;

/*
* presult->htsv is not initialized here because all ntuple spots in the
@@ -819,6 +823,8 @@ heap_prune_record_redirect(PruneState *prstate,
*/
if (was_normal)
prstate->ndeleted++;
+
+ prstate->hastup = true;
}

/* Record line pointer to be marked dead */
@@ -901,11 +907,15 @@
static void
heap_prune_record_unchanged_lp_normal(Page page, int8 *htsv, PruneState *prstate,
PruneResult *presult, OffsetNumber offnum)
{
- HeapTupleHeader htup = (HeapTupleHeader) PageGetItem(page, PageGetItemId(page, offnum));
+ HeapTupleHeader htup;

Assert(!prstate->processed[offnum]);
prstate->processed[offnum] = true;

+ presult->hastup = true; /* the page is not empty */
My fault, but hastup being set sometimes in PruneState and sometimes in
PruneResult is quite unpalatable.
From dffa90d0bd8e972cfe26da96860051dc8c2a8576 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Sat, 30 Mar 2024 01:27:08 -0400
Subject: [PATCH v11 6/7] Save dead tuple offsets during heap_page_prune
---
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 212d76045ef..7f1e4db55c0 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1373,6 +1373,15 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
return false;
}

+static int
+OffsetNumber_cmp(const void *a, const void *b)
+{
+ OffsetNumber na = *(const OffsetNumber *) a,
+ nb = *(const OffsetNumber *) b;
+
+ return na < nb ? -1 : na > nb;
+}
This probably doesn't belong here. I noticed spgdoinsert.c had a static
function for sorting OffsetNumbers, but I didn't see anything general
purpose anywhere else.
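For context, the comparator presumably only exists so that the dead offsets collected during pruning can be qsort()'d back into ascending offset order before vacuum consumes them; a minimal usage sketch, using the array and counter names that appear elsewhere in the patch, would be

    qsort(presult.deadoffsets, presult.lpdead_items,
          sizeof(OffsetNumber), OffsetNumber_cmp);

though the exact call site in the patch may differ.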
- Melanie
On 01/04/2024 19:08, Melanie Plageman wrote:
On Mon, Apr 01, 2024 at 05:17:51PM +0300, Heikki Linnakangas wrote:
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
@@ -256,15 +270,16 @@ heap_page_prune(Relation relation, Buffer buffer,
tup.t_tableOid = RelationGetRelid(relation);

/*
- * Determine HTSV for all tuples.
+ * Determine HTSV for all tuples, and queue them up for processing as HOT
+ * chain roots or as a heap-only items.

Reading this comment now as a whole, I would add something like
"Determining HTSV for all tuples once is required for correctness" to
the start of the second paragraph. The new conjunction on the first
paragraph sentence followed by the next paragraph is a bit confusing
because it sounds like queuing them up for processing is required for
correctness (which, I suppose it is on some level). Basically, I'm just
saying that it is now less clear what is required for correctness.
Fixed.
While I recognize this is a matter of style and not important, I
personally prefer this for reverse looping:

for (int i = prstate.nheaponly_items; i --> 0;)
I don't think we use that style anywhere in the Postgres source tree
currently. (And I don't like it ;-) )
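(For the archives: "-->" is not an operator, it just parses as "i-- > 0", so the two spellings visit exactly the same indexes. A trivial standalone check, nothing Postgres-specific:)

#include <stdio.h>

int
main(void)
{
	int			nheaponly_items = 3;

	/* the "goes to" spelling suggested above */
	for (int i = nheaponly_items; i --> 0;)
		printf("%d ", i);
	printf("\n");

	/* the conventional reverse loop, which is what the code uses */
	for (int i = nheaponly_items - 1; i >= 0; i--)
		printf("%d ", i);
	printf("\n");

	return 0;
}

Both loops print "2 1 0".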
I do think a comment about the reverse order would be nice. I know it
says something above the first loop to this effect:

 * Processing the items in reverse order (and thus the tuples in
* increasing order) increases prefetching efficiency significantly /
 * decreases the number of cache misses.

So perhaps we could just say "as above, process the items in reverse
order"
I'm actually not sure why it makes a difference. I would assume all the
data to already be in CPU cache at this point, since the first loop
already accessed it, so I think there's something else going on. But I
didn't investigate it deeper. Anyway, added a comment.
Committed the first of the remaining patches with those changes. And
also this, which is worth calling out:
if (prstate.htsv[offnum] == HEAPTUPLE_DEAD)
{
ItemId itemid = PageGetItemId(page, offnum);
HeapTupleHeader htup = (HeapTupleHeader) PageGetItem(page, itemid);

if (likely(!HeapTupleHeaderIsHotUpdated(htup)))
{
HeapTupleHeaderAdvanceConflictHorizon(htup,
&prstate.latest_xid_removed);
heap_prune_record_unused(&prstate, offnum, true);
}
else
{
/*
* This tuple should've been processed and removed as part of
* a HOT chain, so something's wrong. To preserve evidence,
* we don't dare to remove it. We cannot leave behind a DEAD
* tuple either, because that will cause VACUUM to error out.
* Throwing an error with a distinct error message seems like
* the least bad option.
*/
elog(ERROR, "dead heap-only tuple (%u, %d) is not linked to from any HOT chain",
blockno, offnum);
}
}
else
heap_prune_record_unchanged_lp_normal(page, &prstate, offnum);
As you can see, I turned that into a hard error. Previously, that code
was at the top of heap_prune_chain(), and it was normal to see DEAD
heap-only tuples there, because they would presumably get processed
later as part of a HOT chain. But now this is done after all the HOT
chains have already been processed.
Previously if there was a dead heap-only tuple like that on the page for
some reason, it was silently not processed by heap_prune_chain()
(because it was assumed that it would be processed later as part of a
HOT chain), and was left behind as a HEAPTUPLE_DEAD tuple. If the
pruning was done as part of VACUUM, VACUUM would fail with "ERROR:
unexpected HeapTupleSatisfiesVacuum result". Or am I missing something?
Now you get that above error also on on-access pruning, which is not
ideal. But I don't remember hearing about corruption like that, and
you'd get the error on VACUUM anyway.
With the next patches, heap_prune_record_unchanged() will do more, and
will also throw an error on a HEAPTUPLE_LIVE tuple, so even though in
the first patch we could print just a WARNING and move on, it gets more
awkward with the rest of the patches.
(Continuing with the remaining patches..)
--
Heikki Linnakangas
Neon (https://neon.tech)
On Mon, Apr 1, 2024 at 1:37 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
Committed the first of the remaining patches with those changes. And
also this, which is worth calling out:

if (prstate.htsv[offnum] == HEAPTUPLE_DEAD)
{
ItemId itemid = PageGetItemId(page, offnum);
HeapTupleHeader htup = (HeapTupleHeader) PageGetItem(page, itemid);

if (likely(!HeapTupleHeaderIsHotUpdated(htup)))
{
HeapTupleHeaderAdvanceConflictHorizon(htup,
&prstate.latest_xid_removed);
heap_prune_record_unused(&prstate, offnum, true);
}
else
{
/*
* This tuple should've been processed and removed as part of
* a HOT chain, so something's wrong. To preserve evidence,
* we don't dare to remove it. We cannot leave behind a DEAD
* tuple either, because that will cause VACUUM to error out.
* Throwing an error with a distinct error message seems like
* the least bad option.
*/
elog(ERROR, "dead heap-only tuple (%u, %d) is not linked to from any HOT chain",
blockno, offnum);
}
}
else
heap_prune_record_unchanged_lp_normal(page, &prstate, offnum);

As you can see, I turned that into a hard error. Previously, that code
was at the top of heap_prune_chain(), and it was normal to see DEAD
heap-only tuples there, because they would presumably get processed
later as part of a HOT chain. But now this is done after all the HOT
chains have already been processed.

Previously if there was a dead heap-only tuple like that on the page for
some reason, it was silently not processed by heap_prune_chain()
(because it was assumed that it would be processed later as part of a
HOT chain), and was left behind as a HEAPTUPLE_DEAD tuple. If the
pruning was done as part of VACUUM, VACUUM would fail with "ERROR:
unexpected HeapTupleSatisfiesVacuum result". Or am I missing something?
I think you are right. I wasn't sure if there was some way for a HOT,
DEAD tuple to be not HOT-updated, but that doesn't make much sense.
Now you get that above error also on on-access pruning, which is not
ideal. But I don't remember hearing about corruption like that, and
you'd get the error on VACUUM anyway.
Yea, that makes sense. One thing I don't really understand is why
vacuum has its own system for saving and restoring error information
for context messages (LVSavedErrInfo and
update/restore_vacuum_err_info()). I'll confess I don't know much
about how error cleanup works in any sub-system. But it stuck out to
me that vacuum has its own. I assume it is okay to add new error
messages and they somehow will work with the existing system?
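For reference, my rough mental model of the generic mechanism is just the error_context_stack dance from elog.h -- a backend-context sketch (the callback and caller names here are made up, not from the patch):

static void
my_error_callback(void *arg)
{
	const char *what = (const char *) arg;

	/* appended as a CONTEXT: line to any error raised while installed */
	errcontext("while %s", what);
}

static void
do_some_work(void)
{
	ErrorContextCallback errcallback;

	/* push our callback onto the error context stack */
	errcallback.callback = my_error_callback;
	errcallback.arg = (void *) "pruning block 42";
	errcallback.previous = error_context_stack;
	error_context_stack = &errcallback;

	/* ... work that may elog/ereport ... */

	/* pop it again */
	error_context_stack = errcallback.previous;
}

LVSavedErrInfo looks like it is just vacuum's way of updating the callback argument (block, offset, phase) as it moves around and restoring the previous values afterwards, so presumably any new errors thrown while the callback is installed pick up that context automatically.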
- Melanie
On 01/04/2024 20:22, Melanie Plageman wrote:
From 17e183835a968e81daf7b74a4164b243e2de35aa Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Fri, 29 Mar 2024 19:43:09 -0400
Subject: [PATCH v11 3/7] Introduce PRUNE_DO_* actions

We will eventually take additional actions in heap_page_prune() at the
discretion of the caller. For now, introduce these PRUNE_DO_* macros and
turn mark_unused_now, a paramter to heap_page_prune(), into a PRUNE_DO_

paramter -> parameter
action.
---
src/backend/access/heap/pruneheap.c | 51 ++++++++++++++--------------
src/backend/access/heap/vacuumlazy.c | 11 ++++--
src/include/access/heapam.h | 13 ++++++-
 3 files changed, 46 insertions(+), 29 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index fb0ad834f1b..30965c3c5a1 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -29,10 +29,11 @@
 /* Working data for heap_page_prune and subroutines */
 typedef struct
 {
+	/* PRUNE_DO_* arguments */
+	uint8		actions;

I wasn't sure if actions is a good name. What do you think?
Committed this part, with the name 'options'. There's some precedent for
that in heap_insert().
I decided to keep it a separate bool field here in the PruneState
struct, though, and only changed it in the heap_page_prune() function
signature. It didn't feel worth the code churn here, and
'prstate.mark_unused_now' is shorter than "(prstate.options &
HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0" anyway.
--
Heikki Linnakangas
Neon (https://neon.tech)
On 01/04/2024 20:22, Melanie Plageman wrote:
Review for 0003-0006 (I didn't have any new thoughts on 0002). I know
you didn't modify them much/at all, but I noticed some things in my code
that could be better.
Ok, here's what I have now. I made a lot of small comment changes here
and there, and some minor local refactorings, but nothing major. I lost
track of all the individual changes, I'm afraid, so you'll
have to just diff against the previous version if you want to see what's
changed. I hope I didn't break anything.
I'm pretty happy with this now. I will skim through it one more time
later today or tomorrow, and commit. Please review once more if you have
a chance.
This probably doesn't belong here. I noticed spgdoinsert.c had a static
function for sorting OffsetNumbers, but I didn't see anything general
purpose anywhere else.
I copied the spgdoinsert.c implementation to vacuumlazy.c as is. Would
be nice to have just one copy of this in some common place, but I also
wasn't sure where to put it.
--
Heikki Linnakangas
Neon (https://neon.tech)
Attachments:
v12-0001-Refactor-how-heap_prune_chain-updates-prunable_x.patch (text/x-patch)
From be8891155c93f3555c49371f9804bdf5ba578f6e Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Tue, 2 Apr 2024 15:47:06 +0300
Subject: [PATCH v12 1/2] Refactor how heap_prune_chain() updates prunable_xid
In preparation for freezing and counting tuples which are not
candidates for pruning, split heap_prune_record_unchanged() into
multiple functions, depending on the kind of line pointer. That's not too
interesting right now, but makes the next commit smaller.
Recording the lowest soon-to-be prunable xid is one of the actions we
take for unchanged LP_NORMAL item pointers but not for others, so move
that to the new heap_prune_record_unchanged_lp_normal() function. The
next commit will add more actions to these functions.
Author: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://www.postgresql.org/message-id/20240330055710.kqg6ii2cdojsxgje@liskov
---
src/backend/access/heap/pruneheap.c | 125 ++++++++++++++++++++--------
1 file changed, 92 insertions(+), 33 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index ef563e19aa5..1b5bf990d21 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -78,7 +78,11 @@ static void heap_prune_record_redirect(PruneState *prstate,
static void heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum, bool was_normal);
static void heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum, bool was_normal);
static void heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum, bool was_normal);
-static void heap_prune_record_unchanged(PruneState *prstate, OffsetNumber offnum);
+
+static void heap_prune_record_unchanged_lp_unused(Page page, PruneState *prstate, OffsetNumber offnum);
+static void heap_prune_record_unchanged_lp_normal(Page page, int8 *htsv, PruneState *prstate, OffsetNumber offnum);
+static void heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber offnum);
+static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetNumber offnum);
static void page_verify_redirects(Page page);
@@ -311,7 +315,7 @@ heap_page_prune(Relation relation, Buffer buffer,
/* Nothing to do if slot doesn't contain a tuple */
if (!ItemIdIsUsed(itemid))
{
- heap_prune_record_unchanged(&prstate, offnum);
+ heap_prune_record_unchanged_lp_unused(page, &prstate, offnum);
continue;
}
@@ -324,7 +328,7 @@ heap_page_prune(Relation relation, Buffer buffer,
if (unlikely(prstate.mark_unused_now))
heap_prune_record_unused(&prstate, offnum, false);
else
- heap_prune_record_unchanged(&prstate, offnum);
+ heap_prune_record_unchanged_lp_dead(page, &prstate, offnum);
continue;
}
@@ -434,7 +438,7 @@ heap_page_prune(Relation relation, Buffer buffer,
}
}
else
- heap_prune_record_unchanged(&prstate, offnum);
+ heap_prune_record_unchanged_lp_normal(page, presult->htsv, &prstate, offnum);
}
/* We should now have processed every tuple exactly once */
@@ -652,9 +656,6 @@ heap_prune_chain(Page page, BlockNumber blockno, OffsetNumber maxoff,
*/
chainitems[nchain++] = offnum;
- /*
- * Check tuple's visibility status.
- */
switch (htsv_get_valid_status(htsv[offnum]))
{
case HEAPTUPLE_DEAD:
@@ -670,9 +671,6 @@ heap_prune_chain(Page page, BlockNumber blockno, OffsetNumber maxoff,
case HEAPTUPLE_RECENTLY_DEAD:
/*
- * This tuple may soon become DEAD. Update the hint field so
- * that the page is reconsidered for pruning in future.
- *
* We don't need to advance the conflict horizon for
* RECENTLY_DEAD tuples, even if we are removing them. This
* is because we only remove RECENTLY_DEAD tuples if they
@@ -681,8 +679,6 @@ heap_prune_chain(Page page, BlockNumber blockno, OffsetNumber maxoff,
* tuple by virtue of being later in the chain. We will have
* advanced the conflict horizon for the DEAD tuple.
*/
- heap_prune_record_prunable(prstate,
- HeapTupleHeaderGetUpdateXid(htup));
/*
* Advance past RECENTLY_DEAD tuples just in case there's a
@@ -693,24 +689,8 @@ heap_prune_chain(Page page, BlockNumber blockno, OffsetNumber maxoff,
break;
case HEAPTUPLE_DELETE_IN_PROGRESS:
-
- /*
- * This tuple may soon become DEAD. Update the hint field so
- * that the page is reconsidered for pruning in future.
- */
- heap_prune_record_prunable(prstate,
- HeapTupleHeaderGetUpdateXid(htup));
- goto process_chain;
-
case HEAPTUPLE_LIVE:
case HEAPTUPLE_INSERT_IN_PROGRESS:
-
- /*
- * If we wanted to optimize for aborts, we might consider
- * marking the page prunable when we see INSERT_IN_PROGRESS.
- * But we don't. See related decisions about when to mark the
- * page prunable in heapam.c.
- */
goto process_chain;
default:
@@ -757,8 +737,15 @@ process_chain:
* No DEAD tuple was found, so the chain is entirely composed of
* normal, unchanged tuples. Leave it alone.
*/
- for (int i = 0; i < nchain; i++)
- heap_prune_record_unchanged(prstate, chainitems[i]);
+ int i = 0;
+
+ if (ItemIdIsRedirected(rootlp))
+ {
+ heap_prune_record_unchanged_lp_redirect(prstate, rootoffnum);
+ i++;
+ }
+ for (; i < nchain; i++)
+ heap_prune_record_unchanged_lp_normal(page, htsv, prstate, chainitems[i]);
}
else if (ndeadchain == nchain)
{
@@ -784,7 +771,7 @@ process_chain:
/* the rest of tuples in the chain are normal, unchanged tuples */
for (int i = ndeadchain; i < nchain; i++)
- heap_prune_record_unchanged(prstate, chainitems[i]);
+ heap_prune_record_unchanged_lp_normal(page, htsv, prstate, chainitems[i]);
}
}
@@ -894,9 +881,81 @@ heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum, bool was_norm
prstate->ndeleted++;
}
-/* Record a line pointer that is left unchanged */
+/*
+ * Record an unused line pointer that is left unchanged.
+ */
+static void
+heap_prune_record_unchanged_lp_unused(Page page, PruneState *prstate, OffsetNumber offnum)
+{
+ Assert(!prstate->processed[offnum]);
+ prstate->processed[offnum] = true;
+}
+
+/*
+ * Record LP_NORMAL line pointer that is left unchanged.
+ */
+static void
+heap_prune_record_unchanged_lp_normal(Page page, int8 *htsv, PruneState *prstate, OffsetNumber offnum)
+{
+ HeapTupleHeader htup;
+
+ Assert(!prstate->processed[offnum]);
+ prstate->processed[offnum] = true;
+
+ switch (htsv[offnum])
+ {
+ case HEAPTUPLE_LIVE:
+ case HEAPTUPLE_INSERT_IN_PROGRESS:
+
+ /*
+ * If we wanted to optimize for aborts, we might consider marking
+ * the page prunable when we see INSERT_IN_PROGRESS. But we
+ * don't. See related decisions about when to mark the page
+ * prunable in heapam.c.
+ */
+ break;
+
+ case HEAPTUPLE_RECENTLY_DEAD:
+ case HEAPTUPLE_DELETE_IN_PROGRESS:
+
+ htup = (HeapTupleHeader) PageGetItem(page, PageGetItemId(page, offnum));
+
+ /*
+ * This tuple may soon become DEAD. Update the hint field so that
+ * the page is reconsidered for pruning in future.
+ */
+ heap_prune_record_prunable(prstate,
+ HeapTupleHeaderGetUpdateXid(htup));
+ break;
+
+
+ default:
+
+ /*
+ * DEAD tuples should've been passed to heap_prune_record_dead()
+ * or heap_prune_record_unused() instead.
+ */
+ elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result %d", htsv[offnum]);
+ break;
+ }
+}
+
+
+/*
+ * Record line pointer that was already LP_DEAD and is left unchanged.
+ */
+static void
+heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber offnum)
+{
+ Assert(!prstate->processed[offnum]);
+ prstate->processed[offnum] = true;
+}
+
+/*
+ * Record LP_REDIRECT that is left unchanged.
+ */
static void
-heap_prune_record_unchanged(PruneState *prstate, OffsetNumber offnum)
+heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetNumber offnum)
{
Assert(!prstate->processed[offnum]);
prstate->processed[offnum] = true;
--
2.39.2
v12-0002-Combine-freezing-and-pruning-steps-in-VACUUM.patch (text/x-patch)
From 5de3531e4dfc20eda114ce156c6698ca7bf6cd82 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Tue, 2 Apr 2024 16:00:34 +0300
Subject: [PATCH v12 2/2] Combine freezing and pruning steps in VACUUM
Execute both freezing and pruning of tuples in the same
heap_page_prune() function, now called heap_page_prune_and_freeze(),
and emit a single WAL record containing all changes. That reduces the
overall amount of WAL generated.
This moves the freezing logic from vacuumlazy.c to the
heap_page_prune_and_freeze() function. The main difference in the
coding is that in vacuumlazy.c, we looked at the tuples after the
pruning had already happened, but in heap_page_prune_and_freeze() we
operate on the tuples before pruning. The heap_prepare_freeze_tuple()
function is now invoked after we have determined that a tuple is not
going to be pruned away.
VACUUM no longer needs to loop through the items on the page after
pruning. heap_page_prune_and_freeze() does all the work. It now
returns the list of dead offsets, including existing LP_DEAD items, to
the caller. Similarly it's now responsible for tracking 'all_visible',
'all_frozen', and 'hastup' on the caller's behalf.
Author: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://www.postgresql.org/message-id/20240330055710.kqg6ii2cdojsxgje@liskov
---
src/backend/access/heap/heapam.c | 67 +-
src/backend/access/heap/heapam_handler.c | 2 +-
src/backend/access/heap/pruneheap.c | 753 ++++++++++++++++++++---
src/backend/access/heap/vacuumlazy.c | 434 +++----------
src/include/access/heapam.h | 83 ++-
src/tools/pgindent/typedefs.list | 2 +-
6 files changed, 804 insertions(+), 537 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index b661d9811eb..a9d5b109a5e 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6447,9 +6447,9 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
* XIDs or MultiXactIds that will need to be processed by a future VACUUM.
*
* VACUUM caller must assemble HeapTupleFreeze freeze plan entries for every
- * tuple that we returned true for, and call heap_freeze_execute_prepared to
- * execute freezing. Caller must initialize pagefrz fields for page as a
- * whole before first call here for each heap page.
+ * tuple that we returned true for, and then execute freezing. Caller must
+ * initialize pagefrz fields for page as a whole before first call here for
+ * each heap page.
*
* VACUUM caller decides on whether or not to freeze the page as a whole.
* We'll often prepare freeze plans for a page that caller just discards.
@@ -6765,35 +6765,19 @@ heap_execute_freeze_tuple(HeapTupleHeader tuple, HeapTupleFreeze *frz)
}
/*
- * heap_freeze_execute_prepared
- *
- * Executes freezing of one or more heap tuples on a page on behalf of caller.
- * Caller passes an array of tuple plans from heap_prepare_freeze_tuple.
- * Caller must set 'offset' in each plan for us. Note that we destructively
- * sort caller's tuples array in-place, so caller had better be done with it.
- *
- * WAL-logs the changes so that VACUUM can advance the rel's relfrozenxid
- * later on without any risk of unsafe pg_xact lookups, even following a hard
- * crash (or when querying from a standby). We represent freezing by setting
- * infomask bits in tuple headers, but this shouldn't be thought of as a hint.
- * See section on buffer access rules in src/backend/storage/buffer/README.
+ * Perform xmin/xmax XID status sanity checks before actually executing freeze
+ * plans.
+ *
+ * heap_prepare_freeze_tuple doesn't perform these checks directly because
+ * pg_xact lookups are relatively expensive. They shouldn't be repeated by
+ * successive VACUUMs that each decide against freezing the same page.
*/
void
-heap_freeze_execute_prepared(Relation rel, Buffer buffer,
- TransactionId snapshotConflictHorizon,
- HeapTupleFreeze *tuples, int ntuples)
+heap_pre_freeze_checks(Buffer buffer,
+ HeapTupleFreeze *tuples, int ntuples)
{
Page page = BufferGetPage(buffer);
- Assert(ntuples > 0);
-
- /*
- * Perform xmin/xmax XID status sanity checks before critical section.
- *
- * heap_prepare_freeze_tuple doesn't perform these checks directly because
- * pg_xact lookups are relatively expensive. They shouldn't be repeated
- * by successive VACUUMs that each decide against freezing the same page.
- */
for (int i = 0; i < ntuples; i++)
{
HeapTupleFreeze *frz = tuples + i;
@@ -6832,8 +6816,19 @@ heap_freeze_execute_prepared(Relation rel, Buffer buffer,
xmax)));
}
}
+}
- START_CRIT_SECTION();
+/*
+ * Helper which executes freezing of one or more heap tuples on a page on
+ * behalf of caller. Caller passes an array of tuple plans from
+ * heap_prepare_freeze_tuple. Caller must set 'offset' in each plan for us.
+ * Must be called in a critical section that also marks the buffer dirty and,
+ * if needed, emits WAL.
+ */
+void
+heap_freeze_prepared_tuples(Buffer buffer, HeapTupleFreeze *tuples, int ntuples)
+{
+ Page page = BufferGetPage(buffer);
for (int i = 0; i < ntuples; i++)
{
@@ -6844,22 +6839,6 @@ heap_freeze_execute_prepared(Relation rel, Buffer buffer,
htup = (HeapTupleHeader) PageGetItem(page, itemid);
heap_execute_freeze_tuple(htup, frz);
}
-
- MarkBufferDirty(buffer);
-
- /* Now WAL-log freezing if necessary */
- if (RelationNeedsWAL(rel))
- {
- log_heap_prune_and_freeze(rel, buffer, snapshotConflictHorizon,
- false, /* no cleanup lock required */
- PRUNE_VACUUM_SCAN,
- tuples, ntuples,
- NULL, 0, /* redirected */
- NULL, 0, /* dead */
- NULL, 0); /* unused */
- }
-
- END_CRIT_SECTION();
}
/*
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index c86000d245b..0952d4a98eb 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -1122,7 +1122,7 @@ heapam_scan_analyze_next_tuple(TableScanDesc scan, TransactionId OldestXmin,
* We ignore unused and redirect line pointers. DEAD line pointers
* should be counted as dead, because we need vacuum to run to get rid
* of them. Note that this rule agrees with the way that
- * heap_page_prune() counts things.
+ * heap_page_prune_and_freeze() counts things.
*/
if (!ItemIdIsNormal(itemid))
{
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 1b5bf990d21..41c919b15be 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -17,32 +17,54 @@
#include "access/heapam.h"
#include "access/heapam_xlog.h"
#include "access/htup_details.h"
+#include "access/multixact.h"
#include "access/transam.h"
#include "access/xlog.h"
#include "access/xloginsert.h"
+#include "commands/vacuum.h"
+#include "executor/instrument.h"
#include "miscadmin.h"
#include "pgstat.h"
#include "storage/bufmgr.h"
#include "utils/rel.h"
#include "utils/snapmgr.h"
-/* Working data for heap_page_prune and subroutines */
+/* Working data for heap_page_prune_and_freeze() and subroutines */
typedef struct
{
+ /*-------------------------------------------------------
+ * Arguments passed to heap_page_prune_and_freeze()
+ *-------------------------------------------------------
+ */
+
/* tuple visibility test, initialized for the relation */
GlobalVisState *vistest;
/* whether or not dead items can be set LP_UNUSED during pruning */
bool mark_unused_now;
+ /* whether to attempt freezing tuples */
+ bool freeze;
+ struct VacuumCutoffs *cutoffs;
- TransactionId new_prune_xid; /* new prune hint value for page */
- TransactionId snapshotConflictHorizon; /* latest xid removed */
+ /*-------------------------------------------------------
+ * Fields describing what to do to the page
+ *-------------------------------------------------------
+ */
+ TransactionId new_prune_xid; /* new prune hint value */
+ TransactionId latest_xid_removed;
int nredirected; /* numbers of entries in arrays below */
int ndead;
int nunused;
+ int nfrozen;
/* arrays that accumulate indexes of items to be changed */
OffsetNumber redirected[MaxHeapTuplesPerPage * 2];
OffsetNumber nowdead[MaxHeapTuplesPerPage];
OffsetNumber nowunused[MaxHeapTuplesPerPage];
+ HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
+
+ /*-------------------------------------------------------
+ * Working state for HOT chain processing
+ *-------------------------------------------------------
+ */
/*
* 'root_items' contains offsets of all LP_REDIRECT line pointers and
@@ -63,24 +85,92 @@ typedef struct
*/
bool processed[MaxHeapTuplesPerPage + 1];
+ /*
+ * Tuple visibility is only computed once for each tuple, for correctness
+ * and efficiency reasons; see comment in heap_page_prune_and_freeze() for
+ * details. This is of type int8[], instead of HTSV_Result[], so we can
+ * use -1 to indicate no visibility has been computed, e.g. for LP_DEAD
+ * items.
+ *
+ * This needs to be MaxHeapTuplesPerPage + 1 long as FirstOffsetNumber is
+ * 1. Otherwise every access would need to subtract 1.
+ */
+ int8 htsv[MaxHeapTuplesPerPage + 1];
+
+ /*
+ * Freezing-related state.
+ */
+ HeapPageFreeze pagefrz;
+
+ /*-------------------------------------------------------
+ * Information about what was done
+ *
+ * These fields are not used by pruning itself for the most part, but are
+ * used to collect information about what was pruned and what state the
+ * page is in after pruning, for the benefit of the caller. They are
+ * copied to the caller's PruneFreezeResult at the end.
+ * -------------------------------------------------------
+ */
+
int ndeleted; /* Number of tuples deleted from the page */
+
+ /* Number of live and recently dead tuples, after pruning */
+ int live_tuples;
+ int recently_dead_tuples;
+
+ /* Whether or not the page makes rel truncation unsafe */
+ bool hastup;
+
+ /*
+ * LP_DEAD items on the page after pruning. Includes existing LP_DEAD
+ * items
+ */
+ int lpdead_items; /* number of items in the array */
+ OffsetNumber *deadoffsets; /* points directly to presult->deadoffsets */
+
+ /*
+ * all_visible and all_frozen indicate if the all-visible and all-frozen
+ * bits in the visibility map can be set for this page after pruning.
+ *
+ * visibility_cutoff_xid is the newest xmin of live tuples on the page.
+ * The caller can use it as the conflict horizon, when setting the VM
+ * bits. It is only valid if we froze some tuples, and all_frozen is
+ * true.
+ *
+ * NOTE: all_visible and all_frozen don't include LP_DEAD items. That's
+ * convenient for heap_page_prune_and_freeze(), to use them to decide
+ * whether to freeze the page or not. The all_visible and all_frozen
+ * values returned to the caller are adjusted to include LP_DEAD items at
+ * the end.
+ *
+ * all_frozen should only be considered valid if all_visible is also set;
+ * we don't bother to clear the all_frozen flag every time we clear the
+ * all_visible flag.
+ */
+ bool all_visible;
+ bool all_frozen;
+ TransactionId visibility_cutoff_xid;
} PruneState;
/* Local functions */
static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
HeapTuple tup,
Buffer buffer);
+static inline HTSV_Result htsv_get_valid_status(int status);
static void heap_prune_chain(Page page, BlockNumber blockno, OffsetNumber maxoff,
- OffsetNumber rootoffnum, int8 *htsv, PruneState *prstate);
+ OffsetNumber rootoffnum, PruneState *prstate);
static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
static void heap_prune_record_redirect(PruneState *prstate,
- OffsetNumber offnum, OffsetNumber rdoffnum, bool was_normal);
-static void heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum, bool was_normal);
-static void heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum, bool was_normal);
+ OffsetNumber offnum, OffsetNumber rdoffnum,
+ bool was_normal);
+static void heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
+ bool was_normal);
+static void heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
+ bool was_normal);
static void heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum, bool was_normal);
static void heap_prune_record_unchanged_lp_unused(Page page, PruneState *prstate, OffsetNumber offnum);
-static void heap_prune_record_unchanged_lp_normal(Page page, int8 *htsv, PruneState *prstate, OffsetNumber offnum);
+static void heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumber offnum);
static void heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber offnum);
static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetNumber offnum);
@@ -163,15 +253,15 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
if (PageIsFull(page) || PageGetHeapFreeSpace(page) < minfree)
{
OffsetNumber dummy_off_loc;
- PruneResult presult;
+ PruneFreezeResult presult;
/*
* For now, pass mark_unused_now as false regardless of whether or
* not the relation has indexes, since we cannot safely determine
* that during on-access pruning with the current implementation.
*/
- heap_page_prune(relation, buffer, vistest, 0,
- &presult, PRUNE_ON_ACCESS, &dummy_off_loc);
+ heap_page_prune_and_freeze(relation, buffer, vistest, 0,
+ NULL, &presult, PRUNE_ON_ACCESS, &dummy_off_loc, NULL, NULL);
/*
* Report the number of tuples reclaimed to pgstats. This is
@@ -205,13 +295,24 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
/*
- * Prune and repair fragmentation in the specified page.
+ * Prune and repair fragmentation and potentially freeze tuples on the
+ * specified page.
*
* Caller must have pin and buffer cleanup lock on the page. Note that we
* don't update the FSM information for page on caller's behalf. Caller might
* also need to account for a reduction in the length of the line pointer
* array following array truncation by us.
*
+ * If the HEAP_PRUNE_FREEZE option is set, we will also freeze tuples if it's
+ * required in order to advance relfrozenxid / relminmxid, or if it's
+ * considered advantageous for overall system performance to do so now. The
+ * 'cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid' arguments
+ * are required when freezing. When HEAP_PRUNE_FREEZE option is set, we also
+ * set presult->all_visible and presult->all_frozen on exit, to indicate if
+ * the VM bits can be set. They are always set to false when the
+ * HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
+ * that also freeze need that information.
+ *
* vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
* (see heap_prune_satisfies_vacuum).
*
@@ -219,23 +320,39 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
* pruning.
*
- * presult contains output parameters needed by callers such as the number of
- * tuples removed and the number of line pointers newly marked LP_DEAD.
- * heap_page_prune() is responsible for initializing it.
+ * FREEZE indicates that we will also freeze tuples, and will return
+ * 'all_visible', 'all_frozen' flags to the caller.
+ *
+ * cutoffs contains the freeze cutoffs, established by VACUUM at the beginning
+ * of vacuuming the relation. Required if HEAP_PRUNE_FREEZE option is set.
+ *
+ * presult contains output parameters needed by callers, such as the number of
+ * tuples removed and the offsets of dead items on the page after pruning.
+ * heap_page_prune_and_freeze() is responsible for initializing it.
*
* reason indicates why the pruning is performed. It is included in the WAL
* record for debugging and analysis purposes, but otherwise has no effect.
*
* off_loc is the offset location required by the caller to use in error
* callback.
+ *
+ * new_relfrozen_xid and new_relmin_mxid must be provided by the caller if the
+ * HEAP_PRUNE_FREEZE option is set. On entry, they contain the oldest XID and
+ * multi-XID seen on the relation so far. They will be updated with oldest
+ * values present on the page after pruning. After processing the whole
+ * relation, VACUUM can use these values as the new relfrozenxid/relminmxid
+ * for the relation.
*/
void
-heap_page_prune(Relation relation, Buffer buffer,
- GlobalVisState *vistest,
- int options,
- PruneResult *presult,
- PruneReason reason,
- OffsetNumber *off_loc)
+heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ GlobalVisState *vistest,
+ int options,
+ struct VacuumCutoffs *cutoffs,
+ PruneFreezeResult *presult,
+ PruneReason reason,
+ OffsetNumber *off_loc,
+ TransactionId *new_relfrozen_xid,
+ MultiXactId *new_relmin_mxid)
{
Page page = BufferGetPage(buffer);
BlockNumber blockno = BufferGetBlockNumber(buffer);
@@ -243,6 +360,17 @@ heap_page_prune(Relation relation, Buffer buffer,
maxoff;
PruneState prstate;
HeapTupleData tup;
+ bool do_freeze;
+ bool do_prune;
+ bool do_hint;
+ bool hint_bit_fpi;
+ int64 fpi_before = pgWalUsage.wal_fpi;
+
+ /* Copy parameters to prstate */
+ prstate.vistest = vistest;
+ prstate.mark_unused_now = (options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
+ prstate.freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+ prstate.cutoffs = cutoffs;
/*
* Our strategy is to scan the page and make lists of items to change,
@@ -256,36 +384,97 @@ heap_page_prune(Relation relation, Buffer buffer,
* initialize the rest of our working state.
*/
prstate.new_prune_xid = InvalidTransactionId;
- prstate.vistest = vistest;
- prstate.mark_unused_now = (options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
- prstate.snapshotConflictHorizon = InvalidTransactionId;
- prstate.nredirected = prstate.ndead = prstate.nunused = 0;
- prstate.ndeleted = 0;
+ prstate.latest_xid_removed = InvalidTransactionId;
+ prstate.nredirected = prstate.ndead = prstate.nunused = prstate.nfrozen = 0;
prstate.nroot_items = 0;
prstate.nheaponly_items = 0;
+ /* initialize page freezing working state */
+ prstate.pagefrz.freeze_required = false;
+ if (prstate.freeze)
+ {
+ prstate.pagefrz.FreezePageRelfrozenXid = *new_relfrozen_xid;
+ prstate.pagefrz.NoFreezePageRelfrozenXid = *new_relfrozen_xid;
+ prstate.pagefrz.FreezePageRelminMxid = *new_relmin_mxid;
+ prstate.pagefrz.NoFreezePageRelminMxid = *new_relmin_mxid;
+ }
+ else
+ {
+ Assert(new_relfrozen_xid == NULL && new_relmin_mxid == NULL);
+ prstate.pagefrz.FreezePageRelminMxid = InvalidMultiXactId;
+ prstate.pagefrz.NoFreezePageRelminMxid = InvalidMultiXactId;
+ prstate.pagefrz.FreezePageRelfrozenXid = InvalidTransactionId;
+ prstate.pagefrz.NoFreezePageRelfrozenXid = InvalidTransactionId;
+ }
+
+ prstate.ndeleted = 0;
+ prstate.live_tuples = 0;
+ prstate.recently_dead_tuples = 0;
+ prstate.hastup = false;
+ prstate.lpdead_items = 0;
+ prstate.deadoffsets = presult->deadoffsets;
+
/*
- * presult->htsv is not initialized here because all ntuple spots in the
- * array will be set either to a valid HTSV_Result value or -1.
+ * Caller may update the VM after we're done. We keep track of whether
+ * the page will be all_visible and all_frozen, once we're done with the
+ * pruning and freezing, to help the caller to do that.
+ *
+ * Currently, only VACUUM sets the VM bits. To save the effort, only do
+ * the bookkeeping if the caller needs it. Currently, that's tied to
+ * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag, if you wanted
+ * to update the VM bits without also freezing, or freezing without
+ * setting the VM bits.
+ *
+ * In addition to telling the caller whether it can set the VM bit, we
+ * also use 'all_visible' and 'all_frozen' for our own decision-making. If
+ * the whole page will become frozen, we consider opportunistically
+ * freezing tuples. We will not be able to freeze the whole page if there
+ * are tuples present that are not visible to everyone or if there are
+ * dead tuples which are not yet removable. However, dead tuples which
+ * will be removed by the end of vacuuming should not preclude us from
+ * opportunistically freezing. Because of that, we do not clear
+ * all_visible when we see LP_DEAD items. We fix that at the end of the
+ * function, when we return the value to the caller, so that the caller
+ * doesn't set the VM bit incorrectly.
*/
- presult->ndeleted = 0;
- presult->nnewlpdead = 0;
+ if (prstate.freeze)
+ {
+ prstate.all_visible = true;
+ prstate.all_frozen = true;
+ }
+ else
+ {
+ /*
+ * Initializing to false allows skipping the work to update them in
+ * heap_prune_record_unchanged_lp_normal().
+ */
+ prstate.all_visible = false;
+ prstate.all_frozen = false;
+ }
+
+ /*
+ * The visibility cutoff xid is the newest xmin of live tuples on the
+ * page. In the common case, this will be set as the conflict horizon the
+ * caller can use for updating the VM. If, at the end of freezing and
+ * pruning, the page is all-frozen, there is no possibility that any
+ * running transaction on the standby does not see tuples on the page as
+ * all-visible, so the conflict horizon remains InvalidTransactionId.
+ */
+ prstate.visibility_cutoff_xid = InvalidTransactionId;
maxoff = PageGetMaxOffsetNumber(page);
tup.t_tableOid = RelationGetRelid(relation);
/*
* Determine HTSV for all tuples, and queue them up for processing as HOT
- * chain roots or as a heap-only items.
+ * chain roots or as heap-only items.
*
* Determining HTSV only once for each tuple is required for correctness,
* to deal with cases where running HTSV twice could result in different
* results. For example, RECENTLY_DEAD can turn to DEAD if another
* checked item causes GlobalVisTestIsRemovableFullXid() to update the
* horizon, or INSERT_IN_PROGRESS can change to DEAD if the inserting
- * transaction aborts. VACUUM assumes that there are no normal DEAD
- * tuples left on the page after pruning, so it needs to have the same
- * understanding of what is DEAD and what is not.
+ * transaction aborts.
*
* It's also good for performance. Most commonly tuples within a page are
* stored at decreasing offsets (while the items are stored at increasing
@@ -310,7 +499,7 @@ heap_page_prune(Relation relation, Buffer buffer,
*off_loc = offnum;
prstate.processed[offnum] = false;
- presult->htsv[offnum] = -1;
+ prstate.htsv[offnum] = -1;
/* Nothing to do if slot doesn't contain a tuple */
if (!ItemIdIsUsed(itemid))
@@ -349,8 +538,8 @@ heap_page_prune(Relation relation, Buffer buffer,
tup.t_len = ItemIdGetLength(itemid);
ItemPointerSet(&tup.t_self, blockno, offnum);
- presult->htsv[offnum] = heap_prune_satisfies_vacuum(&prstate, &tup,
- buffer);
+ prstate.htsv[offnum] = heap_prune_satisfies_vacuum(&prstate, &tup,
+ buffer);
if (!HeapTupleHeaderIsHeapOnly(htup))
prstate.root_items[prstate.nroot_items++] = offnum;
@@ -358,6 +547,12 @@ heap_page_prune(Relation relation, Buffer buffer,
prstate.heaponly_items[prstate.nheaponly_items++] = offnum;
}
+ /*
+ * If checksums are enabled, heap_prune_satisfies_vacuum() may have caused
+ * an FPI to be emitted.
+ */
+ hint_bit_fpi = fpi_before != pgWalUsage.wal_fpi;
+
/*
* Process HOT chains.
*
@@ -381,8 +576,7 @@ heap_page_prune(Relation relation, Buffer buffer,
*off_loc = offnum;
/* Process this item or chain of items */
- heap_prune_chain(page, blockno, maxoff,
- offnum, presult->htsv, &prstate);
+ heap_prune_chain(page, blockno, maxoff, offnum, &prstate);
}
/*
@@ -412,7 +606,7 @@ heap_page_prune(Relation relation, Buffer buffer,
* return true for an XMIN_INVALID tuple, so this code will work even
* when there were sequential updates within the aborted transaction.)
*/
- if (presult->htsv[offnum] == HEAPTUPLE_DEAD)
+ if (prstate.htsv[offnum] == HEAPTUPLE_DEAD)
{
ItemId itemid = PageGetItemId(page, offnum);
HeapTupleHeader htup = (HeapTupleHeader) PageGetItem(page, itemid);
@@ -420,7 +614,7 @@ heap_page_prune(Relation relation, Buffer buffer,
if (likely(!HeapTupleHeaderIsHotUpdated(htup)))
{
HeapTupleHeaderAdvanceConflictHorizon(htup,
- &prstate.snapshotConflictHorizon);
+ &prstate.latest_xid_removed);
heap_prune_record_unused(&prstate, offnum, true);
}
else
@@ -438,7 +632,7 @@ heap_page_prune(Relation relation, Buffer buffer,
}
}
else
- heap_prune_record_unchanged_lp_normal(page, presult->htsv, &prstate, offnum);
+ heap_prune_record_unchanged_lp_normal(page, &prstate, offnum);
}
/* We should now have processed every tuple exactly once */
@@ -456,21 +650,102 @@ heap_page_prune(Relation relation, Buffer buffer,
/* Clear the offset information once we have processed the given page. */
*off_loc = InvalidOffsetNumber;
- /* Any error while applying the changes is critical */
- START_CRIT_SECTION();
+ do_prune = prstate.nredirected > 0 ||
+ prstate.ndead > 0 ||
+ prstate.nunused > 0;
- /* Have we found any prunable items? */
- if (prstate.nredirected > 0 || prstate.ndead > 0 || prstate.nunused > 0)
+ /*
+ * Even if we don't prune anything, if we found a new value for the
+ * pd_prune_xid field or the page was marked full, we will update the hint
+ * bit.
+ */
+ do_hint = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
+ PageIsFull(page);
+
+ /*
+ * Decide if we want to go ahead with freezing according to the freeze
+ * plans we prepared, or not.
+ */
+ do_freeze = false;
+ if (prstate.freeze)
+ {
+ if (prstate.pagefrz.freeze_required)
+ {
+ /*
+ * heap_prepare_freeze_tuple indicated that at least one XID/MXID
+ * from before FreezeLimit/MultiXactCutoff is present. Must
+ * freeze to advance relfrozenxid/relminmxid.
+ */
+ do_freeze = true;
+ }
+ else
+ {
+ /*
+ * Opportunistically freeze the page, if we are generating an FPI
+ * anyway, and if doing so means that we can set the page
+ * all-frozen afterwards (might not happen until VACUUM's final
+ * heap pass).
+ *
+ * XXX: Previously, we knew if pruning emitted an FPI by checking
+ * pgWalUsage.wal_fpi before and after pruning. Once the freeze
+ * and prune records are combined, this heuristic couldn't be used
+ * anymore. The opportunistic freeze heuristic must be improved;
+ * however, for now, try to approximate the old logic.
+ */
+ bool whole_page_freezable = prstate.all_visible &&
+ prstate.all_frozen;
+
+ if (whole_page_freezable && prstate.nfrozen > 0)
+ {
+ /*
+ * Freezing would make the page all-frozen. In this case, we
+ * will freeze if we have already emitted an FPI or will do so
+ * anyway. Be sure only to incur the overhead of checking if
+ * we will do an FPI if we may use that information.
+ */
+ if (hint_bit_fpi ||
+ ((do_prune || do_hint) && XLogCheckBufferNeedsBackup(buffer)))
+ {
+ do_freeze = true;
+ }
+ }
+ }
+ }
+
+ if (do_freeze)
{
/*
- * Apply the planned item changes, then repair page fragmentation, and
- * update the page's hint bit about whether it has free line pointers.
+ * Validate the tuples we will be freezing before entering the
+ * critical section.
*/
- heap_page_prune_execute(buffer, false,
- prstate.redirected, prstate.nredirected,
- prstate.nowdead, prstate.ndead,
- prstate.nowunused, prstate.nunused);
+ heap_pre_freeze_checks(buffer, prstate.frozen, prstate.nfrozen);
+ }
+ else if (prstate.nfrozen > 0)
+ {
+ /*
+ * The page contained some tuples that were not already frozen, and we
+ * chose not to freeze them now. The page won't be all-frozen then.
+ */
+ Assert(!prstate.pagefrz.freeze_required);
+ prstate.all_frozen = false;
+ prstate.nfrozen = 0; /* avoid miscounts in instrumentation */
+ }
+ else
+ {
+ /*
+ * We have no freeze plans to execute. The page might already be
+ * all-frozen (perhaps only following pruning), though. Such pages
+ * can be marked all-frozen in the VM by our caller, even though none
+ * of its tuples were newly frozen here.
+ */
+ }
+
+ /* Any error while applying the changes is critical */
+ START_CRIT_SECTION();
+
+ if (do_hint)
+ {
/*
* Update the page's pd_prune_xid field to either zero, or the lowest
* XID of any soon-prunable tuple.
@@ -484,6 +759,29 @@ heap_page_prune(Relation relation, Buffer buffer,
*/
PageClearFull(page);
+ /*
+ * If that's all we had to do to the page, this is a non-WAL-logged
+ * hint. If we will also freeze or prune the page, we will mark the
+ * buffer dirty below.
+ */
+ if (!do_freeze && !do_prune)
+ MarkBufferDirtyHint(buffer, true);
+ }
+
+ if (do_prune || do_freeze)
+ {
+ /* Apply the planned item changes and repair page fragmentation. */
+ if (do_prune)
+ {
+ heap_page_prune_execute(buffer, false,
+ prstate.redirected, prstate.nredirected,
+ prstate.nowdead, prstate.ndead,
+ prstate.nowunused, prstate.nunused);
+ }
+
+ if (do_freeze)
+ heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
+
MarkBufferDirty(buffer);
/*
@@ -491,40 +789,115 @@ heap_page_prune(Relation relation, Buffer buffer,
*/
if (RelationNeedsWAL(relation))
{
+ /*
+ * The snapshotConflictHorizon for the whole record should be the
+ * most conservative of all the horizons calculated for any of the
+ * possible modifications. If this record will prune tuples, any
+ * transactions on the standby older than the youngest xmax of the
+ * most recently removed tuple this record will prune will
+ * conflict. If this record will freeze tuples, any transactions
+ * on the standby with xids older than the youngest tuple this
+ * record will freeze will conflict.
+ */
+ TransactionId frz_conflict_horizon = InvalidTransactionId;
+ TransactionId conflict_xid;
+
+ /*
+ * We can use the visibility_cutoff_xid as our cutoff for
+ * conflicts when the whole page is eligible to become all-frozen
+ * in the VM once we're done with it. Otherwise we generate a
+ * conservative cutoff by stepping back from OldestXmin.
+ */
+ if (do_freeze)
+ {
+ if (prstate.all_visible && prstate.all_frozen)
+ frz_conflict_horizon = prstate.visibility_cutoff_xid;
+ else
+ {
+ /* Avoids false conflicts when hot_standby_feedback in use */
+ frz_conflict_horizon = prstate.cutoffs->OldestXmin;
+ TransactionIdRetreat(frz_conflict_horizon);
+ }
+ }
+
+ if (TransactionIdFollows(frz_conflict_horizon, prstate.latest_xid_removed))
+ conflict_xid = frz_conflict_horizon;
+ else
+ conflict_xid = prstate.latest_xid_removed;
+
log_heap_prune_and_freeze(relation, buffer,
- prstate.snapshotConflictHorizon,
+ conflict_xid,
true, reason,
- NULL, 0,
+ prstate.frozen, prstate.nfrozen,
prstate.redirected, prstate.nredirected,
prstate.nowdead, prstate.ndead,
prstate.nowunused, prstate.nunused);
}
}
- else
- {
- /*
- * If we didn't prune anything, but have found a new value for the
- * pd_prune_xid field, update it and mark the buffer dirty. This is
- * treated as a non-WAL-logged hint.
- *
- * Also clear the "page is full" flag if it is set, since there's no
- * point in repeating the prune/defrag process until something else
- * happens to the page.
- */
- if (((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
- PageIsFull(page))
- {
- ((PageHeader) page)->pd_prune_xid = prstate.new_prune_xid;
- PageClearFull(page);
- MarkBufferDirtyHint(buffer, true);
- }
- }
END_CRIT_SECTION();
/* Copy information back for caller */
- presult->nnewlpdead = prstate.ndead;
presult->ndeleted = prstate.ndeleted;
+ presult->nnewlpdead = prstate.ndead;
+ presult->nfrozen = prstate.nfrozen;
+ presult->live_tuples = prstate.live_tuples;
+ presult->recently_dead_tuples = prstate.recently_dead_tuples;
+
+ /*
+ * It was convenient to ignore LP_DEAD items in all_visible earlier on to
+ * make the choice of whether or not to freeze the page unaffected by the
+ * short-term presence of LP_DEAD items. These LP_DEAD items were
+ * effectively assumed to be LP_UNUSED items in the making. It doesn't
+ * matter which heap pass (initial pass or final pass) ends up setting the
+ * page all-frozen, as long as the ongoing VACUUM does it.
+ *
+ * Now that freezing has been finalized, unset all_visible if there are
+ * any LP_DEAD items on the page. It needs to reflect the present state
+ * of things, as expected by our caller.
+ */
+ if (prstate.all_visible && prstate.lpdead_items == 0)
+ {
+ presult->all_visible = prstate.all_visible;
+ presult->all_frozen = prstate.all_frozen;
+ }
+ else
+ {
+ presult->all_visible = false;
+ presult->all_frozen = false;
+ }
+
+ presult->hastup = prstate.hastup;
+
+ /*
+ * For callers planning to update the visibility map, the conflict horizon
+ * for that record must be the newest xmin on the page. However, if the
+ * page is completely frozen, there can be no conflict and the
+ * vm_conflict_horizon should remain InvalidTransactionId. This includes
+ * the case that we just froze all the tuples; the prune-freeze record
+ * included the conflict XID already so the caller doesn't need it.
+ */
+ if (presult->all_frozen)
+ presult->vm_conflict_horizon = InvalidTransactionId;
+ else
+ presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
+
+ presult->lpdead_items = prstate.lpdead_items;
+ /* the presult->deadoffsets array was already filled in */
+
+ if (prstate.freeze)
+ {
+ if (presult->nfrozen > 0)
+ {
+ *new_relfrozen_xid = prstate.pagefrz.FreezePageRelfrozenXid;
+ *new_relmin_mxid = prstate.pagefrz.FreezePageRelminMxid;
+ }
+ else
+ {
+ *new_relfrozen_xid = prstate.pagefrz.NoFreezePageRelfrozenXid;
+ *new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
+ }
+ }
}
@@ -549,10 +922,24 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
}
+/*
+ * Pruning calculates tuple visibility once and saves the results in an array
+ * of int8. See PruneState.htsv for details. This helper function is meant
+ * to guard against examining visibility status array members which have not
+ * yet been computed.
+ */
+static inline HTSV_Result
+htsv_get_valid_status(int status)
+{
+ Assert(status >= HEAPTUPLE_DEAD &&
+ status <= HEAPTUPLE_DELETE_IN_PROGRESS);
+ return (HTSV_Result) status;
+}
+
/*
* Prune specified line pointer or a HOT chain originating at line pointer.
*
- * Tuple visibility information is provided in htsv.
+ * Tuple visibility information is provided in prstate->htsv.
*
* If the item is an index-referenced tuple (i.e. not a heap-only tuple),
* the HOT chain is pruned by removing all DEAD tuples at the start of the HOT
@@ -572,11 +959,17 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
* prstate showing the changes to be made. Items to be redirected are added
* to the redirected[] array (two entries per redirection); items to be set to
* LP_DEAD state are added to nowdead[]; and items to be set to LP_UNUSED
- * state are added to nowunused[].
+ * state are added to nowunused[]. We perform bookkeeping of live tuples,
+ * visibility etc. based on what the page will look like after the changes
+ * applied. All that bookkeeping is performed in the heap_prune_record_*()
+ * subroutines. The division of labor is that heap_prune_chain() decides the
+ * fate of each tuple, ie. whether it's going to be removed, redirected or
+ * left unchanged, and the heap_prune_record_*() subroutines update PruneState
+ * based on that outcome.
*/
static void
heap_prune_chain(Page page, BlockNumber blockno, OffsetNumber maxoff,
- OffsetNumber rootoffnum, int8 *htsv, PruneState *prstate)
+ OffsetNumber rootoffnum, PruneState *prstate)
{
TransactionId priorXmax = InvalidTransactionId;
ItemId rootlp;
@@ -656,15 +1049,14 @@ heap_prune_chain(Page page, BlockNumber blockno, OffsetNumber maxoff,
*/
chainitems[nchain++] = offnum;
- switch (htsv_get_valid_status(htsv[offnum]))
+ switch (htsv_get_valid_status(prstate->htsv[offnum]))
{
case HEAPTUPLE_DEAD:
/* Remember the last DEAD tuple seen */
ndeadchain = nchain;
HeapTupleHeaderAdvanceConflictHorizon(htup,
- &prstate->snapshotConflictHorizon);
-
+ &prstate->latest_xid_removed);
/* Advance to next chain member */
break;
@@ -720,10 +1112,11 @@ heap_prune_chain(Page page, BlockNumber blockno, OffsetNumber maxoff,
{
/*
* We found a redirect item that doesn't point to a valid follow-on
- * item. This can happen if the loop in heap_page_prune caused us to
- * visit the dead successor of a redirect item before visiting the
- * redirect item. We can clean up by setting the redirect item to
- * LP_DEAD state or LP_UNUSED if the caller indicated.
+ * item. This can happen if the loop in heap_page_prune_and_freeze()
+ * caused us to visit the dead successor of a redirect item before
+ * visiting the redirect item. We can clean up by setting the
+ * redirect item to LP_DEAD state or LP_UNUSED if the caller
+ * indicated.
*/
heap_prune_record_dead_or_unused(prstate, rootoffnum, false);
return;
@@ -745,7 +1138,7 @@ process_chain:
i++;
}
for (; i < nchain; i++)
- heap_prune_record_unchanged_lp_normal(page, htsv, prstate, chainitems[i]);
+ heap_prune_record_unchanged_lp_normal(page, prstate, chainitems[i]);
}
else if (ndeadchain == nchain)
{
@@ -771,7 +1164,7 @@ process_chain:
/* the rest of tuples in the chain are normal, unchanged tuples */
for (int i = ndeadchain; i < nchain; i++)
- heap_prune_record_unchanged_lp_normal(page, htsv, prstate, chainitems[i]);
+ heap_prune_record_unchanged_lp_normal(page, prstate, chainitems[i]);
}
}
@@ -816,6 +1209,8 @@ heap_prune_record_redirect(PruneState *prstate,
*/
if (was_normal)
prstate->ndeleted++;
+
+ prstate->hastup = true;
}
/* Record line pointer to be marked dead */
@@ -830,6 +1225,21 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
prstate->nowdead[prstate->ndead] = offnum;
prstate->ndead++;
+ /*
+ * Deliberately delay unsetting all_visible until later during pruning.
+ * Removable dead tuples shouldn't preclude freezing the page. After
+ * finishing this first pass of tuple visibility checks, initialize
+ * all_visible_except_removable with the current value of all_visible to
+ * indicate whether or not the page is all visible except for dead tuples.
+ * This will allow us to attempt to freeze the page after pruning. Later
+ * during pruning, if we encounter an LP_DEAD item or are setting an item
+ * LP_DEAD, we will unset all_visible. As long as we unset it before
+ * updating the visibility map, this will be correct.
+ */
+
+ /* Record the dead offset for vacuum */
+ prstate->deadoffsets[prstate->lpdead_items++] = offnum;
+
/*
* If the root entry had been a normal tuple, we are deleting it, so count
* it in the result. But changing a redirect (even to DEAD state) doesn't
@@ -892,21 +1302,121 @@ heap_prune_record_unchanged_lp_unused(Page page, PruneState *prstate, OffsetNumb
}
/*
- * Record LP_NORMAL line pointer that is left unchanged.
+ * Record line pointer that is left unchanged. We consider freezing it, and
+ * update bookkeeping of tuple counts and page visibility.
*/
static void
-heap_prune_record_unchanged_lp_normal(Page page, int8 *htsv, PruneState *prstate, OffsetNumber offnum)
+heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumber offnum)
{
HeapTupleHeader htup;
Assert(!prstate->processed[offnum]);
prstate->processed[offnum] = true;
- switch (htsv[offnum])
+ prstate->hastup = true; /* the page is not empty */
+
+ /*
+ * The criteria for counting a tuple as live in this block need to match
+ * what analyze.c's acquire_sample_rows() does, otherwise VACUUM and
+ * ANALYZE may produce wildly different reltuples values, e.g. when there
+ * are many recently-dead tuples.
+ *
+ * The logic here is a bit simpler than acquire_sample_rows(), as VACUUM
+ * can't run inside a transaction block, which makes some cases impossible
+ * (e.g. in-progress insert from the same transaction).
+ *
+ * HEAPTUPLE_DEAD are handled by the other heap_prune_record_*()
+ * subroutines. They don't count dead items like acquire_sample_rows()
+ * does, because we assume that all dead items will become LP_UNUSED
+ * before VACUUM finishes. This difference is only superficial. VACUUM
+ * effectively agrees with ANALYZE about DEAD items, in the end. VACUUM
+ * won't remember LP_DEAD items, but only because they're not supposed to
+ * be left behind when it is done. (Cases where we bypass index vacuuming
+ * will violate this optimistic assumption, but the overall impact of that
+ * should be negligible.)
+ */
+ htup = (HeapTupleHeader) PageGetItem(page, PageGetItemId(page, offnum));
+
+ switch (prstate->htsv[offnum])
{
case HEAPTUPLE_LIVE:
+
+ /*
+ * Count it as live. Not only is this natural, but it's also what
+ * acquire_sample_rows() does.
+ */
+ prstate->live_tuples++;
+
+ /*
+ * Is the tuple definitely visible to all transactions?
+ *
+ * NB: Like with per-tuple hint bits, we can't set the
+ * PD_ALL_VISIBLE flag if the inserter committed asynchronously.
+ * See SetHintBits for more info. Check that the tuple is hinted
+ * xmin-committed because of that.
+ */
+ if (prstate->all_visible)
+ {
+ TransactionId xmin;
+
+ if (!HeapTupleHeaderXminCommitted(htup))
+ {
+ prstate->all_visible = false;
+ break;
+ }
+
+ /*
+ * The inserter definitely committed. But is it old enough
+ * that everyone sees it as committed? A FrozenTransactionId
+ * is seen as committed to everyone. Otherwise, we check if
+ * there is a snapshot that considers this xid to still be
+ * running, and if so, we don't consider the page all-visible.
+ */
+ xmin = HeapTupleHeaderGetXmin(htup);
+
+ /*
+ * For now always use prstate->cutoffs for this test, because
+ * we only update 'all_visible' when freezing is requested. We
+ * could use GlobalVisTestIsRemovableXid instead, if a
+ * non-freezing caller wanted to set the VM bit.
+ */
+ Assert(prstate->cutoffs);
+ if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
+ {
+ prstate->all_visible = false;
+ break;
+ }
+
+ /* Track newest xmin on page. */
+ if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
+ TransactionIdIsNormal(xmin))
+ prstate->visibility_cutoff_xid = xmin;
+ }
+ break;
+
+ case HEAPTUPLE_RECENTLY_DEAD:
+ prstate->recently_dead_tuples++;
+ prstate->all_visible = false;
+
+ /*
+ * This tuple will soon become DEAD. Update the hint field so
+ * that the page is reconsidered for pruning in future.
+ */
+ heap_prune_record_prunable(prstate,
+ HeapTupleHeaderGetUpdateXid(htup));
+ break;
+
case HEAPTUPLE_INSERT_IN_PROGRESS:
+ /*
+ * We do not count these rows as live, because we expect the
+ * inserting transaction to update the counters at commit, and we
+ * assume that will happen only after we report our results. This
+ * assumption is a bit shaky, but it is what acquire_sample_rows()
+ * does, so be consistent.
+ */
+ prstate->all_visible = false;
+
/*
* If we wanted to optimize for aborts, we might consider marking
* the page prunable when we see INSERT_IN_PROGRESS. But we
@@ -915,10 +1425,15 @@ heap_prune_record_unchanged_lp_normal(Page page, int8 *htsv, PruneState *prstate
*/
break;
- case HEAPTUPLE_RECENTLY_DEAD:
case HEAPTUPLE_DELETE_IN_PROGRESS:
- htup = (HeapTupleHeader) PageGetItem(page, PageGetItemId(page, offnum));
+ /*
+ * This is an expected case during concurrent vacuum. Count such
+ * rows as live. As above, we assume the deleting transaction
+ * will commit and update the counters after we report.
+ */
+ prstate->live_tuples++;
+ prstate->all_visible = false;
/*
* This tuple may soon become DEAD. Update the hint field so that
@@ -928,16 +1443,40 @@ heap_prune_record_unchanged_lp_normal(Page page, int8 *htsv, PruneState *prstate
HeapTupleHeaderGetUpdateXid(htup));
break;
-
default:
/*
* DEAD tuples should've been passed to heap_prune_record_dead()
* or heap_prune_record_unused() instead.
*/
- elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result %d", htsv[offnum]);
+ elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result %d",
+ prstate->htsv[offnum]);
break;
}
+
+ /* Consider freezing any normal tuples which will not be removed */
+ if (prstate->freeze)
+ {
+ bool totally_frozen;
+
+ if ((heap_prepare_freeze_tuple(htup,
+ prstate->cutoffs,
+ &prstate->pagefrz,
+ &prstate->frozen[prstate->nfrozen],
+ &totally_frozen)))
+ {
+ /* Save prepared freeze plan for later */
+ prstate->frozen[prstate->nfrozen++].offset = offnum;
+ }
+
+ /*
+ * If any tuple isn't either totally frozen already or eligible to
+ * become totally frozen (according to its freeze plan), then the page
+ * definitely cannot be set all-frozen in the visibility map later on.
+ */
+ if (!totally_frozen)
+ prstate->all_frozen = false;
+ }
}
@@ -949,6 +1488,24 @@ heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber
{
Assert(!prstate->processed[offnum]);
prstate->processed[offnum] = true;
+
+ /*
+ * Deliberately don't set hastup for LP_DEAD items. We make the soft
+ * assumption that any LP_DEAD items encountered here will become
+ * LP_UNUSED later on, before count_nondeletable_pages is reached. If we
+ * don't make this assumption then rel truncation will only happen every
+ * other VACUUM, at most. Besides, VACUUM must treat
+ * hastup/nonempty_pages as provisional no matter how LP_DEAD items are
+ * handled (handled here, or handled later on).
+ *
+ * Similarly, don't unset all_visible until later, at the end of
+ * heap_page_prune_and_freeze(). This will allow us to attempt to freeze
+ * the page after pruning. As long as we unset it before updating the
+ * visibility map, this will be correct.
+ */
+
+ /* Record the dead offset for vacuum */
+ prstate->deadoffsets[prstate->lpdead_items++] = offnum;
}
/*
@@ -957,12 +1514,20 @@ heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber
static void
heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetNumber offnum)
{
+ /*
+ * A redirect line pointer doesn't count as a live tuple.
+ *
+ * If we leave a redirect line pointer in place, there will be another
+ * tuple on the page that it points to. We will do the bookkeeping for
+ * that separately. So we have nothing to do here, except remember that
+ * we processed this item.
+ */
Assert(!prstate->processed[offnum]);
prstate->processed[offnum] = true;
}
/*
- * Perform the actual page changes needed by heap_page_prune.
+ * Perform the actual page changes needed by heap_page_prune_and_freeze().
*
* If 'lp_truncate_only' is set, we are merely marking LP_DEAD line pointers
* as unused, not redirecting or removing anything else. The
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 5fb8f7727b3..ace95a4de26 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -439,12 +439,13 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
* as an upper bound on the XIDs stored in the pages we'll actually scan
* (NewRelfrozenXid tracking must never be allowed to miss unfrozen XIDs).
*
- * Next acquire vistest, a related cutoff that's used in heap_page_prune.
- * We expect vistest will always make heap_page_prune remove any deleted
- * tuple whose xmax is < OldestXmin. lazy_scan_prune must never become
- * confused about whether a tuple should be frozen or removed. (In the
- * future we might want to teach lazy_scan_prune to recompute vistest from
- * time to time, to increase the number of dead tuples it can prune away.)
+ * Next acquire vistest, a related cutoff that's used in pruning. We
+ * expect vistest will always make heap_page_prune_and_freeze() remove any
+ * deleted tuple whose xmax is < OldestXmin. lazy_scan_prune must never
+ * become confused about whether a tuple should be frozen or removed. (In
+ * the future we might want to teach lazy_scan_prune to recompute vistest
+ * from time to time, to increase the number of dead tuples it can prune
+ * away.)
*/
vacrel->aggressive = vacuum_get_cutoffs(rel, params, &vacrel->cutoffs);
vacrel->rel_pages = orig_rel_pages = RelationGetNumberOfBlocks(rel);
@@ -1382,27 +1383,18 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
return false;
}
+/* qsort comparator for sorting OffsetNumbers */
+static int
+cmpOffsetNumbers(const void *a, const void *b)
+{
+ return pg_cmp_u16(*(const OffsetNumber *) a, *(const OffsetNumber *) b);
+}
+
/*
* lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
*
* Caller must hold pin and buffer cleanup lock on the buffer.
*
- * Prior to PostgreSQL 14 there were very rare cases where heap_page_prune()
- * was allowed to disagree with our HeapTupleSatisfiesVacuum() call about
- * whether or not a tuple should be considered DEAD. This happened when an
- * inserting transaction concurrently aborted (after our heap_page_prune()
- * call, before our HeapTupleSatisfiesVacuum() call). There was rather a lot
- * of complexity just so we could deal with tuples that were DEAD to VACUUM,
- * but nevertheless were left with storage after pruning.
- *
- * As of Postgres 17, we circumvent this problem altogether by reusing the
- * result of heap_page_prune()'s visibility check. Without the second call to
- * HeapTupleSatisfiesVacuum(), there is no new HTSV_Result and there can be no
- * disagreement. We'll just handle such tuples as if they had become fully dead
- * right after this operation completes instead of in the middle of it. Note that
- * any tuple that becomes dead after the call to heap_page_prune() can't need to
- * be frozen, because it was visible to another session when vacuum started.
- *
* vmbuffer is the buffer containing the VM block with visibility information
* for the heap block, blkno. all_visible_according_to_vm is the saved
* visibility status of the heap block looked up earlier by the caller. We
@@ -1421,330 +1413,46 @@ lazy_scan_prune(LVRelState *vacrel,
bool *has_lpdead_items)
{
Relation rel = vacrel->rel;
- OffsetNumber offnum,
- maxoff;
- ItemId itemid;
- PruneResult presult;
- int tuples_frozen,
- lpdead_items,
- live_tuples,
- recently_dead_tuples;
- HeapPageFreeze pagefrz;
- bool hastup = false;
- bool all_visible,
- all_frozen;
- TransactionId visibility_cutoff_xid;
+ PruneFreezeResult presult;
int prune_options = 0;
- int64 fpi_before = pgWalUsage.wal_fpi;
- OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
- HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
Assert(BufferGetBlockNumber(buf) == blkno);
/*
- * maxoff might be reduced following line pointer array truncation in
- * heap_page_prune. That's safe for us to ignore, since the reclaimed
- * space will continue to look like LP_UNUSED items below.
- */
- maxoff = PageGetMaxOffsetNumber(page);
-
- /* Initialize (or reset) page-level state */
- pagefrz.freeze_required = false;
- pagefrz.FreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
- pagefrz.FreezePageRelminMxid = vacrel->NewRelminMxid;
- pagefrz.NoFreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
- pagefrz.NoFreezePageRelminMxid = vacrel->NewRelminMxid;
- tuples_frozen = 0;
- lpdead_items = 0;
- live_tuples = 0;
- recently_dead_tuples = 0;
-
- /*
- * Prune all HOT-update chains in this page.
- *
- * We count the number of tuples removed from the page by the pruning step
- * in presult.ndeleted. It should not be confused with lpdead_items;
- * lpdead_items's final value can be thought of as the number of tuples
- * that were deleted from indexes.
+ * Prune all HOT-update chains and potentially freeze tuples on this page.
*
* If the relation has no indexes, we can immediately mark would-be dead
* items LP_UNUSED.
- */
- prune_options = 0;
- if (vacrel->nindexes == 0)
- prune_options = HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
- heap_page_prune(rel, buf, vacrel->vistest, prune_options,
- &presult, PRUNE_VACUUM_SCAN, &vacrel->offnum);
-
- /*
- * We will update the VM after collecting LP_DEAD items and freezing
- * tuples. Keep track of whether or not the page is all_visible and
- * all_frozen and use this information to update the VM. all_visible
- * implies 0 lpdead_items, but don't trust all_frozen result unless
- * all_visible is also set to true.
*
- * Also keep track of the visibility cutoff xid for recovery conflicts.
- */
- all_visible = true;
- all_frozen = true;
- visibility_cutoff_xid = InvalidTransactionId;
-
- /*
- * Now scan the page to collect LP_DEAD items and update the variables set
- * just above.
- */
- for (offnum = FirstOffsetNumber;
- offnum <= maxoff;
- offnum = OffsetNumberNext(offnum))
- {
- HeapTupleHeader htup;
- bool totally_frozen;
-
- /*
- * Set the offset number so that we can display it along with any
- * error that occurred while processing this tuple.
- */
- vacrel->offnum = offnum;
- itemid = PageGetItemId(page, offnum);
-
- if (!ItemIdIsUsed(itemid))
- continue;
-
- /* Redirect items mustn't be touched */
- if (ItemIdIsRedirected(itemid))
- {
- /* page makes rel truncation unsafe */
- hastup = true;
- continue;
- }
-
- if (ItemIdIsDead(itemid))
- {
- /*
- * Deliberately don't set hastup for LP_DEAD items. We make the
- * soft assumption that any LP_DEAD items encountered here will
- * become LP_UNUSED later on, before count_nondeletable_pages is
- * reached. If we don't make this assumption then rel truncation
- * will only happen every other VACUUM, at most. Besides, VACUUM
- * must treat hastup/nonempty_pages as provisional no matter how
- * LP_DEAD items are handled (handled here, or handled later on).
- *
- * Also deliberately delay unsetting all_visible until just before
- * we return to lazy_scan_heap caller, as explained in full below.
- * (This is another case where it's useful to anticipate that any
- * LP_DEAD items will become LP_UNUSED during the ongoing VACUUM.)
- */
- deadoffsets[lpdead_items++] = offnum;
- continue;
- }
-
- Assert(ItemIdIsNormal(itemid));
-
- htup = (HeapTupleHeader) PageGetItem(page, itemid);
-
- /*
- * The criteria for counting a tuple as live in this block need to
- * match what analyze.c's acquire_sample_rows() does, otherwise VACUUM
- * and ANALYZE may produce wildly different reltuples values, e.g.
- * when there are many recently-dead tuples.
- *
- * The logic here is a bit simpler than acquire_sample_rows(), as
- * VACUUM can't run inside a transaction block, which makes some cases
- * impossible (e.g. in-progress insert from the same transaction).
- *
- * We treat LP_DEAD items (which are the closest thing to DEAD tuples
- * that might be seen here) differently, too: we assume that they'll
- * become LP_UNUSED before VACUUM finishes. This difference is only
- * superficial. VACUUM effectively agrees with ANALYZE about DEAD
- * items, in the end. VACUUM won't remember LP_DEAD items, but only
- * because they're not supposed to be left behind when it is done.
- * (Cases where we bypass index vacuuming will violate this optimistic
- * assumption, but the overall impact of that should be negligible.)
- */
- switch (htsv_get_valid_status(presult.htsv[offnum]))
- {
- case HEAPTUPLE_LIVE:
-
- /*
- * Count it as live. Not only is this natural, but it's also
- * what acquire_sample_rows() does.
- */
- live_tuples++;
-
- /*
- * Is the tuple definitely visible to all transactions?
- *
- * NB: Like with per-tuple hint bits, we can't set the
- * PD_ALL_VISIBLE flag if the inserter committed
- * asynchronously. See SetHintBits for more info. Check that
- * the tuple is hinted xmin-committed because of that.
- */
- if (all_visible)
- {
- TransactionId xmin;
-
- if (!HeapTupleHeaderXminCommitted(htup))
- {
- all_visible = false;
- break;
- }
-
- /*
- * The inserter definitely committed. But is it old enough
- * that everyone sees it as committed?
- */
- xmin = HeapTupleHeaderGetXmin(htup);
- if (!TransactionIdPrecedes(xmin,
- vacrel->cutoffs.OldestXmin))
- {
- all_visible = false;
- break;
- }
-
- /* Track newest xmin on page. */
- if (TransactionIdFollows(xmin, visibility_cutoff_xid) &&
- TransactionIdIsNormal(xmin))
- visibility_cutoff_xid = xmin;
- }
- break;
- case HEAPTUPLE_RECENTLY_DEAD:
-
- /*
- * If tuple is recently dead then we must not remove it from
- * the relation. (We only remove items that are LP_DEAD from
- * pruning.)
- */
- recently_dead_tuples++;
- all_visible = false;
- break;
- case HEAPTUPLE_INSERT_IN_PROGRESS:
-
- /*
- * We do not count these rows as live, because we expect the
- * inserting transaction to update the counters at commit, and
- * we assume that will happen only after we report our
- * results. This assumption is a bit shaky, but it is what
- * acquire_sample_rows() does, so be consistent.
- */
- all_visible = false;
- break;
- case HEAPTUPLE_DELETE_IN_PROGRESS:
- /* This is an expected case during concurrent vacuum */
- all_visible = false;
-
- /*
- * Count such rows as live. As above, we assume the deleting
- * transaction will commit and update the counters after we
- * report.
- */
- live_tuples++;
- break;
- default:
- elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
- break;
- }
-
- hastup = true; /* page makes rel truncation unsafe */
-
- /* Tuple with storage -- consider need to freeze */
- if (heap_prepare_freeze_tuple(htup, &vacrel->cutoffs, &pagefrz,
- &frozen[tuples_frozen], &totally_frozen))
- {
- /* Save prepared freeze plan for later */
- frozen[tuples_frozen++].offset = offnum;
- }
-
- /*
- * If any tuple isn't either totally frozen already or eligible to
- * become totally frozen (according to its freeze plan), then the page
- * definitely cannot be set all-frozen in the visibility map later on
- */
- if (!totally_frozen)
- all_frozen = false;
- }
-
- /*
- * We have now divided every item on the page into either an LP_DEAD item
- * that will need to be vacuumed in indexes later, or a LP_NORMAL tuple
- * that remains and needs to be considered for freezing now (LP_UNUSED and
- * LP_REDIRECT items also remain, but are of no further interest to us).
- */
- vacrel->offnum = InvalidOffsetNumber;
-
- /*
- * Freeze the page when heap_prepare_freeze_tuple indicates that at least
- * one XID/MXID from before FreezeLimit/MultiXactCutoff is present. Also
- * freeze when pruning generated an FPI, if doing so means that we set the
- * page all-frozen afterwards (might not happen until final heap pass).
+ * The number of tuples removed from the page is returned in
+ * presult.ndeleted. It should not be confused with presult.lpdead_items;
+ * presult.lpdead_items's final value can be thought of as the number of
+ * tuples that were deleted from indexes.
+ *
+ * We will update the VM after collecting LP_DEAD items and freezing
+ * tuples. Pruning will have determined whether or not the page is
+ * all-visible.
*/
- if (pagefrz.freeze_required || tuples_frozen == 0 ||
- (all_visible && all_frozen &&
- fpi_before != pgWalUsage.wal_fpi))
- {
- /*
- * We're freezing the page. Our final NewRelfrozenXid doesn't need to
- * be affected by the XIDs that are just about to be frozen anyway.
- */
- vacrel->NewRelfrozenXid = pagefrz.FreezePageRelfrozenXid;
- vacrel->NewRelminMxid = pagefrz.FreezePageRelminMxid;
-
- if (tuples_frozen == 0)
- {
- /*
- * We have no freeze plans to execute, so there's no added cost
- * from following the freeze path. That's why it was chosen. This
- * is important in the case where the page only contains totally
- * frozen tuples at this point (perhaps only following pruning).
- * Such pages can be marked all-frozen in the VM by our caller,
- * even though none of its tuples were newly frozen here (note
- * that the "no freeze" path never sets pages all-frozen).
- *
- * We never increment the frozen_pages instrumentation counter
- * here, since it only counts pages with newly frozen tuples
- * (don't confuse that with pages newly set all-frozen in VM).
- */
- }
- else
- {
- TransactionId snapshotConflictHorizon;
+ prune_options = HEAP_PAGE_PRUNE_FREEZE;
+ if (vacrel->nindexes == 0)
+ prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
- vacrel->frozen_pages++;
+ heap_page_prune_and_freeze(rel, buf, vacrel->vistest, prune_options,
+ &vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
+ &vacrel->offnum,
+ &vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
- /*
- * We can use visibility_cutoff_xid as our cutoff for conflicts
- * when the whole page is eligible to become all-frozen in the VM
- * once we're done with it. Otherwise we generate a conservative
- * cutoff by stepping back from OldestXmin.
- */
- if (all_visible && all_frozen)
- {
- /* Using same cutoff when setting VM is now unnecessary */
- snapshotConflictHorizon = visibility_cutoff_xid;
- visibility_cutoff_xid = InvalidTransactionId;
- }
- else
- {
- /* Avoids false conflicts when hot_standby_feedback in use */
- snapshotConflictHorizon = vacrel->cutoffs.OldestXmin;
- TransactionIdRetreat(snapshotConflictHorizon);
- }
+ Assert(MultiXactIdIsValid(vacrel->NewRelminMxid));
+ Assert(TransactionIdIsValid(vacrel->NewRelfrozenXid));
- /* Execute all freeze plans for page as a single atomic action */
- heap_freeze_execute_prepared(vacrel->rel, buf,
- snapshotConflictHorizon,
- frozen, tuples_frozen);
- }
- }
- else
+ if (presult.nfrozen > 0)
{
/*
- * Page requires "no freeze" processing. It might be set all-visible
- * in the visibility map, but it can never be set all-frozen.
+ * We don't increment the frozen_pages instrumentation counter when
+ * nfrozen == 0, since it only counts pages with newly frozen tuples
+ * (don't confuse that with pages newly set all-frozen in VM).
*/
- vacrel->NewRelfrozenXid = pagefrz.NoFreezePageRelfrozenXid;
- vacrel->NewRelminMxid = pagefrz.NoFreezePageRelminMxid;
- all_frozen = false;
- tuples_frozen = 0; /* avoid miscounts in instrumentation */
+ vacrel->frozen_pages++;
}
/*
@@ -1756,71 +1464,71 @@ lazy_scan_prune(LVRelState *vacrel,
*/
#ifdef USE_ASSERT_CHECKING
/* Note that all_frozen value does not matter when !all_visible */
- if (all_visible && lpdead_items == 0)
+ if (presult.all_visible)
{
TransactionId debug_cutoff;
bool debug_all_frozen;
+ Assert(presult.lpdead_items == 0);
+
if (!heap_page_is_all_visible(vacrel, buf,
&debug_cutoff, &debug_all_frozen))
Assert(false);
+ Assert(presult.all_frozen == debug_all_frozen);
+
Assert(!TransactionIdIsValid(debug_cutoff) ||
- debug_cutoff == visibility_cutoff_xid);
+ debug_cutoff == presult.vm_conflict_horizon);
}
#endif
/*
* Now save details of the LP_DEAD items from the page in vacrel
*/
- if (lpdead_items > 0)
+ if (presult.lpdead_items > 0)
{
vacrel->lpdead_item_pages++;
- dead_items_add(vacrel, blkno, deadoffsets, lpdead_items);
-
/*
- * It was convenient to ignore LP_DEAD items in all_visible earlier on
- * to make the choice of whether or not to freeze the page unaffected
- * by the short-term presence of LP_DEAD items. These LP_DEAD items
- * were effectively assumed to be LP_UNUSED items in the making. It
- * doesn't matter which heap pass (initial pass or final pass) ends up
- * setting the page all-frozen, as long as the ongoing VACUUM does it.
- *
- * Now that freezing has been finalized, unset all_visible. It needs
- * to reflect the present state of things, as expected by our caller.
+ * deadoffsets are collected incrementally in
+ * heap_page_prune_and_freeze() as each dead line pointer is recorded,
+ * with an indeterminate order, but dead_items_add requires them to be
+ * sorted.
*/
- all_visible = false;
+ qsort(presult.deadoffsets, presult.lpdead_items, sizeof(OffsetNumber),
+ OffsetNumber_cmp);
+
+ dead_items_add(vacrel, blkno, presult.deadoffsets, presult.lpdead_items);
}
/* Finally, add page-local counts to whole-VACUUM counts */
vacrel->tuples_deleted += presult.ndeleted;
- vacrel->tuples_frozen += tuples_frozen;
- vacrel->lpdead_items += lpdead_items;
- vacrel->live_tuples += live_tuples;
- vacrel->recently_dead_tuples += recently_dead_tuples;
+ vacrel->tuples_frozen += presult.nfrozen;
+ vacrel->lpdead_items += presult.lpdead_items;
+ vacrel->live_tuples += presult.live_tuples;
+ vacrel->recently_dead_tuples += presult.recently_dead_tuples;
/* Can't truncate this page */
- if (hastup)
+ if (presult.hastup)
vacrel->nonempty_pages = blkno + 1;
/* Did we find LP_DEAD items? */
- *has_lpdead_items = (lpdead_items > 0);
+ *has_lpdead_items = (presult.lpdead_items > 0);
- Assert(!all_visible || !(*has_lpdead_items));
+ Assert(!presult.all_visible || !(*has_lpdead_items));
/*
* Handle setting visibility map bit based on information from the VM (as
* of last heap_vac_scan_next_block() call), and from all_visible and
* all_frozen variables
*/
- if (!all_visible_according_to_vm && all_visible)
+ if (!all_visible_according_to_vm && presult.all_visible)
{
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
- if (all_frozen)
+ if (presult.all_frozen)
{
- Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+ Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
flags |= VISIBILITYMAP_ALL_FROZEN;
}
@@ -1840,7 +1548,7 @@ lazy_scan_prune(LVRelState *vacrel,
PageSetAllVisible(page);
MarkBufferDirty(buf);
visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
- vmbuffer, visibility_cutoff_xid,
+ vmbuffer, presult.vm_conflict_horizon,
flags);
}
@@ -1873,7 +1581,7 @@ lazy_scan_prune(LVRelState *vacrel,
* There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
* however.
*/
- else if (lpdead_items > 0 && PageIsAllVisible(page))
+ else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
{
elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
vacrel->relname, blkno);
@@ -1888,8 +1596,8 @@ lazy_scan_prune(LVRelState *vacrel,
* it as all-frozen. Note that all_frozen is only valid if all_visible is
* true, so we must check both all_visible and all_frozen.
*/
- else if (all_visible_according_to_vm && all_visible &&
- all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
+ else if (all_visible_according_to_vm && presult.all_visible &&
+ presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
{
/*
* Avoid relying on all_visible_according_to_vm as a proxy for the
@@ -1905,11 +1613,11 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* Set the page all-frozen (and all-visible) in the VM.
*
- * We can pass InvalidTransactionId as our visibility_cutoff_xid,
- * since a snapshotConflictHorizon sufficient to make everything safe
- * for REDO was logged when the page's tuples were frozen.
+ * We can pass InvalidTransactionId as our cutoff_xid, since a
+ * snapshotConflictHorizon sufficient to make everything safe for REDO
+ * was logged when the page's tuples were frozen.
*/
- Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+ Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
vmbuffer, InvalidTransactionId,
VISIBILITYMAP_ALL_VISIBLE |
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index b632fe953c4..536711d98e0 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -36,8 +36,9 @@
#define HEAP_INSERT_NO_LOGICAL TABLE_INSERT_NO_LOGICAL
#define HEAP_INSERT_SPECULATIVE 0x0010
-/* "options" flag bits for heap_page_prune */
+/* "options" flag bits for heap_page_prune_and_freeze */
#define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW (1 << 0)
+#define HEAP_PAGE_PRUNE_FREEZE (1 << 1)
typedef struct BulkInsertStateData *BulkInsertState;
struct TupleTableSlot;
@@ -195,24 +196,47 @@ typedef struct HeapPageFreeze
} HeapPageFreeze;
/*
- * Per-page state returned from pruning
+ * Per-page state returned by heap_page_prune_and_freeze()
*/
-typedef struct PruneResult
+typedef struct PruneFreezeResult
{
int ndeleted; /* Number of tuples deleted from the page */
int nnewlpdead; /* Number of newly LP_DEAD items */
+ int nfrozen; /* Number of tuples we froze */
+
+ /* Number of live and recently dead tuples on the page, after pruning */
+ int live_tuples;
+ int recently_dead_tuples;
/*
- * Tuple visibility is only computed once for each tuple, for correctness
- * and efficiency reasons; see comment in heap_page_prune() for details.
- * This is of type int8[], instead of HTSV_Result[], so we can use -1 to
- * indicate no visibility has been computed, e.g. for LP_DEAD items.
+ * all_visible and all_frozen indicate if the all-visible and all-frozen
+ * bits in the visibility map can be set for this page, after pruning.
+ *
+ * vm_conflict_horizon is the newest xmin of live tuples on the page. The
+ * caller can use it as the conflict horizon when setting the VM bits. It
+ * is only valid if we froze some tuples (nfrozen > 0), and all_frozen is
+ * true.
*
- * This needs to be MaxHeapTuplesPerPage + 1 long as FirstOffsetNumber is
- * 1. Otherwise every access would need to subtract 1.
+ * These are only set if the HEAP_PRUNE_FREEZE option is set.
*/
- int8 htsv[MaxHeapTuplesPerPage + 1];
-} PruneResult;
+ bool all_visible;
+ bool all_frozen;
+ TransactionId vm_conflict_horizon;
+
+ /*
+ * Whether or not the page makes rel truncation unsafe. This is set to
+ * 'true', even if the page contains LP_DEAD items. VACUUM will remove
+ * them before attempting to truncate.
+ */
+ bool hastup;
+
+ /*
+ * LP_DEAD items on the page after pruning. Includes existing LP_DEAD
+ * items.
+ */
+ int lpdead_items;
+ OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
+} PruneFreezeResult;
/* 'reason' codes for heap_page_prune() */
typedef enum
@@ -222,20 +246,6 @@ typedef enum
PRUNE_VACUUM_CLEANUP, /* VACUUM 2nd heap pass */
} PruneReason;
-/*
- * Pruning calculates tuple visibility once and saves the results in an array
- * of int8. See PruneResult.htsv for details. This helper function is meant to
- * guard against examining visibility status array members which have not yet
- * been computed.
- */
-static inline HTSV_Result
-htsv_get_valid_status(int status)
-{
- Assert(status >= HEAPTUPLE_DEAD &&
- status <= HEAPTUPLE_DELETE_IN_PROGRESS);
- return (HTSV_Result) status;
-}
-
/* ----------------
* function prototypes for heap access method
*
@@ -309,9 +319,11 @@ extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
const struct VacuumCutoffs *cutoffs,
HeapPageFreeze *pagefrz,
HeapTupleFreeze *frz, bool *totally_frozen);
-extern void heap_freeze_execute_prepared(Relation rel, Buffer buffer,
- TransactionId snapshotConflictHorizon,
- HeapTupleFreeze *tuples, int ntuples);
+
+extern void heap_pre_freeze_checks(Buffer buffer,
+ HeapTupleFreeze *tuples, int ntuples);
+extern void heap_freeze_prepared_tuples(Buffer buffer,
+ HeapTupleFreeze *tuples, int ntuples);
extern bool heap_freeze_tuple(HeapTupleHeader tuple,
TransactionId relfrozenxid, TransactionId relminmxid,
TransactionId FreezeLimit, TransactionId MultiXactCutoff);
@@ -332,12 +344,15 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
/* in heap/pruneheap.c */
struct GlobalVisState;
extern void heap_page_prune_opt(Relation relation, Buffer buffer);
-extern void heap_page_prune(Relation relation, Buffer buffer,
- struct GlobalVisState *vistest,
- int options,
- PruneResult *presult,
- PruneReason reason,
- OffsetNumber *off_loc);
+extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ struct GlobalVisState *vistest,
+ int options,
+ struct VacuumCutoffs *cutoffs,
+ PruneFreezeResult *presult,
+ PruneReason reason,
+ OffsetNumber *off_loc,
+ TransactionId *new_relfrozen_xid,
+ MultiXactId *new_relmin_mxid);
extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
OffsetNumber *redirected, int nredirected,
OffsetNumber *nowdead, int ndead,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 8bc8dd6f1c6..46edffef38e 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2195,7 +2195,7 @@ PromptInterruptContext
ProtocolVersion
PrsStorage
PruneReason
-PruneResult
+PruneFreezeResult
PruneState
PruneStepResult
PsqlScanCallbacks
--
2.39.2
On Tue, Apr 2, 2024 at 9:11 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> On 01/04/2024 20:22, Melanie Plageman wrote:
> > Review for 0003-0006 (I didn't have any new thoughts on 0002). I know
> > you didn't modify them much/at all, but I noticed some things in my
> > code that could be better.
>
> Ok, here's what I have now. I made a lot of small comment changes here
> and there, and some minor local refactorings, but nothing major. I lost
> track of all the individual changes, I'm afraid, so you'll have to just
> diff against the previous version if you want to see what's changed. I
> hope I didn't break anything.
>
> I'm pretty happy with this now. I will skim through it one more time
> later today or tomorrow, and commit. Please review once more if you
> have a chance.
Thanks!
0001 looks good. Attached are some comment updates and such on top of
0001 and 0002.
I started some performance testing of 0002 but haven't finished yet. I
wanted to provide my other review first.
> > This probably doesn't belong here. I noticed spgdoinsert.c had a
> > static function for sorting OffsetNumbers, but I didn't see anything
> > general purpose anywhere else.
>
> I copied the spgdoinsert.c implementation to vacuumlazy.c as is. Would
> be nice to have just one copy of this in some common place, but I also
> wasn't sure where to put it.
I looked a bit through utils and common and didn't see anywhere obvious
to put it. We could make a new file? 0003 fixes where you forgot to
change the name of the qsort function, though.
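For reference, here is a minimal, self-contained sketch of that comparator
and of how lazy_scan_prune() ends up using it to sort presult.deadoffsets
before calling dead_items_add(). It just restates the cmpOffsetNumbers()
helper from 0002 (with the name fix from 0003); the sort_deadoffsets()
wrapper is only illustrative and is not part of the patches:

#include "postgres.h"

#include "common/int.h"		/* for pg_cmp_u16() */
#include "storage/off.h"	/* for OffsetNumber */

/* qsort comparator for sorting OffsetNumbers */
static int
cmpOffsetNumbers(const void *a, const void *b)
{
	return pg_cmp_u16(*(const OffsetNumber *) a, *(const OffsetNumber *) b);
}

/*
 * Illustrative wrapper: sort the dead item offsets collected incrementally
 * during pruning into the ascending order that dead_items_add() expects.
 */
static void
sort_deadoffsets(OffsetNumber *deadoffsets, int lpdead_items)
{
	qsort(deadoffsets, lpdead_items, sizeof(OffsetNumber),
		  cmpOffsetNumbers);
}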
- Melanie
Attachments:
v13-0003-fix-qsort-func.patch (text/x-patch; charset=US-ASCII)
From 3b69cf3123732c3296a784be8f4fc08ec024c0d5 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 2 Apr 2024 11:11:07 -0400
Subject: [PATCH v13 3/4] fix qsort func
---
src/backend/access/heap/vacuumlazy.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index ace95a4de2..c3a9dc1ad6 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -46,6 +46,7 @@
#include "commands/dbcommands.h"
#include "commands/progress.h"
#include "commands/vacuum.h"
+#include "common/int.h"
#include "executor/instrument.h"
#include "miscadmin.h"
#include "pgstat.h"
@@ -1496,7 +1497,7 @@ lazy_scan_prune(LVRelState *vacrel,
* sorted.
*/
qsort(presult.deadoffsets, presult.lpdead_items, sizeof(OffsetNumber),
- OffsetNumber_cmp);
+ cmpOffsetNumbers);
dead_items_add(vacrel, blkno, presult.deadoffsets, presult.lpdead_items);
}
--
2.40.1
v13-0004-update-few-more-outdated-comments.patch (text/x-patch; charset=US-ASCII)
From 021407cee292c0b1ef5145f37aef889b68b739b0 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 2 Apr 2024 11:22:35 -0400
Subject: [PATCH v13 4/4] update few more outdated comments
---
src/backend/access/heap/pruneheap.c | 53 +++++++++++++----------------
src/backend/storage/ipc/procarray.c | 6 ++--
src/include/access/heapam.h | 2 +-
3 files changed, 28 insertions(+), 33 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 41c919b15b..ddc228c86d 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -328,7 +328,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
*
* presult contains output parameters needed by callers, such as the number of
* tuples removed and the offsets of dead items on the page after pruning.
- * heap_page_prune_and_freeze() is responsible for initializing it.
+ * heap_page_prune_and_freeze() is responsible for initializing it. Required by
+ * all callers.
*
* reason indicates why the pruning is performed. It is included in the WAL
* record for debugging and analysis purposes, but otherwise has no effect.
@@ -393,6 +394,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.pagefrz.freeze_required = false;
if (prstate.freeze)
{
+ Assert(new_relfrozen_xid && new_relmin_mxid);
prstate.pagefrz.FreezePageRelfrozenXid = *new_relfrozen_xid;
prstate.pagefrz.NoFreezePageRelfrozenXid = *new_relfrozen_xid;
prstate.pagefrz.FreezePageRelminMxid = *new_relmin_mxid;
@@ -415,19 +417,19 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.deadoffsets = presult->deadoffsets;
/*
- * Caller may update the VM after we're done. We keep track of whether
- * the page will be all_visible and all_frozen, once we're done with the
- * pruning and freezing, to help the caller to do that.
+ * Caller may update the VM after we're done. We can keep track of
+ * whether the page will be all-visible and all-frozen after pruning and
+ * freezing to help the caller to do that.
*
* Currently, only VACUUM sets the VM bits. To save the effort, only do
- * only the bookkeeping if the caller needs it. Currently, that's tied to
- * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag, if you wanted
- * to update the VM bits without also freezing, or freezing without
+ * the bookkeeping if the caller needs it. Currently, that's tied to
+ * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
+ * to update the VM bits without also freezing or freeze without also
* setting the VM bits.
*
* In addition to telling the caller whether it can set the VM bit, we
* also use 'all_visible' and 'all_frozen' for our own decision-making. If
- * the whole page will become frozen, we consider opportunistically
+ * the whole page would become frozen, we consider opportunistically
* freezing tuples. We will not be able to freeze the whole page if there
* are tuples present that are not visible to everyone or if there are
* dead tuples which are not yet removable. However, dead tuples which
@@ -681,16 +683,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
else
{
/*
- * Opportunistically freeze the page, if we are generating an FPI
- * anyway, and if doing so means that we can set the page
+ * Opportunistically freeze the page if we are generating an FPI
+ * anyway and if doing so means that we can set the page
* all-frozen afterwards (might not happen until VACUUM's final
* heap pass).
*
* XXX: Previously, we knew if pruning emitted an FPI by checking
* pgWalUsage.wal_fpi before and after pruning. Once the freeze
- * and prune records are combined, this heuristic couldn't be used
- * anymore. The opportunistic freeze heuristic must be improved;
- * however, for now, try to approximate the old logic.
+ * and prune records were combined, this heuristic couldn't be
+ * used anymore. The opportunistic freeze heuristic must be
+ * improved; however, for now, try to approximate the old logic.
*/
bool whole_page_freezable = prstate.all_visible &&
prstate.all_frozen;
@@ -761,7 +763,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/*
* If that's all we had to do to the page, this is a non-WAL-logged
- * hint. If we will also freeze or prune the page, we will mark the
+ * hint. If we are going to freeze or prune the page, we will mark the
* buffer dirty below.
*/
if (!do_freeze && !do_prune)
@@ -849,12 +851,12 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* make the choice of whether or not to freeze the page unaffected by the
* short-term presence of LP_DEAD items. These LP_DEAD items were
* effectively assumed to be LP_UNUSED items in the making. It doesn't
- * matter which heap pass (initial pass or final pass) ends up setting the
- * page all-frozen, as long as the ongoing VACUUM does it.
+ * matter which vacuum heap pass (initial pass or final pass) ends up
+ * setting the page all-frozen, as long as the ongoing VACUUM does it.
*
* Now that freezing has been finalized, unset all_visible if there are
* any LP_DEAD items on the page. It needs to reflect the present state
- * of things, as expected by our caller.
+ * of the page, as expected by our caller.
*/
if (prstate.all_visible && prstate.lpdead_items == 0)
{
@@ -1227,14 +1229,7 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
/*
* Deliberately delay unsetting all_visible until later during pruning.
- * Removable dead tuples shouldn't preclude freezing the page. After
- * finishing this first pass of tuple visibility checks, initialize
- * all_visible_except_removable with the current value of all_visible to
- * indicate whether or not the page is all visible except for dead tuples.
- * This will allow us to attempt to freeze the page after pruning. Later
- * during pruning, if we encounter an LP_DEAD item or are setting an item
- * LP_DEAD, we will unset all_visible. As long as we unset it before
- * updating the visibility map, this will be correct.
+ * Removable dead tuples shouldn't preclude freezing the page.
*/
/* Record the dead offset for vacuum */
@@ -1658,10 +1653,10 @@ heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
else
{
/*
- * When heap_page_prune() was called, mark_unused_now may have
- * been passed as true, which allows would-be LP_DEAD items to be
- * made LP_UNUSED instead. This is only possible if the relation
- * has no indexes. If there are any dead items, then
+ * When heap_page_prune_and_freeze() was called, mark_unused_now
+ * may have been passed as true, which allows would-be LP_DEAD
+ * items to be made LP_UNUSED instead. This is only possible if
+ * the relation has no indexes. If there are any dead items, then
* mark_unused_now was not true and every item being marked
* LP_UNUSED must refer to a heap-only tuple.
*/
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index b3cd248fb6..88a6d504df 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -1715,9 +1715,9 @@ TransactionIdIsActive(TransactionId xid)
* Note: the approximate horizons (see definition of GlobalVisState) are
* updated by the computations done here. That's currently required for
* correctness and a small optimization. Without doing so it's possible that
- * heap vacuum's call to heap_page_prune() uses a more conservative horizon
- * than later when deciding which tuples can be removed - which the code
- * doesn't expect (breaking HOT).
+ * heap vacuum's call to heap_page_prune_and_freeze() uses a more conservative
+ * horizon than later when deciding which tuples can be removed - which the
+ * code doesn't expect (breaking HOT).
*/
static void
ComputeXidHorizons(ComputeXidHorizonsResult *h)
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 536711d98e..a307fb5f24 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -238,7 +238,7 @@ typedef struct PruneFreezeResult
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
} PruneFreezeResult;
-/* 'reason' codes for heap_page_prune() */
+/* 'reason' codes for heap_page_prune_and_freeze() */
typedef enum
{
PRUNE_ON_ACCESS, /* on-access pruning */
--
2.40.1
v13-0002-Combine-freezing-and-pruning-steps-in-VACUUM.patch (text/x-patch; charset=US-ASCII)
From 056a1301c0adb918f0502239365054f57fc81672 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Tue, 2 Apr 2024 16:00:34 +0300
Subject: [PATCH v13 2/4] Combine freezing and pruning steps in VACUUM
Execute both freezing and pruning of tuples in the same
heap_page_prune() function, now called heap_page_prune_and_freeze(),
and emit a single WAL record containing all changes. That reduces the
overall amount of WAL generated.
This moves the freezing logic from vacuumlazy.c to the
heap_page_prune_and_freeze() function. The main difference in the
coding is that in vacuumlazy.c, we looked at the tuples after the
pruning had already happened, but in heap_page_prune_and_freeze() we
operate on the tuples before pruning. The heap_prepare_freeze_tuple()
function is now invoked after we have determined that a tuple is not
going to be pruned away.
VACUUM no longer needs to loop through the items on the page after
pruning. heap_page_prune_and_freeze() does all the work. It now
returns the list of dead offsets, including existing LP_DEAD items, to
the caller. Similarly it's now responsible for tracking 'all_visible',
'all_frozen', and 'hastup' on the caller's behalf.
Author: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://www.postgresql.org/message-id/20240330055710.kqg6ii2cdojsxgje@liskov
---
src/backend/access/heap/heapam.c | 67 +-
src/backend/access/heap/heapam_handler.c | 2 +-
src/backend/access/heap/pruneheap.c | 753 ++++++++++++++++++++---
src/backend/access/heap/vacuumlazy.c | 434 +++----------
src/include/access/heapam.h | 83 ++-
src/tools/pgindent/typedefs.list | 2 +-
6 files changed, 804 insertions(+), 537 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index b661d9811e..a9d5b109a5 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6447,9 +6447,9 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
* XIDs or MultiXactIds that will need to be processed by a future VACUUM.
*
* VACUUM caller must assemble HeapTupleFreeze freeze plan entries for every
- * tuple that we returned true for, and call heap_freeze_execute_prepared to
- * execute freezing. Caller must initialize pagefrz fields for page as a
- * whole before first call here for each heap page.
+ * tuple that we returned true for, and then execute freezing. Caller must
+ * initialize pagefrz fields for page as a whole before first call here for
+ * each heap page.
*
* VACUUM caller decides on whether or not to freeze the page as a whole.
* We'll often prepare freeze plans for a page that caller just discards.
@@ -6765,35 +6765,19 @@ heap_execute_freeze_tuple(HeapTupleHeader tuple, HeapTupleFreeze *frz)
}
/*
- * heap_freeze_execute_prepared
- *
- * Executes freezing of one or more heap tuples on a page on behalf of caller.
- * Caller passes an array of tuple plans from heap_prepare_freeze_tuple.
- * Caller must set 'offset' in each plan for us. Note that we destructively
- * sort caller's tuples array in-place, so caller had better be done with it.
- *
- * WAL-logs the changes so that VACUUM can advance the rel's relfrozenxid
- * later on without any risk of unsafe pg_xact lookups, even following a hard
- * crash (or when querying from a standby). We represent freezing by setting
- * infomask bits in tuple headers, but this shouldn't be thought of as a hint.
- * See section on buffer access rules in src/backend/storage/buffer/README.
+ * Perform xmin/xmax XID status sanity checks before actually executing freeze
+ * plans.
+ *
+ * heap_prepare_freeze_tuple doesn't perform these checks directly because
+ * pg_xact lookups are relatively expensive. They shouldn't be repeated by
+ * successive VACUUMs that each decide against freezing the same page.
*/
void
-heap_freeze_execute_prepared(Relation rel, Buffer buffer,
- TransactionId snapshotConflictHorizon,
- HeapTupleFreeze *tuples, int ntuples)
+heap_pre_freeze_checks(Buffer buffer,
+ HeapTupleFreeze *tuples, int ntuples)
{
Page page = BufferGetPage(buffer);
- Assert(ntuples > 0);
-
- /*
- * Perform xmin/xmax XID status sanity checks before critical section.
- *
- * heap_prepare_freeze_tuple doesn't perform these checks directly because
- * pg_xact lookups are relatively expensive. They shouldn't be repeated
- * by successive VACUUMs that each decide against freezing the same page.
- */
for (int i = 0; i < ntuples; i++)
{
HeapTupleFreeze *frz = tuples + i;
@@ -6832,8 +6816,19 @@ heap_freeze_execute_prepared(Relation rel, Buffer buffer,
xmax)));
}
}
+}
- START_CRIT_SECTION();
+/*
+ * Helper which executes freezing of one or more heap tuples on a page on
+ * behalf of caller. Caller passes an array of tuple plans from
+ * heap_prepare_freeze_tuple. Caller must set 'offset' in each plan for us.
+ * Must be called in a critical section that also marks the buffer dirty and,
+ * if needed, emits WAL.
+ */
+void
+heap_freeze_prepared_tuples(Buffer buffer, HeapTupleFreeze *tuples, int ntuples)
+{
+ Page page = BufferGetPage(buffer);
for (int i = 0; i < ntuples; i++)
{
@@ -6844,22 +6839,6 @@ heap_freeze_execute_prepared(Relation rel, Buffer buffer,
htup = (HeapTupleHeader) PageGetItem(page, itemid);
heap_execute_freeze_tuple(htup, frz);
}
-
- MarkBufferDirty(buffer);
-
- /* Now WAL-log freezing if necessary */
- if (RelationNeedsWAL(rel))
- {
- log_heap_prune_and_freeze(rel, buffer, snapshotConflictHorizon,
- false, /* no cleanup lock required */
- PRUNE_VACUUM_SCAN,
- tuples, ntuples,
- NULL, 0, /* redirected */
- NULL, 0, /* dead */
- NULL, 0); /* unused */
- }
-
- END_CRIT_SECTION();
}
/*
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index c86000d245..0952d4a98e 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -1122,7 +1122,7 @@ heapam_scan_analyze_next_tuple(TableScanDesc scan, TransactionId OldestXmin,
* We ignore unused and redirect line pointers. DEAD line pointers
* should be counted as dead, because we need vacuum to run to get rid
* of them. Note that this rule agrees with the way that
- * heap_page_prune() counts things.
+ * heap_page_prune_and_freeze() counts things.
*/
if (!ItemIdIsNormal(itemid))
{
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 1b5bf990d2..41c919b15b 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -17,32 +17,54 @@
#include "access/heapam.h"
#include "access/heapam_xlog.h"
#include "access/htup_details.h"
+#include "access/multixact.h"
#include "access/transam.h"
#include "access/xlog.h"
#include "access/xloginsert.h"
+#include "commands/vacuum.h"
+#include "executor/instrument.h"
#include "miscadmin.h"
#include "pgstat.h"
#include "storage/bufmgr.h"
#include "utils/rel.h"
#include "utils/snapmgr.h"
-/* Working data for heap_page_prune and subroutines */
+/* Working data for heap_page_prune_and_freeze() and subroutines */
typedef struct
{
+ /*-------------------------------------------------------
+ * Arguments passed to heap_page_prune_and_freeze()
+ *-------------------------------------------------------
+ */
+
/* tuple visibility test, initialized for the relation */
GlobalVisState *vistest;
/* whether or not dead items can be set LP_UNUSED during pruning */
bool mark_unused_now;
+ /* whether to attempt freezing tuples */
+ bool freeze;
+ struct VacuumCutoffs *cutoffs;
- TransactionId new_prune_xid; /* new prune hint value for page */
- TransactionId snapshotConflictHorizon; /* latest xid removed */
+ /*-------------------------------------------------------
+ * Fields describing what to do to the page
+ *-------------------------------------------------------
+ */
+ TransactionId new_prune_xid; /* new prune hint value */
+ TransactionId latest_xid_removed;
int nredirected; /* numbers of entries in arrays below */
int ndead;
int nunused;
+ int nfrozen;
/* arrays that accumulate indexes of items to be changed */
OffsetNumber redirected[MaxHeapTuplesPerPage * 2];
OffsetNumber nowdead[MaxHeapTuplesPerPage];
OffsetNumber nowunused[MaxHeapTuplesPerPage];
+ HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
+
+ /*-------------------------------------------------------
+ * Working state for HOT chain processing
+ *-------------------------------------------------------
+ */
/*
* 'root_items' contains offsets of all LP_REDIRECT line pointers and
@@ -63,24 +85,92 @@ typedef struct
*/
bool processed[MaxHeapTuplesPerPage + 1];
+ /*
+ * Tuple visibility is only computed once for each tuple, for correctness
+ * and efficiency reasons; see comment in heap_page_prune_and_freeze() for
+ * details. This is of type int8[], instead of HTSV_Result[], so we can
+ * use -1 to indicate no visibility has been computed, e.g. for LP_DEAD
+ * items.
+ *
+ * This needs to be MaxHeapTuplesPerPage + 1 long as FirstOffsetNumber is
+ * 1. Otherwise every access would need to subtract 1.
+ */
+ int8 htsv[MaxHeapTuplesPerPage + 1];
+
+ /*
+ * Freezing-related state.
+ */
+ HeapPageFreeze pagefrz;
+
+ /*-------------------------------------------------------
+ * Information about what was done
+ *
+ * These fields are not used by pruning itself for the most part, but are
+ * used to collect information about what was pruned and what state the
+ * page is in after pruning, for the benefit of the caller. They are
+ * copied to the caller's PruneFreezeResult at the end.
+ * -------------------------------------------------------
+ */
+
int ndeleted; /* Number of tuples deleted from the page */
+
+ /* Number of live and recently dead tuples, after pruning */
+ int live_tuples;
+ int recently_dead_tuples;
+
+ /* Whether or not the page makes rel truncation unsafe */
+ bool hastup;
+
+ /*
+ * LP_DEAD items on the page after pruning. Includes existing LP_DEAD
+ * items
+ */
+ int lpdead_items; /* number of items in the array */
+ OffsetNumber *deadoffsets; /* points directly to presult->deadoffsets */
+
+ /*
+ * all_visible and all_frozen indicate if the all-visible and all-frozen
+ * bits in the visibility map can be set for this page after pruning.
+ *
+ * visibility_cutoff_xid is the newest xmin of live tuples on the page.
+ * The caller can use it as the conflict horizon, when setting the VM
+ * bits. It is only valid if we froze some tuples, and all_frozen is
+ * true.
+ *
+ * NOTE: all_visible and all_frozen don't include LP_DEAD items. That's
+ * convenient for heap_page_prune_and_freeze(), to use them to decide
+ * whether to freeze the page or not. The all_visible and all_frozen
+ * values returned to the caller are adjusted to include LP_DEAD items at
+ * the end.
+ *
+ * all_frozen should only be considered valid if all_visible is also set;
+ * we don't bother to clear the all_frozen flag every time we clear the
+ * all_visible flag.
+ */
+ bool all_visible;
+ bool all_frozen;
+ TransactionId visibility_cutoff_xid;
} PruneState;
/* Local functions */
static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
HeapTuple tup,
Buffer buffer);
+static inline HTSV_Result htsv_get_valid_status(int status);
static void heap_prune_chain(Page page, BlockNumber blockno, OffsetNumber maxoff,
- OffsetNumber rootoffnum, int8 *htsv, PruneState *prstate);
+ OffsetNumber rootoffnum, PruneState *prstate);
static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
static void heap_prune_record_redirect(PruneState *prstate,
- OffsetNumber offnum, OffsetNumber rdoffnum, bool was_normal);
-static void heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum, bool was_normal);
-static void heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum, bool was_normal);
+ OffsetNumber offnum, OffsetNumber rdoffnum,
+ bool was_normal);
+static void heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
+ bool was_normal);
+static void heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
+ bool was_normal);
static void heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum, bool was_normal);
static void heap_prune_record_unchanged_lp_unused(Page page, PruneState *prstate, OffsetNumber offnum);
-static void heap_prune_record_unchanged_lp_normal(Page page, int8 *htsv, PruneState *prstate, OffsetNumber offnum);
+static void heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumber offnum);
static void heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber offnum);
static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetNumber offnum);
@@ -163,15 +253,15 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
if (PageIsFull(page) || PageGetHeapFreeSpace(page) < minfree)
{
OffsetNumber dummy_off_loc;
- PruneResult presult;
+ PruneFreezeResult presult;
/*
* For now, pass mark_unused_now as false regardless of whether or
* not the relation has indexes, since we cannot safely determine
* that during on-access pruning with the current implementation.
*/
- heap_page_prune(relation, buffer, vistest, 0,
- &presult, PRUNE_ON_ACCESS, &dummy_off_loc);
+ heap_page_prune_and_freeze(relation, buffer, vistest, 0,
+ NULL, &presult, PRUNE_ON_ACCESS, &dummy_off_loc, NULL, NULL);
/*
* Report the number of tuples reclaimed to pgstats. This is
@@ -205,13 +295,24 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
/*
- * Prune and repair fragmentation in the specified page.
+ * Prune and repair fragmentation and potentially freeze tuples on the
+ * specified page.
*
* Caller must have pin and buffer cleanup lock on the page. Note that we
* don't update the FSM information for page on caller's behalf. Caller might
* also need to account for a reduction in the length of the line pointer
* array following array truncation by us.
*
+ * If the HEAP_PRUNE_FREEZE option is set, we will also freeze tuples if it's
+ * required in order to advance relfrozenxid / relminmxid, or if it's
+ * considered advantageous for overall system performance to do so now. The
+ * 'cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid' arguments
+ * are required when freezing. When the HEAP_PRUNE_FREEZE option is set, we
+ * set presult->all_visible and presult->all_frozen on exit, to indicate if
+ * the VM bits can be set. They are always set to false when the
+ * HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
+ * that also freeze need that information.
+ *
* vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
* (see heap_prune_satisfies_vacuum).
*
@@ -219,23 +320,39 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
* pruning.
*
- * presult contains output parameters needed by callers such as the number of
- * tuples removed and the number of line pointers newly marked LP_DEAD.
- * heap_page_prune() is responsible for initializing it.
+ * FREEZE indicates that we will also freeze tuples, and will return
+ * 'all_visible', 'all_frozen' flags to the caller.
+ *
+ * cutoffs contains the freeze cutoffs, established by VACUUM at the beginning
+ * of vacuuming the relation. Required if the HEAP_PAGE_PRUNE_FREEZE option is set.
+ *
+ * presult contains output parameters needed by callers, such as the number of
+ * tuples removed and the offsets of dead items on the page after pruning.
+ * heap_page_prune_and_freeze() is responsible for initializing it.
*
* reason indicates why the pruning is performed. It is included in the WAL
* record for debugging and analysis purposes, but otherwise has no effect.
*
* off_loc is the offset location required by the caller to use in error
* callback.
+ *
+ * new_relfrozen_xid and new_relmin_mxid must be provided by the caller if the
+ * HEAP_PAGE_PRUNE_FREEZE option is set. On entry, they contain the oldest XID
+ * and multi-XID seen on the relation so far. They will be updated with the
+ * oldest values present on the page after pruning. After processing the whole
+ * relation, VACUUM can use these values as the new relfrozenxid/relminmxid
+ * for the relation.
*/
void
-heap_page_prune(Relation relation, Buffer buffer,
- GlobalVisState *vistest,
- int options,
- PruneResult *presult,
- PruneReason reason,
- OffsetNumber *off_loc)
+heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ GlobalVisState *vistest,
+ int options,
+ struct VacuumCutoffs *cutoffs,
+ PruneFreezeResult *presult,
+ PruneReason reason,
+ OffsetNumber *off_loc,
+ TransactionId *new_relfrozen_xid,
+ MultiXactId *new_relmin_mxid)
{
Page page = BufferGetPage(buffer);
BlockNumber blockno = BufferGetBlockNumber(buffer);
@@ -243,6 +360,17 @@ heap_page_prune(Relation relation, Buffer buffer,
maxoff;
PruneState prstate;
HeapTupleData tup;
+ bool do_freeze;
+ bool do_prune;
+ bool do_hint;
+ bool hint_bit_fpi;
+ int64 fpi_before = pgWalUsage.wal_fpi;
+
+ /* Copy parameters to prstate */
+ prstate.vistest = vistest;
+ prstate.mark_unused_now = (options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
+ prstate.freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+ prstate.cutoffs = cutoffs;
/*
* Our strategy is to scan the page and make lists of items to change,
@@ -256,36 +384,97 @@ heap_page_prune(Relation relation, Buffer buffer,
* initialize the rest of our working state.
*/
prstate.new_prune_xid = InvalidTransactionId;
- prstate.vistest = vistest;
- prstate.mark_unused_now = (options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
- prstate.snapshotConflictHorizon = InvalidTransactionId;
- prstate.nredirected = prstate.ndead = prstate.nunused = 0;
- prstate.ndeleted = 0;
+ prstate.latest_xid_removed = InvalidTransactionId;
+ prstate.nredirected = prstate.ndead = prstate.nunused = prstate.nfrozen = 0;
prstate.nroot_items = 0;
prstate.nheaponly_items = 0;
+ /* initialize page freezing working state */
+ prstate.pagefrz.freeze_required = false;
+ if (prstate.freeze)
+ {
+ prstate.pagefrz.FreezePageRelfrozenXid = *new_relfrozen_xid;
+ prstate.pagefrz.NoFreezePageRelfrozenXid = *new_relfrozen_xid;
+ prstate.pagefrz.FreezePageRelminMxid = *new_relmin_mxid;
+ prstate.pagefrz.NoFreezePageRelminMxid = *new_relmin_mxid;
+ }
+ else
+ {
+ Assert(new_relfrozen_xid == NULL && new_relmin_mxid == NULL);
+ prstate.pagefrz.FreezePageRelminMxid = InvalidMultiXactId;
+ prstate.pagefrz.NoFreezePageRelminMxid = InvalidMultiXactId;
+ prstate.pagefrz.FreezePageRelfrozenXid = InvalidTransactionId;
+ prstate.pagefrz.NoFreezePageRelfrozenXid = InvalidTransactionId;
+ }
+
+ prstate.ndeleted = 0;
+ prstate.live_tuples = 0;
+ prstate.recently_dead_tuples = 0;
+ prstate.hastup = false;
+ prstate.lpdead_items = 0;
+ prstate.deadoffsets = presult->deadoffsets;
+
/*
- * presult->htsv is not initialized here because all ntuple spots in the
- * array will be set either to a valid HTSV_Result value or -1.
+ * Caller may update the VM after we're done. We keep track of whether
+ * the page will be all_visible and all_frozen, once we're done with the
+ * pruning and freezing, to help the caller to do that.
+ *
+ * Currently, only VACUUM sets the VM bits. To save the effort, do the
+ * bookkeeping only if the caller needs it. Currently, that's tied to
+ * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag, if you wanted
+ * to update the VM bits without also freezing, or freezing without
+ * setting the VM bits.
+ *
+ * In addition to telling the caller whether it can set the VM bit, we
+ * also use 'all_visible' and 'all_frozen' for our own decision-making. If
+ * the whole page will become frozen, we consider opportunistically
+ * freezing tuples. We will not be able to freeze the whole page if there
+ * are tuples present that are not visible to everyone or if there are
+ * dead tuples which are not yet removable. However, dead tuples which
+ * will be removed by the end of vacuuming should not preclude us from
+ * opportunistically freezing. Because of that, we do not clear
+ * all_visible when we see LP_DEAD items. We fix that at the end of the
+ * function, when we return the value to the caller, so that the caller
+ * doesn't set the VM bit incorrectly.
*/
- presult->ndeleted = 0;
- presult->nnewlpdead = 0;
+ if (prstate.freeze)
+ {
+ prstate.all_visible = true;
+ prstate.all_frozen = true;
+ }
+ else
+ {
+ /*
+ * Initializing to false allows skipping the work to update them in
+ * heap_prune_record_unchanged_lp_normal().
+ */
+ prstate.all_visible = false;
+ prstate.all_frozen = false;
+ }
+
+ /*
+ * The visibility cutoff xid is the newest xmin of live tuples on the
+ * page. In the common case, this will be set as the conflict horizon the
+ * caller can use for updating the VM. If, at the end of freezing and
+ * pruning, the page is all-frozen, there is no possibility that any
+ * running transaction on the standby does not see tuples on the page as
+ * all-visible, so the conflict horizon remains InvalidTransactionId.
+ */
+ prstate.visibility_cutoff_xid = InvalidTransactionId;
maxoff = PageGetMaxOffsetNumber(page);
tup.t_tableOid = RelationGetRelid(relation);
/*
* Determine HTSV for all tuples, and queue them up for processing as HOT
- * chain roots or as a heap-only items.
+ * chain roots or as heap-only items.
*
* Determining HTSV only once for each tuple is required for correctness,
* to deal with cases where running HTSV twice could result in different
* results. For example, RECENTLY_DEAD can turn to DEAD if another
* checked item causes GlobalVisTestIsRemovableFullXid() to update the
* horizon, or INSERT_IN_PROGRESS can change to DEAD if the inserting
- * transaction aborts. VACUUM assumes that there are no normal DEAD
- * tuples left on the page after pruning, so it needs to have the same
- * understanding of what is DEAD and what is not.
+ * transaction aborts.
*
* It's also good for performance. Most commonly tuples within a page are
* stored at decreasing offsets (while the items are stored at increasing
@@ -310,7 +499,7 @@ heap_page_prune(Relation relation, Buffer buffer,
*off_loc = offnum;
prstate.processed[offnum] = false;
- presult->htsv[offnum] = -1;
+ prstate.htsv[offnum] = -1;
/* Nothing to do if slot doesn't contain a tuple */
if (!ItemIdIsUsed(itemid))
@@ -349,8 +538,8 @@ heap_page_prune(Relation relation, Buffer buffer,
tup.t_len = ItemIdGetLength(itemid);
ItemPointerSet(&tup.t_self, blockno, offnum);
- presult->htsv[offnum] = heap_prune_satisfies_vacuum(&prstate, &tup,
- buffer);
+ prstate.htsv[offnum] = heap_prune_satisfies_vacuum(&prstate, &tup,
+ buffer);
if (!HeapTupleHeaderIsHeapOnly(htup))
prstate.root_items[prstate.nroot_items++] = offnum;
@@ -358,6 +547,12 @@ heap_page_prune(Relation relation, Buffer buffer,
prstate.heaponly_items[prstate.nheaponly_items++] = offnum;
}
+ /*
+ * If checksums are enabled, heap_prune_satisfies_vacuum() may have caused
+ * an FPI to be emitted.
+ */
+ hint_bit_fpi = fpi_before != pgWalUsage.wal_fpi;
+
/*
* Process HOT chains.
*
@@ -381,8 +576,7 @@ heap_page_prune(Relation relation, Buffer buffer,
*off_loc = offnum;
/* Process this item or chain of items */
- heap_prune_chain(page, blockno, maxoff,
- offnum, presult->htsv, &prstate);
+ heap_prune_chain(page, blockno, maxoff, offnum, &prstate);
}
/*
@@ -412,7 +606,7 @@ heap_page_prune(Relation relation, Buffer buffer,
* return true for an XMIN_INVALID tuple, so this code will work even
* when there were sequential updates within the aborted transaction.)
*/
- if (presult->htsv[offnum] == HEAPTUPLE_DEAD)
+ if (prstate.htsv[offnum] == HEAPTUPLE_DEAD)
{
ItemId itemid = PageGetItemId(page, offnum);
HeapTupleHeader htup = (HeapTupleHeader) PageGetItem(page, itemid);
@@ -420,7 +614,7 @@ heap_page_prune(Relation relation, Buffer buffer,
if (likely(!HeapTupleHeaderIsHotUpdated(htup)))
{
HeapTupleHeaderAdvanceConflictHorizon(htup,
- &prstate.snapshotConflictHorizon);
+ &prstate.latest_xid_removed);
heap_prune_record_unused(&prstate, offnum, true);
}
else
@@ -438,7 +632,7 @@ heap_page_prune(Relation relation, Buffer buffer,
}
}
else
- heap_prune_record_unchanged_lp_normal(page, presult->htsv, &prstate, offnum);
+ heap_prune_record_unchanged_lp_normal(page, &prstate, offnum);
}
/* We should now have processed every tuple exactly once */
@@ -456,21 +650,102 @@ heap_page_prune(Relation relation, Buffer buffer,
/* Clear the offset information once we have processed the given page. */
*off_loc = InvalidOffsetNumber;
- /* Any error while applying the changes is critical */
- START_CRIT_SECTION();
+ do_prune = prstate.nredirected > 0 ||
+ prstate.ndead > 0 ||
+ prstate.nunused > 0;
- /* Have we found any prunable items? */
- if (prstate.nredirected > 0 || prstate.ndead > 0 || prstate.nunused > 0)
+ /*
+ * Even if we don't prune anything, if we found a new value for the
+ * pd_prune_xid field or the page was marked full, we will update the hint
+ * bit.
+ */
+ do_hint = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
+ PageIsFull(page);
+
+ /*
+ * Decide if we want to go ahead with freezing according to the freeze
+ * plans we prepared, or not.
+ */
+ do_freeze = false;
+ if (prstate.freeze)
+ {
+ if (prstate.pagefrz.freeze_required)
+ {
+ /*
+ * heap_prepare_freeze_tuple indicated that at least one XID/MXID
+ * from before FreezeLimit/MultiXactCutoff is present. Must
+ * freeze to advance relfrozenxid/relminmxid.
+ */
+ do_freeze = true;
+ }
+ else
+ {
+ /*
+ * Opportunistically freeze the page, if we are generating an FPI
+ * anyway, and if doing so means that we can set the page
+ * all-frozen afterwards (might not happen until VACUUM's final
+ * heap pass).
+ *
+ * XXX: Previously, we knew if pruning emitted an FPI by checking
+ * pgWalUsage.wal_fpi before and after pruning. Now that the freeze
+ * and prune records are combined, that heuristic can no longer be
+ * used. The opportunistic freeze heuristic must be improved;
+ * however, for now, try to approximate the old logic.
+ */
+ bool whole_page_freezable = prstate.all_visible &&
+ prstate.all_frozen;
+
+ if (whole_page_freezable && prstate.nfrozen > 0)
+ {
+ /*
+ * Freezing would make the page all-frozen. In this case, we
+ * will freeze if we have already emitted an FPI or will do so
+ * anyway. Only incur the overhead of checking whether we will
+ * emit an FPI when we can actually use that information.
+ */
+ if (hint_bit_fpi ||
+ ((do_prune || do_hint) && XLogCheckBufferNeedsBackup(buffer)))
+ {
+ do_freeze = true;
+ }
+ }
+ }
+ }
+
+ if (do_freeze)
{
/*
- * Apply the planned item changes, then repair page fragmentation, and
- * update the page's hint bit about whether it has free line pointers.
+ * Validate the tuples we will be freezing before entering the
+ * critical section.
*/
- heap_page_prune_execute(buffer, false,
- prstate.redirected, prstate.nredirected,
- prstate.nowdead, prstate.ndead,
- prstate.nowunused, prstate.nunused);
+ heap_pre_freeze_checks(buffer, prstate.frozen, prstate.nfrozen);
+ }
+ else if (prstate.nfrozen > 0)
+ {
+ /*
+ * The page contained some tuples that were not already frozen, and we
+ * chose not to freeze them now. The page won't be all-frozen then.
+ */
+ Assert(!prstate.pagefrz.freeze_required);
+ prstate.all_frozen = false;
+ prstate.nfrozen = 0; /* avoid miscounts in instrumentation */
+ }
+ else
+ {
+ /*
+ * We have no freeze plans to execute. The page might already be
+ * all-frozen (perhaps only following pruning), though. Such pages
+ * can be marked all-frozen in the VM by our caller, even though none
+ * of its tuples were newly frozen here.
+ */
+ }
+
+ /* Any error while applying the changes is critical */
+ START_CRIT_SECTION();
+
+ if (do_hint)
+ {
/*
* Update the page's pd_prune_xid field to either zero, or the lowest
* XID of any soon-prunable tuple.
@@ -484,6 +759,29 @@ heap_page_prune(Relation relation, Buffer buffer,
*/
PageClearFull(page);
+ /*
+ * If that's all we had to do to the page, this is a non-WAL-logged
+ * hint. If we will also freeze or prune the page, we will mark the
+ * buffer dirty below.
+ */
+ if (!do_freeze && !do_prune)
+ MarkBufferDirtyHint(buffer, true);
+ }
+
+ if (do_prune || do_freeze)
+ {
+ /* Apply the planned item changes and repair page fragmentation. */
+ if (do_prune)
+ {
+ heap_page_prune_execute(buffer, false,
+ prstate.redirected, prstate.nredirected,
+ prstate.nowdead, prstate.ndead,
+ prstate.nowunused, prstate.nunused);
+ }
+
+ if (do_freeze)
+ heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
+
MarkBufferDirty(buffer);
/*
@@ -491,40 +789,115 @@ heap_page_prune(Relation relation, Buffer buffer,
*/
if (RelationNeedsWAL(relation))
{
+ /*
+ * The snapshotConflictHorizon for the whole record should be the
+ * most conservative of all the horizons calculated for any of the
+ * possible modifications. If this record will prune tuples, any
+ * transactions on the standby older than the newest xmax among the
+ * tuples this record removes will conflict. If this record will
+ * freeze tuples, any transactions on the standby with xids older
+ * than the youngest xmin among the tuples this record freezes will
+ * conflict.
+ */
+ TransactionId frz_conflict_horizon = InvalidTransactionId;
+ TransactionId conflict_xid;
+
+ /*
+ * We can use the visibility_cutoff_xid as our cutoff for
+ * conflicts when the whole page is eligible to become all-frozen
+ * in the VM once we're done with it. Otherwise we generate a
+ * conservative cutoff by stepping back from OldestXmin.
+ */
+ if (do_freeze)
+ {
+ if (prstate.all_visible && prstate.all_frozen)
+ frz_conflict_horizon = prstate.visibility_cutoff_xid;
+ else
+ {
+ /* Avoids false conflicts when hot_standby_feedback in use */
+ frz_conflict_horizon = prstate.cutoffs->OldestXmin;
+ TransactionIdRetreat(frz_conflict_horizon);
+ }
+ }
+
+ if (TransactionIdFollows(frz_conflict_horizon, prstate.latest_xid_removed))
+ conflict_xid = frz_conflict_horizon;
+ else
+ conflict_xid = prstate.latest_xid_removed;
+
log_heap_prune_and_freeze(relation, buffer,
- prstate.snapshotConflictHorizon,
+ conflict_xid,
true, reason,
- NULL, 0,
+ prstate.frozen, prstate.nfrozen,
prstate.redirected, prstate.nredirected,
prstate.nowdead, prstate.ndead,
prstate.nowunused, prstate.nunused);
}
}
- else
- {
- /*
- * If we didn't prune anything, but have found a new value for the
- * pd_prune_xid field, update it and mark the buffer dirty. This is
- * treated as a non-WAL-logged hint.
- *
- * Also clear the "page is full" flag if it is set, since there's no
- * point in repeating the prune/defrag process until something else
- * happens to the page.
- */
- if (((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
- PageIsFull(page))
- {
- ((PageHeader) page)->pd_prune_xid = prstate.new_prune_xid;
- PageClearFull(page);
- MarkBufferDirtyHint(buffer, true);
- }
- }
END_CRIT_SECTION();
/* Copy information back for caller */
- presult->nnewlpdead = prstate.ndead;
presult->ndeleted = prstate.ndeleted;
+ presult->nnewlpdead = prstate.ndead;
+ presult->nfrozen = prstate.nfrozen;
+ presult->live_tuples = prstate.live_tuples;
+ presult->recently_dead_tuples = prstate.recently_dead_tuples;
+
+ /*
+ * It was convenient to ignore LP_DEAD items in all_visible earlier on to
+ * make the choice of whether or not to freeze the page unaffected by the
+ * short-term presence of LP_DEAD items. These LP_DEAD items were
+ * effectively assumed to be LP_UNUSED items in the making. It doesn't
+ * matter which heap pass (initial pass or final pass) ends up setting the
+ * page all-frozen, as long as the ongoing VACUUM does it.
+ *
+ * Now that freezing has been finalized, unset all_visible if there are
+ * any LP_DEAD items on the page. It needs to reflect the present state
+ * of things, as expected by our caller.
+ */
+ if (prstate.all_visible && prstate.lpdead_items == 0)
+ {
+ presult->all_visible = prstate.all_visible;
+ presult->all_frozen = prstate.all_frozen;
+ }
+ else
+ {
+ presult->all_visible = false;
+ presult->all_frozen = false;
+ }
+
+ presult->hastup = prstate.hastup;
+
+ /*
+ * For callers planning to update the visibility map, the conflict horizon
+ * for that record must be the newest xmin on the page. However, if the
+ * page is completely frozen, there can be no conflict and the
+ * vm_conflict_horizon should remain InvalidTransactionId. This includes
+ * the case that we just froze all the tuples; the prune-freeze record
+ * included the conflict XID already so the caller doesn't need it.
+ */
+ if (presult->all_frozen)
+ presult->vm_conflict_horizon = InvalidTransactionId;
+ else
+ presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
+
+ presult->lpdead_items = prstate.lpdead_items;
+ /* the presult->deadoffsets array was already filled in */
+
+ if (prstate.freeze)
+ {
+ if (presult->nfrozen > 0)
+ {
+ *new_relfrozen_xid = prstate.pagefrz.FreezePageRelfrozenXid;
+ *new_relmin_mxid = prstate.pagefrz.FreezePageRelminMxid;
+ }
+ else
+ {
+ *new_relfrozen_xid = prstate.pagefrz.NoFreezePageRelfrozenXid;
+ *new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
+ }
+ }
}
@@ -549,10 +922,24 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
}
+/*
+ * Pruning calculates tuple visibility once and saves the results in an array
+ * of int8. See PruneState.htsv for details. This helper function is meant
+ * to guard against examining visibility status array members which have not
+ * yet been computed.
+ */
+static inline HTSV_Result
+htsv_get_valid_status(int status)
+{
+ Assert(status >= HEAPTUPLE_DEAD &&
+ status <= HEAPTUPLE_DELETE_IN_PROGRESS);
+ return (HTSV_Result) status;
+}
+
/*
* Prune specified line pointer or a HOT chain originating at line pointer.
*
- * Tuple visibility information is provided in htsv.
+ * Tuple visibility information is provided in prstate->htsv.
*
* If the item is an index-referenced tuple (i.e. not a heap-only tuple),
* the HOT chain is pruned by removing all DEAD tuples at the start of the HOT
@@ -572,11 +959,17 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
* prstate showing the changes to be made. Items to be redirected are added
* to the redirected[] array (two entries per redirection); items to be set to
* LP_DEAD state are added to nowdead[]; and items to be set to LP_UNUSED
- * state are added to nowunused[].
+ * state are added to nowunused[]. We perform bookkeeping of live tuples,
+ * visibility, etc. based on what the page will look like after the changes
+ * are applied. All that bookkeeping is performed in the heap_prune_record_*()
+ * subroutines. The division of labor is that heap_prune_chain() decides the
+ * fate of each tuple, ie. whether it's going to be removed, redirected or
+ * left unchanged, and the heap_prune_record_*() subroutines update PruneState
+ * based on that outcome.
*/
static void
heap_prune_chain(Page page, BlockNumber blockno, OffsetNumber maxoff,
- OffsetNumber rootoffnum, int8 *htsv, PruneState *prstate)
+ OffsetNumber rootoffnum, PruneState *prstate)
{
TransactionId priorXmax = InvalidTransactionId;
ItemId rootlp;
@@ -656,15 +1049,14 @@ heap_prune_chain(Page page, BlockNumber blockno, OffsetNumber maxoff,
*/
chainitems[nchain++] = offnum;
- switch (htsv_get_valid_status(htsv[offnum]))
+ switch (htsv_get_valid_status(prstate->htsv[offnum]))
{
case HEAPTUPLE_DEAD:
/* Remember the last DEAD tuple seen */
ndeadchain = nchain;
HeapTupleHeaderAdvanceConflictHorizon(htup,
- &prstate->snapshotConflictHorizon);
-
+ &prstate->latest_xid_removed);
/* Advance to next chain member */
break;
@@ -720,10 +1112,11 @@ heap_prune_chain(Page page, BlockNumber blockno, OffsetNumber maxoff,
{
/*
* We found a redirect item that doesn't point to a valid follow-on
- * item. This can happen if the loop in heap_page_prune caused us to
- * visit the dead successor of a redirect item before visiting the
- * redirect item. We can clean up by setting the redirect item to
- * LP_DEAD state or LP_UNUSED if the caller indicated.
+ * item. This can happen if the loop in heap_page_prune_and_freeze()
+ * caused us to visit the dead successor of a redirect item before
+ * visiting the redirect item. We can clean up by setting the
+ * redirect item to LP_DEAD state or LP_UNUSED if the caller
+ * indicated.
*/
heap_prune_record_dead_or_unused(prstate, rootoffnum, false);
return;
@@ -745,7 +1138,7 @@ process_chain:
i++;
}
for (; i < nchain; i++)
- heap_prune_record_unchanged_lp_normal(page, htsv, prstate, chainitems[i]);
+ heap_prune_record_unchanged_lp_normal(page, prstate, chainitems[i]);
}
else if (ndeadchain == nchain)
{
@@ -771,7 +1164,7 @@ process_chain:
/* the rest of tuples in the chain are normal, unchanged tuples */
for (int i = ndeadchain; i < nchain; i++)
- heap_prune_record_unchanged_lp_normal(page, htsv, prstate, chainitems[i]);
+ heap_prune_record_unchanged_lp_normal(page, prstate, chainitems[i]);
}
}
@@ -816,6 +1209,8 @@ heap_prune_record_redirect(PruneState *prstate,
*/
if (was_normal)
prstate->ndeleted++;
+
+ prstate->hastup = true;
}
/* Record line pointer to be marked dead */
@@ -830,6 +1225,21 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
prstate->nowdead[prstate->ndead] = offnum;
prstate->ndead++;
+ /*
+ * Deliberately delay unsetting all_visible until the end of
+ * heap_page_prune_and_freeze(). Removable dead tuples shouldn't preclude
+ * freezing the page; we treat LP_DEAD items as LP_UNUSED items in the
+ * making, which allows us to attempt to freeze the page after pruning.
+ * Instead of clearing all_visible here, we count the item in lpdead_items
+ * and clear all_visible just before returning the result to the caller.
+ * As long as that happens before the caller updates the visibility map,
+ * this is correct.
+ */
+
+ /* Record the dead offset for vacuum */
+ prstate->deadoffsets[prstate->lpdead_items++] = offnum;
+
/*
* If the root entry had been a normal tuple, we are deleting it, so count
* it in the result. But changing a redirect (even to DEAD state) doesn't
@@ -892,21 +1302,121 @@ heap_prune_record_unchanged_lp_unused(Page page, PruneState *prstate, OffsetNumb
}
/*
- * Record LP_NORMAL line pointer that is left unchanged.
+ * Record line pointer that is left unchanged. We consider freezing it, and
+ * update bookkeeping of tuple counts and page visibility.
*/
static void
-heap_prune_record_unchanged_lp_normal(Page page, int8 *htsv, PruneState *prstate, OffsetNumber offnum)
+heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumber offnum)
{
HeapTupleHeader htup;
Assert(!prstate->processed[offnum]);
prstate->processed[offnum] = true;
- switch (htsv[offnum])
+ prstate->hastup = true; /* the page is not empty */
+
+ /*
+ * The criteria for counting a tuple as live in this block need to match
+ * what analyze.c's acquire_sample_rows() does, otherwise VACUUM and
+ * ANALYZE may produce wildly different reltuples values, e.g. when there
+ * are many recently-dead tuples.
+ *
+ * The logic here is a bit simpler than acquire_sample_rows(), as VACUUM
+ * can't run inside a transaction block, which makes some cases impossible
+ * (e.g. in-progress insert from the same transaction).
+ *
+ * HEAPTUPLE_DEAD are handled by the other heap_prune_record_*()
+ * subroutines. They don't count dead items like acquire_sample_rows()
+ * does, because we assume that all dead items will become LP_UNUSED
+ * before VACUUM finishes. This difference is only superficial. VACUUM
+ * effectively agrees with ANALYZE about DEAD items, in the end. VACUUM
+ * won't remember LP_DEAD items, but only because they're not supposed to
+ * be left behind when it is done. (Cases where we bypass index vacuuming
+ * will violate this optimistic assumption, but the overall impact of that
+ * should be negligible.)
+ */
+ htup = (HeapTupleHeader) PageGetItem(page, PageGetItemId(page, offnum));
+
+ switch (prstate->htsv[offnum])
{
case HEAPTUPLE_LIVE:
+
+ /*
+ * Count it as live. Not only is this natural, but it's also what
+ * acquire_sample_rows() does.
+ */
+ prstate->live_tuples++;
+
+ /*
+ * Is the tuple definitely visible to all transactions?
+ *
+ * NB: Like with per-tuple hint bits, we can't set the
+ * PD_ALL_VISIBLE flag if the inserter committed asynchronously.
+ * See SetHintBits for more info. Check that the tuple is hinted
+ * xmin-committed because of that.
+ */
+ if (prstate->all_visible)
+ {
+ TransactionId xmin;
+
+ if (!HeapTupleHeaderXminCommitted(htup))
+ {
+ prstate->all_visible = false;
+ break;
+ }
+
+ /*
+ * The inserter definitely committed. But is it old enough
+ * that everyone sees it as committed? A FrozenTransactionId
+ * is seen as committed to everyone. Otherwise, we check if
+ * there is a snapshot that considers this xid to still be
+ * running, and if so, we don't consider the page all-visible.
+ */
+ xmin = HeapTupleHeaderGetXmin(htup);
+
+ /*
+ * For now always use prstate->cutoffs for this test, because
+ * we only update 'all_visible' when freezing is requested. We
+ * could use GlobalVisTestIsRemovableXid instead, if a
+ * non-freezing caller wanted to set the VM bit.
+ */
+ Assert(prstate->cutoffs);
+ if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
+ {
+ prstate->all_visible = false;
+ break;
+ }
+
+ /* Track newest xmin on page. */
+ if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
+ TransactionIdIsNormal(xmin))
+ prstate->visibility_cutoff_xid = xmin;
+ }
+ break;
+
+ case HEAPTUPLE_RECENTLY_DEAD:
+ prstate->recently_dead_tuples++;
+ prstate->all_visible = false;
+
+ /*
+ * This tuple will soon become DEAD. Update the hint field so
+ * that the page is reconsidered for pruning in future.
+ */
+ heap_prune_record_prunable(prstate,
+ HeapTupleHeaderGetUpdateXid(htup));
+ break;
+
case HEAPTUPLE_INSERT_IN_PROGRESS:
+ /*
+ * We do not count these rows as live, because we expect the
+ * inserting transaction to update the counters at commit, and we
+ * assume that will happen only after we report our results. This
+ * assumption is a bit shaky, but it is what acquire_sample_rows()
+ * does, so be consistent.
+ */
+ prstate->all_visible = false;
+
/*
* If we wanted to optimize for aborts, we might consider marking
* the page prunable when we see INSERT_IN_PROGRESS. But we
@@ -915,10 +1425,15 @@ heap_prune_record_unchanged_lp_normal(Page page, int8 *htsv, PruneState *prstate
*/
break;
- case HEAPTUPLE_RECENTLY_DEAD:
case HEAPTUPLE_DELETE_IN_PROGRESS:
- htup = (HeapTupleHeader) PageGetItem(page, PageGetItemId(page, offnum));
+ /*
+ * This is an expected case during concurrent vacuum. Count such
+ * rows as live. As above, we assume the deleting transaction
+ * will commit and update the counters after we report.
+ */
+ prstate->live_tuples++;
+ prstate->all_visible = false;
/*
* This tuple may soon become DEAD. Update the hint field so that
@@ -928,16 +1443,40 @@ heap_prune_record_unchanged_lp_normal(Page page, int8 *htsv, PruneState *prstate
HeapTupleHeaderGetUpdateXid(htup));
break;
-
default:
/*
* DEAD tuples should've been passed to heap_prune_record_dead()
* or heap_prune_record_unused() instead.
*/
- elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result %d", htsv[offnum]);
+ elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result %d",
+ prstate->htsv[offnum]);
break;
}
+
+ /* Consider freezing any normal tuples which will not be removed */
+ if (prstate->freeze)
+ {
+ bool totally_frozen;
+
+ if ((heap_prepare_freeze_tuple(htup,
+ prstate->cutoffs,
+ &prstate->pagefrz,
+ &prstate->frozen[prstate->nfrozen],
+ &totally_frozen)))
+ {
+ /* Save prepared freeze plan for later */
+ prstate->frozen[prstate->nfrozen++].offset = offnum;
+ }
+
+ /*
+ * If any tuple isn't either totally frozen already or eligible to
+ * become totally frozen (according to its freeze plan), then the page
+ * definitely cannot be set all-frozen in the visibility map later on.
+ */
+ if (!totally_frozen)
+ prstate->all_frozen = false;
+ }
}
@@ -949,6 +1488,24 @@ heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber
{
Assert(!prstate->processed[offnum]);
prstate->processed[offnum] = true;
+
+ /*
+ * Deliberately don't set hastup for LP_DEAD items. We make the soft
+ * assumption that any LP_DEAD items encountered here will become
+ * LP_UNUSED later on, before count_nondeletable_pages is reached. If we
+ * don't make this assumption then rel truncation will only happen every
+ * other VACUUM, at most. Besides, VACUUM must treat
+ * hastup/nonempty_pages as provisional no matter how LP_DEAD items are
+ * handled (handled here, or handled later on).
+ *
+ * Similarly, don't unset all_visible until later, at the end of
+ * heap_page_prune_and_freeze(). This will allow us to attempt to freeze
+ * the page after pruning. As long as we unset it before updating the
+ * visibility map, this will be correct.
+ */
+
+ /* Record the dead offset for vacuum */
+ prstate->deadoffsets[prstate->lpdead_items++] = offnum;
}
/*
@@ -957,12 +1514,20 @@ heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber
static void
heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetNumber offnum)
{
+ /*
+ * A redirect line pointer doesn't count as a live tuple.
+ *
+ * If we leave a redirect line pointer in place, there will be another
+ * tuple on the page that it points to. We will do the bookkeeping for
+ * that separately. So we have nothing to do here, except remember that
+ * we processed this item.
+ */
Assert(!prstate->processed[offnum]);
prstate->processed[offnum] = true;
}
/*
- * Perform the actual page changes needed by heap_page_prune.
+ * Perform the actual page changes needed by heap_page_prune_and_freeze().
*
* If 'lp_truncate_only' is set, we are merely marking LP_DEAD line pointers
* as unused, not redirecting or removing anything else. The
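
(Aside, to make the new API easier to review at a glance: the entry point ends
up with two call shapes in this patch set. The sketch below is condensed from
the call sites in heap_page_prune_opt() and lazy_scan_prune(); the local
variables -- cutoffs, nindexes, off_loc, new_relfrozen_xid, new_relmin_mxid --
stand in for whatever the caller tracks.)

    /* 1. On-access pruning: no freezing, no relfrozenxid/relminmxid tracking */
    PruneFreezeResult presult;
    OffsetNumber dummy_off_loc;

    heap_page_prune_and_freeze(relation, buffer, vistest, 0,
                               NULL,        /* no freeze cutoffs needed */
                               &presult, PRUNE_ON_ACCESS, &dummy_off_loc,
                               NULL, NULL); /* no XID/MXID tracking */

    /* 2. VACUUM's first heap pass: prune, freeze, and track the new cutoffs */
    int         prune_options = HEAP_PAGE_PRUNE_FREEZE;

    if (nindexes == 0)
        prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;

    heap_page_prune_and_freeze(relation, buffer, vistest, prune_options,
                               &cutoffs, &presult, PRUNE_VACUUM_SCAN, &off_loc,
                               &new_relfrozen_xid, &new_relmin_mxid);
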
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 5fb8f7727b..ace95a4de2 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -439,12 +439,13 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
* as an upper bound on the XIDs stored in the pages we'll actually scan
* (NewRelfrozenXid tracking must never be allowed to miss unfrozen XIDs).
*
- * Next acquire vistest, a related cutoff that's used in heap_page_prune.
- * We expect vistest will always make heap_page_prune remove any deleted
- * tuple whose xmax is < OldestXmin. lazy_scan_prune must never become
- * confused about whether a tuple should be frozen or removed. (In the
- * future we might want to teach lazy_scan_prune to recompute vistest from
- * time to time, to increase the number of dead tuples it can prune away.)
+ * Next acquire vistest, a related cutoff that's used in pruning. We
+ * expect vistest will always make heap_page_prune_and_freeze() remove any
+ * deleted tuple whose xmax is < OldestXmin. lazy_scan_prune must never
+ * become confused about whether a tuple should be frozen or removed. (In
+ * the future we might want to teach lazy_scan_prune to recompute vistest
+ * from time to time, to increase the number of dead tuples it can prune
+ * away.)
*/
vacrel->aggressive = vacuum_get_cutoffs(rel, params, &vacrel->cutoffs);
vacrel->rel_pages = orig_rel_pages = RelationGetNumberOfBlocks(rel);
@@ -1382,27 +1383,18 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
return false;
}
+/* qsort comparator for sorting OffsetNumbers */
+static int
+cmpOffsetNumbers(const void *a, const void *b)
+{
+ return pg_cmp_u16(*(const OffsetNumber *) a, *(const OffsetNumber *) b);
+}
+
/*
* lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
*
* Caller must hold pin and buffer cleanup lock on the buffer.
*
- * Prior to PostgreSQL 14 there were very rare cases where heap_page_prune()
- * was allowed to disagree with our HeapTupleSatisfiesVacuum() call about
- * whether or not a tuple should be considered DEAD. This happened when an
- * inserting transaction concurrently aborted (after our heap_page_prune()
- * call, before our HeapTupleSatisfiesVacuum() call). There was rather a lot
- * of complexity just so we could deal with tuples that were DEAD to VACUUM,
- * but nevertheless were left with storage after pruning.
- *
- * As of Postgres 17, we circumvent this problem altogether by reusing the
- * result of heap_page_prune()'s visibility check. Without the second call to
- * HeapTupleSatisfiesVacuum(), there is no new HTSV_Result and there can be no
- * disagreement. We'll just handle such tuples as if they had become fully dead
- * right after this operation completes instead of in the middle of it. Note that
- * any tuple that becomes dead after the call to heap_page_prune() can't need to
- * be frozen, because it was visible to another session when vacuum started.
- *
* vmbuffer is the buffer containing the VM block with visibility information
* for the heap block, blkno. all_visible_according_to_vm is the saved
* visibility status of the heap block looked up earlier by the caller. We
@@ -1421,330 +1413,46 @@ lazy_scan_prune(LVRelState *vacrel,
bool *has_lpdead_items)
{
Relation rel = vacrel->rel;
- OffsetNumber offnum,
- maxoff;
- ItemId itemid;
- PruneResult presult;
- int tuples_frozen,
- lpdead_items,
- live_tuples,
- recently_dead_tuples;
- HeapPageFreeze pagefrz;
- bool hastup = false;
- bool all_visible,
- all_frozen;
- TransactionId visibility_cutoff_xid;
+ PruneFreezeResult presult;
int prune_options = 0;
- int64 fpi_before = pgWalUsage.wal_fpi;
- OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
- HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
Assert(BufferGetBlockNumber(buf) == blkno);
/*
- * maxoff might be reduced following line pointer array truncation in
- * heap_page_prune. That's safe for us to ignore, since the reclaimed
- * space will continue to look like LP_UNUSED items below.
- */
- maxoff = PageGetMaxOffsetNumber(page);
-
- /* Initialize (or reset) page-level state */
- pagefrz.freeze_required = false;
- pagefrz.FreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
- pagefrz.FreezePageRelminMxid = vacrel->NewRelminMxid;
- pagefrz.NoFreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
- pagefrz.NoFreezePageRelminMxid = vacrel->NewRelminMxid;
- tuples_frozen = 0;
- lpdead_items = 0;
- live_tuples = 0;
- recently_dead_tuples = 0;
-
- /*
- * Prune all HOT-update chains in this page.
- *
- * We count the number of tuples removed from the page by the pruning step
- * in presult.ndeleted. It should not be confused with lpdead_items;
- * lpdead_items's final value can be thought of as the number of tuples
- * that were deleted from indexes.
+ * Prune all HOT-update chains and potentially freeze tuples on this page.
*
* If the relation has no indexes, we can immediately mark would-be dead
* items LP_UNUSED.
- */
- prune_options = 0;
- if (vacrel->nindexes == 0)
- prune_options = HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
- heap_page_prune(rel, buf, vacrel->vistest, prune_options,
- &presult, PRUNE_VACUUM_SCAN, &vacrel->offnum);
-
- /*
- * We will update the VM after collecting LP_DEAD items and freezing
- * tuples. Keep track of whether or not the page is all_visible and
- * all_frozen and use this information to update the VM. all_visible
- * implies 0 lpdead_items, but don't trust all_frozen result unless
- * all_visible is also set to true.
*
- * Also keep track of the visibility cutoff xid for recovery conflicts.
- */
- all_visible = true;
- all_frozen = true;
- visibility_cutoff_xid = InvalidTransactionId;
-
- /*
- * Now scan the page to collect LP_DEAD items and update the variables set
- * just above.
- */
- for (offnum = FirstOffsetNumber;
- offnum <= maxoff;
- offnum = OffsetNumberNext(offnum))
- {
- HeapTupleHeader htup;
- bool totally_frozen;
-
- /*
- * Set the offset number so that we can display it along with any
- * error that occurred while processing this tuple.
- */
- vacrel->offnum = offnum;
- itemid = PageGetItemId(page, offnum);
-
- if (!ItemIdIsUsed(itemid))
- continue;
-
- /* Redirect items mustn't be touched */
- if (ItemIdIsRedirected(itemid))
- {
- /* page makes rel truncation unsafe */
- hastup = true;
- continue;
- }
-
- if (ItemIdIsDead(itemid))
- {
- /*
- * Deliberately don't set hastup for LP_DEAD items. We make the
- * soft assumption that any LP_DEAD items encountered here will
- * become LP_UNUSED later on, before count_nondeletable_pages is
- * reached. If we don't make this assumption then rel truncation
- * will only happen every other VACUUM, at most. Besides, VACUUM
- * must treat hastup/nonempty_pages as provisional no matter how
- * LP_DEAD items are handled (handled here, or handled later on).
- *
- * Also deliberately delay unsetting all_visible until just before
- * we return to lazy_scan_heap caller, as explained in full below.
- * (This is another case where it's useful to anticipate that any
- * LP_DEAD items will become LP_UNUSED during the ongoing VACUUM.)
- */
- deadoffsets[lpdead_items++] = offnum;
- continue;
- }
-
- Assert(ItemIdIsNormal(itemid));
-
- htup = (HeapTupleHeader) PageGetItem(page, itemid);
-
- /*
- * The criteria for counting a tuple as live in this block need to
- * match what analyze.c's acquire_sample_rows() does, otherwise VACUUM
- * and ANALYZE may produce wildly different reltuples values, e.g.
- * when there are many recently-dead tuples.
- *
- * The logic here is a bit simpler than acquire_sample_rows(), as
- * VACUUM can't run inside a transaction block, which makes some cases
- * impossible (e.g. in-progress insert from the same transaction).
- *
- * We treat LP_DEAD items (which are the closest thing to DEAD tuples
- * that might be seen here) differently, too: we assume that they'll
- * become LP_UNUSED before VACUUM finishes. This difference is only
- * superficial. VACUUM effectively agrees with ANALYZE about DEAD
- * items, in the end. VACUUM won't remember LP_DEAD items, but only
- * because they're not supposed to be left behind when it is done.
- * (Cases where we bypass index vacuuming will violate this optimistic
- * assumption, but the overall impact of that should be negligible.)
- */
- switch (htsv_get_valid_status(presult.htsv[offnum]))
- {
- case HEAPTUPLE_LIVE:
-
- /*
- * Count it as live. Not only is this natural, but it's also
- * what acquire_sample_rows() does.
- */
- live_tuples++;
-
- /*
- * Is the tuple definitely visible to all transactions?
- *
- * NB: Like with per-tuple hint bits, we can't set the
- * PD_ALL_VISIBLE flag if the inserter committed
- * asynchronously. See SetHintBits for more info. Check that
- * the tuple is hinted xmin-committed because of that.
- */
- if (all_visible)
- {
- TransactionId xmin;
-
- if (!HeapTupleHeaderXminCommitted(htup))
- {
- all_visible = false;
- break;
- }
-
- /*
- * The inserter definitely committed. But is it old enough
- * that everyone sees it as committed?
- */
- xmin = HeapTupleHeaderGetXmin(htup);
- if (!TransactionIdPrecedes(xmin,
- vacrel->cutoffs.OldestXmin))
- {
- all_visible = false;
- break;
- }
-
- /* Track newest xmin on page. */
- if (TransactionIdFollows(xmin, visibility_cutoff_xid) &&
- TransactionIdIsNormal(xmin))
- visibility_cutoff_xid = xmin;
- }
- break;
- case HEAPTUPLE_RECENTLY_DEAD:
-
- /*
- * If tuple is recently dead then we must not remove it from
- * the relation. (We only remove items that are LP_DEAD from
- * pruning.)
- */
- recently_dead_tuples++;
- all_visible = false;
- break;
- case HEAPTUPLE_INSERT_IN_PROGRESS:
-
- /*
- * We do not count these rows as live, because we expect the
- * inserting transaction to update the counters at commit, and
- * we assume that will happen only after we report our
- * results. This assumption is a bit shaky, but it is what
- * acquire_sample_rows() does, so be consistent.
- */
- all_visible = false;
- break;
- case HEAPTUPLE_DELETE_IN_PROGRESS:
- /* This is an expected case during concurrent vacuum */
- all_visible = false;
-
- /*
- * Count such rows as live. As above, we assume the deleting
- * transaction will commit and update the counters after we
- * report.
- */
- live_tuples++;
- break;
- default:
- elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
- break;
- }
-
- hastup = true; /* page makes rel truncation unsafe */
-
- /* Tuple with storage -- consider need to freeze */
- if (heap_prepare_freeze_tuple(htup, &vacrel->cutoffs, &pagefrz,
- &frozen[tuples_frozen], &totally_frozen))
- {
- /* Save prepared freeze plan for later */
- frozen[tuples_frozen++].offset = offnum;
- }
-
- /*
- * If any tuple isn't either totally frozen already or eligible to
- * become totally frozen (according to its freeze plan), then the page
- * definitely cannot be set all-frozen in the visibility map later on
- */
- if (!totally_frozen)
- all_frozen = false;
- }
-
- /*
- * We have now divided every item on the page into either an LP_DEAD item
- * that will need to be vacuumed in indexes later, or a LP_NORMAL tuple
- * that remains and needs to be considered for freezing now (LP_UNUSED and
- * LP_REDIRECT items also remain, but are of no further interest to us).
- */
- vacrel->offnum = InvalidOffsetNumber;
-
- /*
- * Freeze the page when heap_prepare_freeze_tuple indicates that at least
- * one XID/MXID from before FreezeLimit/MultiXactCutoff is present. Also
- * freeze when pruning generated an FPI, if doing so means that we set the
- * page all-frozen afterwards (might not happen until final heap pass).
+ * The number of tuples removed from the page is returned in
+ * presult.ndeleted. It should not be confused with presult.lpdead_items;
+ * presult.lpdead_items's final value can be thought of as the number of
+ * tuples that were deleted from indexes.
+ *
+ * We will update the VM after collecting LP_DEAD items and freezing
+ * tuples. Pruning will have determined whether or not the page is
+ * all-visible.
*/
- if (pagefrz.freeze_required || tuples_frozen == 0 ||
- (all_visible && all_frozen &&
- fpi_before != pgWalUsage.wal_fpi))
- {
- /*
- * We're freezing the page. Our final NewRelfrozenXid doesn't need to
- * be affected by the XIDs that are just about to be frozen anyway.
- */
- vacrel->NewRelfrozenXid = pagefrz.FreezePageRelfrozenXid;
- vacrel->NewRelminMxid = pagefrz.FreezePageRelminMxid;
-
- if (tuples_frozen == 0)
- {
- /*
- * We have no freeze plans to execute, so there's no added cost
- * from following the freeze path. That's why it was chosen. This
- * is important in the case where the page only contains totally
- * frozen tuples at this point (perhaps only following pruning).
- * Such pages can be marked all-frozen in the VM by our caller,
- * even though none of its tuples were newly frozen here (note
- * that the "no freeze" path never sets pages all-frozen).
- *
- * We never increment the frozen_pages instrumentation counter
- * here, since it only counts pages with newly frozen tuples
- * (don't confuse that with pages newly set all-frozen in VM).
- */
- }
- else
- {
- TransactionId snapshotConflictHorizon;
+ prune_options = HEAP_PAGE_PRUNE_FREEZE;
+ if (vacrel->nindexes == 0)
+ prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
- vacrel->frozen_pages++;
+ heap_page_prune_and_freeze(rel, buf, vacrel->vistest, prune_options,
+ &vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
+ &vacrel->offnum,
+ &vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
- /*
- * We can use visibility_cutoff_xid as our cutoff for conflicts
- * when the whole page is eligible to become all-frozen in the VM
- * once we're done with it. Otherwise we generate a conservative
- * cutoff by stepping back from OldestXmin.
- */
- if (all_visible && all_frozen)
- {
- /* Using same cutoff when setting VM is now unnecessary */
- snapshotConflictHorizon = visibility_cutoff_xid;
- visibility_cutoff_xid = InvalidTransactionId;
- }
- else
- {
- /* Avoids false conflicts when hot_standby_feedback in use */
- snapshotConflictHorizon = vacrel->cutoffs.OldestXmin;
- TransactionIdRetreat(snapshotConflictHorizon);
- }
+ Assert(MultiXactIdIsValid(vacrel->NewRelminMxid));
+ Assert(TransactionIdIsValid(vacrel->NewRelfrozenXid));
- /* Execute all freeze plans for page as a single atomic action */
- heap_freeze_execute_prepared(vacrel->rel, buf,
- snapshotConflictHorizon,
- frozen, tuples_frozen);
- }
- }
- else
+ if (presult.nfrozen > 0)
{
/*
- * Page requires "no freeze" processing. It might be set all-visible
- * in the visibility map, but it can never be set all-frozen.
+ * We don't increment the frozen_pages instrumentation counter when
+ * nfrozen == 0, since it only counts pages with newly frozen tuples
+ * (don't confuse that with pages newly set all-frozen in VM).
*/
- vacrel->NewRelfrozenXid = pagefrz.NoFreezePageRelfrozenXid;
- vacrel->NewRelminMxid = pagefrz.NoFreezePageRelminMxid;
- all_frozen = false;
- tuples_frozen = 0; /* avoid miscounts in instrumentation */
+ vacrel->frozen_pages++;
}
/*
@@ -1756,71 +1464,71 @@ lazy_scan_prune(LVRelState *vacrel,
*/
#ifdef USE_ASSERT_CHECKING
/* Note that all_frozen value does not matter when !all_visible */
- if (all_visible && lpdead_items == 0)
+ if (presult.all_visible)
{
TransactionId debug_cutoff;
bool debug_all_frozen;
+ Assert(presult.lpdead_items == 0);
+
if (!heap_page_is_all_visible(vacrel, buf,
&debug_cutoff, &debug_all_frozen))
Assert(false);
+ Assert(presult.all_frozen == debug_all_frozen);
+
Assert(!TransactionIdIsValid(debug_cutoff) ||
- debug_cutoff == visibility_cutoff_xid);
+ debug_cutoff == presult.vm_conflict_horizon);
}
#endif
/*
* Now save details of the LP_DEAD items from the page in vacrel
*/
- if (lpdead_items > 0)
+ if (presult.lpdead_items > 0)
{
vacrel->lpdead_item_pages++;
- dead_items_add(vacrel, blkno, deadoffsets, lpdead_items);
-
/*
- * It was convenient to ignore LP_DEAD items in all_visible earlier on
- * to make the choice of whether or not to freeze the page unaffected
- * by the short-term presence of LP_DEAD items. These LP_DEAD items
- * were effectively assumed to be LP_UNUSED items in the making. It
- * doesn't matter which heap pass (initial pass or final pass) ends up
- * setting the page all-frozen, as long as the ongoing VACUUM does it.
- *
- * Now that freezing has been finalized, unset all_visible. It needs
- * to reflect the present state of things, as expected by our caller.
+ * deadoffsets are collected incrementally in
+ * heap_page_prune_and_freeze() as each dead line pointer is recorded,
+ * with an indeterminate order, but dead_items_add requires them to be
+ * sorted.
*/
- all_visible = false;
+ qsort(presult.deadoffsets, presult.lpdead_items, sizeof(OffsetNumber),
+ cmpOffsetNumbers);
+
+ dead_items_add(vacrel, blkno, presult.deadoffsets, presult.lpdead_items);
}
/* Finally, add page-local counts to whole-VACUUM counts */
vacrel->tuples_deleted += presult.ndeleted;
- vacrel->tuples_frozen += tuples_frozen;
- vacrel->lpdead_items += lpdead_items;
- vacrel->live_tuples += live_tuples;
- vacrel->recently_dead_tuples += recently_dead_tuples;
+ vacrel->tuples_frozen += presult.nfrozen;
+ vacrel->lpdead_items += presult.lpdead_items;
+ vacrel->live_tuples += presult.live_tuples;
+ vacrel->recently_dead_tuples += presult.recently_dead_tuples;
/* Can't truncate this page */
- if (hastup)
+ if (presult.hastup)
vacrel->nonempty_pages = blkno + 1;
/* Did we find LP_DEAD items? */
- *has_lpdead_items = (lpdead_items > 0);
+ *has_lpdead_items = (presult.lpdead_items > 0);
- Assert(!all_visible || !(*has_lpdead_items));
+ Assert(!presult.all_visible || !(*has_lpdead_items));
/*
* Handle setting visibility map bit based on information from the VM (as
* of last heap_vac_scan_next_block() call), and from all_visible and
* all_frozen variables
*/
- if (!all_visible_according_to_vm && all_visible)
+ if (!all_visible_according_to_vm && presult.all_visible)
{
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
- if (all_frozen)
+ if (presult.all_frozen)
{
- Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+ Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
flags |= VISIBILITYMAP_ALL_FROZEN;
}
@@ -1840,7 +1548,7 @@ lazy_scan_prune(LVRelState *vacrel,
PageSetAllVisible(page);
MarkBufferDirty(buf);
visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
- vmbuffer, visibility_cutoff_xid,
+ vmbuffer, presult.vm_conflict_horizon,
flags);
}
@@ -1873,7 +1581,7 @@ lazy_scan_prune(LVRelState *vacrel,
* There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
* however.
*/
- else if (lpdead_items > 0 && PageIsAllVisible(page))
+ else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
{
elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
vacrel->relname, blkno);
@@ -1888,8 +1596,8 @@ lazy_scan_prune(LVRelState *vacrel,
* it as all-frozen. Note that all_frozen is only valid if all_visible is
* true, so we must check both all_visible and all_frozen.
*/
- else if (all_visible_according_to_vm && all_visible &&
- all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
+ else if (all_visible_according_to_vm && presult.all_visible &&
+ presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
{
/*
* Avoid relying on all_visible_according_to_vm as a proxy for the
@@ -1905,11 +1613,11 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* Set the page all-frozen (and all-visible) in the VM.
*
- * We can pass InvalidTransactionId as our visibility_cutoff_xid,
- * since a snapshotConflictHorizon sufficient to make everything safe
- * for REDO was logged when the page's tuples were frozen.
+ * We can pass InvalidTransactionId as our cutoff_xid, since a
+ * snapshotConflictHorizon sufficient to make everything safe for REDO
+ * was logged when the page's tuples were frozen.
*/
- Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+ Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
vmbuffer, InvalidTransactionId,
VISIBILITYMAP_ALL_VISIBLE |
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index b632fe953c..536711d98e 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -36,8 +36,9 @@
#define HEAP_INSERT_NO_LOGICAL TABLE_INSERT_NO_LOGICAL
#define HEAP_INSERT_SPECULATIVE 0x0010
-/* "options" flag bits for heap_page_prune */
+/* "options" flag bits for heap_page_prune_and_freeze */
#define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW (1 << 0)
+#define HEAP_PAGE_PRUNE_FREEZE (1 << 1)
typedef struct BulkInsertStateData *BulkInsertState;
struct TupleTableSlot;
@@ -195,24 +196,47 @@ typedef struct HeapPageFreeze
} HeapPageFreeze;
/*
- * Per-page state returned from pruning
+ * Per-page state returned by heap_page_prune_and_freeze()
*/
-typedef struct PruneResult
+typedef struct PruneFreezeResult
{
int ndeleted; /* Number of tuples deleted from the page */
int nnewlpdead; /* Number of newly LP_DEAD items */
+ int nfrozen; /* Number of tuples we froze */
+
+ /* Number of live and recently dead tuples on the page, after pruning */
+ int live_tuples;
+ int recently_dead_tuples;
/*
- * Tuple visibility is only computed once for each tuple, for correctness
- * and efficiency reasons; see comment in heap_page_prune() for details.
- * This is of type int8[], instead of HTSV_Result[], so we can use -1 to
- * indicate no visibility has been computed, e.g. for LP_DEAD items.
+ * all_visible and all_frozen indicate if the all-visible and all-frozen
+ * bits in the visibility map can be set for this page, after pruning.
+ *
+ * vm_conflict_horizon is the newest xmin of live tuples on the page. The
+ * caller can use it as the conflict horizon when setting the VM bits. It
+ * is only meaningful when 'all_visible' is true and 'all_frozen' is false;
+ * if the page is all-frozen it is InvalidTransactionId, since no conflict
+ * is possible.
*
- * This needs to be MaxHeapTuplesPerPage + 1 long as FirstOffsetNumber is
- * 1. Otherwise every access would need to subtract 1.
+ * These are only set if the HEAP_PAGE_PRUNE_FREEZE option is set.
*/
- int8 htsv[MaxHeapTuplesPerPage + 1];
-} PruneResult;
+ bool all_visible;
+ bool all_frozen;
+ TransactionId vm_conflict_horizon;
+
+ /*
+ * Whether or not the page makes rel truncation unsafe. This is not set
+ * merely because the page contains LP_DEAD items; VACUUM will remove
+ * them before attempting to truncate.
+ */
+ bool hastup;
+
+ /*
+ * LP_DEAD items on the page after pruning. Includes existing LP_DEAD
+ * items.
+ */
+ int lpdead_items;
+ OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
+} PruneFreezeResult;
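
(Aside: a rough sketch of how a freezing caller is expected to consume this
struct, mirroring what lazy_scan_prune() does in this patch set; rel, blkno,
buf and vmbuffer here are the caller's own context.)

    if (presult.lpdead_items > 0)
    {
        /* deadoffsets is filled in an indeterminate order; sort before use */
        qsort(presult.deadoffsets, presult.lpdead_items,
              sizeof(OffsetNumber), cmpOffsetNumbers);

        /* ... remember them for index vacuuming / the second heap pass ... */
    }

    if (presult.all_visible)
    {
        uint8       flags = VISIBILITYMAP_ALL_VISIBLE;

        if (presult.all_frozen)
            flags |= VISIBILITYMAP_ALL_FROZEN;  /* vm_conflict_horizon is invalid */

        visibilitymap_set(rel, blkno, buf, InvalidXLogRecPtr,
                          vmbuffer, presult.vm_conflict_horizon, flags);
    }
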
/* 'reason' codes for heap_page_prune() */
typedef enum
@@ -222,20 +246,6 @@ typedef enum
PRUNE_VACUUM_CLEANUP, /* VACUUM 2nd heap pass */
} PruneReason;
-/*
- * Pruning calculates tuple visibility once and saves the results in an array
- * of int8. See PruneResult.htsv for details. This helper function is meant to
- * guard against examining visibility status array members which have not yet
- * been computed.
- */
-static inline HTSV_Result
-htsv_get_valid_status(int status)
-{
- Assert(status >= HEAPTUPLE_DEAD &&
- status <= HEAPTUPLE_DELETE_IN_PROGRESS);
- return (HTSV_Result) status;
-}
-
/* ----------------
* function prototypes for heap access method
*
@@ -309,9 +319,11 @@ extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
const struct VacuumCutoffs *cutoffs,
HeapPageFreeze *pagefrz,
HeapTupleFreeze *frz, bool *totally_frozen);
-extern void heap_freeze_execute_prepared(Relation rel, Buffer buffer,
- TransactionId snapshotConflictHorizon,
- HeapTupleFreeze *tuples, int ntuples);
+
+extern void heap_pre_freeze_checks(Buffer buffer,
+ HeapTupleFreeze *tuples, int ntuples);
+extern void heap_freeze_prepared_tuples(Buffer buffer,
+ HeapTupleFreeze *tuples, int ntuples);
extern bool heap_freeze_tuple(HeapTupleHeader tuple,
TransactionId relfrozenxid, TransactionId relminmxid,
TransactionId FreezeLimit, TransactionId MultiXactCutoff);
@@ -332,12 +344,15 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
/* in heap/pruneheap.c */
struct GlobalVisState;
extern void heap_page_prune_opt(Relation relation, Buffer buffer);
-extern void heap_page_prune(Relation relation, Buffer buffer,
- struct GlobalVisState *vistest,
- int options,
- PruneResult *presult,
- PruneReason reason,
- OffsetNumber *off_loc);
+extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ struct GlobalVisState *vistest,
+ int options,
+ struct VacuumCutoffs *cutoffs,
+ PruneFreezeResult *presult,
+ PruneReason reason,
+ OffsetNumber *off_loc,
+ TransactionId *new_relfrozen_xid,
+ MultiXactId *new_relmin_mxid);
extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
OffsetNumber *redirected, int nredirected,
OffsetNumber *nowdead, int ndead,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 8bc8dd6f1c..46edffef38 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2195,7 +2195,7 @@ PromptInterruptContext
ProtocolVersion
PrsStorage
+PruneFreezeResult
PruneReason
-PruneResult
PruneState
PruneStepResult
PsqlScanCallbacks
--
2.40.1
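
To summarize the heuristic that the XXX comment above alludes to, the
opportunistic freeze decision in heap_page_prune_and_freeze() boils down to
roughly the following (a condensed, illustrative restatement of the combined
patch above, not additional logic):

    do_freeze = false;
    if (prstate.freeze)
    {
        if (prstate.pagefrz.freeze_required)
        {
            /* must freeze to be able to advance relfrozenxid/relminmxid */
            do_freeze = true;
        }
        else if (prstate.all_visible && prstate.all_frozen && prstate.nfrozen > 0)
        {
            /*
             * Freezing would leave the page all-frozen.  Freeze
             * opportunistically if setting hint bits already emitted an FPI,
             * or if the pruning/hint changes we are about to make would
             * cause one anyway.
             */
            if (hint_bit_fpi ||
                ((do_prune || do_hint) && XLogCheckBufferNeedsBackup(buffer)))
                do_freeze = true;
        }
    }
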
Attachment: v13-0001-Refactor-how-heap_prune_chain-updates-prunable_x.patch (text/x-patch)
From 9734a59cbe7caa8e62327686008c0a718f591838 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Tue, 2 Apr 2024 15:47:06 +0300
Subject: [PATCH v13 1/4] Refactor how heap_prune_chain() updates prunable_xid
In preparation for freezing and counting tuples which are not
candidates for pruning, split heap_prune_record_unchanged() into
multiple functions, depending on the kind of line pointer. That's not too
interesting right now, but makes the next commit smaller.
Recording the lowest soon-to-be prunable xid is one of the actions we
take for unchanged LP_NORMAL item pointers but not for others, so move
that to the new heap_prune_record_unchanged_lp_normal() function. The
next commit will add more actions to these functions.
Author: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://www.postgresql.org/message-id/20240330055710.kqg6ii2cdojsxgje@liskov
---
src/backend/access/heap/pruneheap.c | 125 ++++++++++++++++++++--------
1 file changed, 92 insertions(+), 33 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index ef563e19aa..1b5bf990d2 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -78,7 +78,11 @@ static void heap_prune_record_redirect(PruneState *prstate,
static void heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum, bool was_normal);
static void heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum, bool was_normal);
static void heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum, bool was_normal);
-static void heap_prune_record_unchanged(PruneState *prstate, OffsetNumber offnum);
+
+static void heap_prune_record_unchanged_lp_unused(Page page, PruneState *prstate, OffsetNumber offnum);
+static void heap_prune_record_unchanged_lp_normal(Page page, int8 *htsv, PruneState *prstate, OffsetNumber offnum);
+static void heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber offnum);
+static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetNumber offnum);
static void page_verify_redirects(Page page);
@@ -311,7 +315,7 @@ heap_page_prune(Relation relation, Buffer buffer,
/* Nothing to do if slot doesn't contain a tuple */
if (!ItemIdIsUsed(itemid))
{
- heap_prune_record_unchanged(&prstate, offnum);
+ heap_prune_record_unchanged_lp_unused(page, &prstate, offnum);
continue;
}
@@ -324,7 +328,7 @@ heap_page_prune(Relation relation, Buffer buffer,
if (unlikely(prstate.mark_unused_now))
heap_prune_record_unused(&prstate, offnum, false);
else
- heap_prune_record_unchanged(&prstate, offnum);
+ heap_prune_record_unchanged_lp_dead(page, &prstate, offnum);
continue;
}
@@ -434,7 +438,7 @@ heap_page_prune(Relation relation, Buffer buffer,
}
}
else
- heap_prune_record_unchanged(&prstate, offnum);
+ heap_prune_record_unchanged_lp_normal(page, presult->htsv, &prstate, offnum);
}
/* We should now have processed every tuple exactly once */
@@ -652,9 +656,6 @@ heap_prune_chain(Page page, BlockNumber blockno, OffsetNumber maxoff,
*/
chainitems[nchain++] = offnum;
- /*
- * Check tuple's visibility status.
- */
switch (htsv_get_valid_status(htsv[offnum]))
{
case HEAPTUPLE_DEAD:
@@ -670,9 +671,6 @@ heap_prune_chain(Page page, BlockNumber blockno, OffsetNumber maxoff,
case HEAPTUPLE_RECENTLY_DEAD:
/*
- * This tuple may soon become DEAD. Update the hint field so
- * that the page is reconsidered for pruning in future.
- *
* We don't need to advance the conflict horizon for
* RECENTLY_DEAD tuples, even if we are removing them. This
* is because we only remove RECENTLY_DEAD tuples if they
@@ -681,8 +679,6 @@ heap_prune_chain(Page page, BlockNumber blockno, OffsetNumber maxoff,
* tuple by virtue of being later in the chain. We will have
* advanced the conflict horizon for the DEAD tuple.
*/
- heap_prune_record_prunable(prstate,
- HeapTupleHeaderGetUpdateXid(htup));
/*
* Advance past RECENTLY_DEAD tuples just in case there's a
@@ -693,24 +689,8 @@ heap_prune_chain(Page page, BlockNumber blockno, OffsetNumber maxoff,
break;
case HEAPTUPLE_DELETE_IN_PROGRESS:
-
- /*
- * This tuple may soon become DEAD. Update the hint field so
- * that the page is reconsidered for pruning in future.
- */
- heap_prune_record_prunable(prstate,
- HeapTupleHeaderGetUpdateXid(htup));
- goto process_chain;
-
case HEAPTUPLE_LIVE:
case HEAPTUPLE_INSERT_IN_PROGRESS:
-
- /*
- * If we wanted to optimize for aborts, we might consider
- * marking the page prunable when we see INSERT_IN_PROGRESS.
- * But we don't. See related decisions about when to mark the
- * page prunable in heapam.c.
- */
goto process_chain;
default:
@@ -757,8 +737,15 @@ process_chain:
* No DEAD tuple was found, so the chain is entirely composed of
* normal, unchanged tuples. Leave it alone.
*/
- for (int i = 0; i < nchain; i++)
- heap_prune_record_unchanged(prstate, chainitems[i]);
+ int i = 0;
+
+ if (ItemIdIsRedirected(rootlp))
+ {
+ heap_prune_record_unchanged_lp_redirect(prstate, rootoffnum);
+ i++;
+ }
+ for (; i < nchain; i++)
+ heap_prune_record_unchanged_lp_normal(page, htsv, prstate, chainitems[i]);
}
else if (ndeadchain == nchain)
{
@@ -784,7 +771,7 @@ process_chain:
/* the rest of tuples in the chain are normal, unchanged tuples */
for (int i = ndeadchain; i < nchain; i++)
- heap_prune_record_unchanged(prstate, chainitems[i]);
+ heap_prune_record_unchanged_lp_normal(page, htsv, prstate, chainitems[i]);
}
}
@@ -894,9 +881,81 @@ heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum, bool was_norm
prstate->ndeleted++;
}
-/* Record a line pointer that is left unchanged */
+/*
+ * Record an unused line pointer that is left unchanged.
+ */
+static void
+heap_prune_record_unchanged_lp_unused(Page page, PruneState *prstate, OffsetNumber offnum)
+{
+ Assert(!prstate->processed[offnum]);
+ prstate->processed[offnum] = true;
+}
+
+/*
+ * Record LP_NORMAL line pointer that is left unchanged.
+ */
+static void
+heap_prune_record_unchanged_lp_normal(Page page, int8 *htsv, PruneState *prstate, OffsetNumber offnum)
+{
+ HeapTupleHeader htup;
+
+ Assert(!prstate->processed[offnum]);
+ prstate->processed[offnum] = true;
+
+ switch (htsv[offnum])
+ {
+ case HEAPTUPLE_LIVE:
+ case HEAPTUPLE_INSERT_IN_PROGRESS:
+
+ /*
+ * If we wanted to optimize for aborts, we might consider marking
+ * the page prunable when we see INSERT_IN_PROGRESS. But we
+ * don't. See related decisions about when to mark the page
+ * prunable in heapam.c.
+ */
+ break;
+
+ case HEAPTUPLE_RECENTLY_DEAD:
+ case HEAPTUPLE_DELETE_IN_PROGRESS:
+
+ htup = (HeapTupleHeader) PageGetItem(page, PageGetItemId(page, offnum));
+
+ /*
+ * This tuple may soon become DEAD. Update the hint field so that
+ * the page is reconsidered for pruning in future.
+ */
+ heap_prune_record_prunable(prstate,
+ HeapTupleHeaderGetUpdateXid(htup));
+ break;
+
+
+ default:
+
+ /*
+ * DEAD tuples should've been passed to heap_prune_record_dead()
+ * or heap_prune_record_unused() instead.
+ */
+ elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result %d", htsv[offnum]);
+ break;
+ }
+}
+
+
+/*
+ * Record line pointer that was already LP_DEAD and is left unchanged.
+ */
+static void
+heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber offnum)
+{
+ Assert(!prstate->processed[offnum]);
+ prstate->processed[offnum] = true;
+}
+
+/*
+ * Record LP_REDIRECT that is left unchanged.
+ */
static void
-heap_prune_record_unchanged(PruneState *prstate, OffsetNumber offnum)
+heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetNumber offnum)
{
Assert(!prstate->processed[offnum]);
prstate->processed[offnum] = true;
--
2.40.1
On Tue, Apr 02, 2024 at 01:24:27PM -0400, Melanie Plageman wrote:
On Tue, Apr 2, 2024 at 9:11 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
On 01/04/2024 20:22, Melanie Plageman wrote:
Review for 0003-0006 (I didn't have any new thoughts on 0002). I know
you didn't modify them much/at all, but I noticed some things in my code
that could be better.

Ok, here's what I have now. I made a lot of small comment changes here
and there, and some minor local refactorings, but nothing major. I lost
track of all the individual changes I'm afraid, so I'm afraid you'll
have to just diff against the previous version if you want to see what's
changed. I hope I didn't break anything.

I'm pretty happy with this now. I will skim through it one more time
later today or tomorrow, and commit. Please review once more if you have
a chance.

Thanks!

0001 looks good. Attached are some comment updates and such on top of
0001 and 0002.

I started some performance testing of 0002 but haven't finished yet. I
wanted to provide my other review first.
I tried to do some performance tests of just on-access HOT pruning with
the patches in this thread applied. I'm not sure if I succeeded in being
targeted enough to have usable results.
Off-list Andres gave me some suggestions of how to improve my test case
and setup and this is what I ended up doing:
----------------------------------------
On-access pruning during a SELECT query:
----------------------------------------
# Make a table with a single not NULL column of a small datatype to fit
# as many tuples as possible on the page so each page we prune exercises
# those loops in heap_page_prune_and_freeze() and heap_prune_chain() as
# much as possible
psql -c "create table small(col smallint not null)"
# Insert data that is the same except for ~1 row per page with a different value
for i in $(seq 1000)
do
psql -c "INSERT INTO small VALUES(2);" -c "INSERT INTO small SELECT 1 FROM (SELECT generate_series(1,220));"
done
# COPY this data to a file
psql -c "COPY small TO '/tmp/small.data';"
# Start postgres bound to a single CPU core
# Run the following script with pgbench
# Make the table an unlogged table so we don't see the effects of WAL writes in
# results
#
# Make sure autovacuum doesn't run on the table
drop table if exists small;
create unlogged table small(col smallint not null) with (autovacuum_enabled = false);
copy small from '/tmp/small.data';
update small set col = 9 where col = 2;
select * from small where col = 0;
pgbench -n -f query.sql -t 100 -M prepared -r
# (I made sure that HOT pruning was actually happening for the SELECT
# query before running this with pgbench)
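# (For reference -- not part of the benchmark itself: one way to confirm
# that the SELECT really pruned the page is to look at the line pointer
# flags with the pageinspect extension; lp_flags 2 (LP_REDIRECT) and
# 3 (LP_DEAD) only appear after pruning.)
psql -c "CREATE EXTENSION IF NOT EXISTS pageinspect;" \
     -c "SELECT lp, lp_flags, t_ctid FROM heap_page_items(get_raw_page('small', 0));"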
# There seemed to be no meaningful difference for this example with the
# patches:
on current master:
statement latencies in milliseconds and failures:
12.387 0 drop table if exists small;
1.914 0 create unlogged table small(col smallint not null) with (autovacuum_enabled = false);
100.152 0 copy small from '/tmp/small.data';
49.480 0 update small set col = 9 where col = 2;
46.835 0 select * from small where col = 0;
with the patches applied:
statement latencies in milliseconds and failures:
13.281 0 drop table if exists small;
1.952 0 create unlogged table small(col smallint not null) with (autovacuum_enabled = false);
99.418 0 copy small from '/tmp/small.data';
47.397 0 update small set col = 9 where col = 2;
46.887 0 select * from small where col = 0;
--------------------------------
On-access pruning during UPDATE:
--------------------------------
# The idea is to test a query which would be calling the new
# heap_prune_record_unchanged_lp_normal() function a lot
# I made the same table but filled entirely with the same value
psql -c "create table small(col smallint not null)" \
-c "INSERT INTO small SELECT 1 FROM (SELECT generate_series(1, 221000));"
# COPY this data to a file
psql -c "COPY small TO '/tmp/small_univalue.data';"
# Start postgres bound to a single CPU core
# Run the following script with pgbench
# Pick a low fillfactor so we have room for the HOT updates
drop table if exists small;
create unlogged table small(col smallint not null) with (autovacuum_enabled = false, fillfactor=25);
copy small from '/tmp/small_univalue.data';
update small set col = 3;
update small set col = 4;
update small set col = 5;
update small set col = 6;
pgbench -n -f update.sql -t 20 -M prepared -r
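# (For reference -- not part of the test itself: the n_tup_hot_upd counter
# in pg_stat_user_tables counts updates that went through the HOT path, so
# it can be used to confirm that the updates above are in fact HOT.)
psql -c "SELECT n_tup_upd, n_tup_hot_upd FROM pg_stat_user_tables WHERE relname = 'small';"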
# There again seems to be no meaningful difference with the patches
# applied
on master:
statement latencies in milliseconds and failures:
19.880 0 drop table if exists small;
2.099 0 create unlogged table small(col smallint not null) with (autovacuum_enabled = false, fillfactor=25);
130.793 0 copy small from '/tmp/small_univalue.data';
377.707 0 update small set col = 3;
417.644 0 update small set col = 4;
483.974 0 update small set col = 5;
422.956 0 update small set col = 6;
with patches applied:
statement latencies in milliseconds and failures:
19.995 0 drop table if exists small;
2.034 0 create unlogged table small(col smallint not null) with (autovacuum_enabled = false, fillfactor=25);
124.270 0 copy small from '/tmp/small_univalue.data';
375.327 0 update small set col = 3;
419.336 0 update small set col = 4;
483.750 0 update small set col = 5;
420.451 0 update small set col = 6;
- Melanie
On 02/04/2024 16:11, Heikki Linnakangas wrote:
On 01/04/2024 20:22, Melanie Plageman wrote:
Review for 0003-0006 (I didn't have any new thoughts on 0002). I know
you didn't modify them much/at all, but I noticed some things in my code
that could be better.

Ok, here's what I have now. I made a lot of small comment changes here
and there, and some minor local refactorings, but nothing major. I lost
track of all the individual changes I'm afraid, so I'm afraid you'll
have to just diff against the previous version if you want to see what's
changed. I hope I didn't break anything.

I'm pretty happy with this now. I will skim through it one more time
later today or tomorrow, and commit. Please review once more if you have
a chance.

This probably doesn't belong here. I noticed spgdoinsert.c had a static
function for sorting OffsetNumbers, but I didn't see anything general
purpose anywhere else.

I copied the spgdoinsert.c implementation to vacuumlazy.c as is. Would
be nice to have just one copy of this in some common place, but I also
wasn't sure where to put it.
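For reference, such a comparator is only a few lines. A minimal sketch of a
qsort()-style comparator over OffsetNumber (the function and variable names
here are placeholders, not necessarily what the patch uses):

static int
cmpOffsetNumbers(const void *a, const void *b)
{
    OffsetNumber na = *(const OffsetNumber *) a;
    OffsetNumber nb = *(const OffsetNumber *) b;

    /* return negative, zero, or positive; no overflow risk with uint16 */
    return (na > nb) - (na < nb);
}

/* e.g. qsort(deadoffsets, lpdead_items, sizeof(OffsetNumber), cmpOffsetNumbers) */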
One more version, with two small fixes:
1. I fumbled the offsetnumber-cmp function at the last minute so that it
didn't compile. Fixed that.
2. On VACUUM on an unlogged or temp table, the logic always thought that
we would be generating an FPI, causing it to always freeze when it
could. But of course, you never generate FPIs on an unlogged table.
Fixed that. (Perhaps we should indeed freeze more aggressively on an
unlogged table, but changing the heuristic is out of scope for this patch.)
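Condensed from the attached v13-0002, the shape of the fix for (2) is an
extra RelationNeedsWAL() guard around the opportunistic-freeze decision in
heap_page_prune_and_freeze(), roughly:

    if (prstate.all_visible && prstate.all_frozen && prstate.nfrozen > 0)
    {
        /*
         * Only WAL-logged relations emit FPIs, so only for them can
         * opportunistic freezing piggy-back on an FPI that would be
         * written anyway.
         */
        if (RelationNeedsWAL(relation))
        {
            if (hint_bit_fpi)
                do_freeze = true;
            else if (do_prune && XLogCheckBufferNeedsBackup(buffer))
                do_freeze = true;
            else if (do_hint && XLogHintBitIsNeeded() &&
                     XLogCheckBufferNeedsBackup(buffer))
                do_freeze = true;
        }
    }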
Off-list, Melanie reported that there is a small regression with the
benchmark script she posted yesterday, after all, but I'm not able to
reproduce that.
--
Heikki Linnakangas
Neon (https://neon.tech)
Attachments:
v13-0001-Refactor-how-heap_prune_chain-updates-prunable_x.patch (text/x-patch)
From e04bda4666d7eaff0c520a9c9e1468a9c4cc9f51 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Tue, 2 Apr 2024 15:47:06 +0300
Subject: [PATCH v13 1/3] Refactor how heap_prune_chain() updates prunable_xid
In preparation for freezing and counting tuples which are not
candidates for pruning, split heap_prune_record_unchanged() into
multiple functions, depending on the kind of line pointer. That's not too
interesting right now, but makes the next commit smaller.
Recording the lowest soon-to-be prunable xid is one of the actions we
take for unchanged LP_NORMAL item pointers but not for others, so move
that to the new heap_prune_record_unchanged_lp_normal() function. The
next commit will add more actions to these functions.
Author: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://www.postgresql.org/message-id/20240330055710.kqg6ii2cdojsxgje@liskov
---
src/backend/access/heap/pruneheap.c | 125 ++++++++++++++++++++--------
1 file changed, 92 insertions(+), 33 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index ef563e19aa5..1b5bf990d21 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -78,7 +78,11 @@ static void heap_prune_record_redirect(PruneState *prstate,
static void heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum, bool was_normal);
static void heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum, bool was_normal);
static void heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum, bool was_normal);
-static void heap_prune_record_unchanged(PruneState *prstate, OffsetNumber offnum);
+
+static void heap_prune_record_unchanged_lp_unused(Page page, PruneState *prstate, OffsetNumber offnum);
+static void heap_prune_record_unchanged_lp_normal(Page page, int8 *htsv, PruneState *prstate, OffsetNumber offnum);
+static void heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber offnum);
+static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetNumber offnum);
static void page_verify_redirects(Page page);
@@ -311,7 +315,7 @@ heap_page_prune(Relation relation, Buffer buffer,
/* Nothing to do if slot doesn't contain a tuple */
if (!ItemIdIsUsed(itemid))
{
- heap_prune_record_unchanged(&prstate, offnum);
+ heap_prune_record_unchanged_lp_unused(page, &prstate, offnum);
continue;
}
@@ -324,7 +328,7 @@ heap_page_prune(Relation relation, Buffer buffer,
if (unlikely(prstate.mark_unused_now))
heap_prune_record_unused(&prstate, offnum, false);
else
- heap_prune_record_unchanged(&prstate, offnum);
+ heap_prune_record_unchanged_lp_dead(page, &prstate, offnum);
continue;
}
@@ -434,7 +438,7 @@ heap_page_prune(Relation relation, Buffer buffer,
}
}
else
- heap_prune_record_unchanged(&prstate, offnum);
+ heap_prune_record_unchanged_lp_normal(page, presult->htsv, &prstate, offnum);
}
/* We should now have processed every tuple exactly once */
@@ -652,9 +656,6 @@ heap_prune_chain(Page page, BlockNumber blockno, OffsetNumber maxoff,
*/
chainitems[nchain++] = offnum;
- /*
- * Check tuple's visibility status.
- */
switch (htsv_get_valid_status(htsv[offnum]))
{
case HEAPTUPLE_DEAD:
@@ -670,9 +671,6 @@ heap_prune_chain(Page page, BlockNumber blockno, OffsetNumber maxoff,
case HEAPTUPLE_RECENTLY_DEAD:
/*
- * This tuple may soon become DEAD. Update the hint field so
- * that the page is reconsidered for pruning in future.
- *
* We don't need to advance the conflict horizon for
* RECENTLY_DEAD tuples, even if we are removing them. This
* is because we only remove RECENTLY_DEAD tuples if they
@@ -681,8 +679,6 @@ heap_prune_chain(Page page, BlockNumber blockno, OffsetNumber maxoff,
* tuple by virtue of being later in the chain. We will have
* advanced the conflict horizon for the DEAD tuple.
*/
- heap_prune_record_prunable(prstate,
- HeapTupleHeaderGetUpdateXid(htup));
/*
* Advance past RECENTLY_DEAD tuples just in case there's a
@@ -693,24 +689,8 @@ heap_prune_chain(Page page, BlockNumber blockno, OffsetNumber maxoff,
break;
case HEAPTUPLE_DELETE_IN_PROGRESS:
-
- /*
- * This tuple may soon become DEAD. Update the hint field so
- * that the page is reconsidered for pruning in future.
- */
- heap_prune_record_prunable(prstate,
- HeapTupleHeaderGetUpdateXid(htup));
- goto process_chain;
-
case HEAPTUPLE_LIVE:
case HEAPTUPLE_INSERT_IN_PROGRESS:
-
- /*
- * If we wanted to optimize for aborts, we might consider
- * marking the page prunable when we see INSERT_IN_PROGRESS.
- * But we don't. See related decisions about when to mark the
- * page prunable in heapam.c.
- */
goto process_chain;
default:
@@ -757,8 +737,15 @@ process_chain:
* No DEAD tuple was found, so the chain is entirely composed of
* normal, unchanged tuples. Leave it alone.
*/
- for (int i = 0; i < nchain; i++)
- heap_prune_record_unchanged(prstate, chainitems[i]);
+ int i = 0;
+
+ if (ItemIdIsRedirected(rootlp))
+ {
+ heap_prune_record_unchanged_lp_redirect(prstate, rootoffnum);
+ i++;
+ }
+ for (; i < nchain; i++)
+ heap_prune_record_unchanged_lp_normal(page, htsv, prstate, chainitems[i]);
}
else if (ndeadchain == nchain)
{
@@ -784,7 +771,7 @@ process_chain:
/* the rest of tuples in the chain are normal, unchanged tuples */
for (int i = ndeadchain; i < nchain; i++)
- heap_prune_record_unchanged(prstate, chainitems[i]);
+ heap_prune_record_unchanged_lp_normal(page, htsv, prstate, chainitems[i]);
}
}
@@ -894,9 +881,81 @@ heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum, bool was_norm
prstate->ndeleted++;
}
-/* Record a line pointer that is left unchanged */
+/*
+ * Record an unused line pointer that is left unchanged.
+ */
+static void
+heap_prune_record_unchanged_lp_unused(Page page, PruneState *prstate, OffsetNumber offnum)
+{
+ Assert(!prstate->processed[offnum]);
+ prstate->processed[offnum] = true;
+}
+
+/*
+ * Record LP_NORMAL line pointer that is left unchanged.
+ */
+static void
+heap_prune_record_unchanged_lp_normal(Page page, int8 *htsv, PruneState *prstate, OffsetNumber offnum)
+{
+ HeapTupleHeader htup;
+
+ Assert(!prstate->processed[offnum]);
+ prstate->processed[offnum] = true;
+
+ switch (htsv[offnum])
+ {
+ case HEAPTUPLE_LIVE:
+ case HEAPTUPLE_INSERT_IN_PROGRESS:
+
+ /*
+ * If we wanted to optimize for aborts, we might consider marking
+ * the page prunable when we see INSERT_IN_PROGRESS. But we
+ * don't. See related decisions about when to mark the page
+ * prunable in heapam.c.
+ */
+ break;
+
+ case HEAPTUPLE_RECENTLY_DEAD:
+ case HEAPTUPLE_DELETE_IN_PROGRESS:
+
+ htup = (HeapTupleHeader) PageGetItem(page, PageGetItemId(page, offnum));
+
+ /*
+ * This tuple may soon become DEAD. Update the hint field so that
+ * the page is reconsidered for pruning in future.
+ */
+ heap_prune_record_prunable(prstate,
+ HeapTupleHeaderGetUpdateXid(htup));
+ break;
+
+
+ default:
+
+ /*
+ * DEAD tuples should've been passed to heap_prune_record_dead()
+ * or heap_prune_record_unused() instead.
+ */
+ elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result %d", htsv[offnum]);
+ break;
+ }
+}
+
+
+/*
+ * Record line pointer that was already LP_DEAD and is left unchanged.
+ */
+static void
+heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber offnum)
+{
+ Assert(!prstate->processed[offnum]);
+ prstate->processed[offnum] = true;
+}
+
+/*
+ * Record LP_REDIRECT that is left unchanged.
+ */
static void
-heap_prune_record_unchanged(PruneState *prstate, OffsetNumber offnum)
+heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetNumber offnum)
{
Assert(!prstate->processed[offnum]);
prstate->processed[offnum] = true;
--
2.39.2
v13-0002-Combine-freezing-and-pruning-steps-in-VACUUM.patch (text/x-patch)
From 4b42ee8e20f7f7c85bb4878424f0a6261d4f0a15 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Tue, 2 Apr 2024 16:00:34 +0300
Subject: [PATCH v13 2/3] Combine freezing and pruning steps in VACUUM
Execute both freezing and pruning of tuples in the same
heap_page_prune() function, now called heap_page_prune_and_freeze(),
and emit a single WAL record containing all changes. That reduces the
overall amount of WAL generated.
This moves the freezing logic from vacuumlazy.c to the
heap_page_prune_and_freeze() function. The main difference in the
coding is that in vacuumlazy.c, we looked at the tuples after the
pruning had already happened, but in heap_page_prune_and_freeze() we
operate on the tuples before pruning. The heap_prepare_freeze_tuple()
function is now invoked after we have determined that a tuple is not
going to be pruned away.
VACUUM no longer needs to loop through the items on the page after
pruning. heap_page_prune_and_freeze() does all the work. It now
returns the list of dead offsets, including existing LP_DEAD items, to
the caller. Similarly it's now responsible for tracking 'all_visible',
'all_frozen', and 'hastup' on the caller's behalf.
Author: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://www.postgresql.org/message-id/20240330055710.kqg6ii2cdojsxgje@liskov
---
src/backend/access/heap/heapam.c | 67 +-
src/backend/access/heap/heapam_handler.c | 2 +-
src/backend/access/heap/pruneheap.c | 758 ++++++++++++++++++++---
src/backend/access/heap/vacuumlazy.c | 435 +++----------
src/include/access/heapam.h | 83 ++-
src/tools/pgindent/typedefs.list | 2 +-
6 files changed, 810 insertions(+), 537 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index b661d9811eb..a9d5b109a5e 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6447,9 +6447,9 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
* XIDs or MultiXactIds that will need to be processed by a future VACUUM.
*
* VACUUM caller must assemble HeapTupleFreeze freeze plan entries for every
- * tuple that we returned true for, and call heap_freeze_execute_prepared to
- * execute freezing. Caller must initialize pagefrz fields for page as a
- * whole before first call here for each heap page.
+ * tuple that we returned true for, and then execute freezing. Caller must
+ * initialize pagefrz fields for page as a whole before first call here for
+ * each heap page.
*
* VACUUM caller decides on whether or not to freeze the page as a whole.
* We'll often prepare freeze plans for a page that caller just discards.
@@ -6765,35 +6765,19 @@ heap_execute_freeze_tuple(HeapTupleHeader tuple, HeapTupleFreeze *frz)
}
/*
- * heap_freeze_execute_prepared
- *
- * Executes freezing of one or more heap tuples on a page on behalf of caller.
- * Caller passes an array of tuple plans from heap_prepare_freeze_tuple.
- * Caller must set 'offset' in each plan for us. Note that we destructively
- * sort caller's tuples array in-place, so caller had better be done with it.
- *
- * WAL-logs the changes so that VACUUM can advance the rel's relfrozenxid
- * later on without any risk of unsafe pg_xact lookups, even following a hard
- * crash (or when querying from a standby). We represent freezing by setting
- * infomask bits in tuple headers, but this shouldn't be thought of as a hint.
- * See section on buffer access rules in src/backend/storage/buffer/README.
+ * Perform xmin/xmax XID status sanity checks before actually executing freeze
+ * plans.
+ *
+ * heap_prepare_freeze_tuple doesn't perform these checks directly because
+ * pg_xact lookups are relatively expensive. They shouldn't be repeated by
+ * successive VACUUMs that each decide against freezing the same page.
*/
void
-heap_freeze_execute_prepared(Relation rel, Buffer buffer,
- TransactionId snapshotConflictHorizon,
- HeapTupleFreeze *tuples, int ntuples)
+heap_pre_freeze_checks(Buffer buffer,
+ HeapTupleFreeze *tuples, int ntuples)
{
Page page = BufferGetPage(buffer);
- Assert(ntuples > 0);
-
- /*
- * Perform xmin/xmax XID status sanity checks before critical section.
- *
- * heap_prepare_freeze_tuple doesn't perform these checks directly because
- * pg_xact lookups are relatively expensive. They shouldn't be repeated
- * by successive VACUUMs that each decide against freezing the same page.
- */
for (int i = 0; i < ntuples; i++)
{
HeapTupleFreeze *frz = tuples + i;
@@ -6832,8 +6816,19 @@ heap_freeze_execute_prepared(Relation rel, Buffer buffer,
xmax)));
}
}
+}
- START_CRIT_SECTION();
+/*
+ * Helper which executes freezing of one or more heap tuples on a page on
+ * behalf of caller. Caller passes an array of tuple plans from
+ * heap_prepare_freeze_tuple. Caller must set 'offset' in each plan for us.
+ * Must be called in a critical section that also marks the buffer dirty and,
+ * if needed, emits WAL.
+ */
+void
+heap_freeze_prepared_tuples(Buffer buffer, HeapTupleFreeze *tuples, int ntuples)
+{
+ Page page = BufferGetPage(buffer);
for (int i = 0; i < ntuples; i++)
{
@@ -6844,22 +6839,6 @@ heap_freeze_execute_prepared(Relation rel, Buffer buffer,
htup = (HeapTupleHeader) PageGetItem(page, itemid);
heap_execute_freeze_tuple(htup, frz);
}
-
- MarkBufferDirty(buffer);
-
- /* Now WAL-log freezing if necessary */
- if (RelationNeedsWAL(rel))
- {
- log_heap_prune_and_freeze(rel, buffer, snapshotConflictHorizon,
- false, /* no cleanup lock required */
- PRUNE_VACUUM_SCAN,
- tuples, ntuples,
- NULL, 0, /* redirected */
- NULL, 0, /* dead */
- NULL, 0); /* unused */
- }
-
- END_CRIT_SECTION();
}
/*
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index c86000d245b..0952d4a98eb 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -1122,7 +1122,7 @@ heapam_scan_analyze_next_tuple(TableScanDesc scan, TransactionId OldestXmin,
* We ignore unused and redirect line pointers. DEAD line pointers
* should be counted as dead, because we need vacuum to run to get rid
* of them. Note that this rule agrees with the way that
- * heap_page_prune() counts things.
+ * heap_page_prune_and_freeze() counts things.
*/
if (!ItemIdIsNormal(itemid))
{
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 1b5bf990d21..8ed44ba93dc 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -17,32 +17,54 @@
#include "access/heapam.h"
#include "access/heapam_xlog.h"
#include "access/htup_details.h"
+#include "access/multixact.h"
#include "access/transam.h"
#include "access/xlog.h"
#include "access/xloginsert.h"
+#include "commands/vacuum.h"
+#include "executor/instrument.h"
#include "miscadmin.h"
#include "pgstat.h"
#include "storage/bufmgr.h"
#include "utils/rel.h"
#include "utils/snapmgr.h"
-/* Working data for heap_page_prune and subroutines */
+/* Working data for heap_page_prune_and_freeze() and subroutines */
typedef struct
{
+ /*-------------------------------------------------------
+ * Arguments passed to heap_page_prune_and_freeze()
+ *-------------------------------------------------------
+ */
+
/* tuple visibility test, initialized for the relation */
GlobalVisState *vistest;
/* whether or not dead items can be set LP_UNUSED during pruning */
bool mark_unused_now;
+ /* whether to attempt freezing tuples */
+ bool freeze;
+ struct VacuumCutoffs *cutoffs;
- TransactionId new_prune_xid; /* new prune hint value for page */
- TransactionId snapshotConflictHorizon; /* latest xid removed */
+ /*-------------------------------------------------------
+ * Fields describing what to do to the page
+ *-------------------------------------------------------
+ */
+ TransactionId new_prune_xid; /* new prune hint value */
+ TransactionId latest_xid_removed;
int nredirected; /* numbers of entries in arrays below */
int ndead;
int nunused;
+ int nfrozen;
/* arrays that accumulate indexes of items to be changed */
OffsetNumber redirected[MaxHeapTuplesPerPage * 2];
OffsetNumber nowdead[MaxHeapTuplesPerPage];
OffsetNumber nowunused[MaxHeapTuplesPerPage];
+ HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
+
+ /*-------------------------------------------------------
+ * Working state for HOT chain processing
+ *-------------------------------------------------------
+ */
/*
* 'root_items' contains offsets of all LP_REDIRECT line pointers and
@@ -63,24 +85,92 @@ typedef struct
*/
bool processed[MaxHeapTuplesPerPage + 1];
+ /*
+ * Tuple visibility is only computed once for each tuple, for correctness
+ * and efficiency reasons; see comment in heap_page_prune_and_freeze() for
+ * details. This is of type int8[], instead of HTSV_Result[], so we can
+ * use -1 to indicate no visibility has been computed, e.g. for LP_DEAD
+ * items.
+ *
+ * This needs to be MaxHeapTuplesPerPage + 1 long as FirstOffsetNumber is
+ * 1. Otherwise every access would need to subtract 1.
+ */
+ int8 htsv[MaxHeapTuplesPerPage + 1];
+
+ /*
+ * Freezing-related state.
+ */
+ HeapPageFreeze pagefrz;
+
+ /*-------------------------------------------------------
+ * Information about what was done
+ *
+ * These fields are not used by pruning itself for the most part, but are
+ * used to collect information about what was pruned and what state the
+ * page is in after pruning, for the benefit of the caller. They are
+ * copied to the caller's PruneFreezeResult at the end.
+ * -------------------------------------------------------
+ */
+
int ndeleted; /* Number of tuples deleted from the page */
+
+ /* Number of live and recently dead tuples, after pruning */
+ int live_tuples;
+ int recently_dead_tuples;
+
+ /* Whether or not the page makes rel truncation unsafe */
+ bool hastup;
+
+ /*
+ * LP_DEAD items on the page after pruning. Includes existing LP_DEAD
+ * items
+ */
+ int lpdead_items; /* number of items in the array */
+ OffsetNumber *deadoffsets; /* points directly to presult->deadoffsets */
+
+ /*
+ * all_visible and all_frozen indicate if the all-visible and all-frozen
+ * bits in the visibility map can be set for this page after pruning.
+ *
+ * visibility_cutoff_xid is the newest xmin of live tuples on the page.
+ * The caller can use it as the conflict horizon, when setting the VM
+ * bits. It is only valid if we froze some tuples, and all_frozen is
+ * true.
+ *
+ * NOTE: all_visible and all_frozen don't include LP_DEAD items. That's
+ * convenient for heap_page_prune_and_freeze(), to use them to decide
+ * whether to freeze the page or not. The all_visible and all_frozen
+ * values returned to the caller are adjusted to include LP_DEAD items at
+ * the end.
+ *
+ * all_frozen should only be considered valid if all_visible is also set;
+ * we don't bother to clear the all_frozen flag every time we clear the
+ * all_visible flag.
+ */
+ bool all_visible;
+ bool all_frozen;
+ TransactionId visibility_cutoff_xid;
} PruneState;
/* Local functions */
static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
HeapTuple tup,
Buffer buffer);
+static inline HTSV_Result htsv_get_valid_status(int status);
static void heap_prune_chain(Page page, BlockNumber blockno, OffsetNumber maxoff,
- OffsetNumber rootoffnum, int8 *htsv, PruneState *prstate);
+ OffsetNumber rootoffnum, PruneState *prstate);
static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
static void heap_prune_record_redirect(PruneState *prstate,
- OffsetNumber offnum, OffsetNumber rdoffnum, bool was_normal);
-static void heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum, bool was_normal);
-static void heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum, bool was_normal);
+ OffsetNumber offnum, OffsetNumber rdoffnum,
+ bool was_normal);
+static void heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
+ bool was_normal);
+static void heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
+ bool was_normal);
static void heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum, bool was_normal);
static void heap_prune_record_unchanged_lp_unused(Page page, PruneState *prstate, OffsetNumber offnum);
-static void heap_prune_record_unchanged_lp_normal(Page page, int8 *htsv, PruneState *prstate, OffsetNumber offnum);
+static void heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumber offnum);
static void heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber offnum);
static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetNumber offnum);
@@ -163,15 +253,15 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
if (PageIsFull(page) || PageGetHeapFreeSpace(page) < minfree)
{
OffsetNumber dummy_off_loc;
- PruneResult presult;
+ PruneFreezeResult presult;
/*
* For now, pass mark_unused_now as false regardless of whether or
* not the relation has indexes, since we cannot safely determine
* that during on-access pruning with the current implementation.
*/
- heap_page_prune(relation, buffer, vistest, 0,
- &presult, PRUNE_ON_ACCESS, &dummy_off_loc);
+ heap_page_prune_and_freeze(relation, buffer, vistest, 0,
+ NULL, &presult, PRUNE_ON_ACCESS, &dummy_off_loc, NULL, NULL);
/*
* Report the number of tuples reclaimed to pgstats. This is
@@ -205,13 +295,24 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
/*
- * Prune and repair fragmentation in the specified page.
+ * Prune and repair fragmentation and potentially freeze tuples on the
+ * specified page.
*
* Caller must have pin and buffer cleanup lock on the page. Note that we
* don't update the FSM information for page on caller's behalf. Caller might
* also need to account for a reduction in the length of the line pointer
* array following array truncation by us.
*
+ * If the HEAP_PRUNE_FREEZE option is set, we will also freeze tuples if it's
+ * required in order to advance relfrozenxid / relminmxid, or if it's
+ * considered advantageous for overall system performance to do so now. The
+ * 'cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid' arguments
+ * are required when freezing. When the HEAP_PRUNE_FREEZE option is set, we also
+ * set presult->all_visible and presult->all_frozen on exit, to indicate if
+ * the VM bits can be set. They are always set to false when the
+ * HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
+ * that also freeze need that information.
+ *
* vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
* (see heap_prune_satisfies_vacuum).
*
@@ -219,23 +320,39 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
* pruning.
*
- * presult contains output parameters needed by callers such as the number of
- * tuples removed and the number of line pointers newly marked LP_DEAD.
- * heap_page_prune() is responsible for initializing it.
+ * FREEZE indicates that we will also freeze tuples, and will return
+ * 'all_visible', 'all_frozen' flags to the caller.
+ *
+ * cutoffs contains the freeze cutoffs, established by VACUUM at the beginning
+ * of vacuuming the relation. Required if HEAP_PRUNE_FREEZE option is set.
+ *
+ * presult contains output parameters needed by callers, such as the number of
+ * tuples removed and the offsets of dead items on the page after pruning.
+ * heap_page_prune_and_freeze() is responsible for initializing it.
*
* reason indicates why the pruning is performed. It is included in the WAL
* record for debugging and analysis purposes, but otherwise has no effect.
*
* off_loc is the offset location required by the caller to use in error
* callback.
+ *
+ * new_relfrozen_xid and new_relmin_mxid must be provided by the caller if the
+ * HEAP_PRUNE_FREEZE option is set. On entry, they contain the oldest XID and
+ * multi-XID seen on the relation so far. They will be updated with oldest
+ * values present on the page after pruning. After processing the whole
+ * relation, VACUUM can use these values as the new relfrozenxid/relminmxid
+ * for the relation.
*/
void
-heap_page_prune(Relation relation, Buffer buffer,
- GlobalVisState *vistest,
- int options,
- PruneResult *presult,
- PruneReason reason,
- OffsetNumber *off_loc)
+heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ GlobalVisState *vistest,
+ int options,
+ struct VacuumCutoffs *cutoffs,
+ PruneFreezeResult *presult,
+ PruneReason reason,
+ OffsetNumber *off_loc,
+ TransactionId *new_relfrozen_xid,
+ MultiXactId *new_relmin_mxid)
{
Page page = BufferGetPage(buffer);
BlockNumber blockno = BufferGetBlockNumber(buffer);
@@ -243,6 +360,17 @@ heap_page_prune(Relation relation, Buffer buffer,
maxoff;
PruneState prstate;
HeapTupleData tup;
+ bool do_freeze;
+ bool do_prune;
+ bool do_hint;
+ bool hint_bit_fpi;
+ int64 fpi_before = pgWalUsage.wal_fpi;
+
+ /* Copy parameters to prstate */
+ prstate.vistest = vistest;
+ prstate.mark_unused_now = (options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
+ prstate.freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+ prstate.cutoffs = cutoffs;
/*
* Our strategy is to scan the page and make lists of items to change,
@@ -256,36 +384,97 @@ heap_page_prune(Relation relation, Buffer buffer,
* initialize the rest of our working state.
*/
prstate.new_prune_xid = InvalidTransactionId;
- prstate.vistest = vistest;
- prstate.mark_unused_now = (options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
- prstate.snapshotConflictHorizon = InvalidTransactionId;
- prstate.nredirected = prstate.ndead = prstate.nunused = 0;
- prstate.ndeleted = 0;
+ prstate.latest_xid_removed = InvalidTransactionId;
+ prstate.nredirected = prstate.ndead = prstate.nunused = prstate.nfrozen = 0;
prstate.nroot_items = 0;
prstate.nheaponly_items = 0;
+ /* initialize page freezing working state */
+ prstate.pagefrz.freeze_required = false;
+ if (prstate.freeze)
+ {
+ prstate.pagefrz.FreezePageRelfrozenXid = *new_relfrozen_xid;
+ prstate.pagefrz.NoFreezePageRelfrozenXid = *new_relfrozen_xid;
+ prstate.pagefrz.FreezePageRelminMxid = *new_relmin_mxid;
+ prstate.pagefrz.NoFreezePageRelminMxid = *new_relmin_mxid;
+ }
+ else
+ {
+ Assert(new_relfrozen_xid == NULL && new_relmin_mxid == NULL);
+ prstate.pagefrz.FreezePageRelminMxid = InvalidMultiXactId;
+ prstate.pagefrz.NoFreezePageRelminMxid = InvalidMultiXactId;
+ prstate.pagefrz.FreezePageRelfrozenXid = InvalidTransactionId;
+ prstate.pagefrz.NoFreezePageRelfrozenXid = InvalidTransactionId;
+ }
+
+ prstate.ndeleted = 0;
+ prstate.live_tuples = 0;
+ prstate.recently_dead_tuples = 0;
+ prstate.hastup = false;
+ prstate.lpdead_items = 0;
+ prstate.deadoffsets = presult->deadoffsets;
+
/*
- * presult->htsv is not initialized here because all ntuple spots in the
- * array will be set either to a valid HTSV_Result value or -1.
+ * Caller may update the VM after we're done. We keep track of whether
+ * the page will be all_visible and all_frozen, once we're done with the
+ * pruning and freezing, to help the caller to do that.
+ *
+ * Currently, only VACUUM sets the VM bits. To save the effort, only do
+ * the bookkeeping if the caller needs it. Currently, that's tied to
+ * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag, if you wanted
+ * to update the VM bits without also freezing, or freezing without
+ * setting the VM bits.
+ *
+ * In addition to telling the caller whether it can set the VM bit, we
+ * also use 'all_visible' and 'all_frozen' for our own decision-making. If
+ * the whole page will become frozen, we consider opportunistically
+ * freezing tuples. We will not be able to freeze the whole page if there
+ * are tuples present that are not visible to everyone or if there are
+ * dead tuples which are not yet removable. However, dead tuples which
+ * will be removed by the end of vacuuming should not preclude us from
+ * opportunistically freezing. Because of that, we do not clear
+ * all_visible when we see LP_DEAD items. We fix that at the end of the
+ * function, when we return the value to the caller, so that the caller
+ * doesn't set the VM bit incorrectly.
*/
- presult->ndeleted = 0;
- presult->nnewlpdead = 0;
+ if (prstate.freeze)
+ {
+ prstate.all_visible = true;
+ prstate.all_frozen = true;
+ }
+ else
+ {
+ /*
+ * Initializing to false allows skipping the work to update them in
+ * heap_prune_record_unchanged_lp_normal().
+ */
+ prstate.all_visible = false;
+ prstate.all_frozen = false;
+ }
+
+ /*
+ * The visibility cutoff xid is the newest xmin of live tuples on the
+ * page. In the common case, this will be set as the conflict horizon the
+ * caller can use for updating the VM. If, at the end of freezing and
+ * pruning, the page is all-frozen, there is no possibility that any
+ * running transaction on the standby does not see tuples on the page as
+ * all-visible, so the conflict horizon remains InvalidTransactionId.
+ */
+ prstate.visibility_cutoff_xid = InvalidTransactionId;
maxoff = PageGetMaxOffsetNumber(page);
tup.t_tableOid = RelationGetRelid(relation);
/*
* Determine HTSV for all tuples, and queue them up for processing as HOT
- * chain roots or as a heap-only items.
+ * chain roots or as heap-only items.
*
* Determining HTSV only once for each tuple is required for correctness,
* to deal with cases where running HTSV twice could result in different
* results. For example, RECENTLY_DEAD can turn to DEAD if another
* checked item causes GlobalVisTestIsRemovableFullXid() to update the
* horizon, or INSERT_IN_PROGRESS can change to DEAD if the inserting
- * transaction aborts. VACUUM assumes that there are no normal DEAD
- * tuples left on the page after pruning, so it needs to have the same
- * understanding of what is DEAD and what is not.
+ * transaction aborts.
*
* It's also good for performance. Most commonly tuples within a page are
* stored at decreasing offsets (while the items are stored at increasing
@@ -310,7 +499,7 @@ heap_page_prune(Relation relation, Buffer buffer,
*off_loc = offnum;
prstate.processed[offnum] = false;
- presult->htsv[offnum] = -1;
+ prstate.htsv[offnum] = -1;
/* Nothing to do if slot doesn't contain a tuple */
if (!ItemIdIsUsed(itemid))
@@ -349,8 +538,8 @@ heap_page_prune(Relation relation, Buffer buffer,
tup.t_len = ItemIdGetLength(itemid);
ItemPointerSet(&tup.t_self, blockno, offnum);
- presult->htsv[offnum] = heap_prune_satisfies_vacuum(&prstate, &tup,
- buffer);
+ prstate.htsv[offnum] = heap_prune_satisfies_vacuum(&prstate, &tup,
+ buffer);
if (!HeapTupleHeaderIsHeapOnly(htup))
prstate.root_items[prstate.nroot_items++] = offnum;
@@ -358,6 +547,12 @@ heap_page_prune(Relation relation, Buffer buffer,
prstate.heaponly_items[prstate.nheaponly_items++] = offnum;
}
+ /*
+ * If checksums are enabled, heap_prune_satisfies_vacuum() may have caused
+ * an FPI to be emitted.
+ */
+ hint_bit_fpi = fpi_before != pgWalUsage.wal_fpi;
+
/*
* Process HOT chains.
*
@@ -381,8 +576,7 @@ heap_page_prune(Relation relation, Buffer buffer,
*off_loc = offnum;
/* Process this item or chain of items */
- heap_prune_chain(page, blockno, maxoff,
- offnum, presult->htsv, &prstate);
+ heap_prune_chain(page, blockno, maxoff, offnum, &prstate);
}
/*
@@ -412,7 +606,7 @@ heap_page_prune(Relation relation, Buffer buffer,
* return true for an XMIN_INVALID tuple, so this code will work even
* when there were sequential updates within the aborted transaction.)
*/
- if (presult->htsv[offnum] == HEAPTUPLE_DEAD)
+ if (prstate.htsv[offnum] == HEAPTUPLE_DEAD)
{
ItemId itemid = PageGetItemId(page, offnum);
HeapTupleHeader htup = (HeapTupleHeader) PageGetItem(page, itemid);
@@ -420,7 +614,7 @@ heap_page_prune(Relation relation, Buffer buffer,
if (likely(!HeapTupleHeaderIsHotUpdated(htup)))
{
HeapTupleHeaderAdvanceConflictHorizon(htup,
- &prstate.snapshotConflictHorizon);
+ &prstate.latest_xid_removed);
heap_prune_record_unused(&prstate, offnum, true);
}
else
@@ -438,7 +632,7 @@ heap_page_prune(Relation relation, Buffer buffer,
}
}
else
- heap_prune_record_unchanged_lp_normal(page, presult->htsv, &prstate, offnum);
+ heap_prune_record_unchanged_lp_normal(page, &prstate, offnum);
}
/* We should now have processed every tuple exactly once */
@@ -456,21 +650,107 @@ heap_page_prune(Relation relation, Buffer buffer,
/* Clear the offset information once we have processed the given page. */
*off_loc = InvalidOffsetNumber;
- /* Any error while applying the changes is critical */
- START_CRIT_SECTION();
+ do_prune = prstate.nredirected > 0 ||
+ prstate.ndead > 0 ||
+ prstate.nunused > 0;
- /* Have we found any prunable items? */
- if (prstate.nredirected > 0 || prstate.ndead > 0 || prstate.nunused > 0)
+ /*
+ * Even if we don't prune anything, if we found a new value for the
+ * pd_prune_xid field or the page was marked full, we will update the hint
+ * bit.
+ */
+ do_hint = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
+ PageIsFull(page);
+
+ /*
+ * Decide if we want to go ahead with freezing according to the freeze
+ * plans we prepared, or not.
+ */
+ do_freeze = false;
+ if (prstate.freeze)
+ {
+ if (prstate.pagefrz.freeze_required)
+ {
+ /*
+ * heap_prepare_freeze_tuple indicated that at least one XID/MXID
+ * from before FreezeLimit/MultiXactCutoff is present. Must
+ * freeze to advance relfrozenxid/relminmxid.
+ */
+ do_freeze = true;
+ }
+ else
+ {
+ /*
+ * Opportunistically freeze the page, if we are generating an FPI
+ * anyway, and if doing so means that we can set the page
+ * all-frozen afterwards (might not happen until VACUUM's final
+ * heap pass).
+ *
+ * XXX: Previously, we knew if pruning emitted an FPI by checking
+ * pgWalUsage.wal_fpi before and after pruning. Once the freeze
+ * and prune records are combined, this heuristic couldn't be used
+ * anymore. The opportunistic freeze heuristic must be improved;
+ * however, for now, try to approximate the old logic.
+ */
+ if (prstate.all_visible && prstate.all_frozen && prstate.nfrozen > 0)
+ {
+ /*
+ * Freezing would make the page all-frozen. Have we already
+ * emitted an FPI, or will we do so anyway?
+ */
+ if (RelationNeedsWAL(relation))
+ {
+ if (hint_bit_fpi)
+ do_freeze = true;
+ else if (do_prune)
+ {
+ if (XLogCheckBufferNeedsBackup(buffer))
+ do_freeze = true;
+ }
+ else if (do_hint)
+ {
+ if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
+ do_freeze = true;
+ }
+ }
+ }
+ }
+ }
+
+ if (do_freeze)
{
/*
- * Apply the planned item changes, then repair page fragmentation, and
- * update the page's hint bit about whether it has free line pointers.
+ * Validate the tuples we will be freezing before entering the
+ * critical section.
*/
- heap_page_prune_execute(buffer, false,
- prstate.redirected, prstate.nredirected,
- prstate.nowdead, prstate.ndead,
- prstate.nowunused, prstate.nunused);
+ heap_pre_freeze_checks(buffer, prstate.frozen, prstate.nfrozen);
+ }
+ else if (prstate.nfrozen > 0)
+ {
+ /*
+ * The page contained some tuples that were not already frozen, and we
+ * chose not to freeze them now. The page won't be all-frozen then.
+ */
+ Assert(!prstate.pagefrz.freeze_required);
+ prstate.all_frozen = false;
+ prstate.nfrozen = 0; /* avoid miscounts in instrumentation */
+ }
+ else
+ {
+ /*
+ * We have no freeze plans to execute. The page might already be
+ * all-frozen (perhaps only following pruning), though. Such pages
+ * can be marked all-frozen in the VM by our caller, even though none
+ * of its tuples were newly frozen here.
+ */
+ }
+
+ /* Any error while applying the changes is critical */
+ START_CRIT_SECTION();
+
+ if (do_hint)
+ {
/*
* Update the page's pd_prune_xid field to either zero, or the lowest
* XID of any soon-prunable tuple.
@@ -484,6 +764,29 @@ heap_page_prune(Relation relation, Buffer buffer,
*/
PageClearFull(page);
+ /*
+ * If that's all we had to do to the page, this is a non-WAL-logged
+ * hint. If we will also freeze or prune the page, we will mark the
+ * buffer dirty below.
+ */
+ if (!do_freeze && !do_prune)
+ MarkBufferDirtyHint(buffer, true);
+ }
+
+ if (do_prune || do_freeze)
+ {
+ /* Apply the planned item changes and repair page fragmentation. */
+ if (do_prune)
+ {
+ heap_page_prune_execute(buffer, false,
+ prstate.redirected, prstate.nredirected,
+ prstate.nowdead, prstate.ndead,
+ prstate.nowunused, prstate.nunused);
+ }
+
+ if (do_freeze)
+ heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
+
MarkBufferDirty(buffer);
/*
@@ -491,40 +794,115 @@ heap_page_prune(Relation relation, Buffer buffer,
*/
if (RelationNeedsWAL(relation))
{
+ /*
+ * The snapshotConflictHorizon for the whole record should be the
+ * most conservative of all the horizons calculated for any of the
+ * possible modifications. If this record will prune tuples, any
+ * transactions on the standby older than the youngest xmax of the
+ * most recently removed tuple this record will prune will
+ * conflict. If this record will freeze tuples, any transactions
+ * on the standby with xids older than the youngest tuple this
+ * record will freeze will conflict.
+ */
+ TransactionId frz_conflict_horizon = InvalidTransactionId;
+ TransactionId conflict_xid;
+
+ /*
+ * We can use the visibility_cutoff_xid as our cutoff for
+ * conflicts when the whole page is eligible to become all-frozen
+ * in the VM once we're done with it. Otherwise we generate a
+ * conservative cutoff by stepping back from OldestXmin.
+ */
+ if (do_freeze)
+ {
+ if (prstate.all_visible && prstate.all_frozen)
+ frz_conflict_horizon = prstate.visibility_cutoff_xid;
+ else
+ {
+ /* Avoids false conflicts when hot_standby_feedback in use */
+ frz_conflict_horizon = prstate.cutoffs->OldestXmin;
+ TransactionIdRetreat(frz_conflict_horizon);
+ }
+ }
+
+ if (TransactionIdFollows(frz_conflict_horizon, prstate.latest_xid_removed))
+ conflict_xid = frz_conflict_horizon;
+ else
+ conflict_xid = prstate.latest_xid_removed;
+
log_heap_prune_and_freeze(relation, buffer,
- prstate.snapshotConflictHorizon,
+ conflict_xid,
true, reason,
- NULL, 0,
+ prstate.frozen, prstate.nfrozen,
prstate.redirected, prstate.nredirected,
prstate.nowdead, prstate.ndead,
prstate.nowunused, prstate.nunused);
}
}
- else
- {
- /*
- * If we didn't prune anything, but have found a new value for the
- * pd_prune_xid field, update it and mark the buffer dirty. This is
- * treated as a non-WAL-logged hint.
- *
- * Also clear the "page is full" flag if it is set, since there's no
- * point in repeating the prune/defrag process until something else
- * happens to the page.
- */
- if (((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
- PageIsFull(page))
- {
- ((PageHeader) page)->pd_prune_xid = prstate.new_prune_xid;
- PageClearFull(page);
- MarkBufferDirtyHint(buffer, true);
- }
- }
END_CRIT_SECTION();
/* Copy information back for caller */
- presult->nnewlpdead = prstate.ndead;
presult->ndeleted = prstate.ndeleted;
+ presult->nnewlpdead = prstate.ndead;
+ presult->nfrozen = prstate.nfrozen;
+ presult->live_tuples = prstate.live_tuples;
+ presult->recently_dead_tuples = prstate.recently_dead_tuples;
+
+ /*
+ * It was convenient to ignore LP_DEAD items in all_visible earlier on to
+ * make the choice of whether or not to freeze the page unaffected by the
+ * short-term presence of LP_DEAD items. These LP_DEAD items were
+ * effectively assumed to be LP_UNUSED items in the making. It doesn't
+ * matter which heap pass (initial pass or final pass) ends up setting the
+ * page all-frozen, as long as the ongoing VACUUM does it.
+ *
+ * Now that freezing has been finalized, unset all_visible if there are
+ * any LP_DEAD items on the page. It needs to reflect the present state
+ * of things, as expected by our caller.
+ */
+ if (prstate.all_visible && prstate.lpdead_items == 0)
+ {
+ presult->all_visible = prstate.all_visible;
+ presult->all_frozen = prstate.all_frozen;
+ }
+ else
+ {
+ presult->all_visible = false;
+ presult->all_frozen = false;
+ }
+
+ presult->hastup = prstate.hastup;
+
+ /*
+ * For callers planning to update the visibility map, the conflict horizon
+ * for that record must be the newest xmin on the page. However, if the
+ * page is completely frozen, there can be no conflict and the
+ * vm_conflict_horizon should remain InvalidTransactionId. This includes
+ * the case that we just froze all the tuples; the prune-freeze record
+ * included the conflict XID already so the caller doesn't need it.
+ */
+ if (presult->all_frozen)
+ presult->vm_conflict_horizon = InvalidTransactionId;
+ else
+ presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
+
+ presult->lpdead_items = prstate.lpdead_items;
+ /* the presult->deadoffsets array was already filled in */
+
+ if (prstate.freeze)
+ {
+ if (presult->nfrozen > 0)
+ {
+ *new_relfrozen_xid = prstate.pagefrz.FreezePageRelfrozenXid;
+ *new_relmin_mxid = prstate.pagefrz.FreezePageRelminMxid;
+ }
+ else
+ {
+ *new_relfrozen_xid = prstate.pagefrz.NoFreezePageRelfrozenXid;
+ *new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
+ }
+ }
}
@@ -549,10 +927,24 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
}
+/*
+ * Pruning calculates tuple visibility once and saves the results in an array
+ * of int8. See PruneState.htsv for details. This helper function is meant
+ * to guard against examining visibility status array members which have not
+ * yet been computed.
+ */
+static inline HTSV_Result
+htsv_get_valid_status(int status)
+{
+ Assert(status >= HEAPTUPLE_DEAD &&
+ status <= HEAPTUPLE_DELETE_IN_PROGRESS);
+ return (HTSV_Result) status;
+}
+
/*
* Prune specified line pointer or a HOT chain originating at line pointer.
*
- * Tuple visibility information is provided in htsv.
+ * Tuple visibility information is provided in prstate->htsv.
*
* If the item is an index-referenced tuple (i.e. not a heap-only tuple),
* the HOT chain is pruned by removing all DEAD tuples at the start of the HOT
@@ -572,11 +964,17 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
* prstate showing the changes to be made. Items to be redirected are added
* to the redirected[] array (two entries per redirection); items to be set to
* LP_DEAD state are added to nowdead[]; and items to be set to LP_UNUSED
- * state are added to nowunused[].
+ * state are added to nowunused[]. We perform bookkeeping of live tuples,
+ * visibility etc. based on what the page will look like after the changes
+ * applied. All that bookkeeping is performed in the heap_prune_record_*()
+ * subroutines. The division of labor is that heap_prune_chain() decides the
+ * fate of each tuple, ie. whether it's going to be removed, redirected or
+ * left unchanged, and the heap_prune_record_*() subroutines update PruneState
+ * based on that outcome.
*/
static void
heap_prune_chain(Page page, BlockNumber blockno, OffsetNumber maxoff,
- OffsetNumber rootoffnum, int8 *htsv, PruneState *prstate)
+ OffsetNumber rootoffnum, PruneState *prstate)
{
TransactionId priorXmax = InvalidTransactionId;
ItemId rootlp;
@@ -656,15 +1054,14 @@ heap_prune_chain(Page page, BlockNumber blockno, OffsetNumber maxoff,
*/
chainitems[nchain++] = offnum;
- switch (htsv_get_valid_status(htsv[offnum]))
+ switch (htsv_get_valid_status(prstate->htsv[offnum]))
{
case HEAPTUPLE_DEAD:
/* Remember the last DEAD tuple seen */
ndeadchain = nchain;
HeapTupleHeaderAdvanceConflictHorizon(htup,
- &prstate->snapshotConflictHorizon);
-
+ &prstate->latest_xid_removed);
/* Advance to next chain member */
break;
@@ -720,10 +1117,11 @@ heap_prune_chain(Page page, BlockNumber blockno, OffsetNumber maxoff,
{
/*
* We found a redirect item that doesn't point to a valid follow-on
- * item. This can happen if the loop in heap_page_prune caused us to
- * visit the dead successor of a redirect item before visiting the
- * redirect item. We can clean up by setting the redirect item to
- * LP_DEAD state or LP_UNUSED if the caller indicated.
+ * item. This can happen if the loop in heap_page_prune_and_freeze()
+ * caused us to visit the dead successor of a redirect item before
+ * visiting the redirect item. We can clean up by setting the
+ * redirect item to LP_DEAD state or LP_UNUSED if the caller
+ * indicated.
*/
heap_prune_record_dead_or_unused(prstate, rootoffnum, false);
return;
@@ -745,7 +1143,7 @@ process_chain:
i++;
}
for (; i < nchain; i++)
- heap_prune_record_unchanged_lp_normal(page, htsv, prstate, chainitems[i]);
+ heap_prune_record_unchanged_lp_normal(page, prstate, chainitems[i]);
}
else if (ndeadchain == nchain)
{
@@ -771,7 +1169,7 @@ process_chain:
/* the rest of tuples in the chain are normal, unchanged tuples */
for (int i = ndeadchain; i < nchain; i++)
- heap_prune_record_unchanged_lp_normal(page, htsv, prstate, chainitems[i]);
+ heap_prune_record_unchanged_lp_normal(page, prstate, chainitems[i]);
}
}
@@ -816,6 +1214,8 @@ heap_prune_record_redirect(PruneState *prstate,
*/
if (was_normal)
prstate->ndeleted++;
+
+ prstate->hastup = true;
}
/* Record line pointer to be marked dead */
@@ -830,6 +1230,21 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
prstate->nowdead[prstate->ndead] = offnum;
prstate->ndead++;
+ /*
+ * Deliberately delay unsetting all_visible until later during pruning.
+ * Removable dead tuples shouldn't preclude freezing the page. After
+ * finishing this first pass of tuple visibility checks, initialize
+ * all_visible_except_removable with the current value of all_visible to
+ * indicate whether or not the page is all visible except for dead tuples.
+ * This will allow us to attempt to freeze the page after pruning. Later
+ * during pruning, if we encounter an LP_DEAD item or are setting an item
+ * LP_DEAD, we will unset all_visible. As long as we unset it before
+ * updating the visibility map, this will be correct.
+ */
+
+ /* Record the dead offset for vacuum */
+ prstate->deadoffsets[prstate->lpdead_items++] = offnum;
+
/*
* If the root entry had been a normal tuple, we are deleting it, so count
* it in the result. But changing a redirect (even to DEAD state) doesn't
@@ -892,21 +1307,121 @@ heap_prune_record_unchanged_lp_unused(Page page, PruneState *prstate, OffsetNumb
}
/*
- * Record LP_NORMAL line pointer that is left unchanged.
+ * Record line pointer that is left unchanged. We consider freezing it, and
+ * update bookkeeping of tuple counts and page visibility.
*/
static void
-heap_prune_record_unchanged_lp_normal(Page page, int8 *htsv, PruneState *prstate, OffsetNumber offnum)
+heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumber offnum)
{
HeapTupleHeader htup;
Assert(!prstate->processed[offnum]);
prstate->processed[offnum] = true;
- switch (htsv[offnum])
+ prstate->hastup = true; /* the page is not empty */
+
+ /*
+ * The criteria for counting a tuple as live in this block need to match
+ * what analyze.c's acquire_sample_rows() does, otherwise VACUUM and
+ * ANALYZE may produce wildly different reltuples values, e.g. when there
+ * are many recently-dead tuples.
+ *
+ * The logic here is a bit simpler than acquire_sample_rows(), as VACUUM
+ * can't run inside a transaction block, which makes some cases impossible
+ * (e.g. in-progress insert from the same transaction).
+ *
+ * HEAPTUPLE_DEAD are handled by the other heap_prune_record_*()
+ * subroutines. They don't count dead items like acquire_sample_rows()
+ * does, because we assume that all dead items will become LP_UNUSED
+ * before VACUUM finishes. This difference is only superficial. VACUUM
+ * effectively agrees with ANALYZE about DEAD items, in the end. VACUUM
+ * won't remember LP_DEAD items, but only because they're not supposed to
+ * be left behind when it is done. (Cases where we bypass index vacuuming
+ * will violate this optimistic assumption, but the overall impact of that
+ * should be negligible.)
+ */
+ htup = (HeapTupleHeader) PageGetItem(page, PageGetItemId(page, offnum));
+
+ switch (prstate->htsv[offnum])
{
case HEAPTUPLE_LIVE:
+
+ /*
+ * Count it as live. Not only is this natural, but it's also what
+ * acquire_sample_rows() does.
+ */
+ prstate->live_tuples++;
+
+ /*
+ * Is the tuple definitely visible to all transactions?
+ *
+ * NB: Like with per-tuple hint bits, we can't set the
+ * PD_ALL_VISIBLE flag if the inserter committed asynchronously.
+ * See SetHintBits for more info. Check that the tuple is hinted
+ * xmin-committed because of that.
+ */
+ if (prstate->all_visible)
+ {
+ TransactionId xmin;
+
+ if (!HeapTupleHeaderXminCommitted(htup))
+ {
+ prstate->all_visible = false;
+ break;
+ }
+
+ /*
+ * The inserter definitely committed. But is it old enough
+ * that everyone sees it as committed? A FrozenTransactionId
+ * is seen as committed to everyone. Otherwise, we check if
+ * there is a snapshot that considers this xid to still be
+ * running, and if so, we don't consider the page all-visible.
+ */
+ xmin = HeapTupleHeaderGetXmin(htup);
+
+ /*
+ * For now always use prstate->cutoffs for this test, because
+ * we only update 'all_visible' when freezing is requested. We
+ * could use GlobalVisTestIsRemovableXid instead, if a
+ * non-freezing caller wanted to set the VM bit.
+ */
+ Assert(prstate->cutoffs);
+ if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
+ {
+ prstate->all_visible = false;
+ break;
+ }
+
+ /* Track newest xmin on page. */
+ if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
+ TransactionIdIsNormal(xmin))
+ prstate->visibility_cutoff_xid = xmin;
+ }
+ break;
+
+ case HEAPTUPLE_RECENTLY_DEAD:
+ prstate->recently_dead_tuples++;
+ prstate->all_visible = false;
+
+ /*
+ * This tuple will soon become DEAD. Update the hint field so
+ * that the page is reconsidered for pruning in future.
+ */
+ heap_prune_record_prunable(prstate,
+ HeapTupleHeaderGetUpdateXid(htup));
+ break;
+
case HEAPTUPLE_INSERT_IN_PROGRESS:
+ /*
+ * We do not count these rows as live, because we expect the
+ * inserting transaction to update the counters at commit, and we
+ * assume that will happen only after we report our results. This
+ * assumption is a bit shaky, but it is what acquire_sample_rows()
+ * does, so be consistent.
+ */
+ prstate->all_visible = false;
+
/*
* If we wanted to optimize for aborts, we might consider marking
* the page prunable when we see INSERT_IN_PROGRESS. But we
@@ -915,10 +1430,15 @@ heap_prune_record_unchanged_lp_normal(Page page, int8 *htsv, PruneState *prstate
*/
break;
- case HEAPTUPLE_RECENTLY_DEAD:
case HEAPTUPLE_DELETE_IN_PROGRESS:
- htup = (HeapTupleHeader) PageGetItem(page, PageGetItemId(page, offnum));
+ /*
+ * This is an expected case during concurrent vacuum. Count such
+ * rows as live. As above, we assume the deleting transaction
+ * will commit and update the counters after we report.
+ */
+ prstate->live_tuples++;
+ prstate->all_visible = false;
/*
* This tuple may soon become DEAD. Update the hint field so that
@@ -928,16 +1448,40 @@ heap_prune_record_unchanged_lp_normal(Page page, int8 *htsv, PruneState *prstate
HeapTupleHeaderGetUpdateXid(htup));
break;
-
default:
/*
* DEAD tuples should've been passed to heap_prune_record_dead()
* or heap_prune_record_unused() instead.
*/
- elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result %d", htsv[offnum]);
+ elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result %d",
+ prstate->htsv[offnum]);
break;
}
+
+ /* Consider freezing any normal tuples which will not be removed */
+ if (prstate->freeze)
+ {
+ bool totally_frozen;
+
+ if ((heap_prepare_freeze_tuple(htup,
+ prstate->cutoffs,
+ &prstate->pagefrz,
+ &prstate->frozen[prstate->nfrozen],
+ &totally_frozen)))
+ {
+ /* Save prepared freeze plan for later */
+ prstate->frozen[prstate->nfrozen++].offset = offnum;
+ }
+
+ /*
+ * If any tuple isn't either totally frozen already or eligible to
+ * become totally frozen (according to its freeze plan), then the page
+ * definitely cannot be set all-frozen in the visibility map later on.
+ */
+ if (!totally_frozen)
+ prstate->all_frozen = false;
+ }
}
@@ -949,6 +1493,24 @@ heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber
{
Assert(!prstate->processed[offnum]);
prstate->processed[offnum] = true;
+
+ /*
+ * Deliberately don't set hastup for LP_DEAD items. We make the soft
+ * assumption that any LP_DEAD items encountered here will become
+ * LP_UNUSED later on, before count_nondeletable_pages is reached. If we
+ * don't make this assumption then rel truncation will only happen every
+ * other VACUUM, at most. Besides, VACUUM must treat
+ * hastup/nonempty_pages as provisional no matter how LP_DEAD items are
+ * handled (handled here, or handled later on).
+ *
+ * Similarly, don't unset all_visible until later, at the end of
+ * heap_page_prune_and_freeze(). This will allow us to attempt to freeze
+ * the page after pruning. As long as we unset it before updating the
+ * visibility map, this will be correct.
+ */
+
+ /* Record the dead offset for vacuum */
+ prstate->deadoffsets[prstate->lpdead_items++] = offnum;
}
/*
@@ -957,12 +1519,20 @@ heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber
static void
heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetNumber offnum)
{
+ /*
+ * A redirect line pointer doesn't count as a live tuple.
+ *
+ * If we leave a redirect line pointer in place, there will be another
+ * tuple on the page that it points to. We will do the bookkeeping for
+ * that separately. So we have nothing to do here, except remember that
+ * we processed this item.
+ */
Assert(!prstate->processed[offnum]);
prstate->processed[offnum] = true;
}
/*
- * Perform the actual page changes needed by heap_page_prune.
+ * Perform the actual page changes needed by heap_page_prune_and_freeze().
*
* If 'lp_truncate_only' is set, we are merely marking LP_DEAD line pointers
* as unused, not redirecting or removing anything else. The
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 5fb8f7727b3..c3a9dc1ad6d 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -46,6 +46,7 @@
#include "commands/dbcommands.h"
#include "commands/progress.h"
#include "commands/vacuum.h"
+#include "common/int.h"
#include "executor/instrument.h"
#include "miscadmin.h"
#include "pgstat.h"
@@ -439,12 +440,13 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
* as an upper bound on the XIDs stored in the pages we'll actually scan
* (NewRelfrozenXid tracking must never be allowed to miss unfrozen XIDs).
*
- * Next acquire vistest, a related cutoff that's used in heap_page_prune.
- * We expect vistest will always make heap_page_prune remove any deleted
- * tuple whose xmax is < OldestXmin. lazy_scan_prune must never become
- * confused about whether a tuple should be frozen or removed. (In the
- * future we might want to teach lazy_scan_prune to recompute vistest from
- * time to time, to increase the number of dead tuples it can prune away.)
+ * Next acquire vistest, a related cutoff that's used in pruning. We
+ * expect vistest will always make heap_page_prune_and_freeze() remove any
+ * deleted tuple whose xmax is < OldestXmin. lazy_scan_prune must never
+ * become confused about whether a tuple should be frozen or removed. (In
+ * the future we might want to teach lazy_scan_prune to recompute vistest
+ * from time to time, to increase the number of dead tuples it can prune
+ * away.)
*/
vacrel->aggressive = vacuum_get_cutoffs(rel, params, &vacrel->cutoffs);
vacrel->rel_pages = orig_rel_pages = RelationGetNumberOfBlocks(rel);
@@ -1382,27 +1384,18 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
return false;
}
+/* qsort comparator for sorting OffsetNumbers */
+static int
+cmpOffsetNumbers(const void *a, const void *b)
+{
+ return pg_cmp_u16(*(const OffsetNumber *) a, *(const OffsetNumber *) b);
+}
+
/*
* lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
*
* Caller must hold pin and buffer cleanup lock on the buffer.
*
- * Prior to PostgreSQL 14 there were very rare cases where heap_page_prune()
- * was allowed to disagree with our HeapTupleSatisfiesVacuum() call about
- * whether or not a tuple should be considered DEAD. This happened when an
- * inserting transaction concurrently aborted (after our heap_page_prune()
- * call, before our HeapTupleSatisfiesVacuum() call). There was rather a lot
- * of complexity just so we could deal with tuples that were DEAD to VACUUM,
- * but nevertheless were left with storage after pruning.
- *
- * As of Postgres 17, we circumvent this problem altogether by reusing the
- * result of heap_page_prune()'s visibility check. Without the second call to
- * HeapTupleSatisfiesVacuum(), there is no new HTSV_Result and there can be no
- * disagreement. We'll just handle such tuples as if they had become fully dead
- * right after this operation completes instead of in the middle of it. Note that
- * any tuple that becomes dead after the call to heap_page_prune() can't need to
- * be frozen, because it was visible to another session when vacuum started.
- *
* vmbuffer is the buffer containing the VM block with visibility information
* for the heap block, blkno. all_visible_according_to_vm is the saved
* visibility status of the heap block looked up earlier by the caller. We
@@ -1421,330 +1414,46 @@ lazy_scan_prune(LVRelState *vacrel,
bool *has_lpdead_items)
{
Relation rel = vacrel->rel;
- OffsetNumber offnum,
- maxoff;
- ItemId itemid;
- PruneResult presult;
- int tuples_frozen,
- lpdead_items,
- live_tuples,
- recently_dead_tuples;
- HeapPageFreeze pagefrz;
- bool hastup = false;
- bool all_visible,
- all_frozen;
- TransactionId visibility_cutoff_xid;
+ PruneFreezeResult presult;
int prune_options = 0;
- int64 fpi_before = pgWalUsage.wal_fpi;
- OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
- HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
Assert(BufferGetBlockNumber(buf) == blkno);
/*
- * maxoff might be reduced following line pointer array truncation in
- * heap_page_prune. That's safe for us to ignore, since the reclaimed
- * space will continue to look like LP_UNUSED items below.
- */
- maxoff = PageGetMaxOffsetNumber(page);
-
- /* Initialize (or reset) page-level state */
- pagefrz.freeze_required = false;
- pagefrz.FreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
- pagefrz.FreezePageRelminMxid = vacrel->NewRelminMxid;
- pagefrz.NoFreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
- pagefrz.NoFreezePageRelminMxid = vacrel->NewRelminMxid;
- tuples_frozen = 0;
- lpdead_items = 0;
- live_tuples = 0;
- recently_dead_tuples = 0;
-
- /*
- * Prune all HOT-update chains in this page.
- *
- * We count the number of tuples removed from the page by the pruning step
- * in presult.ndeleted. It should not be confused with lpdead_items;
- * lpdead_items's final value can be thought of as the number of tuples
- * that were deleted from indexes.
+ * Prune all HOT-update chains and potentially freeze tuples on this page.
*
* If the relation has no indexes, we can immediately mark would-be dead
* items LP_UNUSED.
- */
- prune_options = 0;
- if (vacrel->nindexes == 0)
- prune_options = HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
- heap_page_prune(rel, buf, vacrel->vistest, prune_options,
- &presult, PRUNE_VACUUM_SCAN, &vacrel->offnum);
-
- /*
- * We will update the VM after collecting LP_DEAD items and freezing
- * tuples. Keep track of whether or not the page is all_visible and
- * all_frozen and use this information to update the VM. all_visible
- * implies 0 lpdead_items, but don't trust all_frozen result unless
- * all_visible is also set to true.
*
- * Also keep track of the visibility cutoff xid for recovery conflicts.
- */
- all_visible = true;
- all_frozen = true;
- visibility_cutoff_xid = InvalidTransactionId;
-
- /*
- * Now scan the page to collect LP_DEAD items and update the variables set
- * just above.
- */
- for (offnum = FirstOffsetNumber;
- offnum <= maxoff;
- offnum = OffsetNumberNext(offnum))
- {
- HeapTupleHeader htup;
- bool totally_frozen;
-
- /*
- * Set the offset number so that we can display it along with any
- * error that occurred while processing this tuple.
- */
- vacrel->offnum = offnum;
- itemid = PageGetItemId(page, offnum);
-
- if (!ItemIdIsUsed(itemid))
- continue;
-
- /* Redirect items mustn't be touched */
- if (ItemIdIsRedirected(itemid))
- {
- /* page makes rel truncation unsafe */
- hastup = true;
- continue;
- }
-
- if (ItemIdIsDead(itemid))
- {
- /*
- * Deliberately don't set hastup for LP_DEAD items. We make the
- * soft assumption that any LP_DEAD items encountered here will
- * become LP_UNUSED later on, before count_nondeletable_pages is
- * reached. If we don't make this assumption then rel truncation
- * will only happen every other VACUUM, at most. Besides, VACUUM
- * must treat hastup/nonempty_pages as provisional no matter how
- * LP_DEAD items are handled (handled here, or handled later on).
- *
- * Also deliberately delay unsetting all_visible until just before
- * we return to lazy_scan_heap caller, as explained in full below.
- * (This is another case where it's useful to anticipate that any
- * LP_DEAD items will become LP_UNUSED during the ongoing VACUUM.)
- */
- deadoffsets[lpdead_items++] = offnum;
- continue;
- }
-
- Assert(ItemIdIsNormal(itemid));
-
- htup = (HeapTupleHeader) PageGetItem(page, itemid);
-
- /*
- * The criteria for counting a tuple as live in this block need to
- * match what analyze.c's acquire_sample_rows() does, otherwise VACUUM
- * and ANALYZE may produce wildly different reltuples values, e.g.
- * when there are many recently-dead tuples.
- *
- * The logic here is a bit simpler than acquire_sample_rows(), as
- * VACUUM can't run inside a transaction block, which makes some cases
- * impossible (e.g. in-progress insert from the same transaction).
- *
- * We treat LP_DEAD items (which are the closest thing to DEAD tuples
- * that might be seen here) differently, too: we assume that they'll
- * become LP_UNUSED before VACUUM finishes. This difference is only
- * superficial. VACUUM effectively agrees with ANALYZE about DEAD
- * items, in the end. VACUUM won't remember LP_DEAD items, but only
- * because they're not supposed to be left behind when it is done.
- * (Cases where we bypass index vacuuming will violate this optimistic
- * assumption, but the overall impact of that should be negligible.)
- */
- switch (htsv_get_valid_status(presult.htsv[offnum]))
- {
- case HEAPTUPLE_LIVE:
-
- /*
- * Count it as live. Not only is this natural, but it's also
- * what acquire_sample_rows() does.
- */
- live_tuples++;
-
- /*
- * Is the tuple definitely visible to all transactions?
- *
- * NB: Like with per-tuple hint bits, we can't set the
- * PD_ALL_VISIBLE flag if the inserter committed
- * asynchronously. See SetHintBits for more info. Check that
- * the tuple is hinted xmin-committed because of that.
- */
- if (all_visible)
- {
- TransactionId xmin;
-
- if (!HeapTupleHeaderXminCommitted(htup))
- {
- all_visible = false;
- break;
- }
-
- /*
- * The inserter definitely committed. But is it old enough
- * that everyone sees it as committed?
- */
- xmin = HeapTupleHeaderGetXmin(htup);
- if (!TransactionIdPrecedes(xmin,
- vacrel->cutoffs.OldestXmin))
- {
- all_visible = false;
- break;
- }
-
- /* Track newest xmin on page. */
- if (TransactionIdFollows(xmin, visibility_cutoff_xid) &&
- TransactionIdIsNormal(xmin))
- visibility_cutoff_xid = xmin;
- }
- break;
- case HEAPTUPLE_RECENTLY_DEAD:
-
- /*
- * If tuple is recently dead then we must not remove it from
- * the relation. (We only remove items that are LP_DEAD from
- * pruning.)
- */
- recently_dead_tuples++;
- all_visible = false;
- break;
- case HEAPTUPLE_INSERT_IN_PROGRESS:
-
- /*
- * We do not count these rows as live, because we expect the
- * inserting transaction to update the counters at commit, and
- * we assume that will happen only after we report our
- * results. This assumption is a bit shaky, but it is what
- * acquire_sample_rows() does, so be consistent.
- */
- all_visible = false;
- break;
- case HEAPTUPLE_DELETE_IN_PROGRESS:
- /* This is an expected case during concurrent vacuum */
- all_visible = false;
-
- /*
- * Count such rows as live. As above, we assume the deleting
- * transaction will commit and update the counters after we
- * report.
- */
- live_tuples++;
- break;
- default:
- elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
- break;
- }
-
- hastup = true; /* page makes rel truncation unsafe */
-
- /* Tuple with storage -- consider need to freeze */
- if (heap_prepare_freeze_tuple(htup, &vacrel->cutoffs, &pagefrz,
- &frozen[tuples_frozen], &totally_frozen))
- {
- /* Save prepared freeze plan for later */
- frozen[tuples_frozen++].offset = offnum;
- }
-
- /*
- * If any tuple isn't either totally frozen already or eligible to
- * become totally frozen (according to its freeze plan), then the page
- * definitely cannot be set all-frozen in the visibility map later on
- */
- if (!totally_frozen)
- all_frozen = false;
- }
-
- /*
- * We have now divided every item on the page into either an LP_DEAD item
- * that will need to be vacuumed in indexes later, or a LP_NORMAL tuple
- * that remains and needs to be considered for freezing now (LP_UNUSED and
- * LP_REDIRECT items also remain, but are of no further interest to us).
- */
- vacrel->offnum = InvalidOffsetNumber;
-
- /*
- * Freeze the page when heap_prepare_freeze_tuple indicates that at least
- * one XID/MXID from before FreezeLimit/MultiXactCutoff is present. Also
- * freeze when pruning generated an FPI, if doing so means that we set the
- * page all-frozen afterwards (might not happen until final heap pass).
+ * The number of tuples removed from the page is returned in
+ * presult.ndeleted. It should not be confused with presult.lpdead_items;
+ * presult.lpdead_items's final value can be thought of as the number of
+ * tuples that were deleted from indexes.
+ *
+ * We will update the VM after collecting LP_DEAD items and freezing
+ * tuples. Pruning will have determined whether or not the page is
+ * all-visible.
*/
- if (pagefrz.freeze_required || tuples_frozen == 0 ||
- (all_visible && all_frozen &&
- fpi_before != pgWalUsage.wal_fpi))
- {
- /*
- * We're freezing the page. Our final NewRelfrozenXid doesn't need to
- * be affected by the XIDs that are just about to be frozen anyway.
- */
- vacrel->NewRelfrozenXid = pagefrz.FreezePageRelfrozenXid;
- vacrel->NewRelminMxid = pagefrz.FreezePageRelminMxid;
-
- if (tuples_frozen == 0)
- {
- /*
- * We have no freeze plans to execute, so there's no added cost
- * from following the freeze path. That's why it was chosen. This
- * is important in the case where the page only contains totally
- * frozen tuples at this point (perhaps only following pruning).
- * Such pages can be marked all-frozen in the VM by our caller,
- * even though none of its tuples were newly frozen here (note
- * that the "no freeze" path never sets pages all-frozen).
- *
- * We never increment the frozen_pages instrumentation counter
- * here, since it only counts pages with newly frozen tuples
- * (don't confuse that with pages newly set all-frozen in VM).
- */
- }
- else
- {
- TransactionId snapshotConflictHorizon;
+ prune_options = HEAP_PAGE_PRUNE_FREEZE;
+ if (vacrel->nindexes == 0)
+ prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
- vacrel->frozen_pages++;
+ heap_page_prune_and_freeze(rel, buf, vacrel->vistest, prune_options,
+ &vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
+ &vacrel->offnum,
+ &vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
- /*
- * We can use visibility_cutoff_xid as our cutoff for conflicts
- * when the whole page is eligible to become all-frozen in the VM
- * once we're done with it. Otherwise we generate a conservative
- * cutoff by stepping back from OldestXmin.
- */
- if (all_visible && all_frozen)
- {
- /* Using same cutoff when setting VM is now unnecessary */
- snapshotConflictHorizon = visibility_cutoff_xid;
- visibility_cutoff_xid = InvalidTransactionId;
- }
- else
- {
- /* Avoids false conflicts when hot_standby_feedback in use */
- snapshotConflictHorizon = vacrel->cutoffs.OldestXmin;
- TransactionIdRetreat(snapshotConflictHorizon);
- }
+ Assert(MultiXactIdIsValid(vacrel->NewRelminMxid));
+ Assert(TransactionIdIsValid(vacrel->NewRelfrozenXid));
- /* Execute all freeze plans for page as a single atomic action */
- heap_freeze_execute_prepared(vacrel->rel, buf,
- snapshotConflictHorizon,
- frozen, tuples_frozen);
- }
- }
- else
+ if (presult.nfrozen > 0)
{
/*
- * Page requires "no freeze" processing. It might be set all-visible
- * in the visibility map, but it can never be set all-frozen.
+ * We don't increment the frozen_pages instrumentation counter when
+ * nfrozen == 0, since it only counts pages with newly frozen tuples
+ * (don't confuse that with pages newly set all-frozen in VM).
*/
- vacrel->NewRelfrozenXid = pagefrz.NoFreezePageRelfrozenXid;
- vacrel->NewRelminMxid = pagefrz.NoFreezePageRelminMxid;
- all_frozen = false;
- tuples_frozen = 0; /* avoid miscounts in instrumentation */
+ vacrel->frozen_pages++;
}
/*
@@ -1756,71 +1465,71 @@ lazy_scan_prune(LVRelState *vacrel,
*/
#ifdef USE_ASSERT_CHECKING
/* Note that all_frozen value does not matter when !all_visible */
- if (all_visible && lpdead_items == 0)
+ if (presult.all_visible)
{
TransactionId debug_cutoff;
bool debug_all_frozen;
+ Assert(presult.lpdead_items == 0);
+
if (!heap_page_is_all_visible(vacrel, buf,
&debug_cutoff, &debug_all_frozen))
Assert(false);
+ Assert(presult.all_frozen == debug_all_frozen);
+
Assert(!TransactionIdIsValid(debug_cutoff) ||
- debug_cutoff == visibility_cutoff_xid);
+ debug_cutoff == presult.vm_conflict_horizon);
}
#endif
/*
* Now save details of the LP_DEAD items from the page in vacrel
*/
- if (lpdead_items > 0)
+ if (presult.lpdead_items > 0)
{
vacrel->lpdead_item_pages++;
- dead_items_add(vacrel, blkno, deadoffsets, lpdead_items);
-
/*
- * It was convenient to ignore LP_DEAD items in all_visible earlier on
- * to make the choice of whether or not to freeze the page unaffected
- * by the short-term presence of LP_DEAD items. These LP_DEAD items
- * were effectively assumed to be LP_UNUSED items in the making. It
- * doesn't matter which heap pass (initial pass or final pass) ends up
- * setting the page all-frozen, as long as the ongoing VACUUM does it.
- *
- * Now that freezing has been finalized, unset all_visible. It needs
- * to reflect the present state of things, as expected by our caller.
+ * deadoffsets are collected incrementally in
+ * heap_page_prune_and_freeze() as each dead line pointer is recorded,
+ * with an indeterminate order, but dead_items_add requires them to be
+ * sorted.
*/
- all_visible = false;
+ qsort(presult.deadoffsets, presult.lpdead_items, sizeof(OffsetNumber),
+ cmpOffsetNumbers);
+
+ dead_items_add(vacrel, blkno, presult.deadoffsets, presult.lpdead_items);
}
/* Finally, add page-local counts to whole-VACUUM counts */
vacrel->tuples_deleted += presult.ndeleted;
- vacrel->tuples_frozen += tuples_frozen;
- vacrel->lpdead_items += lpdead_items;
- vacrel->live_tuples += live_tuples;
- vacrel->recently_dead_tuples += recently_dead_tuples;
+ vacrel->tuples_frozen += presult.nfrozen;
+ vacrel->lpdead_items += presult.lpdead_items;
+ vacrel->live_tuples += presult.live_tuples;
+ vacrel->recently_dead_tuples += presult.recently_dead_tuples;
/* Can't truncate this page */
- if (hastup)
+ if (presult.hastup)
vacrel->nonempty_pages = blkno + 1;
/* Did we find LP_DEAD items? */
- *has_lpdead_items = (lpdead_items > 0);
+ *has_lpdead_items = (presult.lpdead_items > 0);
- Assert(!all_visible || !(*has_lpdead_items));
+ Assert(!presult.all_visible || !(*has_lpdead_items));
/*
* Handle setting visibility map bit based on information from the VM (as
* of last heap_vac_scan_next_block() call), and from all_visible and
* all_frozen variables
*/
- if (!all_visible_according_to_vm && all_visible)
+ if (!all_visible_according_to_vm && presult.all_visible)
{
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
- if (all_frozen)
+ if (presult.all_frozen)
{
- Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+ Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
flags |= VISIBILITYMAP_ALL_FROZEN;
}
@@ -1840,7 +1549,7 @@ lazy_scan_prune(LVRelState *vacrel,
PageSetAllVisible(page);
MarkBufferDirty(buf);
visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
- vmbuffer, visibility_cutoff_xid,
+ vmbuffer, presult.vm_conflict_horizon,
flags);
}
@@ -1873,7 +1582,7 @@ lazy_scan_prune(LVRelState *vacrel,
* There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
* however.
*/
- else if (lpdead_items > 0 && PageIsAllVisible(page))
+ else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
{
elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
vacrel->relname, blkno);
@@ -1888,8 +1597,8 @@ lazy_scan_prune(LVRelState *vacrel,
* it as all-frozen. Note that all_frozen is only valid if all_visible is
* true, so we must check both all_visible and all_frozen.
*/
- else if (all_visible_according_to_vm && all_visible &&
- all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
+ else if (all_visible_according_to_vm && presult.all_visible &&
+ presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
{
/*
* Avoid relying on all_visible_according_to_vm as a proxy for the
@@ -1905,11 +1614,11 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* Set the page all-frozen (and all-visible) in the VM.
*
- * We can pass InvalidTransactionId as our visibility_cutoff_xid,
- * since a snapshotConflictHorizon sufficient to make everything safe
- * for REDO was logged when the page's tuples were frozen.
+ * We can pass InvalidTransactionId as our cutoff_xid, since a
+ * snapshotConflictHorizon sufficient to make everything safe for REDO
+ * was logged when the page's tuples were frozen.
*/
- Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+ Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
vmbuffer, InvalidTransactionId,
VISIBILITYMAP_ALL_VISIBLE |
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index b632fe953c4..536711d98e0 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -36,8 +36,9 @@
#define HEAP_INSERT_NO_LOGICAL TABLE_INSERT_NO_LOGICAL
#define HEAP_INSERT_SPECULATIVE 0x0010
-/* "options" flag bits for heap_page_prune */
+/* "options" flag bits for heap_page_prune_and_freeze */
#define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW (1 << 0)
+#define HEAP_PAGE_PRUNE_FREEZE (1 << 1)
typedef struct BulkInsertStateData *BulkInsertState;
struct TupleTableSlot;
@@ -195,24 +196,47 @@ typedef struct HeapPageFreeze
} HeapPageFreeze;
/*
- * Per-page state returned from pruning
+ * Per-page state returned by heap_page_prune_and_freeze()
*/
-typedef struct PruneResult
+typedef struct PruneFreezeResult
{
int ndeleted; /* Number of tuples deleted from the page */
int nnewlpdead; /* Number of newly LP_DEAD items */
+ int nfrozen; /* Number of tuples we froze */
+
+ /* Number of live and recently dead tuples on the page, after pruning */
+ int live_tuples;
+ int recently_dead_tuples;
/*
- * Tuple visibility is only computed once for each tuple, for correctness
- * and efficiency reasons; see comment in heap_page_prune() for details.
- * This is of type int8[], instead of HTSV_Result[], so we can use -1 to
- * indicate no visibility has been computed, e.g. for LP_DEAD items.
+ * all_visible and all_frozen indicate if the all-visible and all-frozen
+ * bits in the visibility map can be set for this page, after pruning.
+ *
+ * vm_conflict_horizon is the newest xmin of live tuples on the page. The
+ * caller can use it as the conflict horizon when setting the VM bits. It
+ * is only valid if we froze some tuples (nfrozen > 0), and all_frozen is
+ * true.
*
- * This needs to be MaxHeapTuplesPerPage + 1 long as FirstOffsetNumber is
- * 1. Otherwise every access would need to subtract 1.
+ * These are only set if the HEAP_PRUNE_FREEZE option is set.
*/
- int8 htsv[MaxHeapTuplesPerPage + 1];
-} PruneResult;
+ bool all_visible;
+ bool all_frozen;
+ TransactionId vm_conflict_horizon;
+
+ /*
+ * Whether or not the page makes rel truncation unsafe. This is set to
+ * 'true', even if the page contains LP_DEAD items. VACUUM will remove
+ * them before attempting to truncate.
+ */
+ bool hastup;
+
+ /*
+ * LP_DEAD items on the page after pruning. Includes existing LP_DEAD
+ * items.
+ */
+ int lpdead_items;
+ OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
+} PruneFreezeResult;
/* 'reason' codes for heap_page_prune() */
typedef enum
@@ -222,20 +246,6 @@ typedef enum
PRUNE_VACUUM_CLEANUP, /* VACUUM 2nd heap pass */
} PruneReason;
-/*
- * Pruning calculates tuple visibility once and saves the results in an array
- * of int8. See PruneResult.htsv for details. This helper function is meant to
- * guard against examining visibility status array members which have not yet
- * been computed.
- */
-static inline HTSV_Result
-htsv_get_valid_status(int status)
-{
- Assert(status >= HEAPTUPLE_DEAD &&
- status <= HEAPTUPLE_DELETE_IN_PROGRESS);
- return (HTSV_Result) status;
-}
-
/* ----------------
* function prototypes for heap access method
*
@@ -309,9 +319,11 @@ extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
const struct VacuumCutoffs *cutoffs,
HeapPageFreeze *pagefrz,
HeapTupleFreeze *frz, bool *totally_frozen);
-extern void heap_freeze_execute_prepared(Relation rel, Buffer buffer,
- TransactionId snapshotConflictHorizon,
- HeapTupleFreeze *tuples, int ntuples);
+
+extern void heap_pre_freeze_checks(Buffer buffer,
+ HeapTupleFreeze *tuples, int ntuples);
+extern void heap_freeze_prepared_tuples(Buffer buffer,
+ HeapTupleFreeze *tuples, int ntuples);
extern bool heap_freeze_tuple(HeapTupleHeader tuple,
TransactionId relfrozenxid, TransactionId relminmxid,
TransactionId FreezeLimit, TransactionId MultiXactCutoff);
@@ -332,12 +344,15 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
/* in heap/pruneheap.c */
struct GlobalVisState;
extern void heap_page_prune_opt(Relation relation, Buffer buffer);
-extern void heap_page_prune(Relation relation, Buffer buffer,
- struct GlobalVisState *vistest,
- int options,
- PruneResult *presult,
- PruneReason reason,
- OffsetNumber *off_loc);
+extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ struct GlobalVisState *vistest,
+ int options,
+ struct VacuumCutoffs *cutoffs,
+ PruneFreezeResult *presult,
+ PruneReason reason,
+ OffsetNumber *off_loc,
+ TransactionId *new_relfrozen_xid,
+ MultiXactId *new_relmin_mxid);
extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
OffsetNumber *redirected, int nredirected,
OffsetNumber *nowdead, int ndead,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 2b01a3081e3..fa1ede5fe7a 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2195,7 +2195,7 @@ PromptInterruptContext
ProtocolVersion
PrsStorage
PruneReason
-PruneResult
+PruneFreezeResult
PruneState
PruneStepResult
PsqlScanCallbacks
--
2.39.2
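To recap the shape of the interface the patch ends up with, here is a
condensed sketch of the new lazy_scan_prune() flow, pieced together from the
hunks above. It assumes the usual vacuumlazy.c context (vacrel, rel, buf,
blkno) and is a summary for readers skimming the diff, not a standalone
excerpt:

    PruneFreezeResult presult;
    int         prune_options = HEAP_PAGE_PRUNE_FREEZE;

    /* With no indexes, would-be LP_DEAD items can be marked LP_UNUSED now */
    if (vacrel->nindexes == 0)
        prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;

    /* A single call now prunes, freezes, and reports page-level visibility */
    heap_page_prune_and_freeze(rel, buf, vacrel->vistest, prune_options,
                               &vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
                               &vacrel->offnum,
                               &vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);

    /* frozen_pages only counts pages with newly frozen tuples */
    if (presult.nfrozen > 0)
        vacrel->frozen_pages++;

    if (presult.lpdead_items > 0)
    {
        /* dead offsets were collected in an indeterminate order; sort them */
        qsort(presult.deadoffsets, presult.lpdead_items, sizeof(OffsetNumber),
              cmpOffsetNumbers);
        dead_items_add(vacrel, blkno, presult.deadoffsets,
                       presult.lpdead_items);
    }

    /* presult.all_visible, all_frozen and vm_conflict_horizon then drive the
       VM update, exactly as in the hunk above */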
On Wed, Apr 3, 2024 at 8:39 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
On 02/04/2024 16:11, Heikki Linnakangas wrote:
On 01/04/2024 20:22, Melanie Plageman wrote:
Review for 0003-0006 (I didn't have any new thoughts on 0002). I know
you didn't modify them much/at all, but I noticed some things in my code
that could be better.

Ok, here's what I have now. I made a lot of small comment changes here
and there, and some minor local refactorings, but nothing major. I lost
track of all the individual changes, I'm afraid, so you'll have to just
diff against the previous version if you want to see what's changed. I
hope I didn't break anything.

I'm pretty happy with this now. I will skim through it one more time
later today or tomorrow, and commit. Please review once more if you have
a chance.

This probably doesn't belong here. I noticed spgdoinsert.c had a static
function for sorting OffsetNumbers, but I didn't see anything
general-purpose anywhere else.

I copied the spgdoinsert.c implementation to vacuumlazy.c as is. Would
be nice to have just one copy of this in some common place, but I also
wasn't sure where to put it.

One more version, with two small fixes:

1. I fumbled the offsetnumber-cmp function at the last minute so that it
didn't compile. Fixed that.
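As an aside, the OffsetNumber comparator discussed above is just an
overflow-safe unsigned comparison suitable for qsort(). A minimal standalone
sketch of the idea follows; the OffsetNumber typedef and the portable
comparison standing in for pg_cmp_u16 are assumptions made for the toy, not
code from the patch:

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    typedef uint16_t OffsetNumber;  /* stand-in for the PostgreSQL typedef */

    /* same shape as the cmpOffsetNumbers() helper added to vacuumlazy.c */
    static int
    cmpOffsetNumbers(const void *a, const void *b)
    {
        OffsetNumber oa = *(const OffsetNumber *) a;
        OffsetNumber ob = *(const OffsetNumber *) b;

        /* returns -1/0/1 without risking integer overflow */
        return (oa > ob) - (oa < ob);
    }

    int
    main(void)
    {
        /* dead offsets as pruning might record them: chain order, not item order */
        OffsetNumber deadoffsets[] = {7, 2, 9, 3};
        int         n = 4;

        qsort(deadoffsets, n, sizeof(OffsetNumber), cmpOffsetNumbers);

        for (int i = 0; i < n; i++)
            printf("%u\n", (unsigned) deadoffsets[i]);
        return 0;
    }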
I noticed you didn't make the comment updates I suggested in my
version 13 here [1]. A few of them are outdated references to
heap_page_prune() and one to a now deleted variable name
(all_visible_except_removable).
I applied them to your v13 and attached the diff.
Off-list, Melanie reported that there is a small regression with the
benchmark script she posted yesterday, after all, but I'm not able to
reproduce that.
Actually, I think it was noise.
- Melanie
[1]: /messages/by-id/CAAKRu_aPqZkThyfr0USaHp-3cN_ruEdAHBKtNQJqXDTjWUz0rw@mail.gmail.com
Attachments:
0003-comment-updates.patch (text/x-patch)
From 882e937c122f5e83bc9ba643443c1a27c807d82e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 3 Apr 2024 10:12:44 -0400
Subject: [PATCH 3/3] comment updates
---
src/backend/access/heap/pruneheap.c | 53 +++++++++++++----------------
src/backend/storage/ipc/procarray.c | 6 ++--
src/include/access/heapam.h | 2 +-
3 files changed, 28 insertions(+), 33 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 8ed44ba93d..4a6a4cee4d 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -328,7 +328,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
*
* presult contains output parameters needed by callers, such as the number of
* tuples removed and the offsets of dead items on the page after pruning.
- * heap_page_prune_and_freeze() is responsible for initializing it.
+ * heap_page_prune_and_freeze() is responsible for initializing it. Required by
+ * all callers.
*
* reason indicates why the pruning is performed. It is included in the WAL
* record for debugging and analysis purposes, but otherwise has no effect.
@@ -393,6 +394,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.pagefrz.freeze_required = false;
if (prstate.freeze)
{
+ Assert(new_relfrozen_xid && new_relmin_mxid);
prstate.pagefrz.FreezePageRelfrozenXid = *new_relfrozen_xid;
prstate.pagefrz.NoFreezePageRelfrozenXid = *new_relfrozen_xid;
prstate.pagefrz.FreezePageRelminMxid = *new_relmin_mxid;
@@ -415,19 +417,19 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.deadoffsets = presult->deadoffsets;
/*
- * Caller may update the VM after we're done. We keep track of whether
- * the page will be all_visible and all_frozen, once we're done with the
- * pruning and freezing, to help the caller to do that.
+ * Caller may update the VM after we're done. We can keep track of
+ * whether the page will be all-visible and all-frozen after pruning and
+ * freezing to help the caller to do that.
*
* Currently, only VACUUM sets the VM bits. To save the effort, only do
- * only the bookkeeping if the caller needs it. Currently, that's tied to
- * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag, if you wanted
- * to update the VM bits without also freezing, or freezing without
+ * the bookkeeping if the caller needs it. Currently, that's tied to
+ * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
+ * to update the VM bits without also freezing or freeze without also
* setting the VM bits.
*
* In addition to telling the caller whether it can set the VM bit, we
* also use 'all_visible' and 'all_frozen' for our own decision-making. If
- * the whole page will become frozen, we consider opportunistically
+ * the whole page would become frozen, we consider opportunistically
* freezing tuples. We will not be able to freeze the whole page if there
* are tuples present that are not visible to everyone or if there are
* dead tuples which are not yet removable. However, dead tuples which
@@ -681,16 +683,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
else
{
/*
- * Opportunistically freeze the page, if we are generating an FPI
- * anyway, and if doing so means that we can set the page
+ * Opportunistically freeze the page if we are generating an FPI
+ * anyway and if doing so means that we can set the page
* all-frozen afterwards (might not happen until VACUUM's final
* heap pass).
*
* XXX: Previously, we knew if pruning emitted an FPI by checking
* pgWalUsage.wal_fpi before and after pruning. Once the freeze
- * and prune records are combined, this heuristic couldn't be used
- * anymore. The opportunistic freeze heuristic must be improved;
- * however, for now, try to approximate the old logic.
+ * and prune records were combined, this heuristic couldn't be
+ * used anymore. The opportunistic freeze heuristic must be
+ * improved; however, for now, try to approximate the old logic.
*/
if (prstate.all_visible && prstate.all_frozen && prstate.nfrozen > 0)
{
@@ -766,7 +768,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/*
* If that's all we had to do to the page, this is a non-WAL-logged
- * hint. If we will also freeze or prune the page, we will mark the
+ * hint. If we are going to freeze or prune the page, we will mark the
* buffer dirty below.
*/
if (!do_freeze && !do_prune)
@@ -854,12 +856,12 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* make the choice of whether or not to freeze the page unaffected by the
* short-term presence of LP_DEAD items. These LP_DEAD items were
* effectively assumed to be LP_UNUSED items in the making. It doesn't
- * matter which heap pass (initial pass or final pass) ends up setting the
- * page all-frozen, as long as the ongoing VACUUM does it.
+ * matter which vacuum heap pass (initial pass or final pass) ends up
+ * setting the page all-frozen, as long as the ongoing VACUUM does it.
*
* Now that freezing has been finalized, unset all_visible if there are
* any LP_DEAD items on the page. It needs to reflect the present state
- * of things, as expected by our caller.
+ * of the page, as expected by our caller.
*/
if (prstate.all_visible && prstate.lpdead_items == 0)
{
@@ -1232,14 +1234,7 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
/*
* Deliberately delay unsetting all_visible until later during pruning.
- * Removable dead tuples shouldn't preclude freezing the page. After
- * finishing this first pass of tuple visibility checks, initialize
- * all_visible_except_removable with the current value of all_visible to
- * indicate whether or not the page is all visible except for dead tuples.
- * This will allow us to attempt to freeze the page after pruning. Later
- * during pruning, if we encounter an LP_DEAD item or are setting an item
- * LP_DEAD, we will unset all_visible. As long as we unset it before
- * updating the visibility map, this will be correct.
+ * Removable dead tuples shouldn't preclude freezing the page.
*/
/* Record the dead offset for vacuum */
@@ -1663,10 +1658,10 @@ heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
else
{
/*
- * When heap_page_prune() was called, mark_unused_now may have
- * been passed as true, which allows would-be LP_DEAD items to be
- * made LP_UNUSED instead. This is only possible if the relation
- * has no indexes. If there are any dead items, then
+ * When heap_page_prune_and_freeze() was called, mark_unused_now
+ * may have been passed as true, which allows would-be LP_DEAD
+ * items to be made LP_UNUSED instead. This is only possible if
+ * the relation has no indexes. If there are any dead items, then
* mark_unused_now was not true and every item being marked
* LP_UNUSED must refer to a heap-only tuple.
*/
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index b3cd248fb6..88a6d504df 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -1715,9 +1715,9 @@ TransactionIdIsActive(TransactionId xid)
* Note: the approximate horizons (see definition of GlobalVisState) are
* updated by the computations done here. That's currently required for
* correctness and a small optimization. Without doing so it's possible that
- * heap vacuum's call to heap_page_prune() uses a more conservative horizon
- * than later when deciding which tuples can be removed - which the code
- * doesn't expect (breaking HOT).
+ * heap vacuum's call to heap_page_prune_and_freeze() uses a more conservative
+ * horizon than later when deciding which tuples can be removed - which the
+ * code doesn't expect (breaking HOT).
*/
static void
ComputeXidHorizons(ComputeXidHorizonsResult *h)
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 536711d98e..a307fb5f24 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -238,7 +238,7 @@ typedef struct PruneFreezeResult
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
} PruneFreezeResult;
-/* 'reason' codes for heap_page_prune() */
+/* 'reason' codes for heap_page_prune_and_freeze() */
typedef enum
{
PRUNE_ON_ACCESS, /* on-access pruning */
--
2.40.1
On 03/04/2024 17:18, Melanie Plageman wrote:
I noticed you didn't make the comment updates I suggested in my
version 13 here [1]. A few of them are outdated references to
heap_page_prune() and one to a now deleted variable name
(all_visible_except_removable). I applied them to your v13 and attached
the diff.
Applied those changes, and committed. Thank you!
Off-list, Melanie reported that there is a small regression with the
benchmark script she posted yesterday, after all, but I'm not able to
reproduce that.
Actually, I think it was noise.
Ok, phew.
--
Heikki Linnakangas
Neon (https://neon.tech)
On Wed, Apr 3, 2024 at 12:34 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
On 03/04/2024 17:18, Melanie Plageman wrote:
I noticed you didn't make the comment updates I suggested in my
version 13 here [1]. A few of them are outdated references to
heap_page_prune() and one to a now deleted variable name
(all_visible_except_removable). I applied them to your v13 and attached
the diff.
Applied those changes, and committed. Thank you!
Thanks! And thanks for updating the commitfest entry!
- Melanie