Eliminate redundant tuple visibility check in vacuum
While working on a set of patches to combine the freeze and visibility
map WAL records into the prune record, I wrote the attached patches
reusing the tuple visibility information collected in heap_page_prune()
back in lazy_scan_prune().
heap_page_prune() collects the HTSV_Result for every tuple on a page
and saves it in an array used by heap_prune_chain(). If we make that
array available to lazy_scan_prune(), it can use it when collecting
stats for vacuum and determining whether or not to freeze tuples.
This avoids calling HeapTupleSatisfiesVacuum() again on every tuple in
the page.
It also gets rid of the retry loop in lazy_scan_prune().
- Melanie
Attachments:
v1-0002-Reuse-heap_page_prune-tuple-visibility-statuses.patch (application/x-patch)
From bdb432b220571086077085b29d3c90a3a1c68c47 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Thu, 3 Aug 2023 15:37:39 -0400
Subject: [PATCH v1 2/2] Reuse heap_page_prune() tuple visibility statuses
heap_page_prune() obtains the HTSV_Result (tuple visibility status)
returned from HeapTupleSatisfiesVacuum() for every tuple on the page and
stores them in an array. By making this array available to
heap_page_prune()'s caller lazy_scan_prune(), we can avoid an additional
call to HeapTupleSatisfiesVacuum() when freezing the tuples and
recording LP_DEAD items for vacuum. This saves resources and eliminates
the possibility that vacuuming corrupted data results in a hang due to
endless retry looping.
We pass the HTSV array from heap_page_prune() back to lazy_scan_prune()
in PruneResult. Now that heap_page_prune() takes a PruneResult, the
other output parameters it previously took individually can be moved
into the PruneResult as well.
---
src/backend/access/heap/pruneheap.c | 68 +++++++++++++++-------------
src/backend/access/heap/vacuumlazy.c | 66 ++++++++-------------------
src/include/access/heapam.h | 14 +++++-
3 files changed, 68 insertions(+), 80 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 47b9e20915..b5bc126399 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -65,16 +65,6 @@ typedef struct
* 1. Otherwise every access would need to subtract 1.
*/
bool marked[MaxHeapTuplesPerPage + 1];
-
- /*
- * Tuple visibility is only computed once for each tuple, for correctness
- * and efficiency reasons; see comment in heap_page_prune() for details.
- * This is of type int8[], instead of HTSV_Result[], so we can use -1 to
- * indicate no visibility has been computed, e.g. for LP_DEAD items.
- *
- * Same indexing as ->marked.
- */
- int8 htsv[MaxHeapTuplesPerPage + 1];
} PruneState;
/* Local functions */
@@ -83,7 +73,7 @@ static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
Buffer buffer);
static int heap_prune_chain(Buffer buffer,
OffsetNumber rootoffnum,
- PruneState *prstate);
+ PruneState *prstate, PruneResult *presult);
static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
static void heap_prune_record_redirect(PruneState *prstate,
OffsetNumber offnum, OffsetNumber rdoffnum);
@@ -191,10 +181,25 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
if (PageIsFull(page) || PageGetHeapFreeSpace(page) < minfree)
{
+ PruneResult presult;
+
/* OK, try to get exclusive buffer lock */
if (!ConditionalLockBufferForCleanup(buffer))
return;
+ /* Initialize prune result fields */
+ presult.hastup = false;
+ presult.has_lpdead_items = false;
+ presult.all_visible = true;
+ presult.all_frozen = true;
+ presult.visibility_cutoff_xid = InvalidTransactionId;
+
+ /*
+ * nnewlpdead only includes those items which were newly set to
+ * LP_DEAD during pruning.
+ */
+ presult.nnewlpdead = 0;
+
/*
* Now that we have buffer lock, get accurate information about the
* page's free space, and recheck the heuristic about whether to
@@ -202,11 +207,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
*/
if (PageIsFull(page) || PageGetHeapFreeSpace(page) < minfree)
{
- int ndeleted,
- nnewlpdead;
-
- ndeleted = heap_page_prune(relation, buffer, vistest, limited_xmin,
- limited_ts, &nnewlpdead, NULL);
+ int ndeleted = heap_page_prune(relation, buffer, vistest, limited_xmin,
+ limited_ts, &presult, NULL);
/*
* Report the number of tuples reclaimed to pgstats. This is
@@ -222,9 +224,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* tracks ndeleted, since it will set the same LP_DEAD items to
* LP_UNUSED separately.
*/
- if (ndeleted > nnewlpdead)
+ if (ndeleted > presult.nnewlpdead)
pgstat_update_heap_dead_tuples(relation,
- ndeleted - nnewlpdead);
+ ndeleted - presult.nnewlpdead);
}
/* And release buffer lock */
@@ -253,12 +255,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* either have been set by TransactionIdLimitedForOldSnapshots, or
* InvalidTransactionId/0 respectively.
*
- * Sets *nnewlpdead for caller, indicating the number of items that were
- * newly set LP_DEAD during prune operation.
- *
* off_loc is the offset location required by the caller to use in error
* callback.
*
+ * presult contains output parameters relevant to the caller.
+ * For example, sets presult->nnewlpdead for caller, indicating the number of
+ * items that were newly set LP_DEAD during prune operation.
+ *
* Returns the number of tuples deleted from the page during this call.
*/
int
@@ -266,7 +269,7 @@ heap_page_prune(Relation relation, Buffer buffer,
GlobalVisState *vistest,
TransactionId old_snap_xmin,
TimestampTz old_snap_ts,
- int *nnewlpdead,
+ PruneResult *presult,
OffsetNumber *off_loc)
{
int ndeleted = 0;
@@ -331,7 +334,7 @@ heap_page_prune(Relation relation, Buffer buffer,
/* Nothing to do if slot doesn't contain a tuple */
if (!ItemIdIsNormal(itemid))
{
- prstate.htsv[offnum] = -1;
+ presult->htsv[offnum] = -1;
continue;
}
@@ -347,8 +350,8 @@ heap_page_prune(Relation relation, Buffer buffer,
if (off_loc)
*off_loc = offnum;
- prstate.htsv[offnum] = heap_prune_satisfies_vacuum(&prstate, &tup,
- buffer);
+ presult->htsv[offnum] = heap_prune_satisfies_vacuum(&prstate, &tup,
+ buffer);
}
/* Scan the page */
@@ -372,7 +375,7 @@ heap_page_prune(Relation relation, Buffer buffer,
continue;
/* Process this item or chain of items */
- ndeleted += heap_prune_chain(buffer, offnum, &prstate);
+ ndeleted += heap_prune_chain(buffer, offnum, &prstate, presult);
}
/* Clear the offset information once we have processed the given page. */
@@ -473,7 +476,7 @@ heap_page_prune(Relation relation, Buffer buffer,
END_CRIT_SECTION();
/* Record number of newly-set-LP_DEAD items for caller */
- *nnewlpdead = prstate.ndead;
+ presult->nnewlpdead = prstate.ndead;
return ndeleted;
}
@@ -588,7 +591,8 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
* Returns the number of tuples (to be) deleted from the page.
*/
static int
-heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum, PruneState *prstate)
+heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
+ PruneState *prstate, PruneResult *presult)
{
int ndeleted = 0;
Page dp = (Page) BufferGetPage(buffer);
@@ -609,7 +613,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum, PruneState *prstate)
*/
if (ItemIdIsNormal(rootlp))
{
- Assert(prstate->htsv[rootoffnum] != -1);
+ Assert(presult->htsv[rootoffnum] != -1);
htup = (HeapTupleHeader) PageGetItem(dp, rootlp);
if (HeapTupleHeaderIsHeapOnly(htup))
@@ -632,7 +636,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum, PruneState *prstate)
* either here or while following a chain below. Whichever path
* gets there first will mark the tuple unused.
*/
- if (prstate->htsv[rootoffnum] == HEAPTUPLE_DEAD &&
+ if (presult->htsv[rootoffnum] == HEAPTUPLE_DEAD &&
!HeapTupleHeaderIsHotUpdated(htup))
{
heap_prune_record_unused(prstate, rootoffnum);
@@ -700,7 +704,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum, PruneState *prstate)
break;
Assert(ItemIdIsNormal(lp));
- Assert(prstate->htsv[offnum] != -1);
+ Assert(presult->htsv[offnum] != -1);
htup = (HeapTupleHeader) PageGetItem(dp, lp);
/*
@@ -720,7 +724,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum, PruneState *prstate)
*/
tupdead = recent_dead = false;
- switch ((HTSV_Result) prstate->htsv[offnum])
+ switch ((HTSV_Result) presult->htsv[offnum])
{
case HEAPTUPLE_DEAD:
tupdead = true;
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index d4dad0e6ea..8fc781efaa 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1497,21 +1497,6 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
* lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
*
* Caller must hold pin and buffer cleanup lock on the buffer.
- *
- * Prior to PostgreSQL 14 there were very rare cases where heap_page_prune()
- * was allowed to disagree with our HeapTupleSatisfiesVacuum() call about
- * whether or not a tuple should be considered DEAD. This happened when an
- * inserting transaction concurrently aborted (after our heap_page_prune()
- * call, before our HeapTupleSatisfiesVacuum() call). There was rather a lot
- * of complexity just so we could deal with tuples that were DEAD to VACUUM,
- * but nevertheless were left with storage after pruning.
- *
- * The approach we take now is to restart pruning when the race condition is
- * detected. This allows heap_page_prune() to prune the tuples inserted by
- * the now-aborted transaction. This is a little crude, but it guarantees
- * that any items that make it into the dead_items array are simple LP_DEAD
- * line pointers, and that every remaining item with tuple storage is
- * considered as a candidate for freezing.
*/
static void
lazy_scan_prune(LVRelState *vacrel,
@@ -1524,14 +1509,11 @@ lazy_scan_prune(LVRelState *vacrel,
OffsetNumber offnum,
maxoff;
ItemId itemid;
- HeapTupleData tuple;
- HTSV_Result res;
int tuples_deleted,
tuples_frozen,
lpdead_items,
live_tuples,
recently_dead_tuples;
- int nnewlpdead;
HeapPageFreeze pagefrz;
int64 fpi_before = pgWalUsage.wal_fpi;
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
@@ -1546,14 +1528,23 @@ lazy_scan_prune(LVRelState *vacrel,
*/
maxoff = PageGetMaxOffsetNumber(page);
-retry:
-
/* Initialize (or reset) page-level state */
pagefrz.freeze_required = false;
pagefrz.FreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
pagefrz.FreezePageRelminMxid = vacrel->NewRelminMxid;
pagefrz.NoFreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
pagefrz.NoFreezePageRelminMxid = vacrel->NewRelminMxid;
+ presult->hastup = false;
+ presult->has_lpdead_items = false;
+ presult->all_visible = true;
+ presult->all_frozen = true;
+ presult->visibility_cutoff_xid = InvalidTransactionId;
+
+ /*
+ * nnewlpdead only includes those items which were newly set to LP_DEAD
+ * during pruning.
+ */
+ presult->nnewlpdead = 0;
tuples_deleted = 0;
tuples_frozen = 0;
lpdead_items = 0;
@@ -1570,23 +1561,19 @@ retry:
* that were deleted from indexes.
*/
tuples_deleted = heap_page_prune(rel, buf, vacrel->vistest,
- InvalidTransactionId, 0, &nnewlpdead,
+ InvalidTransactionId, 0,
+ presult,
&vacrel->offnum);
/*
* Now scan the page to collect LP_DEAD items and check for tuples
* requiring freezing among remaining tuples with storage
*/
- presult->hastup = false;
- presult->has_lpdead_items = false;
- presult->all_visible = true;
- presult->all_frozen = true;
- presult->visibility_cutoff_xid = InvalidTransactionId;
-
for (offnum = FirstOffsetNumber;
offnum <= maxoff;
offnum = OffsetNumberNext(offnum))
{
+ HeapTupleHeader htup;
bool totally_frozen;
/*
@@ -1629,22 +1616,7 @@ retry:
Assert(ItemIdIsNormal(itemid));
- ItemPointerSet(&(tuple.t_self), blkno, offnum);
- tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
- tuple.t_len = ItemIdGetLength(itemid);
- tuple.t_tableOid = RelationGetRelid(rel);
-
- /*
- * DEAD tuples are almost always pruned into LP_DEAD line pointers by
- * heap_page_prune(), but it's possible that the tuple state changed
- * since heap_page_prune() looked. Handle that here by restarting.
- * (See comments at the top of function for a full explanation.)
- */
- res = HeapTupleSatisfiesVacuum(&tuple, vacrel->cutoffs.OldestXmin,
- buf);
-
- if (unlikely(res == HEAPTUPLE_DEAD))
- goto retry;
+ htup = (HeapTupleHeader) PageGetItem(page, itemid);
/*
* The criteria for counting a tuple as live in this block need to
@@ -1665,7 +1637,7 @@ retry:
* (Cases where we bypass index vacuuming will violate this optimistic
* assumption, but the overall impact of that should be negligible.)
*/
- switch (res)
+ switch (presult->htsv[offnum])
{
case HEAPTUPLE_LIVE:
@@ -1687,7 +1659,7 @@ retry:
{
TransactionId xmin;
- if (!HeapTupleHeaderXminCommitted(tuple.t_data))
+ if (!HeapTupleHeaderXminCommitted(htup))
{
presult->all_visible = false;
break;
@@ -1697,7 +1669,7 @@ retry:
* The inserter definitely committed. But is it old enough
* that everyone sees it as committed?
*/
- xmin = HeapTupleHeaderGetXmin(tuple.t_data);
+ xmin = HeapTupleHeaderGetXmin(htup);
if (!TransactionIdPrecedes(xmin,
vacrel->cutoffs.OldestXmin))
{
@@ -1751,7 +1723,7 @@ retry:
presult->hastup = true; /* page makes rel truncation unsafe */
/* Tuple with storage -- consider need to freeze */
- if (heap_prepare_freeze_tuple(tuple.t_data, &vacrel->cutoffs, &pagefrz,
+ if (heap_prepare_freeze_tuple(htup, &vacrel->cutoffs, &pagefrz,
&frozen[tuples_frozen], &totally_frozen))
{
/* Save prepared freeze plan for later */
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 9a07828e5a..bf94bda634 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -199,6 +199,8 @@ typedef struct PruneResult
bool hastup; /* Page prevents rel truncation? */
bool has_lpdead_items; /* includes existing LP_DEAD items */
+ int nnewlpdead; /* Number of newly LP_DEAD items */
+
/*
* State describes the proper VM bit states to set for the page following
* pruning and freezing. all_visible implies !has_lpdead_items, but don't
@@ -207,6 +209,16 @@ typedef struct PruneResult
bool all_visible; /* Every item visible to all? */
bool all_frozen; /* provided all_visible is also true */
TransactionId visibility_cutoff_xid; /* For recovery conflicts */
+
+ /*
+ * Tuple visibility is only computed once for each tuple, for correctness
+ * and efficiency reasons; see comment in heap_page_prune() for details.
+ * This is of type int8[], instead of HTSV_Result[], so we can use -1 to
+ * indicate no visibility has been computed, e.g. for LP_DEAD items.
+ *
+ * Same indexing as ->marked.
+ */
+ int8 htsv[MaxHeapTuplesPerPage + 1];
} PruneResult;
@@ -307,7 +319,7 @@ extern int heap_page_prune(Relation relation, Buffer buffer,
struct GlobalVisState *vistest,
TransactionId old_snap_xmin,
TimestampTz old_snap_ts,
- int *nnewlpdead,
+ PruneResult *presult,
OffsetNumber *off_loc);
extern void heap_page_prune_execute(Buffer buffer,
OffsetNumber *redirected, int nredirected,
--
2.37.2
v1-0001-Rebrand-LVPagePruneState-as-PruneResult.patch (application/x-patch)
From 10d7ea5952c6bd372e8e2b33e2bd54a940e1fff7 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Thu, 3 Aug 2023 15:08:58 -0400
Subject: [PATCH v1 1/2] Rebrand LVPagePruneState as PruneResult
With this rename, and relocation to heapam.h, we can use PruneResult as
an output parameter for on-access pruning as well. It also makes it
harder to confuse with PruneState.
We'll be adding and removing PruneResult fields in future commits, but
this rename and relocation is separate to make the diff easier to
understand.
---
src/backend/access/heap/vacuumlazy.c | 114 +++++++++++----------------
src/include/access/heapam.h | 19 +++++
src/tools/pgindent/typedefs.list | 2 +-
3 files changed, 68 insertions(+), 67 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 6a41ee635d..d4dad0e6ea 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -212,24 +212,6 @@ typedef struct LVRelState
int64 missed_dead_tuples; /* # removable, but not removed */
} LVRelState;
-/*
- * State returned by lazy_scan_prune()
- */
-typedef struct LVPagePruneState
-{
- bool hastup; /* Page prevents rel truncation? */
- bool has_lpdead_items; /* includes existing LP_DEAD items */
-
- /*
- * State describes the proper VM bit states to set for the page following
- * pruning and freezing. all_visible implies !has_lpdead_items, but don't
- * trust all_frozen result unless all_visible is also set to true.
- */
- bool all_visible; /* Every item visible to all? */
- bool all_frozen; /* provided all_visible is also true */
- TransactionId visibility_cutoff_xid; /* For recovery conflicts */
-} LVPagePruneState;
-
/* Struct for saving and restoring vacuum error information. */
typedef struct LVSavedErrInfo
{
@@ -250,7 +232,7 @@ static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
bool sharelock, Buffer vmbuffer);
static void lazy_scan_prune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
- LVPagePruneState *prunestate);
+ PruneResult *presult);
static bool lazy_scan_noprune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
bool *hastup, bool *recordfreespace);
@@ -854,7 +836,7 @@ lazy_scan_heap(LVRelState *vacrel)
Buffer buf;
Page page;
bool all_visible_according_to_vm;
- LVPagePruneState prunestate;
+ PruneResult presult;
if (blkno == next_unskippable_block)
{
@@ -1019,12 +1001,12 @@ lazy_scan_heap(LVRelState *vacrel)
* were pruned some time earlier. Also considers freezing XIDs in the
* tuple headers of remaining items with storage.
*/
- lazy_scan_prune(vacrel, buf, blkno, page, &prunestate);
+ lazy_scan_prune(vacrel, buf, blkno, page, &presult);
- Assert(!prunestate.all_visible || !prunestate.has_lpdead_items);
+ Assert(!presult.all_visible || !presult.has_lpdead_items);
/* Remember the location of the last page with nonremovable tuples */
- if (prunestate.hastup)
+ if (presult.hastup)
vacrel->nonempty_pages = blkno + 1;
if (vacrel->nindexes == 0)
@@ -1037,7 +1019,7 @@ lazy_scan_heap(LVRelState *vacrel)
* performed here can be thought of as the one-pass equivalent of
* a call to lazy_vacuum().
*/
- if (prunestate.has_lpdead_items)
+ if (presult.has_lpdead_items)
{
Size freespace;
@@ -1064,8 +1046,8 @@ lazy_scan_heap(LVRelState *vacrel)
*
* Our call to lazy_vacuum_heap_page() will have considered if
* it's possible to set all_visible/all_frozen independently
- * of lazy_scan_prune(). Note that prunestate was invalidated
- * by lazy_vacuum_heap_page() call.
+ * of lazy_scan_prune(). Note that prune result was
+ * invalidated by lazy_vacuum_heap_page() call.
*/
freespace = PageGetHeapFreeSpace(page);
@@ -1078,23 +1060,23 @@ lazy_scan_heap(LVRelState *vacrel)
* There was no call to lazy_vacuum_heap_page() because pruning
* didn't encounter/create any LP_DEAD items that needed to be
* vacuumed. Prune state has not been invalidated, so proceed
- * with prunestate-driven visibility map and FSM steps (just like
- * the two-pass strategy).
+ * with prune result-driven visibility map and FSM steps (just
+ * like the two-pass strategy).
*/
Assert(dead_items->num_items == 0);
}
/*
* Handle setting visibility map bit based on information from the VM
- * (as of last lazy_scan_skip() call), and from prunestate
+ * (as of last lazy_scan_skip() call), and from presult
*/
- if (!all_visible_according_to_vm && prunestate.all_visible)
+ if (!all_visible_according_to_vm && presult.all_visible)
{
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
- if (prunestate.all_frozen)
+ if (presult.all_frozen)
{
- Assert(!TransactionIdIsValid(prunestate.visibility_cutoff_xid));
+ Assert(!TransactionIdIsValid(presult.visibility_cutoff_xid));
flags |= VISIBILITYMAP_ALL_FROZEN;
}
@@ -1114,7 +1096,7 @@ lazy_scan_heap(LVRelState *vacrel)
PageSetAllVisible(page);
MarkBufferDirty(buf);
visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
- vmbuffer, prunestate.visibility_cutoff_xid,
+ vmbuffer, presult.visibility_cutoff_xid,
flags);
}
@@ -1148,7 +1130,7 @@ lazy_scan_heap(LVRelState *vacrel)
* There should never be LP_DEAD items on a page with PD_ALL_VISIBLE
* set, however.
*/
- else if (prunestate.has_lpdead_items && PageIsAllVisible(page))
+ else if (presult.has_lpdead_items && PageIsAllVisible(page))
{
elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
vacrel->relname, blkno);
@@ -1161,16 +1143,16 @@ lazy_scan_heap(LVRelState *vacrel)
/*
* If the all-visible page is all-frozen but not marked as such yet,
* mark it as all-frozen. Note that all_frozen is only valid if
- * all_visible is true, so we must check both prunestate fields.
+ * all_visible is true, so we must check both presult fields.
*/
- else if (all_visible_according_to_vm && prunestate.all_visible &&
- prunestate.all_frozen &&
+ else if (all_visible_according_to_vm && presult.all_visible &&
+ presult.all_frozen &&
!VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
{
/*
* Avoid relying on all_visible_according_to_vm as a proxy for the
* page-level PD_ALL_VISIBLE bit being set, since it might have
- * become stale -- even when all_visible is set in prunestate
+ * become stale -- even when all_visible is set in presult
*/
if (!PageIsAllVisible(page))
{
@@ -1185,7 +1167,7 @@ lazy_scan_heap(LVRelState *vacrel)
* since a snapshotConflictHorizon sufficient to make everything
* safe for REDO was logged when the page's tuples were frozen.
*/
- Assert(!TransactionIdIsValid(prunestate.visibility_cutoff_xid));
+ Assert(!TransactionIdIsValid(presult.visibility_cutoff_xid));
visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
vmbuffer, InvalidTransactionId,
VISIBILITYMAP_ALL_VISIBLE |
@@ -1196,7 +1178,7 @@ lazy_scan_heap(LVRelState *vacrel)
* Final steps for block: drop cleanup lock, record free space in the
* FSM
*/
- if (prunestate.has_lpdead_items && vacrel->do_index_vacuuming)
+ if (presult.has_lpdead_items && vacrel->do_index_vacuuming)
{
/*
* Wait until lazy_vacuum_heap_rel() to save free space. This
@@ -1536,7 +1518,7 @@ lazy_scan_prune(LVRelState *vacrel,
Buffer buf,
BlockNumber blkno,
Page page,
- LVPagePruneState *prunestate)
+ PruneResult *presult)
{
Relation rel = vacrel->rel;
OffsetNumber offnum,
@@ -1595,11 +1577,11 @@ retry:
* Now scan the page to collect LP_DEAD items and check for tuples
* requiring freezing among remaining tuples with storage
*/
- prunestate->hastup = false;
- prunestate->has_lpdead_items = false;
- prunestate->all_visible = true;
- prunestate->all_frozen = true;
- prunestate->visibility_cutoff_xid = InvalidTransactionId;
+ presult->hastup = false;
+ presult->has_lpdead_items = false;
+ presult->all_visible = true;
+ presult->all_frozen = true;
+ presult->visibility_cutoff_xid = InvalidTransactionId;
for (offnum = FirstOffsetNumber;
offnum <= maxoff;
@@ -1621,7 +1603,7 @@ retry:
if (ItemIdIsRedirected(itemid))
{
/* page makes rel truncation unsafe */
- prunestate->hastup = true;
+ presult->hastup = true;
continue;
}
@@ -1701,13 +1683,13 @@ retry:
* asynchronously. See SetHintBits for more info. Check that
* the tuple is hinted xmin-committed because of that.
*/
- if (prunestate->all_visible)
+ if (presult->all_visible)
{
TransactionId xmin;
if (!HeapTupleHeaderXminCommitted(tuple.t_data))
{
- prunestate->all_visible = false;
+ presult->all_visible = false;
break;
}
@@ -1719,14 +1701,14 @@ retry:
if (!TransactionIdPrecedes(xmin,
vacrel->cutoffs.OldestXmin))
{
- prunestate->all_visible = false;
+ presult->all_visible = false;
break;
}
/* Track newest xmin on page. */
- if (TransactionIdFollows(xmin, prunestate->visibility_cutoff_xid) &&
+ if (TransactionIdFollows(xmin, presult->visibility_cutoff_xid) &&
TransactionIdIsNormal(xmin))
- prunestate->visibility_cutoff_xid = xmin;
+ presult->visibility_cutoff_xid = xmin;
}
break;
case HEAPTUPLE_RECENTLY_DEAD:
@@ -1737,7 +1719,7 @@ retry:
* pruning.)
*/
recently_dead_tuples++;
- prunestate->all_visible = false;
+ presult->all_visible = false;
break;
case HEAPTUPLE_INSERT_IN_PROGRESS:
@@ -1748,11 +1730,11 @@ retry:
* results. This assumption is a bit shaky, but it is what
* acquire_sample_rows() does, so be consistent.
*/
- prunestate->all_visible = false;
+ presult->all_visible = false;
break;
case HEAPTUPLE_DELETE_IN_PROGRESS:
/* This is an expected case during concurrent vacuum */
- prunestate->all_visible = false;
+ presult->all_visible = false;
/*
* Count such rows as live. As above, we assume the deleting
@@ -1766,7 +1748,7 @@ retry:
break;
}
- prunestate->hastup = true; /* page makes rel truncation unsafe */
+ presult->hastup = true; /* page makes rel truncation unsafe */
/* Tuple with storage -- consider need to freeze */
if (heap_prepare_freeze_tuple(tuple.t_data, &vacrel->cutoffs, &pagefrz,
@@ -1782,7 +1764,7 @@ retry:
* definitely cannot be set all-frozen in the visibility map later on
*/
if (!totally_frozen)
- prunestate->all_frozen = false;
+ presult->all_frozen = false;
}
/*
@@ -1800,7 +1782,7 @@ retry:
* page all-frozen afterwards (might not happen until final heap pass).
*/
if (pagefrz.freeze_required || tuples_frozen == 0 ||
- (prunestate->all_visible && prunestate->all_frozen &&
+ (presult->all_visible && presult->all_frozen &&
fpi_before != pgWalUsage.wal_fpi))
{
/*
@@ -1838,11 +1820,11 @@ retry:
* once we're done with it. Otherwise we generate a conservative
* cutoff by stepping back from OldestXmin.
*/
- if (prunestate->all_visible && prunestate->all_frozen)
+ if (presult->all_visible && presult->all_frozen)
{
/* Using same cutoff when setting VM is now unnecessary */
- snapshotConflictHorizon = prunestate->visibility_cutoff_xid;
- prunestate->visibility_cutoff_xid = InvalidTransactionId;
+ snapshotConflictHorizon = presult->visibility_cutoff_xid;
+ presult->visibility_cutoff_xid = InvalidTransactionId;
}
else
{
@@ -1865,7 +1847,7 @@ retry:
*/
vacrel->NewRelfrozenXid = pagefrz.NoFreezePageRelfrozenXid;
vacrel->NewRelminMxid = pagefrz.NoFreezePageRelminMxid;
- prunestate->all_frozen = false;
+ presult->all_frozen = false;
tuples_frozen = 0; /* avoid miscounts in instrumentation */
}
@@ -1878,7 +1860,7 @@ retry:
*/
#ifdef USE_ASSERT_CHECKING
/* Note that all_frozen value does not matter when !all_visible */
- if (prunestate->all_visible && lpdead_items == 0)
+ if (presult->all_visible && lpdead_items == 0)
{
TransactionId cutoff;
bool all_frozen;
@@ -1887,7 +1869,7 @@ retry:
Assert(false);
Assert(!TransactionIdIsValid(cutoff) ||
- cutoff == prunestate->visibility_cutoff_xid);
+ cutoff == presult->visibility_cutoff_xid);
}
#endif
@@ -1900,7 +1882,7 @@ retry:
ItemPointerData tmp;
vacrel->lpdead_item_pages++;
- prunestate->has_lpdead_items = true;
+ presult->has_lpdead_items = true;
ItemPointerSetBlockNumber(&tmp, blkno);
@@ -1925,7 +1907,7 @@ retry:
* Now that freezing has been finalized, unset all_visible. It needs
* to reflect the present state of things, as expected by our caller.
*/
- prunestate->all_visible = false;
+ presult->all_visible = false;
}
/* Finally, add page-local counts to whole-VACUUM counts */
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index faf5026519..9a07828e5a 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -191,6 +191,25 @@ typedef struct HeapPageFreeze
} HeapPageFreeze;
+/*
+ * State returned from pruning
+ */
+typedef struct PruneResult
+{
+ bool hastup; /* Page prevents rel truncation? */
+ bool has_lpdead_items; /* includes existing LP_DEAD items */
+
+ /*
+ * State describes the proper VM bit states to set for the page following
+ * pruning and freezing. all_visible implies !has_lpdead_items, but don't
+ * trust all_frozen result unless all_visible is also set to true.
+ */
+ bool all_visible; /* Every item visible to all? */
+ bool all_frozen; /* provided all_visible is also true */
+ TransactionId visibility_cutoff_xid; /* For recovery conflicts */
+} PruneResult;
+
+
/* ----------------
* function prototypes for heap access method
*
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 49a33c0387..a7a8d2acd1 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1394,7 +1394,6 @@ LPVOID
LPWSTR
LSEG
LUID
-LVPagePruneState
LVRelState
LVSavedErrInfo
LWLock
@@ -2152,6 +2151,7 @@ ProjectionPath
PromptInterruptContext
ProtocolVersion
PrsStorage
+PruneResult
PruneState
PruneStepResult
PsqlScanCallbacks
--
2.37.2
Hi Melanie,
On 8/29/23 01:49, Melanie Plageman wrote:
While working on a set of patches to combine the freeze and visibility
map WAL records into the prune record, I wrote the attached patches
reusing the tuple visibility information collected in heap_page_prune()
back in lazy_scan_prune().
heap_page_prune() collects the HTSV_Result for every tuple on a page
and saves it in an array used by heap_prune_chain(). If we make that
array available to lazy_scan_prune(), it can use it when collecting
stats for vacuum and determining whether or not to freeze tuples.
This avoids calling HeapTupleSatisfiesVacuum() again on every tuple in
the page.
It also gets rid of the retry loop in lazy_scan_prune().
How did you test this change?
Could you measure any performance difference?
If so could you provide your test case?
--
David Geier
(ServiceNow)
Hi David,
Thanks for taking a look!
On Tue, Aug 29, 2023 at 5:07 AM David Geier <geidav.pg@gmail.com> wrote:
Hi Melanie,
On 8/29/23 01:49, Melanie Plageman wrote:
While working on a set of patches to combine the freeze and visibility
map WAL records into the prune record, I wrote the attached patches
reusing the tuple visibility information collected in heap_page_prune()
back in lazy_scan_prune().
heap_page_prune() collects the HTSV_Result for every tuple on a page
and saves it in an array used by heap_prune_chain(). If we make that
array available to lazy_scan_prune(), it can use it when collecting
stats for vacuum and determining whether or not to freeze tuples.
This avoids calling HeapTupleSatisfiesVacuum() again on every tuple in
the page.
It also gets rid of the retry loop in lazy_scan_prune().
How did you test this change?
I didn't add a new test, but you can confirm some existing test
coverage if you, for example, set every HTSV_Result in the array to
HEAPTUPLE_LIVE and run the regression tests. Tests relying on vacuum
removing the right tuples may fail. For example, select count(*) from
gin_test_tbl where i @> array[1, 999]; in src/test/regress/sql/gin.sql
fails for me since it sees a tuple it shouldn't.
Could you measure any performance difference?
If so could you provide your test case?
I created a large table and then updated a tuple on every page in the
relation and vacuumed it. I saw a consistent slight improvement in
vacuum execution time. I profiled a bit with perf stat as well. The
difference is relatively small for this kind of example, but I can
work on a more compelling, realistic example. I think eliminating the
retry loop is also useful, as I have heard that users have had to
cancel vacuums which were in this retry loop for too long.
- Melanie
On Tue, Aug 29, 2023 at 5:07 AM David Geier <geidav.pg@gmail.com> wrote:
Could you measure any performance difference?
If so could you provide your test case?
I created a large table and then updated a tuple on every page in the
relation and vacuumed it. I saw a consistent slight improvement in
vacuum execution time. I profiled a bit with perf stat as well. The
difference is relatively small for this kind of example, but I can
work on a more compelling, realistic example. I think eliminating the
retry loop is also useful, as I have heard that users have had to
cancel vacuums which were in this retry loop for too long.
Just to provide a specific test case, if you create a small table like this
create table foo (a int, b int, c int) with(autovacuum_enabled=false);
insert into foo select i, i, i from generate_series(1, 10000000);
And then vacuum it. I find that with my patch applied I see a
consistent ~9% speedup (averaged across multiple runs).
master: ~533ms
patch: ~487ms
And in the profile, with my patch applied, you notice less time spent
in HeapTupleSatisfiesVacuumHorizon()
master:
11.83% postgres postgres [.] heap_page_prune
11.59% postgres postgres [.] heap_prepare_freeze_tuple
8.77% postgres postgres [.] lazy_scan_prune
8.01% postgres postgres [.] HeapTupleSatisfiesVacuumHorizon
7.79% postgres postgres [.] heap_tuple_should_freeze
patch:
13.41% postgres postgres [.] heap_prepare_freeze_tuple
9.88% postgres postgres [.] heap_page_prune
8.61% postgres postgres [.] lazy_scan_prune
7.00% postgres postgres [.] heap_tuple_should_freeze
6.43% postgres postgres [.] HeapTupleSatisfiesVacuumHorizon
- Melanie
Hi Melanie,
On 8/31/23 02:59, Melanie Plageman wrote:
I created a large table and then updated a tuple on every page in the
relation and vacuumed it. I saw a consistent slight improvement in
vacuum execution time. I profiled a bit with perf stat as well. The
difference is relatively small for this kind of example, but I can
work on a more compelling, realistic example. I think eliminating the
retry loop is also useful, as I have heard that users have had to
cancel vacuums which were in this retry loop for too long.
Just to provide a specific test case, if you create a small table like this
create table foo (a int, b int, c int) with(autovacuum_enabled=false);
insert into foo select i, i, i from generate_series(1, 10000000);
And then vacuum it. I find that with my patch applied I see a
consistent ~9% speedup (averaged across multiple runs).
master: ~533ms
patch: ~487ms
And in the profile, with my patch applied, you notice less time spent
in HeapTupleSatisfiesVacuumHorizon()
master:
11.83% postgres postgres [.] heap_page_prune
11.59% postgres postgres [.] heap_prepare_freeze_tuple
8.77% postgres postgres [.] lazy_scan_prune
8.01% postgres postgres [.] HeapTupleSatisfiesVacuumHorizon
7.79% postgres postgres [.] heap_tuple_should_freeze
patch:
13.41% postgres postgres [.] heap_prepare_freeze_tuple
9.88% postgres postgres [.] heap_page_prune
8.61% postgres postgres [.] lazy_scan_prune
7.00% postgres postgres [.] heap_tuple_should_freeze
6.43% postgres postgres [.] HeapTupleSatisfiesVacuumHorizon
Thanks a lot for providing additional information and the test case.
I tried it on a release build and I also see a 10% speed-up. I reset the
visibility map between VACUUM runs, see:
CREATE EXTENSION pg_visibility;
CREATE TABLE foo (a INT, b INT, c INT) WITH(autovacuum_enabled=FALSE);
INSERT INTO foo SELECT i, i, i FROM generate_series(1, 10000000) i;
VACUUM foo;
SELECT pg_truncate_visibility_map('foo');
VACUUM foo;
SELECT pg_truncate_visibility_map('foo');
VACUUM foo;
...
The first patch, which refactors the code so we can pass the result of
the visibility checks to the caller, looks good to me.
Regarding the 2nd patch (disclaimer: I'm not too familiar with that area
of the code): I don't completely understand why the retry loop is no
longer needed, nor how the possible race condition is now detected and
handled. It can still happen that an aborting transaction changes the
state of a row after heap_page_prune() looked at that row. Would that
case now not be ignored?
--
David Geier
(ServiceNow)
On Mon, Aug 28, 2023 at 7:49 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
While working on a set of patches to combine the freeze and visibility
map WAL records into the prune record, I wrote the attached patches
reusing the tuple visibility information collected in heap_page_prune()
back in lazy_scan_prune().
heap_page_prune() collects the HTSV_Result for every tuple on a page
and saves it in an array used by heap_prune_chain(). If we make that
array available to lazy_scan_prune(), it can use it when collecting
stats for vacuum and determining whether or not to freeze tuples.
This avoids calling HeapTupleSatisfiesVacuum() again on every tuple in
the page.
It also gets rid of the retry loop in lazy_scan_prune().
In general, I like these patches. I think it's a clever approach, and
I don't really see any down side. It should just be straight-up better
than what we have now, and if it's not better, it still shouldn't be
any worse.
I have a few suggestions:
- Rather than removing the rather large comment block at the top of
lazy_scan_prune() I'd like to see it rewritten to explain how we now
deal with the problem. I'd suggest leaving the first paragraph ("Prior
to...") just as it is and replace all the words following "The
approach we take now is" with a description of the approach that this
patch takes to the problem.
- I'm not a huge fan of the caller of heap_page_prune() having to know
how to initialize the PruneResult. Can heap_page_prune() itself do
that work, so that presult is an out parameter rather than an in-out
parameter? Or, if not, can it be moved to a new helper function, like
heap_page_prune_init(), rather than having that logic in 2+ places?
- int ndeleted,
- nnewlpdead;
-
- ndeleted = heap_page_prune(relation, buffer, vistest, limited_xmin,
- limited_ts, &nnewlpdead, NULL);
+ int ndeleted = heap_page_prune(relation, buffer, vistest, limited_xmin,
+ limited_ts, &presult, NULL);
- I don't particularly like merging the declaration with the
assignment unless the call is narrow enough that we don't need a line
break in there, which is not the case here.
- I haven't thoroughly investigated yet, but I wonder if there are any
other places where comments need updating. As a general note, I find
it desirable for a function's header comment to mention whether any
pointer parameters are in parameters, out parameters, or in-out
parameters, and what the contract between caller and callee actually
is.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Thu, Aug 31, 2023 at 2:03 PM Robert Haas <robertmhaas@gmail.com> wrote:
I have a few suggestions:
- Rather than removing the rather large comment block at the top of
lazy_scan_prune() I'd like to see it rewritten to explain how we now
deal with the problem. I'd suggest leaving the first paragraph ("Prior
to...") just as it is and replace all the words following "The
approach we take now is" with a description of the approach that this
patch takes to the problem.
Good idea. I've updated the comment. I also explain why this new
approach works in the commit message and reference the commit which
added the previous approach.
- I'm not a huge fan of the caller of heap_page_prune() having to know
how to initialize the PruneResult. Can heap_page_prune() itself do
that work, so that presult is an out parameter rather than an in-out
parameter? Or, if not, can it be moved to a new helper function, like
heap_page_prune_init(), rather than having that logic in 2+ places?
Ah, yes. Now that it has two callers, and since it is exclusively an
output parameter, it is quite ugly to initialize it in both callers.
Fixed in the attached.
- int ndeleted,
- nnewlpdead;
-
- ndeleted = heap_page_prune(relation, buffer, vistest, limited_xmin,
- limited_ts, &nnewlpdead, NULL);
+ int ndeleted = heap_page_prune(relation, buffer, vistest, limited_xmin,
+ limited_ts, &presult, NULL);
- I don't particularly like merging the declaration with the
assignment unless the call is narrow enough that we don't need a line
break in there, which is not the case here.
I have changed this.
- I haven't thoroughly investigated yet, but I wonder if there are any
other places where comments need updating. As a general note, I find
it desirable for a function's header comment to mention whether any
pointer parameters are in parameters, out parameters, or in-out
parameters, and what the contract between caller and callee actually
is.
I've investigated vacuumlazy.c and pruneheap.c and looked at the
commit that added the retry loop (8523492d4e349) to see everywhere it
added comments and don't see anywhere else that needs updating.
I have updated lazy_scan_prune()'s function header comment to describe
the nature of the in-out and output parameters and the contract.
- Melanie
Attachments:
v2-0001-Rebrand-LVPagePruneState-as-PruneResult.patch (text/x-patch)
From 6ea7a2aa5698e08ce67d47efd3dbfb54be30d9cb Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Thu, 3 Aug 2023 15:08:58 -0400
Subject: [PATCH v2 1/2] Rebrand LVPagePruneState as PruneResult
With this rename, and relocation to heapam.h, we can use PruneResult as
an output parameter for on-access pruning as well. It also makes it
harder to confuse with PruneState.
We'll be adding and removing PruneResult fields in future commits, but
this rename and relocation is separate to make the diff easier to
understand.
Discussion: https://postgr.es/m/CAAKRu_br124qsGJieuYA0nGjywEukhK1dKBfRdby_4yY3E9SXA%40mail.gmail.com
---
src/backend/access/heap/vacuumlazy.c | 122 ++++++++++++---------------
src/include/access/heapam.h | 19 +++++
src/tools/pgindent/typedefs.list | 2 +-
3 files changed, 76 insertions(+), 67 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 6a41ee635d..5e0a7422bb 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -212,24 +212,6 @@ typedef struct LVRelState
int64 missed_dead_tuples; /* # removable, but not removed */
} LVRelState;
-/*
- * State returned by lazy_scan_prune()
- */
-typedef struct LVPagePruneState
-{
- bool hastup; /* Page prevents rel truncation? */
- bool has_lpdead_items; /* includes existing LP_DEAD items */
-
- /*
- * State describes the proper VM bit states to set for the page following
- * pruning and freezing. all_visible implies !has_lpdead_items, but don't
- * trust all_frozen result unless all_visible is also set to true.
- */
- bool all_visible; /* Every item visible to all? */
- bool all_frozen; /* provided all_visible is also true */
- TransactionId visibility_cutoff_xid; /* For recovery conflicts */
-} LVPagePruneState;
-
/* Struct for saving and restoring vacuum error information. */
typedef struct LVSavedErrInfo
{
@@ -250,7 +232,7 @@ static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
bool sharelock, Buffer vmbuffer);
static void lazy_scan_prune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
- LVPagePruneState *prunestate);
+ PruneResult *presult);
static bool lazy_scan_noprune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
bool *hastup, bool *recordfreespace);
@@ -854,7 +836,7 @@ lazy_scan_heap(LVRelState *vacrel)
Buffer buf;
Page page;
bool all_visible_according_to_vm;
- LVPagePruneState prunestate;
+ PruneResult presult;
if (blkno == next_unskippable_block)
{
@@ -1019,12 +1001,12 @@ lazy_scan_heap(LVRelState *vacrel)
* were pruned some time earlier. Also considers freezing XIDs in the
* tuple headers of remaining items with storage.
*/
- lazy_scan_prune(vacrel, buf, blkno, page, &prunestate);
+ lazy_scan_prune(vacrel, buf, blkno, page, &presult);
- Assert(!prunestate.all_visible || !prunestate.has_lpdead_items);
+ Assert(!presult.all_visible || !presult.has_lpdead_items);
/* Remember the location of the last page with nonremovable tuples */
- if (prunestate.hastup)
+ if (presult.hastup)
vacrel->nonempty_pages = blkno + 1;
if (vacrel->nindexes == 0)
@@ -1037,7 +1019,7 @@ lazy_scan_heap(LVRelState *vacrel)
* performed here can be thought of as the one-pass equivalent of
* a call to lazy_vacuum().
*/
- if (prunestate.has_lpdead_items)
+ if (presult.has_lpdead_items)
{
Size freespace;
@@ -1064,8 +1046,8 @@ lazy_scan_heap(LVRelState *vacrel)
*
* Our call to lazy_vacuum_heap_page() will have considered if
* it's possible to set all_visible/all_frozen independently
- * of lazy_scan_prune(). Note that prunestate was invalidated
- * by lazy_vacuum_heap_page() call.
+ * of lazy_scan_prune(). Note that prune result was
+ * invalidated by lazy_vacuum_heap_page() call.
*/
freespace = PageGetHeapFreeSpace(page);
@@ -1078,23 +1060,23 @@ lazy_scan_heap(LVRelState *vacrel)
* There was no call to lazy_vacuum_heap_page() because pruning
* didn't encounter/create any LP_DEAD items that needed to be
* vacuumed. Prune state has not been invalidated, so proceed
- * with prunestate-driven visibility map and FSM steps (just like
- * the two-pass strategy).
+ * with prune result-driven visibility map and FSM steps (just
+ * like the two-pass strategy).
*/
Assert(dead_items->num_items == 0);
}
/*
* Handle setting visibility map bit based on information from the VM
- * (as of last lazy_scan_skip() call), and from prunestate
+ * (as of last lazy_scan_skip() call), and from presult
*/
- if (!all_visible_according_to_vm && prunestate.all_visible)
+ if (!all_visible_according_to_vm && presult.all_visible)
{
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
- if (prunestate.all_frozen)
+ if (presult.all_frozen)
{
- Assert(!TransactionIdIsValid(prunestate.visibility_cutoff_xid));
+ Assert(!TransactionIdIsValid(presult.visibility_cutoff_xid));
flags |= VISIBILITYMAP_ALL_FROZEN;
}
@@ -1114,7 +1096,7 @@ lazy_scan_heap(LVRelState *vacrel)
PageSetAllVisible(page);
MarkBufferDirty(buf);
visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
- vmbuffer, prunestate.visibility_cutoff_xid,
+ vmbuffer, presult.visibility_cutoff_xid,
flags);
}
@@ -1148,7 +1130,7 @@ lazy_scan_heap(LVRelState *vacrel)
* There should never be LP_DEAD items on a page with PD_ALL_VISIBLE
* set, however.
*/
- else if (prunestate.has_lpdead_items && PageIsAllVisible(page))
+ else if (presult.has_lpdead_items && PageIsAllVisible(page))
{
elog(WARNING, "page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
vacrel->relname, blkno);
@@ -1161,16 +1143,16 @@ lazy_scan_heap(LVRelState *vacrel)
/*
* If the all-visible page is all-frozen but not marked as such yet,
* mark it as all-frozen. Note that all_frozen is only valid if
- * all_visible is true, so we must check both prunestate fields.
+ * all_visible is true, so we must check both presult fields.
*/
- else if (all_visible_according_to_vm && prunestate.all_visible &&
- prunestate.all_frozen &&
+ else if (all_visible_according_to_vm && presult.all_visible &&
+ presult.all_frozen &&
!VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
{
/*
* Avoid relying on all_visible_according_to_vm as a proxy for the
* page-level PD_ALL_VISIBLE bit being set, since it might have
- * become stale -- even when all_visible is set in prunestate
+ * become stale -- even when all_visible is set in presult
*/
if (!PageIsAllVisible(page))
{
@@ -1185,7 +1167,7 @@ lazy_scan_heap(LVRelState *vacrel)
* since a snapshotConflictHorizon sufficient to make everything
* safe for REDO was logged when the page's tuples were frozen.
*/
- Assert(!TransactionIdIsValid(prunestate.visibility_cutoff_xid));
+ Assert(!TransactionIdIsValid(presult.visibility_cutoff_xid));
visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
vmbuffer, InvalidTransactionId,
VISIBILITYMAP_ALL_VISIBLE |
@@ -1196,7 +1178,7 @@ lazy_scan_heap(LVRelState *vacrel)
* Final steps for block: drop cleanup lock, record free space in the
* FSM
*/
- if (prunestate.has_lpdead_items && vacrel->do_index_vacuuming)
+ if (presult.has_lpdead_items && vacrel->do_index_vacuuming)
{
/*
* Wait until lazy_vacuum_heap_rel() to save free space. This
@@ -1514,8 +1496,16 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
/*
* lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
*
+ * Prune and freeze a single page.
* Caller must hold pin and buffer cleanup lock on the buffer.
*
+ * vacrel is an in-out parameter lazy_scan_prune() references for relation-wide
+ * state and updates with information needed by lazy_scan_heap() to reap dead
+ * tuples, update statistics, and move the relfrozenxid forward.
+ *
+ * presult is an output parameter initialized and updated exclusively by
+ * lazy_scan_prune().
+ *
* Prior to PostgreSQL 14 there were very rare cases where heap_page_prune()
* was allowed to disagree with our HeapTupleSatisfiesVacuum() call about
* whether or not a tuple should be considered DEAD. This happened when an
@@ -1536,7 +1526,7 @@ lazy_scan_prune(LVRelState *vacrel,
Buffer buf,
BlockNumber blkno,
Page page,
- LVPagePruneState *prunestate)
+ PruneResult *presult)
{
Relation rel = vacrel->rel;
OffsetNumber offnum,
@@ -1595,11 +1585,11 @@ retry:
* Now scan the page to collect LP_DEAD items and check for tuples
* requiring freezing among remaining tuples with storage
*/
- prunestate->hastup = false;
- prunestate->has_lpdead_items = false;
- prunestate->all_visible = true;
- prunestate->all_frozen = true;
- prunestate->visibility_cutoff_xid = InvalidTransactionId;
+ presult->hastup = false;
+ presult->has_lpdead_items = false;
+ presult->all_visible = true;
+ presult->all_frozen = true;
+ presult->visibility_cutoff_xid = InvalidTransactionId;
for (offnum = FirstOffsetNumber;
offnum <= maxoff;
@@ -1621,7 +1611,7 @@ retry:
if (ItemIdIsRedirected(itemid))
{
/* page makes rel truncation unsafe */
- prunestate->hastup = true;
+ presult->hastup = true;
continue;
}
@@ -1701,13 +1691,13 @@ retry:
* asynchronously. See SetHintBits for more info. Check that
* the tuple is hinted xmin-committed because of that.
*/
- if (prunestate->all_visible)
+ if (presult->all_visible)
{
TransactionId xmin;
if (!HeapTupleHeaderXminCommitted(tuple.t_data))
{
- prunestate->all_visible = false;
+ presult->all_visible = false;
break;
}
@@ -1719,14 +1709,14 @@ retry:
if (!TransactionIdPrecedes(xmin,
vacrel->cutoffs.OldestXmin))
{
- prunestate->all_visible = false;
+ presult->all_visible = false;
break;
}
/* Track newest xmin on page. */
- if (TransactionIdFollows(xmin, prunestate->visibility_cutoff_xid) &&
+ if (TransactionIdFollows(xmin, presult->visibility_cutoff_xid) &&
TransactionIdIsNormal(xmin))
- prunestate->visibility_cutoff_xid = xmin;
+ presult->visibility_cutoff_xid = xmin;
}
break;
case HEAPTUPLE_RECENTLY_DEAD:
@@ -1737,7 +1727,7 @@ retry:
* pruning.)
*/
recently_dead_tuples++;
- prunestate->all_visible = false;
+ presult->all_visible = false;
break;
case HEAPTUPLE_INSERT_IN_PROGRESS:
@@ -1748,11 +1738,11 @@ retry:
* results. This assumption is a bit shaky, but it is what
* acquire_sample_rows() does, so be consistent.
*/
- prunestate->all_visible = false;
+ presult->all_visible = false;
break;
case HEAPTUPLE_DELETE_IN_PROGRESS:
/* This is an expected case during concurrent vacuum */
- prunestate->all_visible = false;
+ presult->all_visible = false;
/*
* Count such rows as live. As above, we assume the deleting
@@ -1766,7 +1756,7 @@ retry:
break;
}
- prunestate->hastup = true; /* page makes rel truncation unsafe */
+ presult->hastup = true; /* page makes rel truncation unsafe */
/* Tuple with storage -- consider need to freeze */
if (heap_prepare_freeze_tuple(tuple.t_data, &vacrel->cutoffs, &pagefrz,
@@ -1782,7 +1772,7 @@ retry:
* definitely cannot be set all-frozen in the visibility map later on
*/
if (!totally_frozen)
- prunestate->all_frozen = false;
+ presult->all_frozen = false;
}
/*
@@ -1800,7 +1790,7 @@ retry:
* page all-frozen afterwards (might not happen until final heap pass).
*/
if (pagefrz.freeze_required || tuples_frozen == 0 ||
- (prunestate->all_visible && prunestate->all_frozen &&
+ (presult->all_visible && presult->all_frozen &&
fpi_before != pgWalUsage.wal_fpi))
{
/*
@@ -1838,11 +1828,11 @@ retry:
* once we're done with it. Otherwise we generate a conservative
* cutoff by stepping back from OldestXmin.
*/
- if (prunestate->all_visible && prunestate->all_frozen)
+ if (presult->all_visible && presult->all_frozen)
{
/* Using same cutoff when setting VM is now unnecessary */
- snapshotConflictHorizon = prunestate->visibility_cutoff_xid;
- prunestate->visibility_cutoff_xid = InvalidTransactionId;
+ snapshotConflictHorizon = presult->visibility_cutoff_xid;
+ presult->visibility_cutoff_xid = InvalidTransactionId;
}
else
{
@@ -1865,7 +1855,7 @@ retry:
*/
vacrel->NewRelfrozenXid = pagefrz.NoFreezePageRelfrozenXid;
vacrel->NewRelminMxid = pagefrz.NoFreezePageRelminMxid;
- prunestate->all_frozen = false;
+ presult->all_frozen = false;
tuples_frozen = 0; /* avoid miscounts in instrumentation */
}
@@ -1878,7 +1868,7 @@ retry:
*/
#ifdef USE_ASSERT_CHECKING
/* Note that all_frozen value does not matter when !all_visible */
- if (prunestate->all_visible && lpdead_items == 0)
+ if (presult->all_visible && lpdead_items == 0)
{
TransactionId cutoff;
bool all_frozen;
@@ -1887,7 +1877,7 @@ retry:
Assert(false);
Assert(!TransactionIdIsValid(cutoff) ||
- cutoff == prunestate->visibility_cutoff_xid);
+ cutoff == presult->visibility_cutoff_xid);
}
#endif
@@ -1900,7 +1890,7 @@ retry:
ItemPointerData tmp;
vacrel->lpdead_item_pages++;
- prunestate->has_lpdead_items = true;
+ presult->has_lpdead_items = true;
ItemPointerSetBlockNumber(&tmp, blkno);
@@ -1925,7 +1915,7 @@ retry:
* Now that freezing has been finalized, unset all_visible. It needs
* to reflect the present state of things, as expected by our caller.
*/
- prunestate->all_visible = false;
+ presult->all_visible = false;
}
/* Finally, add page-local counts to whole-VACUUM counts */
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index faf5026519..9a07828e5a 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -191,6 +191,25 @@ typedef struct HeapPageFreeze
} HeapPageFreeze;
+/*
+ * State returned from pruning
+ */
+typedef struct PruneResult
+{
+ bool hastup; /* Page prevents rel truncation? */
+ bool has_lpdead_items; /* includes existing LP_DEAD items */
+
+ /*
+ * State describes the proper VM bit states to set for the page following
+ * pruning and freezing. all_visible implies !has_lpdead_items, but don't
+ * trust all_frozen result unless all_visible is also set to true.
+ */
+ bool all_visible; /* Every item visible to all? */
+ bool all_frozen; /* provided all_visible is also true */
+ TransactionId visibility_cutoff_xid; /* For recovery conflicts */
+} PruneResult;
+
+
/* ----------------
* function prototypes for heap access method
*
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 49a33c0387..a7a8d2acd1 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1394,7 +1394,6 @@ LPVOID
LPWSTR
LSEG
LUID
-LVPagePruneState
LVRelState
LVSavedErrInfo
LWLock
@@ -2152,6 +2151,7 @@ ProjectionPath
PromptInterruptContext
ProtocolVersion
PrsStorage
+PruneResult
PruneState
PruneStepResult
PsqlScanCallbacks
--
2.37.2
v2-0002-Reuse-heap_page_prune-tuple-visibility-statuses.patch (text/x-patch)
From ee66907ed8a3af73ec974347dadab433d1a0a80f Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Thu, 31 Aug 2023 18:15:35 -0400
Subject: [PATCH v2 2/2] Reuse heap_page_prune() tuple visibility statuses
heap_page_prune() obtains the HTSV_Result (tuple visibility status)
returned from HeapTupleSatisfiesVacuum() for every tuple on the page and
stores them in an array. By making this array available to
heap_page_prune()'s caller lazy_scan_prune(), we can avoid an additional
call to HeapTupleSatisfiesVacuum() when freezing the tuples and
recording LP_DEAD items for vacuum. This saves resources and eliminates
the possibility that vacuuming corrupted data results in a hang due to
endless retry looping.
This replaces the retry mechanism introduced in 8523492d4 to handle cases in
which a tuple's inserting transaction aborted between the visibility check in
heap_page_prune() and lazy_scan_prune()'s call to HeapTupleSatisfiesVacuum() --
rendering it dead but without a dead line pointer. We can instead reuse the
tuple's original visibility status, circumventing any disagreements.
We pass the HTSV array from heap_page_prune() back to lazy_scan_prune() in
PruneResult. It is convenient to move heap_page_prune()'s other output
parameters into the PruneResult.
Discussion: https://postgr.es/m/CAAKRu_br124qsGJieuYA0nGjywEukhK1dKBfRdby_4yY3E9SXA%40mail.gmail.com
---
src/backend/access/heap/pruneheap.c | 66 +++++++++++++++-------------
src/backend/access/heap/vacuumlazy.c | 54 +++++++----------------
src/include/access/heapam.h | 14 +++++-
3 files changed, 64 insertions(+), 70 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 47b9e20915..512f62e27d 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -65,16 +65,6 @@ typedef struct
* 1. Otherwise every access would need to subtract 1.
*/
bool marked[MaxHeapTuplesPerPage + 1];
-
- /*
- * Tuple visibility is only computed once for each tuple, for correctness
- * and efficiency reasons; see comment in heap_page_prune() for details.
- * This is of type int8[], instead of HTSV_Result[], so we can use -1 to
- * indicate no visibility has been computed, e.g. for LP_DEAD items.
- *
- * Same indexing as ->marked.
- */
- int8 htsv[MaxHeapTuplesPerPage + 1];
} PruneState;
/* Local functions */
@@ -83,7 +73,7 @@ static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
Buffer buffer);
static int heap_prune_chain(Buffer buffer,
OffsetNumber rootoffnum,
- PruneState *prstate);
+ PruneState *prstate, PruneResult *presult);
static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
static void heap_prune_record_redirect(PruneState *prstate,
OffsetNumber offnum, OffsetNumber rdoffnum);
@@ -191,6 +181,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
if (PageIsFull(page) || PageGetHeapFreeSpace(page) < minfree)
{
+ PruneResult presult;
+
/* OK, try to get exclusive buffer lock */
if (!ConditionalLockBufferForCleanup(buffer))
return;
@@ -202,11 +194,10 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
*/
if (PageIsFull(page) || PageGetHeapFreeSpace(page) < minfree)
{
- int ndeleted,
- nnewlpdead;
+ int ndeleted;
ndeleted = heap_page_prune(relation, buffer, vistest, limited_xmin,
- limited_ts, &nnewlpdead, NULL);
+ limited_ts, &presult, NULL);
/*
* Report the number of tuples reclaimed to pgstats. This is
@@ -222,9 +213,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* tracks ndeleted, since it will set the same LP_DEAD items to
* LP_UNUSED separately.
*/
- if (ndeleted > nnewlpdead)
+ if (ndeleted > presult.nnewlpdead)
pgstat_update_heap_dead_tuples(relation,
- ndeleted - nnewlpdead);
+ ndeleted - presult.nnewlpdead);
}
/* And release buffer lock */
@@ -253,12 +244,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* either have been set by TransactionIdLimitedForOldSnapshots, or
* InvalidTransactionId/0 respectively.
*
- * Sets *nnewlpdead for caller, indicating the number of items that were
- * newly set LP_DEAD during prune operation.
- *
* off_loc is the offset location required by the caller to use in error
* callback.
*
+ * presult contains output parameters relevant to the caller.
+ * For example, sets presult->nnewlpdead for caller, indicating the number of
+ * items that were newly set LP_DEAD during prune operation.
+ *
* Returns the number of tuples deleted from the page during this call.
*/
int
@@ -266,7 +258,7 @@ heap_page_prune(Relation relation, Buffer buffer,
GlobalVisState *vistest,
TransactionId old_snap_xmin,
TimestampTz old_snap_ts,
- int *nnewlpdead,
+ PruneResult *presult,
OffsetNumber *off_loc)
{
int ndeleted = 0;
@@ -301,6 +293,19 @@ heap_page_prune(Relation relation, Buffer buffer,
maxoff = PageGetMaxOffsetNumber(page);
tup.t_tableOid = RelationGetRelid(prstate.rel);
+ /* Initialize PruneResult */
+ presult->hastup = false;
+ presult->has_lpdead_items = false;
+ presult->all_visible = true;
+ presult->all_frozen = true;
+ presult->visibility_cutoff_xid = InvalidTransactionId;
+
+ /*
+ * nnewlpdead only includes those items which were newly set to LP_DEAD
+ * during pruning.
+ */
+ presult->nnewlpdead = 0;
+
/*
* Determine HTSV for all tuples.
*
@@ -331,7 +336,7 @@ heap_page_prune(Relation relation, Buffer buffer,
/* Nothing to do if slot doesn't contain a tuple */
if (!ItemIdIsNormal(itemid))
{
- prstate.htsv[offnum] = -1;
+ presult->htsv[offnum] = -1;
continue;
}
@@ -347,8 +352,8 @@ heap_page_prune(Relation relation, Buffer buffer,
if (off_loc)
*off_loc = offnum;
- prstate.htsv[offnum] = heap_prune_satisfies_vacuum(&prstate, &tup,
- buffer);
+ presult->htsv[offnum] = heap_prune_satisfies_vacuum(&prstate, &tup,
+ buffer);
}
/* Scan the page */
@@ -372,7 +377,7 @@ heap_page_prune(Relation relation, Buffer buffer,
continue;
/* Process this item or chain of items */
- ndeleted += heap_prune_chain(buffer, offnum, &prstate);
+ ndeleted += heap_prune_chain(buffer, offnum, &prstate, presult);
}
/* Clear the offset information once we have processed the given page. */
@@ -473,7 +478,7 @@ heap_page_prune(Relation relation, Buffer buffer,
END_CRIT_SECTION();
/* Record number of newly-set-LP_DEAD items for caller */
- *nnewlpdead = prstate.ndead;
+ presult->nnewlpdead = prstate.ndead;
return ndeleted;
}
@@ -588,7 +593,8 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
* Returns the number of tuples (to be) deleted from the page.
*/
static int
-heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum, PruneState *prstate)
+heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
+ PruneState *prstate, PruneResult *presult)
{
int ndeleted = 0;
Page dp = (Page) BufferGetPage(buffer);
@@ -609,7 +615,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum, PruneState *prstate)
*/
if (ItemIdIsNormal(rootlp))
{
- Assert(prstate->htsv[rootoffnum] != -1);
+ Assert(presult->htsv[rootoffnum] != -1);
htup = (HeapTupleHeader) PageGetItem(dp, rootlp);
if (HeapTupleHeaderIsHeapOnly(htup))
@@ -632,7 +638,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum, PruneState *prstate)
* either here or while following a chain below. Whichever path
* gets there first will mark the tuple unused.
*/
- if (prstate->htsv[rootoffnum] == HEAPTUPLE_DEAD &&
+ if (presult->htsv[rootoffnum] == HEAPTUPLE_DEAD &&
!HeapTupleHeaderIsHotUpdated(htup))
{
heap_prune_record_unused(prstate, rootoffnum);
@@ -700,7 +706,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum, PruneState *prstate)
break;
Assert(ItemIdIsNormal(lp));
- Assert(prstate->htsv[offnum] != -1);
+ Assert(presult->htsv[offnum] != -1);
htup = (HeapTupleHeader) PageGetItem(dp, lp);
/*
@@ -720,7 +726,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum, PruneState *prstate)
*/
tupdead = recent_dead = false;
- switch ((HTSV_Result) prstate->htsv[offnum])
+ switch ((HTSV_Result) presult->htsv[offnum])
{
case HEAPTUPLE_DEAD:
tupdead = true;
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 5e0a7422bb..25ed500b45 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1504,7 +1504,7 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
* tuples, update statistics, and move the relfrozenxid forward.
*
* presult is an output parameter initialized and updated exclusively by
- * lazy_scan_prune().
+ * lazy_scan_prune() and heap_page_prune().
*
* Prior to PostgreSQL 14 there were very rare cases where heap_page_prune()
* was allowed to disagree with our HeapTupleSatisfiesVacuum() call about
@@ -1514,12 +1514,12 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
* of complexity just so we could deal with tuples that were DEAD to VACUUM,
* but nevertheless were left with storage after pruning.
*
- * The approach we take now is to restart pruning when the race condition is
- * detected. This allows heap_page_prune() to prune the tuples inserted by
- * the now-aborted transaction. This is a little crude, but it guarantees
- * that any items that make it into the dead_items array are simple LP_DEAD
- * line pointers, and that every remaining item with tuple storage is
- * considered as a candidate for freezing.
+ * As of Postgres 17, we circumvent this problem altogether by reusing the
+ * result of heap_page_prune()'s visibility check. Without the second call to
+ * HeapTupleSatisfiesVacuum(), there is no new HTSV_Result and there can be no
+ * disagreement. The tuple's TID won't be added to the array of dead tuple TIDs
+ * for vacuum, thus vacuum will still never be tasked with reaping a tuple with
+ * storage.
*/
static void
lazy_scan_prune(LVRelState *vacrel,
@@ -1532,14 +1532,11 @@ lazy_scan_prune(LVRelState *vacrel,
OffsetNumber offnum,
maxoff;
ItemId itemid;
- HeapTupleData tuple;
- HTSV_Result res;
int tuples_deleted,
tuples_frozen,
lpdead_items,
live_tuples,
recently_dead_tuples;
- int nnewlpdead;
HeapPageFreeze pagefrz;
int64 fpi_before = pgWalUsage.wal_fpi;
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
@@ -1554,8 +1551,6 @@ lazy_scan_prune(LVRelState *vacrel,
*/
maxoff = PageGetMaxOffsetNumber(page);
-retry:
-
/* Initialize (or reset) page-level state */
pagefrz.freeze_required = false;
pagefrz.FreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
@@ -1578,23 +1573,19 @@ retry:
* that were deleted from indexes.
*/
tuples_deleted = heap_page_prune(rel, buf, vacrel->vistest,
- InvalidTransactionId, 0, &nnewlpdead,
+ InvalidTransactionId, 0,
+ presult,
&vacrel->offnum);
/*
* Now scan the page to collect LP_DEAD items and check for tuples
* requiring freezing among remaining tuples with storage
*/
- presult->hastup = false;
- presult->has_lpdead_items = false;
- presult->all_visible = true;
- presult->all_frozen = true;
- presult->visibility_cutoff_xid = InvalidTransactionId;
-
for (offnum = FirstOffsetNumber;
offnum <= maxoff;
offnum = OffsetNumberNext(offnum))
{
+ HeapTupleHeader htup;
bool totally_frozen;
/*
@@ -1637,22 +1628,7 @@ retry:
Assert(ItemIdIsNormal(itemid));
- ItemPointerSet(&(tuple.t_self), blkno, offnum);
- tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
- tuple.t_len = ItemIdGetLength(itemid);
- tuple.t_tableOid = RelationGetRelid(rel);
-
- /*
- * DEAD tuples are almost always pruned into LP_DEAD line pointers by
- * heap_page_prune(), but it's possible that the tuple state changed
- * since heap_page_prune() looked. Handle that here by restarting.
- * (See comments at the top of function for a full explanation.)
- */
- res = HeapTupleSatisfiesVacuum(&tuple, vacrel->cutoffs.OldestXmin,
- buf);
-
- if (unlikely(res == HEAPTUPLE_DEAD))
- goto retry;
+ htup = (HeapTupleHeader) PageGetItem(page, itemid);
/*
* The criteria for counting a tuple as live in this block need to
@@ -1673,7 +1649,7 @@ retry:
* (Cases where we bypass index vacuuming will violate this optimistic
* assumption, but the overall impact of that should be negligible.)
*/
- switch (res)
+ switch (presult->htsv[offnum])
{
case HEAPTUPLE_LIVE:
@@ -1695,7 +1671,7 @@ retry:
{
TransactionId xmin;
- if (!HeapTupleHeaderXminCommitted(tuple.t_data))
+ if (!HeapTupleHeaderXminCommitted(htup))
{
presult->all_visible = false;
break;
@@ -1705,7 +1681,7 @@ retry:
* The inserter definitely committed. But is it old enough
* that everyone sees it as committed?
*/
- xmin = HeapTupleHeaderGetXmin(tuple.t_data);
+ xmin = HeapTupleHeaderGetXmin(htup);
if (!TransactionIdPrecedes(xmin,
vacrel->cutoffs.OldestXmin))
{
@@ -1759,7 +1735,7 @@ retry:
presult->hastup = true; /* page makes rel truncation unsafe */
/* Tuple with storage -- consider need to freeze */
- if (heap_prepare_freeze_tuple(tuple.t_data, &vacrel->cutoffs, &pagefrz,
+ if (heap_prepare_freeze_tuple(htup, &vacrel->cutoffs, &pagefrz,
&frozen[tuples_frozen], &totally_frozen))
{
/* Save prepared freeze plan for later */
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 9a07828e5a..bf94bda634 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -199,6 +199,8 @@ typedef struct PruneResult
bool hastup; /* Page prevents rel truncation? */
bool has_lpdead_items; /* includes existing LP_DEAD items */
+ int nnewlpdead; /* Number of newly LP_DEAD items */
+
/*
* State describes the proper VM bit states to set for the page following
* pruning and freezing. all_visible implies !has_lpdead_items, but don't
@@ -207,6 +209,16 @@ typedef struct PruneResult
bool all_visible; /* Every item visible to all? */
bool all_frozen; /* provided all_visible is also true */
TransactionId visibility_cutoff_xid; /* For recovery conflicts */
+
+ /*
+ * Tuple visibility is only computed once for each tuple, for correctness
+ * and efficiency reasons; see comment in heap_page_prune() for details.
+ * This is of type int8[], instead of HTSV_Result[], so we can use -1 to
+ * indicate no visibility has been computed, e.g. for LP_DEAD items.
+ *
+ * Same indexing as ->marked.
+ */
+ int8 htsv[MaxHeapTuplesPerPage + 1];
} PruneResult;
@@ -307,7 +319,7 @@ extern int heap_page_prune(Relation relation, Buffer buffer,
struct GlobalVisState *vistest,
TransactionId old_snap_xmin,
TimestampTz old_snap_ts,
- int *nnewlpdead,
+ PruneResult *presult,
OffsetNumber *off_loc);
extern void heap_page_prune_execute(Buffer buffer,
OffsetNumber *redirected, int nredirected,
--
2.37.2
On Thu, Aug 31, 2023 at 5:39 AM David Geier <geidav.pg@gmail.com> wrote:
Regarding the 2nd patch (disclaimer: I'm not too familiar with that area
of the code): I don't completely understand why the retry loop is not
needed anymore and how you now detect/handle the possible race
condition? It can still happen that an aborting transaction changes the
state of a row after heap_page_prune() looked at that row. Would that
case now not be ignored?
Thanks for asking. I've updated the comment in the code and the commit
message about this, as it seems important to be clear.
Any inserting transaction which aborts after heap_page_prune()'s
visibility check will now be of no concern to lazy_scan_prune(). Since
we don't do the visibility check again, we won't find the tuple
HEAPTUPLE_DEAD and thus won't have the problem of adding the tuple to
the array of tuples for vacuum to reap. This does mean that we won't
reap and remove tuples whose inserting transactions abort right after
heap_page_prune()'s visibility check. But, this doesn't seem like an
issue. They may not be removed until the next vacuum, but ISTM it is
actually worse to pay the cost of re-pruning the whole page just to
clean up that one tuple. Maybe if that renders the page all visible
and we can mark it as such in the visibility map -- but that seems
like a relatively narrow use case.
- Melanie
On Thu, Aug 31, 2023 at 3:35 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
Any inserting transaction which aborts after heap_page_prune()'s
visibility check will now be of no concern to lazy_scan_prune(). Since
we don't do the visibility check again, we won't find the tuple
HEAPTUPLE_DEAD and thus won't have the problem of adding the tuple to
the array of tuples for vacuum to reap. This does mean that we won't
reap and remove tuples whose inserting transactions abort right after
heap_page_prune()'s visibility check. But, this doesn't seem like an
issue.
It's definitely not an issue.
The loop is essentially a hacky way of getting a consistent picture of
which tuples should be treated as HEAPTUPLE_DEAD, and which tuples
need to be left behind (consistent at the level of each page and each
HOT chain, at least). Making that explicit seems strictly better.
They may not be removed until the next vacuum, but ISTM it is
actually worse to pay the cost of re-pruning the whole page just to
clean up that one tuple. Maybe if that renders the page all visible
and we can mark it as such in the visibility map -- but that seems
like a relatively narrow use case.
The chances of actually hitting the retry are microscopic anyway. It
has nothing to do with making sure that dead tuples from aborted
transactions get removed for its own sake, or anything. Rather, the retry is
all about making sure that all TIDs that get removed from indexes can
only point to LP_DEAD stubs. Prior to Postgres 14, HEAPTUPLE_DEAD
tuples with storage would very occasionally be left behind, which made
life difficult in a bunch of other places -- for no good reason.
--
Peter Geoghegan
Hi,
On 9/1/23 03:25, Peter Geoghegan wrote:
On Thu, Aug 31, 2023 at 3:35 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
Any inserting transaction which aborts after heap_page_prune()'s
visibility check will now be of no concern to lazy_scan_prune(). Since
we don't do the visibility check again, we won't find the tuple
HEAPTUPLE_DEAD and thus won't have the problem of adding the tuple to
the array of tuples for vacuum to reap. This does mean that we won't
reap and remove tuples whose inserting transactions abort right after
heap_page_prune()'s visibility check. But, this doesn't seem like an
issue.
It's definitely not an issue.
The loop is essentially a hacky way of getting a consistent picture of
which tuples should be treated as HEAPTUPLE_DEAD, and which tuples
need to be left behind (consistent at the level of each page and each
HOT chain, at least). Making that explicit seems strictly better.
They may not be removed until the next vacuum, but ISTM it is
actually worse to pay the cost of re-pruning the whole page just to
clean up that one tuple. Maybe if that renders the page all visible
and we can mark it as such in the visibility map -- but that seems
like a relatively narrow use case.
The chances of actually hitting the retry are microscopic anyway. It
has nothing to do with making sure that dead tuples from aborted
transactions get removed for its own sake, or anything. Rather, the retry is
all about making sure that all TIDs that get removed from indexes can
only point to LP_DEAD stubs. Prior to Postgres 14, HEAPTUPLE_DEAD
tuples with storage would very occasionally be left behind, which made
life difficult in a bunch of other places -- for no good reason.
That makes sense and seems like a much better compromise. Thanks for the
explanations. Please update the comment to document the corner case and
how we handle it.
--
David Geier
(ServiceNow)
On Thu, Aug 31, 2023 at 6:29 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
I have changed this.
I spent a bunch of time today looking at this, thinking maybe I could
commit it. But then I got cold feet.
With these patches applied, PruneResult ends up being declared in
heapam.h, with a comment that says /* State returned from pruning */.
But that comment isn't accurate. The two new members that get added to
the structure by 0002, namely nnewlpdead and htsv, are in fact state
that is returned from pruning. But the other 5 members aren't. They're
just initialized to constant values by pruning and then filled in for
real by the vacuum logic. That's extremely weird. It would be fine if
heap_page_prune() just grew a new output argument that only returned
the HTSV results, or perhaps it could make sense to bundle any
existing out parameters together into a struct and then add new things
to that struct instead of adding even more parameters to the function
itself. But there doesn't seem to be any good reason to muddle
together the new output parameters for heap_page_prune() with a bunch
of state that is currently internal to vacuumlazy.c.
I realize that the shape of the patches probably stems from the fact
that they started out life as part of a bigger patch set. But to be
committed independently, they need to be shaped in a way that makes
sense independently, and I don't think this qualifies. On the plus
side, it seems to me that it's probably not that much work to fix this
issue and that the result would likely be a smaller patch than what
you have now, which is something.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Wed, Sep 6, 2023 at 1:04 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Aug 31, 2023 at 6:29 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
I have changed this.
I spent a bunch of time today looking at this, thinking maybe I could
commit it. But then I got cold feet.
With these patches applied, PruneResult ends up being declared in
heapam.h, with a comment that says /* State returned from pruning */.
But that comment isn't accurate. The two new members that get added to
the structure by 0002, namely nnewlpdead and htsv, are in fact state
that is returned from pruning. But the other 5 members aren't. They're
just initialized to constant values by pruning and then filled in for
real by the vacuum logic. That's extremely weird. It would be fine if
heap_page_prune() just grew a new output argument that only returned
the HTSV results, or perhaps it could make sense to bundle any
existing out parameters together into a struct and then add new things
to that struct instead of adding even more parameters to the function
itself. But there doesn't seem to be any good reason to muddle
together the new output parameters for heap_page_prune() with a bunch
of state that is currently internal to vacuumlazy.c.
I realize that the shape of the patches probably stems from the fact
that they started out life as part of a bigger patch set. But to be
committed independently, they need to be shaped in a way that makes
sense independently, and I don't think this qualifies. On the plus
side, it seems to me that it's probably not that much work to fix this
issue and that the result would likely be a smaller patch than what
you have now, which is something.
Yeah, I think this is a fair concern. I have addressed it in the
attached patches.
I thought a lot about whether or not adding a PruneResult which
contains only the output parameters and result of heap_page_prune() is
annoying since we have so many other *Prune* data structures. I
decided it's not annoying. In some cases, four outputs don't merit a
new structure. In this case, I think it declutters the code a bit --
independent of any other patches I may be writing :)
- Melanie
Attachments:
v3-0001-Move-heap_page_prune-output-parameters-into-struc.patch
From 11f5d895e4efe7b3b3a7bbc58efbb4d6c40274c0 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 6 Sep 2023 14:57:20 -0400
Subject: [PATCH v3 1/2] Move heap_page_prune output parameters into struct
Add PruneResult, a structure containing the output parameters and result
of heap_page_prune(). Reorganizing the results of heap_page_prune() into
a struct simplifies the function signature and provides a location for
future commits to store additional output parameters.
Note that heap_page_prune() no longer NULL checks off_loc, the current
tuple's offset in the line pointer array. It will never be NULL now that
it is a member of a required output parameter, PruneResult.
Discussion: https://postgr.es/m/CAAKRu_br124qsGJieuYA0nGjywEukhK1dKBfRdby_4yY3E9SXA%40mail.gmail.com
---
src/backend/access/heap/pruneheap.c | 47 +++++++++++-----------------
src/backend/access/heap/vacuumlazy.c | 19 +++++------
src/include/access/heapam.h | 20 ++++++++++--
src/tools/pgindent/typedefs.list | 1 +
4 files changed, 45 insertions(+), 42 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 18193efa23..f0847795a7 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -155,15 +155,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
*/
if (PageIsFull(page) || PageGetHeapFreeSpace(page) < minfree)
{
- int ndeleted,
- nnewlpdead;
+ PruneResult presult;
- ndeleted = heap_page_prune(relation, buffer, vistest,
- &nnewlpdead, NULL);
+ heap_page_prune(relation, buffer, vistest, &presult);
/*
* Report the number of tuples reclaimed to pgstats. This is
- * ndeleted minus the number of newly-LP_DEAD-set items.
+ * presult.ndeleted minus the number of newly-LP_DEAD-set items.
*
* We derive the number of dead tuples like this to avoid totally
* forgetting about items that were set to LP_DEAD, since they
@@ -175,9 +173,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* tracks ndeleted, since it will set the same LP_DEAD items to
* LP_UNUSED separately.
*/
- if (ndeleted > nnewlpdead)
+ if (presult.ndeleted > presult.nnewlpdead)
pgstat_update_heap_dead_tuples(relation,
- ndeleted - nnewlpdead);
+ presult.ndeleted - presult.nnewlpdead);
}
/* And release buffer lock */
@@ -204,21 +202,15 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* (see heap_prune_satisfies_vacuum and
* HeapTupleSatisfiesVacuum).
*
- * Sets *nnewlpdead for caller, indicating the number of items that were
- * newly set LP_DEAD during prune operation.
- *
- * off_loc is the offset location required by the caller to use in error
- * callback.
- *
- * Returns the number of tuples deleted from the page during this call.
+ * presult contains output parameters needed by callers such as the number of
+ * tuples removed and the number of line pointers newly marked LP_DEAD.
+ * heap_page_prune() is responsible for initializing it.
*/
-int
+void
heap_page_prune(Relation relation, Buffer buffer,
GlobalVisState *vistest,
- int *nnewlpdead,
- OffsetNumber *off_loc)
+ PruneResult *presult)
{
- int ndeleted = 0;
Page page = BufferGetPage(buffer);
BlockNumber blockno = BufferGetBlockNumber(buffer);
OffsetNumber offnum,
@@ -244,6 +236,10 @@ heap_page_prune(Relation relation, Buffer buffer,
prstate.nredirected = prstate.ndead = prstate.nunused = 0;
memset(prstate.marked, 0, sizeof(prstate.marked));
+ presult->ndeleted = 0;
+ presult->nnewlpdead = 0;
+ presult->off_loc = InvalidOffsetNumber;
+
maxoff = PageGetMaxOffsetNumber(page);
tup.t_tableOid = RelationGetRelid(prstate.rel);
@@ -290,8 +286,7 @@ heap_page_prune(Relation relation, Buffer buffer,
* Set the offset number so that we can display it along with any
* error that occurred while processing this tuple.
*/
- if (off_loc)
- *off_loc = offnum;
+ presult->off_loc = offnum;
prstate.htsv[offnum] = heap_prune_satisfies_vacuum(&prstate, &tup,
buffer);
@@ -309,8 +304,7 @@ heap_page_prune(Relation relation, Buffer buffer,
continue;
/* see preceding loop */
- if (off_loc)
- *off_loc = offnum;
+ presult->off_loc = offnum;
/* Nothing to do if slot is empty or already dead */
itemid = PageGetItemId(page, offnum);
@@ -318,12 +312,11 @@ heap_page_prune(Relation relation, Buffer buffer,
continue;
/* Process this item or chain of items */
- ndeleted += heap_prune_chain(buffer, offnum, &prstate);
+ presult->ndeleted += heap_prune_chain(buffer, offnum, &prstate);
}
/* Clear the offset information once we have processed the given page. */
- if (off_loc)
- *off_loc = InvalidOffsetNumber;
+ presult->off_loc = InvalidOffsetNumber;
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
@@ -419,9 +412,7 @@ heap_page_prune(Relation relation, Buffer buffer,
END_CRIT_SECTION();
/* Record number of newly-set-LP_DEAD items for caller */
- *nnewlpdead = prstate.ndead;
-
- return ndeleted;
+ presult->nnewlpdead = prstate.ndead;
}
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 1a05adfa61..a3c756ab2e 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1544,12 +1544,11 @@ lazy_scan_prune(LVRelState *vacrel,
ItemId itemid;
HeapTupleData tuple;
HTSV_Result res;
- int tuples_deleted,
- tuples_frozen,
+ PruneResult presult;
+ int tuples_frozen,
lpdead_items,
live_tuples,
recently_dead_tuples;
- int nnewlpdead;
HeapPageFreeze pagefrz;
int64 fpi_before = pgWalUsage.wal_fpi;
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
@@ -1572,7 +1571,6 @@ retry:
pagefrz.FreezePageRelminMxid = vacrel->NewRelminMxid;
pagefrz.NoFreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
pagefrz.NoFreezePageRelminMxid = vacrel->NewRelminMxid;
- tuples_deleted = 0;
tuples_frozen = 0;
lpdead_items = 0;
live_tuples = 0;
@@ -1581,15 +1579,14 @@ retry:
/*
* Prune all HOT-update chains in this page.
*
- * We count tuples removed by the pruning step as tuples_deleted. Its
- * final value can be thought of as the number of tuples that have been
- * deleted from the table. It should not be confused with lpdead_items;
+ * We count the number of tuples removed from the page by the pruning step
+ * in presult.ndeleted. It should not be confused with lpdead_items;
* lpdead_items's final value can be thought of as the number of tuples
* that were deleted from indexes.
*/
- tuples_deleted = heap_page_prune(rel, buf, vacrel->vistest,
- &nnewlpdead,
- &vacrel->offnum);
+ heap_page_prune(rel, buf, vacrel->vistest, &presult);
+
+ vacrel->offnum = presult.off_loc;
/*
* Now scan the page to collect LP_DEAD items and check for tuples
@@ -1929,7 +1926,7 @@ retry:
}
/* Finally, add page-local counts to whole-VACUUM counts */
- vacrel->tuples_deleted += tuples_deleted;
+ vacrel->tuples_deleted += presult.ndeleted;
vacrel->tuples_frozen += tuples_frozen;
vacrel->lpdead_items += lpdead_items;
vacrel->live_tuples += live_tuples;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 6598c4d7d8..a2ea3be52b 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -191,6 +191,21 @@ typedef struct HeapPageFreeze
} HeapPageFreeze;
+/*
+ * Per-page state returned from pruning
+ */
+typedef struct PruneResult
+{
+ int ndeleted; /* Number of tuples deleted from the page */
+ int nnewlpdead; /* Number of newly LP_DEAD items */
+
+ /*
+ * Current tuple's offset in the line pointer array, used for error
+ * callback.
+ */
+ OffsetNumber off_loc;
+} PruneResult;
+
/* ----------------
* function prototypes for heap access method
*
@@ -284,10 +299,9 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
/* in heap/pruneheap.c */
struct GlobalVisState;
extern void heap_page_prune_opt(Relation relation, Buffer buffer);
-extern int heap_page_prune(Relation relation, Buffer buffer,
+extern void heap_page_prune(Relation relation, Buffer buffer,
struct GlobalVisState *vistest,
- int *nnewlpdead,
- OffsetNumber *off_loc);
+ PruneResult *presult);
extern void heap_page_prune_execute(Buffer buffer,
OffsetNumber *redirected, int nredirected,
OffsetNumber *nowdead, int ndead,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 0656c94416..8c30c0b1b6 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2150,6 +2150,7 @@ ProjectionPath
PromptInterruptContext
ProtocolVersion
PrsStorage
+PruneResult
PruneState
PruneStepResult
PsqlScanCallbacks
--
2.37.2
v3-0002-Reuse-heap_page_prune-tuple-visibility-statuses.patch
From a02e108f9637443251f1b9d48fadbf339f710408 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 6 Sep 2023 16:54:41 -0400
Subject: [PATCH v3 2/2] Reuse heap_page_prune() tuple visibility statuses
heap_page_prune() obtains the HTSV_Result (tuple visibility status)
returned from HeapTupleSatisfiesVacuum() for every tuple on the page and
stores them in an array. By making this array available to
heap_page_prune()'s caller lazy_scan_prune(), we can avoid an additional
call to HeapTupleSatisfiesVacuum() when freezing the tuples and
recording LP_DEAD items for vacuum. This saves resources and eliminates
the possibility that vacuuming corrupted data results in a hang due to
endless retry looping.
This replaces the retry mechanism introduced in 8523492d4 to handle cases in
which a tuple's inserting transaction aborted between the visibility check in
heap_page_prune() and lazy_scan_prune()'s call to HeapTupleSatisfiesVacuum() --
rendering it dead but without a dead line pointer. We can instead reuse the
tuple's original visibility status, circumventing any disagreements.
Discussion: https://postgr.es/m/CAAKRu_br124qsGJieuYA0nGjywEukhK1dKBfRdby_4yY3E9SXA%40mail.gmail.com
---
src/backend/access/heap/pruneheap.c | 37 ++++++++++++------------
src/backend/access/heap/vacuumlazy.c | 42 ++++++++--------------------
src/include/access/heapam.h | 11 ++++++++
3 files changed, 41 insertions(+), 49 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index f0847795a7..2ca10b6dcf 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -53,16 +53,6 @@ typedef struct
* 1. Otherwise every access would need to subtract 1.
*/
bool marked[MaxHeapTuplesPerPage + 1];
-
- /*
- * Tuple visibility is only computed once for each tuple, for correctness
- * and efficiency reasons; see comment in heap_page_prune() for details.
- * This is of type int8[], instead of HTSV_Result[], so we can use -1 to
- * indicate no visibility has been computed, e.g. for LP_DEAD items.
- *
- * Same indexing as ->marked.
- */
- int8 htsv[MaxHeapTuplesPerPage + 1];
} PruneState;
/* Local functions */
@@ -71,6 +61,7 @@ static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
Buffer buffer);
static int heap_prune_chain(Buffer buffer,
OffsetNumber rootoffnum,
+ int8 *htsv,
PruneState *prstate);
static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
static void heap_prune_record_redirect(PruneState *prstate,
@@ -236,6 +227,10 @@ heap_page_prune(Relation relation, Buffer buffer,
prstate.nredirected = prstate.ndead = prstate.nunused = 0;
memset(prstate.marked, 0, sizeof(prstate.marked));
+ /*
+ * presult->htsv is not initialized here because all ntuple spots in the
+ * array will be set either to a valid HTSV_Result value or -1.
+ */
presult->ndeleted = 0;
presult->nnewlpdead = 0;
presult->off_loc = InvalidOffsetNumber;
@@ -273,7 +268,7 @@ heap_page_prune(Relation relation, Buffer buffer,
/* Nothing to do if slot doesn't contain a tuple */
if (!ItemIdIsNormal(itemid))
{
- prstate.htsv[offnum] = -1;
+ presult->htsv[offnum] = -1;
continue;
}
@@ -288,8 +283,8 @@ heap_page_prune(Relation relation, Buffer buffer,
*/
presult->off_loc = offnum;
- prstate.htsv[offnum] = heap_prune_satisfies_vacuum(&prstate, &tup,
- buffer);
+ presult->htsv[offnum] = heap_prune_satisfies_vacuum(&prstate, &tup,
+ buffer);
}
/* Scan the page */
@@ -312,7 +307,8 @@ heap_page_prune(Relation relation, Buffer buffer,
continue;
/* Process this item or chain of items */
- presult->ndeleted += heap_prune_chain(buffer, offnum, &prstate);
+ presult->ndeleted += heap_prune_chain(buffer, offnum,
+ presult->htsv, &prstate);
}
/* Clear the offset information once we have processed the given page. */
@@ -440,6 +436,8 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
/*
* Prune specified line pointer or a HOT chain originating at line pointer.
*
+ * Tuple visibility information is provided in htsv.
+ *
* If the item is an index-referenced tuple (i.e. not a heap-only tuple),
* the HOT chain is pruned by removing all DEAD tuples at the start of the HOT
* chain. We also prune any RECENTLY_DEAD tuples preceding a DEAD tuple.
@@ -467,7 +465,8 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
* Returns the number of tuples (to be) deleted from the page.
*/
static int
-heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum, PruneState *prstate)
+heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
+ int8 *htsv, PruneState *prstate)
{
int ndeleted = 0;
Page dp = (Page) BufferGetPage(buffer);
@@ -488,7 +487,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum, PruneState *prstate)
*/
if (ItemIdIsNormal(rootlp))
{
- Assert(prstate->htsv[rootoffnum] != -1);
+ Assert(htsv[rootoffnum] != -1);
htup = (HeapTupleHeader) PageGetItem(dp, rootlp);
if (HeapTupleHeaderIsHeapOnly(htup))
@@ -511,7 +510,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum, PruneState *prstate)
* either here or while following a chain below. Whichever path
* gets there first will mark the tuple unused.
*/
- if (prstate->htsv[rootoffnum] == HEAPTUPLE_DEAD &&
+ if (htsv[rootoffnum] == HEAPTUPLE_DEAD &&
!HeapTupleHeaderIsHotUpdated(htup))
{
heap_prune_record_unused(prstate, rootoffnum);
@@ -579,7 +578,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum, PruneState *prstate)
break;
Assert(ItemIdIsNormal(lp));
- Assert(prstate->htsv[offnum] != -1);
+ Assert(htsv[offnum] != -1);
htup = (HeapTupleHeader) PageGetItem(dp, lp);
/*
@@ -599,7 +598,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum, PruneState *prstate)
*/
tupdead = recent_dead = false;
- switch ((HTSV_Result) prstate->htsv[offnum])
+ switch ((HTSV_Result) htsv[offnum])
{
case HEAPTUPLE_DEAD:
tupdead = true;
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index a3c756ab2e..ae813483a5 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1524,12 +1524,12 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
* of complexity just so we could deal with tuples that were DEAD to VACUUM,
* but nevertheless were left with storage after pruning.
*
- * The approach we take now is to restart pruning when the race condition is
- * detected. This allows heap_page_prune() to prune the tuples inserted by
- * the now-aborted transaction. This is a little crude, but it guarantees
- * that any items that make it into the dead_items array are simple LP_DEAD
- * line pointers, and that every remaining item with tuple storage is
- * considered as a candidate for freezing.
+ * As of Postgres 17, we circumvent this problem altogether by reusing the
+ * result of heap_page_prune()'s visibility check. Without the second call to
+ * HeapTupleSatisfiesVacuum(), there is no new HTSV_Result and there can be no
+ * disagreement. The tuple's TID won't be added to the array of dead tuple TIDs
+ * for vacuum, thus vacuum will never be tasked with reaping a tuple with
+ * storage.
*/
static void
lazy_scan_prune(LVRelState *vacrel,
@@ -1542,8 +1542,6 @@ lazy_scan_prune(LVRelState *vacrel,
OffsetNumber offnum,
maxoff;
ItemId itemid;
- HeapTupleData tuple;
- HTSV_Result res;
PruneResult presult;
int tuples_frozen,
lpdead_items,
@@ -1563,8 +1561,6 @@ lazy_scan_prune(LVRelState *vacrel,
*/
maxoff = PageGetMaxOffsetNumber(page);
-retry:
-
/* Initialize (or reset) page-level state */
pagefrz.freeze_required = false;
pagefrz.FreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
@@ -1602,6 +1598,7 @@ retry:
offnum <= maxoff;
offnum = OffsetNumberNext(offnum))
{
+ HeapTupleHeader htup;
bool totally_frozen;
/*
@@ -1644,22 +1641,7 @@ retry:
Assert(ItemIdIsNormal(itemid));
- ItemPointerSet(&(tuple.t_self), blkno, offnum);
- tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
- tuple.t_len = ItemIdGetLength(itemid);
- tuple.t_tableOid = RelationGetRelid(rel);
-
- /*
- * DEAD tuples are almost always pruned into LP_DEAD line pointers by
- * heap_page_prune(), but it's possible that the tuple state changed
- * since heap_page_prune() looked. Handle that here by restarting.
- * (See comments at the top of function for a full explanation.)
- */
- res = HeapTupleSatisfiesVacuum(&tuple, vacrel->cutoffs.OldestXmin,
- buf);
-
- if (unlikely(res == HEAPTUPLE_DEAD))
- goto retry;
+ htup = (HeapTupleHeader) PageGetItem(page, itemid);
/*
* The criteria for counting a tuple as live in this block need to
@@ -1680,7 +1662,7 @@ retry:
* (Cases where we bypass index vacuuming will violate this optimistic
* assumption, but the overall impact of that should be negligible.)
*/
- switch (res)
+ switch ((HTSV_Result) presult.htsv[offnum])
{
case HEAPTUPLE_LIVE:
@@ -1702,7 +1684,7 @@ retry:
{
TransactionId xmin;
- if (!HeapTupleHeaderXminCommitted(tuple.t_data))
+ if (!HeapTupleHeaderXminCommitted(htup))
{
prunestate->all_visible = false;
break;
@@ -1712,7 +1694,7 @@ retry:
* The inserter definitely committed. But is it old enough
* that everyone sees it as committed?
*/
- xmin = HeapTupleHeaderGetXmin(tuple.t_data);
+ xmin = HeapTupleHeaderGetXmin(htup);
if (!TransactionIdPrecedes(xmin,
vacrel->cutoffs.OldestXmin))
{
@@ -1766,7 +1748,7 @@ retry:
prunestate->hastup = true; /* page makes rel truncation unsafe */
/* Tuple with storage -- consider need to freeze */
- if (heap_prepare_freeze_tuple(tuple.t_data, &vacrel->cutoffs, &pagefrz,
+ if (heap_prepare_freeze_tuple(htup, &vacrel->cutoffs, &pagefrz,
&frozen[tuples_frozen], &totally_frozen))
{
/* Save prepared freeze plan for later */
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index a2ea3be52b..1fe8354975 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -204,6 +204,17 @@ typedef struct PruneResult
* callback.
*/
OffsetNumber off_loc;
+
+ /*
+ * Tuple visibility is only computed once for each tuple, for correctness
+ * and efficiency reasons; see comment in heap_page_prune() for details.
+ * This is of type int8[], instead of HTSV_Result[], so we can use -1 to
+ * indicate no visibility has been computed, e.g. for LP_DEAD items.
+ *
+ * This needs to be MaxHeapTuplesPerPage + 1 long as FirstOffsetNumber is
+ * 1. Otherwise every access would need to subtract 1.
+ */
+ int8 htsv[MaxHeapTuplesPerPage + 1];
} PruneResult;
/* ----------------
--
2.37.2
On Wed, Sep 6, 2023 at 5:21 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
> Yeah, I think this is a fair concern. I have addressed it in the
> attached patches.
>
> I thought a lot about whether or not adding a PruneResult which
> contains only the output parameters and result of heap_page_prune() is
> annoying since we have so many other *Prune* data structures. I
> decided it's not annoying. In some cases, four outputs don't merit a
> new structure. In this case, I think it declutters the code a bit --
> independent of any other patches I may be writing :)
I took a look at 0001 and I think that it's incorrect. In the existing
code, *off_loc gets updated before each call to
heap_prune_satisfies_vacuum(). This means that if an error occurs in
heap_prune_satisfies_vacuum(), *off_loc will as of that moment contain
the relevant offset number. In your version, the relevant offset
number will only be stored in some local structure to which the caller
doesn't yet have access. The difference is meaningful. lazy_scan_prune
passes off_loc as vacrel->offnum, which means that if an error
happens, vacrel->offnum will have the right value, and so when the
error context callback installed by heap_vacuum_rel() fires, namely
vacuum_error_callback(), it can look at vacrel->offnum and know where
the error happened. With your patch, I think that would no longer
work.
I haven't run the regression suite with 0001 applied. I'm guessing
that you did, and that they passed, which if true means that we don't
have a test for this, which is a shame, although writing such a test
might be a bit tricky. If there is a test case for this and you didn't
run it, woops. This is also why I think it's *extremely* important for
the header comment of a function that takes pointer parameters to
document the semantics of those pointers. Normally they are in
parameters or out parameters or in-out parameters, but here it's
something even more complicated. The existing header comment says
"off_loc is the offset location required by the caller to use in error
callback," which I didn't really understand until I actually looked at
what the code is doing, so I consider that somebody could've done a
better job writing this comment, but in theory you could've also
noticed that, at least AFAICS, there's no way for the function to
return with *off_loc set to anything other than InvalidOffsetNumber.
That means that the statements which set *off_loc to other values must
have some other purpose.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Thu, Sep 7, 2023 at 1:37 PM Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Sep 6, 2023 at 5:21 PM Melanie Plageman
> <melanieplageman@gmail.com> wrote:
> > Yeah, I think this is a fair concern. I have addressed it in the
> > attached patches.
> >
> > I thought a lot about whether or not adding a PruneResult which
> > contains only the output parameters and result of heap_page_prune() is
> > annoying since we have so many other *Prune* data structures. I
> > decided it's not annoying. In some cases, four outputs don't merit a
> > new structure. In this case, I think it declutters the code a bit --
> > independent of any other patches I may be writing :)
>
> I took a look at 0001 and I think that it's incorrect. In the existing
> code, *off_loc gets updated before each call to
> heap_prune_satisfies_vacuum(). This means that if an error occurs in
> heap_prune_satisfies_vacuum(), *off_loc will as of that moment contain
> the relevant offset number. In your version, the relevant offset
> number will only be stored in some local structure to which the caller
> doesn't yet have access. The difference is meaningful. lazy_scan_prune
> passes off_loc as vacrel->offnum, which means that if an error
> happens, vacrel->offnum will have the right value, and so when the
> error context callback installed by heap_vacuum_rel() fires, namely
> vacuum_error_callback(), it can look at vacrel->offnum and know where
> the error happened. With your patch, I think that would no longer
> work.
You are right. That is a major problem. Thank you for finding that. I
was able to confirm the breakage by patching in an error to
heap_page_prune() after we have set off_loc and confirming that the
error context has the offset in master and is missing it with my patch
applied.
I can fix it by changing the type of PruneResult->off_loc to be an
OffsetNumber pointer. This does mean that PruneResult will be
initialized partially by heap_page_prune() callers. I wonder if you
think that undermines the case for making a new struct.
I still want to eliminate the NULL check of off_loc in
heap_page_prune() by making it a required parameter. Even though
on-access pruning does not have an error callback mechanism which uses
the offset, it seems better to have a pointless local variable in
heap_page_prune_opt() than to do a NULL check for every tuple pruned.
> I haven't run the regression suite with 0001 applied. I'm guessing
> that you did, and that they passed, which if true means that we don't
> have a test for this, which is a shame, although writing such a test
> might be a bit tricky. If there is a test case for this and you didn't
> run it, woops.
There is no test coverage for the vacuum error callback context
currently (tests passed for me). I looked into how we might add such a
test. First, I investigated what kind of errors might occur during
heap_prune_satisfies_vacuum(). Some of the multixact functions called
by HeapTupleSatisfiesVacuumHorizon() could error out -- for example
GetMultiXactIdMembers(). It seems difficult to trigger the errors in
GetMultiXactIdMembers(), as we would have to cause wraparound. It
would be even more difficult to ensure that we hit those specific
errors from a call stack containing heap_prune_satisfies_vacuum(). As
such, I'm not sure I can think of a way to protect future developers
from repeating my mistake--apart from improving the comment like you
mentioned.
- Melanie
On Thu, Sep 7, 2023 at 3:10 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
> I can fix it by changing the type of PruneResult->off_loc to be an
> OffsetNumber pointer. This does mean that PruneResult will be
> initialized partially by heap_page_prune() callers. I wonder if you
> think that undermines the case for making a new struct.
I think that it undermines the case for including that particular
argument in the struct. It's not really a Prune*Result* if the caller
initializes it in part. It seems fairly reasonable to still have a
PruneResult struct for the other stuff, though, at least to me. How do
you see it?
(One could also argue that this is a somewhat more byzantine way of
doing error reporting than would be desirable, but fixing that problem
doesn't seem too straightforward so perhaps it's prudent to leave it
well enough alone.)
> I still want to eliminate the NULL check of off_loc in
> heap_page_prune() by making it a required parameter. Even though
> on-access pruning does not have an error callback mechanism which uses
> the offset, it seems better to have a pointless local variable in
> heap_page_prune_opt() than to do a NULL check for every tuple pruned.
It doesn't seem important to me unless it improves performance. If
it's just stylistic, I don't object, but I also don't really see a
reason to care.
> > I haven't run the regression suite with 0001 applied. I'm guessing
> > that you did, and that they passed, which if true means that we don't
> > have a test for this, which is a shame, although writing such a test
> > might be a bit tricky. If there is a test case for this and you didn't
> > run it, woops.
>
> There is no test coverage for the vacuum error callback context
> currently (tests passed for me). I looked into how we might add such a
> test. First, I investigated what kind of errors might occur during
> heap_prune_satisfies_vacuum(). Some of the multixact functions called
> by HeapTupleSatisfiesVacuumHorizon() could error out -- for example
> GetMultiXactIdMembers(). It seems difficult to trigger the errors in
> GetMultiXactIdMembers(), as we would have to cause wraparound. It
> would be even more difficult to ensure that we hit those specific
> errors from a call stack containing heap_prune_satisfies_vacuum(). As
> such, I'm not sure I can think of a way to protect future developers
> from repeating my mistake--apart from improving the comment like you
> mentioned.
004_verify_heapam.pl has some tests that intentionally corrupt pages
and then use pg_amcheck to detect the corruption. Such an approach
could probably also be used here. But it's a pain to get such tests
right, because any change to the page format due to endianness,
different block size, or whatever can make carefully-written tests go
boom.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Thu, Sep 7, 2023 at 3:30 PM Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Sep 7, 2023 at 3:10 PM Melanie Plageman
> <melanieplageman@gmail.com> wrote:
> > I can fix it by changing the type of PruneResult->off_loc to be an
> > OffsetNumber pointer. This does mean that PruneResult will be
> > initialized partially by heap_page_prune() callers. I wonder if you
> > think that undermines the case for making a new struct.
>
> I think that it undermines the case for including that particular
> argument in the struct. It's not really a Prune*Result* if the caller
> initializes it in part. It seems fairly reasonable to still have a
> PruneResult struct for the other stuff, though, at least to me. How do
> you see it?
Yes, I think off_loc probably didn't belong in PruneResult to begin with.
It is inherently not a result of pruning since it will only be used in
the event that pruning doesn't complete (it errors out).
In the attached v4 patch set, I have both PruneResult and off_loc as
parameters to heap_page_prune(). I have also added a separate commit
which adds comments both above heap_page_prune()'s call site in
lazy_scan_prune() and in the heap_page_prune() function header
clarifying the various points we discussed.
> > I still want to eliminate the NULL check of off_loc in
> > heap_page_prune() by making it a required parameter. Even though
> > on-access pruning does not have an error callback mechanism which uses
> > the offset, it seems better to have a pointless local variable in
> > heap_page_prune_opt() than to do a NULL check for every tuple pruned.
>
> It doesn't seem important to me unless it improves performance. If
> it's just stylistic, I don't object, but I also don't really see a
> reason to care.
I did some performance testing but, as expected, I couldn't concoct a
scenario where the overhead was noticeable in a profile. So much else
is happening in that code, the NULL check basically doesn't matter
(even though it isn't optimized out).
I mostly wanted to remove the NULL checks because I found them
distracting (so, a stylistic complaint). However, upon further
reflection, I actually think it is better if heap_page_prune_opt()
passes NULL. heap_page_prune() has no error callback mechanism that
could use it, and passing a valid value implies otherwise. Also, as
you said, off_loc will always be InvalidOffsetNumber when
heap_page_prune() returns normally, so having heap_page_prune_opt()
pass a dummy value might actually be more confusing for future
programmers.
> > > I haven't run the regression suite with 0001 applied. I'm guessing
> > > that you did, and that they passed, which if true means that we don't
> > > have a test for this, which is a shame, although writing such a test
> > > might be a bit tricky. If there is a test case for this and you didn't
> > > run it, woops.
> >
> > There is no test coverage for the vacuum error callback context
> > currently (tests passed for me). I looked into how we might add such a
> > test. First, I investigated what kind of errors might occur during
> > heap_prune_satisfies_vacuum(). Some of the multixact functions called
> > by HeapTupleSatisfiesVacuumHorizon() could error out -- for example
> > GetMultiXactIdMembers(). It seems difficult to trigger the errors in
> > GetMultiXactIdMembers(), as we would have to cause wraparound. It
> > would be even more difficult to ensure that we hit those specific
> > errors from a call stack containing heap_prune_satisfies_vacuum(). As
> > such, I'm not sure I can think of a way to protect future developers
> > from repeating my mistake--apart from improving the comment like you
> > mentioned.
>
> 004_verify_heapam.pl has some tests that intentionally corrupt pages
> and then use pg_amcheck to detect the corruption. Such an approach
> could probably also be used here. But it's a pain to get such tests
> right, because any change to the page format due to endianness,
> different block size, or whatever can make carefully-written tests go
> boom.
Cool! I hadn't examined how these tests work until now. I took a stab
at writing a test in the existing 004_verify_heapam.pl. The simplest
thing would be if we could just vacuum the corrupted table ("test")
after running pg_amcheck and compare the error context to our
expectation. I found that this didn't work, though. In an assert
build, vacuum trips an assert before it hits an error while vacuuming
a corrupted tuple in the "test" table. There might be a way of
modifying the existing test code to avoid this, but I tried the next
easiest thing -- corrupting a tuple in the other existing table in the
file, "junk". This is possible to do, but we have to repeat a lot of
the setup code done for the "test" table to get the line pointer array
and loop through and corrupt a tuple. In order to do this well, I
would want to refactor some of the boilerplate into a function. There
are other fiddly bits as well that I needed to consider. I think a
test like this could be useful coverage of some of the possible
errors that could happen in heap_prune_satisfies_vacuum(), but it
definitely isn't coverage of pg_amcheck (and thus shouldn't go in that
file) and a whole new test which spins up a Postgres to cover
vacuum_error_callback() seemed like a bit much.
- Melanie
Attachments:
v4-0002-Move-heap_page_prune-output-parameters-into-struc.patch
From e986940e546171d1f1d06f62a101d695a8481e7a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 6 Sep 2023 14:57:20 -0400
Subject: [PATCH v4 2/3] Move heap_page_prune output parameters into struct
Add PruneResult, a structure containing the output parameters and result
of heap_page_prune(). Reorganizing the results of heap_page_prune() into
a struct simplifies the function signature and provides a location for
future commits to store additional output parameters.
Discussion: https://postgr.es/m/CAAKRu_br124qsGJieuYA0nGjywEukhK1dKBfRdby_4yY3E9SXA%40mail.gmail.com
---
src/backend/access/heap/pruneheap.c | 33 +++++++++++++---------------
src/backend/access/heap/vacuumlazy.c | 17 +++++---------
src/include/access/heapam.h | 13 +++++++++--
src/tools/pgindent/typedefs.list | 1 +
4 files changed, 33 insertions(+), 31 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 392b54f093..5ac286e152 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -155,15 +155,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
*/
if (PageIsFull(page) || PageGetHeapFreeSpace(page) < minfree)
{
- int ndeleted,
- nnewlpdead;
+ PruneResult presult;
- ndeleted = heap_page_prune(relation, buffer, vistest,
- &nnewlpdead, NULL);
+ heap_page_prune(relation, buffer, vistest, &presult, NULL);
/*
* Report the number of tuples reclaimed to pgstats. This is
- * ndeleted minus the number of newly-LP_DEAD-set items.
+ * presult.ndeleted minus the number of newly-LP_DEAD-set items.
*
* We derive the number of dead tuples like this to avoid totally
* forgetting about items that were set to LP_DEAD, since they
@@ -175,9 +173,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* tracks ndeleted, since it will set the same LP_DEAD items to
* LP_UNUSED separately.
*/
- if (ndeleted > nnewlpdead)
+ if (presult.ndeleted > presult.nnewlpdead)
pgstat_update_heap_dead_tuples(relation,
- ndeleted - nnewlpdead);
+ presult.ndeleted - presult.nnewlpdead);
}
/* And release buffer lock */
@@ -204,24 +202,22 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* (see heap_prune_satisfies_vacuum and
* HeapTupleSatisfiesVacuum).
*
- * Sets *nnewlpdead for caller, indicating the number of items that were
- * newly set LP_DEAD during prune operation.
+ * presult contains output parameters needed by callers such as the number of
+ * tuples removed and the number of line pointers newly marked LP_DEAD.
+ * heap_page_prune() is responsible for initializing it.
*
* off_loc is the current offset into the line pointer array while pruning.
* This is used by vacuum to populate the error context message. On-access
* pruning has no such callback mechanism for populating the error context, so
* it passes NULL. When provided by the caller, off_loc is set exclusively by
* heap_page_prune().
- *
- * Returns the number of tuples deleted from the page during this call.
*/
-int
+void
heap_page_prune(Relation relation, Buffer buffer,
GlobalVisState *vistest,
- int *nnewlpdead,
+ PruneResult *presult,
OffsetNumber *off_loc)
{
- int ndeleted = 0;
Page page = BufferGetPage(buffer);
BlockNumber blockno = BufferGetBlockNumber(buffer);
OffsetNumber offnum,
@@ -247,6 +243,9 @@ heap_page_prune(Relation relation, Buffer buffer,
prstate.nredirected = prstate.ndead = prstate.nunused = 0;
memset(prstate.marked, 0, sizeof(prstate.marked));
+ presult->ndeleted = 0;
+ presult->nnewlpdead = 0;
+
maxoff = PageGetMaxOffsetNumber(page);
tup.t_tableOid = RelationGetRelid(prstate.rel);
@@ -321,7 +320,7 @@ heap_page_prune(Relation relation, Buffer buffer,
continue;
/* Process this item or chain of items */
- ndeleted += heap_prune_chain(buffer, offnum, &prstate);
+ presult->ndeleted += heap_prune_chain(buffer, offnum, &prstate);
}
/* Clear the offset information once we have processed the given page. */
@@ -422,9 +421,7 @@ heap_page_prune(Relation relation, Buffer buffer,
END_CRIT_SECTION();
/* Record number of newly-set-LP_DEAD items for caller */
- *nnewlpdead = prstate.ndead;
-
- return ndeleted;
+ presult->nnewlpdead = prstate.ndead;
}
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 102cc97358..7ead9cfe9d 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1544,12 +1544,11 @@ lazy_scan_prune(LVRelState *vacrel,
ItemId itemid;
HeapTupleData tuple;
HTSV_Result res;
- int tuples_deleted,
- tuples_frozen,
+ PruneResult presult;
+ int tuples_frozen,
lpdead_items,
live_tuples,
recently_dead_tuples;
- int nnewlpdead;
HeapPageFreeze pagefrz;
int64 fpi_before = pgWalUsage.wal_fpi;
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
@@ -1572,7 +1571,6 @@ retry:
pagefrz.FreezePageRelminMxid = vacrel->NewRelminMxid;
pagefrz.NoFreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
pagefrz.NoFreezePageRelminMxid = vacrel->NewRelminMxid;
- tuples_deleted = 0;
tuples_frozen = 0;
lpdead_items = 0;
live_tuples = 0;
@@ -1581,9 +1579,8 @@ retry:
/*
* Prune all HOT-update chains in this page.
*
- * We count tuples removed by the pruning step as tuples_deleted. Its
- * final value can be thought of as the number of tuples that have been
- * deleted from the table. It should not be confused with lpdead_items;
+ * We count the number of tuples removed from the page by the pruning step
+ * in presult.ndeleted. It should not be confused with lpdead_items;
* lpdead_items's final value can be thought of as the number of tuples
* that were deleted from indexes.
*
@@ -1591,9 +1588,7 @@ retry:
* current offset when populating the error context message, so it is
* imperative that we pass its location to heap_page_prune.
*/
- tuples_deleted = heap_page_prune(rel, buf, vacrel->vistest,
- &nnewlpdead,
- &vacrel->offnum);
+ heap_page_prune(rel, buf, vacrel->vistest, &presult, &vacrel->offnum);
/*
* Now scan the page to collect LP_DEAD items and check for tuples
@@ -1933,7 +1928,7 @@ retry:
}
/* Finally, add page-local counts to whole-VACUUM counts */
- vacrel->tuples_deleted += tuples_deleted;
+ vacrel->tuples_deleted += presult.ndeleted;
vacrel->tuples_frozen += tuples_frozen;
vacrel->lpdead_items += lpdead_items;
vacrel->live_tuples += live_tuples;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 6598c4d7d8..2d3f149e4f 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -191,6 +191,15 @@ typedef struct HeapPageFreeze
} HeapPageFreeze;
+/*
+ * Per-page state returned from pruning
+ */
+typedef struct PruneResult
+{
+ int ndeleted; /* Number of tuples deleted from the page */
+ int nnewlpdead; /* Number of newly LP_DEAD items */
+} PruneResult;
+
/* ----------------
* function prototypes for heap access method
*
@@ -284,9 +293,9 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
/* in heap/pruneheap.c */
struct GlobalVisState;
extern void heap_page_prune_opt(Relation relation, Buffer buffer);
-extern int heap_page_prune(Relation relation, Buffer buffer,
+extern void heap_page_prune(Relation relation, Buffer buffer,
struct GlobalVisState *vistest,
- int *nnewlpdead,
+ PruneResult *presult,
OffsetNumber *off_loc);
extern void heap_page_prune_execute(Buffer buffer,
OffsetNumber *redirected, int nredirected,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 0656c94416..8c30c0b1b6 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2150,6 +2150,7 @@ ProjectionPath
PromptInterruptContext
ProtocolVersion
PrsStorage
+PruneResult
PruneState
PruneStepResult
PsqlScanCallbacks
--
2.37.2
v4-0003-Reuse-heap_page_prune-tuple-visibility-statuses.patch
From 1a621864420cd17db1bef0546990d8298e3a3d34 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 6 Sep 2023 16:54:41 -0400
Subject: [PATCH v4 3/3] Reuse heap_page_prune() tuple visibility statuses
heap_page_prune() obtains the HTSV_Result (tuple visibility status)
returned from HeapTupleSatisfiesVacuum() for every tuple on the page and
stores them in an array. By making this array available to
heap_page_prune()'s caller lazy_scan_prune(), we can avoid an additional
call to HeapTupleSatisfiesVacuum() when freezing the tuples and
recording LP_DEAD items for vacuum. This saves resources and eliminates
the possibility that vacuuming corrupted data results in a hang due to
endless retry looping.
This replaces the retry mechanism introduced in 8523492d4 to handle cases in
which a tuple's inserting transaction aborted between the visibility check in
heap_page_prune() and lazy_scan_prune()'s call to HeapTupleSatisfiesVacuum() --
rendering it dead but without a dead line pointer. We can instead reuse the
tuple's original visibility status, circumventing any disagreements.
Discussion: https://postgr.es/m/CAAKRu_br124qsGJieuYA0nGjywEukhK1dKBfRdby_4yY3E9SXA%40mail.gmail.com
---
src/backend/access/heap/pruneheap.c | 37 ++++++++++++------------
src/backend/access/heap/vacuumlazy.c | 42 ++++++++--------------------
src/include/access/heapam.h | 11 ++++++++
3 files changed, 41 insertions(+), 49 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 5ac286e152..256039c6be 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -53,16 +53,6 @@ typedef struct
* 1. Otherwise every access would need to subtract 1.
*/
bool marked[MaxHeapTuplesPerPage + 1];
-
- /*
- * Tuple visibility is only computed once for each tuple, for correctness
- * and efficiency reasons; see comment in heap_page_prune() for details.
- * This is of type int8[], instead of HTSV_Result[], so we can use -1 to
- * indicate no visibility has been computed, e.g. for LP_DEAD items.
- *
- * Same indexing as ->marked.
- */
- int8 htsv[MaxHeapTuplesPerPage + 1];
} PruneState;
/* Local functions */
@@ -71,6 +61,7 @@ static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
Buffer buffer);
static int heap_prune_chain(Buffer buffer,
OffsetNumber rootoffnum,
+ int8 *htsv,
PruneState *prstate);
static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
static void heap_prune_record_redirect(PruneState *prstate,
@@ -243,6 +234,10 @@ heap_page_prune(Relation relation, Buffer buffer,
prstate.nredirected = prstate.ndead = prstate.nunused = 0;
memset(prstate.marked, 0, sizeof(prstate.marked));
+ /*
+ * presult->htsv is not initialized here because all ntuple spots in the
+ * array will be set either to a valid HTSV_Result value or -1.
+ */
presult->ndeleted = 0;
presult->nnewlpdead = 0;
@@ -279,7 +274,7 @@ heap_page_prune(Relation relation, Buffer buffer,
/* Nothing to do if slot doesn't contain a tuple */
if (!ItemIdIsNormal(itemid))
{
- prstate.htsv[offnum] = -1;
+ presult->htsv[offnum] = -1;
continue;
}
@@ -295,8 +290,8 @@ heap_page_prune(Relation relation, Buffer buffer,
if (off_loc)
*off_loc = offnum;
- prstate.htsv[offnum] = heap_prune_satisfies_vacuum(&prstate, &tup,
- buffer);
+ presult->htsv[offnum] = heap_prune_satisfies_vacuum(&prstate, &tup,
+ buffer);
}
/* Scan the page */
@@ -320,7 +315,8 @@ heap_page_prune(Relation relation, Buffer buffer,
continue;
/* Process this item or chain of items */
- presult->ndeleted += heap_prune_chain(buffer, offnum, &prstate);
+ presult->ndeleted += heap_prune_chain(buffer, offnum,
+ presult->htsv, &prstate);
}
/* Clear the offset information once we have processed the given page. */
@@ -449,6 +445,8 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
/*
* Prune specified line pointer or a HOT chain originating at line pointer.
*
+ * Tuple visibility information is provided in htsv.
+ *
* If the item is an index-referenced tuple (i.e. not a heap-only tuple),
* the HOT chain is pruned by removing all DEAD tuples at the start of the HOT
* chain. We also prune any RECENTLY_DEAD tuples preceding a DEAD tuple.
@@ -476,7 +474,8 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
* Returns the number of tuples (to be) deleted from the page.
*/
static int
-heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum, PruneState *prstate)
+heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
+ int8 *htsv, PruneState *prstate)
{
int ndeleted = 0;
Page dp = (Page) BufferGetPage(buffer);
@@ -497,7 +496,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum, PruneState *prstate)
*/
if (ItemIdIsNormal(rootlp))
{
- Assert(prstate->htsv[rootoffnum] != -1);
+ Assert(htsv[rootoffnum] != -1);
htup = (HeapTupleHeader) PageGetItem(dp, rootlp);
if (HeapTupleHeaderIsHeapOnly(htup))
@@ -520,7 +519,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum, PruneState *prstate)
* either here or while following a chain below. Whichever path
* gets there first will mark the tuple unused.
*/
- if (prstate->htsv[rootoffnum] == HEAPTUPLE_DEAD &&
+ if (htsv[rootoffnum] == HEAPTUPLE_DEAD &&
!HeapTupleHeaderIsHotUpdated(htup))
{
heap_prune_record_unused(prstate, rootoffnum);
@@ -588,7 +587,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum, PruneState *prstate)
break;
Assert(ItemIdIsNormal(lp));
- Assert(prstate->htsv[offnum] != -1);
+ Assert(htsv[offnum] != -1);
htup = (HeapTupleHeader) PageGetItem(dp, lp);
/*
@@ -608,7 +607,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum, PruneState *prstate)
*/
tupdead = recent_dead = false;
- switch ((HTSV_Result) prstate->htsv[offnum])
+ switch ((HTSV_Result) htsv[offnum])
{
case HEAPTUPLE_DEAD:
tupdead = true;
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 7ead9cfe9d..6c328f3ed5 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1524,12 +1524,12 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
* of complexity just so we could deal with tuples that were DEAD to VACUUM,
* but nevertheless were left with storage after pruning.
*
- * The approach we take now is to restart pruning when the race condition is
- * detected. This allows heap_page_prune() to prune the tuples inserted by
- * the now-aborted transaction. This is a little crude, but it guarantees
- * that any items that make it into the dead_items array are simple LP_DEAD
- * line pointers, and that every remaining item with tuple storage is
- * considered as a candidate for freezing.
+ * As of Postgres 17, we circumvent this problem altogether by reusing the
+ * result of heap_page_prune()'s visibility check. Without the second call to
+ * HeapTupleSatisfiesVacuum(), there is no new HTSV_Result and there can be no
+ * disagreement. The tuple's TID won't be added to the array of dead tuple TIDs
+ * for vacuum, thus vacuum will never be tasked with reaping a tuple with
+ * storage.
*/
static void
lazy_scan_prune(LVRelState *vacrel,
@@ -1542,8 +1542,6 @@ lazy_scan_prune(LVRelState *vacrel,
OffsetNumber offnum,
maxoff;
ItemId itemid;
- HeapTupleData tuple;
- HTSV_Result res;
PruneResult presult;
int tuples_frozen,
lpdead_items,
@@ -1563,8 +1561,6 @@ lazy_scan_prune(LVRelState *vacrel,
*/
maxoff = PageGetMaxOffsetNumber(page);
-retry:
-
/* Initialize (or reset) page-level state */
pagefrz.freeze_required = false;
pagefrz.FreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
@@ -1604,6 +1600,7 @@ retry:
offnum <= maxoff;
offnum = OffsetNumberNext(offnum))
{
+ HeapTupleHeader htup;
bool totally_frozen;
/*
@@ -1646,22 +1643,7 @@ retry:
Assert(ItemIdIsNormal(itemid));
- ItemPointerSet(&(tuple.t_self), blkno, offnum);
- tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
- tuple.t_len = ItemIdGetLength(itemid);
- tuple.t_tableOid = RelationGetRelid(rel);
-
- /*
- * DEAD tuples are almost always pruned into LP_DEAD line pointers by
- * heap_page_prune(), but it's possible that the tuple state changed
- * since heap_page_prune() looked. Handle that here by restarting.
- * (See comments at the top of function for a full explanation.)
- */
- res = HeapTupleSatisfiesVacuum(&tuple, vacrel->cutoffs.OldestXmin,
- buf);
-
- if (unlikely(res == HEAPTUPLE_DEAD))
- goto retry;
+ htup = (HeapTupleHeader) PageGetItem(page, itemid);
/*
* The criteria for counting a tuple as live in this block need to
@@ -1682,7 +1664,7 @@ retry:
* (Cases where we bypass index vacuuming will violate this optimistic
* assumption, but the overall impact of that should be negligible.)
*/
- switch (res)
+ switch ((HTSV_Result) presult.htsv[offnum])
{
case HEAPTUPLE_LIVE:
@@ -1704,7 +1686,7 @@ retry:
{
TransactionId xmin;
- if (!HeapTupleHeaderXminCommitted(tuple.t_data))
+ if (!HeapTupleHeaderXminCommitted(htup))
{
prunestate->all_visible = false;
break;
@@ -1714,7 +1696,7 @@ retry:
* The inserter definitely committed. But is it old enough
* that everyone sees it as committed?
*/
- xmin = HeapTupleHeaderGetXmin(tuple.t_data);
+ xmin = HeapTupleHeaderGetXmin(htup);
if (!TransactionIdPrecedes(xmin,
vacrel->cutoffs.OldestXmin))
{
@@ -1768,7 +1750,7 @@ retry:
prunestate->hastup = true; /* page makes rel truncation unsafe */
/* Tuple with storage -- consider need to freeze */
- if (heap_prepare_freeze_tuple(tuple.t_data, &vacrel->cutoffs, &pagefrz,
+ if (heap_prepare_freeze_tuple(htup, &vacrel->cutoffs, &pagefrz,
&frozen[tuples_frozen], &totally_frozen))
{
/* Save prepared freeze plan for later */
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 2d3f149e4f..864ae561c4 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -198,6 +198,17 @@ typedef struct PruneResult
{
int ndeleted; /* Number of tuples deleted from the page */
int nnewlpdead; /* Number of newly LP_DEAD items */
+
+ /*
+ * Tuple visibility is only computed once for each tuple, for correctness
+ * and efficiency reasons; see comment in heap_page_prune() for details.
+ * This is of type int8[], instead of HTSV_Result[], so we can use -1 to
+ * indicate no visibility has been computed, e.g. for LP_DEAD items.
+ *
+ * This needs to be MaxHeapTuplesPerPage + 1 long as FirstOffsetNumber is
+ * 1. Otherwise every access would need to subtract 1.
+ */
+ int8 htsv[MaxHeapTuplesPerPage + 1];
} PruneResult;
/* ----------------
--
2.37.2
v4-0001-Clarify-usage-of-heap_page_prune-parameter-off_lo.patchtext/x-patch; charset=US-ASCII; name=v4-0001-Clarify-usage-of-heap_page_prune-parameter-off_lo.patchDownload
From 661d9837ba34552fa301dc17b1fd74dd833aec72 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Thu, 7 Sep 2023 17:35:46 -0400
Subject: [PATCH v4 1/3] Clarify usage of heap_page_prune parameter off_loc
heap_page_prune() takes an optional parameter, off_loc, where it saves
the current offset while pruning. Vacuum passes the address of
vacrel->offnum, which is used by vacuum_error_callback() to populate the
error context message. On-access pruning has no such callback mechanism,
so it passes NULL. When off_loc is not provided the error message
context simply omits the offset. Because vacrel->offnum is specifically
used by vacuum_error_callback(), it is required that it be passed to
heap_page_prune() for the error message to be correctly populated.
Clarify this requirement in a comment in lazy_scan_prune() before
calling heap_page_prune().
While we are at it, provide some additional detail about off_loc in
heap_page_prune()'s function header comment.
Discussion: https://postgr.es/m/CA%2BTgmoZ8E7DXzAZkV%3DRF9tfJkQFH%3Da3KkjQhYNY9W26Jh7AN%2Bw%40mail.gmail.com
---
src/backend/access/heap/pruneheap.c | 7 +++++--
src/backend/access/heap/vacuumlazy.c | 4 ++++
2 files changed, 9 insertions(+), 2 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 18193efa23..392b54f093 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -207,8 +207,11 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* Sets *nnewlpdead for caller, indicating the number of items that were
* newly set LP_DEAD during prune operation.
*
- * off_loc is the offset location required by the caller to use in error
- * callback.
+ * off_loc is the current offset into the line pointer array while pruning.
+ * This is used by vacuum to populate the error context message. On-access
+ * pruning has no such callback mechanism for populating the error context, so
+ * it passes NULL. When provided by the caller, off_loc is set exclusively by
+ * heap_page_prune().
*
* Returns the number of tuples deleted from the page during this call.
*/
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 1a05adfa61..102cc97358 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1586,6 +1586,10 @@ retry:
* deleted from the table. It should not be confused with lpdead_items;
* lpdead_items's final value can be thought of as the number of tuples
* that were deleted from indexes.
+ *
+ * vacuum_error_callback() looks specifically at vacrel->offnum for the
+ * current offset when populating the error context message, so it is
+ * imperative that we pass its location to heap_page_prune.
*/
tuples_deleted = heap_page_prune(rel, buf, vacrel->vistest,
&nnewlpdead,
--
2.37.2
On Thu, Sep 7, 2023 at 6:23 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
I mostly wanted to remove the NULL checks because I found them
distracting (so, a stylistic complaint). However, upon further
reflection, I actually think it is better if heap_page_prune_opt()
passes NULL. heap_page_prune() has no error callback mechanism that
could use it, and passing a valid value implies otherwise. Also, as
you said, off_loc will always be InvalidOffsetNumber when
heap_page_prune() returns normally, so having heap_page_prune_opt()
pass a dummy value might actually be more confusing for future
programmers.
I'll look at the new patches more next week, but I wanted to comment
on this point. I think this is kind of six of one, half a dozen of the
other. It's not that hard to spot a variable that's only used in a
function call and never initialized beforehand or used afterward, and
if someone really feels the need to hammer home the point, they could
always name it dummy or dummy_loc or whatever. So this point doesn't
really carry a lot of weight with me. I actually think that the
proposed change is probably better, but it seems like such a minor
improvement that I get slightly reluctant to make it, only because
churning the source code for arguable points sometimes annoys other
developers.
But I also had the thought that maybe it wouldn't be such a terrible
idea if heap_page_prune_opt() actually used off_loc for some error
reporting goodness. I mean, if HOT pruning fails, and we don't get the
detail as to which tuple caused the failure, we can always run VACUUM
and it will give us that information, assuming of course that the same
failure happens again. But is there any reason why HOT pruning
shouldn't include that error detail? If it did, then off_loc would be
passed by all callers, at which point we surely would want to get rid
of the branches.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Fri, Sep 8, 2023 at 11:06 AM Robert Haas <robertmhaas@gmail.com> wrote:
But I also had the thought that maybe it wouldn't be such a terrible
idea if heap_page_prune_opt() actually used off_loc for some error
reporting goodness. But is there any reason why HOT pruning
shouldn't include that error detail? If it did, then off_loc would be
passed by all callers, at which point we surely would want to get rid
of the branches.
This is a good idea. I will work on a separate patch set to add an
error context callback for on-access HOT pruning.
- Melanie
Hi,
On 2023-09-07 18:23:22 -0400, Melanie Plageman wrote:
From e986940e546171d1f1d06f62a101d695a8481e7a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 6 Sep 2023 14:57:20 -0400
Subject: [PATCH v4 2/3] Move heap_page_prune output parameters into struct
Add PruneResult, a structure containing the output parameters and result
of heap_page_prune(). Reorganizing the results of heap_page_prune() into
a struct simplifies the function signature and provides a location for
future commits to store additional output parameters.
Discussion: https://postgr.es/m/CAAKRu_br124qsGJieuYA0nGjywEukhK1dKBfRdby_4yY3E9SXA@mail.gmail.com
---
src/backend/access/heap/pruneheap.c | 33 +++++++++++++---------------
src/backend/access/heap/vacuumlazy.c | 17 +++++---------
src/include/access/heapam.h | 13 +++++++++--
src/tools/pgindent/typedefs.list | 1 +
4 files changed, 33 insertions(+), 31 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 392b54f093..5ac286e152 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -155,15 +155,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 */
 if (PageIsFull(page) || PageGetHeapFreeSpace(page) < minfree)
 {
- int ndeleted,
- nnewlpdead;
+ PruneResult presult;

- ndeleted = heap_page_prune(relation, buffer, vistest,
- &nnewlpdead, NULL);
+ heap_page_prune(relation, buffer, vistest, &presult, NULL);

 /*
 * Report the number of tuples reclaimed to pgstats. This is
- * ndeleted minus the number of newly-LP_DEAD-set items.
+ * presult.ndeleted minus the number of newly-LP_DEAD-set items.
 *
 * We derive the number of dead tuples like this to avoid totally
 * forgetting about items that were set to LP_DEAD, since they
@@ -175,9 +173,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 * tracks ndeleted, since it will set the same LP_DEAD items to
 * LP_UNUSED separately.
 */
- if (ndeleted > nnewlpdead)
+ if (presult.ndeleted > presult.nnewlpdead)
 pgstat_update_heap_dead_tuples(relation,
- ndeleted - nnewlpdead);
+ presult.ndeleted - presult.nnewlpdead);
 }

 /* And release buffer lock */
@@ -204,24 +202,22 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 * (see heap_prune_satisfies_vacuum and
 * HeapTupleSatisfiesVacuum).
 *
- * Sets *nnewlpdead for caller, indicating the number of items that were
- * newly set LP_DEAD during prune operation.
+ * presult contains output parameters needed by callers such as the number of
+ * tuples removed and the number of line pointers newly marked LP_DEAD.
+ * heap_page_prune() is responsible for initializing it.
 *
 * off_loc is the current offset into the line pointer array while pruning.
 * This is used by vacuum to populate the error context message. On-access
 * pruning has no such callback mechanism for populating the error context, so
 * it passes NULL. When provided by the caller, off_loc is set exclusively by
 * heap_page_prune().
- *
- * Returns the number of tuples deleted from the page during this call.
 */
-int
+void
 heap_page_prune(Relation relation, Buffer buffer,
 GlobalVisState *vistest,
- int *nnewlpdead,
+ PruneResult *presult,
 OffsetNumber *off_loc)
 {
- int ndeleted = 0;
 Page page = BufferGetPage(buffer);
 BlockNumber blockno = BufferGetBlockNumber(buffer);
 OffsetNumber offnum,
@@ -247,6 +243,9 @@ heap_page_prune(Relation relation, Buffer buffer,
 prstate.nredirected = prstate.ndead = prstate.nunused = 0;
 memset(prstate.marked, 0, sizeof(prstate.marked));

+ presult->ndeleted = 0;
+ presult->nnewlpdead = 0;
+
 maxoff = PageGetMaxOffsetNumber(page);
 tup.t_tableOid = RelationGetRelid(prstate.rel);

@@ -321,7 +320,7 @@ heap_page_prune(Relation relation, Buffer buffer,
 continue;

 /* Process this item or chain of items */
- ndeleted += heap_prune_chain(buffer, offnum, &prstate);
+ presult->ndeleted += heap_prune_chain(buffer, offnum, &prstate);
 }

 /* Clear the offset information once we have processed the given page. */
@@ -422,9 +421,7 @@ heap_page_prune(Relation relation, Buffer buffer,
 END_CRIT_SECTION();

 /* Record number of newly-set-LP_DEAD items for caller */
- *nnewlpdead = prstate.ndead;
-
- return ndeleted;
+ presult->nnewlpdead = prstate.ndead;
 }

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 102cc97358..7ead9cfe9d 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1544,12 +1544,11 @@ lazy_scan_prune(LVRelState *vacrel,
 ItemId itemid;
 HeapTupleData tuple;
 HTSV_Result res;
- int tuples_deleted,
- tuples_frozen,
+ PruneResult presult;
+ int tuples_frozen,
 lpdead_items,
 live_tuples,
 recently_dead_tuples;
- int nnewlpdead;
 HeapPageFreeze pagefrz;
 int64 fpi_before = pgWalUsage.wal_fpi;
 OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
@@ -1572,7 +1571,6 @@ retry:
 pagefrz.FreezePageRelminMxid = vacrel->NewRelminMxid;
 pagefrz.NoFreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
 pagefrz.NoFreezePageRelminMxid = vacrel->NewRelminMxid;
- tuples_deleted = 0;
 tuples_frozen = 0;
 lpdead_items = 0;
 live_tuples = 0;
@@ -1581,9 +1579,8 @@ retry:
 /*
 * Prune all HOT-update chains in this page.
 *
- * We count tuples removed by the pruning step as tuples_deleted. Its
- * final value can be thought of as the number of tuples that have been
- * deleted from the table. It should not be confused with lpdead_items;
+ * We count the number of tuples removed from the page by the pruning step
+ * in presult.ndeleted. It should not be confused with lpdead_items;
 * lpdead_items's final value can be thought of as the number of tuples
 * that were deleted from indexes.
 *
@@ -1591,9 +1588,7 @@ retry:
 * current offset when populating the error context message, so it is
 * imperative that we pass its location to heap_page_prune.
 */
- tuples_deleted = heap_page_prune(rel, buf, vacrel->vistest,
- &nnewlpdead,
- &vacrel->offnum);
+ heap_page_prune(rel, buf, vacrel->vistest, &presult, &vacrel->offnum);

 /*
 * Now scan the page to collect LP_DEAD items and check for tuples
@@ -1933,7 +1928,7 @@ retry:
 }

 /* Finally, add page-local counts to whole-VACUUM counts */
- vacrel->tuples_deleted += tuples_deleted;
+ vacrel->tuples_deleted += presult.ndeleted;
 vacrel->tuples_frozen += tuples_frozen;
 vacrel->lpdead_items += lpdead_items;
 vacrel->live_tuples += live_tuples;

diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 6598c4d7d8..2d3f149e4f 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -191,6 +191,15 @@ typedef struct HeapPageFreeze

} HeapPageFreeze;

+/*
+ * Per-page state returned from pruning
+ */
+typedef struct PruneResult
+{
+ int ndeleted; /* Number of tuples deleted from the page */
+ int nnewlpdead; /* Number of newly LP_DEAD items */
+} PruneResult;
I think it might be worth making the names a bit less ambiguous than they
were. It's a bit odd that one has "new" in the name, the other doesn't,
despite both being about newly marked things. And "deleted" seems somewhat
ambiguous, it could also be understood as marking things LP_DEAD. Maybe
nnewunused?
static int heap_prune_chain(Buffer buffer,
OffsetNumber rootoffnum,
+ int8 *htsv,
PruneState *prstate);
Hm, do we really want to pass this explicitly to a bunch of functions? Seems
like it might be better to either pass the PruneResult around or to have a
pointer in PruneState?
/*
 * The criteria for counting a tuple as live in this block need to
@@ -1682,7 +1664,7 @@ retry:
 * (Cases where we bypass index vacuuming will violate this optimistic
 * assumption, but the overall impact of that should be negligible.)
 */
- switch (res)
+ switch ((HTSV_Result) presult.htsv[offnum])
 {
 case HEAPTUPLE_LIVE:
I think we should assert that we have a valid HTSV_Result here, i.e. not
-1. You could wrap the cast and Assert into an inline function as well.
Greetings,
Andres Freund
[ sorry for the delay getting back to this ]
On Wed, Sep 13, 2023 at 1:29 PM Andres Freund <andres@anarazel.de> wrote:
I think it might be worth making the names a bit less ambiguous than they
were. It's a bit odd that one has "new" in the name, the other doesn't,
despite both being about newly marked things. And "deleted" seems somewhat
ambiguous, it could also be understood as marking things LP_DEAD. Maybe
nnewunused?
I like it the better the way Melanie did it. The current name may not
be for the best, but that can be changed some other time, in a
separate patch, if someone likes. For now, changing the name seems
like a can of worms we don't need to open; the existing names have
precedent on their side if nothing else.
static int heap_prune_chain(Buffer buffer,
OffsetNumber rootoffnum,
+ int8 *htsv,
PruneState *prstate);
Hm, do we really want to pass this explicitly to a bunch of functions? Seems
like it might be better to either pass the PruneResult around or to have a
pointer in PruneState?
As far as I can see, 0002 adds it to one function (heap_page_prune) and
0003 adds it to one more (heap_prune_chain). That's not much of a
bunch.
/*
 * The criteria for counting a tuple as live in this block need to
@@ -1682,7 +1664,7 @@ retry:
 * (Cases where we bypass index vacuuming will violate this optimistic
 * assumption, but the overall impact of that should be negligible.)
 */
- switch (res)
+ switch ((HTSV_Result) presult.htsv[offnum])
 {
 case HEAPTUPLE_LIVE:
I think we should assert that we have a valid HTSV_Result here, i.e. not
-1. You could wrap the cast and Assert into an inline function as well.
This isn't a bad idea, although I don't find it completely necessary either.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Thu, Sep 21, 2023 at 3:53 PM Robert Haas <robertmhaas@gmail.com> wrote:
static int heap_prune_chain(Buffer buffer,
OffsetNumber rootoffnum,
+ int8 *htsv,
PruneState *prstate);
Hm, do we really want to pass this explicitly to a bunch of functions? Seems
like it might be better to either pass the PruneResult around or to have a
pointer in PruneState?
As far as I can see, 0002 adds it to one function (heap_page_prune) and
0003 adds it to one more (heap_prune_chain). That's not much of a
bunch.
I didn't read this carefully enough. Actually, heap_prune_chain() is
the *only* function that gets int8 *htsv as an argument. I don't
understand how that's a bunch ... unless there are later patches not
shown here that you're worried about. What happens in 0002 is a
function getting PruneResult * as an argument, not int8 *htsv.
Honestly I think 0002 and 0003 are ready to commit, if you're not too
opposed to them, or if we can find some relatively small changes that
would address your objections.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Thu, Sep 21, 2023 at 3:53 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Sep 13, 2023 at 1:29 PM Andres Freund <andres@anarazel.de> wrote:
/*
 * The criteria for counting a tuple as live in this block need to
@@ -1682,7 +1664,7 @@ retry:
 * (Cases where we bypass index vacuuming will violate this optimistic
 * assumption, but the overall impact of that should be negligible.)
 */
- switch (res)
+ switch ((HTSV_Result) presult.htsv[offnum])
 {
 case HEAPTUPLE_LIVE:
I think we should assert that we have a valid HTSV_Result here, i.e. not
-1. You could wrap the cast and Assert into an inline function as well.
This isn't a bad idea, although I don't find it completely necessary either.
Attached v5 does this. Even though a value of -1 would hit the default
switch case and error out, I can see the argument for this validation
-- as all other places switching on an HTSV_Result are doing so on a
value which was always an HTSV_Result.
Once I started writing the function comment, however, I felt a bit
awkward. In order to make the function available to both pruneheap.c
and vacuumlazy.c, I had to put it in a header file. Writing a
function, available to anyone including heapam.h, which takes an int
and returns an HTSV_Result feels a bit odd. Do we want it to be common
practice to use an int value outside the valid enum range to store
"status not yet computed" for HTSV_Results?
Anyway, on a tactical note, I added the inline function to heapam.h
below the PruneResult definition since it is fairly tightly coupled to
the htsv array in PruneResult. All of the function prototypes are
under a comment that says "function prototypes for heap access method"
-- which didn't feel like an accurate description of this function. I
wonder if it makes sense to have pruneheap.c include vacuum.h and move
pruning specific stuff like this helper and PruneResult over there? I
can't remember why I didn't do this before, but maybe there is a
reason not to? I also wasn't sure if I needed to forward declare the
inline function or not.
Oh, and, one more note. I've dropped the former patch 0001 which
changed the function comment about off_loc above heap_page_prune(). I
have plans to write a separate patch adding an error context callback
for HOT pruning with the offset number and would include such a change
in that patch.
- Melanie
Attachments:
v5-0002-Reuse-heap_page_prune-tuple-visibility-statuses.patchtext/x-patch; charset=US-ASCII; name=v5-0002-Reuse-heap_page_prune-tuple-visibility-statuses.patchDownload
From aa1ec4fa1a689f9c6328d02fd13af1a42ecb0396 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 6 Sep 2023 16:54:41 -0400
Subject: [PATCH v5 2/2] Reuse heap_page_prune() tuple visibility statuses
heap_page_prune() obtains the HTSV_Result (tuple visibility status)
returned from HeapTupleSatisfiesVacuum() for every tuple on the page and
stores them in an array. By making this array available to
heap_page_prune()'s caller lazy_scan_prune(), we can avoid an additional
call to HeapTupleSatisfiesVacuum() when freezing the tuples and
recording LP_DEAD items for vacuum. This saves resources and eliminates
the possibility that vacuuming corrupted data results in a hang due to
endless retry looping.
This replaces the retry mechanism introduced in 8523492d4 to handle cases in
which a tuple's inserting transaction aborted between the visibility check in
heap_page_prune() and lazy_scan_prune()'s call to HeapTupleSatisfiesVacuum() --
rendering it dead but without a dead line pointer. We can instead reuse the
tuple's original visibility status, circumventing any disagreements.
Discussion: https://postgr.es/m/CAAKRu_br124qsGJieuYA0nGjywEukhK1dKBfRdby_4yY3E9SXA%40mail.gmail.com
---
src/backend/access/heap/pruneheap.c | 36 +++++++++++-------------
src/backend/access/heap/vacuumlazy.c | 42 ++++++++--------------------
src/include/access/heapam.h | 25 +++++++++++++++++
3 files changed, 54 insertions(+), 49 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index d5892a2db4..c5f1abd95a 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -53,16 +53,6 @@ typedef struct
* 1. Otherwise every access would need to subtract 1.
*/
bool marked[MaxHeapTuplesPerPage + 1];
-
- /*
- * Tuple visibility is only computed once for each tuple, for correctness
- * and efficiency reasons; see comment in heap_page_prune() for details.
- * This is of type int8[], instead of HTSV_Result[], so we can use -1 to
- * indicate no visibility has been computed, e.g. for LP_DEAD items.
- *
- * Same indexing as ->marked.
- */
- int8 htsv[MaxHeapTuplesPerPage + 1];
} PruneState;
/* Local functions */
@@ -71,6 +61,7 @@ static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
Buffer buffer);
static int heap_prune_chain(Buffer buffer,
OffsetNumber rootoffnum,
+ int8 *htsv,
PruneState *prstate);
static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
static void heap_prune_record_redirect(PruneState *prstate,
@@ -240,6 +231,10 @@ heap_page_prune(Relation relation, Buffer buffer,
prstate.nredirected = prstate.ndead = prstate.nunused = 0;
memset(prstate.marked, 0, sizeof(prstate.marked));
+ /*
+ * presult->htsv is not initialized here because all ntuple spots in the
+ * array will be set either to a valid HTSV_Result value or -1.
+ */
presult->ndeleted = 0;
presult->nnewlpdead = 0;
@@ -276,7 +271,7 @@ heap_page_prune(Relation relation, Buffer buffer,
/* Nothing to do if slot doesn't contain a tuple */
if (!ItemIdIsNormal(itemid))
{
- prstate.htsv[offnum] = -1;
+ presult->htsv[offnum] = -1;
continue;
}
@@ -292,8 +287,8 @@ heap_page_prune(Relation relation, Buffer buffer,
if (off_loc)
*off_loc = offnum;
- prstate.htsv[offnum] = heap_prune_satisfies_vacuum(&prstate, &tup,
- buffer);
+ presult->htsv[offnum] = heap_prune_satisfies_vacuum(&prstate, &tup,
+ buffer);
}
/* Scan the page */
@@ -317,7 +312,8 @@ heap_page_prune(Relation relation, Buffer buffer,
continue;
/* Process this item or chain of items */
- presult->ndeleted += heap_prune_chain(buffer, offnum, &prstate);
+ presult->ndeleted += heap_prune_chain(buffer, offnum,
+ presult->htsv, &prstate);
}
/* Clear the offset information once we have processed the given page. */
@@ -446,6 +442,8 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
/*
* Prune specified line pointer or a HOT chain originating at line pointer.
*
+ * Tuple visibility information is provided in htsv.
+ *
* If the item is an index-referenced tuple (i.e. not a heap-only tuple),
* the HOT chain is pruned by removing all DEAD tuples at the start of the HOT
* chain. We also prune any RECENTLY_DEAD tuples preceding a DEAD tuple.
@@ -473,7 +471,8 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
* Returns the number of tuples (to be) deleted from the page.
*/
static int
-heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum, PruneState *prstate)
+heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
+ int8 *htsv, PruneState *prstate)
{
int ndeleted = 0;
Page dp = (Page) BufferGetPage(buffer);
@@ -494,7 +493,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum, PruneState *prstate)
*/
if (ItemIdIsNormal(rootlp))
{
- Assert(prstate->htsv[rootoffnum] != -1);
+ Assert(htsv[rootoffnum] != -1);
htup = (HeapTupleHeader) PageGetItem(dp, rootlp);
if (HeapTupleHeaderIsHeapOnly(htup))
@@ -517,7 +516,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum, PruneState *prstate)
* either here or while following a chain below. Whichever path
* gets there first will mark the tuple unused.
*/
- if (prstate->htsv[rootoffnum] == HEAPTUPLE_DEAD &&
+ if (htsv[rootoffnum] == HEAPTUPLE_DEAD &&
!HeapTupleHeaderIsHotUpdated(htup))
{
heap_prune_record_unused(prstate, rootoffnum);
@@ -585,7 +584,6 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum, PruneState *prstate)
break;
Assert(ItemIdIsNormal(lp));
- Assert(prstate->htsv[offnum] != -1);
htup = (HeapTupleHeader) PageGetItem(dp, lp);
/*
@@ -605,7 +603,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum, PruneState *prstate)
*/
tupdead = recent_dead = false;
- switch ((HTSV_Result) prstate->htsv[offnum])
+ switch (htsv_get_valid_status(htsv[offnum]))
{
case HEAPTUPLE_DEAD:
tupdead = true;
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index fa77ef7f4a..b8268a5cef 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1524,12 +1524,12 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
* of complexity just so we could deal with tuples that were DEAD to VACUUM,
* but nevertheless were left with storage after pruning.
*
- * The approach we take now is to restart pruning when the race condition is
- * detected. This allows heap_page_prune() to prune the tuples inserted by
- * the now-aborted transaction. This is a little crude, but it guarantees
- * that any items that make it into the dead_items array are simple LP_DEAD
- * line pointers, and that every remaining item with tuple storage is
- * considered as a candidate for freezing.
+ * As of Postgres 17, we circumvent this problem altogether by reusing the
+ * result of heap_page_prune()'s visibility check. Without the second call to
+ * HeapTupleSatisfiesVacuum(), there is no new HTSV_Result and there can be no
+ * disagreement. The tuple's TID won't be added to the array of dead tuple TIDs
+ * for vacuum, thus vacuum will never be tasked with reaping a tuple with
+ * storage.
*/
static void
lazy_scan_prune(LVRelState *vacrel,
@@ -1542,8 +1542,6 @@ lazy_scan_prune(LVRelState *vacrel,
OffsetNumber offnum,
maxoff;
ItemId itemid;
- HeapTupleData tuple;
- HTSV_Result res;
PruneResult presult;
int tuples_frozen,
lpdead_items,
@@ -1563,8 +1561,6 @@ lazy_scan_prune(LVRelState *vacrel,
*/
maxoff = PageGetMaxOffsetNumber(page);
-retry:
-
/* Initialize (or reset) page-level state */
pagefrz.freeze_required = false;
pagefrz.FreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
@@ -1600,6 +1596,7 @@ retry:
offnum <= maxoff;
offnum = OffsetNumberNext(offnum))
{
+ HeapTupleHeader htup;
bool totally_frozen;
/*
@@ -1642,22 +1639,7 @@ retry:
Assert(ItemIdIsNormal(itemid));
- ItemPointerSet(&(tuple.t_self), blkno, offnum);
- tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
- tuple.t_len = ItemIdGetLength(itemid);
- tuple.t_tableOid = RelationGetRelid(rel);
-
- /*
- * DEAD tuples are almost always pruned into LP_DEAD line pointers by
- * heap_page_prune(), but it's possible that the tuple state changed
- * since heap_page_prune() looked. Handle that here by restarting.
- * (See comments at the top of function for a full explanation.)
- */
- res = HeapTupleSatisfiesVacuum(&tuple, vacrel->cutoffs.OldestXmin,
- buf);
-
- if (unlikely(res == HEAPTUPLE_DEAD))
- goto retry;
+ htup = (HeapTupleHeader) PageGetItem(page, itemid);
/*
* The criteria for counting a tuple as live in this block need to
@@ -1678,7 +1660,7 @@ retry:
* (Cases where we bypass index vacuuming will violate this optimistic
* assumption, but the overall impact of that should be negligible.)
*/
- switch (res)
+ switch (htsv_get_valid_status(presult.htsv[offnum]))
{
case HEAPTUPLE_LIVE:
@@ -1700,7 +1682,7 @@ retry:
{
TransactionId xmin;
- if (!HeapTupleHeaderXminCommitted(tuple.t_data))
+ if (!HeapTupleHeaderXminCommitted(htup))
{
prunestate->all_visible = false;
break;
@@ -1710,7 +1692,7 @@ retry:
* The inserter definitely committed. But is it old enough
* that everyone sees it as committed?
*/
- xmin = HeapTupleHeaderGetXmin(tuple.t_data);
+ xmin = HeapTupleHeaderGetXmin(htup);
if (!TransactionIdPrecedes(xmin,
vacrel->cutoffs.OldestXmin))
{
@@ -1764,7 +1746,7 @@ retry:
prunestate->hastup = true; /* page makes rel truncation unsafe */
/* Tuple with storage -- consider need to freeze */
- if (heap_prepare_freeze_tuple(tuple.t_data, &vacrel->cutoffs, &pagefrz,
+ if (heap_prepare_freeze_tuple(htup, &vacrel->cutoffs, &pagefrz,
&frozen[tuples_frozen], &totally_frozen))
{
/* Save prepared freeze plan for later */
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 2d3f149e4f..62fac1d5d2 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -198,8 +198,33 @@ typedef struct PruneResult
{
int ndeleted; /* Number of tuples deleted from the page */
int nnewlpdead; /* Number of newly LP_DEAD items */
+
+ /*
+ * Tuple visibility is only computed once for each tuple, for correctness
+ * and efficiency reasons; see comment in heap_page_prune() for details.
+ * This is of type int8[], instead of HTSV_Result[], so we can use -1 to
+ * indicate no visibility has been computed, e.g. for LP_DEAD items.
+ *
+ * This needs to be MaxHeapTuplesPerPage + 1 long as FirstOffsetNumber is
+ * 1. Otherwise every access would need to subtract 1.
+ */
+ int8 htsv[MaxHeapTuplesPerPage + 1];
} PruneResult;
+/*
+ * Pruning calculates tuple visibility once and saves the results in an array
+ * of int8. See PruneResult.htsv for details. This helper function is meant to
+ * guard against examining visibility status array members which have not yet
+ * been computed.
+ */
+static inline HTSV_Result
+htsv_get_valid_status(int status)
+{
+ Assert(status >= HEAPTUPLE_DEAD &&
+ status <= HEAPTUPLE_DELETE_IN_PROGRESS);
+ return (HTSV_Result) status;
+}
+
/* ----------------
* function prototypes for heap access method
*
--
2.37.2
Attachment: v5-0001-Move-heap_page_prune-output-parameters-into-struc.patch (text/x-patch; charset=US-ASCII)
From 430ff3458f0ac523f1c623d9bc2669981cc0e536 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Wed, 6 Sep 2023 14:57:20 -0400
Subject: [PATCH v5 1/2] Move heap_page_prune output parameters into struct
Add PruneResult, a structure containing the output parameters and result
of heap_page_prune(). Reorganizing the results of heap_page_prune() into
a struct simplifies the function signature and provides a location for
future commits to store additional output parameters.
Discussion: https://postgr.es/m/CAAKRu_br124qsGJieuYA0nGjywEukhK1dKBfRdby_4yY3E9SXA%40mail.gmail.com
---
src/backend/access/heap/pruneheap.c | 33 +++++++++++++---------------
src/backend/access/heap/vacuumlazy.c | 17 +++++---------
src/include/access/heapam.h | 13 +++++++++--
src/tools/pgindent/typedefs.list | 1 +
4 files changed, 33 insertions(+), 31 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 18193efa23..d5892a2db4 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -155,15 +155,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
*/
if (PageIsFull(page) || PageGetHeapFreeSpace(page) < minfree)
{
- int ndeleted,
- nnewlpdead;
+ PruneResult presult;
- ndeleted = heap_page_prune(relation, buffer, vistest,
- &nnewlpdead, NULL);
+ heap_page_prune(relation, buffer, vistest, &presult, NULL);
/*
* Report the number of tuples reclaimed to pgstats. This is
- * ndeleted minus the number of newly-LP_DEAD-set items.
+ * presult.ndeleted minus the number of newly-LP_DEAD-set items.
*
* We derive the number of dead tuples like this to avoid totally
* forgetting about items that were set to LP_DEAD, since they
@@ -175,9 +173,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* tracks ndeleted, since it will set the same LP_DEAD items to
* LP_UNUSED separately.
*/
- if (ndeleted > nnewlpdead)
+ if (presult.ndeleted > presult.nnewlpdead)
pgstat_update_heap_dead_tuples(relation,
- ndeleted - nnewlpdead);
+ presult.ndeleted - presult.nnewlpdead);
}
/* And release buffer lock */
@@ -204,21 +202,19 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* (see heap_prune_satisfies_vacuum and
* HeapTupleSatisfiesVacuum).
*
- * Sets *nnewlpdead for caller, indicating the number of items that were
- * newly set LP_DEAD during prune operation.
- *
* off_loc is the offset location required by the caller to use in error
* callback.
*
- * Returns the number of tuples deleted from the page during this call.
+ * presult contains output parameters needed by callers such as the number of
+ * tuples removed and the number of line pointers newly marked LP_DEAD.
+ * heap_page_prune() is responsible for initializing it.
*/
-int
+void
heap_page_prune(Relation relation, Buffer buffer,
GlobalVisState *vistest,
- int *nnewlpdead,
+ PruneResult *presult,
OffsetNumber *off_loc)
{
- int ndeleted = 0;
Page page = BufferGetPage(buffer);
BlockNumber blockno = BufferGetBlockNumber(buffer);
OffsetNumber offnum,
@@ -244,6 +240,9 @@ heap_page_prune(Relation relation, Buffer buffer,
prstate.nredirected = prstate.ndead = prstate.nunused = 0;
memset(prstate.marked, 0, sizeof(prstate.marked));
+ presult->ndeleted = 0;
+ presult->nnewlpdead = 0;
+
maxoff = PageGetMaxOffsetNumber(page);
tup.t_tableOid = RelationGetRelid(prstate.rel);
@@ -318,7 +317,7 @@ heap_page_prune(Relation relation, Buffer buffer,
continue;
/* Process this item or chain of items */
- ndeleted += heap_prune_chain(buffer, offnum, &prstate);
+ presult->ndeleted += heap_prune_chain(buffer, offnum, &prstate);
}
/* Clear the offset information once we have processed the given page. */
@@ -419,9 +418,7 @@ heap_page_prune(Relation relation, Buffer buffer,
END_CRIT_SECTION();
/* Record number of newly-set-LP_DEAD items for caller */
- *nnewlpdead = prstate.ndead;
-
- return ndeleted;
+ presult->nnewlpdead = prstate.ndead;
}
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index b3c24d68f0..fa77ef7f4a 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1544,12 +1544,11 @@ lazy_scan_prune(LVRelState *vacrel,
ItemId itemid;
HeapTupleData tuple;
HTSV_Result res;
- int tuples_deleted,
- tuples_frozen,
+ PruneResult presult;
+ int tuples_frozen,
lpdead_items,
live_tuples,
recently_dead_tuples;
- int nnewlpdead;
HeapPageFreeze pagefrz;
int64 fpi_before = pgWalUsage.wal_fpi;
OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
@@ -1572,7 +1571,6 @@ retry:
pagefrz.FreezePageRelminMxid = vacrel->NewRelminMxid;
pagefrz.NoFreezePageRelfrozenXid = vacrel->NewRelfrozenXid;
pagefrz.NoFreezePageRelminMxid = vacrel->NewRelminMxid;
- tuples_deleted = 0;
tuples_frozen = 0;
lpdead_items = 0;
live_tuples = 0;
@@ -1581,15 +1579,12 @@ retry:
/*
* Prune all HOT-update chains in this page.
*
- * We count tuples removed by the pruning step as tuples_deleted. Its
- * final value can be thought of as the number of tuples that have been
- * deleted from the table. It should not be confused with lpdead_items;
+ * We count the number of tuples removed from the page by the pruning step
+ * in presult.ndeleted. It should not be confused with lpdead_items;
* lpdead_items's final value can be thought of as the number of tuples
* that were deleted from indexes.
*/
- tuples_deleted = heap_page_prune(rel, buf, vacrel->vistest,
- &nnewlpdead,
- &vacrel->offnum);
+ heap_page_prune(rel, buf, vacrel->vistest, &presult, &vacrel->offnum);
/*
* Now scan the page to collect LP_DEAD items and check for tuples
@@ -1929,7 +1924,7 @@ retry:
}
/* Finally, add page-local counts to whole-VACUUM counts */
- vacrel->tuples_deleted += tuples_deleted;
+ vacrel->tuples_deleted += presult.ndeleted;
vacrel->tuples_frozen += tuples_frozen;
vacrel->lpdead_items += lpdead_items;
vacrel->live_tuples += live_tuples;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 6598c4d7d8..2d3f149e4f 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -191,6 +191,15 @@ typedef struct HeapPageFreeze
} HeapPageFreeze;
+/*
+ * Per-page state returned from pruning
+ */
+typedef struct PruneResult
+{
+ int ndeleted; /* Number of tuples deleted from the page */
+ int nnewlpdead; /* Number of newly LP_DEAD items */
+} PruneResult;
+
/* ----------------
* function prototypes for heap access method
*
@@ -284,9 +293,9 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
/* in heap/pruneheap.c */
struct GlobalVisState;
extern void heap_page_prune_opt(Relation relation, Buffer buffer);
-extern int heap_page_prune(Relation relation, Buffer buffer,
+extern void heap_page_prune(Relation relation, Buffer buffer,
struct GlobalVisState *vistest,
- int *nnewlpdead,
+ PruneResult *presult,
OffsetNumber *off_loc);
extern void heap_page_prune_execute(Buffer buffer,
OffsetNumber *redirected, int nredirected,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index b5bbdd1608..8de90c4958 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2151,6 +2151,7 @@ ProjectionPath
PromptInterruptContext
ProtocolVersion
PrsStorage
+PruneResult
PruneState
PruneStepResult
PsqlScanCallbacks
--
2.37.2
On Thu, Sep 28, 2023 at 8:46 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:
Once I started writing the function comment, however, I felt a bit
awkward. In order to make the function available to both pruneheap.c
and vacuumlazy.c, I had to put it in a header file. Writing a
function, available to anyone including heapam.h, which takes an int
and returns an HTSV_Result feels a bit odd. Do we want it to be common
practice to use an int value outside the valid enum range to store
"status not yet computed" for HTSV_Results?
I noticed the awkwardness of that return convention when I reviewed
the first version of this patch, but I decided it wasn't worth
spending time discussing. To avoid it, we'd either need to add a new
HTSV_Result that is only used here, or add a new type
HTSV_Result_With_An_Extra_Value and translate between the two, or pass
back a boolean + an enum instead of an array of int8. And all of those
seem to me to suck -- the first two are messy and the third would make
the return value much wider. So, no, I don't really like this, but
also, what would actually be any better? Also, IMV at least, it's more
of an issue of it being sort of ugly than of anything becoming common
practice, because how many callers of heap_page_prune() are there ever
going to be? AFAIK, we've only ever had two since forever, and even if
we grow one or two more at some point, that's still not that many.
I went ahead and committed 0001. If Andres still wants to push for
more renaming there, that can be a follow-up patch. And we can see if
he or anyone else has any comments on this new version of 0002. To me
we're down into the level of details that probably don't matter very
much one way or the other, but others may disagree.
--
Robert Haas
EDB: http://www.enterprisedb.com
Hi,
On 2023-09-28 11:25:04 -0400, Robert Haas wrote:
I went ahead and committed 0001. If Andres still wants to push for
more renaming there, that can be a follow-up patch.
Agreed.
And we can see if he or anyone else has any comments on this new version of
0002. To me we're down into the level of details that probably don't matter
very much one way or the other, but others may disagree.
The only thought I have is that it might be worth amending the comment in
lazy_scan_prune() to mention that such a tuple won't need to be frozen,
because it was visible to another session when vacuum started.
Greetings,
Andres Freund
On Sat, Sep 30, 2023 at 1:02 PM Andres Freund <andres@anarazel.de> wrote:
The only thought I have is that it might be worth amending the comment in
lazy_scan_prune() to mention that such a tuple won't need to be frozen,
because it was visible to another session when vacuum started.
I revised the comment a bit, incorporating that language, and committed.
--
Robert Haas
EDB: http://www.enterprisedb.com